Google PageRank: Introduction

Google PageRank is a link analysis algorithm used by the Google search engine that assigns a numerical weighting to each element of a hyperlinked set of documents, with the purpose of "measuring" its relative importance within the set. It was originally created by Larry Page and Sergey Brin at Stanford University in 1996 as part of a research project on a new kind of search engine.

Google PageRank is based on a premise prevalent in academia: the importance of a research paper can be judged by the number of citations it receives from other research papers. Larry Page and Sergey Brin transferred this premise to the World Wide Web, where the importance of a web page can be judged by the number of hyperlinks pointing to it from other web pages.

Google PageRank is a patented algorithm used by Google to measure a particular page's importance relative to other pages in the search engine's index. It was invented in the late 1990s by Larry Page and Sergey Brin at Stanford University. Google PageRank implements the concept of link equity as a ranking factor. Although it was designed for the World Wide Web (WWW), the algorithm may be applied to any collection of entities with reciprocal quotations and references.

Google PageRank approximates the likelihood that a user who randomly clicks links throughout the Internet will arrive at a particular page. A page that is arrived at more often is likely more important, and so has a higher Google PageRank. Each page that links to another page increases that page's Google PageRank, and links from pages with a higher Google PageRank typically contribute more.
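The random-surfer idea can be sketched as a small Monte Carlo simulation. This is an illustration under stated assumptions, not Google's implementation: the three-page graph, the function name `random_surfer`, and the rule that the surfer follows a random outgoing link with probability d = 0.85 (the damping factor defined in the Algorithm section) and otherwise jumps to a random page, always jumping when a page has no outgoing links, are all choices made for this example. The fraction of steps spent on each page estimates its relative PageRank.

```python
import random


def random_surfer(links, d=0.85, steps=200_000, seed=0):
    """Estimate each page's share of visits by a simulated random surfer.

    links maps every page to the list of pages it links to (hypothetical input).
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    pages = list(links)
    visits = dict.fromkeys(pages, 0)
    page = rng.choice(pages)  # start on a random page
    for _ in range(steps):
        visits[page] += 1
        outgoing = links[page]
        if outgoing and rng.random() < d:
            page = rng.choice(outgoing)  # follow one of the page's links
        else:
            page = rng.choice(pages)  # jump to any page at random
    return {p: n / steps for p, n in visits.items()}


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(random_surfer(links))
```

In this toy web, page B, which receives only half of A's link weight and nothing else, is visited least often, while A and C, which link to each other, share most of the visits.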

To view a page's Google PageRank, install the Google Toolbar and enable its PageRank feature. Note, however, that the Google PageRank shown by Google is a cached value and is usually out of date.

Google PageRank is just one factor in the collective algorithm Google uses when building search engine results pages (SERPs). A page with a lower Google PageRank can still rank above one with a higher Google PageRank for a particular query. Google PageRank is also relevance-agnostic: it measures overall popularity using links, not the subject matter surrounding them. Google also evaluates the relevance of links when calculating search rankings, so Google PageRank should not be the sole focus of a search engine marketer. Building relevant links will naturally contribute to a higher Google PageRank. Furthermore, building many irrelevant links solely to increase Google PageRank may actually hurt a site's ranking, because Google attempts to detect and devalue irrelevant links that are presumably used to manipulate it.

Google PageRank is also widely regarded by users as a trust-building factor, because users will tend to perceive sites with a high value as more reputable or authoritative. Indeed, this is what Google PageRank is designed to indicate. This perception is encouraged by the fact that Google penalizes spam or irrelevant sites (or individual pages) by reducing or zeroing their Google PageRank.

The name "PageRank" is a trademark of Google, and the PageRank process has been patented (U.S. Patent 6,285,999). However, the patent is assigned to Stanford University, not to Google; Google holds an exclusive license to the patent from Stanford University. Stanford University received 1.8 million shares of Google in exchange for use of the patent; the shares were sold in 2005 for $336 million.

Google PageRank: History

Google PageRank was developed in 1996 by Larry Page (Lawrence Page) and Sergey Brin (Sergey Mikhaylovich Brin) at Stanford University as part of a research project about a new kind of search engine. Sergey Brin had the idea that information on the web could be ordered in a hierarchy by "link popularity": the more links point to a page, the higher it ranks. Rajeev Motwani and Terry Winograd co-authored, with Page and Brin, the first paper about the project, which described PageRank and the initial prototype of the Google search engine and was published in 1998. Shortly afterwards, Page and Brin founded Google Inc., the company behind the Google search engine. Although it is just one of many factors that determine the ranking of Google search results, Google PageRank continues to provide the basis for all of Google's web search tools.

Google PageRank was influenced by citation analysis, developed early on by Eugene Garfield in the 1950s at the University of Pennsylvania, and by Hyper Search, developed by Massimo Marchiori at the University of Padua. In 1998, the same year PageRank was introduced, Jon Kleinberg published his important work on HITS. Google's founders cite Eugene Garfield, Massimo Marchiori, and Jon Kleinberg in their original paper.

A small search engine called "RankDex" from IDD Information Services, designed by Robin Li, had been exploring a similar strategy for site scoring and page ranking since 1996. The technology in RankDex was patented by 1999 and later used when Robin Li founded Baidu in China. Robin Li's work is referenced by some of Larry Page's U.S. patents for his Google search methods.

Google PageRank: Algorithm

Academic citation literature has been applied to the web, largely by counting citations or backlinks to a given page. This gives some approximation of a page's importance or quality. Google PageRank extends this idea by not counting links from all pages equally, and by normalizing by the number of links on a page. Google PageRank is defined as follows:

We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1; we usually set d to 0.85. In the random-surfer interpretation, d is the probability that the surfer follows one of the links on the current page rather than jumping to a random page. Also, C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
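The formula can be evaluated by starting from an arbitrary guess and recomputing every PR(A) until the values stop changing. The sketch below is a minimal version under stated assumptions: the input is a hypothetical `links` dictionary mapping each page to the pages it links to (every link target must itself be a key, and no page lists the same target twice), and the function name `pagerank`, the tolerance, and the three-page example graph are illustrative choices, not part of the original definition.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    links maps each page to the pages it links to, so C(T) == len(links[T]).
    """
    pages = list(links)
    # Reverse index: for each page A, the pages T1..Tn that point to it.
    incoming = {p: [] for p in pages}
    for src, targets in links.items():
        for dst in targets:
            incoming[dst].append(src)

    pr = dict.fromkeys(pages, 1.0)  # starting guess; the result does not depend on it
    for _ in range(max_iter):
        new = {
            a: (1 - d) + d * sum(pr[t] / len(links[t]) for t in incoming[a])
            for a in pages
        }
        if max(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr  # may not have converged within max_iter


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical example graph
pr = pagerank(links)
print({p: round(v, 3) for p, v in pr.items()})  # {'A': 1.163, 'B': 0.644, 'C': 1.192}
print(round(sum(pr.values()), 6))  # 3.0
```

With the (1-d) term exactly as written, the three values add up to 3, the number of pages in the example.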

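Note that the Google PageRanks are meant to form a probability distribution over web pages, so that the sum of all web pages' Google PageRanks is one. Strictly, this holds for the normalized variant, in which the first term is (1-d)/N, where N is the total number of pages. With (1-d) as written above, and assuming every page has at least one outgoing link, the PageRanks sum to N instead, so they average one.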

Google PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web.
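The eigenvector view can be sketched by building the normalized link matrix explicitly and applying power iteration, that is, multiplying a vector by the matrix repeatedly until it stops changing. In this sketch, `google_matrix` and `principal_eigenvector` are hypothetical helper names. The matrix mixes the link matrix (weight d) with a uniform jump to any page (weight 1-d), and treating a page with no outgoing links as if it linked to every page is a common convention assumed here, not something stated in the text above. The resulting vector's entries sum to one.

```python
def google_matrix(links, d=0.85):
    """Column-stochastic matrix G with G[i][j] = chance of moving from page j to page i."""
    pages = sorted(links)
    n = len(pages)
    index = {p: i for i, p in enumerate(pages)}
    # Every page can be reached by a random jump with probability (1 - d) / n.
    g = [[(1 - d) / n for _ in range(n)] for _ in range(n)]
    for src, targets in links.items():
        j = index[src]
        if targets:
            for dst in targets:
                g[index[dst]][j] += d / len(targets)  # follow a link: d / C(src)
        else:
            # Assumed convention: a page with no outgoing links acts as if it linked everywhere.
            for i in range(n):
                g[i][j] += d / n
    return pages, g


def principal_eigenvector(g, tol=1e-12, max_iter=1000):
    """Power iteration: repeat v <- G v until v stops changing (eigenvalue 1)."""
    n = len(g)
    v = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        if max(abs(w[i] - v[i]) for i in range(n)) < tol:
            return w
        v = w
    return v


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical example graph
pages, g = google_matrix(links)
v = principal_eigenvector(g)
print(dict(zip(pages, (round(x, 4) for x in v))))  # {'A': 0.3878, 'B': 0.2148, 'C': 0.3974}
```

Multiplying these values by the number of pages (3) gives back the values produced by the (1-d) form of the formula, up to rounding.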

We will discuss the Google PageRank algorithm in more detail in the Search Engine Algorithm Tutorial and Course.

Google PageRank: Features

Actual Google PageRank vs. Google Toolbar PageRank

When people talk about Google PageRank, they're usually talking about its visible artifact: what's sometimes called "Google Toolbar PageRank". Actual Google PageRank is a score from 0 to 100, and changes from moment to moment as links are published and new pages are indexed and de-indexed, all of which happens behind the scenes.

The Google PageRank that most people are familiar with is a score from 0 to 10, which gets updated at irregular intervals ranging from a few weeks to a few months. One of the first utilities that let users view the PageRank of a page was the Google Toolbar, which is no longer being developed. However, many browser extensions can be used to measure Google PageRank.

If you install one of these extensions, the Google PageRank (PR) of the current page you're on will be displayed as a small green bar graph. As you move from one page to another, you'll see the Google PageRank change accordingly. This graphically illustrates a previously mentioned point that's worth reiterating. Google PageRank measures pages, not sites. When people refer to a so-called "Google PageRank 8 website", they're actually talking about the site's home page, which has received the majority of the site's links. You can test this yourself using the extensions: go to any site's home page, then go to some post in the site's archives, then go to the site's About or Contact page. The home page will almost always have the highest Google PageRank (PR).

This is one reason why getting links from high Google PageRank websites is somewhat overrated. A blog article that posts a link to your site from its Google PR 8 home page will soon fall into the archive and settle in as a Google PR 0, 1 or 2 page. Those are still good links, but they don't have the same "Link Juice" as links from Google PR 8 pages.

In addition to gaining Google PageRank from backlinks, a page passes some of its Google PageRank along with each link it gives to another website, and what flows out through external links no longer circulates among the site's own pages. This makes some webmasters unwilling to link out to other sites. However, sites that never link out to other websites look suspicious to Google, and the amount of Link Juice lost through an individual link is negligible. The problem comes when a page carries dozens of links (long blogrolls, crowded headers and footers, social link buttons, and non-essential links), because the PageRank it passes on is divided among all of them. Some writers will add a link to Wikipedia every time they use an uncommon word. This is not only bad SEO but also a bad user experience that disrupts the flow of text.
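In the formula from the Algorithm section, each link on a page T contributes d * PR(T)/C(T) to its target, so every additional link on T thins out what each individual link passes on. A minimal arithmetic sketch, using a hypothetical helper `share_per_link`, d = 0.85, and an illustrative page with a PageRank of 1.0:

```python
def share_per_link(pr, num_links, d=0.85):
    """Amount one link adds to its target's PageRank sum: d * PR(T) / C(T)."""
    if num_links < 1:
        raise ValueError("a page needs at least one outgoing link to pass PageRank")
    return d * pr / num_links


# The same page (PR 1.0) with a focused link list vs. a crowded one.
for n in (1, 5, 50):
    print(n, round(share_per_link(1.0, n), 4))
# 1 0.85
# 5 0.17
# 50 0.017
```

Going from 5 links to 50 cuts what each link passes on by a factor of ten, which is why crowded link lists weaken every link on the page.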

Google PageRank: Facts

Each year, Google changes its search engine algorithm 500 to 600 times. While most of these changes are minor, every few months Google rolls out a "major" algorithmic update that affects search results in significant ways.

For search engine marketers, knowing the dates of these Google updates can help explain changes in rankings and organic website traffic. Below, we've listed the major algorithmic changes that made the biggest impacts on search. Understanding these updates can help with search engine optimization (SEO).

