The Effect of Outbound Links

Since PageRank is based on the linking
structure of the whole web, it is inescapable that if the
inbound links of a page influence its PageRank, its outbound
links do also have some impact. To illustrate the effects of
outbound links, we take a look at a simple example.
We consider a web consisting of two websites, each having two web pages. One site consists of pages A and B, the other consists of pages C and D. Initially, the two pages of each site link solely to each other, so each page has a PageRank of one. Now we add a link which points from page A to page C. At a damping factor of 0.75, we get the following equations for the individual pages' PageRank values:

PR(A) = 0.25 + 0.75 PR(B)
PR(B) = 0.25 + 0.375 PR(A)
PR(C) = 0.25 + 0.75 PR(D) + 0.375 PR(A)
PR(D) = 0.25 + 0.75 PR(C)

Solving the equations gives us the following PageRank values for the first site:

PR(A) = 14/23
PR(B) = 11/23

We therefore get an accumulated PageRank of 25/23 for the first site. The PageRank values of the second site are given by

PR(C) = 35/23
PR(D) = 32/23

So, the accumulated PageRank of the second site is 67/23. The total PageRank for both sites is 92/23 = 4. Hence, adding a link has no effect on the total PageRank of the web. Additionally, the PageRank benefit for one site equals the PageRank loss of the other.
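The four equations above can also be checked numerically. The following is a minimal sketch, assuming the simple iterative form PR(p) = (1 - d) + d * sum(PR(q) / C(q)) used throughout this text; the function name `pagerank`, the dictionary layout and the fixed iteration count are illustrative choices, not part of the original.

```python
def pagerank(links, d=0.75, iterations=200):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) over all pages q linking to p.

    `links` maps each page to the list of pages it links to.
    """
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Each linking page q passes on PR(q) divided by its number of outbound links.
            inbound = sum(pr[q] / len(links[q]) for q in links if page in links[q])
            new_pr[page] = (1 - d) + d * inbound
        pr = new_pr
    return pr


# The example web: A <-> B, C <-> D, plus the added link A -> C.
web = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": ["C"]}
ranks = pagerank(web)

site_one = ranks["A"] + ranks["B"]  # converges towards 25/23
site_two = ranks["C"] + ranks["D"]  # converges towards 67/23

for page, value in sorted(ranks.items()):
    print(f"PR({page}) = {value:.4f}")
print(f"site one: {site_one:.4f}, site two: {site_two:.4f}, total: {site_one + site_two:.4f}")
```

With a damping factor of 0.75 the error shrinks by at least a quarter per pass, so 200 iterations are far more than this small example needs.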

The Actual Effect of Outbound Links

As has already been shown, the PageRank benefit for a closed system of web pages from an additional inbound link is given by

(d / (1-d)) × (PR(X) / C(X)),

where X is the linking page, PR(X) is its PageRank and C(X) is the number of its outbound links. Hence, this value also represents the PageRank loss of a formerly closed system of web pages when a page X within this system now links to an external page.

The validity of the above formula requires that the page which receives the link from the formerly closed system of pages does not link back to that system, since the system otherwise gains back some of the lost PageRank. Of course, this effect may also occur when it is not the page receiving the link that links back directly, but another page which has an inbound link from that page. However, this effect may be disregarded because of the damping factor if there are enough other web pages in between in the chain of links.

The validity of the formula also requires that the linking site has no other external outbound links. If it has other external outbound links, the loss of PageRank of the regarded site diminishes and the pages already receiving a link from that page lose PageRank accordingly.

Even if the actual PageRank values for the pages of an existing web site were known, it would not be possible to calculate to what extent an added outbound link diminishes the site's PageRank, since the formula presented above refers to the state after the link has been added.
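As a quick check, the formula can be applied to the two-site example from the previous section, where page A (with two outbound links after the change) links to page C. The sketch below uses exact fractions; the helper name `transferred_pagerank` is hypothetical, and the PageRank values are the ones computed in that example.

```python
from fractions import Fraction


def transferred_pagerank(d, pr_x, c_x):
    """PageRank gained by the linked system: (d / (1 - d)) * (PR(X) / C(X))."""
    return d / (1 - d) * pr_x / c_x


d = Fraction(3, 4)          # damping factor 0.75
pr_a = Fraction(14, 23)     # PR(A) after the link A -> C has been added
benefit = transferred_pagerank(d, pr_a, 2)

# Compare with the accumulated values of both sites (each site started at 2).
gain = Fraction(35, 23) + Fraction(32, 23) - 2    # second site
loss = 2 - (Fraction(14, 23) + Fraction(11, 23))  # first site

print(benefit, gain, loss)  # 21/23 21/23 21/23
```

Note that PR(A) here is the value after the link was added, which is exactly why the formula cannot be used to predict the loss from the values before the change.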

Intuitive Justification of the Effect of Outbound Links

The intuitive justification for the loss of PageRank through an additional external outbound link, according to the random surfer model, is that adding an external outbound link to a page makes the surfer less likely to follow an internal link on that page. So, the probability of the surfer reaching other pages within the site diminishes. If those other pages of the site link back to the page to which the external outbound link has been added, this page's PageRank will decrease as well.

We can conclude that external outbound links diminish the total PageRank of a site and probably also the PageRank of each single page of the site. But since links between web sites are the foundation of PageRank and indispensable for its functioning, it is possible that outbound links have positive effects within other parts of Google's ranking criteria. After all, relevant outbound links contribute to the quality of a web page, and a webmaster who points to other pages integrates their content in some way into his own site.

Dangling Links

An important aspect of outbound links is their absence from web pages. When a web page has no outbound links, its PageRank cannot be distributed to other pages. Lawrence Page and Sergey Brin characterise links to such pages as dangling links.

The effect of dangling links can be illustrated by a small example website. We take a look at a site consisting of three pages A, B and C. In our example, the pages A and B link to each other. Additionally, page A links to page C. Page C itself has no outbound links to other pages. At a damping factor of 0.75, we get the following equations for the individual pages' PageRank values:

PR(A) = 0.25 + 0.75 PR(B)
PR(B) = 0.25 + 0.375 PR(A)
PR(C) = 0.25 + 0.375 PR(A)

Solving the equations gives us the following PageRank values:

PR(A) = 14/23
PR(B) = 11/23
PR(C) = 11/23

So, the accumulated PageRank of all three pages is 36/23, which is just over half the value of 3 that we could have expected if page C had links to one of the other pages.

According to Page and Brin, the number of dangling links in Google's index is fairly high. One reason is that many linked pages are not indexed by Google, for example because indexing is disallowed by a robots.txt file. In addition, Google now indexes several file types and not HTML only. PDF or Word files do not really have outbound links and, hence, dangling links could have a major impact on PageRank.

In order to protect PageRank
from the negative effects of dangling links, pages without outbound links have to be removed from the database until the PageRank values are computed. According to Page and Brin, the number of outbound links on pages with dangling links is thereby normalised. As shown in our illustration, removing one page can cause new dangling links and, hence, removing pages has to be an iterative process. After the PageRank calculation is finished, PageRank can be assigned to the formerly removed pages based on the PageRank algorithm. This requires as many iterations as were needed for removing the pages. Regarding our illustration, page C could be processed before page B. At that point, page B has no PageRank yet, so page C will not receive any either. Then page B receives PageRank from page A, and during the second iteration page C also gets its PageRank.

Regarding our example website for dangling links, removing page C from the database results in pages A and B each having a PageRank of 1. After the calculations, page C is assigned a PageRank of 0.25 + 0.375 PR(A) = 0.625. So, the accumulated PageRank does not equal the number of pages, but at least the pages which have outbound links are not harmed by the dangling links problem.

Because dangling links are removed from the database, they do not have any negative effects on the PageRank of the rest of the web. Since PDF files are dangling links, links to PDF files do not diminish the PageRank of the linking page or site. So, PDF files can be a good means of search engine optimisation for Google.
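The removal and reassignment procedure can be sketched as follows. This is only an illustration under stated assumptions, not Google's actual implementation: the function names are hypothetical, pages are removed round by round until no page without outbound links is left, and removed pages are then assigned PageRank in reverse order of removal using their linking pages' original numbers of outbound links (which reproduces the value 0.625 for page C above).

```python
def pagerank(links, d=0.75, iterations=200):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) over all pages q linking to p."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        pr = {
            page: (1 - d) + d * sum(pr[q] / len(links[q]) for q in links if page in links[q])
            for page in links
        }
    return pr


def pagerank_without_dangling(links, d=0.75):
    """Remove dangling pages iteratively, rank the rest, then rank the removed pages."""
    remaining = {page: list(targets) for page, targets in links.items()}
    removed = []  # pages in the order in which they were removed
    while True:
        dangling = [page for page, targets in remaining.items() if not targets]
        if not dangling:
            break
        for page in dangling:
            removed.append(page)
            del remaining[page]
        # Dropping links to removed pages may create new dangling pages.
        remaining = {
            page: [t for t in targets if t in remaining]
            for page, targets in remaining.items()
        }

    pr = pagerank(remaining, d)

    # Pages removed last are assigned first, so their linking pages already have a value.
    for page in reversed(removed):
        inbound = sum(pr[q] / len(links[q]) for q in links if page in links[q] and q in pr)
        pr[page] = (1 - d) + d * inbound
    return pr


# Example site: A <-> B, A -> C, and C has no outbound links.
site = {"A": ["B", "C"], "B": ["A"], "C": []}

naive = pagerank(site)                   # sums to about 36/23
adjusted = pagerank_without_dangling(site)

print("naive:", {p: round(v, 4) for p, v in naive.items()})
print("adjusted:", {p: round(v, 4) for p, v in adjusted.items()})
```

In the adjusted result, pages A and B keep a PageRank of 1 each, and only page C ends up below 1.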