Pages

Search This Blog

Monday 17 November 2014

Implementation Of Page Rank in PHP

<?php
/*
 *
 * @Author: Zuhair Mirza
 * @Implementation Of Page Rank in PHP
 * @Date : 18-November-2014
 *
<!--
 PageRank is an algorithm used by Google Search to rank websites in their search engine results. PageRank was named after Larry Page, one of the founders of Google. PageRank is a way of measuring the importance of website pages. According to Google:PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website   is.The underlying assumption is that more important websites are  likely to receive more links from other websites.
-->

$links = array(
    1 => array(5),
    2 => array(4, 7, 8, 9),
    3 => array(1, 3, 7),
    4 => array(1, 2, 4, 8),
    5 => array(1, 6, 7, 9),
    6 => array(1, 5),
    8 => array(3, 4, 6, 8),
    9 => array(1, 8)
);

$pageRank = calculatePageRank($links);

var_dump($pageRank);
?>

<!--If we look at this link graph, a few things stand out. Page 1 is pointed to by almost everything, so is presumably very good. Page 4 is also very popular, so should be expected to rank well. We can then calculate the PageRank by running our iterative process.

We've got a hard limit of 100 iterations, but we also keep track of the amount of change between one set of PageRanks and the next.

If that change drops below a set level (here 0.00005) the process decides it's stable enough and stops.-->

<?php

function calculatePageRank($linkGraph, $dampingFactor = 0.15) {

 
    $pageRank = array();
    $tempRank = array();
    $nodeCount = count($linkGraph);

 
    // initialise the PR as 1/n
    foreach ($linkGraph as $node => $outbound) {
        $pageRank[$node] = 1 / $nodeCount;
        $tempRank[$node] = 0;
    }


    $change = 1;
    $i = 0;
    while ($change > 0.00005 && $i < 100) {
        $change = 0;
        $i++;

        // distribute the PR of each page
        foreach ($linkGraph as $node => $outbound) {
            $outboundCount = count($outbound);
            foreach ($outbound as $link) {
                $tempRank[$link] += $pageRank[$node] / $outboundCount;
            }
        }

        $total = 0;
        // calculate the new PR using the damping factor
        foreach ($linkGraph as $node => $outbound) {
            $tempRank[$node] = ($dampingFactor / $nodeCount) + (1 - $dampingFactor) * $tempRank[$node];
            $change += abs($pageRank[$node] - $tempRank[$node]);
            $pageRank[$node] = $tempRank[$node];
            $tempRank[$node] = 0;
            $total += $pageRank[$node];
        }

        // Normalise the page ranks so it's all a proportion 0-1
        foreach ($pageRank as $node => $score) {
            $pageRank[$node] /= $total;
        }
    }

    return $pageRank;
}
?>

2 comments:

  1. Thank you again for all the knowledge you distribute,Good post. I was very interested in the article, it's quite inspiring I should admit. I like visiting you site since I always come across interesting articles like this one.Great Job, I greatly appreciate that.Do Keep sharing! Regards,
    check pagerank

    ReplyDelete
  2. You’ve got some interesting points in this article. I would have never considered any of these if I didn’t come across this. Thanks!.
    check pr

    ReplyDelete