Problem 1: Hamster dynamics. Suppose you have 100 hamsters living in the following set of eight hamster houses connected by one-way tunnels.
Suppose that every minute, at the sound of a bell, each hamster runs down one of the tunnels leading out of its present hamster house and appears in the house that tunnel connects to.
(a) Let p_i(n) represent the fraction of hamsters in the i-th house in the n-th minute. Write down a set of eight equations for p_1(n+1) through p_8(n+1) in terms of p_1(n) through p_8(n).
(b) Let p(n) be the vector whose elements are p_1(n), ..., p_8(n). Write the set of equations from (a) as a matrix-vector multiplication, p(n+1) = A p(n).
(c) Write Matlab code to compute the steady-state distribution of hamsters by computing p(N) = A^N p(0) for some large value of N. For p(0), start with either the same fraction of hamsters in each house or a random set of fractions that totals to 1.
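A minimal sketch of this computation, assuming the 8x8 transition matrix A from part (b) is already defined in your workspace (N = 100 is just an arbitrary large choice):

    p = ones(8,1)/8;    % start with an equal fraction of hamsters in each house
    N = 100;            % number of minutes to simulate; any large value works
    for n = 1:N
        p = A*p;        % advance the distribution by one minute
    end
    disp(p)             % approximate steady-state distribution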
(d) Use Matlab's sort function to produce a list of the houses ranked in order of their steady-state populations.
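For example (a sketch, assuming p holds the steady-state fractions from part (c)):

    [psorted, house] = sort(p, 'descend');   % largest population first
    disp([house psorted])                    % house numbers next to their fractions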
Challenge: Revise your code so that it uses Matlab's bar function to show a bar chart of the hamster distribution at each time step through the loop. Use Matlab's pause function to pause a short while on each step, so that you have time to see each bar chart.
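One way to animate the distribution, sketched under the same assumptions as above:

    p = ones(8,1)/8;
    for n = 1:100
        p = A*p;
        bar(p)           % bar chart of the current distribution
        ylim([0 1])
        pause(0.1)       % pause briefly so each frame is visible
    end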
Problem 2: Google Page Rank.
Compute the Google Page Rank for a network of 100 pages within www.unh.edu.
(a) Download the surfer.m Matlab script from Numerical Computing with Matlab by Cleve Moler. Set M = 100 and run the command [w,L] = surfer('http://www.unh.edu',M); This might take a few minutes.
(b) After some time this will generate a vector w of websites and a matrix L of links between them. All the entries of L should be either 0 or 1. But L will have 100^2 = 10,000 entries, so you can't check this by eye. Write a short piece of Matlab code that double-checks that all entries of L are either 0 or 1.
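A quick sketch of one way to check this, assuming w and L from surfer.m are in the workspace:

    if all(L(:) == 0 | L(:) == 1)
        disp('All entries of L are 0 or 1.')
    else
        disp('Some entries of L are not 0 or 1!')
    end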
(c) Now from the matrix L, generate a transition matrix A that governs the dynamics of people clicking on random links, according to p(n+1) = A p(n). Similar to the hamster dynamics problem above, the i-th component of the vector p(n) is the fraction of web surfers visiting the i-th webpage at time n. You will probably run into a problem, namely that a webpage might contain no links, or only links outside the set of pages we are working with. In that case we will just assume that the person randomly selects a new website among all the M pages in our set. Revise your calculation of A so that if a column of L sums to zero, then each of the entries in the same column of A is 1/M.
(d) Suppose that instead of always clicking on a random link within a page, a web surfer sometimes (with probability delta) jumps to a random page chosen from all the pages in the network. We can model this behavior by using a modified transition matrix B and dynamics p(n+1) = B p(n), as follows:

B = (1 - delta) A + (delta/M) E,

where E is an M x M matrix of ones. Rumor has it that Google uses delta = 0.15, so use that value and calculate B.
(e) Double-check that the sum of each column of B is 1. Again, be clever and get Matlab to do the work, rather than listing out the sums of all the columns and verifying manually that each of the 100 numbers is 1!
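One possible check, sketched with a small tolerance for floating-point roundoff:

    colsums = sum(B,1);
    if all(abs(colsums - 1) < 1e-12)
        disp('Every column of B sums to 1.')
    else
        disp('Some column of B does not sum to 1!')
    end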
(f) Let's assume that all web surfers start at the first webpage, p=zeros(M,1); p(1)=1;. If we apply B to p many times (say 40 times), the resulting vector will give the probability that we end up on a particular website after a long session of random web surfing. We can do this with a for-loop:

N=40; for n=1:N; p=B*p; end

After that is complete, the elements of p give the probabilities that a random web surfer will end up at the web pages listed in w, and you can rank the pages in w according to their probabilities in p. Look up the sort command in the Help menu and find a way to print the list of websites in order of their rank. Turn in the list of the top 10, the fraction of the time a random web surfer would spend in each one of those 10, and your code to complete the Lab.
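A sketch of one way to print the ranking, assuming p is the probability vector above and w is the cell array of URLs returned by surfer.m:

    [psorted, idx] = sort(p, 'descend');    % highest probability first
    for k = 1:10
        fprintf('%8.4f   %s\n', psorted(k), w{idx(k)});
    end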
(g) If you are interested: Our calculation involved two free parameters: delta, the probability of jumping to an entirely random page in the network, and N, the number that specifies the length of our long session of random web surfing. How robust is the page rank algorithm with respect to these two parameters? If you change delta to 0.1 or 0.05, do the top 10 pages change?
How about if you change N to 100 or 1,000?
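A sketch of one way to explore the effect of delta, assuming A, M, and w from the earlier parts are still in the workspace:

    for delta = [0.15 0.1 0.05]
        B = (1 - delta)*A + (delta/M)*ones(M,M);
        p = zeros(M,1); p(1) = 1;
        for n = 1:40
            p = B*p;
        end
        [~, idx] = sort(p, 'descend');
        fprintf('delta = %.2f, top page: %s\n', delta, w{idx(1)});
    end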
Problem 3: Suppose that, instead of representing links between web pages, you have a directed graph that tells who likes whom among all students at UNH. You could then use the same algorithm to compute a numerical measure of each UNH student's likeability, and an overall ranking of the popularity of all UNH students.
What do you think of this proposal? Would the numerical values produced be meaningful measures of likeability or popularity? Why or why not? What does your answer here say about Google Page Rank as a method of ranking web pages?
Note: There is no single clear answer to this question. Just be thoughtful about it and express yourself clearly.