
MACHINE LEARNING

PAGERANK

PageRank is an algorithm used by Google and named after one of its inventors, Larry Page, who also co-founded Google. In theory, PageRank calculates the probability that a random surfer (someone randomly clicking links) will arrive at a given page. The more incoming links a page has from other popular pages, the higher the probability that someone will end up on that page by chance.
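The random-surfer idea can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for this analysis: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the four-page `toy_web` graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Estimate PageRank by repeated redistribution of rank (power iteration).

    links maps each page to the list of pages it links to.
    damping is the assumed probability that the surfer follows a link
    instead of jumping to a random page.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a small base share from random jumps.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly over its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: C is linked to by A, B and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(value, 3))
```

In this toy graph, page C collects the most incoming links and ends up with the highest rank, while page D, which nobody links to, keeps only the base share from random jumps.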

 

In this analysis, web pages and links have been replaced with cities and rail connections in Europe. Using the algorithm, it was possible to calculate which cities were most likely to be visited on Interrail in 2012.

 

The higher a city's PageRank, the greater the chance that people will visit it. Below is a chart showing the Top 20 cities by PageRank compared with how many Interrail travelers actually visited each city in 2012.

Top 10 - PageRank 2012

1. Paris - 35.39
2. Munich - 29.57
3. Zurich - 28.07
4. Berlin - 25.42
5. London - 24.18
6. Budapest - 22.34
7. Copenhagen - 22.18
8. Hamburg - 21.00
9. Vienna - 21.00
10. Prague - 19.42

Top 10 - Most visited cities in 2012

1. Vienna - 13245
2. Paris - 12802
3. Munich - 12619
4. Berlin - 12291
5. Copenhagen - 11512
6. Hamburg - 11071
7. Budapest - 10826
8. Prague - 10754
9. Amsterdam - 9924
10. Milan - 9836

COMPARISON OF PAGERANK 2012 VS. NUMBER OF VISITORS 2012

Below is the correlation between each city's PageRank and the number of Interrail visitors the city had in 2012.

The data shows that while Paris has the highest PageRank, Vienna was the most visited city in 2012. Vienna has only the 9th highest PageRank.

Zurich has the 3rd highest PageRank but is not in the Top 10 most visited cities. The same goes for London, which is number 5 by PageRank but is not in the Top 10 most visited cities.

 

The PageRank Top 10 and the visitor Top 10 differ because PageRank is calculated from how many other cities have a rail connection to a city and how popular those other cities are. The number of visitors, by contrast, reflects where people actively choose to go.

 

Interrail travelers may be heading for a final destination, which can influence how they travel and which rail connections they use. When planning where to go next, they consider what they want to see, and possibly also what kind of people they want to meet, what food they want to taste, or what culture they want to experience. The number of rail connections leading into a city from other cities plays a tiny, if not insignificant, role, since the European rail network is so extensive that it is easy to go anywhere.

 

The random traveler that the PageRank algorithm assumes does not evaluate cities before visiting them. It simply travels the rail network at random.
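To make the random traveler concrete, the sketch below simulates one on a small, entirely hypothetical rail network. The station names, the connections, the function name `random_traveler`, and the 0.85 probability of taking a train rather than restarting in a random city are all assumptions for illustration; none of it is data from this analysis.

```python
import random
from collections import Counter


def random_traveler(connections, steps=100_000, follow_prob=0.85, seed=0):
    """Simulate a traveler who picks every next stop at random.

    connections maps each city to the cities directly reachable by rail.
    With probability follow_prob the traveler takes a random connection,
    otherwise the trip restarts in a randomly chosen city.
    Returns the share of steps spent in each city.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    cities = sorted(connections)
    current = rng.choice(cities)
    visits = Counter()

    for _ in range(steps):
        visits[current] += 1
        if connections[current] and rng.random() < follow_prob:
            current = rng.choice(connections[current])
        else:
            current = rng.choice(cities)

    return {city: visits[city] / steps for city in cities}


# Hypothetical network: "Hub" has the most connections, "Branch" only one.
toy_network = {
    "Hub": ["North", "East", "South"],
    "North": ["Hub", "East"],
    "East": ["Hub", "North"],
    "South": ["Hub", "Branch"],
    "Branch": ["South"],
}
shares = random_traveler(toy_network)
for city, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{city}: {share:.3f}")
```

The traveler never asks whether a city is worth visiting; the best-connected station simply collects the most visits. That is the behaviour the PageRank figures capture, and it is what real Interrail travelers deviate from.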

 

 

 

 

The correlation tends to be linear, which is not very surprising. If a city has a high PageRank, there is a high probability that it has also been visited many times. However, some cities lie far from the line: either they have a high PageRank but have not been visited as much as other cities with a similar PageRank, or they have had many visitors but do not have a PageRank as high as other cities with the same number of visitors. The city with the highest number of visitors (Vienna, as noted above) has a PageRank that is lower than that of 7 other cities. This indicates that Vienna was popular in 2012, but that fewer popular cities have rail connections to Vienna than to some of the other cities. It confirms the point above: people choose their routes based on factors other than which city has the most trains arriving from other popular cities.
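As a rough check on the linear tendency, the sketch below computes a Pearson correlation coefficient for the eight cities that appear in both Top 10 lists above. It is only an illustration under that restriction: the analysis itself covers all cities, so this subset will not reproduce the full correlation, and the helper name `pearson` is an assumption made for the example.

```python
from math import sqrt

# Only the cities listed in BOTH Top 10 lists above (values copied from them).
pagerank_2012 = {
    "Paris": 35.39, "Munich": 29.57, "Berlin": 25.42, "Budapest": 22.34,
    "Copenhagen": 22.18, "Hamburg": 21.00, "Vienna": 21.00, "Prague": 19.42,
}
visitors_2012 = {
    "Paris": 12802, "Munich": 12619, "Berlin": 12291, "Budapest": 10826,
    "Copenhagen": 11512, "Hamburg": 11071, "Vienna": 13245, "Prague": 10754,
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


cities = sorted(pagerank_2012)
r = pearson([pagerank_2012[c] for c in cities],
            [visitors_2012[c] for c in cities])
print(f"Pearson r over {len(cities)} cities: {r:.2f}")
```

A value close to 1 would mean the points lie almost on a straight line. Outliers such as Vienna, with many visitors but a comparatively low PageRank, pull the coefficient down.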

© 2014 by EurailDTU.
