Implement the PageRank algorithm in Hadoop to retrieve the top-100 pages
The pre-processing stage consists of a Map-Reduce job that collects all pages (including dangling nodes) together with their adjacency lists, and a Map-only job that initializes every page's rank to 1/numberOfPages.
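The pre-processing logic can be sketched locally as follows. This is a hypothetical in-memory sketch, not the project's actual Mapper/Reducer classes: it gathers every page name, including dangling nodes that appear only as link targets, and assigns each the initial rank 1/numberOfPages.

```java
import java.util.*;

public class InitSketch {
    // edges maps each page to its out-links; dangling pages may appear only as targets
    static Map<String, Double> initialRanks(Map<String, List<String>> edges) {
        Set<String> pages = new TreeSet<>(edges.keySet());
        for (List<String> outs : edges.values()) pages.addAll(outs); // pick up dangling nodes
        double init = 1.0 / pages.size();
        Map<String, Double> ranks = new HashMap<>();
        for (String p : pages) ranks.put(p, init);
        return ranks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("A", Arrays.asList("B", "C")); // C is dangling: it has no out-links
        edges.put("B", Arrays.asList("A"));
        Map<String, Double> ranks = initialRanks(edges);
        System.out.println(ranks.size() + " pages, each with rank " + ranks.get("C"));
    }
}
```

In the real job the page set and adjacency lists come out of the shuffle rather than an in-memory map, but the invariant is the same: dangling nodes must be materialized as records so later iterations can account for their rank mass.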
Parser.java is a standalone program that parses the input files, prints them in human-readable form, and builds the graph from the wiki dump.
The PageRank computation runs 10 Map-Reduce iterations, followed by a final Map-only job that redistributes the dangling-node delta across all page ranks.
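One iteration of this update can be sketched in plain Java. This is a hedged, in-memory sketch under the common assumption of a damping factor d = 0.85 (the project's actual constant is not stated); in the Hadoop job the contribution sums arrive through the shuffle and the delta is spread by the final Map job, whereas here everything happens in one method.

```java
import java.util.*;

public class IterationSketch {
    static final double D = 0.85; // assumed damping factor

    static Map<String, Double> iterate(Map<String, List<String>> edges,
                                       Map<String, Double> ranks) {
        int n = ranks.size();
        Map<String, Double> in = new HashMap<>();
        for (String p : ranks.keySet()) in.put(p, 0.0);

        double delta = 0.0; // rank mass held by dangling nodes
        for (Map.Entry<String, Double> e : ranks.entrySet()) {
            List<String> outs = edges.getOrDefault(e.getKey(), Collections.emptyList());
            if (outs.isEmpty()) { delta += e.getValue(); continue; }
            double share = e.getValue() / outs.size();
            for (String t : outs) in.merge(t, share, Double::sum); // contribution to each out-link
        }

        // PR'(p) = (1 - d)/n + d * (inbound contributions + delta/n)
        Map<String, Double> next = new HashMap<>();
        for (String p : ranks.keySet())
            next.put(p, (1 - D) / n + D * (in.get(p) + delta / n));
        return next;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = new HashMap<>();
        edges.put("A", Arrays.asList("B", "C"));
        edges.put("B", Arrays.asList("A"));   // C is dangling
        Map<String, Double> ranks = new HashMap<>();
        for (String p : Arrays.asList("A", "B", "C")) ranks.put(p, 1.0 / 3);
        System.out.println(iterate(edges, ranks));
    }
}
```

Adding delta/n inside the damped term keeps the total rank mass at 1.0 after every iteration, which is why the delta must be redistributed before (or as part of) the next pass.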
Each mapper emits only its local top 100 pages by PageRank value; the number of reducers is set to 1 so that the single reducer can merge these lists into the global top 100.
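The per-mapper selection can be sketched with a bounded min-heap. This is a hypothetical helper (`topK` is not a name from the project code): each mapper would keep at most K entries while scanning its split, typically emitting them from `cleanup()`, so the lone reducer only has to merge a handful of small lists.

```java
import java.util.*;

public class TopKSketch {
    // Keep the k highest-ranked pages using a min-heap of size k.
    static List<Map.Entry<String, Double>> topK(Map<String, Double> ranks, int k) {
        PriorityQueue<Map.Entry<String, Double>> heap =
            new PriorityQueue<>(Map.Entry.comparingByValue());
        for (Map.Entry<String, Double> e : ranks.entrySet()) {
            heap.offer(new AbstractMap.SimpleEntry<>(e)); // copy: heap owns its entries
            if (heap.size() > k) heap.poll();             // evict the current minimum
        }
        List<Map.Entry<String, Double>> out = new ArrayList<>(heap);
        out.sort(Map.Entry.<String, Double>comparingByValue().reversed()); // descending
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> ranks = new HashMap<>();
        ranks.put("A", 0.4); ranks.put("B", 0.1);
        ranks.put("C", 0.3); ranks.put("D", 0.2);
        System.out.println(topK(ranks, 2));
    }
}
```

The heap keeps per-mapper memory at O(K) regardless of split size, which is what makes the single-reducer merge cheap: the reducer receives at most 100 records per mapper rather than the whole rank table.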