Author: MrYawe

Description:
PageRank computation of Wikipedia's articles using Hadoop.
Language: Java
Repository: git://github.com/MrYawe/wiki-pagerank-hadoop.git
Created: 2017-05-12T22:24:57Z
Project page: https://github.com/MrYawe/wiki-pagerank-hadoop


wiki-pagerank-hadoop

Getting started

1) Download these files:

  • page.sql.gz
  • pagelinks.sql.gz
2) Create the input_pages and input_pagelinks folders at the root of the project.
3) Put frwiki-latest-page.sql.gz in input_pages and frwiki-latest-pagelinks.sql.gz in input_pagelinks.
4) Build the project with mvn install, which also downloads the dependencies.
5) Run the jar in the target folder with 3 arguments: "input_pagelinks input_pages final_result". The final_result folder is created automatically and must not exist before the run.
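
Hadoop jobs normally refuse to write into an output directory that already exists, so a stale final_result folder makes the run fail. The sketch below is a small pre-flight check for the three arguments from step 5. It is not part of the project: the class name PreflightCheck and the method check are hypothetical, and it assumes the folders live on the local filesystem rather than on HDFS.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of wiki-pagerank-hadoop.
// Assumes local filesystem paths (not HDFS URIs).
public class PreflightCheck {

    /** Returns a description of the first problem found, or null if the arguments look usable. */
    static String check(String[] args) {
        if (args.length != 3) {
            return "expected 3 arguments: <input_pagelinks> <input_pages> <final_result>";
        }
        Path links = Paths.get(args[0]);
        Path pages = Paths.get(args[1]);
        Path output = Paths.get(args[2]);

        // Both input folders must already exist (steps 2 and 3).
        if (!Files.isDirectory(links)) {
            return "missing input folder: " + links;
        }
        if (!Files.isDirectory(pages)) {
            return "missing input folder: " + pages;
        }
        // The output folder must NOT exist yet (step 5).
        if (Files.exists(output)) {
            return "output folder already exists: " + output;
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = check(args);
        if (problem != null) {
            System.err.println(problem);
            System.exit(1);
        }
        System.out.println("Arguments look fine.");
    }
}
```

Running it with the same three arguments before launching the jar reports a missing input folder or a leftover final_result folder up front, instead of after the job has been submitted.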