[ARCHIVED] Python script that analyzes and rates Wikipedia pages using the PageRank algorithm
This program analyzes every page of a given Wikipedia language edition, scraping
the URLs from Special:AllPages. It then analyzes the cross-links between pages
and calculates the rank of every page using the PageRank algorithm (20
iterations).
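The ranking step can be sketched roughly as follows. This is a minimal illustration, not the code in wiki_analysis.py: the function name pagerank, the links dictionary layout, the damping factor of 0.85, and the handling of pages without outgoing links are all assumptions. Only the 20 iterations come from the description above.

```python
def pagerank(links, iterations=20, damping=0.85):
    """Return a dict mapping each page to its rank.

    links maps a page title to the list of page titles it links to
    (hypothetical layout, chosen for this sketch only).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}

    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(ranks[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {page: base for page in pages}

        # Each page passes an equal share of its rank to every page it links to.
        for page, targets in links.items():
            if targets:
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share

        ranks = new_ranks

    return ranks


if __name__ == "__main__":
    # Tiny made-up link graph, not real Wikipedia data.
    demo = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, rank in sorted(pagerank(demo).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {rank:.4f}")
```

In this toy graph, page C collects links from three other pages and ends up with the highest rank, while D, which nothing links to, keeps only the baseline share.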
The running time depends on the number of pages and on your internet connection.
On my 60 Mbit/s network, fetching and analyzing takes around 0.65 seconds per
page. Calculating the rank takes about 1 millisecond per page on a 2016 15-inch
MacBook Pro.
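To get a feel for what these figures mean for a full run, the per-page timings can be turned into a rough estimate. The helper name estimate_hours and the page count of 100,000 are made up for illustration, and the estimate assumes the per-page costs quoted above stay constant, which will not hold on other hardware or networks.

```python
# Per-page timings quoted above; they depend on network and hardware.
FETCH_SECONDS_PER_PAGE = 0.65
RANK_SECONDS_PER_PAGE = 0.001


def estimate_hours(page_count):
    """Rough total run time in hours for page_count pages (hypothetical helper)."""
    total_seconds = page_count * (FETCH_SECONDS_PER_PAGE + RANK_SECONDS_PER_PAGE)
    return total_seconds / 3600


if __name__ == "__main__":
    # 100,000 is an arbitrary example size, not a real Wikipedia page count.
    print(f"{estimate_hours(100_000):.1f} hours")
```

For the hypothetical 100,000 pages this comes to roughly 18 hours, almost all of it spent on fetching rather than ranking.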
Run python3 -m pip install -r requirements.txt to install dependencies.
config.yml
Run wiki_analysis.py LANG, where LANG is the language prefix of Wikipedia (for example, en for English).