School project for the Parallel algorithms course: text similarity, written in Python using MPI.
Single-process run:
python mainsp.py filelist.txt
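The program takes a file list as its argument. Its exact format isn't documented here, so the following is a minimal sketch under assumptions: filelist.txt holds one path per line, blank lines are skipped, relative paths are resolved against the list file's folder, and the texts are UTF-8. `load_texts` is a hypothetical helper, not the project's own code.

```python
from pathlib import Path


def load_texts(list_path):
    """Read a file list (assumed: one path per line) and return (name, text) pairs."""
    list_path = Path(list_path)
    base = list_path.parent
    texts = []
    for line in list_path.read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if not name:
            continue  # skip blank lines
        # Assumption: relative entries are resolved against the list file's folder.
        path = Path(name)
        if not path.is_absolute():
            path = base / path
        texts.append((name, path.read_text(encoding="utf-8")))
    return texts
```

Printing `f"File {i}: {name}"` for each returned entry (counting from 1) would produce a listing like the "Loaded files" block in the example output.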
Parallel run with MPI (5 processes):
mpiexec -n 5 python mainmpi.py filelist.txt
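How mainmpi.py divides the work among the processes isn't described here. One common scheme, shown as a hypothetical sketch, is to enumerate the document pairs and hand them out round-robin by MPI rank, so each of the 5 processes started by `mpiexec -n 5` computes roughly a fifth of the comparisons. Since cosine similarity is symmetric, only pairs with i <= j are enumerated. The sketch is plain Python so it runs without MPI; in a real run, `rank` and `size` would come from the communicator (with mpi4py, for example, `MPI.COMM_WORLD.Get_rank()` and `Get_size()`). `pairs_for_rank` is a hypothetical name.

```python
from itertools import combinations_with_replacement


def pairs_for_rank(n_docs, rank, size):
    """Return the (i, j) document pairs, i <= j, that this rank computes.

    Only the upper triangle of the similarity matrix (diagonal included)
    is enumerated; pair number k is assigned to rank k % size.
    """
    all_pairs = combinations_with_replacement(range(n_docs), 2)
    return [pair for k, pair in enumerate(all_pairs) if k % size == rank]


# Simulate 5 ranks on 4 documents (indices are 0-based here).
for rank in range(5):
    print(rank, pairs_for_rank(4, rank, 5))
```

The partial results would then be collected on one rank (with mpi4py, for example, via `comm.gather(..., root=0)`) and mirrored across the diagonal to fill the full matrix.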
Example output:
Input file not specified trying with default:
Usage: file.py filelist.txt
Data load time --- 0.040 seconds ---
Loaded files:
File 1: lotr1.txt
File 2: lotr2.txt
File 3: twk.txt
File 4: Dune.txt
Text 1-1: 1.000
Text 1-2: 0.991
Text 1-3: 0.904
Text 1-4: 0.906
Text 2-1: 0.991
Text 2-2: 1.000
Text 2-3: 0.906
Text 2-4: 0.904
Text 3-1: 0.904
Text 3-2: 0.906
Text 3-3: 1.000
Text 3-4: 0.903
Text 4-1: 0.906
Text 4-2: 0.904
Text 4-3: 0.903
Text 4-4: 1.000
TF compute time --- 0.996 seconds ---
IDF compute time --- 0.066 seconds ---
TF*IDF compute time --- 0.029 seconds ---
Cos Sim compute time --- 0.171 seconds ---
Complete compute time --- 1.313 seconds ---
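The timings above name four stages: TF, IDF, TF*IDF and cosine similarity. Below is a minimal pure-Python sketch of that pipeline. The project's exact formulas aren't stated, so this assumes lowercase `\w+` word tokens, TF as a term's count divided by the document's token count, and the smoothed IDF 1 + ln(N / df), where N is the number of documents and df the number of documents containing the term. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens; the project's tokenizer is not documented.
    return re.findall(r"\w+", text.lower())


def term_frequencies(tokens):
    # TF: count of each term divided by the total number of tokens.
    counts = Counter(tokens)
    total = len(tokens)
    return {term: c / total for term, c in counts.items()}


def inverse_document_frequencies(tf_list):
    # IDF (assumed smoothed variant): 1 + ln(N / df).
    n_docs = len(tf_list)
    df = Counter(term for tf in tf_list for term in tf)
    return {term: 1 + math.log(n_docs / d) for term, d in df.items()}


def tf_idf(tf, idf):
    # TF*IDF: weight each term frequency by the term's IDF.
    return {term: w * idf[term] for term, w in tf.items()}


def cosine_similarity(a, b):
    # Dot product over shared terms divided by the product of the vector norms.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(texts):
    tfs = [term_frequencies(tokenize(t)) for t in texts]
    idf = inverse_document_frequencies(tfs)
    vectors = [tf_idf(tf, idf) for tf in tfs]
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


texts = ["the ring goes south", "the ring goes east", "spice must flow"]
for i, row in enumerate(similarity_matrix(texts), start=1):
    for j, value in enumerate(row, start=1):
        print(f"Text {i}-{j}: {value:.3f}")
```

Because cosine similarity is symmetric and every non-empty vector is parallel to itself, each Text i-j value equals Text j-i and each diagonal entry is 1.000, matching the pattern in the example output.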
References:
https://en.wikipedia.org/wiki/Tf%E2%80%93idf
https://en.wikipedia.org/wiki/Cosine_similarity
https://janav.wordpress.com/2013/10/27/tf-idf-and-cosine-similarity/