Project author: mlampros

Project description:
Fuzzy string matching in R
Primary language: R
Project URL: git://github.com/mlampros/fuzzywuzzyR.git
Created: 2017-04-13T16:33:31Z
Project community: https://github.com/mlampros/fuzzywuzzyR




fuzzywuzzyR


The fuzzywuzzyR package is an R implementation of fuzzy string matching based on the fuzzywuzzy Python package. It uses the Levenshtein distance to calculate the differences between sequences. More details on the functionality of fuzzywuzzyR can be found in the blog post and in the package vignette.
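
To make the idea of the Levenshtein distance concrete, here is a minimal pure-Python sketch of the classic dynamic-programming computation. It is not the code that fuzzywuzzy or fuzzywuzzyR actually runs; the function names `levenshtein` and `similarity` are hypothetical, and the 0-100 score is only one simple way to normalise the distance, assumed here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the inner row short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # drop ch_a
                current[j - 1] + 1,      # add ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> int:
    """Hypothetical 0-100 score (an assumption, not fuzzywuzzy's formula)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # two empty strings are treated as identical
    return round(100 * (1 - levenshtein(a, b) / longest))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(similarity("kitten", "sitting"))
```

Only two rows of the table are kept in memory at a time, so the space cost grows with the shorter string rather than with the product of both lengths.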


UPDATE 26-07-2018: A Singularity image file is available for anyone who wants to run fuzzywuzzyR on Ubuntu Linux (locally or in a cloud instance) with all package requirements pre-installed. It lets the user work with the fuzzywuzzyR package without spending time on the installation process.


System Requirements


  • Python (>= 2.4)

  • difflib

  • fuzzywuzzy (>= 0.15.0)

  • python-Levenshtein (>= 0.12.0, optional; provides a 4-10x speedup in string matching, though it may give different results in certain cases)
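
As a quick sanity check of the requirements above, the following sketch reports which modules the current Python interpreter can import. It assumes Python 3 and relies only on the standard library; the helper name `module_status` is hypothetical. Note that the python-Levenshtein distribution is imported under the name `Levenshtein`.

```python
import importlib.util

# Import names as Python sees them, not the pip distribution names.
REQUIRED = ["difflib", "fuzzywuzzy"]
OPTIONAL = ["Levenshtein"]  # installed by the python-Levenshtein distribution


def module_status(names):
    """Map each top-level module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, found in module_status(names).items():
            state = "found" if found else "MISSING"
            print(f"{label:9s} {name:12s} {state}")
```

Run it with the same interpreter that the R session uses; a module that is installed for a different Python will still be reported as missing.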


Before installing any Python modules, check the Python configuration with:


  reticulate::py_config()


Install all modules into the default Python configuration (the one that the R session displays as default); otherwise errors will occur during package installation.
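
One way to confirm that a given `python` or `pip` belongs to the configuration R reports is to print the interpreter's own details and compare them by eye with the output of `reticulate::py_config()`. The sketch below assumes Python 3 and uses only the standard library; `interpreter_summary` is a hypothetical helper name.

```python
import platform
import sys


def interpreter_summary():
    """Collect the details that identify the running Python interpreter."""
    return {
        "executable": sys.executable,          # path of the python binary
        "version": platform.python_version(),  # e.g. major.minor.patch
        "prefix": sys.prefix,                  # installation or virtualenv root
    }


if __name__ == "__main__":
    for key, value in interpreter_summary().items():
        print(f"{key}: {value}")
```

If the executable path printed here differs from the one R shows, modules installed with this interpreter's pip will not be visible to the R session.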


Debian/Ubuntu/Fedora


Python 2

  sudo apt-get install python-pip
  sudo pip install --upgrade pip
  pip install fuzzywuzzy
  pip install python-Levenshtein


Python 3

  sudo apt-get install python3-pip
  sudo pip3 install --upgrade pip
  pip3 install fuzzywuzzy
  pip3 install python-Levenshtein



Macintosh OSX


  sudo easy_install pip
  sudo pip install fuzzywuzzy
  sudo pip install python-Levenshtein


Windows OS


  • Download get-pip.py
  • Update the Environment variables ( Control Panel >> System and Security >> System >> Advanced system settings >> Environment variables >> System variables >> Path >> Edit ) by adding, for instance in the case of Python 2.7:

      C:\Python27;C:\Python27\Scripts
  • Install the Build Tools for Visual Studio

  • Open the Command Prompt and run the following commands:
      pip install fuzzywuzzy
      pip install python-Levenshtein


Installation of the fuzzywuzzyR package


To install the package from CRAN, use

  install.packages('fuzzywuzzyR')


and to download the latest version from GitHub, use the install_github function of the devtools package:


  devtools::install_github(repo = 'mlampros/fuzzywuzzyR')



Use the following link to report bugs or issues:


https://github.com/mlampros/fuzzywuzzyR/issues


Citation:

If you use the code of this repository in your paper or research, please cite both fuzzywuzzyR and the original software (https://CRAN.R-project.org/package=fuzzywuzzyR/citation.html):


  @Manual{,
    title = {{fuzzywuzzyR}: Fuzzy String Matching in R},
    author = {Lampros Mouselimis},
    year = {2021},
    note = {R package version 1.0.5},
    url = {https://CRAN.R-project.org/package=fuzzywuzzyR},
  }