Project author: sdam-au

Project description:
Cleaning of epigraphic texts for further text mining and analysis
Main language: HTML
Repository URL: git://github.com/sdam-au/epigraphic_cleaning.git
Created: 2020-05-07T09:39:19Z
Project page: https://github.com/sdam-au/epigraphic_cleaning

License: Creative Commons Attribution Share Alike 4.0 International



Cleaning models for epigraphic texts

ETL


Purpose

The main aim of this repository is to clean the text of Greek or Latin inscriptions for further text mining, using R and regular expressions.
Currently the cleaning functions are designed for the PHI Greek inscriptions. Cleaning scripts for EDH (Epigraphic Database Heidelberg) will be implemented in the near future.
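The repository's own cleaning functions are written in R and are not reproduced here. As a rough illustration of the kind of regular-expression cleaning described above, the following minimal Python sketch produces an "interpretive" reading of a Leiden-style edition: it keeps the letters an editor restored or expanded and drops only the bracket symbols. The function name `clean_interpretive` and the exact bracket rules are assumptions for illustration and are not the repository's conventions.

```python
import re


def clean_interpretive(text: str) -> str:
    """Hypothetical sketch: strip Leiden-style editorial markup from an edition.

    Assumed rules (not taken from the repository):
      {abc}  superfluous letters, dropped entirely
      [abc]  restored letters, kept without brackets
      (abc)  expanded abbreviations, kept without brackets
      <abc>  editorial additions, kept without brackets
      --- or ...  lacuna markers, replaced by a space
    """
    # Drop letters the editor marked as superfluous, e.g. "{e}" -> ""
    text = re.sub(r"\{[^}]*\}", "", text)
    # Keep restored, expanded and supplied letters; remove only the brackets
    text = re.sub(r"[\[\]()<>]", "", text)
    # Replace runs of two or more dashes or dots (lost letters) with a space
    text = re.sub(r"[-.]{2,}", " ", text)
    # Collapse repeated whitespace and trim the ends
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_interpretive("D(is) M(anibus) [Iul]iae {e} Felici ---"))
```

Run on the sample Latin line, this prints `Dis Manibus Iuliae Felici`. A "conservative" variant would instead delete the contents of square brackets, keeping only the letters actually preserved on the stone. Which variant suits a given text-mining task is a design decision that this sketch does not settle.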


Authors

License

CC-BY-SA 4.0, see attached License.md

DOI

[Here will be DOI or some other identifier once we have it]

References

[Here will go related articles or other sources we will publish/create]


How to use this repository

Sources and prerequisites

[Describe the provenance of data used in the scripts contained and clarify how it is harvested and what other prerequisites are required to get the scripts working. In case of pure tool attribute any reused scripts to source, etc., license and specify any prerequisites or technical requirements.]

Data

  1. The cleaning scripts are designed to work with the structure of the PHI Searchable Greek Inscriptions.

Software

  1. R, v.4.0
  2. RStudio (optional)
  3. Some knowledge of regular expressions

Registered account

  1. NA

Hardware

  1. Computer with large enough RAM

Installation

[Describe the steps necessary to install the tool/package; example: https://gist.github.com/PurpleBooth/109311bb0361f32d87a2]


Instructions

See the script scripts/R/Epigraphic_cleaning_models.Rmd for instructions.
