Project author: Xennis

Project description:
Parser for EpiDoc (Epigraphic Documents in TEI XML)
Language: Python
Repository: git://github.com/Xennis/epidoc-parser.git
Created: 2020-04-05T21:38:05Z
Project page: https://github.com/Xennis/epidoc-parser

EpiDoc Parser

Python parser for EpiDoc (epigraphic documents in TEI XML).

For example, idp.data-sheet uses the parser to generate a single CSV sheet from the Papyri.info Integrating Digital Papyrology data.

Usage

Installation

Install the package

    pip install git+https://github.com/Xennis/epidoc-parser

Load a document

Load a document from a file

    import epidoc

    # The load call must be indented so it runs while the file is still open.
    with open("my-epidoc.xml") as f:
        doc = epidoc.load(f)

Load a document from a string

  1. import epidoc
  2. my_epidoc = """<?xml version="1.0" encoding="UTF-8"?>
  3. <?xml-model href="http://www.stoa.org/epidoc/schema/8.13/tei-epidoc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
  4. <TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="hgv74005">
  5. [...]
  6. </TEI>
  7. """
  8. doc = epidoc.loads(my_epidoc)

Get data from a document

Read the parsed fields as attributes of the document, for example:

    >>> doc.title
    "Ordre de paiement"
    >>> doc.material
    "ostrakon"
    >>> doc.languages
    {"en": "Englisch", "la": "Latein", "el": "Griechisch"}
    >>> [t.get("text") for t in doc.terms]
    ["Anweisung", "Zahlung", "Getreide"]
    >>> doc.origin_place.get("text")
    "Kysis (Oasis Magna)"
    >>> doc.origin_dates[0]
    {"notbefore": "0301", "notafter": "0425", "precision": "low", "text": "IV - Anfang V"}
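Values of this shape are easy to flatten into one CSV row per document. The following is a minimal sketch, not the code used by idp.data-sheet: a plain dict stands in for a parsed document, and the helper names (year_bounds, flatten), the output column names, and the ";" separator are all assumptions made for illustration. It also assumes that notbefore and notafter start with a (possibly negative) year.

```python
import csv
import io
import re

# Stand-in for a parsed document; the values are copied from the examples above.
record = {
    "title": "Ordre de paiement",
    "material": "ostrakon",
    "languages": {"en": "Englisch", "la": "Latein", "el": "Griechisch"},
    "terms": [{"text": "Anweisung"}, {"text": "Zahlung"}, {"text": "Getreide"}],
    "origin_place": {"text": "Kysis (Oasis Magna)"},
    "origin_dates": [
        {"notbefore": "0301", "notafter": "0425", "precision": "low", "text": "IV - Anfang V"}
    ],
}


def year_bounds(origin_date):
    """Return (earliest, latest) year as ints; a missing bound becomes None."""

    def to_year(value):
        # Assumption: the value starts with a year such as "0301" or "-0100".
        match = re.match(r"-?\d+", value or "")
        return int(match.group()) if match else None

    return to_year(origin_date.get("notbefore")), to_year(origin_date.get("notafter"))


def flatten(rec):
    """Reduce the nested values to a flat dict that csv.DictWriter can write."""
    first_date = rec["origin_dates"][0] if rec["origin_dates"] else {}
    not_before, not_after = year_bounds(first_date)
    return {
        "title": rec["title"],
        "material": rec["material"],
        "languages": ";".join(sorted(rec["languages"])),  # language codes only
        "terms": ";".join(t.get("text", "") for t in rec["terms"]),
        "origin_place": rec["origin_place"].get("text", ""),
        "not_before": not_before,
        "not_after": not_after,
    }


row = flatten(record)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
```

Only the first entry of origin_dates is used here; a document with several dates would need either more columns or one row per date.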

Documentation

Field                      EpiDoc source element (XPath)
-------------------------  -----------------------------
commentary                 //body/div[@type='commentary' and @subtype='general']
edition_foreign_languages  //body/div[@type='edition']//foreign/@xml:lang
edition_language           //body/div[@type='edition']/@xml:lang
idno                       //teiHeader/fileDesc/publicationStmt/idno
authority                  //teiHeader/fileDesc/publicationStmt/authority
availability               //teiHeader/fileDesc/publicationStmt/availability
languages                  //teiHeader/profileDesc/langUsage/language
material                   //teiHeader/fileDesc/sourceDesc/msDesc/physDesc/objectDesc//support/material
origin_dates               //teiHeader/fileDesc/sourceDesc/msDesc/history/origin/origDate
origin_place               //teiHeader/fileDesc/sourceDesc/msDesc/history/origin/origPlace
provenances                //teiHeader/fileDesc/sourceDesc/msDesc/history/provenance
reprint_from               //body/ref[@type='reprint-from']
reprint_in                 //body/ref[@type='reprint-in']
terms                      //teiHeader/profileDesc/textClass//term
title                      //teiHeader/fileDesc/titleStmt/title
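To see what these paths select, the mapping can be tried out with Python's standard library. This is only an illustrative sketch and not necessarily how epidoc-parser is implemented: the sample document is invented (its values are borrowed from the examples above, and the "grc" language code is made up), first_text is a hypothetical helper, and xml.etree.ElementTree supports only a subset of XPath, so each path is written relative to the root (".//") with an explicit tei: prefix and the xml:lang attribute is read with get() instead of an /@xml:lang step.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this namespace; the "tei" prefix is an arbitrary local choice.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
# Fully qualified name of the xml:lang attribute as ElementTree exposes it.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented minimal document, just large enough to exercise three rows of the table.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="hgv74005">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Ordre de paiement</title></titleStmt>
      <sourceDesc><msDesc><physDesc><objectDesc><supportDesc>
        <support><material>ostrakon</material></support>
      </supportDesc></objectDesc></physDesc></msDesc></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><div type="edition" xml:lang="grc"/></body></text>
</TEI>"""


def first_text(root, path):
    """Return the stripped text of the first match for path, or None."""
    node = root.find(path, TEI_NS)
    if node is None or node.text is None:
        return None
    return node.text.strip()


root = ET.fromstring(SAMPLE)

# title: //teiHeader/fileDesc/titleStmt/title
title = first_text(root, ".//tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title")

# material: //teiHeader/fileDesc/sourceDesc/msDesc/physDesc/objectDesc//support/material
material = first_text(
    root,
    ".//tei:teiHeader/tei:fileDesc/tei:sourceDesc/tei:msDesc"
    "/tei:physDesc/tei:objectDesc//tei:support/tei:material",
)

# edition_language: //body/div[@type='edition']/@xml:lang
edition = root.find(".//tei:body/tei:div[@type='edition']", TEI_NS)
edition_language = edition.get(XML_LANG) if edition is not None else None
```

The namespace mapping matters: without it, none of the paths would match, because every element in an EpiDoc file belongs to the TEI namespace.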

Development

Create a virtual environment, activate it, and install the dependencies:

    python3 -m venv venv
    . venv/bin/activate
    pip install --requirement requirements.txt

Run the unit tests:

    make unittest

LICENSE

Code

See LICENSE.

Test data

The test data in this project comes from the idp.data project by Papyri.info. This data is made available under a Creative Commons Attribution 3.0 License, with copyright and attribution to the respective projects.