Project author: emadRad

Project description:
Reads additional RDF serialization formats for SANSA-RDF
Language: Scala
Project URL: git://github.com/emadRad/SANSA-RDF-Reader.git
Created: 2017-07-23T16:46:37Z
Project community: https://github.com/emadRad/SANSA-RDF-Reader


SANSA RDF Reader

Description

SANSA RDF is a library for reading RDF files into Spark. SANSA RDF Reader extends the io package of SANSA RDF to read the N-Quads, Turtle, and RDF/XML serialization formats of RDF.

This package reads N-Quads, Turtle, and RDF/XML files and loads them into Spark as an RDD, a DataFrame, or a GraphX Graph.

SANSA RDF Spark

The main application class is sansa_rdf.App.
The application takes one argument:

  • the path to the input folder containing the data as .nq, .rdf, or .ttl files (e.g. data/stw.rdf)

Running the application on Spark

To run the application on a standalone Spark cluster:

  1. Set up a Spark cluster
  2. Build the application with Maven

    cd /path/to/application
    mvn clean package
  3. Submit the application to the Spark cluster

    spark-submit \
      --class sansa_rdf.App \
      --master spark://spark-master:7077 \
      target/RDF_Reader-1.0-SNAPSHOT.jar \
      /data/input

    To run a single reader on its own, replace the value of --class with one of sansa_rdf.io.NQuadReader, sansa_rdf.io.TurtleReader, or sansa_rdf.io.XmlReader.
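
As a rough illustration of switching the --class value, the sketch below picks a reader class from the input file extension and prints the resulting spark-submit command. The helper names reader_class and submit_command and the SPARK_MASTER and APP_JAR variables are hypothetical. The extension-to-class mapping is an assumption inferred from the class names, and so is the idea that each reader accepts the same input path argument as sansa_rdf.App.

```shell
#!/bin/sh
# Hypothetical helper: choose a reader class from the input file extension.
# The mapping below is an assumption based on the class names in this README.
reader_class() {
  case "$1" in
    *.nq)  echo "sansa_rdf.io.NQuadReader" ;;
    *.ttl) echo "sansa_rdf.io.TurtleReader" ;;
    *.rdf) echo "sansa_rdf.io.XmlReader" ;;
    *)     echo "sansa_rdf.App" ;;  # anything else: fall back to the main class
  esac
}

# Print (rather than run) the spark-submit command so it can be checked first.
# SPARK_MASTER and APP_JAR are hypothetical overrides; the defaults are the
# values used in the steps above.
submit_command() {
  printf 'spark-submit --class %s --master %s %s %s\n' \
    "$(reader_class "$1")" \
    "${SPARK_MASTER:-spark://spark-master:7077}" \
    "${APP_JAR:-target/RDF_Reader-1.0-SNAPSHOT.jar}" \
    "$1"
}
```

After sourcing the script, calling submit_command data/stw.rdf prints a command that uses sansa_rdf.io.XmlReader. Printing instead of executing keeps the sketch safe to try on a machine without Spark installed.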