Project author: FHOOEAIST

Project description:
Creates a simple Neo4j database based on the IMDB dataset.
Primary language: Java
Repository: git://github.com/FHOOEAIST/neo4j-imdb.git
Created: 2020-08-13T12:52:08Z
Project page: https://github.com/FHOOEAIST/neo4j-imdb

License: Mozilla Public License 2.0



neo4j-imdb

Imports the IMDB dataset into a Neo4j database.

How to use

To use this project, complete the following steps in the given order.

Requirements

The project was tested with the following software requirements:

Please ensure that the required software is available on your system.

IMDB-data

Due to the license terms of the IMDB dataset, the source files are not included in this repository. To use the
provided importer, place the required files (name.basics.tsv, title.principals.tsv, title.basics.tsv) in the
resources folder of this project. You can then set the number of relations to import via the maxRelations parameter
of the importer bean in neo4j-config.xml. This way you can import only a subset of the whole dataset.
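The effect of a relation limit can be pictured as reading only the first rows of title.principals.tsv. The following is a minimal sketch under stated assumptions, not the project's importer: the class RelationSubsetReader and its method readRelations are hypothetical, and the sketch relies only on the format described in the IMDB interface documentation (tab-separated files whose first line is a header).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of neo4j-imdb: limits how many relation rows are read.
public class RelationSubsetReader {

    /**
     * Reads at most maxRelations data rows from an IMDB TSV source
     * (e.g. title.principals.tsv) and splits each row into its columns.
     * The header line is skipped and does not count towards the limit.
     */
    public static List<String[]> readRelations(Reader source, long maxRelations) {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.lines()
                    .skip(1)                           // skip the header row
                    .limit(maxRelations)               // import only a subset
                    .map(line -> line.split("\t", -1)) // -1 keeps empty trailing columns
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up rows using the title.principals.tsv column layout.
        String tsv = "tconst\tordering\tnconst\tcategory\tjob\tcharacters\n"
                + "tt0000001\t1\tnm0000001\tactor\t\\N\t\\N\n"
                + "tt0000001\t2\tnm0000002\tdirector\t\\N\t\\N\n"
                + "tt0000002\t1\tnm0000003\tactress\t\\N\t\\N\n";
        List<String[]> rows = readRelations(new StringReader(tsv), 2);
        System.out.println(rows.size() + " relations read"); // prints "2 relations read"
    }
}
```

For a real file, the StringReader would be replaced by something like Files.newBufferedReader(Paths.get("title.principals.tsv")). Because the stream is lazy, only the first maxRelations rows are actually read from disk.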

The IMDB data is available at https://datasets.imdbws.com/, and
the documentation of the interfaces can be found at https://www.imdb.com/interfaces/.

Neo4j database

To import the data, a Neo4j database must be running on your system. The connection settings
must be adapted in the resources/neo4j-config.xml file.

The data is imported into the database with a CSV bulk import statement. To run this statement, the CSV files
are exported to the database's import directory. For this, the Neo4j installation directory has to be set in
neo4j-config.xml as a constructor argument (neo4jDatabasePath) of the TSV2CSV class.
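The conversion step can be illustrated with a small sketch. It is hypothetical and not the project's TSV2CSV class: the class name TsvToCsvSketch, its methods, and the choice to quote every field are assumptions. Two further assumptions are labeled in the code: missing values are written as empty fields (the IMDB documentation marks them as \N), and the import directory is taken to be the default `import` folder below neo4jDatabasePath, which may differ if your Neo4j configuration changes it.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch of a TSV-to-CSV step; it is not the project's TSV2CSV class.
public class TsvToCsvSketch {

    // IMDB marks missing or null values with the two characters \N.
    private static final String IMDB_NULL = "\\N";

    /** Converts one tab-separated IMDB line into one comma-separated line. */
    public static String toCsvLine(String tsvLine) {
        return Arrays.stream(tsvLine.split("\t", -1))
                .map(TsvToCsvSketch::toCsvField)
                .collect(Collectors.joining(","));
    }

    /** Writes \N as an empty field; quotes other values and doubles embedded quotes. */
    private static String toCsvField(String value) {
        if (IMDB_NULL.equals(value)) {
            return "";
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Target file, assuming the default import directory <neo4jDatabasePath>/import. */
    public static Path importTarget(String neo4jDatabasePath, String csvFileName) {
        return Paths.get(neo4jDatabasePath, "import", csvFileName);
    }

    public static void main(String[] args) {
        // Made-up row in the name.basics.tsv layout (nconst, primaryName, birthYear, deathYear).
        String row = "nm0000001\tJane Doe\t1950\t\\N";
        System.out.println(toCsvLine(row)); // prints "nm0000001","Jane Doe","1950",
        System.out.println(importTarget("/path/to/neo4j", "name.basics.csv"));
    }
}
```

Quoting every non-null field keeps commas that occur inside values, for example in names or titles, from being read as column separators.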

How to run

To start the import process, run the main method of the Neo4jImdbMain class.

Contact

If you have any questions, please contact us at contact@aist.science.

Scientific Work

If you use this repository in a research publication, please cite us:

DOI