Project author: DARIAH-ERIC

Project description:
PDF → GROBID = bibliographic metadata → BibSonomy
Language: Java
Project URL: git://github.com/DARIAH-ERIC/DESIR-CodeSprint-TrackB-BibliographicMetadata.git
Created: 2018-06-29T09:31:11Z
Project community: https://github.com/DARIAH-ERIC/DESIR-CodeSprint-TrackB-BibliographicMetadata

Open-source license: Apache License 2.0


PDF → GROBID = bibliographic metadata → BibSonomy

This tool extracts bibliographic metadata from PDF files using GROBID
and stores it in BibSonomy. It was developed during the DESIR Code
Sprint in Berlin (31.7.-2.8.2018).

Online Version

There is an online version of this tool available here: http://track-b.desir.dariah.eu/

Installation and Setup

The tool consists of a Java-based backend (server) and a Node.js-based
frontend (client).

Frontend Setup

Manually with npm

    # install dependencies
    npm install
    # serve dev version with hot reload at localhost:8080
    npm run dev
    # build for production with minification
    npm run build
    # build for production and view the bundle analyzer report
    npm run build --report

Via asdf

First install asdf (see the asdf installation instructions).

    # install asdf plugin for nodejs
    asdf plugin-add asdf-vm/asdf-nodejs
    # build for production with minification
    npm run build

Backend Setup

GROBID can be used either through a local installation or through its
REST-based web API.
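
To make the REST-based route more concrete, here is a minimal sketch of
posting a PDF to a GROBID server with curl. It is not taken from this
project's code. The endpoint /api/processHeaderDocument and the form
field "input" come from GROBID's REST API; the base URL
http://localhost:8070 and the function names are assumptions made for
illustration.

```shell
# Build the URL of GROBID's header-extraction endpoint from a base URL.
# A single trailing slash on the base URL is dropped.
grobid_header_url() {
  printf '%s/api/processHeaderDocument\n' "${1%/}"
}

# POST a PDF as the multipart form field "input".
# $1 = PDF file, $2 = GROBID base URL (assumed: a local server on port 8070)
extract_header() {
  curl --silent --show-error --fail \
       --form "input=@$1" \
       "$(grobid_header_url "${2:-http://localhost:8070}")"
}
```

A call such as `extract_header paper.pdf > header.tei.xml` (paper.pdf
is a placeholder file name) should store the TEI-encoded header
metadata returned by the server.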

When using GROBID locally (by default it uses a public API)

For a local installation the GROBID model files must be downloaded
(e.g.,
https://dl.bintray.com/rookies/maven/org/grobid/grobid-home/0.5.5/grobid-home-0.5.5.zip)
and placed into an appropriate folder which is configured via the
option grobid.home.path in application.properties.

Copy the file install-files/application.properties into your
application root and set the correct paths and keys:

    grobid.home.path=/Users/YourUserName/Work/Grobid/grobid-home/
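
Before starting the backend it can help to confirm that the configured
folder really exists. The following is a small sketch of such a check;
the function names are hypothetical, and it assumes the property is
written as key=value without spaces around the equals sign, as in the
line above.

```shell
# Print the value of grobid.home.path from a properties file.
# Assumes "grobid.home.path=<value>" with no spaces around "=".
# $1 = path to application.properties
grobid_home_from() {
  sed -n 's/^grobid\.home\.path=//p' "$1" | tail -n 1
}

# Succeed only if the configured GROBID home is an existing directory.
check_grobid_home() {
  home="$(grobid_home_from "$1")"
  if [ -n "$home" ] && [ -d "$home" ]; then
    echo "GROBID home found: $home"
  else
    echo "grobid.home.path is missing or not a directory: $home" >&2
    return 1
  fi
}
```

After copying the template, `check_grobid_home application.properties`
reports whether the path needs to be corrected.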

Starting

To start the application use

    mvn spring-boot:run

Or (if you want to use your local installation of GROBID):

    mvn -Dspring.config.location=file:/....../DESIR-CodeSprint/trackB/backend/application.properties spring-boot:run

where you replace
file:/....../DESIR-CodeSprint/trackB/backend/application.properties
with the path to your local configuration file.

Usage (as a service)

Extra configuration file

Install folder example: /opt/trackB/

Copy the configuration template install-files/trackB.conf into the
install folder. This lets the init.d script pick up extra property
files for your server.

Create executable

In the build folder:

    mvn clean package

Or (if you want to use your local installation of GROBID):

    mvn -Dspring.config.location=file:/....../DESIR-CodeSprint/trackB/backend/application.properties clean package

Copy the executable (.jar) to the installation folder.

CentOS 7

Create a symbolic link (ln -s) from /opt/trackB/trackB.jar to
/etc/init.d/trackB to be able to launch the tool as a service
(usable for CentOS 6.x servers for example).
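
The linking step could look like the sketch below. The function name is
hypothetical, the default paths are the example paths from this
section, and the chmod is an assumption: init.d can only launch the jar
if it is marked executable.

```shell
# Link the jar as an init.d service script.
# $1 = jar (default: /opt/trackB/trackB.jar)
# $2 = link to create (default: /etc/init.d/trackB)
link_service() {
  jar="${1:-/opt/trackB/trackB.jar}"
  link="${2:-/etc/init.d/trackB}"
  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
  chmod +x "$jar"      # assumption: the jar must be executable to run as a service
  ln -s "$jar" "$link"
}
```

With the default paths the function has to run as root, since it
writes to /etc/init.d.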

Ubuntu 18.04

Set the owner of the files (for simplicity, I use the same user as for apache2 on Ubuntu):

    sudo chown -R www-data:www-data /opt/trackB

Copy the configuration file install-files/trackB.service to /etc/systemd/system/ and adjust
the path to the jar file and the user that launches the command.
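
The Ubuntu steps can be collected into one script, sketched below. The
daemon-reload and enable calls are not mentioned in this README; they
are the usual systemd steps after adding a unit file and are included
here as an assumption. The helper names and the DRY_RUN switch are
hypothetical. By default the script only prints the commands.

```shell
# Print-only by default; set DRY_RUN=0 to really run the commands (needs sudo).
DRY_RUN="${DRY_RUN:-1}"

# Echo a command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Ownership and unit file as described above, then make systemd aware of the unit.
install_unit() {
  run sudo chown -R www-data:www-data /opt/trackB
  run sudo cp install-files/trackB.service /etc/systemd/system/trackB.service
  run sudo systemctl daemon-reload   # assumption: re-read unit files
  run sudo systemctl enable trackB   # assumption: start the service at boot
}
```

Calling `install_unit` lists the four commands for review; running it
again with `DRY_RUN=0` applies them.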

Start the service

    service trackB start
    service httpd restart

The server should now listen on port 8080 by default:

http://localhost:8080/trackB/

Redirect from Apache HTTPD to our own Service

Here is an example of a conf file for Apache httpd that uses SSL and forwards requests from port 443 (SSL) to our application running on port 8080.
Port 80 is also redirected to 443 and therefore ends up at the application on port 8080.
(Example using the server name trackB.dariah.eu)

    NameVirtualHost *:80
    NameVirtualHost *:443
    <VirtualHost *:443>
        SSLEngine on
        SSLProxyEngine On
        SSLCertificateFile /etc/letsencrypt/live/trackB.dariah.eu/cert.pem
        SSLCertificateKeyFile /etc/letsencrypt/live/trackB.dariah.eu/privkey.pem
        SSLCertificateChainFile /etc/letsencrypt/live/trackB.dariah.eu/chain.pem
        ServerName https://trackB.dariah.eu/
        Redirect / https://trackB.dariah.eu/trackB/
        ProxyPass /trackB/ http://localhost:8080/trackB/
        ProxyPassReverse /trackB/ http://localhost:8080/trackB/
    </VirtualHost>
    <VirtualHost *:80>
        ServerName http://trackB.dariah.eu/
        DocumentRoot /var/www/
        ErrorLog /var/log/httpd/trackB_error_log
        CustomLog /var/log/httpd/trackB_access_log combined
        Redirect / https://trackB.dariah.eu/
    </VirtualHost>

Further Information

BibSonomy

BibSonomy is a social bookmarking system that helps you to organize
your scientific work. Use BibSonomy to collect publications and
bookmarks, to collaborate with your colleagues, and to discover
interesting research for your daily work.

You can get your BibSonomy API key from the settings page. Do not put
your API key into a public repository.

GROBID

GROBID is a machine learning
library for extracting, parsing and re-structuring raw documents, such
as PDF documents, into structured TEI-encoded ones.

Contributing

Contributions are welcome! Just fork and send your pull requests.

Credits

Created at the DESIR CodeSprint by yoannspace, rjoberon,
ChristophHubeL3S, ctot-nondef, and schmima. See contributors.

License

This project is licensed under the Apache License 2.0 - see the
LICENSE.md file for details.