Project author: dongjinleekr

Project description:
A Java binding to Google SentencePiece
Primary language: C++
Repository: git://github.com/dongjinleekr/beanpiece.git
Created: 2017-10-25T02:10:22Z
Project page: https://github.com/dongjinleekr/beanpiece

License: Apache License 2.0

Beanpiece: A Java binding to Google SentencePiece

SentencePiece is an unsupervised text tokenizer and detokenizer, developed by Google. Beanpiece provides a Java API to SentencePiece.

Compatibility

As of version 0.2, this library is API-compatible with commit 1ff5904 (Apr 1, 2018).

How to build

The following tools are required to build Beanpiece:

  • sbt
  • A g++ compiler with C++11 support

To build the project, run:

  1. sbt package

This runs all the required tasks, from copying the shared libraries to compiling and packaging the Java source code.

Note for Windows/Mac Users

As of version 0.2, the project ships libsentencepiece.so for Linux (amd64) only, so the built jar will not run on OS X or Windows. Libraries for those platforms are planned for 0.3.

Until then, please build the SentencePiece shared library yourself and copy it into:

  • windows: /library/windows/[i386|amd64|ppc]
  • osx: /library/windows/[i386|amd64|ppc]

After that, you can build the project as described above.
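
If it is unclear which architecture subdirectory ([i386|amd64|ppc]) applies to your machine, a small probe like the one below may help. It is only a sketch and is not part of Beanpiece: the class name PlatformProbe, its method names, and the mapping from JVM property values to the directory names listed above are assumptions made for illustration. It relies only on the standard os.name and os.arch system properties.

```java
import java.util.Locale;

// Hypothetical helper, not part of Beanpiece: reports the OS family and
// architecture label that most likely match the directory layout above.
class PlatformProbe {

    // Maps the JVM's os.name value to a short family name (assumed labels).
    static String normalizeOs(String osName) {
        String s = osName.toLowerCase(Locale.ROOT);
        // Check mac/darwin first, because "darwin" also contains "win".
        if (s.contains("mac") || s.contains("darwin")) {
            return "osx";
        }
        if (s.startsWith("windows")) {
            return "windows";
        }
        if (s.contains("linux")) {
            return "linux";
        }
        return "unknown";
    }

    // Maps the JVM's os.arch value to one of i386, amd64, or ppc where possible.
    // Any other value (for example aarch64) is returned unchanged in lower case.
    static String normalizeArch(String osArch) {
        String s = osArch.toLowerCase(Locale.ROOT);
        if (s.equals("amd64") || s.equals("x86_64")) {
            return "amd64";
        }
        if (s.equals("x86") || s.matches("i[3-6]86")) {
            return "i386";
        }
        if (s.startsWith("ppc") || s.startsWith("powerpc")) {
            return "ppc";
        }
        return s;
    }

    public static void main(String[] args) {
        String os = normalizeOs(System.getProperty("os.name"));
        String arch = normalizeArch(System.getProperty("os.arch"));
        System.out.println("OS family:    " + os);
        System.out.println("Architecture: " + arch);
    }
}
```

With a recent JDK this can be run directly as `java PlatformProbe.java`. The architecture it prints is that of the running JVM, which can differ from the hardware (for example a 32-bit JVM on a 64-bit system), and the native library has to match the JVM.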