Project author: taiao

Project description: Converts Java Jupyter notebooks into Docker images.
Language: Java
Project URL: git://github.com/taiao/jnb2docker.git
Created: 2020-02-25T02:37:58Z
Project community: https://github.com/taiao/jnb2docker

License: Apache License 2.0


jnb2docker

Converts Java Jupyter notebooks (using the IJava
kernel) into Docker images.

Coding conventions

Under the hood, JShell executes the code from the notebook. However, JShell
requires a certain coding style: not every piece of Java code that compiles
with javac will work. JShell evaluates a statement as soon as it is
syntactically complete, so branches that normally don't require curly brackets
must be wrapped in them, and a continuation such as else must follow the
closing bracket on the same line. Otherwise jshell won't know that there is
more code to come.

This code works:

    if (condition) {
      dosomething;
    } else {
      dosomethingelse;
    }

This does not:

    if (condition)
      dosomething;
    else
      dosomethingelse;

This one does not work either:

    if (condition) {
      dosomething;
    }
    else {
      dosomethingelse;
    }
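
The same brace placement should carry over to else-if chains. The sketch below is ordinary Java wrapped in a class so that it also compiles with javac; the class and method names are hypothetical and only illustrate the style described above.

```java
// Hypothetical example; names are illustrative and not taken from any notebook.
final class BraceStyleSketch {

    static String classify(int rows) {
        String label;
        // Every branch opens its brace on the same line as the keyword,
        // and each "else" follows the closing brace on the same line.
        if (rows == 0) {
            label = "empty";
        } else if (rows < 100) {
            label = "small";
        } else {
            label = "large";
        }
        return label;
    }

    public static void main(String[] args) {
        for (int rows : new int[] {0, 40, 1000}) {
            System.out.println(rows + " -> " + classify(rows));
        }
    }
}
```

In a notebook cell, only the body of such a method would appear; the wrapping class is there to keep the example self-contained.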

So that jnb2docker can extract the dependencies, declare them with the
following line magics in your notebook:

  • %maven ... : specifies a single Maven dependency, e.g.:

        %maven nz.ac.waikato.cms.weka:weka-dev:3.9.4

  • %jars ... : specifies external jars, e.g. a single one:

        %jars /some/where/multisearch-weka-package-2020.2.17.jar

    Or all jars in a directory:

        %jars C:/some/where/*.jar
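
To make the idea of extracting dependencies from line magics concrete, here is a minimal sketch of how the magic lines could be picked out of a cell's source. It is not jnb2docker's actual implementation; the class LineMagicSketch and its extract method are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jnb2docker: collects the arguments of
// line magics such as "%maven" or "%jars" from the source lines of a cell.
final class LineMagicSketch {

    /** Returns the argument of every line that starts with the given magic. */
    static List<String> extract(List<String> cellLines, String magic) {
        List<String> result = new ArrayList<>();
        for (String line : cellLines) {
            String trimmed = line.trim();
            // Require a space after the magic so that "%mavenX" does not match "%maven".
            if (trimmed.startsWith(magic + " ")) {
                result.add(trimmed.substring(magic.length()).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> cell = List.of(
            "%maven nz.ac.waikato.cms.weka:weka-dev:3.9.4",
            "%jars /some/where/multisearch-weka-package-2020.2.17.jar",
            "import weka.core.Instances;");
        System.out.println(extract(cell, "%maven"));
        System.out.println(extract(cell, "%jars"));
    }
}
```

Lines that are not magics, such as the import statement, are left for the generated JShell script.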

Command-line

    Converts Java Jupyter notebooks into Docker images.

    Usage: [--help] [-m MAVEN_HOME] [-u MAVEN_USER_SETTINGS]
           [-j JAVA_HOME] [-v JVM...] -i INPUT
           -b DOCKER_BASE_IMAGE [-I DOCKER_INSTRUCTIONS]
           -o OUTPUT_DIR

    Options:
    -m, --maven_home MAVEN_HOME
        The directory with a local Maven installation to use instead of the
        bundled one.

    -u, --maven_user_settings MAVEN_USER_SETTINGS
        The file with the maven user settings to use other than
        $HOME/.m2/settings.xml.

    -j, --java_home JAVA_HOME
        The Java home to use for the Maven execution.

    -v, --jvm JVM
        The parameters to pass to the JVM before launching the application.

    -i, --input INPUT
        The Java Jupyter notebook to convert.

    -b, --docker_base_image DOCKER_BASE_IMAGE
        The docker base image to use, e.g. 'openjdk:11-jdk-slim-buster'.

    -I, --docker_instructions DOCKER_INSTRUCTIONS
        File with additional docker instructions to use for generating the
        Dockerfile.

    -o, --output_dir OUTPUT_DIR
        The directory to output the bootstrapped application, JShell script and
        Dockerfile in.

Example

For this example we use the weka_filter_pipeline.ipynb
notebook and the additional weka_filter_pipeline.dockerfile
Docker instructions. This notebook contains a simple Weka filter setup, using
the InterquartileRange
filter to remove outliers and extreme values from an input file and saving the cleaned
dataset as a new file.

The command-lines for this example assume this directory structure:

    /some/where
    |
    +- data
    |  |
    |  +- jnb2docker   // contains the jar
    |  |
    |  +- notebooks
    |  |  |
    |  |  +- weka_filter_pipeline.ipynb       // actual notebook
    |  |  |
    |  |  +- weka_filter_pipeline.dockerfile  // additional Dockerfile instructions
    |  |
    |  +- in
    |  |  |
    |  |  +- bolts.arff   // raw dataset to filter
    |  |
    |  +- out
    |
    +- output
    |  |
    |  +- wekaiqrcleaner   // will contain all the generated data, including "Dockerfile"

For our Dockerfile, we use the openjdk:11-jdk-slim-buster base image (-b), which
contains an OpenJDK 11 installation on top of a Debian “buster”
image. The weka_filter_pipeline.ipynb notebook (-i) then gets turned into code
for JShell using the
following command-line:

    java -jar /some/where/data/jnb2docker/jnb2docker-0.0.3-spring-boot.jar \
      -i /some/where/data/notebooks/weka_filter_pipeline.ipynb \
      -o /some/where/output/wekaiqrcleaner \
      -b openjdk:11-jdk-slim-buster \
      -I /some/where/data/notebooks/weka_filter_pipeline.dockerfile

Now we build the docker image called wekaiqrcleaner from the Dockerfile
that has been generated in the output directory /some/where/output/wekaiqrcleaner
(-o option in previous command-line):

    cd /some/where/output/wekaiqrcleaner
    sudo docker build -t wekaiqrcleaner .

With the image built, we can now push the raw ARFF file through for cleaning.
For this to work, we map the in/out directories from our directory structure
into the Docker container (using the -v option) and we supply the input
and output files via the INPUT and OUTPUT environment variables (using
the -e option). In order to see a few more messages, we also turn on the
debugging output that is part of the notebook, using the VERBOSE environment
variable:

    sudo docker run -ti \
      -v /some/where/data/in:/data/in \
      -v /some/where/data/out:/data/out \
      -e INPUT=/data/in/bolts.arff \
      -e OUTPUT=/data/out/bolts-clean.arff \
      -e VERBOSE=true \
      wekaiqrcleaner

From the debugging messages you can see that the initial dataset with 40 rows
of data gets reduced to 36 rows.
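
The notebook's own code is not reproduced in this README, so the following is only a sketch of how code running in the container might pick up the INPUT, OUTPUT and VERBOSE environment variables. The class EnvConfigSketch is a hypothetical name, and treating VERBOSE as optional is an assumption.

```java
import java.util.Map;

// Hypothetical sketch: the real notebook's handling of these variables
// is not shown in this README.
final class EnvConfigSketch {
    final String input;
    final String output;
    final boolean verbose;

    EnvConfigSketch(Map<String, String> env) {
        input = require(env, "INPUT");
        output = require(env, "OUTPUT");
        // Boolean.parseBoolean returns false for null, so VERBOSE is optional here.
        verbose = Boolean.parseBoolean(env.get("VERBOSE"));
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("Environment variable " + name + " is not set");
        }
        return value;
    }

    public static void main(String[] args) {
        // Inside the container one would pass System.getenv() instead of this fixed map.
        EnvConfigSketch cfg = new EnvConfigSketch(Map.of(
            "INPUT", "/data/in/bolts.arff",
            "OUTPUT", "/data/out/bolts-clean.arff",
            "VERBOSE", "true"));
        if (cfg.verbose) {
            System.out.println("Reading " + cfg.input + ", writing " + cfg.output);
        }
    }
}
```

Passing the environment in as a map keeps the sketch easy to try outside Docker; the paths used in main are the container-side paths from the docker run command above.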

Disclaimer: This is just a simple notebook tailored to the UCI dataset
bolts.arff.

Releases

Maven

    <dependency>
      <groupId>com.github.fracpete</groupId>
      <artifactId>jnb2docker</artifactId>
      <version>0.0.5</version>
    </dependency>