Converts Java Jupyter notebooks (using the [IJava](https://github.com/SpencerPark/IJava) kernel) into Docker images.

Under the hood, JShell is used to execute the code from the notebook. However, JShell requires a certain coding style: not just any Java code that compiles with `javac` will work. Statements that normally don't require surrounding curly brackets must be written with them; otherwise `jshell` won't know that more code is still to come.
This code works:

```java
if (condition) {
    dosomething;
} else {
    dosomethingelse;
}
```
This does not:

```java
if (condition)
    dosomething;
else
    dosomethingelse;
```
This one does not work either:

```java
if (condition) {
    dosomething;
}
else {
    dosomethingelse;
}
```
In order to extract dependencies, you can use the following line magics in your notebook:

`%maven ...` - for specifying a single Maven dependency, e.g.:
```
%maven nz.ac.waikato.cms.weka:weka-dev:3.9.4
```
`%jars ...` - for specifying external jars, e.g. a single one:

```
%jars /some/where/multisearch-weka-package-2020.2.17.jar
```

Or all jars in a directory:

```
%jars C:/some/where/*.jar
```
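jnb2docker collects these magics to resolve the notebook's dependencies. As a rough illustration of the `%maven` coordinate format (the helper below is hypothetical and not jnb2docker's actual API), a `groupId:artifactId:version` string can be taken apart like this:

```java
// Hypothetical sketch: splitting a Maven coordinate as used by the
// %maven line magic. NOT jnb2docker's real implementation.
public class CoordinateDemo {

    /** Splits "groupId:artifactId:version" into its three parts. */
    static String[] splitCoordinate(String coordinate) {
        String[] parts = coordinate.split(":");
        if (parts.length != 3)
            throw new IllegalArgumentException(
                "Expected groupId:artifactId:version, got: " + coordinate);
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = splitCoordinate("nz.ac.waikato.cms.weka:weka-dev:3.9.4");
        System.out.println("groupId=" + parts[0]
            + " artifactId=" + parts[1]
            + " version=" + parts[2]);
    }
}
```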
```
Converts Java Jupyter notebooks into Docker images.

Usage: [--help] [-m MAVEN_HOME] [-u MAVEN_USER_SETTINGS]
       [-j JAVA_HOME] [-v JVM...] -i INPUT
       -b DOCKER_BASE_IMAGE [-I DOCKER_INSTRUCTIONS]
       -o OUTPUT_DIR

Options:
-m, --maven_home MAVEN_HOME
    The directory with a local Maven installation to use instead of the
    bundled one.
-u, --maven_user_settings MAVEN_USER_SETTINGS
    The file with the maven user settings to use other than
    $HOME/.m2/settings.xml.
-j, --java_home JAVA_HOME
    The Java home to use for the Maven execution.
-v, --jvm JVM
    The parameters to pass to the JVM before launching the application.
-i, --input INPUT
    The Java Jupyter notebook to convert.
-b, --docker_base_image DOCKER_BASE_IMAGE
    The docker base image to use, e.g. 'openjdk:11-jdk-slim-buster'.
-I, --docker_instructions DOCKER_INSTRUCTIONS
    File with additional docker instructions to use for generating the
    Dockerfile.
-o, --output_dir OUTPUT_DIR
    The directory to output the bootstrapped application, JShell script and
    Dockerfile in.
```
For this example we use the `weka_filter_pipeline.ipynb` notebook and the additional `weka_filter_pipeline.dockerfile` Docker instructions. The notebook contains a simple Weka filter setup, using the `InterquartileRange` filter to remove outliers and extreme values from an input file and save the cleaned dataset as a new file.
The command-lines for this example assume this directory structure:

```
/some/where
|
+- data
|  |
|  +- jnb2docker   // contains the jar
|  |
|  +- notebooks
|  |  |
|  |  +- weka_filter_pipeline.ipynb        // actual notebook
|  |  |
|  |  +- weka_filter_pipeline.dockerfile   // additional Dockerfile instructions
|  |
|  +- in
|  |  |
|  |  +- bolts.arff   // raw dataset to filter
|  |
|  +- out
|
+- output
   |
   +- wekaiqrcleaner   // will contain all the generated data, including "Dockerfile"
```
For our `Dockerfile`, we use the `openjdk:11-jdk-slim-buster` base image (`-b`), which contains an OpenJDK 11 installation on top of a Debian "buster" image. The `weka_filter_pipeline.ipynb` notebook (`-i`) then gets turned into code for JShell using the following command-line:
```bash
java -jar /some/where/data/jnb2docker/jnb2docker-0.0.3-spring-boot.jar \
  -i /some/where/data/notebooks/weka_filter_pipeline.ipynb \
  -o /some/where/output/wekaiqrcleaner \
  -b openjdk:11-jdk-slim-buster \
  -I /some/where/data/notebooks/weka_filter_pipeline.dockerfile
```
Now we build the Docker image called `wekaiqrcleaner` from the `Dockerfile` that has been generated in the output directory `/some/where/output/wekaiqrcleaner` (the `-o` option in the previous command-line):
```bash
cd /some/where/output/wekaiqrcleaner
sudo docker build -t wekaiqrcleaner .
```
With the image built, we can now push the raw ARFF file through for cleaning. For this to work, we map the in/out directories from our directory structure into the Docker container (using the `-v` option) and supply the input and output files via the `INPUT` and `OUTPUT` environment variables (using the `-e` option). To see a few more messages, we also turn on the debugging output that is part of the notebook, using the `VERBOSE` environment variable:
```bash
sudo docker run -ti \
  -v /some/where/data/in:/data/in \
  -v /some/where/data/out:/data/out \
  -e INPUT=/data/in/bolts.arff \
  -e OUTPUT=/data/out/bolts-clean.arff \
  -e VERBOSE=true \
  wekaiqrcleaner
```
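Inside the container, the notebook code can pick up these settings via `System.getenv` (the variable names `INPUT`, `OUTPUT` and `VERBOSE` come from this example; the defaulting logic below is just a sketch, not the actual notebook code):

```java
// Sketch of how a notebook can read the INPUT/OUTPUT/VERBOSE environment
// variables supplied via "docker run -e". Illustrative only.
public class EnvDemo {
    public static void main(String[] args) {
        String input = System.getenv("INPUT");
        String output = System.getenv("OUTPUT");
        // VERBOSE is optional; anything other than "true" turns debugging off
        boolean verbose = Boolean.parseBoolean(
            System.getenv().getOrDefault("VERBOSE", "false"));
        if (verbose)
            System.out.println("Reading " + input + ", writing " + output);
    }
}
```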
From the debugging messages you can see that the initial dataset with 40 rows
of data gets reduced to 36 rows.
Disclaimer: This is just a simple notebook tailored to the UCI dataset `bolts.arff`.
```xml
<dependency>
  <groupId>com.github.fracpete</groupId>
  <artifactId>jnb2docker</artifactId>
  <version>0.0.5</version>
</dependency>
```