Author: todesking

Description: Run Scala code, output .ipynb

Language: Scala
Repository: git://github.com/todesking/scalanb.git
Created: 2018-08-20T12:55:28Z
Community: https://github.com/todesking/scalanb

License: MIT License



scalanb: Scala notebook

Status: PoC

Installation

Scalanb is not published yet.

```scala
// In build.sbt
// To use the batch notebook, you need the macro paradise plugin and additional compiler options.
addCompilerPlugin("org.scalamacros" % "paradise" % "2.1.0" cross CrossVersion.full)
scalacOptions += "-Yrangepos"
```

Batch Notebook

  1. Set up the dependencies, compiler plugin, and scalac options in build.sbt
  2. Create a notebook class with the @Notebook annotation
  3. Run the notebook (a main method is generated automatically)
  4. The .ipynb file is saved under ~/.scalanb/hist (by default)
```scala
import com.todesking.{scalanb => nb}

@nb.Notebook
class MyNotebook {
  nb.markdown("# Example of scalanb")
  // add more code here
}
```

Then run:

```sh
$ sbt 'runMain MyNotebook'
```
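Step 3 works because the annotation macro generates the entry point. A rough sketch of what the expansion could look like (hypothetical, for illustration only; scalanb's real macro instruments every statement of the class body, and all names below beyond `MyNotebook` are assumptions):

```scala
// Hypothetical sketch of the @nb.Notebook expansion (an assumption for
// illustration; not scalanb's actual generated code).
class MyNotebook {
  // Each top-level statement of the class body becomes a notebook cell;
  // its source (and result) is recorded.
  def runCells(record: String => Unit): Unit = {
    record("""nb.markdown("# Example of scalanb")""")
  }
}

object MyNotebook {
  // The generated main: parse options such as --out/--log, run the
  // notebook body, then write the recorded cells out as an .ipynb file.
  def main(args: Array[String]): Unit =
    new MyNotebook().runCells(cell => println(s"> $cell"))
}
```

This is why `sbt 'runMain MyNotebook'` works even though the source defines only a class.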

See Example1.scala and its output.

To specify the history location, use the --out option.

```sh
$ sbt 'runMain MyNotebook --out=file:path=./hist/'
```

Spark Batch Notebook

Use the spark.Notebook annotation:

```scala
import com.todesking.{scalanb => nb}

@nb.spark.Notebook
class MyNotebook {
  // A Spark session is available here as `spark`
  val df = spark.read.csv("...")
  // Show the dataframe as an HTML table via the `nb.show` method
  df.nb.show(10)
}
```

Then build a fat jar and submit it:

```sh
$ sbt assembly  # Make a fat jar
$ spark-submit --class MyNotebook myapp.jar
```

Save history to HDFS

Requirement: scalanb-spark

```sh
$ sbt 'runMain MyNotebook --out=hdfs:path=/tmp/hist/'
```

Execution log

When the --log option is enabled, a realtime log is available.

```sh
$ sbt 'runMain MyNotebook --log'
```

The log looks like:

```
# .scalanb/hist/{TIME}_{NOTE_NAME}.log
[2018-08-21 21:46:48] > nb.setShowTimeMillis(100)
[2018-08-21 21:46:48] > nb.markdown("# Scalanb Example")
[2018-08-21 21:46:48] > val a = 1
[2018-08-21 21:46:48] > val b = 2
[2018-08-21 21:46:48] > a
[2018-08-21 21:46:48] => 1
[2018-08-21 21:46:48] > println(s"a = $a")
[2018-08-21 21:46:48] stdout: a = 1
```

Caching

```scala
import com.todesking.{scalanb => nb}

@nb.Notebook
class BigData {
  val cp = nb.checkpoint

  val rawLog = cp.nocache { loadData("data/raw.csv") }
  val count = cp.cache(rawLog) { rawLog => rawLog.count() }
  cp.unwrap(count) { count =>
    println(s"count = $count")
  }

  val userId = 10
  val theUsersLogs = cp.cache((rawLog, userId)) { case (rawLog, userId) =>
    rawLog.where('user_id === userId)
  }
  cp.unwrap(theUsersLogs) { theUsersLogs =>
    theUsersLogs.count()
    theUsersLogs.show()
  }
}
```

Caching is based on the value's ID, which is calculated from:

  • val name
  • AST
  • Dependent values
  • Runtime value (if supported)
```scala
// ID: rawLog-{ loadData("data/raw.csv") }
val rawLog = cp.nocache { loadData("data/raw.csv") }

// ID: count-{ rawLog => rawLog.count() }(rawLog-{ loadData("data/raw.csv") })
val count = cp.cache(rawLog) { rawLog => rawLog.count() }

// Primitive values can be dependent values.
// ID: lit:10
val userId = 10

// ID: theUsersLogs-{ case (rawLog, userId) => rawLog.where('user_id === userId) }((rawLog-{ loadData("data/raw.csv") }, lit:10))
val theUsersLogs = cp.cache((rawLog, userId)) { case (rawLog, userId) =>
  rawLog.where('user_id === userId)
}
```
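Inferred purely from the ID strings above (a guess at the format, not scalanb's actual implementation), the ID composition can be sketched as plain string building:

```scala
// Hypothetical reconstruction of the ID format shown above.
// A literal dependency becomes "lit:<value>".
def literalId(v: Any): String = s"lit:$v"

// A cached value's ID combines its val name, its AST, and the IDs of its
// dependencies; a tuple of dependencies gets an extra pair of parentheses.
def cacheId(name: String, ast: String, deps: Seq[String]): String = {
  val depPart = deps match {
    case Seq()  => ""
    case Seq(d) => s"($d)"
    case ds     => ds.mkString("((", ", ", "))")
  }
  s"$name-{ $ast }$depPart"
}
```

Because the AST and the dependency IDs are part of the ID, editing the code or any upstream value automatically invalidates the cache.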

The cache location can be specified with the --cache option. The default is ~/.scalanb/cache/.

```
--cache=file:path=/path/to/cache
--cache=hdfs:path=/path/to/cache # requires scalanb-spark
```

Cache file spec

  • {root}/{namespace}/{name}/
    • {hex digest}/
      • cache.json: metadata (TODO)
      • data: serialized data (format is type specific)
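The {hex digest} directory name suggests the value's ID string is hashed. A minimal sketch of such a digest (an assumption for illustration; which hash function scalanb actually uses is not documented here):

```scala
import java.security.MessageDigest

// Hash an ID string into a stable hex string suitable as a directory name.
// SHA-1 is an assumption; any stable hash would serve the same purpose.
def hexDigest(id: String): String =
  MessageDigest.getInstance("SHA-1")
    .digest(id.getBytes("UTF-8"))
    .map(b => f"$b%02x")
    .mkString
```

The same ID always maps to the same directory, so a rerun with unchanged code finds its cached data.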

Plot using EvilPlot

To integrate EvilPlot, use this snippet:

```scala
import com.cibo.evilplot.plot
import plot.aesthetics.DefaultTheme._

implicit val plotFormat = nb.Format[plot.Plot] { plot =>
  val img = plot.render().asBufferedImage
  val buf = new java.io.ByteArrayOutputStream()
  javax.imageio.ImageIO.write(img, "png", buf)
  buf.close()
  nb.Value.binary("image/png", buf.toByteArray)
}
```

Then you can embed plots in the notebook:

```scala
import com.cibo.evilplot.numeric.Point

val data = (0.0 until 1.0 by 0.02).map { v =>
  (v, v * scala.util.Random.nextDouble)
}.toSeq

plot.LinePlot(data.map { case (x, y) => Point(x, y) })
  .xAxis()
  .yAxis()
  .frame()
  .xLabel("x")
  .yLabel("y")
```