readability4s

A Scala library that extracts content from an article's HTML: title, full text, favicon, image, etc.

This project is a Scala port of Mozilla’s Readability.js with a few tweaks and improvements.
It targets Scala 2.12.

Usage

Import the project with Maven as follows:

<dependency>
  <groupId>com.github.ghostdogpr</groupId>
  <artifactId>readability4s</artifactId>
  <version>1.0.9</version>
</dependency>
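
For sbt builds, the same coordinates can be written as below. This is a sketch derived from the Maven snippet above, not taken from the project's documentation. It assumes the artifact is published under exactly the artifactId shown, with no Scala version suffix, which is why it uses a single % rather than %%.

```scala
// build.sbt (assumed equivalent of the Maven dependency above)
libraryDependencies += "com.github.ghostdogpr" % "readability4s" % "1.0.9"
```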

To parse a document, create a Readability object from the document's URI string and its HTML string, then call parse(). Here’s an example:

val article = Readability(url, htmlString).parse()
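
The line above assumes that url and htmlString are already in scope. A minimal sketch of one way to obtain them follows. It assumes the class lives in the package com.github.ghostdogpr.readability4s (inferred from the Maven coordinates, so adjust the import if it differs), and it uses a placeholder URL and the standard library's scala.io.Source to download the page.

```scala
import scala.io.Source
// Assumed package name, inferred from the groupId and artifactId.
import com.github.ghostdogpr.readability4s.Readability

object ParseExample extends App {
  // Placeholder address; replace with the article to parse.
  val url = "https://example.com/some-article"

  // Download the raw HTML as a string, closing the source afterwards.
  val source = Source.fromURL(url, "UTF-8")
  val htmlString = try source.mkString finally source.close()

  // Same call as in the README example.
  val article = Readability(url, htmlString).parse()
}
```

Any HTTP client can stand in for Source.fromURL here; the library only needs the resulting HTML string and the URI it came from.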

parse() returns an Option[Article].
The result is None when the article could not be processed; otherwise it contains an Article with the following properties: