Author: gromnitsky

Description:
Filters out rss/atom feeds. Returns articles matching a pattern. The output is another valid xml feed.
Language: JavaScript
Repository: git://github.com/gromnitsky/grepfeed.git
Created: 2016-01-18T01:08:06Z
Project page: https://github.com/gromnitsky/grepfeed


Filters out RSS/Atom feeds, returning articles that match a specified
pattern. The output is another valid XML feed.

What’s included

  • a cli util;
  • a standalone http server that shares the same engine w/ the cli util;
  • a web client that uses the included server as an intermediary and
    acts as a gui version of the cli util.

Requirements

  • node >= 20

Setup

    $ npm i -g grepfeed
    $ grepfeed-server

Open http://127.0.0.1:3000 in a browser.

How it works

lib/feed.js contains all the code that parses & transforms xml
feeds. Its core is the Grep class, a Transform stream:

    readable_stream.pipe(<our filter>).pipe(writable_stream)
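
The pipeline shape above can be illustrated with a minimal sketch. It is not the real Grep class: `articleFilter` and `demo` are hypothetical names, and the sketch assumes the articles are already parsed into plain objects, whereas lib/feed.js works on xml.

```javascript
const { Readable, Transform } = require('node:stream');

// Hypothetical stand-in for the filter stage; NOT the real Grep class.
// Works in object mode: each chunk is an already-parsed article object.
function articleFilter(pattern) {
  return new Transform({
    objectMode: true,
    transform(article, _encoding, callback) {
      // Pass the article downstream only if its title matches.
      if (pattern.test(article.title)) callback(null, article);
      else callback(); // drop the article
    },
  });
}

// readable_stream.pipe(<our filter>), with the results collected in an array
// instead of being piped to a writable stream.
async function demo() {
  const articles = [
    { title: 'Apple pie recipes' },
    { title: 'Banana bread' },
    { title: 'Pineapple salsa' },
  ];
  const kept = [];
  for await (const a of Readable.from(articles).pipe(articleFilter(/apple/i))) {
    kept.push(a.title);
  }
  return kept;
}

demo().then(titles => console.log(titles));
// prints [ 'Apple pie recipes', 'Pineapple salsa' ]
```

Because the filter is an ordinary Transform, the same stage can sit between any readable and writable pair, which is presumably what lets the cli util and the server share one engine.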

cli

cli/grepfeed.js extends Grep, overriding several methods so that the
output can be written in any format one wants. Three output formats
are included: text-only (the default), json, and xml. The latter
produces a valid rss 2.0 feed. E.g.

    $ curl http://example.com/rss | cli/grepfeed.js apple -d=2016 -x

parses the input feed, selects only articles written in 2016 or newer
that match the regexp pattern /apple/. -x means xml output.

    Usage: grepfeed.js [opt] [PATTERN] < xml

      -e      print only articles w/ enclosures
      -n NUM  number of articles to print
      -x      xml output
      -j      json output
      -m      print only meta
      -V      program version

    Filter by:

      -d      [-]date[,date]
      -c      categories

    Or/and search for a regexp PATTERN in each rss article & print the
    matching ones. The internal order of the search: title, summary,
    description, author.

      -v      invert match
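
The search order and the -v switch described in the usage text could be sketched roughly as follows. `matchArticle` and `SEARCH_ORDER` are hypothetical names, and the article is assumed to be a plain object with string fields; the actual code in lib/feed.js may be organised differently.

```javascript
// Hypothetical helper; the field names follow the search order
// given in the usage text.
const SEARCH_ORDER = ['title', 'summary', 'description', 'author'];

function matchArticle(article, pattern, { invert = false } = {}) {
  // Check fields in order and stop at the first one that matches.
  const hit = SEARCH_ORDER.some(
    field => typeof article[field] === 'string' && pattern.test(article[field])
  );
  // With invert set (like -v), keep only the articles that do NOT match.
  return invert ? !hit : hit;
}

const article = { title: 'Weekly news', author: 'Apple Inc.' };
console.log(matchArticle(article, /apple/i));                   // true
console.log(matchArticle(article, /apple/i, { invert: true })); // false
```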

server

Acts as a proxy: downloads a requested feed & returns the filtered
xml. Query params match cli/grepfeed.js command line interface. To
start a server, run

    $ make
    $ server/index.js

(For a different host/port combination, use HOST & PORT env vars.)

The following example yields the same xml as in the cli/grepfeed.js
case, only it does so through http:

    $ curl '127.0.0.1:3000/api/?_=apple&d=2016&url=http%3A%2F%2Fexample.com%2Frss'

Note that d corresponds to -d in the cli/grepfeed.js example, -x
doesn’t make sense here, and _ stands for the 1st command line arg
(apple in this case). The server doesn’t invoke the cli/grepfeed.js
program; they both use minimist to parse command options, hence the
perceived similarity in the behaviour.
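
For a client that builds such requests programmatically, the query string can be assembled with the standard URLSearchParams class, which also takes care of percent-encoding the feed URL. This is only a sketch: `apiUrl` is a hypothetical helper, and the parameter names (_, d, url) are taken from the curl example above.

```javascript
// Hypothetical helper: builds a request URL for the /api/ endpoint.
// opts holds query params named after the cli switches (e.g. { d: '2016' }).
function apiUrl(base, feedUrl, pattern, opts = {}) {
  const params = new URLSearchParams({ _: pattern, ...opts, url: feedUrl });
  return `${base}/api/?${params}`;
}

console.log(apiUrl('http://127.0.0.1:3000', 'http://example.com/rss', 'apple', { d: '2016' }));
// prints http://127.0.0.1:3000/api/?_=apple&d=2016&url=http%3A%2F%2Fexample.com%2Frss
```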

caveats

A URL you’d like to filter must be reachable from the machine
server/index.js is running on. This could pose a security risk or be
inconvenient if you want to filter XML from your LAN. In the latter
case, run grepfeed-server on your local machine.

Bugs

  • All html tags in article titles are removed, even if a title is in
    plain text.
  • This should’ve been written in Rust or something similar, as Node is
    slow and memory hungry for this kind of task.

See also

itunesrss,
rss2mail

License

MIT.