Project author: ninmonkey

Project description:
aggregates web comics
Primary language: Python
Repository: git://github.com/ninmonkey/NewspaperWebComics.git
Created: 2018-05-10T01:23:49Z
Project page: https://github.com/ninmonkey/NewspaperWebComics

License:

NewspaperWebComics: What is this?

This collects web comics in one place.

Why?

I wanted a reader that is automatic but does not require a web server.

features

If this app breaks, edit config.json with new CSS selectors.

  • add/remove comics by editing config.json; no DOM parsing needed
  • fetch the last X comics for each site
  • caches requests; auto-deletes the oldest entries when over the maximum cache size
  • requires no web server
  • uses threading to download files in parallel
    x displays
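The parallel download feature could look roughly like the sketch below. It assumes a hypothetical `fetch_comic(name, entry)` function that downloads one site's comics; `fetch_all` and `max_workers` are illustrative names, not the project's actual code. Each task handles exactly one config entry, so as long as every entry points at a different domain, no two threads hit the same site at once.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Tuple

Entry = Dict[str, Any]


def fetch_all(
    config: Dict[str, Entry],
    fetch_comic: Callable[[str, Entry], Any],  # hypothetical per-site downloader
    max_workers: int = 8,
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Download every configured site in parallel, one task per site.

    Returns (results, errors), both keyed by comic name.
    """
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the comic name it is fetching.
        futures = {
            pool.submit(fetch_comic, name, entry): name
            for name, entry in config.items()
        }
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # a broken site must not abort the others
                errors[name] = exc
    return results, errors
```

Collecting errors per site instead of letting the first exception propagate means one outdated selector only drops that comic from the page.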

example config.json entry

    "xkcd": {
      "url": "https://xkcd.com/",
      "class": null,
      "selectors": {
        "image": "#comic img",
        "comic_title": "#ctitle",
        "prev": "a[rel='prev']"
      }
    }

required:

  1. image: CSS selector to grab `<img>` element

optional:

  1. comic_title: CSS selector to grab the comic title element. Falls back to `image.alt`
  2. class: name of a CSS class used for special markup on a single comic
  3. prev: CSS selector for the link to the previous page
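A minimal sketch of how entries shaped like the example above might be loaded and checked against the required/optional rules. It assumes config.json is a JSON object mapping comic names directly to entries (the real file may nest them under another key); `load_entries` and `pick_title` are hypothetical helpers.

```python
import json
from typing import Any, Dict, Optional

REQUIRED = ("image",)
OPTIONAL = ("comic_title", "prev")


def load_entries(text: str) -> Dict[str, Dict[str, Any]]:
    """Parse config JSON text and normalise every comic entry."""
    entries: Dict[str, Dict[str, Any]] = {}
    for name, entry in json.loads(text).items():
        selectors = entry.get("selectors") or {}
        missing = [key for key in REQUIRED if not selectors.get(key)]
        if missing:
            raise ValueError(f"{name}: missing required selector(s): {', '.join(missing)}")
        entries[name] = {
            "url": entry["url"],
            "class": entry.get("class"),  # optional; null and absent both become None
            # every known selector key is present; None means "not configured"
            "selectors": {key: selectors.get(key) for key in REQUIRED + OPTIONAL},
        }
    return entries


def pick_title(title_text: Optional[str], image_alt: str) -> str:
    """Prefer the comic_title element's text, else fall back to the image alt."""
    return title_text or image_alt
```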

bugfix

cache error, cause unknown:

  1. the cache {} should be thread-safe.
  2. even if it is not, threads should *never* work on the same domain.
  3. FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\Users\\cppmo_000\\PycharmProjects\\NewspaperWebComics\\cache\\2018 05 19 - 16 50 00 896403'
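One way to make the "auto-delete oldest" pass survive the FileNotFoundError above is sketched here. It assumes cache entries are plain files in a single directory and that age is judged by modification time; `evict_oldest` and `_cache_lock` are hypothetical names, not the project's cache.py.

```python
import threading
from pathlib import Path
from typing import List

# One process-wide lock so only one thread runs an eviction pass at a time.
_cache_lock = threading.Lock()


def evict_oldest(cache_dir: Path, max_bytes: int) -> List[str]:
    """Delete the oldest files in cache_dir until their total size <= max_bytes.

    Files that vanish mid-pass are skipped instead of crashing.
    Returns the names of the files this call removed.
    """
    removed: List[str] = []
    with _cache_lock:
        files = []
        for path in cache_dir.iterdir():
            try:
                info = path.stat()
            except FileNotFoundError:
                continue  # deleted after iterdir() listed it
            if path.is_file():
                files.append((info.st_mtime, info.st_size, path))
        files.sort()  # oldest modification time first
        total = sum(size for _, size, _ in files)
        for _, size, path in files:
            if total <= max_bytes:
                break
            try:
                path.unlink()
                removed.append(path.name)
            except FileNotFoundError:
                pass  # already gone; its space is freed either way
            total -= size
    return removed
```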

todo

first:

  • randomize_comics
  • auto open browser
  • optionally clear cache by total size
  • optionally clear cache by age

  • show ‘new’ images based on

    • a local cookie or HTML5 storage
    • pass ‘date downloaded’ to js_vars
  • why did image['title'] fail?

  • screenshot for github
  • optionally: specify order of comics displayed
  • use the JS module pattern

    • de-duplicate code in the JS init handlers
    • unused: handle_swap(), init()
  • generate_js

    • use more data: title, alt, src
  • cache.py
    replace all print statements with a STDOUT logger

  • raise a timed failure (e.g. an exception) if a thread times out
  • center images horizontally?
  • move HTML output to /html/

  • threading

    • does request_cached, or anything else that touches cache.json, need to be thread-safe?
  • utilize srcset?

  • default selectors to try if the configured ones fail

    • images:
      "#comic img", "#cc-comic img", "img#comic", "img#cc-comic"
    • prev:
      "a[rel='prev']", "a.navi-prev", "a.prev"
  • optionally download only headers:
    http://docs.python-requests.org/en/master/user/advanced/#body-content-workflow
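The default-selector idea from the todo list could be sketched as a fallback chain: try the selector from config.json first, then each default in order. `select` stands for any callable that returns the elements matching a CSS selector (for example, BeautifulSoup's `soup.select`); `first_match` and `DEFAULT_SELECTORS` are hypothetical names.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Fallback selectors taken from the todo list above.
DEFAULT_SELECTORS: Dict[str, List[str]] = {
    "image": ["#comic img", "#cc-comic img", "img#comic", "img#cc-comic"],
    "prev": ["a[rel='prev']", "a.navi-prev", "a.prev"],
}


def first_match(
    select: Callable[[str], Sequence[object]],
    key: str,
    configured: Optional[str] = None,
) -> Optional[object]:
    """Return the first element matched by the configured selector or, failing
    that, by the defaults for this key; None if nothing matches."""
    candidates = ([configured] if configured else []) + DEFAULT_SELECTORS.get(key, [])
    for selector in candidates:
        found = select(selector)
        if found:
            return found[0]
    return None
```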

if static site

  • mark/hide read images next time.
  • use cookie or localstorage
    • keep track of individually read comics, then save them to the cookie
    • or just mark comics as read based on the last-view date in the cookie
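For the static-site case, the Python side only needs to hand each image's download date to the page so that JavaScript can compare it with a last-view date kept in a cookie or localStorage. A minimal sketch, assuming a hypothetical `render_js_vars` helper and a global JS variable named `js_vars` (the actual variable shape used by generate_js may differ):

```python
import json
from datetime import datetime, timezone
from typing import Dict, List


def render_js_vars(downloaded: Dict[str, List[datetime]]) -> str:
    """Serialise each comic's download timestamps into a <script> snippet."""
    payload = {
        # Naive datetimes are treated as local time by astimezone().
        name: [ts.astimezone(timezone.utc).isoformat() for ts in stamps]
        for name, stamps in downloaded.items()
    }
    # sort_keys keeps output stable between runs; escaping "</" stops a
    # value from closing the <script> tag early.
    body = json.dumps(payload, sort_keys=True).replace("</", "<\\/")
    return f"<script>var js_vars = {body};</script>"
```

ISO 8601 strings in UTC can be parsed directly by JavaScript's `Date` and compared against the stored last-view date.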

if dynamic site

  • display ‘new comic’
    • hide read images next time.
    • auto-mark comics as read when scrolled to
      • read comics will auto-collapse or use lower opacity