Project author: swinkelhofer

Project description:
No more paper. Archive all paper documents to allow full-text searching, tagging and Git-based backup
Primary language: Vue
Repository: git://github.com/swinkelhofer/paperless-office.git
Created: 2021-04-04T14:52:34Z
Project community: https://github.com/swinkelhofer/paperless-office

License: MIT License


Paperless Office

The future is paperless. Unfortunately, most authorities (at least in Germany) still love paper, rendering ‘digitalization’ a foreign word.

To work around this outdated state of affairs, I started this project, which allows scanned (and therefore digitized) documents to be…

  • … archived and stored safely
  • … (full-text) searched
  • … tagged

The goal for this project was not only environmental happiness, but also customer happiness.

Some of the key features are:

  • Automatic OCR of scanned documents enables full-text search on any document.
  • Manual tagging makes it easier to find the right document at the right time.
  • Automatic Git backups take up no physical space, unlike paper backups.
  • An optional web viewer allows access to your documents. Anywhere, anytime.

Prerequisites

  • A (document) scanner (necessary)
  • A container engine like Docker (recommended)
  • docker-compose (recommended)
  • Access to any Git provider (e.g. GitLab or GitHub, recommended)

How it works

  1. Scan a document to a PDF file
  2. Either upload the scanned PDF via the paperless-office Web UI or just drop it into the raw documents folder (see TODO: Configuration)
  3. Let the server side do the text recognition magic. Once it has finished, the new document appears in the Unconfirmed section of the Web UI
  4. Add some tags, then double-check dates and recognized metadata (like URLs, e-mail addresses, …)
  5. Save and confirm the document. Saving triggers a push of the file to the Git repository (if configured). From now on, the document can be found in your paperless office.
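
The second option in step 2, dropping a scan into the raw documents folder, can be sketched as a small shell helper. The function name submit_scan and the default folder path are assumptions for illustration (the path reuses the placeholder from the Setup section), so adjust both to your environment.

```shell
# Hypothetical helper: copy a scanned PDF into the raw documents folder.
# Usage: submit_scan <file.pdf> [raw_dir]
submit_scan() {
  pdf=$1
  # Assumed default; this is the placeholder path used in the Setup section.
  raw_dir=${2:-/path/paperless-office-documents/data/raw}

  # Only accept files with a PDF extension.
  case "$pdf" in
    *.pdf|*.PDF) ;;
    *) echo "not a PDF: $pdf" >&2; return 1 ;;
  esac

  [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }
  [ -d "$raw_dir" ] || { echo "raw folder missing: $raw_dir" >&2; return 1; }

  # Copy rather than move, so the original scan stays where it was.
  cp "$pdf" "$raw_dir/"
}
```

For example, `submit_scan ~/scans/invoice.pdf` would copy the file into the default raw folder, where the server side can pick it up for text recognition.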

Setup

Using docker-compose is the simplest way to set up paperless-office.

  • Prepare your environment:

    cd /path/paperless-office-documents
    mkdir -p data/raw data/processed
    # BEGIN Optionally init git repository
    cd data/processed
    git init
    # At the moment only HTTPS basic auth is supported for Git
    git remote add origin https://username:password@your.git/username/paperless-office-documents.git
    # Create .gitlab-ci.yml or a GitHub Actions workflow, depending on your Git provider. For a GitLab snippet see further below.
    touch .gitlab-ci.yml
    # Get webviewer.js and index.html
    wget https://github.com/swinkelhofer/paperless-office/releases/latest/download/webviewer.js
    wget https://github.com/swinkelhofer/paperless-office/releases/latest/download/index.html
    # Stage and push your initial changes
    git add -A
    git commit -m "Init"
    # The first push has to set the upstream branch
    git push -u origin HEAD
    cd ../..
    # END Optionally init git repository
    # Display your user ID for configuration in the next step
    id -u
  • Save the following snippet to a file named docker-compose.yaml and adjust it to your environment:
    version: "3.6"
    services:
      paperless-office:
        image: ghcr.io/swinkelhofer/paperless-office:latest
        # user must match the UID of the volumes' owner (as printed by id -u)
        user: "1000:1000"
        ports:
          - "8000:8000"
        volumes:
          - /path/paperless-office-documents/data/processed:/srv/data/processed
          - /path/paperless-office-documents/data/raw:/srv/data/raw
        restart: always
  • Run docker-compose up -d to start paperless-office.
  • With the snippet above, the Web UI is available in your browser at http://localhost:8000/.
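
Because the user: setting has to match the owner of the mounted folders, it can be worth checking ownership before starting the container. The following is a minimal sketch. The function name check_volume_owner is hypothetical, and it reads the numeric owner from `ls -ldn` so that it does not depend on a particular `stat` flavor.

```shell
# Hypothetical helper: verify that each folder exists and is owned by the given UID.
# Usage: check_volume_owner <uid> <dir>...
check_volume_owner() {
  expected_uid=$1
  shift
  status=0
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "missing: $dir"
      status=1
      continue
    fi
    # Third column of `ls -ldn` is the numeric owner UID.
    owner_uid=$(ls -ldn "$dir" | awk '{print $3}')
    if [ "$owner_uid" != "$expected_uid" ]; then
      echo "owner mismatch: $dir is owned by UID $owner_uid, expected $expected_uid"
      status=1
    fi
  done
  return $status
}
```

Run from /path/paperless-office-documents, `check_volume_owner "$(id -u)" data/raw data/processed` returns successfully only if both folders exist and belong to the current user.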

Similar Projects

Two other projects, Paperless and Mayan EDMS, have technical overlap with paperless-office. In contrast to paperless-office, both are written in Python and have a broader feature set (such as document encryption). In return, paperless-office offers a prettier UI, Git integration and a web viewer that allows access to your documents via GitLab Pages or GitHub Pages.

Git integration

A simple Git integration can be extended by supplying a CI workflow to deploy the contents via GitLab Pages or GitHub Pages.

GitLab CI

.gitlab-ci.yml example configuration:

    pages:
      image: alpine:3.13
      script:
        - mkdir public
        - cp -rf * public/ || true
      artifacts:
        paths:
          - public

Each operation in the paperless-office Web UI leads to a push to your Git repository. The CI pipeline is triggered on each push and re-deploys GitLab Pages. The web viewer is then available at https://username.gitlab.io/paperless-office-documents
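
To see which files the pages job above would publish, a rough local approximation can help. This is a sketch under assumptions, not what GitLab actually runs: the function name preview_pages is hypothetical, and the loop skips the public folder explicitly instead of relying on `|| true` to swallow the error from copying public into itself. As in the job, the `*` glob does not match dotfiles, so hidden files such as the CI configuration are not published.

```shell
# Hypothetical helper: mimic the pages job locally and list the published files.
# Usage: preview_pages <repository_dir>
preview_pages() {
  src_dir=$1
  (
    cd "$src_dir" || exit 1
    mkdir -p public
    for entry in *; do
      # Skip the target folder itself and the literal "*" of an empty folder.
      [ "$entry" = "public" ] && continue
      [ -e "$entry" ] || continue
      cp -rf "$entry" public/
    done
    # Print everything that would end up in the Pages artifact.
    find public -type f | sort
  )
}
```

Calling `preview_pages data/processed` creates a public folder inside the repository and prints its contents, so delete that folder again before the next commit if you do not want it pushed.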

Contribution

See the contribution guidelines