Project author: shadiakiki1986

Project description:
OCR + transliteration on arabic scanned images
Primary language: Python
Project URL: git://github.com/shadiakiki1986/ocr-arabic.git
Created: 2017-10-17T07:44:25Z
Project community: https://github.com/shadiakiki1986/ocr-arabic

License: GNU General Public License v3.0


ocr-arabic

OCR + transliteration on arabic scanned images

Written specifically for scanned images of Lebanese government ID cards

Installation

  • clone this repository
  • install jq, curl, direnv
  • copy .envrc.dist to .envrc and set your Google Cloud Vision API key in the environment variable GOOGLE_VISION_API_KEY
    • get the key from the Google Cloud console
  • run direnv allow . so that direnv loads .envrc
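
Before running the script, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch in Python, not part of the repository: the function name `missing_prerequisites` and its parameters are hypothetical, and it only checks that the three tools are on PATH and that GOOGLE_VISION_API_KEY is non-empty.

```python
import os
import shutil

# Tools and variable name taken from the installation steps above.
REQUIRED_TOOLS = ("jq", "curl", "direnv")
API_KEY_VAR = "GOOGLE_VISION_API_KEY"


def missing_prerequisites(environ=None, which=shutil.which):
    """Return a list of human-readable problems; empty if everything is set up.

    `environ` and `which` are injectable so the check can be exercised
    without touching the real environment (hypothetical helper).
    """
    environ = os.environ if environ is None else environ
    problems = [
        f"{tool} not found on PATH"
        for tool in REQUIRED_TOOLS
        if which(tool) is None
    ]
    if not environ.get(API_KEY_VAR):
        problems.append(f"{API_KEY_VAR} is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

If the script prints nothing, the tools and the API key are visible to the current shell.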

Usage

Download example scanned ID

  wget https://www.tradearabia.com/source/2014/08/06/id.jpg -O images/id.jpg

Run OCR and transliteration

  ./ocr-arabic.sh images/id.jpg
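
The README does not show how the script talks to Google Vision. As a rough illustration only, the sketch below builds a request for the public `images:annotate` REST endpoint in Python. The helper names (`build_request_body`, `annotate`, `extract_text`), the choice of the `TEXT_DETECTION` feature, and the Arabic language hint are assumptions, not taken from `ocr-arabic.sh`.

```python
import base64
import json
import urllib.request

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_request_body(image_bytes, language_hints=("ar",)):
    """Build the JSON body for one image (feature type is an assumption)."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
                "imageContext": {"languageHints": list(language_hints)},
            }
        ]
    }


def annotate(image_path, api_key):
    """POST the image to the Vision API and return the parsed JSON response."""
    with open(image_path, "rb") as f:
        body = build_request_body(f.read())
    request = urllib.request.Request(
        f"{VISION_ENDPOINT}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def extract_text(response):
    """Return the full detected text, or an empty string if none was found."""
    annotations = response["responses"][0].get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""
```

With a valid key in the environment, `extract_text(annotate("images/id.jpg", os.environ["GOOGLE_VISION_API_KEY"]))` (after `import os`) would return the detected text as a single string, one detected line per row.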

Example input

example scanned ID

Example output

  Transliterated | OCR
  ---------------------------------------------------------------------------------
  United Arab Emirates o | o setarimE barA detinU
  . ldentity card | drac ytitnedl .
  dwlp Al<mArAt AlErbyp AlmtHdp | ةدحتملا ةيبرعلا تارامإلا ةلود
  bTAqp hwyp | ةيوه ةقاطب
  Number | rebmuN
  rqm Alhwyp / ID | DI / ةيوهلا مقر
  784-1977-1234566-1 | 1-6654321-7791-487
  mn Al<sm: AHmd mHmd Ebd Allh | هللا دبع دمحم دمحا :مسإلا نم
  Name: Ahmed Mohamed Abdulla | alludbA demahoM demhA :emaN
  Aljnsyp: Al<mArAt AlErbyp AlmtHdp | ةدحتملا ةيبرعلا تارامإلا :ةيسنجلا
  Nationality: United Arab Emirates | setarimE barA detinU :ytilanoitaN
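
The transliterated column above looks like Buckwalter transliteration (for example ة becomes p, ع becomes E, and إ becomes <). The README does not say how the script produces it, so the Python sketch below is only an illustration of that scheme; the function name `transliterate` is hypothetical. Characters outside the table, such as Latin letters, digits, and punctuation, pass through unchanged.

```python
def _build_buckwalter_table():
    """Map Arabic code points to Buckwalter ASCII symbols."""
    table = {}
    # U+0621..U+063A: hamza forms, then alef through ghain.
    for offset, latin in enumerate("'|>&<}AbptvjHxd*rzs$SDTZEg"):
        table[chr(0x0621 + offset)] = latin
    # U+0640..U+0652: tatweel, feh through yeh, then the short-vowel marks.
    for offset, latin in enumerate("_fqklmnhwYyFNKaui~o"):
        table[chr(0x0640 + offset)] = latin
    table["\u0670"] = "`"  # superscript (dagger) alef
    table["\u0671"] = "{"  # alef wasla
    return table


BUCKWALTER = _build_buckwalter_table()


def transliterate(text):
    """Replace each Arabic character with its Buckwalter symbol."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # "ID card" in Arabic, written with escapes to avoid bidi display issues.
    print(transliterate("\u0628\u0637\u0627\u0642\u0629 \u0647\u0648\u064a\u0629"))
```

The input must be in logical (reading) order; text whose characters have been reversed for display, as in the OCR column above, would need to be reversed back first.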