Project author: adibyte95

Project description:
Predicts text in captcha images using a CNN
Primary language: Jupyter Notebook
Repository URL: git://github.com/adibyte95/optical-character-recognition-OCR.git
Created: 2018-03-30T03:57:31Z
Project page: https://github.com/adibyte95/optical-character-recognition-OCR

License: MIT License

optical-character-recognition-OCR

Topic


This repository aims to convert simple captcha images into text.

Results


How well does it perform? It performs well when the given captchas resemble the ones the neural network was trained on, that is, text on a white background. It fails to recognize some characters if the letters look different or if the background is a somewhat different colour. Also note that the training dataset lacks some characters, such as ‘L’ or ‘1’, so the network will make wrong predictions on those. It was also trained only on capital letters of the English alphabet, so it cannot detect lowercase letters.
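
The label-coverage limits above can be checked before trusting a prediction. The snippet below is a minimal sketch, not code from this repository: `TRAINED_LABELS` and `unsupported_characters` are hypothetical names, and the label set (uppercase letters and digits minus ‘L’, ‘O’ and ‘1’) is only an assumption based on the characters this README names as missing. Replace it with the real class list of the dataset you train on.

```python
import string

# Hypothetical label set (an assumption, not read from the dataset):
# uppercase letters and digits, minus the characters this README
# mentions as absent from the training data.
TRAINED_LABELS = set(string.ascii_uppercase + string.digits) - {"L", "O", "1"}


def unsupported_characters(text, labels=TRAINED_LABELS):
    """Return the characters of `text` that have no training label.

    Characters are reported once each, in order of first appearance.
    """
    missing = []
    for ch in text:
        if ch not in labels and ch not in missing:
            missing.append(ch)
    return missing


if __name__ == "__main__":
    # Lowercase 'e' and the absent 'L', 'O', '1' cannot be predicted correctly.
    print(unsupported_characters("HeLLO1"))  # ['e', 'L', 'O', '1']
```

If the returned list is non-empty, the network cannot output the correct text for that captcha, whatever the image looks like.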

Here is a sample outcome:


Some of the errors here are due to letters missing from the training set, such as the letter ‘O’. Others are due to characters that look different from the training set images, which can be fixed with some data augmentation.
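
One possible form of that data augmentation is sketched below. It is an illustration under stated assumptions, not the method used in this repository: the function name `augment`, its parameters, and the representation of a grayscale image as a list of rows of integers (0 to 255, white background at 255) are all hypothetical. Each call shifts the character by a few pixels and darkens the background slightly, which targets the two failure cases noted earlier (different letter appearance and a non-white background).

```python
import random


def augment(image, rng, max_shift=2, max_tint=40, background=255):
    """Return a randomly shifted copy of `image` with a tinted background.

    `image` is a list of rows, each a list of ints in 0..255.
    `rng` is a random.Random instance, so results are reproducible.
    """
    height, width = len(image), len(image[0])
    dx = rng.randint(-max_shift, max_shift)  # horizontal shift in pixels
    dy = rng.randint(-max_shift, max_shift)  # vertical shift in pixels
    new_background = background - rng.randint(0, max_tint)

    # Start from a canvas filled with the tinted background colour.
    out = [[new_background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            pixel = image[y][x]
            ny, nx = y + dy, x + dx
            # Copy only glyph pixels that stay inside the canvas;
            # background pixels are already covered by the tinted fill.
            if pixel != background and 0 <= ny < height and 0 <= nx < width:
                out[ny][nx] = pixel
    return out


if __name__ == "__main__":
    # A 5x5 white image with a single black pixel in the centre.
    sample = [[255] * 5 for _ in range(5)]
    sample[2][2] = 0
    for row in augment(sample, random.Random(0)):
        print(row)
```

Applying a function like this several times to each training image, with different random states, yields extra samples whose labels are unchanged.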

Note


Please feel free to raise any issue. I am also open to suggestions for improving this project and to pull requests.

Credits


This repository is inspired by a Medium post. Read more about it here: @ageitgey/how-to-break-a-captcha-system-in-15-minutes-with-machine-learning-dbebb035a710. You can also download the dataset from that post or clone this repository.