Extraction of Text from Images

This is one of the most difficult projects, but also a very useful one: software that can read text from images could serve as a robot's eyes, or power a machine that reads books and newspapers for us.


How can you build one of your own?

In this post we will use PHP. You can use any language you like, but PHP is easy to learn and set up, so I chose it for our OCR project.

Because accurate OCR relies on AI techniques such as neural networks, we will not write a recognizer from scratch. Instead we are going to use the open-source Tesseract library: https://github.com/tesseract-ocr/tesseract
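The post does not show how PHP talks to Tesseract. One simple option, assumed here, is to call the `tesseract` command-line program from PHP with `exec()`, once it is installed and on the PATH. Passing the literal output name `stdout` tells tesseract to print the recognized text instead of writing a `.txt` file, and `-l eng` selects the English language data. This is a minimal sketch, not a complete OCR application; the function names `buildTesseractCommand` and `extractText` and the script name `ocr.php` are hypothetical.

```php
<?php
// Minimal sketch, assuming the `tesseract` command-line program is installed
// and on the PATH. Function names here are hypothetical, for illustration only.

// Build the shell command. The output name "stdout" makes tesseract print the
// recognized text instead of writing it to a .txt file.
function buildTesseractCommand(string $imagePath, string $lang = 'eng'): string
{
    return sprintf(
        'tesseract %s stdout -l %s',
        escapeshellarg($imagePath), // quote the path so spaces and special chars are safe
        escapeshellarg($lang)
    );
}

// Run tesseract on one image and return the recognized text.
function extractText(string $imagePath, string $lang = 'eng'): string
{
    if (!is_file($imagePath)) {
        throw new InvalidArgumentException("Image not found: $imagePath");
    }

    $lines = [];
    $exitCode = 0;
    // exec() collects stdout line by line; anything tesseract writes to stderr is not included.
    exec(buildTesseractCommand($imagePath, $lang), $lines, $exitCode);

    if ($exitCode !== 0) {
        throw new RuntimeException("tesseract failed with exit code $exitCode");
    }
    return trim(implode("\n", $lines));
}

// Usage from the command line: php ocr.php scan.png
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    echo extractText($argv[1]), PHP_EOL;
}
```

Calling the command-line program keeps the sketch free of extra PHP extensions; the trade-off is one process launch per image.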

Before we start, you will need an Ubuntu installation, where this library is easy to install. If you don't have Ubuntu, you can use Vagrant to run it on your Windows or Mac computer.
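On Ubuntu, Tesseract is packaged as `tesseract-ocr` and can be installed with `sudo apt-get install tesseract-ocr`. To confirm from PHP that the installation worked, a minimal sketch can run `tesseract --version` and check its exit code; the function name `tesseractVersion` is hypothetical.

```php
<?php
// Minimal sketch: check that the tesseract binary is installed and callable.
// Assumes a Unix-like system such as Ubuntu. The function name is hypothetical.

// Return the first line of tesseract's version banner, or null if it cannot be run.
function tesseractVersion(): ?string
{
    $lines = [];
    $exitCode = 0;
    // Merge stderr into stdout so the banner is captured whichever stream it is printed on.
    exec('tesseract --version 2>&1', $lines, $exitCode);
    if ($exitCode !== 0 || $lines === []) {
        return null; // e.g. the shell reports "command not found" (exit code 127)
    }
    return $lines[0]; // typically a line such as "tesseract <version>"
}

$version = tesseractVersion();
echo $version === null
    ? 'tesseract not found: install it before running the OCR script' . PHP_EOL
    : "Found $version" . PHP_EOL;
```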

Vagrant is an excellent tool for creating and using virtual machines with just a couple of commands. Before you can use it, you need a virtualization provider such as VirtualBox installed on your computer.

If you need help installing Vagrant, this page explains it: https://www.vagrantup.com/docs/installation/

Helpful Links:

What is OCR?

OCR Handwriting

Related PDFs:

Text Information in Images (PDF file, approx. 1.5 MB)

Image Parsing to Text Description

Image to Text

Multiscale Edge-Based Text Extraction from Complex Images

Text Segmentation from Images

A Density-based Approach for Text Extraction in Images

Related Presentation:

Limits of OCR

Related Video:
