What OCR options exist beyond Tesseract? [closed]

php python ruby ocr tesseract

I have successfully used GOCR in the past for OCR on small images. Once I had the grayscale options set properly, I would put its accuracy at around 85% on fairly regular fonts. It fails miserably when the fonts get complicated, and it has trouble with multi-line layouts.
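
Since GOCR's accuracy depended on getting the grayscale settings right, it can help to keep the threshold in one place when calling it from a script. Below is a minimal Ruby sketch. It assumes that gocr's `-i` (input file) and `-l` (grey-level threshold) options behave as described in the comments, so check them with `gocr -h` on your installation. The helper names `gocr_command` and `run_gocr` are hypothetical. Passing an argument array to `Open3.capture3` avoids shell quoting problems with file names.

```ruby
require 'open3'

# Build the argument list for a gocr run. The -i (input file) and -l
# (grey-level threshold; 0 means auto-detect) options are assumed from
# common gocr builds: confirm them with `gocr -h` on your installation.
def gocr_command(image_path, grey_level: nil)
  cmd = ['gocr', '-i', image_path]
  cmd += ['-l', grey_level.to_s] if grey_level
  cmd
end

# Run gocr without a shell and return its stdout (the recognized text).
def run_gocr(image_path, grey_level: nil)
  out, err, status = Open3.capture3(*gocr_command(image_path, grey_level: grey_level))
  raise "gocr failed: #{err}" unless status.success?
  out
end

# Example: text = run_gocr('page.pnm', grey_level: 160)
```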

Also have a look at Ocropus, which is maintained by Google. It's related to Tesseract, but from what I understand, its OCR engine is different. With just the default models included, it achieves near 99% accuracy on high-quality images, handles layout pretty well, and provides HTML output with information about formatting and lines. In my experience, however, its accuracy drops sharply when image quality is poor. That said, training is relatively simple, so you might want to give it a try.
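
The accuracy figures in this thread (around 85% for GOCR, near 99% for Ocropus on clean images) are the posters' impressions. To compare engines on your own scans, one common measure is character accuracy: 1 minus the edit (Levenshtein) distance between the OCR output and a hand-typed transcription, divided by the transcription's length. A minimal sketch, with hypothetical helper names:

```ruby
# Levenshtein edit distance between two strings (insertions, deletions and
# substitutions each cost 1), computed row by row so memory stays O(len(b)).
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Character accuracy of an OCR result against a ground-truth transcription:
# 1 - (edit distance / ground-truth length), clamped so it never goes below 0.
def char_accuracy(ocr_text, truth)
  return ocr_text.empty? ? 1.0 : 0.0 if truth.empty?
  [1.0 - edit_distance(ocr_text, truth).fdiv(truth.length), 0.0].max
end
```

For example, `char_accuracy('he1lo', 'hello')` gives 0.8, since one of five characters was misread.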

Both can easily be called from the command line. GOCR usage is very straightforward: just type gocr -h and you should have all the information you need. Ocropus is a bit trickier; here's a usage example in Ruby:

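```ruby
require 'fileutils'

tmp  = 'directory'  # scratch directory for Ocropus intermediate files
file = 'file.png'   # input page image

FileUtils.mkdir_p(tmp)

# Run the Ocropus stages in order: page extraction, line segmentation,
# line recognition, then assembly of the HTML output.
`ocropus book2pages #{tmp}/out #{file}`
`ocropus pages2lines #{tmp}/out`
`ocropus lines2fsts #{tmp}/out`
`ocropus buildhtml #{tmp}/out > #{tmp}/output.html`

# Read the recognized text (as HTML), then delete the scratch directory.
text = File.read("#{tmp}/output.html")
FileUtils.rm_rf(tmp)
```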
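
The Ocropus snippet leaves HTML markup in `text`. If you only need plain text, a crude tag-stripping pass is often enough. The sketch below is a regex-based approximation, not a real HTML parser. Which tags count as line breaks (`break_tags`) is an assumption you should adjust to the markup your Ocropus version actually emits, and `html_to_text` is a hypothetical helper name.

```ruby
# Named entities this sketch decodes; anything else is left untouched.
NAMED_ENTITIES = { 'amp' => '&', 'lt' => '<', 'gt' => '>',
                   'quot' => '"', 'apos' => "'", 'nbsp' => ' ' }.freeze

# Crude HTML-to-text conversion for OCR output (regex-based, not a parser).
# break_tags lists elements whose tags become line breaks; the default is an
# assumption, so adjust it to your Ocropus version's markup.
def html_to_text(html, break_tags: %w[br p div])
  names = break_tags.map { |t| Regexp.escape(t) }.join('|')
  text = html.gsub(/<\/?(?:#{names})\b[^>]*>/i, "\n") # break tags -> newlines
  text = text.gsub(/<[^>]*>/, '')                      # drop all other tags

  # Decode entities only after stripping tags, so "&lt;" never becomes a tag.
  text = text.gsub(/&(#\d+|#x\h+|\w+);/i) do
    ent = Regexp.last_match(1)
    if ent.start_with?('#x', '#X')
      ent[2..].to_i(16).chr(Encoding::UTF_8)
    elsif ent.start_with?('#')
      ent[1..].to_i.chr(Encoding::UTF_8)
    else
      NAMED_ENTITIES.fetch(ent, "&#{ent};")
    end
  end

  # Trim each line and drop the blank ones.
  text.lines.map(&:strip).reject(&:empty?).join("\n")
end
```

With the snippet above, `plain = html_to_text(text)` would give one recognized line per output line, provided the line breaks in the HTML use the tags listed in `break_tags`.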

We use OCR XTR Lite from Vividata at my office. It uses the ScanSoft engine and isn't a free solution, but its accuracy is almost perfect, and it auto-rotates the images to determine the correct orientation for OCR. We currently script it from bash, and I process 75,000 to 150,000 pages a day with it.
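
At 150,000 pages a day, a pipeline has to sustain about 1.7 pages per second (86,400 s / 150,000 ≈ 0.58 s per page if run sequentially), so running several OCR processes in parallel is one way to get headroom. The answer drives its engine from bash; the Ruby sketch below shows the same idea as a fixed-size worker pool. The OCR call itself is left to the caller's block, `process_in_pool` is a hypothetical helper, and the command name in the example comment is a placeholder, not Vividata's actual CLI. Ruby threads are enough here because the real work happens in child processes, and a thread waiting on a child process does not hold the interpreter lock.

```ruby
# Run the given block on every item using a fixed pool of worker threads and
# return a Hash mapping each item to the block's result.
def process_in_pool(items, workers: 4, &work)
  queue = Queue.new
  items.each { |item| queue << item }
  queue.close # once drained, pop returns nil and each worker exits its loop

  results = {}
  lock = Mutex.new
  threads = Array.new(workers) do
    Thread.new do
      while (item = queue.pop)
        value = work.call(item)
        lock.synchronize { results[item] = value }
      end
    end
  end
  threads.each(&:join)
  results
end

# Example (the OCR command name is a placeholder, not a real CLI):
#   require 'open3'
#   texts = process_in_pool(Dir.glob('incoming/*.tif'), workers: 8) do |path|
#     out, status = Open3.capture2('your-ocr-command', path)
#     status.success? ? out : nil
#   end
```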