Why can't the Tesseract OCR library (iOS) recognize text at all?


You are setting the option tessedit_char_whitelist to "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", which restricts recognition to the characters in that list. However, the image you want to process contains lowercase characters. If you want to keep using this option, you have to include the lowercase characters as well:

[tesseractObject setVariableValue:@"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" forKey:@"tessedit_char_whitelist"];
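A quick way to check whether a whitelist covers the text you expect is to compare the two strings before running OCR. The following is a minimal sketch using only Foundation; the helper name CharactersMissingFromWhitelist is hypothetical and is not part of Tesseract or any wrapper.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Tesseract): returns each character of
// `expected` that does not appear in `whitelist`, ignoring whitespace.
// The comparison is case-sensitive, just like the whitelist itself.
static NSString *CharactersMissingFromWhitelist(NSString *expected, NSString *whitelist) {
    NSCharacterSet *allowed = [NSCharacterSet characterSetWithCharactersInString:whitelist];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableString *missing = [NSMutableString string];

    for (NSUInteger i = 0; i < expected.length; i++) {
        unichar c = [expected characterAtIndex:i];
        if ([whitespace characterIsMember:c] || [allowed characterIsMember:c]) {
            continue;
        }
        // Record each missing character only once.
        NSString *s = [NSString stringWithCharacters:&c length:1];
        if ([missing rangeOfString:s].location == NSNotFound) {
            [missing appendString:s];
        }
    }
    return missing;
}

int main(void) {
    @autoreleasepool {
        NSString *whitelist = @"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        // Logs "elo": the lowercase letters that the uppercase-only whitelist rejects.
        NSLog(@"%@", CharactersMissingFromWhitelist(@"Hello 42", whitelist));
    }
    return 0;
}
```

If the result is not empty, those characters can never appear in the recognized text while the whitelist is in effect.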


Make sure you have the latest tessdata files from Google Code:

http://code.google.com/p/tesseract-ocr/downloads/list

That page lists the tessdata files that you need to download and include in your app, if you haven't already. In your case you will need tesseract-ocr-3.02.eng.tar.gz, since you want the English language files.

The following article shows where you need to install the files. I read through this tutorial when I built my first Tesseract project and found it really useful:

http://lois.di-qual.net/blog/install-and-use-tesseract-on-ios-with-tesseract-ios/
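To rule out a missing language file, you can check at startup that the trained data actually made it into the app. This is only a sketch under assumptions: it supposes the tessdata folder was added to the Xcode project as a folder reference named "tessdata" at the top level of the bundle, and that the English file is named eng.traineddata (as in the 3.02 English archive). The function name TessdataIsBundled is hypothetical.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES if <bundle>/tessdata/<language>.traineddata exists.
// Assumes "tessdata" is bundled as a folder reference; adjust the path if your
// project copies the files somewhere else.
static BOOL TessdataIsBundled(NSString *language) {
    NSString *resourcePath = [[NSBundle mainBundle] resourcePath];
    NSString *fileName = [language stringByAppendingPathExtension:@"traineddata"];
    NSString *path = [[resourcePath stringByAppendingPathComponent:@"tessdata"]
                         stringByAppendingPathComponent:fileName];
    return [[NSFileManager defaultManager] fileExistsAtPath:path];
}

// Example call, e.g. before creating the Tesseract object:
//
//     if (!TessdataIsBundled(@"eng")) {
//         NSLog(@"tessdata/eng.traineddata is missing from the app bundle");
//     }
```

If the check fails, the usual cause is that the folder was added as a group rather than a folder reference, so the files end up at the bundle root instead of inside tessdata.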


Like Adam said, if you want good results, you'll have to do some image processing and configure some settings (white-listing certain characters, etc).

For anyone else stumbling upon this question, I've put together a sample project here that does some white-listing and image processing: https://github.com/mstrchrstphr/OCR-iOS-Example
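As an illustration of the kind of image processing meant above, here is a minimal sketch that converts a UIImage to 8-bit grayscale with Core Graphics before handing it to Tesseract. It is not taken from the linked sample project. The function name GrayscaleImage is hypothetical, and the sketch ignores the image's orientation and scale, which a real app would probably need to handle.

```objective-c
#import <UIKit/UIKit.h>

// Hypothetical helper: redraws `source` into an 8-bit grayscale bitmap.
// Returns nil if the image has no CGImage backing or the context cannot be created.
static UIImage *GrayscaleImage(UIImage *source) {
    CGImageRef cgImage = source.CGImage;
    if (cgImage == NULL) {
        return nil;
    }

    size_t width = CGImageGetWidth(cgImage);
    size_t height = CGImageGetHeight(cgImage);

    // One 8-bit gray channel, no alpha; passing 0 lets Core Graphics pick bytesPerRow.
    CGColorSpaceRef graySpace = CGColorSpaceCreateDeviceGray();
    CGContextRef context = CGBitmapContextCreate(NULL, width, height, 8, 0,
                                                 graySpace,
                                                 (CGBitmapInfo)kCGImageAlphaNone);
    CGColorSpaceRelease(graySpace);
    if (context == NULL) {
        return nil;
    }

    // Drawing into the gray context performs the color conversion.
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), cgImage);
    CGImageRef grayImage = CGBitmapContextCreateImage(context);
    CGContextRelease(context);
    if (grayImage == NULL) {
        return nil;
    }

    UIImage *result = [UIImage imageWithCGImage:grayImage];
    CGImageRelease(grayImage);
    return result;
}
```

Grayscale conversion is only a first step; cropping to the text region, increasing contrast, and scaling the image so the characters are reasonably large tend to matter as well.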