Doing OCR with R

Using the "tesseract" package, I wrote a sample script that works. It works for scanned PDFs too.

library(tesseract)
library(pdftools)

# Render the PDF to TIFF images (one file per page)
img_file <- pdftools::pdf_convert("F:/gowtham/A/B/invoice.pdf", format = 'tiff', dpi = 400)

# Extract text from the TIFF images
text <- ocr(img_file)
write.table(text, "F:/gowtham/A/B/mydata.txt")

I'm new to R and programming, so correct me if anything is wrong. I hope this helps.

r shell pdf ocr tesseract

The newly released tesseract package might be worth checking out. It allows you to perform the whole process inside of R without the shell calls.

Following the procedure in the tesseract package's help documentation, your function would look something like this:

library(tesseract)
library(pdftools)

lapply(myfiles, function(i){
  # render the first page of the PDF to a bitmap and run tesseract OCR on it
  # numeric = TRUE returns a numeric array, which tiff::writeTIFF can write
  bitmap <- pdf_render_page(i, dpi = 300, numeric = TRUE)
  tiff_file <- paste0(i, ".tiff")
  tiff::writeTIFF(bitmap, tiff_file)
  # perform OCR on the .tiff file
  out <- ocr(tiff_file)
  # delete tiff file
  file.remove(tiff_file)
  # return the recognized text
  out
})
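The same idea can be sketched as a reusable function that handles every page of a PDF, not just the first. This is a minimal sketch, not the package's documented recipe: the helper names `txt_path_for` and `ocr_pdf`, the temporary file naming, and the default `dpi` and `language` values are all assumptions made for illustration. It relies on `pdftools::pdf_info`, `pdftools::pdf_convert`, and `tesseract::ocr`, and it needs the language data for `language` to be installed.

```r
# Hypothetical helper: derive the output .txt path from the PDF path.
txt_path_for <- function(pdf_path) {
  sub("\\.pdf$", ".txt", pdf_path, ignore.case = TRUE)
}

# Hypothetical helper: OCR every page of one PDF and return the text
# as a single string, with pages separated by a newline.
ocr_pdf <- function(pdf_path, dpi = 300, language = "eng") {
  # One temporary PNG per page, so nothing is written next to the PDF.
  n_pages <- pdftools::pdf_info(pdf_path)$pages
  images <- file.path(tempdir(), sprintf("ocr_page_%03d.png", seq_len(n_pages)))
  images <- pdftools::pdf_convert(pdf_path, format = "png", dpi = dpi,
                                  filenames = images)
  # Remove the temporary images when the function exits, even on error.
  on.exit(file.remove(images), add = TRUE)

  # Run OCR on each page image in turn.
  engine <- tesseract::tesseract(language)
  pages <- vapply(images, tesseract::ocr, character(1),
                  engine = engine, USE.NAMES = FALSE)
  paste(pages, collapse = "\n")
}

# Example usage (assumes myfiles is a character vector of PDF paths):
# for (f in myfiles) writeLines(ocr_pdf(f), txt_path_for(f))
```

Writing the result with `writeLines` gives a plain text file without the quotes and row names that `write.table` would add.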
