
Converting a Vision VNTextObservation to a String


SwiftOCR

I just got SwiftOCR to work with small sets of text.

https://github.com/garnele007/SwiftOCR

uses

https://github.com/Swift-AI/Swift-AI

which uses a NeuralNet-MNIST model for text recognition.

TODO : VNTextObservation > SwiftOCR

I will post an example of using it with VNTextObservation once I have one connected to the other.
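In the meantime, here is a minimal sketch of how the two could be wired together: take the normalized bounding box from each VNTextObservation, crop that region out of the source image, and hand the crop to SwiftOCR. SwiftOCR's completion-based recognize() call is from its README; the function name and the coordinate flip are my own assumptions, so treat this as a starting point rather than tested code.

    import UIKit
    import Vision
    import SwiftOCR

    // Sketch: crop each detected text region out of the original image and feed it to SwiftOCR.
    // Vision bounding boxes are normalized with a bottom-left origin, so the y-axis is flipped
    // before cropping with CoreGraphics (top-left origin, pixel coordinates).
    func recognize(observations: [VNTextObservation], in image: UIImage,
                   completion: @escaping (String) -> Void) {
        guard let cgImage = image.cgImage else { return }
        let width = CGFloat(cgImage.width)
        let height = CGFloat(cgImage.height)
        let swiftOCRInstance = SwiftOCR()

        for observation in observations {
            let box = observation.boundingBox
            let pixelRect = CGRect(x: box.origin.x * width,
                                   y: (1 - box.origin.y - box.height) * height,
                                   width: box.width * width,
                                   height: box.height * height)
            guard let crop = cgImage.cropping(to: pixelRect) else { continue }

            // SwiftOCR recognizes the cropped region and calls back with the string it found.
            swiftOCRInstance.recognize(UIImage(cgImage: crop)) { recognizedString in
                completion(recognizedString)
            }
        }
    }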

OpenCV + Tesseract OCR

I tried to use OpenCV + Tesseract but got compile errors; then I found SwiftOCR.

SEE ALSO : Google Vision iOS

Note: Google Vision Text Recognition - the Android SDK has text detection, but there is also an iOS CocoaPod. Keep an eye on it, as text recognition should eventually be added to the iOS version.

https://developers.google.com/vision/text-overview

Correction: I just tried it, but only the Android version of the SDK supports text detection.


If you subscribe to releases at https://libraries.io/cocoapods/GoogleMobileVision and click SUBSCRIBE TO RELEASES, you can see when text detection is added to the iOS part of the CocoaPod.


This is how to do it ...

    //
    //  ViewController.swift
    //
    import UIKit
    import Vision
    import CoreML

    class ViewController: UIViewController {

        //HOLDS OUR INPUT
        var inputImage: CIImage?

        //RESULT FROM OVERALL RECOGNITION
        var recognizedWords: [String] = [String]()

        //RESULT FROM RECOGNITION
        var recognizedRegion: String = String()

        //OCR-REQUEST
        lazy var ocrRequest: VNCoreMLRequest = {
            do {
                //THIS MODEL IS TRAINED BY ME FOR FONT "Inconsolata" (Numbers 0...9 and UpperCase Characters A..Z)
                let model = try VNCoreMLModel(for: OCR().model)
                return VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
            } catch {
                fatalError("cannot load model")
            }
        }()

        //OCR-HANDLER
        func handleClassification(request: VNRequest, error: Error?) {
            guard let observations = request.results as? [VNClassificationObservation]
                else { fatalError("unexpected result") }
            guard let best = observations.first
                else { fatalError("can't get best result") }

            self.recognizedRegion = self.recognizedRegion.appending(best.identifier)
        }

        //TEXT-DETECTION-REQUEST
        lazy var textDetectionRequest: VNDetectTextRectanglesRequest = {
            return VNDetectTextRectanglesRequest(completionHandler: self.handleDetection)
        }()

        //TEXT-DETECTION-HANDLER
        func handleDetection(request: VNRequest, error: Error?) {
            guard let observations = request.results as? [VNTextObservation]
                else { fatalError("unexpected result") }

            //EMPTY THE RESULTS
            self.recognizedWords = [String]()

            //NEEDED BECAUSE OF DIFFERENT SCALES
            let transform = CGAffineTransform.identity.scaledBy(x: (self.inputImage?.extent.size.width)!,
                                                                y: (self.inputImage?.extent.size.height)!)

            //A REGION IS LIKE A "WORD"
            for region: VNTextObservation in observations {
                guard let boxesIn = region.characterBoxes else {
                    continue
                }

                //EMPTY THE RESULT FOR REGION
                self.recognizedRegion = ""

                //A "BOX" IS THE POSITION IN THE ORIGINAL IMAGE (SCALED FROM 0...1.0)
                for box in boxesIn {
                    //SCALE THE BOUNDING BOX TO PIXELS
                    let realBoundingBox = box.boundingBox.applying(transform)

                    //TO BE SURE
                    guard (inputImage?.extent.contains(realBoundingBox))!
                        else { print("invalid detected rectangle"); return }

                    //SCALE THE POINTS TO PIXELS
                    let topleft = box.topLeft.applying(transform)
                    let topright = box.topRight.applying(transform)
                    let bottomleft = box.bottomLeft.applying(transform)
                    let bottomright = box.bottomRight.applying(transform)

                    //LET'S CROP AND RECTIFY
                    let charImage = inputImage?
                        .cropped(to: realBoundingBox)
                        .applyingFilter("CIPerspectiveCorrection", parameters: [
                            "inputTopLeft": CIVector(cgPoint: topleft),
                            "inputTopRight": CIVector(cgPoint: topright),
                            "inputBottomLeft": CIVector(cgPoint: bottomleft),
                            "inputBottomRight": CIVector(cgPoint: bottomright)
                        ])

                    //PREPARE THE HANDLER
                    let handler = VNImageRequestHandler(ciImage: charImage!, options: [:])

                    //SOME OPTIONS (TO PLAY WITH..)
                    self.ocrRequest.imageCropAndScaleOption = VNImageCropAndScaleOption.scaleFill

                    //FEED THE CHAR-IMAGE TO OUR OCR-REQUEST - NO NEED TO SCALE IT - VISION WILL DO IT FOR US !!
                    do {
                        try handler.perform([self.ocrRequest])
                    } catch { print("Error") }
                }

                //APPEND RECOGNIZED CHARS FOR THAT REGION
                self.recognizedWords.append(recognizedRegion)
            }

            //THAT'S WHAT WE WANT - PRINT WORDS TO CONSOLE
            DispatchQueue.main.async {
                self.printWords(words: self.recognizedWords)
            }
        }

        func printWords(words: [String]) {
            // VOILA'
            print(words)
        }

        func doOCR(ciImage: CIImage) {
            //PREPARE THE HANDLER
            let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])

            //WE NEED A BOX FOR EACH DETECTED CHARACTER
            self.textDetectionRequest.reportCharacterBoxes = true
            self.textDetectionRequest.preferBackgroundProcessing = false

            //FEED IT TO THE QUEUE FOR TEXT-DETECTION
            DispatchQueue.global(qos: .userInteractive).async {
                do {
                    try handler.perform([self.textDetectionRequest])
                } catch {
                    print("Error")
                }
            }
        }

        override func viewDidLoad() {
            super.viewDidLoad()
            // Do any additional setup after loading the view, typically from a nib.

            //LET'S LOAD AN IMAGE FROM RESOURCE
            let loadedImage: UIImage = UIImage(named: "Sample1.png")! //TRY Sample2, Sample3 too

            //WE NEED A CIIMAGE - NOT NEEDED TO SCALE
            inputImage = CIImage(image: loadedImage)!

            //LET'S DO IT
            self.doOCR(ciImage: inputImage!)
        }

        override func didReceiveMemoryWarning() {
            super.didReceiveMemoryWarning()
            // Dispose of any resources that can be recreated.
        }
    }

You'll find the complete project here; the trained model is included!


Apple finally updated Vision to do OCR. Open a playground and dump a couple of test images in the Resources folder. In my case, I called them "demoDocument.jpg" and "demoLicensePlate.jpg".

The new class is called VNRecognizeTextRequest. Dump this in a playground and give it a whirl:

    import Vision

    enum DemoImage: String {
        case document = "demoDocument"
        case licensePlate = "demoLicensePlate"
    }

    class OCRReader {
        func performOCR(on url: URL?, recognitionLevel: VNRequestTextRecognitionLevel) {
            guard let url = url else { return }
            let requestHandler = VNImageRequestHandler(url: url, options: [:])

            let request = VNRecognizeTextRequest { (request, error) in
                if let error = error {
                    print(error)
                    return
                }

                guard let observations = request.results as? [VNRecognizedTextObservation] else { return }

                for currentObservation in observations {
                    let topCandidate = currentObservation.topCandidates(1)
                    if let recognizedText = topCandidate.first {
                        print(recognizedText.string)
                    }
                }
            }
            request.recognitionLevel = recognitionLevel

            try? requestHandler.perform([request])
        }
    }

    func url(for image: DemoImage) -> URL? {
        return Bundle.main.url(forResource: image.rawValue, withExtension: "jpg")
    }

    let ocrReader = OCRReader()
    ocrReader.performOCR(on: url(for: .document), recognitionLevel: .fast)
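The same reader can then be pointed at the second sample image with the slower but more thorough recognition level:

    ocrReader.performOCR(on: url(for: .licensePlate), recognitionLevel: .accurate)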

There's an in-depth discussion of this from WWDC 2019.
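A couple of other knobs on VNRecognizeTextRequest are also worth trying in the playground above; for example, inside performOCR you could set these before calling perform (the language value here is just an example):

    request.usesLanguageCorrection = true      // let Vision apply language-based correction to likely misreads
    request.recognitionLanguages = ["en-US"]   // example: restrict recognition to English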