XVIII - Application Example: Photo OCR

18.1 - Problem Description and Pipeline

The Photo OCR pipeline (an example of a machine learning pipeline):
1. Text detection
2. Character segmentation
3. Character classification
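The three stages above feed into each other: text regions go to segmentation, and the resulting character images go to classification. The following is a minimal sketch of that data flow. The function `run_pipeline` and its stage parameters are hypothetical names; each stage is passed in as a plain function, and the usage uses toy string-based stand-ins in place of real image models.

```python
def run_pipeline(image, detect_text, segment_characters, classify_character):
    """Chain the three Photo OCR stages; each consumes the previous stage's output."""
    words = []
    for region in detect_text(image):              # 1. text detection
        characters = segment_characters(region)    # 2. character segmentation
        # 3. character classification, joined back into a word
        words.append("".join(classify_character(c) for c in characters))
    return words


# Toy stand-ins (hypothetical): the "image" is a string, detection splits on
# spaces, segmentation splits into characters, classification lowercases.
print(run_pipeline("HELLO WORLD", str.split, list, str.lower))  # ['hello', 'world']
```

Keeping each stage as a separate function is what makes it possible to swap one stage for a perfect (ground-truth) version, which is the idea behind ceiling analysis.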

18.2 - Sliding Windows

At each step the sliding window is moved by a fixed step size, or stride. A stride of 1 pixel works best, but in practice a larger stride (e.g. 4 pixels) is often used to reduce the number of windows the classifier has to evaluate.
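To see why a larger stride is attractive, this sketch counts the window positions for a given stride. The image and window sizes are hypothetical, and windows are assumed to lie fully inside the image.

```python
def window_positions(img_w, img_h, win_w, win_h, stride):
    """Top-left (x, y) corners of every window that fits fully inside the image."""
    xs = range(0, img_w - win_w + 1, stride)
    ys = range(0, img_h - win_h + 1, stride)
    return [(x, y) for y in ys for x in xs]


# Hypothetical sizes: a 200x100 image scanned with a 20x20 window.
for stride in (1, 4):
    print(stride, len(window_positions(200, 100, 20, 20, stride)))
# 1 14661
# 4 966
```

Going from a stride of 1 to 4 cuts the number of classifier calls by roughly a factor of 16 (4 in each direction), at the cost of a coarser detection map.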
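For text detection, we slide a small fixed-size window over the image and paint a black-and-white image from the classifier's results (white where the classifier predicts text).
We then expand the white regions of this image (turning white any pixel close to a white pixel) and extract rectangles around the resulting connected regions as candidate text regions.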
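The detection steps above can be sketched in pure Python on a tiny grayscale image stored as a list of rows. This is an illustration under several assumptions: `toy_classifier` is a hypothetical stand-in for a trained text/no-text classifier, a positive window paints its whole area white, "expanding" means whitening every pixel within a fixed radius of a white pixel, and rectangles are bounding boxes of 4-connected white regions.

```python
from collections import deque


def detection_map(image, win, stride, classify):
    """Paint 1 (white) on every pixel covered by a window classified as text, else 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = [row[x:x + win] for row in image[y:y + win]]
            if classify(patch):
                for yy in range(y, y + win):
                    for xx in range(x, x + win):
                        out[yy][xx] = 1
    return out


def expand(mask, radius):
    """Whiten every pixel within `radius` rows and columns of a white pixel."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


def bounding_boxes(mask):
    """Return (left, top, right, bottom) for each 4-connected white region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                top, left, bottom, right = y, x, y, x
                while queue:  # breadth-first flood fill of one region
                    cy, cx = queue.popleft()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((left, top, right, bottom))
    return boxes


def toy_classifier(patch):
    """Hypothetical classifier: 'text' if more than half the patch is bright."""
    pixels = [p for row in patch for p in row]
    return sum(pixels) / len(pixels) > 0.5


# Two neighbouring 2x2 "characters" plus one isolated blob (hypothetical data).
image = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0],
]
mask = expand(detection_map(image, win=2, stride=1, classify=toy_classifier), radius=1)
print(bounding_boxes(mask))  # [(0, 0, 6, 3), (8, 3, 11, 5)]
```

The expansion step is what merges the two neighbouring characters into a single text rectangle, while the distant blob stays a separate region.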

18.3 - Getting Lots of Data and Artificial Data

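We can enlarge the training set by introducing distortions into existing examples (e.g. warping text images, adding background noise to audio clips).
Adding purely random, meaningless noise (such as independent per-pixel noise) is generally less useful: the distortions should be representative of the kinds of variation expected in the test set.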
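As an illustration of distortion-based augmentation for text images, the sketch below warps a small binary image by shifting each row horizontally along a sine wave, then generates several copies with randomly drawn warp parameters. The sinusoidal row shift is a simple stand-in chosen here for "warping"; the function names and parameter ranges are hypothetical, not a prescribed method.

```python
import math
import random


def shift_row(row, s):
    """Shift a row right by s pixels (left if s < 0), padding with background 0."""
    n = len(row)
    if s >= 0:
        return [0] * min(s, n) + row[:max(n - s, 0)]
    return row[-s:] + [0] * min(-s, n)


def wave_warp(image, amplitude, period, phase=0.0):
    """Shift row y by round(amplitude * sin(2*pi*(y + phase) / period)) pixels."""
    return [
        shift_row(row, round(amplitude * math.sin(2 * math.pi * (y + phase) / period)))
        for y, row in enumerate(image)
    ]


def augment(image, n_copies, seed=0):
    """Make n_copies distorted variants with randomly drawn (hypothetical) warp parameters."""
    rng = random.Random(seed)
    return [
        wave_warp(image, rng.uniform(0.5, 2.0), rng.uniform(3.0, 8.0), rng.uniform(0.0, 8.0))
        for _ in range(n_copies)
    ]


letter = [[0, 1, 1, 0]] * 4          # a crude vertical stroke
print(wave_warp(letter, 1, 4))       # rows shifted by 0, +1, 0, -1 pixels
variants = augment(letter, n_copies=5)
```

Each variant keeps the label of the original character, so one labelled example becomes several; the distortion stays plausible for real photographed text, unlike independent per-pixel noise.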

18.4 - Ceiling Analysis: What part of the pipeline to work on next

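Going through the pipeline module by module, we simulate perfect accuracy for that module (and the ones before it) by manually supplying the correct, ground-truth output, and then measure the accuracy of the whole system. The increase in overall accuracy at each step is an upper bound (a ceiling) on what improving that module could gain, which shows which module is most worth working on next.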
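Once the overall accuracies have been measured, the bookkeeping is a difference between consecutive rows. The sketch below assumes accuracies in whole percentage points; the numbers are hypothetical placeholders, not measured results.

```python
def ceiling_gains(stages):
    """stages: (label, overall accuracy %) in pipeline order, starting with the
    unmodified system, then each module made perfect in turn (cumulatively).
    Returns the accuracy gain attributable to each module."""
    return [(stages[i][0], stages[i][1] - stages[i - 1][1]) for i in range(1, len(stages))]


# Hypothetical measurements for the Photo OCR pipeline.
stages = [
    ("overall system", 70),
    ("text detection", 85),
    ("character segmentation", 87),
    ("character classification", 100),
]
gains = ceiling_gains(stages)
for label, gain in gains:
    print(f"{label}: +{gain} points")
best = max(gains, key=lambda g: g[1])
print("work on next:", best[0])  # work on next: text detection
```

With these made-up numbers, a perfect character segmenter would add only 2 points, so effort spent there is unlikely to pay off compared with text detection.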