XVIII - Application Example: Photo OCR
18.1 - Problem Description and Pipeline
- Photo OCR pipeline (an example of a machine learning pipeline):
- Text detection
- Character segmentation
- Character classification
18.2 - Sliding Windows
- At each step the sliding window moves by the step size, or stride. A stride of 1 pixel covers the image most thoroughly but is expensive, so in practice we can use a larger stride of, say, 4 pixels.
- For text detection, we can slide a small fixed-size window over the image and paint a black-and-white image from the classifier's results (white means the classifier answered positive, i.e. text).
- Then we expand the white regions of this image so that nearby detections merge, and from the resulting connected regions we can extract the text rectangles.
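The steps above can be sketched in a few lines of plain Python. This is only an illustration under assumptions, not the method used in the course: the function names (`window_positions`, `detection_mask`, `expand`, `bounding_boxes`) are hypothetical, the image is a list of rows, the classifier `is_text` is a toy stand-in, and the expansion is a simple square-neighborhood dilation.

```python
from collections import deque

def window_positions(height, width, win_h, win_w, stride):
    """Yield the top-left (row, col) of every window that fits in the image."""
    for r in range(0, height - win_h + 1, stride):
        for c in range(0, width - win_w + 1, stride):
            yield r, c

def detection_mask(image, win_h, win_w, stride, is_text):
    """Paint 1 (white) over every window the classifier marks as text."""
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for r, c in window_positions(height, width, win_h, win_w, stride):
        patch = [row[c:c + win_w] for row in image[r:r + win_h]]
        if is_text(patch):
            for rr in range(r, r + win_h):
                for cc in range(c, c + win_w):
                    mask[rr][cc] = 1
    return mask

def expand(mask, radius):
    """Dilate: every white pixel whitens its square neighborhood of `radius`."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                        out[rr][cc] = 1
    return out

def bounding_boxes(mask):
    """Return (top, left, bottom, right) for each 4-connected white region."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

# Toy 4x10 image with two 2x2 blobs of "ink" separated by a small gap.
image = [[0] * 10 for _ in range(4)]
for r in (0, 1):
    for c in (0, 1, 4, 5):
        image[r][c] = 1

# Hypothetical classifier: a 2x2 patch is "text" if it is entirely ink.
mask = detection_mask(image, 2, 2, 2, lambda p: sum(map(sum, p)) == 4)
print(bounding_boxes(mask))             # [(0, 0, 1, 1), (0, 4, 1, 5)]
print(bounding_boxes(expand(mask, 1)))  # [(0, 0, 2, 6)]
```

Before expansion the two detections are separate regions. After expanding by one pixel they merge into a single rectangle, which is the effect the expansion step is meant to have on neighboring characters of the same word.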
18.3 - Getting Lots of Data and Artificial Data
- We can amplify the training set by applying distortions to existing examples (warping for text, noisy backgrounds for audio, etc.).
- Adding purely random, meaningless noise is generally less useful.
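As a rough sketch of amplifying a dataset through distortion, the code below slants a small character bitmap by shearing it and keeps the original label for every distorted copy. The names `shear` and `augment` are hypothetical, and a horizontal shear is only one simple stand-in for the warping mentioned above.

```python
def shear(patch, max_shift, fill=0):
    """Shift each row sideways by an amount growing linearly from top to bottom.

    Row 0 is not moved and the last row moves by `max_shift` pixels, which
    slants the glyph. Pixels pushed outside the patch are dropped.
    """
    height, width = len(patch), len(patch[0])
    out = []
    for r, row in enumerate(patch):
        shift = round(max_shift * r / (height - 1)) if height > 1 else 0
        new_row = [fill] * width
        for c, value in enumerate(row):
            if 0 <= c + shift < width:
                new_row[c + shift] = value
        out.append(new_row)
    return out

def augment(examples, shifts):
    """Return one sheared copy of every (patch, label) pair per shift value."""
    return [(shear(patch, s), label) for patch, label in examples for s in shifts]

# A 3x5 bitmap of a vertical stroke, labeled "I".
stroke = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
dataset = augment([(stroke, "I")], shifts=(-2, 0, 2))
for row in dataset[2][0]:
    print(row)
# [0, 1, 0, 0, 0]
# [0, 0, 1, 0, 0]
# [0, 0, 0, 1, 0]
```

The distortion is chosen so that the result still looks like something that could appear in real data (a slanted stroke is still an "I"), which is what distinguishes it from adding arbitrary per-pixel noise.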
18.4 - Ceiling Analysis: What part of the pipeline to work on next
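- For each module in the pipeline, in order, we simulate perfect accuracy by feeding the next modules the correct (ground-truth) output instead of the module's own output, and measure the accuracy of the overall system in that case. The gain in overall accuracy from making a module perfect is the most we could win by improving it, so the module with the largest gain is the one to work on next.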
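The bookkeeping of a ceiling analysis can be sketched as follows. The function name `ceiling_analysis` is hypothetical and the accuracy figures are illustrative placeholders, not measurements. Each figure is assumed to be the overall system accuracy (in percent) when that module and all earlier ones are replaced by ground truth.

```python
def ceiling_analysis(baseline, stages):
    """Return (module, gain) pairs, where gain is the rise in overall accuracy.

    `baseline` is the overall accuracy of the unmodified pipeline.
    `stages` lists, in pipeline order, (module name, overall accuracy once
    this module and every earlier one are made perfect).
    """
    gains = []
    previous = baseline
    for module, accuracy in stages:
        gains.append((module, accuracy - previous))
        previous = accuracy
    return gains

# Illustrative numbers only (percent overall accuracy).
gains = ceiling_analysis(
    baseline=72.0,
    stages=[
        ("text detection", 89.0),
        ("character segmentation", 90.0),
        ("character classification", 100.0),
    ],
)
for module, gain in gains:
    print(f"{module}: +{gain:.1f} points")
# text detection: +17.0 points
# character segmentation: +1.0 points
# character classification: +10.0 points

best_module, _ = max(gains, key=lambda item: item[1])
print(best_module)  # text detection
```

With these placeholder figures, perfect character segmentation would add only one point, so effort spent there would be poorly rewarded compared with text detection.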