====== XVIII - Application Example: Photo OCR ======

===== 18.1 - Problem Description and Pipeline =====

  * Photo OCR pipeline (an example of a machine learning pipeline):
    - Text detection
    - Character segmentation
    - Character classification

===== 18.2 - Sliding Windows =====

  * The sliding window is moved at each step by the **step size** or **stride**. A stride of 1 pixel gives the finest coverage, but in practice a larger stride (say 4 pixels) is used to reduce the cost.
  * For text detection, we can use a small fixed-size sliding window, then paint a black and white image with the results of the classifier (white is positive).
  * We then expand the white regions of this image (pixels close to a white pixel also become white), and from the result we can extract the text rectangles.

===== 18.3 - Getting Lots of Data and Artificial Data =====

  * We can amplify the training dataset by introducing distortions (warping of text, noisy background on audio, etc.).
  * Adding purely random, meaningless noise is generally less useful.

===== 18.4 - Ceiling Analysis: What Part of the Pipeline to Work on Next =====

  * For each module in the pipeline in turn, we simulate perfect accuracy (by feeding it the ground-truth output) to see how much the accuracy of the following modules, and of the whole system, would improve in that case. The module whose perfection brings the largest gain is the one worth working on next.
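
The ceiling analysis of section 18.4 can be sketched in a few lines. This is only an illustration: the function name ''ceiling_analysis'' and all accuracy figures below are hypothetical, not measurements from a real system. Each stage records the overall accuracy once that module, and every module before it, has been replaced by ground truth.

```python
def ceiling_analysis(baseline, staged):
    """Return the accuracy gain contributed by making each module perfect.

    baseline: overall accuracy of the unmodified pipeline.
    staged:   list of (module_name, overall_accuracy) pairs, where the
              accuracy is measured with this module and all earlier ones
              replaced by ground-truth outputs.
    """
    gains = []
    previous = baseline
    for name, accuracy in staged:
        gains.append((name, accuracy - previous))
        previous = accuracy
    return gains


# Hypothetical numbers, for illustration only.
baseline_accuracy = 0.72
staged_accuracy = [
    ("text detection", 0.89),
    ("character segmentation", 0.90),
    ("character classification", 1.00),
]

gains = ceiling_analysis(baseline_accuracy, staged_accuracy)
# The module with the largest gain is the best candidate to work on next.
best_module, best_gain = max(gains, key=lambda item: item[1])

for name, gain in gains:
    print(f"{name}: +{gain:.2f}")
print("work on next:", best_module)
```

With these made-up figures, perfect text detection would lift the system by 17 points while perfect character segmentation would add only 1, so effort spent on segmentation would have little payoff.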
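
Returning to the text detection step of section 18.2, the sliding window, the black and white result image and the expansion step can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions: the image is a list of rows of grey values in [0, 1], the window is 8x8 with a stride of 4, and ''is_bright'' is a hypothetical stand-in for a trained text / no-text classifier. All function names are illustrative.

```python
def sliding_windows(image, win_h, win_w, stride):
    """Yield (row, col, patch) for each window position, moving by `stride`."""
    height, width = len(image), len(image[0])
    for row in range(0, height - win_h + 1, stride):
        for col in range(0, width - win_w + 1, stride):
            patch = [line[col:col + win_w] for line in image[row:row + win_h]]
            yield row, col, patch


def detection_mask(image, classify, win_h=8, win_w=8, stride=4):
    """Paint a mask that is white (1) wherever the classifier fires."""
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for row, col, patch in sliding_windows(image, win_h, win_w, stride):
        if classify(patch):
            for r in range(row, row + win_h):
                for c in range(col, col + win_w):
                    mask[r][c] = 1
    return mask


def expand(mask, radius=1):
    """Make every pixel within `radius` of a white pixel white as well."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def is_bright(patch):
    """Hypothetical classifier: positive when the mean grey value exceeds 0.5."""
    total = sum(sum(line) for line in patch)
    return total / (len(patch) * len(patch[0])) > 0.5


# Toy 16x16 image with a bright 8x8 block standing in for a text region.
image = [[1.0 if 4 <= r < 12 and 4 <= c < 12 else 0.0 for c in range(16)]
         for r in range(16)]

mask = detection_mask(image, is_bright)
expanded = expand(mask, radius=1)
```

A smaller stride visits more window positions and gives a finer mask at a higher cost. Extracting the text rectangles from ''expanded'' would then amount to taking bounding boxes around the connected white regions, which is not shown here.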