XVIII - Application Example: Photo OCR

  • Photo OCR pipeline (an example of a machine learning pipeline):
    1. Text detection
    2. Character segmentation
    3. Character classification
  • The sliding window is moved across the image by a fixed step size (the stride) at each step. A stride of 1 pixel gives the finest coverage, but in practice a larger stride, say 4 pixels, is used to reduce computation.
  • For text detection, we slide a small fixed-size window over the image and paint a black-and-white image from the classifier results (white means the window was classified as text).
  • We then expand the white regions of this image and extract text rectangles from the result.
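
The text detection steps above (sliding window, black-and-white mask, expansion, rectangle extraction) can be sketched in plain Python as follows. This is only an illustrative sketch, not the course's implementation: the function names, the `is_text` callback standing in for a trained classifier, the choice to paint the whole window white, and the square-neighbourhood expansion are all assumptions made for the example.

```python
from collections import deque

def window_origins(height, width, win_h, win_w, stride):
    """Yield the (row, col) top-left corner of every window that fits in the image."""
    for r in range(0, height - win_h + 1, stride):
        for c in range(0, width - win_w + 1, stride):
            yield r, c

def detection_mask(image, win_h, win_w, stride, is_text):
    """Paint a mask: 1 (white) over every window the classifier accepts.

    `is_text` is a hypothetical stand-in for a trained text/no-text classifier.
    """
    height, width = len(image), len(image[0])
    mask = [[0] * width for _ in range(height)]
    for r, c in window_origins(height, width, win_h, win_w, stride):
        patch = [row[c:c + win_w] for row in image[r:r + win_h]]
        if is_text(patch):
            for rr in range(r, r + win_h):
                for cc in range(c, c + win_w):
                    mask[rr][cc] = 1
    return mask

def expand(mask, radius):
    """Turn white every pixel within `radius` rows and columns of a white pixel."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(height, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(width, c + radius + 1)):
                        out[rr][cc] = 1
    return out

def text_rectangles(mask):
    """Return the inclusive (top, left, bottom, right) box of each 4-connected white region."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if mask[r][c] and not seen[r][c]:
                top, left, bottom, right = r, c, r, c
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes

# Toy 4x10 image with two "ink" pixels; the toy classifier accepts any patch containing ink.
image = [[0] * 10 for _ in range(4)]
image[1][1] = image[1][6] = 1
mask = detection_mask(image, 2, 2, 2, lambda patch: sum(map(sum, patch)) > 0)
print(text_rectangles(mask))             # [(0, 0, 1, 1), (0, 6, 1, 7)]
print(text_rectangles(expand(mask, 2)))  # [(0, 0, 3, 9)]
```

In the toy run, the two detections are separate rectangles before expansion and merge into a single text rectangle after it, which is the purpose of the expansion step.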
  • We can enlarge the training dataset by applying distortions to existing examples (warping for text, background noise for audio, etc.).
  • Purely random noise is generally less useful.
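
As a minimal sketch of the distortion idea for text, the snippet below creates slanted copies of a character image by shearing it, keeping the original label. The shear is only one possible warp, chosen here because it is easy to write without libraries; the names `shear` and `augment` and the default slopes are assumptions for illustration.

```python
def shear(glyph, slope):
    """Shift each row sideways in proportion to its distance from the middle row.

    Pixels pushed outside the image are dropped; vacated pixels become background (0).
    """
    height, width = len(glyph), len(glyph[0])
    mid = (height - 1) / 2
    out = []
    for r, row in enumerate(glyph):
        shift = round(slope * (r - mid))
        new_row = [0] * width
        for c, value in enumerate(row):
            if 0 <= c + shift < width:
                new_row[c + shift] = value
        out.append(new_row)
    return out

def augment(examples, slopes=(-0.5, 0.5)):
    """Return each (glyph, label) pair plus one sheared copy per slope, same label."""
    out = []
    for glyph, label in examples:
        out.append((glyph, label))
        for slope in slopes:
            out.append((shear(glyph, slope), label))
    return out

# A 3x5 vertical bar, sheared into a slanted bar.
bar = [[0, 0, 1, 0, 0] for _ in range(3)]
for row in shear(bar, 1.0):
    print(row)
```

With the default of two slopes, each labeled example becomes three, so the dataset grows without any new labeling work.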
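  • Ceiling analysis: for each module in the pipeline in turn, we simulate perfect accuracy (by feeding it the correct output) and measure the accuracy of the system downstream in that case. This shows how much could be gained by improving that module.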
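
The per-module analysis above (often called ceiling analysis) amounts to simple bookkeeping once the accuracies are measured. The sketch below assumes we have the baseline accuracy of the whole system, plus the overall accuracy measured after making each module perfect in pipeline order, cumulatively. The function name and all the numbers are made up for illustration; they are not measurements from any real system.

```python
def ceiling_analysis(baseline, perfect_through):
    """Return (module, gain) pairs.

    `perfect_through` lists (module, overall accuracy) in pipeline order, where each
    accuracy is measured with that module and all earlier ones made perfect.
    """
    gains = []
    previous = baseline
    for module, accuracy in perfect_through:
        gains.append((module, round(accuracy - previous, 4)))
        previous = accuracy
    return gains

# Hypothetical accuracies, invented for this example.
gains = ceiling_analysis(
    baseline=0.70,
    perfect_through=[
        ("text detection", 0.85),
        ("character segmentation", 0.88),
        ("character classification", 1.00),
    ],
)
for module, gain in gains:
    print(f"{module}: +{gain:.2f}")
```

With these invented numbers, perfecting text detection would bring the largest gain (+0.15), so that module would be the most worthwhile place to spend effort, while character segmentation (+0.03) would matter least.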
  • Last modified: 2020/07/10 12:11