Computer Vision for Skew Correction, Text Inversion, Rotation Classification, Homography & Object Search with Applied Math
“Gone are the days of pure mathematical approaches to solve a vision problem, now that AI has made its foray”: this could be one of the most misleading beliefs of a Deep Learning practitioner who is oblivious to traditional computer vision techniques. If you are one of them, here is an attempt to make you think again.
Many of the computer vision algorithms “running on the edge” use traditional math rather than compute- and memory-intensive neural nets. Suppose we have scanned images containing textual content. Let's keep AI aside and address the classical problems in scanned images, viz. skew, rotation and text inversion, deterministically. Feature matching and object search are also tackled using math-magic, to motivate you further.
The source code of all the below case studies can be found here.
Text Inversion
Identifying inverted text in images is a daunting task. Inversion can happen when the document is scanned upside-down. A document can also end up inverted after rotation or skew correction, because a rotation of 90+Θ is detected as 90-Θ, or -90-Θ as -90+Θ. Thus, text inversion is a common problem, yet it is challenging to recognize.
Let's see how to formulate inversion mathematically.
Method 1: Double Peaks
- Project the pixels onto the y-axis. Each line of text results in a peak, in fact 2 peaks, due to the shape of English characters.
- Convolve with a Gaussian filter to smooth the noise.
- Calculate the fraction of peaks (lines) with sub-peaks on the right side.
Below is a figure to illustrate the approach
flip_detect.py - Hosted by GitHub
The above method will not work if the text is in CAPITAL letters or in some other language, as the “double peak logic” would likely falter.
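The three steps of Method 1 can be sketched with NumPy alone. This is a minimal sketch, not the code in flip_detect.py: it assumes a binarized image (ink = 1), treats the two strongest local maxima of each smoothed line band as the main peak and the sub-peak, and reports the fraction of lines whose sub-peak lies at a larger row index than the main peak. The function names, the `sigma` value and the 5% band threshold are illustrative assumptions, and the fraction that separates upright from flipped pages would have to be calibrated on your own scans.

```python
import numpy as np


def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def smoothed_row_profile(binary, sigma=1.0):
    """Project ink pixels onto the y-axis and smooth the profile."""
    profile = np.asarray(binary, dtype=float).sum(axis=1)
    return np.convolve(profile, gaussian_kernel(sigma), mode="same")


def local_maxima(values):
    """Indices that are higher than the left neighbour and not lower than the right."""
    v = np.asarray(values, dtype=float)
    if v.size < 3:
        return np.array([], dtype=int)
    return np.where((v[1:-1] > v[:-2]) & (v[1:-1] >= v[2:]))[0] + 1


def subpeak_after_main_fraction(binary, sigma=1.0, band_threshold=0.05):
    """Fraction of text lines whose sub-peak comes after the main peak."""
    profile = smoothed_row_profile(binary, sigma)
    # Rows above the threshold belong to a text line (assumed heuristic).
    in_band = (profile > band_threshold * profile.max()).astype(int)
    edges = np.diff(np.concatenate(([0], in_band, [0])))
    starts, ends = np.where(edges == 1)[0], np.where(edges == -1)[0]
    votes = []
    for start, end in zip(starts, ends):
        band = profile[start:end]
        peaks = local_maxima(band)
        if peaks.size < 2:
            continue  # no double peak in this line, skip it
        strongest = peaks[np.argsort(band[peaks])[::-1][:2]]
        main, sub = strongest[0], strongest[1]
        votes.append(sub > main)
    return float(np.mean(votes)) if votes else float("nan")
```

Calling `subpeak_after_main_fraction` on a page and on `np.flipud(page)` gives complementary fractions, which is what makes the measure usable as an orientation cue.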
Another numerical way to address the problem is to make use of the font shape, as in the ‘Water Fill Technique’ [1], or to represent the character shape mathematically, as given below. We can describe any shape mathematically using shape contexts and log-bin histograms. Such attempts demonstrate “mathematical ingenuity” as an efficient solution to a variety of problems.
Method 2: Shape Contexts using Log-Bin Histograms
- (a) Find text bounding boxes in the image using EAST (described below).
- (b) Crop the image inside each bounding box and apply Canny edge detection.
- (c) Take a dummy image containing alphanumeric characters as the base input. Find bounding boxes around each character in the base input and in the image from step (b). Do steps (d)-(h) to find the best correspondence between character pairs.
- (d) Randomly sample N points from the edge elements of each character shape.
- (e) Construct a new shape descriptor, the shape context. The shape context at a point captures the distribution over relative positions of the other shape points and thus summarizes the global shape.
- (f) Compare the log-polar histograms using Pearson’s chi-squared test or cosine distance.
- (g) Find the character with minimum distance for each bounding box in the base image. Sum up the cost values of all bounding boxes to get Sigma(Φ).
- (h) Invert the cropped image from step (b) and repeat steps (d)-(g) to compute Sigma(Φ’). Compare the two Sigma values to detect text inversion.
If Sigma(Φ) < Sigma(Φ’), the input image is upright; otherwise it is flipped.
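The descriptor and comparison steps can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the code in text_inversion_sc.py: the points are taken as already sampled from the Canny edges, the bin counts (5 radial, 12 angular) and radii are illustrative choices, and all function names are hypothetical. It builds one log-polar histogram per point, compares histograms with the chi-squared distance, sums the best-match costs, and compares the total against the same total for the 180° rotated shape.

```python
import numpy as np


def shape_contexts(points, n_r=5, n_theta=12, r_inner=0.125, r_outer=2.0):
    """Return one normalized log-polar histogram (flattened) per point."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[None, :, :] - pts[:, None, :]      # diff[i, j] = pts[j] - pts[i]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    others = ~np.eye(n, dtype=bool)               # exclude each point itself
    dist = dist / dist[others].mean()             # normalize for scale
    theta = np.arctan2(diff[..., 1], diff[..., 0]) % (2 * np.pi)
    # n_r - 1 log-spaced edges give n_r radial bins (innermost and outermost open).
    r_edges = np.logspace(np.log10(r_inner), np.log10(r_outer), n_r - 1)
    r_bin = np.digitize(dist, r_edges)
    t_bin = np.floor(theta / (2 * np.pi / n_theta)).astype(int) % n_theta
    flat = r_bin * n_theta + t_bin
    hists = np.zeros((n, n_r * n_theta))
    for i in range(n):
        hists[i] = np.bincount(flat[i][others[i]], minlength=n_r * n_theta)
    return hists / hists.sum(axis=1, keepdims=True)


def chi2_cost_matrix(h1, h2):
    """Pairwise chi-squared distance between two sets of histograms."""
    num = (h1[:, None, :] - h2[None, :, :]) ** 2
    den = h1[:, None, :] + h2[None, :, :]
    ratio = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return 0.5 * ratio.sum(axis=2)


def shape_cost(reference_pts, candidate_pts):
    """Sum over reference points of the cost of their best match."""
    cost = chi2_cost_matrix(shape_contexts(reference_pts),
                            shape_contexts(candidate_pts))
    return float(cost.min(axis=1).sum())


def is_upright(reference_pts, candidate_pts):
    """True if the candidate matches better as-is than rotated by 180 degrees."""
    flipped = -np.asarray(candidate_pts, dtype=float)
    return shape_cost(reference_pts, candidate_pts) < shape_cost(reference_pts, flipped)
```

Because distances are divided by their mean and only relative positions are used, the descriptor is insensitive to translation and scale, but not to rotation, which is exactly what the inversion test relies on.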
text_inversion_sc.py Hosted by GitHub
EAST (An Efficient and Accurate Scene Text Detector)
The textual content inside an image can be localized using the EAST algorithm.
deskew_EAST.py Hosted by GitHub
Here again, we can use a math-hack to localize text in an image instead of the AI-based EAST algorithm. Find consecutive local minima of the y-projection of pixels: each trough corresponds to the separation between two lines of text. Once a line is found, you can run Method 2, starting from step (b).
The above method would work irrespective of font case or language.
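The projection-based line localization can be sketched as below. This is a simplified illustration rather than the author's code: instead of searching for exact local minima, it assumes a binarized page (ink = 1) and treats every row whose ink count falls below a small fraction of the maximum as part of a trough. The function name and the `trough_ratio` parameter are hypothetical.

```python
import numpy as np


def split_text_lines(binary, trough_ratio=0.05):
    """Return (top, bottom) row ranges of text lines; bottom is exclusive."""
    profile = np.asarray(binary, dtype=float).sum(axis=1)   # y-projection
    is_text = (profile > trough_ratio * profile.max()).astype(int)
    # +1 marks the first row of a line, -1 the first row of the next trough.
    edges = np.diff(np.concatenate(([0], is_text, [0])))
    tops = np.where(edges == 1)[0]
    bottoms = np.where(edges == -1)[0]
    return [(int(t), int(b)) for t, b in zip(tops, bottoms)]
```

Each returned range can be cropped with `binary[top:bottom]` and passed on to the edge-detection step.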
Skew Correction
Most scanned documents are skewed, so the image needs to be de-skewed before it is fed to an OCR engine or even displayed.
Method 1: Iterative Projection
- Rotate the image from -10 degrees to +10 degrees.
- Compute projection of all pixels on y-axis.
- Calculate the pixel incidence density.
- Step the rotation angle by 0.5 degrees and repeat steps 2 and 3 (projection and density).
- Find the angle Θ with maximum pixel incidence density.
The drawbacks of the above algorithm are:
- Iterative computation increases time complexity.
- Potential error of up to half the step size (0.25 degrees for a 0.5 degree step).
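The iterative search can be sketched as follows. Two assumptions are worth flagging: instead of rotating the whole image, this sketch rotates only the coordinates of the ink pixels (an equivalent but cheaper substitute), and it interprets "pixel incidence density" as the sum of squared row counts, a common sharpness score for projection profiles. The function names and defaults are hypothetical.

```python
import numpy as np


def projection_score(points, angle_deg):
    """Sum of squared row counts after rotating the ink pixels by angle_deg."""
    pts = np.asarray(points, dtype=float)
    a = np.deg2rad(angle_deg)
    # Only the rotated y-coordinate matters for the projection on the y-axis.
    y_rot = pts[:, 0] * np.sin(a) + pts[:, 1] * np.cos(a)
    bins = np.arange(np.floor(y_rot.min()), np.ceil(y_rot.max()) + 2)
    counts, _ = np.histogram(y_rot, bins=bins)
    return float(np.sum(counts.astype(float) ** 2))


def estimate_deskew_angle(points, limit=10.0, step=0.5):
    """Rotation (degrees) in [-limit, +limit] that gives the sharpest profile."""
    angles = np.arange(-limit, limit + step / 2, step)
    scores = [projection_score(points, a) for a in angles]
    return float(angles[int(np.argmax(scores))])
```

With an (N, 2) array of (x, y) ink coordinates, for example from `np.argwhere(binary)[:, ::-1]`, the returned value is the rotation that straightens the text lines, quantized to the step size.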
Scanned documents are mostly forms or tabular data containing lines, or point spreads of lines (lines can be disjoint in a scanned image due to poor scan or print quality). Hence, the question boils down to: “can we compute the line and Θ, given a point spread as input?”
Method 2: Hough Transform Peak
- Read the skewed image and do Canny Edge detection
- Hough Space = Call Hough_Transform (Edge Detected Image)
- Find the maxima in Hough space transform (accumulator matrix)
- Find Θ of each significant line as the arctangent of its slope.
- Calculate the median of these angles, Θ’
- Rotate the image by Θ’
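A NumPy-only sketch of the voting and peak-picking steps is given below. It is not the code in deskew_hough.py: it assumes the edge pixels have already been extracted (for instance with Canny) and are passed in as (x, y) coordinates, it uses the normal form ρ = x cos ø + y sin ø, and it converts the normal angle ø of each strong line to a line angle as ø - 90. The function names, the 0.5 degree angular resolution and `top_k` are illustrative assumptions, and the sign of the returned angle depends on whether your y-axis points up or down.

```python
import numpy as np


def hough_accumulator(points, theta_deg, rho_res=1.0):
    """Vote every point into a (rho, theta) accumulator matrix."""
    pts = np.asarray(points, dtype=float)
    thetas = np.deg2rad(theta_deg)
    # rho[i, j] = distance of the line through point i with normal angle j.
    rho = pts[:, :1] * np.cos(thetas) + pts[:, 1:2] * np.sin(thetas)
    rho_max = np.abs(rho).max()
    rho_idx = np.round((rho + rho_max) / rho_res).astype(int)
    n_rho = int(np.ceil(2 * rho_max / rho_res)) + 1
    acc = np.zeros((n_rho, len(thetas)), dtype=int)
    cols = np.broadcast_to(np.arange(len(thetas)), rho_idx.shape)
    np.add.at(acc, (rho_idx, cols), 1)            # one vote per point per angle
    return acc


def skew_from_hough(points, theta_step=0.5, top_k=5):
    """Median line angle (degrees) of the top_k strongest accumulator cells."""
    theta_deg = np.arange(0.0, 180.0, theta_step)
    acc = hough_accumulator(points, theta_deg)
    strongest = np.argsort(acc, axis=None)[-top_k:]
    _, cols = np.unravel_index(strongest, acc.shape)
    return float(np.median(theta_deg[cols] - 90.0))
```

Taking the median over several strong lines, rather than the single highest peak, keeps one stray line (such as a page border) from deciding the result.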
Hough Transform
- A straight line can be represented by the equation ρ = X cos ø + Y sin ø, where (X, Y) denotes a point on the line and (ø, ρ) represents the angle and distance of the normal from the origin.
- If we consider (X, Y) as constants, then ρ depends only on ø.
- Thus, we get a sinusoidal curve in the (ρ, ø) plane corresponding to the point (X, Y).
- Corresponding to each pixel, draw the sine curve in the (ρ, ø) plane.
- The sine curves corresponding to the pixels lying on one line intersect at a particular point in the (ρ, ø) plane.
- The point of intersection gives the constants (ρ, ø), with which we can draw the actual line in the (X, Y) plane. Here 0 < ø < 180.
- We can use the same procedure to detect higher-dimensional features.
- For a circle, the centre (a, b) and radius r must be parameterized, which increases the Hough space to 3 dimensions.
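As a small worked example of the intersection property (the three points are chosen here purely for illustration, and ø is written as \phi), take the collinear points (0, 2), (1, 1) and (2, 0), which lie on the line X + Y = 2. The circle equation at the end is the standard parameterization, shown only to make the three parameters explicit.

```latex
\begin{aligned}
(0,2):\quad & \rho_1(\phi) = 2\sin\phi \\
(1,1):\quad & \rho_2(\phi) = \cos\phi + \sin\phi \\
(2,0):\quad & \rho_3(\phi) = 2\cos\phi \\[4pt]
\rho_1 = \rho_3 \;&\Rightarrow\; \tan\phi = 1 \;\Rightarrow\; \phi = 45^\circ,\quad
\rho = 2\sin 45^\circ = \sqrt{2} \\
\rho_2(45^\circ) &= \cos 45^\circ + \sin 45^\circ = \sqrt{2} \\[4pt]
\text{Circle:}\quad & (X - a)^2 + (Y - b)^2 = r^2
\end{aligned}
```

All three sinusoids pass through (ρ, ø) = (√2, 45°), which matches the line X + Y = 2: its normal makes an angle of 45° with the X-axis and its distance from the origin is 2/√2 = √2.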
The sine curve voting array in hough space can be visualized as below.
De-skew Source Code: deskew_hough.py Hosted by GitHub
De-skew Output:
Please note the accuracy with which the skew angle is computed for the above image. In the iterative approach, the error in angle Θ could be as large as [Step Size/2], not to mention the huge iterative computation cost.
Note: If your scanned document doesn’t contain a point spread of lines, run the EAST algorithm described above to find bounding boxes around the text. The angle Θ of the horizontal edge of a bounding box corresponds to the skew.
Rotation Classification
Rotation is a common problem in scanned images. The document can be rotated 90° or more, while being scanned.
It is possible to classify the images into 4 quadrants using a CNN model like the one below. But why load NNs when you can solve it arithmetically? Moreover, even after quadrant rotation, the image can remain skewed.
You can use the above skew correction code to find Θ and rotate. The only drawback is, rotation of 90+Θ could be detected as 90-Θ, and -90-Θ as -90+Θ. Hence, the image can get flipped, once you rotate!
Reason: if the image is rotated by 90-Θ when its true rotation was 90+Θ, the two rotations add up to a half turn and the text ends up upside down:
(90-Θ) + (90+Θ) = 180°
Solution: To solve the above problem, just pass the de-skewed image to the text inversion code and flip it upright, if deemed necessary.
Homography
Let’s say you want to find an object (template) inside a bigger image with multiple objects. We can use Object detection models like SSD or YOLO with annotated Query Images to train different classes of objects to be found. But how do we use simple math to find and locate an object in a bigger image?
We can use homography to find point correspondences and transform the coordinates from one perspective to another. Homography is a transformation ( 3×3 matrix ) that maps the points in one image to the corresponding points in the other image.
These are the steps you can follow.
- Firstly, open the template image and the image to be matched.
- Find all features from both input images.
- Create an ORB keypoint detector which is less compute intensive than SIFT and SURF.
- Find the key points and their descriptors with the orb detector.
- Create matches of descriptors, then sort them based on distances.
- Use cv2.drawMatchesKnn to draw all the k best matches.
- Extract the matched keypoints from both images.
- Find homography matrix and do perspective transform
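The last step can be illustrated without OpenCV. The sketch below assumes the matched keypoints have already been extracted (the ORB and matching steps are not reproduced here) and estimates the 3×3 matrix with the plain Direct Linear Transform, with no outlier rejection. In practice `cv2.findHomography` with RANSAC and `cv2.perspectiveTransform` would normally be used for these two operations; the function names below are hypothetical stand-ins.

```python
import numpy as np


def estimate_homography(src, dst):
    """Direct Linear Transform from >= 4 point pairs (no outlier rejection)."""
    src, dst = np.asarray(src, dtype=float), np.asarray(dst, dtype=float)
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear equations in the 9 entries.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def apply_homography(h, pts):
    """Map 2-D points through h using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homog[:, :2] / homog[:, 2:3]
```

Applying `apply_homography` to the four corners of the template gives the quadrilateral that outlines the object in the larger image.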
Source Code: homography.py Hosted by GitHub
Matching Output:
Object Search
Let’s say, you need to find an object from a set of images. You can use an AI model, as it is a classic case of image classification. But, can we use traditional math to do this? Here’s how…
- Read image of the object to search (Query Image)
- Do Canny edge detection and find bounding box around contour.
- Randomly sample ‘n’ points to describe the shape inside the image.
- Iterate and get all images inside the input folder.
- Do steps 2 & 3 on each image.
- Compute the correlation distance between the random shape points of the ‘Query Image’ and the shape points of each image in the folder.
- Find the image with the minimum correlation distance. This image contains the nearest match of the object you are searching for.
The above equation conceptually formulates correlation as the similarity in deviation around the mean. The numerator captures distribution similarity, and the denominator is the L2-norm used for normalization.
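A minimal sketch of the comparison step is shown below. It assumes the measure is the standard correlation distance, d(u, v) = 1 - (u - ū)·(v - v̄) / (‖u - ū‖₂ ‖v - v̄‖₂), which fits the description of the numerator and denominator given here, and it assumes every shape is described by the same number of sampled points so that the flattened arrays have equal length. The function names and the dictionary of candidates are hypothetical, not taken from searchImgObject.py.

```python
import numpy as np


def correlation_distance(u, v):
    """1 minus the Pearson correlation of two flattened point sets."""
    u = np.asarray(u, dtype=float).ravel()
    v = np.asarray(v, dtype=float).ravel()
    du, dv = u - u.mean(), v - v.mean()           # deviation around the mean
    return 1.0 - float(du @ dv) / float(np.linalg.norm(du) * np.linalg.norm(dv))


def nearest_shape(query_pts, candidates):
    """Return the candidate name with the smallest distance, plus all distances."""
    distances = {name: correlation_distance(query_pts, pts)
                 for name, pts in candidates.items()}
    return min(distances, key=distances.get), distances
```

The distance is 0 for perfectly correlated point sets and grows up to 2 for perfectly anti-correlated ones, so the smallest value marks the nearest match.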
searchImgObject.py Hosted by GitHub
Input Images and Query Object
Please note that a different car (purple) with a similar shape has the second nearest match value, right after the red car. The correlation distances to the other shapes are distinctly larger. Thus you can see that the shape matching works.
Please note that the correlation values will not be 0, even for identical images, because the shapes are described by randomly sampled points. There are other ways to describe shapes without random sampling, but the time complexity of shape matching would be an order higher. One such method, known as the Turning Function, is depicted below.
The source code of all the above case studies can be found here.
Conclusion
From the above case studies, you can see that the deterministic solutions can be more accurate and less compute intensive than iterative or AI-based solutions. While AI can outperform in many complex tasks, it is prudent to try out math-based solutions and keep AI as a last resort. You don’t need a sledgehammer to crack a nut. If the above read motivated you to appreciate the beauty of traditional techniques then it has served the purpose.
If you have any query or suggestion, you can reach me here