Fun With Facial Recognition

Dominick Caponi
6 min read · Feb 5, 2020


A gentle introduction to what it is and how it works

If you’ve ever wanted to put together a neat project to show your friends or coworkers but find all the math behind facial recognition unapproachable, then hopefully this guide will serve as a primer on some basic ideas behind facial detection and recognition and give you a solid jumping-off point for your own projects.

Topics to Cover

  1. Facial Detection — what constitutes a face, and how do we identify that numerically?
  2. Facial Recognition — OK, we found some faces, but who do they belong to?
  3. Demo Setup — diving in with a simple example

Facial Detection

In this demo, we’ll use OpenCV, a free, open-source computer vision library primarily used from C++ or Python. One of the simpler facial detection methods it offers leverages the Haar cascade classifier.

Haar features (source)

In the Haar classifier, each of these templates represents a small matrix of pixel weights, where white is, for instance, +1 and black is −1. The template is placed on top of an image from the camera feed and scanned across it; at each position, the sum of the pixel intensities under the white region minus the sum under the black region tells us whether or not that chunk of the image closely matches the feature. The idea here is that these features approximate a face, like so.

Haar features mapped onto a face (source)
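To make that concrete, here’s a minimal numpy sketch of a two-rectangle feature, bright on top and dark on the bottom. This is just the idea, not OpenCV’s actual implementation (which uses integral images so these region sums are computed in constant time):

import numpy as np

def haar_two_rectangle_feature(patch):
    # Sum of intensities under the white (top) half minus the sum
    # under the black (bottom) half of the template.
    h = patch.shape[0] // 2
    white = patch[:h, :].sum()
    black = patch[h:, :].sum()
    return float(white - black)

# A chunk that is bright on top and dark on the bottom (think forehead
# over eyebrows) responds strongly to this feature.
patch = np.vstack([np.full((12, 24), 200), np.full((12, 24), 40)])
print(haar_two_rectangle_feature(patch))  # large positive response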

Since most of the image is not a face, it helps to discard swaths of the image identified as “not-face” early on. The cascading part of the cascade classifier slides a 24x24-pixel window over the image and applies the feature matrices to each window in stages. As soon as the first few features indicate that a window contains no face, it discards the window and moves on.
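In code, that early-exit logic looks roughly like this conceptual sketch (the stages and feature functions here are hypothetical stand-ins, not OpenCV’s internals):

def window_contains_face(window, stages):
    # Each stage is a (features, threshold) pair. Early stages use a
    # handful of cheap features; later stages use many more.
    for features, threshold in stages:
        score = sum(feature(window) for feature in features)
        if score < threshold:
            return False  # discard the window early: clearly not a face
    return True  # survived every stage, so call it a face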

When the OpenCV Haar classifier was trained, it was given thousands of images of faces and non-faces. After applying the Haar features in a sliding-window fashion, it used Adaptive Boosting, or AdaBoost, to determine which feature combinations best represent a face.

AdaBoost from 10,000 feet

In adaptive boosting, each training image is represented as a row of data. The columns are, simply put, the responses of each chunk of the image to those four features. Each feature is represented as a decision tree with one node (or stump, if you will), where the decision to be made is whether the chunk’s response was greater than or less than a certain cutoff for closeness to the given Haar feature.

As face/not-face is predicted on all the training data, the stumps/features that are correct more often are weighted more heavily in the next round, until each feature has a weight representing how good it is at detecting a face. For example, features 2 and 3 in the features image above may be weighted more heavily than features 1 and 4.
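If you want to play with this idea directly, scikit-learn’s AdaBoostClassifier uses depth-1 decision trees (exactly the stumps described above) as its default base learner. A small sketch, with random placeholder data standing in for Haar feature responses:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# One row per training image, one column per Haar feature response
# (placeholder data); y is 1 for face, 0 for not-face.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 1] + X[:, 2] > 0).astype(int)  # pretend features 2 and 3 matter

model = AdaBoostClassifier(n_estimators=4)  # default base learner is a stump
model.fit(X, y)
print(model.estimator_weights_)  # how heavily each stump is weighted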

When we use the Haar face classifier, we rapidly apply all those features to our image feed; using the weighted decision stumps from training, the classifier decides whether or not our feed contains a face.

Facial Recognition

Now that we’ve identified a face from the feed, we need a way to figure out who it belongs to. Normally, this is done by labeling the face, taking a bunch of pictures of the face, and computing something called the Local Binary Pattern Histogram (LBPH for short).

Example of the LBPH (source)

A sliding 3x3 window is placed over the area our Haar cascade classifier identified as a face. Within each window, every neighboring pixel is converted to white or black (a 1 or a 0) depending on whether it is brighter or darker than the central pixel.

Process for computing the pixel value around a central pixel (source)

Reading those bits off in order gives the central pixel a new value. Repeating this process over the whole face, then computing a histogram of the values in each chunk of the result, yields a concatenation of all those histograms: the final “signature.”

Also, the image ends up looking all weird:

Result of an LBPH pixel calculation
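Here’s a small numpy sketch of that pixel calculation for a single 3x3 chunk (the neighbor ordering is one common convention; implementations vary):

import numpy as np

def lbp_code(patch):
    # Each of the eight neighbors becomes 1 if it is >= the center
    # pixel, else 0; the bits read off in order give a value in 0-255.
    center = patch[1, 1]
    neighbors = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                 patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    bits = [1 if n >= center else 0 for n in neighbors]
    return sum(bit << i for i, bit in enumerate(bits))

patch = np.array([[90, 80, 70],
                  [95, 85, 60],
                  [99, 88, 50]])
print(lbp_code(patch))  # one pixel of the "weird" image above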

With a trained recognition algorithm, we can compare incoming video frames, which get turned into their own histograms, to the histograms we know about from training. The distance between them is measured using the Euclidean distance formula, and the closest match between the input image and the trained images is identified.

Euclidean distance formula: d(p, q) = √((p₁ − q₁)² + (p₂ − q₂)² + … + (pₙ − qₙ)²)
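As a sketch, the matching step boils down to something like this (a hypothetical helper, not the recognizer’s actual internals):

import numpy as np

def closest_match(query_hist, known_hists, names):
    # Euclidean distance between the query histogram and each
    # histogram learned during training; the smallest distance wins.
    distances = [np.linalg.norm(query_hist - h) for h in known_hists]
    best = int(np.argmin(distances))
    return names[best], distances[best]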

Implementing For Yourself

Lastly, this step walks through what you can do to get set up yourself. It assumes you have OpenCV installed on your machine and your environment configured to use Python 3. You’ll need to install jupyter if you haven’t already. The jupyter notebook is available here; you can run it by navigating to the project directory and running jupyter notebook. Once your browser is open at localhost:8888, navigate to the Facial Recognition Demo notebook.

This notebook is broken down into four main phases: a setup phase, where we import the requisite packages and initialize the camera configuration; a face-gathering phase, where we take 30 snapshots of the face for training the LBPH recognizer; training the LBPH recognizer; and finally using it to recognize faces. The first part is self-explanatory, so I won’t spend any time on it.

The second part is where it gets interesting. You’ll see the line face_cascade = cv2.CascadeClassifier('./classifiers/haar_frontal_face.xml'), which is where we set up our cascade classifier to use the Haar facial features. There are other feature files you can use to detect different types of objects with cascade classifiers, but for now let’s focus on faces. The cascade part is just what lets us disregard areas of the image that contain no faces early on and focus on testing more Haar features against areas that are more likely to contain a face. Once we initialize our cascade classifier, we ask for a name to set up the face-id-to-name mapping and make things more user friendly.

My face

The image capture process shows up repeatedly in the code here (I know it’s repetitious; I meant for the cells to be runnable by themselves so I could reuse the code in later projects): we open the camera, convert the feed to grayscale, apply the face detector, do stuff with the faces it detects, then close the camera.

My face in b&w
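The shape of that loop, give or take the notebook’s exact details (the dataset path and user id here are illustrative):

import cv2

face_cascade = cv2.CascadeClassifier('./classifiers/haar_frontal_face.xml')
cam = cv2.VideoCapture(0)  # open the default camera

count = 0
while count < 30:  # 30 snapshots for training
    ok, frame = cam.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        count += 1
        # save just the face region for the LBPH trainer
        cv2.imwrite(f'./datasets/user_1_{count}.jpg', gray[y:y+h, x:x+w])
cam.release()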

The next block trains the recognizer by opening each face image saved in your datasets folder and building the histograms. It then saves the histogram for each face as a row of training data so it can compare incoming feeds later.

My face as a histogram
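In outline, the training block looks something like this (the LBPH recognizer lives in the cv2.face module, which ships with opencv-contrib-python; paths and the label id are illustrative):

import cv2
import numpy as np
from pathlib import Path

recognizer = cv2.face.LBPHFaceRecognizer_create()

images, labels = [], []
for path in Path('./datasets').glob('user_1_*.jpg'):
    images.append(cv2.imread(str(path), cv2.IMREAD_GRAYSCALE))
    labels.append(1)  # numeric id that maps back to a name

recognizer.train(images, np.array(labels))
recognizer.write('trainer.yml')  # persist the histograms for later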

Finally, the recognition block. You may want to restart your notebook; mine kept holding onto the camera reference and forced me to restart. You can also run this block as a separate .py file in the same project directory. This block opens the camera, converts the feed to grayscale, detects the face, builds a histogram of that face, draws a box around the region it thinks is a face, and, if it’s your face, writes your name on it.
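A sketch of that recognition loop, under the same assumptions as above (illustrative paths and id-to-name mapping):

import cv2

face_cascade = cv2.CascadeClassifier('./classifiers/haar_frontal_face.xml')
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read('trainer.yml')   # histograms saved during training
names = {1: 'dominick'}          # id -> name mapping gathered earlier

cam = cv2.VideoCapture(0)
while True:
    ok, frame = cam.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
        label, distance = recognizer.predict(gray[y:y+h, x:x+w])
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, names.get(label, 'unknown'), (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
    cv2.imshow('faces', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cam.release()
cv2.destroyAllWindows()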

Bringing it Together

Hopefully this was a fun and approachable introduction to facial recognition, and it serves as a solid jumping-off point for building your own facial recognition apps or diving into deeper aspects of ML and CV like CNNs and adaptive boosting. Hit me up on Twitter or LinkedIn if you have questions or comments. I don’t claim to be any sort of expert in this, so if you have suggestions, I’m wide open. Thanks for reading!
