OpenCV Face Recognition

In this tutorial, you will learn how to use OpenCV to perform face recognition. To build our face recognition system, we’ll first perform face detection, extract face embeddings from each face using deep learning, train a face recognition model on the embeddings, and then finally recognize faces in both images and video streams with OpenCV.

Having a dataset with faces of various individuals is crucial for building effective face recognition systems. It allows the model to learn diverse facial features, leading to an improved ability to recognize different individuals.

This tutorial will use OpenCV to perform face recognition on a dataset of our faces.

You can, of course, swap in your own dataset of faces! All you need to do is follow my directory structure and insert your own face images.

As a bonus, I’ve also included how to label “unknown” faces that cannot be classified with sufficient confidence.

To learn how to perform OpenCV face recognition, just keep reading!

In today’s tutorial, you will learn how to perform face recognition using the OpenCV library.

You might be wondering how this tutorial differs from the one I wrote a few months back on face recognition with dlib.

Well, keep in mind that the dlib face recognition post relied on two important external libraries:

  1. dlib (obviously)
  2. face_recognition (an easy-to-use set of face recognition utilities that wraps around dlib)

While we used OpenCV to facilitate face recognition, OpenCV itself was not responsible for identifying faces.

In today’s tutorial, we’ll learn how to apply deep learning and OpenCV together (with no libraries other than scikit-learn) to:

  1. Detect faces
  2. Compute 128-d face embeddings to quantify a face
  3. Train a Support Vector Machine (SVM) on top of the embeddings
  4. Recognize faces in images and video streams

All of these tasks will be accomplished with OpenCV, enabling us to obtain a “pure” OpenCV face recognition pipeline.

How OpenCV’s face recognition works

In order to build our OpenCV face recognition pipeline, we’ll be applying deep learning in two key steps:

  1. To apply face detection, which detects the presence and location of a face in an image, but does not identify it
  2. To extract the 128-d feature vectors (called “embeddings”) that quantify each face in an image

I’ve discussed how OpenCV’s face detection works previously, so please refer to it if you have not detected faces before.

The model responsible for actually quantifying each face in an image is from the OpenFace project, a Python and Torch implementation of face recognition with deep learning. This implementation comes from Schroff et al.’s 2015 CVPR publication, FaceNet: A Unified Embedding for Face Recognition and Clustering.

Reviewing the entire FaceNet implementation is outside the scope of this tutorial, but the gist of the pipeline can be seen in Figure 1 above.

First, we input an image or video frame to our face recognition pipeline. Given the input image, we apply face detection to detect the location of a face in the image.

Face alignment, as the name suggests, is the process of (1) identifying the geometric structure of the faces and (2) attempting to obtain a canonical alignment of the face based on translation, rotation, and scale.

While optional, face alignment has been demonstrated to increase face recognition accuracy in some pipelines.

After we’ve (optionally) applied face alignment and cropping, we pass the input face through our deep neural network:

The FaceNet deep learning model computes a 128-d embedding that quantifies the face itself.

But how does the network actually compute the face embedding?

The answer lies in the training process itself, including:

  1. The input data to the network
  2. The triplet loss function

To train a face recognition model with deep learning, each input batch of data includes three images:

The anchor is our current face and has identity A.

The second image is our positive image — this image also contains a face of person A.

The negative image, on the other hand, does not have the same identity, and could belong to person B, C, or even Y!

The point is that the anchor and positive image both belong to the same person/face while the negative image does not contain the same face.

The neural network computes the 128-d embeddings for each face and then tweaks the weights of the network (via the triplet loss function) such that:

  1. The 128-d embeddings of the anchor and positive image lie closer together
  2. While at the same time pushing the embeddings for the negative image farther away

In this manner, the network is able to learn to quantify faces and return highly robust and discriminating embeddings suitable for face recognition.
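For concreteness, the triplet loss just described can be written in a few lines of NumPy. The 0.2 margin below is only an illustrative default:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss over embeddings (Schroff et al., 2015): pull the
    anchor/positive pair together while pushing the negative at least
    a margin `alpha` farther away."""
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return max(0.0, pos_dist - neg_dist + alpha)

# toy 2-d example: the loss hits zero once the negative embedding is
# sufficiently farther from the anchor than the positive is
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same identity, close by
n = np.array([1.0, 0.0])   # different identity, far away
print(triplet_loss(a, p, n))  # 0.0  (0.01 - 1.0 + 0.2 < 0)
```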

And furthermore, we can actually reuse the OpenFace model for our own applications without having to explicitly train it!

Even though the deep learning model we’re using today has (very likely) never seen the faces we’re about to pass through it, the model will still be able to compute embeddings for each face. Ideally, these face embeddings will be sufficiently different that we can train a “standard” machine learning classifier (SVM, SGD classifier, Random Forest, etc.) on top of them, and thereby obtain our OpenCV face recognition pipeline.
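The sketch below mirrors that classifier-on-top-of-embeddings step with scikit-learn. Synthetic vectors stand in for real face embeddings here (two tight clusters, one per "person"), so the snippet is self-contained; in the actual project, the embeddings come from the OpenFace model.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# stand-in for extracted embeddings: two people, six 128-d vectors
# each, clustered around different points in embedding space
rng = np.random.default_rng(42)
emb_a = rng.normal(loc=0.5, scale=0.05, size=(6, 128))
emb_b = rng.normal(loc=-0.5, scale=0.05, size=(6, 128))
embeddings = np.vstack([emb_a, emb_b])
names = ["adrian"] * 6 + ["trisha"] * 6

# encode the string labels as integers and fit an SVM with
# probability estimates enabled
le = LabelEncoder()
labels = le.fit_transform(names)
recognizer = SVC(C=1.0, kernel="linear", probability=True)
recognizer.fit(embeddings, labels)

# recognition = predict on a new embedding and read off the
# highest-probability class
query = rng.normal(loc=0.5, scale=0.05, size=(1, 128))
probs = recognizer.predict_proba(query)[0]
j = int(np.argmax(probs))
print(le.classes_[j], probs[j])
```

An "unknown" label can then be assigned whenever the top probability falls below a chosen threshold, which is exactly the bonus behavior mentioned earlier in this post.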

If you are interested in learning more about the details surrounding triplet loss and how it can be used to train a face embedding model, be sure to refer to my previous blog post as well as the Schroff et al. publication.

Our face recognition dataset

The dataset we are using today contains three people:

Each class contains a total of six images.

If you are building your own face recognition dataset, ideally, I would suggest having 10-20 images per person you wish to recognize — be sure to refer to the “Drawbacks, limitations, and how to obtain higher face recognition accuracy” section of this blog post for more details.

Project structure

Once you’ve grabbed the zip from the “Downloads” section of this post, go ahead and unzip the archive and navigate into the directory.

From there, you may use the tree command to have the directory structure printed in your terminal:

$ tree --dirsfirst
.
├── dataset
│   ├── adrian [6 images]
│   ├── trisha [6 images]
│   └── unknown [6 images]
├── images
│   ├── adrian.jpg
│   ├── patrick_bateman.jpg
│   └── trisha_adrian.jpg
├── face_detection_model
│   ├── deploy.prototxt
│   └── res10_300x300_ssd_iter_140000.caffemodel
├── output
│   ├── embeddings.pickle
│   ├── le.pickle
│   └── recognizer.pickle
├── extract_embeddings.py
├── openface_nn4.small2.v1.t7
├── train_model.py
├── recognize.py
└── recognize_video.py

7 directories, 31 files

There are quite a few moving parts for this project — take the time now to carefully read this section so you become familiar with all the files in today’s project.

Our project has four directories in the root folder: