I recently developed a facial recognition and clustering tool that helps a client measure gender representation in media, including linear TV and online streaming services. The project was a full-stack effort, combining deep learning with scalable cloud architecture.


Phase 1: Building the Core Detection and Clustering Engine

The initial challenge was creating a reliable system to analyse video content. This required a back end capable of handling video processing, face detection, and unsupervised clustering.

Backend Technical Stack

The core back end was written entirely in Python, using the following libraries for the computationally intensive work:

  • Face Detection & Recognition: I used a deep learning convolutional neural network (CNN) model to detect faces in video frames. A face recognition library then converted each detected face into a facial embedding, and these embeddings were clustered using algorithms from scikit-learn (sklearn).
  • Video Processing: OpenCV and pafy were used to manage the extraction of frames from video sources.
  • Performance Optimisation: NumPy and Pandas were used for fast data manipulation, and the pipeline used multiprocessing to spread the heavy computational load across CPU cores.
  • API: The API was built using Flask.
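A minimal sketch of the clustering step follows. The post does not say which scikit-learn algorithm was used, so DBSCAN here is an assumption: it does not need the number of people in advance and labels outlier faces as -1. The synthetic 128-dimensional embeddings, `eps=0.5`, and the helper names are illustrative stand-ins for the recognition library's real output and tuning.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def cluster_face_embeddings(embeddings, eps=0.5, min_samples=3):
    """Cluster face embeddings with DBSCAN (an assumed algorithm choice).

    Returns one integer label per face; -1 marks faces DBSCAN treats as noise.
    """
    X = np.asarray(embeddings, dtype=np.float64)
    return DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean").fit_predict(X)


def group_indices(labels):
    """Map each cluster label to the indices of the faces assigned to it."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(int(label), []).append(idx)
    return groups


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for real embeddings: two "people", five faces each,
    # scattered tightly around a random 128-dimensional centre.
    centres = rng.normal(size=(2, 128))
    faces = np.vstack([c + rng.normal(scale=0.01, size=(5, 128)) for c in centres])
    labels = cluster_face_embeddings(faces)
    print(group_indices(labels))
```

In practice `eps` would need tuning to the distance scale of whichever embedding model produces the vectors.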

This system outputs distinct groups of faces, allowing the user to scroll through these automatically clustered groups and tag each cluster as "Female", "Male", or "Unknown".
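Once the clusters are tagged, each tag has to be carried back to the individual face detections. The sketch below shows one way to do that in plain Python; the function name and the rule that faces in untagged clusters (including DBSCAN-style noise, label -1) count as "Unknown" are assumptions for illustration.

```python
VALID_TAGS = {"Female", "Male", "Unknown"}


def apply_cluster_tags(face_labels, cluster_tags):
    """Return one tag per detected face.

    face_labels: the cluster label of each detected face.
    cluster_tags: the user-chosen tag for each cluster label.
    Faces in untagged clusters, including noise (-1), default to "Unknown".
    """
    for label, tag in cluster_tags.items():
        if tag not in VALID_TAGS:
            raise ValueError(f"invalid tag {tag!r} for cluster {label}")
    return [cluster_tags.get(label, "Unknown") for label in face_labels]


if __name__ == "__main__":
    labels = [0, 0, 1, -1, 1, 2]
    tags = {0: "Female", 1: "Male"}
    print(apply_cluster_tags(labels, tags))
```

Tagging once per cluster rather than once per face is what keeps the manual step manageable: a long video may contain thousands of detections but far fewer distinct people.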

Statistical Output

Once all face groupings are tagged, the tool generates detailed statistics and graphs, summarising the percentage of screen time for men and women and showing exactly when each was on screen throughout the analysed video.
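The screen-time figures can be derived from per-frame tags. This sketch assumes frames are sampled at a fixed interval and defines a tag's screen time as the share of sampled frames in which at least one face with that tag appears, so the two percentages can sum to more than 100 when men and women share the screen. The post mentions Pandas for data manipulation; plain Python is used here to keep the example self-contained, and `screen_time_stats` is a hypothetical name.

```python
def screen_time_stats(frames, sample_interval, tags=("Female", "Male")):
    """Summarise screen time from per-frame tags.

    frames: time-ordered (timestamp_seconds, set_of_tags_on_screen) pairs,
            one per sampled frame.
    sample_interval: seconds between sampled frames.
    Returns (percentages, timelines): the share of sampled frames showing each
    tag, and the (start, end) intervals in seconds when each tag is on screen.
    """
    total = len(frames)
    percentages, timelines = {}, {}
    for tag in tags:
        present = [t for t, on_screen in frames if tag in on_screen]
        percentages[tag] = 100.0 * len(present) / total if total else 0.0
        intervals = []
        for t in present:
            # Samples no more than one step apart are merged into one interval.
            if intervals and t - intervals[-1][1] <= sample_interval + 1e-9:
                intervals[-1][1] = t
            else:
                intervals.append([t, t])
        # Each sample is assumed to stand for one full sampling step of video.
        timelines[tag] = [(start, end + sample_interval) for start, end in intervals]
    return percentages, timelines


if __name__ == "__main__":
    frames = [
        (0.0, {"Female"}),
        (1.0, {"Female", "Male"}),
        (2.0, {"Male"}),
        (3.0, set()),
        (4.0, {"Female"}),
    ]
    print(screen_time_stats(frames, sample_interval=1.0))
```

With the sample frames above, "Female" comes out at 60% of sampled frames with on-screen intervals (0.0, 2.0) and (4.0, 5.0), and "Male" at 40% with (1.0, 3.0).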


Phase 2: Scalability and Online Deployment with AWS

The initial, locally installed version was deployed successfully and attracted real-world attention, including features on The Influential Women Podcast and Times Radio. The client then needed a version that could scale with demand and be accessed online.

I then developed an online version of the tool that lets users detect faces in any public YouTube video, classify gender, and generate representation statistics on demand.

AWS Serverless Architecture

To keep the tool scalable and cost-effective, I deployed it on a serverless architecture built from several AWS services. This architecture scales with the variable load that comes with processing video content on demand.
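The post does not list the AWS services involved. As one hypothetical example, an AWS Lambda function behind an Amazon API Gateway proxy integration could accept a YouTube URL, validate it, and acknowledge the request before the heavier processing runs elsewhere. The handler below sketches that entry point; the request shape, the 202-plus-job-ID response, and the helper names are assumptions, and the hand-off to a processing queue is left as a comment.

```python
import json
import uuid
from urllib.parse import parse_qs, urlparse

# Hosts accepted for standard "watch?v=" links (an assumed, non-exhaustive list).
WATCH_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url):
    """Return the video ID from a youtube.com/watch or youtu.be URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in WATCH_HOSTS and parsed.path == "/watch":
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def _response(status, payload):
    """Build a response in the API Gateway Lambda proxy format."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def lambda_handler(event, context):
    """Hypothetical entry point: validate the URL and acknowledge the job."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "request body must be JSON"})
    url = body.get("url") if isinstance(body, dict) else None
    video_id = extract_video_id(url) if isinstance(url, str) else None
    if video_id is None:
        return _response(400, {"error": "expected a YouTube video URL"})
    job_id = str(uuid.uuid4())
    # A real deployment would enqueue (job_id, video_id) for a worker here;
    # this sketch only returns the identifiers.
    return _response(202, {"job_id": job_id, "video_id": video_id})
```

Returning quickly and processing asynchronously is one common way to fit long-running video analysis into short-lived serverless functions, since the client can poll for results by job ID.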

Frontend User Experience

The front end was developed using ReactJS, giving users a fast, responsive, and intuitive interface for entering video URLs, managing clusters, and viewing the final analytical output.

This project showcases my ability to move a complex, deep-learning-based application from a custom local installation to a highly scalable, production-ready cloud service.


Try the Tool

You can try the online version of the tool and see the results for yourself: