- Face search and clustering at scale
- Otto, Charles
- Electronic Theses & Dissertations
There has been a great deal of progress on the problem of unconstrained face recognition in recent years, particularly with the emergence of deep learning methods for generating face representations. Developing robust face representations is of great practical interest not only for classic biometric problems like face verification, but also for emerging large-scale face search problems. In both social media and law-enforcement applications, an extremely large volume of face data is becoming commonplace. One emerging problem related to this high volume of data is to devise methods to search for persons of interest among the billions of shared photos on these websites. Another challenge is to group or cluster large collections of unlabeled face images by identity, for example as a prelude to manual examination of a collection of images in law-enforcement applications.

Regarding the face search problem, we propose a face search system that combines a fast search procedure with a state-of-the-art commercial off-the-shelf (COTS) matcher in a cascaded framework. Given a probe face, we first filter the large gallery of photos to find the top-k most similar faces using features learned by a convolutional neural network. The k retrieved candidates are then re-ranked by combining similarities based on deep features with those output by the COTS matcher. We evaluate the proposed face search system on a gallery containing 80 million web-downloaded face images. Experimental results demonstrate that while the deep features perform worse than the COTS matcher on a mugshot dataset (93.7% vs. 98.6% TAR@FAR of 0.01%), fusing the deep features with the COTS matcher improves the overall performance (99.5% TAR@FAR of 0.01%). This shows that the learned deep features provide complementary information over representations used in state-of-the-art face matchers.
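The two-stage cascade described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the brute-force cosine filter, the `second_matcher` callback standing in for the COTS matcher, and the weighted-sum fusion rule are all assumptions made for illustration, and both scores are assumed to lie on comparable scales.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cascade_search(probe, gallery, second_matcher, k=3, weight=0.5):
    """Return gallery indices of the top-k candidates, re-ranked by fused score.

    Stage 1 keeps the k gallery entries most similar to the probe under the
    deep-feature cosine similarity. Stage 2 re-ranks only those k candidates
    by a weighted sum of the deep score and the second matcher's score
    (hypothetical fusion rule; the source does not specify one).
    """
    # Stage 1: brute-force filter over the whole gallery.
    deep_scores = [(cosine(probe, feat), idx) for idx, feat in enumerate(gallery)]
    top_k = heapq.nlargest(k, deep_scores)

    # Stage 2: the slower matcher is called only on the k survivors.
    fused = [
        (weight * deep + (1.0 - weight) * second_matcher(idx), idx)
        for deep, idx in top_k
    ]
    fused.sort(reverse=True)
    return [idx for _, idx in fused]
```

The point of the design is that the expensive matcher runs on k candidates rather than on the full gallery, so its cost does not grow with gallery size. A real system at the scale of millions of images would likely replace the brute-force filter with an indexed or approximate search.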
On the unconstrained face image benchmarks, the performance of the learned deep features is competitive with reported accuracies. LFW database: 98.20% accuracy under the standard protocol and 88.03% TAR@FAR of 0.1% under the BLUFR protocol; IJB-A benchmark: 51.0% TAR@FAR of 0.1% (verification), rank 1 retrieval of 82.2% (closed-set search), 61.5% FNIR@FPIR of 1% (open-set search). The proposed face search system offers an excellent trade-off between accuracy and scalability on galleries with millions of images. Additionally, in a face search experiment involving photos of the Tsarnaev brothers, convicted of the Boston Marathon bombing, the proposed cascade face search system could find the younger brother's (Dzhokhar Tsarnaev) photo at rank 1 in 1 second on a 5M gallery and at rank 8 in 7 seconds on an 80M gallery, using an Intel Xeon processor clocked at 3.1 GHz.

In terms of clustering, in social media, law enforcement, and other applications the number of unlabeled faces can be of the order of hundreds of millions, while the number of identities (clusters) can range from a few thousand to millions. To address the challenges of run-time complexity and cluster quality, we present an approximate Rank-Order clustering algorithm that performs better than popular clustering algorithms (k-Means and Spectral). Our experiments include clustering up to 123 million face images into over 10 million clusters. Clustering results are analyzed in terms of external (known face labels) and internal (unknown face labels) cluster quality measures, and run-time. Our clustering algorithm achieves an F-measure of 0.87 on the LFW benchmark (13K faces of 5,749 individuals), which drops to 0.27 on the largest dataset considered (13K faces in LFW + 123M distractor images). Additionally, we show that frames in the YouTube benchmark can be clustered with an F-measure of 0.71.
An internal per-cluster quality measure is developed to rank individual clusters for manual exploration of high quality clusters that are compact and isolated.

Finally, we further develop our face representation, leveraging more advanced network architectures and an order of magnitude more training data, attaining a verification rate of 92.22% at 0.1% FAR on the BLUFR verification protocol, using a single model.
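As an illustration of an external cluster quality measure of the kind reported above, the following sketch computes a pairwise F-measure from predicted cluster assignments and known identity labels. The pairwise definition is an assumption, since the abstract does not state which F-measure variant is used, and the function name is hypothetical.

```python
from itertools import combinations


def pairwise_f_measure(predicted, truth):
    """Pairwise F-measure of a clustering against known identity labels.

    A pair of faces counts as a true positive when it shares both a predicted
    cluster and a true identity, a false positive when it shares only the
    predicted cluster, and a false negative when it shares only the identity.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_cluster = predicted[i] == predicted[j]
        same_identity = truth[i] == truth[j]
        if same_cluster and same_identity:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_identity:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

This version enumerates every pair and is therefore quadratic in the number of faces, which is only practical for small examples. At the scale of millions of images the same counts would have to be derived from cluster and identity size tallies rather than by pair enumeration.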