FROM PIXELS TO IDENTITY: VISUAL RECOGNITION AND BIOMETRIC APPLICATIONS
Automated visual recognition has evolved from handcrafted feature extraction to deep learning-driven systems that now permeate modern security, social, and personal computing platforms. Within this landscape, face and body recognition have emerged as critical tasks, driven by their non-contact nature, scalability, and growing presence in real-world applications. However, robust and generalizable performance in unconstrained settings remains difficult to achieve because of image degradation, pose misalignment, limited training data, and the complexities of multimodal recognition.
This thesis investigates these challenges through the lens of biometric recognition, leveraging deep learning and generative artificial intelligence to address both algorithmic and data-centric limitations. It introduces six major contributions. AdaFace proposes an adaptive margin loss that prioritizes learning from high-quality samples, improving performance in low-quality image conditions. CAFace targets video-based recognition with an attention-based feature aggregation framework optimized for temporal redundancy and long-duration sequences. DCFace pioneers synthetic dataset generation using a dual-condition diffusion model, enabling ethical, diverse, and scalable data creation for face recognition. KPRPE introduces a keypoint-aware positional encoding scheme that enhances robustness to misalignment and geometric variation. SapiensID unifies face and full-body recognition via a multi-resolution transformer trained on the large-scale, multimodal WebBody4M dataset.
Building upon these advances, the thesis concludes with a sixth contribution aimed at real-world deployment: an efficient unified backbone for human recognition. This architecture introduces Keypoint-based Token Fusion (KP-ToFu) and Keypoint Absolute Position Encoding (KP-APE) to reduce computational cost while preserving spatial fidelity and identity-relevant detail. The result is a model that achieves good performance with significantly lower FLOPs, making unified recognition systems viable for resource-constrained applications.
Together, these contributions form a comprehensive exploration of visual recognition in the deep learning era, highlighting how adaptive loss design, synthetic data generation, positional encoding, and architectural innovations can collectively address longstanding challenges. This thesis lays the foundation for the next generation of intelligent biometric systems: systems that are robust and explainable for deployment in complex, real-world environments.
- In Collections: Electronic Theses & Dissertations
- Copyright Status: In Copyright
- Material Type: Theses
- Authors: Kim, Minchul
- Thesis Advisors: Liu, Xiaoming
- Committee Members: Jain, Anil; Ross, Arun; Kong, Yu; Morris, Daniel
- Date Published: 2025
- Subjects: Computer science
- Program of Study: Computer Science - Doctor of Philosophy
- Degree Level: Doctoral
- Language: English
- Pages: 202
- Permalink: https://doi.org/doi:10.25335/794b-t960