Face recognition is a classical problem in computer vision that has seen significant progress in recent years. Yet face recognition in digital videos remains challenging: it is complicated by occlusion, pose and lighting variations, and persons entering and leaving the scene. The goal of the thesis is to develop a fast method for face recognition in digital videos that is applicable to large datasets. Unlike standard video-based methods, which are tested on short videos, the approach is designed for long educational videos lasting from several minutes to hours, with the ultimate goal of testing on over a thousand hours of video.
The thesis introduces several methods to address the problems associated with video face recognition. First, to address pose and lighting variations, a collection of face prototypes is associated with each student. Second, to speed up the process, sampling, K-means clustering, and a combination of both are used to reduce the number of face prototypes associated with each student. Third, to further speed up the method, the videos are processed at different frame rates. Fourth, the thesis proposes the use of active sets to address occlusion and to eliminate the need to apply face recognition on video frames with slow face motions. Fifth, the thesis develops a group face detector that recognizes students within a collaborative learning group while rejecting out-of-group face detections. Sixth, the thesis introduces a face de-identification (DeID) method for protecting the identities of the students. Seventh, the thesis uses data augmentation to increase the size of the training set. The different methods are combined using multi-objective optimization to ensure that the full method remains fast without sacrificing accuracy.
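The prototype-reduction step (sampling followed by K-means clustering) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the 8-dimensional synthetic "face embeddings", the sampling step, the number of clusters, and the nearest-prototype matching rule are all assumptions made for illustration only.

```python
import numpy as np


def sample_prototypes(embeddings, step):
    """Uniform sampling: keep every `step`-th face embedding."""
    return embeddings[::step]


def kmeans_prototypes(embeddings, k, iters=20, seed=0):
    """Reduce a set of embeddings to k cluster centers (plain K-means)."""
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct embeddings (fancy indexing copies).
    centers = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(iters):
        # Pairwise distances between embeddings and centers: shape (n, k).
        dists = np.linalg.norm(
            embeddings[:, None, :] - centers[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = embeddings[labels == j]
            if len(members):  # leave a center unchanged if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def recognize(query, prototypes_by_student):
    """Return the student whose closest prototype is nearest to the query."""
    best_name, best_dist = None, np.inf
    for name, protos in prototypes_by_student.items():
        dist = np.linalg.norm(protos - query, axis=1).min()
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


# Synthetic stand-ins for per-student face embeddings (hypothetical data).
rng = np.random.default_rng(1)
faces_a = rng.normal(loc=0.0, scale=0.1, size=(200, 8))
faces_b = rng.normal(loc=1.0, scale=0.1, size=(200, 8))

# Combine both reductions: sample first, then cluster the sampled set.
prototypes = {
    "student_a": kmeans_prototypes(sample_prototypes(faces_a, 4), k=5),
    "student_b": kmeans_prototypes(sample_prototypes(faces_b, 4), k=5),
}

print(recognize(np.zeros(8), prototypes))
```

In this sketch each student goes from 200 embeddings to 5 prototypes, so every query is compared against 10 vectors instead of 400. A reduction of this kind is what lowers recognition time, at the cost of representing each student's pose and lighting variation more coarsely.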
To test the approach, the thesis develops the AOLME dataset, which consists of 138 student faces (81 boys and 57 girls) of ages 10 to 14; the students were predominantly Latina/o. The video dataset covers 3 Cohorts and 3 Levels from two schools (Urban and Rural) over the course of 3 years. Each Cohort and Level contains multiple sessions and an average of 5 small groups of 4 students per school. Each session has 4 to 9 videos that average 20 minutes each. The thesis trained on different video clips to recognize 32 different students from both schools. The training and validation datasets were drawn from 22 different sessions, whereas the test set contained videos from seven other sessions, so that no session was shared between training, validation, and testing. The video face recognition was tested on 13 video clips extracted from different groups, with durations ranging from 10 seconds to 10 minutes. Compared to the baseline method, the final optimized method was substantially faster and more accurate. Using face prototype sampling only, the proposed method achieved an accuracy of 71.8% compared to 62.3% for the baseline system, while running 11.6 times faster.
Face recognition, activity detection, human front-face detection and recognition, video analysis
Electrical and Computer Engineering
Tran, Phuong. "Fast Video-based Face Recognition in Collaborative Learning Environments." (2021). https://digitalrepository.unm.edu/ece_etds/565