Classifying human activity in raw video remains a challenging, unsolved problem, and one of great interest for large datasets. Although several attempts have applied image processing techniques to recognize human activity in controlled video segments, few have achieved significant success on raw video.
Raw video classification poses significant challenges that can be addressed through the use of geometric information. Current techniques rely either on temporal information in the feature space or on a combination of Convolutional and Recurrent Neural Networks (CNNs and RNNs), in which CNNs perform frame feature extraction and RNNs then perform motion vector extraction and classification. Because these techniques use information from the entire frame, they attempt to classify actions based on all motion vectors and all objects found in the video. Such methods are cumbersome, often difficult to train, and do not generalize well beyond the dataset used.
This thesis explores the use of color-based object detection, together with contextualization of object interactions, to isolate the motion vectors specific to a sought activity within uncropped video. Feature extraction differs significantly from other methods in that it uses geometric relationships between objects to infer context. By significantly reducing the number of features analyzed in a single frame, the approach avoids the need for video cropping or substantial preprocessing. The method was tested on 43 uncropped video clips comprising 620 frames of writing, 1050 frames of typing, and 1755 frames of talking. Using simple k-nearest neighbors (KNN) classification, the method achieved accuracies of 72.6% for writing, 71% for typing, and 84.6% for talking. With a trained Deep Neural Network, classification accuracy improved to 92.5% (writing), 82.5% (typing), and 99.7% (talking).
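To make the general idea concrete, the following is a minimal sketch, not the thesis implementation, of how color-based detection, a geometric relationship between two detected objects, and KNN classification might fit together. Everything specific here is an assumption for illustration: the function names, the RGB thresholds, the toy 5x5 frame, the choice of centroid distance and offsets as features, and the training data and labels are all hypothetical.

```python
from collections import Counter
from math import hypot


def color_mask_centroid(frame, lower, upper):
    """Centroid (row, col) of pixels whose RGB values fall in [lower, upper].

    `frame` is a list of rows, each row a list of (R, G, B) tuples.
    Returns None when no pixel matches the color range.
    """
    row_sum = col_sum = count = 0
    for r, line in enumerate(frame):
        for c, pixel in enumerate(line):
            if all(lo <= v <= hi for v, lo, hi in zip(pixel, lower, upper)):
                row_sum += r
                col_sum += c
                count += 1
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)


def geometric_features(centroid_a, centroid_b):
    """Hypothetical feature vector: distance plus row and column offsets."""
    d_row = centroid_b[0] - centroid_a[0]
    d_col = centroid_b[1] - centroid_a[1]
    return (hypot(d_row, d_col), d_row, d_col)


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points nearest to `query`."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy frame (assumed): one red pixel stands in for a hand,
# one blue pixel stands in for a second object of interest.
BLACK, RED, BLUE = (0, 0, 0), (255, 0, 0), (0, 0, 255)
frame = [[BLACK] * 5 for _ in range(5)]
frame[1][1] = RED
frame[1][4] = BLUE

hand = color_mask_centroid(frame, (200, 0, 0), (255, 50, 50))
other = color_mask_centroid(frame, (0, 0, 200), (50, 50, 255))
features = geometric_features(hand, other)

# Made-up training set: small separations labeled as the activity,
# large separations labeled as not the activity.
train_x = [(1.0, 0.0, 1.0), (1.5, 1.0, 1.0), (2.0, 0.0, 2.0),
           (8.0, 0.0, 8.0), (9.0, 1.0, 9.0), (10.0, 0.0, 10.0)]
train_y = ["writing"] * 3 + ["not_writing"] * 3
label = knn_predict(train_x, train_y, features, k=3)
print(features, label)
```

Restricting the feature vector to a few geometric quantities between detected objects, rather than using every pixel or motion vector in the frame, is what keeps the per-frame feature count small in a sketch like this. A practical system would use an image library for the color thresholding and real labeled frames for training.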
human activity classification; context-based methods
This material is based upon work supported by the National Science Foundation under Grant No. 1613637 and NSF AWD CNS-1422031.
Electrical and Computer Engineering
Jacoby, Abigail R. "Context-Sensitive Human Activity Classification in Video Utilizing Object Recognition and Motion Estimation." (2017). https://digitalrepository.unm.edu/ece_etds/404