Abstract:
Monitoring student engagement in a classroom environment has become an area of active research in computer vision, affective computing and educational technology. Over the last decade, approaches have evolved from specialized hardware-based tools to lightweight, real-time pipelines running on RGB video. In this paper, we review twenty-two studies from 2013 to 2026 that cover different aspects of automated classroom engagement monitoring, including motion and depth-based attention assessment, skeleton-based action recognition, facial expression analysis, multi-object tracking, behaviour detection using YOLO-family architectures, multi-modal fusion of behaviour and emotion cues, dataset building and deployment readiness. Each study is analyzed according to modality, detection method, dataset and limitations. We divide the literature into six streams: early sensor-based systems, pose and skeleton approaches, facial expression approaches, behaviour detection models, multi-modal fusion frameworks, and datasets and benchmarks. A comparative table summarizes the important characteristics of all reviewed works. We identify four persistent research gaps: no India-based student-level emotion data is available, longitudinal per-student tracking has not been carried out, on-premise deployment is limited, and teacher-facing analytics have not been integrated. Addressing these gaps is essential for future research in the field.
Keywords: