Automating the processing of video streams to infer situations of interest has been a long-standing challenge, currently exacerbated by the sheer volume of surveillance and monitoring video being generated. At present, manual or context-specific customized techniques are used for this purpose. To the best of our knowledge, prior work in this area uses custom query languages to extract data and infer simple situations from video streams, adding the overhead of learning those languages. The objective of this thesis is to develop a framework that extracts data from video streams into a data representation over which simple "what-if" kinds of situations can be evaluated by posing queries in a non-procedural manner (SQL or CQL, without any custom query language).
This thesis proposes ways to pre-process videos so as to extract the needed information from each frame. It elaborates on algorithms and experimental results for extracting objects and their features (location, bounding box, and feature vectors), identifying objects across frames, and converting all of that information into an expressive data model. Pre-processing video streams into a queryable representation involves parameters and techniques that are context-dependent, that is, they depend on the type of video stream and the types of objects present in it. Considerable tuning and experimentation are essential to choose the right techniques and the right values for these parameters. Additionally, this thesis proposes starting values for such tuning and experimentation, in order to reach the right values.
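To make the idea of a queryable representation concrete, the sketch below models extracted per-frame detections as a relational table and poses a simple situation query in plain SQL. The schema (a `detection` table with frame number, cross-frame object identity, label, and bounding box) and the sample rows are illustrative assumptions, not the thesis's actual data model.

```python
import sqlite3

# Hypothetical relational view of the extracted video data: one row per
# detected object per frame. The real data model in the thesis may differ.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE detection (
        frame_id  INTEGER,   -- frame number within the stream
        object_id INTEGER,   -- identity maintained across frames
        label     TEXT,      -- object type, e.g. 'car' or 'person'
        x INTEGER, y INTEGER,  -- top-left corner of the bounding box
        w INTEGER, h INTEGER   -- width and height of the bounding box
    )
""")
rows = [
    (1, 101, "car",    10, 20, 50, 30),
    (1, 102, "person", 70, 25, 15, 40),
    (2, 101, "car",    14, 20, 50, 30),
    (2, 102, "person", 72, 26, 15, 40),
]
conn.executemany("INSERT INTO detection VALUES (?,?,?,?,?,?,?)", rows)

# A simple situation posed non-procedurally: which objects appear in
# every frame of the interval [1, 2], i.e. persist across frames?
persistent = conn.execute("""
    SELECT object_id
    FROM detection
    WHERE frame_id BETWEEN 1 AND 2
    GROUP BY object_id
    HAVING COUNT(DISTINCT frame_id) = 2
""").fetchall()
print(persistent)
```

Because the representation is an ordinary table, no custom query language is needed; richer "what-if" situations (spatial overlap, co-occurrence, trajectories) reduce to standard SQL joins and aggregates over the same rows.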