A DATAFLOW APPROACH TO EFFICIENT CHANGE DETECTION OF HTML/XML DOCUMENTS IN WEBVIGIL
The volume of data on the web is constantly increasing. Users are often interested in specific changes to that data. Currently, to detect changes of interest, users have to poll pages periodically and check for the changes they care about. Efficient and effective change detection and notification is critical in many environments where considerable resources are wasted monitoring changes to the web manually. WebVigiL is a change monitoring system that efficiently monitors changes to pages on behalf of the user and notifies the user of those changes in a timely manner. It is a general-purpose, server-based information monitoring and notification system.
This thesis investigates how active capability (Event-Condition-Action, or ECA, rules) has been adapted for change monitoring. WebVigiL supports several types of changes, such as changes to keywords, phrases, links, and images, as well as any change to a page. A change detector that facilitates monitoring of primitive changes (the types above) and composite changes (combinations of those types) to HTML/XML pages has been designed and implemented. Algorithms for detecting composite changes are discussed. Grouping techniques for efficient change detection are also discussed.
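
To make the distinction between primitive and composite changes concrete, the following is a minimal illustrative sketch in Python. It is not WebVigiL's implementation or the algorithms developed in this thesis: the function names (keyword_detector, links_detector, any_of, all_of) are hypothetical, and the regular-expression matching is an assumption chosen only to keep the example short. Each primitive detector compares two versions of a page, and a composite detector combines primitive ones.

```python
import re

def keyword_detector(keyword):
    """Hypothetical primitive detector: fires when the number of
    occurrences of `keyword` differs between the two page versions."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)

    def detect(old_page, new_page):
        return len(pattern.findall(old_page)) != len(pattern.findall(new_page))

    return detect

def links_detector():
    """Hypothetical primitive detector: fires when the set of href
    targets differs between the two page versions."""
    href = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)

    def detect(old_page, new_page):
        return set(href.findall(old_page)) != set(href.findall(new_page))

    return detect

def any_of(*detectors):
    """Composite change: fires if at least one constituent detector fires."""
    return lambda old_page, new_page: any(d(old_page, new_page) for d in detectors)

def all_of(*detectors):
    """Composite change: fires only if every constituent detector fires."""
    return lambda old_page, new_page: all(d(old_page, new_page) for d in detectors)

# Two versions of the same (made-up) page: the text changed, the link did not.
old_page = '<p>Sale ends Friday.</p><a href="/a">A</a>'
new_page = '<p>Sale ends Monday. Sale!</p><a href="/a">A</a>'

sale_changed = keyword_detector("sale")
links_changed = links_detector()

either = any_of(sale_changed, links_changed)(old_page, new_page)
both = all_of(sale_changed, links_changed)(old_page, new_page)
print(either, both)
```

In this sketch the keyword detector fires because "sale" occurs once in the old version and twice in the new one, while the link detector does not fire, so the "any of" composite reports a change and the "all of" composite does not.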