IMPROVEMENTS TO CHANGE-DETECTION AND FETCHING TO HANDLE MULTIPLE URLs IN WEBVIGIL
The rapid growth of data on the Web makes it difficult to keep track of the changes that constantly occur to specific information of interest. Users' interest has extended from merely retrieving information to tracking how that information changes. Currently, the most widespread way of detecting changes to web content is to manually retrieve the pages of interest and check them for changes. This approach not only wastes resources, but is also likely to present information that may not be relevant to the given context. Hence, an effective, profile-based, selective, content-oriented approach to change detection is required. WebVigiL is a general-purpose, active capability-based, selective, intelligent information monitoring and notification system. It handles the specification, management, and propagation of customized changes as requested by a user in the best way possible.
When the user specifies a page (as a URL), that page may contain multiple frames instead of a single content page, so additional pages must be fetched to detect changes. This thesis addresses change detection for pages containing frames and generalizes it in terms of depth. It also addresses monitoring multiple web pages for different change types with a single request. In addition, when a user leaves the task of determining the change frequency to the system, the system is responsible for minimizing the number of fetches. Balancing the number of fetches against the number of missed changes is another problem addressed in this thesis, which presents an intelligent adaptive-fetching mechanism that fetches web pages based on their change history. Finally, efficient use of resources becomes even more important when a request monitors multiple URLs (using sentinel operators). This thesis therefore investigates the efficient grouping of change detection requests with similar characteristics.
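To make the idea of history-based adaptive fetching concrete, the following is a minimal sketch and not the mechanism developed in this thesis. It assumes a simple rule: shorten the fetch interval when the last fetch detected a change, and lengthen it when it did not. The class name `AdaptiveFetcher`, the halving and 1.5x factors, the interval bounds, and the unit (minutes) are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveFetcher:
    """Hypothetical fetch scheduler; NOT WebVigiL's actual algorithm."""

    interval: float = 60.0        # current wait between fetches (assumed unit: minutes)
    min_interval: float = 5.0     # assumed lower bound on the interval
    max_interval: float = 1440.0  # assumed upper bound (one day)
    history: list = field(default_factory=list)  # True where a fetch saw a change

    def record(self, changed: bool) -> float:
        """Record the outcome of one fetch and return the next interval."""
        self.history.append(changed)
        if changed:
            # The page changed: fetch sooner so fewer changes are missed.
            self.interval = max(self.min_interval, self.interval / 2)
        else:
            # No change: back off to avoid wasted fetches.
            self.interval = min(self.max_interval, self.interval * 1.5)
        return self.interval


fetcher = AdaptiveFetcher()
for outcome in (True, False, False):
    print(outcome, fetcher.record(outcome))
```

The two bounds express the trade-off described above: the lower bound caps how many fetches a frequently changing page can consume, and the upper bound limits how long a change to a rarely changing page can go unnoticed.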