Approaches to Improve the Performance of Storage and Processing-Subsystems in WebVigiL
The World Wide Web has gained a lot of prominence as a medium for information retrieval and data delivery. Users are increasingly interested in monitoring selective changes to content rather than merely surfing the web. Selective content monitoring requires an effective system for change detection and notification based on user requirements. WebVigiL is a profile-based system that monitors, retrieves, and detects specific changes to HTML and XML pages on the web and notifies users in a timely manner.
The first prototype concentrated on the functionality of the WebVigiL system. This thesis investigates improvements to the performance and reliability of that prototype in a number of ways. First, a novel object reuse strategy (similar to a buffer manager) is proposed to reduce the number of times an object is retrieved from disk and converted into an in-memory object for change detection. Second, a diff-based version manager has been implemented to reduce the growth of disk storage, thereby improving the scalability of the system. Third, because the WebVigiL system is prone to system failures, recovery has been added so that the WebVigiL server can recover gracefully after a failure and continue to monitor sentinels. No monitoring is possible during downtime, but previously defined sentinels continue to be monitored after recovery as if the system had never gone down. Finally, intelligent fetching of monitored pages is currently done at the WebVigiL mediator. As part of this thesis, a server-side fetch (performed at the web site where the page resides) has been introduced, and its effectiveness has been analyzed with respect to the amount of data transferred and the number of fetches, compared with the mediator-side fetch.
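To make the object reuse idea concrete, the following is a minimal sketch of a buffer-manager-style cache that keeps recently used in-memory objects so they need not be re-read from disk and re-parsed for each change detection. It is not the WebVigiL implementation: the class name ObjectCache, the loader callback, the least-recently-used eviction policy, and the loads counter are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

T = TypeVar("T")


class ObjectCache(Generic[T]):
    """Hypothetical buffer-manager-style cache of in-memory page objects.

    Assumption: eviction is least-recently-used; the policy actually
    used in WebVigiL is not stated in the text above.
    """

    def __init__(self, capacity: int, loader: Callable[[Hashable], T]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        # loader stands in for "read from disk and build the in-memory object".
        self._loader = loader
        self._objects: "OrderedDict[Hashable, T]" = OrderedDict()
        self.loads = 0  # number of times the loader was actually invoked

    def get(self, key: Hashable) -> T:
        if key in self._objects:
            # Cache hit: mark as most recently used and reuse the object.
            self._objects.move_to_end(key)
            return self._objects[key]
        # Cache miss: pay the disk read and conversion cost once.
        obj = self._loader(key)
        self.loads += 1
        self._objects[key] = obj
        if len(self._objects) > self._capacity:
            # Evict the least recently used object.
            self._objects.popitem(last=False)
        return obj
```

With a capacity of two, requesting pages a, b, and a again invokes the loader only twice; the third request reuses the object already held in memory.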