RUNTIME OPTIMIZATION AND LOAD SHEDDING IN MAVSTREAM: DESIGN AND IMPLEMENTATION
In data stream processing systems, Quality of Service (QoS) is extremely important, and the system should do its best to meet the QoS requirements specified by a user. Because of this requirement, and unlike in a database management system, a query cannot be optimized once and then executed unchanged. It has been shown that different scheduling strategies are useful in trading off tuple latency requirements against memory and throughput requirements. In addition, data stream processing systems may experience significant fluctuations in input rates.
In order to meet the QoS requirements of data stream processing, a runtime optimizer equipped with several scheduling and load shedding strategies is critical. This entails monitoring QoS measures at run time and adjusting the processing of the queries so that the expected QoS requirements are met.
This thesis addresses runtime optimization issues for MavStream, a data stream management system (DSMS) being developed at UT Arlington. We have developed a runtime optimizer for matching the output (latency, memory, and throughput) of a continuous query (CQ) with its QoS requirements. Calibrations are made by monitoring the output and comparing it with the expected output characteristics. Alternative scheduling strategies are chosen as needed based on the runtime feedback. A decision table is used to choose a scheduling strategy based on the priorities of QoS requirements and their violation. The decision table approach allows us to add new scheduling strategies as well as compute the strategy to be used in an extensible manner. A master scheduler has been implemented to enable changing scheduling strategies in the middle of continuous query processing and optimize each query individually (that is, different queries can be executed using different schedulers).
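The decision table idea can be illustrated with a minimal sketch. It assumes a simplified table keyed only by the highest-priority violated QoS measure. The strategy names (`strategy_for_latency` and so on), the function names, and the violation test are hypothetical placeholders chosen for illustration; they are not taken from MavStream.

```python
from typing import Dict, Sequence

# Hypothetical decision table: violated QoS measure -> scheduling strategy.
# Supporting a new strategy means changing table entries only; the lookup
# logic below stays the same.
DECISION_TABLE: Dict[str, str] = {
    "latency": "strategy_for_latency",
    "memory": "strategy_for_memory",
    "throughput": "strategy_for_throughput",
}


def is_violated(measure: str, observed: float, required: float) -> bool:
    """Assumed convention: latency and memory are upper bounds,
    throughput is a lower bound."""
    if measure == "throughput":
        return observed < required
    return observed > required


def choose_strategy(
    priorities: Sequence[str],
    observed: Dict[str, float],
    required: Dict[str, float],
    current: str,
) -> str:
    """Return the strategy for the highest-priority violated measure.

    `priorities` lists QoS measures from most to least important.
    If no measure is violated, the current strategy is kept.
    """
    for measure in priorities:
        if is_violated(measure, observed[measure], required[measure]):
            return DECISION_TABLE[measure]
    return current
```

In this sketch a master scheduler would call `choose_strategy` once per monitoring interval for each query and switch schedulers only when the returned name differs from the current one.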
In addition, to cope with situations where the arrival rates of input streams exceed the processing capacity of the system, we have incorporated a load shedding component into the runtime optimizer. We have implemented shedders as part of the buffers to minimize the overhead of load shedding. We also choose load shedders that minimize the error introduced into the result by dropping tuples. Finally, load shedders are activated and deactivated by the runtime optimizer. Both random and semantic shedding of tuples are supported.
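A shedder embedded in a buffer can be sketched as follows. This is an illustrative sketch under stated assumptions, not the MavStream implementation: the class name `SheddingBuffer`, the `active` flag, the `drop_probability` parameter, and the `is_low_value` predicate are all hypothetical. Random shedding is modeled as dropping each incoming tuple with a fixed probability, and semantic shedding as dropping tuples that a user-supplied predicate marks as less important.

```python
import random
from collections import deque
from typing import Any, Callable, Deque, Optional


class SheddingBuffer:
    """FIFO buffer with a built-in shedder (hypothetical design).

    Shedding happens at enqueue time, so a dropped tuple never
    occupies buffer space and needs no separate shedding operator.
    """

    def __init__(
        self,
        drop_probability: float = 0.0,
        is_low_value: Optional[Callable[[Any], bool]] = None,
        seed: Optional[int] = None,
    ) -> None:
        self._queue: Deque[Any] = deque()
        self.active = False  # toggled by the runtime optimizer
        self.drop_probability = drop_probability
        self.is_low_value = is_low_value
        self._rng = random.Random(seed)
        self.dropped = 0

    def _should_drop(self, tup: Any) -> bool:
        if self.is_low_value is not None:
            # Semantic shedding: drop tuples the predicate marks as low value.
            return self.is_low_value(tup)
        # Random shedding: drop with a fixed probability.
        return self._rng.random() < self.drop_probability

    def enqueue(self, tup: Any) -> bool:
        """Add a tuple; return False if the shedder dropped it."""
        if self.active and self._should_drop(tup):
            self.dropped += 1
            return False
        self._queue.append(tup)
        return True

    def dequeue(self) -> Optional[Any]:
        """Remove and return the oldest tuple, or None if empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self) -> int:
        return len(self._queue)
```

In this sketch the runtime optimizer would set `active = True` when the monitored QoS measures indicate overload and reset it to `False` once they recover, so that no tuples are dropped under normal load.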
A large number of experiments have been conducted to test the runtime optimizer and observe the effect of different scheduling strategies and load shedding on the output of continuous queries.