INFOFILTER: COMPLEX PATTERN SPECIFICATION AND DETECTION OVER TEXT STREAMS
Information filtering deals with monitoring text streams to detect patterns that are more complex than those handled by search engines. Text stream monitoring and pattern detection have far-reaching applications, such as tracking information flow among terrorist outfits, web parental control, and continuous monitoring of rival web sites in e-commerce. InfoFilter, a content-based information filtering system presented in this thesis, detects complex patterns in text streams that include, but are not limited to, news feeds, email, web pages, and caption text from streaming videos. The pattern characterization requirements of many applications call for a more expressive pattern specification language than what current Information Retrieval Query Languages (IRQLs) provide. In essence, pattern specification and detection play a major role in information filtering. InfoFilter allows users to specify complex patterns, such as sequential or structural patterns, wild cards, word frequencies, proximity, Boolean operators, and synonyms, using the proposed Pattern Specification Language, Psnoop, and detects these patterns efficiently using the data flow paradigm over Pattern Detection Graphs (PDGs).
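
To make the data flow idea concrete, the following is a minimal illustrative sketch in Python of how a detection graph over a token stream might propagate matches from word-level leaf nodes up to operator nodes. It is not the InfoFilter implementation and does not use Psnoop syntax, neither of which is given here. The class names (Node, Word, Near, Seq), the run helper, and the choice to measure proximity in token positions are all assumptions made for illustration.

```python
class Node:
    """Hypothetical graph node: records its detections and forwards them to parents."""

    def __init__(self):
        self.parents = []  # list of (parent_node, input_slot) pairs
        self.hits = []     # token positions at which this node fired

    def emit(self, pos):
        self.hits.append(pos)
        for parent, slot in self.parents:
            parent.receive(slot, pos)


class Word(Node):
    """Leaf node: fires when a token equals the word or one of its synonyms."""

    def __init__(self, *forms):
        super().__init__()
        self.forms = {f.lower() for f in forms}

    def feed(self, token, pos):
        if token.lower() in self.forms:
            self.emit(pos)


class Binary(Node):
    """Operator node with two inputs; registers itself as parent of both children."""

    def __init__(self, left, right):
        super().__init__()
        left.parents.append((self, 0))
        right.parents.append((self, 1))


class Near(Binary):
    """Proximity: fires when both inputs occur within max_gap tokens, in either order."""

    def __init__(self, left, right, max_gap):
        super().__init__(left, right)
        self.max_gap = max_gap
        self.last = [None, None]  # most recent position seen on each input

    def receive(self, slot, pos):
        self.last[slot] = pos
        other = self.last[1 - slot]
        if other is not None and abs(pos - other) <= self.max_gap:
            self.emit(pos)


class Seq(Binary):
    """Sequence: fires when the right input occurs after the first left occurrence."""

    def __init__(self, left, right):
        super().__init__(left, right)
        self.first_left = None

    def receive(self, slot, pos):
        if slot == 0:
            if self.first_left is None:
                self.first_left = pos
        elif self.first_left is not None and pos > self.first_left:
            self.emit(pos)


def run(leaves, text):
    """Push whitespace-separated tokens through the leaf nodes one at a time."""
    for pos, token in enumerate(text.split()):
        for leaf in leaves:
            leaf.feed(token, pos)


# Pattern: ("bomb" or its synonym "explosive") NEAR "airport", followed by "squad".
bomb = Word("bomb", "explosive")
airport = Word("airport")
squad = Word("squad")
near = Near(bomb, airport, max_gap=5)
pattern = Seq(near, squad)

run([bomb, airport, squad],
    "the bomb was found near the airport and the bomb squad arrived")
print(near.hits)     # [6, 9]
print(pattern.hits)  # [10]
```

Each token is examined once at the leaves, and only matches flow upward, so a sub-pattern shared by several patterns would need to be evaluated only once. That sharing is the usual motivation for graph-based detection; whether PDGs exploit it in exactly this way is not stated here.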