Complexity Bounds for Relational Algebra over Text Extractors

29Jan
12:00 29th January 2019 ( Hilary Term 2019 )
051

Information Extraction (IE) is the task of extracting structured information from textual data. We explore a programming paradigm that is supported by several IE systems where relations extracted by atomic extractors undergo a relational manipulation. In our efforts toward achieving a better understanding of IE systems, we study the computational complexity of queries in the framework of document spanners (spanners, for short) that was introduced by Fagin et al. A spanner is a function that extracts from a document (string) a relation over text intervals, called spans, using either atomic extractors or a relational algebra query on top of these extractors. Evaluating a spanner on a document is a computational problem that involves three components: the atomic extractors, the relational structure of the query, and the document. We investigate the complexity of this problem from various angles, each keeping some components fixed and regarding the rest as input.

Speaker bio

Liat Peterfreund is a PhD candidate in the Computer Science Department in the Technion. Her research is done under the supervision of Prof. Benny Kimelfeld and focuses on establishing the foundations of incorporating Information Extraction in Database Queries.