Optimally Summarizing Data by Small Fact Sets for Concise Answers to Voice Queries
Our goal is to find combinations of facts that optimally summarize data sets. We consider this problem in the context of voice query interfaces for simple, exploratory data analysis. Here, the system answers voice queries with a short summary of relevant data. Finding optimal voice data summaries is computationally expensive. Prior work in this domain has exploited sampling and incremental processing. Instead, we rely on a pre-processing stage that generates summaries of data subsets in a batch operation. This step reduces run-time overheads by orders of magnitude. We present multiple algorithms for the pre-processing stage, realizing different trade-offs between optimality and data processing overheads. We analyze our algorithms formally and compare them experimentally with prior methods for generating voice data summaries. We report on multiple user studies with a prototype system implementing our approach. Furthermore, we report on insights gained from a public deployment of our system on the Google Assistant Platform.