ABSTRACT
Tables are pervasive on the Web. Informative web tables span a wide variety of topics and can naturally serve as a significant resource for satisfying user information needs. Driven by these observations, in this paper we investigate an important yet largely under-addressed problem: given millions of tables, how can we precisely retrieve table cells to answer a user question? This work proposes a novel table cell search framework to attack this problem. We first formulate the concept of a relational chain, which connects two cells in a table and represents the semantic relation between them. With the help of search engine snippets, our framework generates a set of relational chains pointing to potentially correct answer cells. We then employ deep neural networks to conduct more fine-grained inference on which relational chains best match the input question, and finally extract the corresponding answer cells. Based on millions of tables crawled from the Web, we evaluate our framework in the open-domain question answering (QA) setting, using both the well-known WebQuestions dataset and user queries mined from Bing search engine logs. On WebQuestions, our framework is comparable to state-of-the-art QA systems based on knowledge bases (KBs), while on Bing queries it outperforms other systems with a 56.7% relative gain. Moreover, when combined with results from our framework, KB-based QA performance obtains a relative improvement of 28.1% to 66.7%, demonstrating that web tables supply rich knowledge that might not exist, or is difficult to identify, in existing KBs.
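To make the idea of a relational chain concrete, the following is a minimal sketch under stated assumptions, not the framework described above. The abstract only says that a chain connects two cells in a table and represents the relation between them. Here a chain is assumed to link a topic cell to another cell in the same row, labeled by the two column headers. The names `RelationalChain`, `candidate_chains`, `overlap_score`, and `answer` are hypothetical, and the token-overlap scorer is a toy stand-in for the deep neural matching step; the use of search engine snippets is not modeled at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationalChain:
    """Hypothetical chain: a topic cell linked to a candidate answer cell in the same row."""
    row: int
    topic_col: str   # header of the column holding the topic cell
    answer_col: str  # header of the column holding the candidate answer cell
    answer: str      # text of the candidate answer cell


def candidate_chains(header, rows, topic):
    """Collect one chain per (topic cell, other cell in the same row) pair."""
    chains = []
    for r, row in enumerate(rows):
        for t, cell in enumerate(row):
            if cell.lower() != topic.lower():
                continue
            for a, answer_cell in enumerate(row):
                if a != t:
                    chains.append(
                        RelationalChain(r, header[t], header[a], answer_cell)
                    )
    return chains


def overlap_score(question, chain):
    """Toy stand-in for the neural matcher (assumption): count the question
    tokens that also appear in the answer column's header."""
    q_tokens = set(question.lower().split())
    h_tokens = set(chain.answer_col.lower().split())
    return len(q_tokens & h_tokens)


def answer(question, header, rows, topic):
    """Return the answer cell of the best-scoring chain, or None if no chain exists."""
    chains = candidate_chains(header, rows, topic)
    if not chains:
        return None
    best = max(chains, key=lambda c: overlap_score(question, c))
    return best.answer


header = ["Country", "Capital", "Currency"]
rows = [["France", "Paris", "Euro"], ["Japan", "Tokyo", "Yen"]]
print(answer("what is the capital of japan", header, rows, "Japan"))
```

In this toy table, the chain from the "Japan" cell to the "Capital" column scores highest because the question shares the token "capital" with that header. A learned matcher would replace `overlap_score` so that paraphrases with no literal overlap can still be matched.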