Abstract
Helpdesk databases store past interactions between customers and companies in order to improve customer service quality. A common use of a helpdesk database is to determine whether a recommendation exists for a new problem reported by a customer. However, customers often provide incomplete or even inaccurate information, and manually preparing a list of clarification questions does not scale to large databases. This paper investigates the problem of automatically generating a minimal number of questions needed to reach an appropriate recommendation, and proposes a novel dynamic active probing method. Compared to alternatives such as decision trees and case-based reasoning, this method has two distinctive features. First, it actively probes the customer for information that helps reach the recommendation, and each answer the customer provides is used immediately to dynamically generate the next question. This feature ensures that all available information from the customer is used. Second, the method is based on a probabilistic model and uses a data augmentation method that avoids overfitting when estimating the probabilities in the model. This feature makes the method robust to databases that are incomplete or contain errors. Experimental results verify the effectiveness of our approach.
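To make the idea of dynamic probing concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a toy case table (`CASES`), picks each question greedily by expected entropy of the recommendation, and uses additive (Laplace) smoothing as a simple stand-in for the data augmentation step, whose details the abstract does not give. All names, attributes, and values are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy helpdesk cases: (problem attributes, recommendation).
CASES = [
    ({"os": "windows", "symptom": "no_boot", "led": "off"}, "replace_psu"),
    ({"os": "windows", "symptom": "no_boot", "led": "on"}, "repair_bootloader"),
    ({"os": "linux", "symptom": "no_boot", "led": "on"}, "repair_bootloader"),
    ({"os": "linux", "symptom": "slow", "led": "on"}, "add_memory"),
    ({"os": "windows", "symptom": "slow", "led": "on"}, "add_memory"),
]


def entropy(counts):
    """Shannon entropy (bits) of a Counter of recommendation counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c > 0)


def matching(cases, answers):
    """Cases consistent with every answer collected so far."""
    return [(a, r) for a, r in cases
            if all(a.get(k) == v for k, v in answers.items())]


def posterior(cases, answers, alpha=1.0):
    """Smoothed distribution over recommendations (assumption: Laplace
    smoothing replaces the paper's data augmentation)."""
    labels = sorted({r for _, r in cases})
    counts = Counter(r for _, r in matching(cases, answers))
    total = sum(counts.values()) + alpha * len(labels)
    return {r: (counts[r] + alpha) / total for r in labels}


def next_question(cases, answers):
    """Unasked attribute with the lowest expected entropy of the
    recommendation, or None if no attribute reduces uncertainty."""
    pool = matching(cases, answers)
    best, best_h = None, entropy(Counter(r for _, r in pool))
    for attr in sorted({k for a, _ in pool for k in a} - set(answers)):
        groups = {}
        for a, r in pool:
            groups.setdefault(a.get(attr), Counter())[r] += 1
        h = sum(sum(g.values()) / len(pool) * entropy(g) for g in groups.values())
        if h < best_h - 1e-12:
            best, best_h = attr, h
    return best


def probe(cases, ask):
    """Ask questions one at a time; each answer immediately shapes the next."""
    answers = {}
    while (attr := next_question(cases, answers)) is not None:
        answers[attr] = ask(attr)
    post = posterior(cases, answers)
    return max(post, key=post.get), answers


# Simulated customer who answers from a fixed profile.
customer = {"os": "windows", "symptom": "no_boot", "led": "off"}
recommendation, asked = probe(CASES, customer.get)
print(recommendation, list(asked))
```

In this toy run the `os` attribute is never asked, because the answers to `symptom` and `led` already identify a single recommendation. A decision tree built offline would fix the question order in advance, whereas here the order is recomputed after every answer.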