Abstract
By incorporating human workers into the query execution process, crowd-enabled databases facilitate intelligent, social capabilities such as completing missing data at query time or performing cognitive operators. Despite all their flexibility, however, crowd-enabled databases still maintain rigid schemas. In this paper, we extend crowd-enabled databases with flexible, query-driven schema expansion, which allows new attributes to be added to the database at query time. The number of crowd-sourced mini-tasks needed to fill in the missing values may often be prohibitively large, though, and the quality of the resulting data is doubtful. Instead of simply crowd-sourcing every value individually, we leverage the user-generated data found in the Social Web: by exploiting user ratings, we build perceptual spaces, i.e., highly compressed representations of the opinions, impressions, and perceptions of large numbers of users. Using a few training samples obtained by expert crowd-sourcing, we can then extract all missing data automatically from the perceptual space with high quality and at low cost. Extensive experiments show that our approach can boost both the performance and the quality of crowd-enabled databases, while also providing the flexibility to expand schemas in a query-driven fashion.
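The pipeline outlined in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes a dense, synthetic item-by-user rating matrix, uses a truncated SVD as a stand-in for the perceptual space, and uses ridge regression as a stand-in for the learner trained on the crowd-sourced samples. All function names, sizes, and the noise level are hypothetical.

```python
import numpy as np

def build_perceptual_space(ratings, dims):
    """Embed items (rows) in a low-dimensional space via truncated SVD.
    Assumption: a dense matrix; real rating data is sparse."""
    centered = ratings - ratings.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :dims] * s[:dims]

def fit_attribute(space, labeled_idx, labels, ridge=1e-3):
    """Ridge regression from item coordinates (plus intercept) to the
    attribute values obtained for a few labeled items."""
    x = np.hstack([space[labeled_idx], np.ones((len(labeled_idx), 1))])
    a = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(a, x.T @ labels)

def predict_attribute(space, weights):
    """Predict the new attribute for every item in the space."""
    x = np.hstack([space, np.ones((space.shape[0], 1))])
    return x @ weights

# Synthetic data (hypothetical): ratings driven by hidden item factors.
rng = np.random.default_rng(0)
n_items, n_users, dims = 200, 500, 5
item_factors = rng.normal(size=(n_items, dims))
user_factors = rng.normal(size=(n_users, dims))
ratings = item_factors @ user_factors.T + 0.1 * rng.normal(size=(n_items, n_users))

# The attribute to add at query time; only 20 items receive labels.
true_attribute = item_factors[:, 0]
labeled_idx = rng.choice(n_items, size=20, replace=False)

space = build_perceptual_space(ratings, dims)
weights = fit_attribute(space, labeled_idx, true_attribute[labeled_idx])
predicted = predict_attribute(space, weights)
correlation = np.corrcoef(predicted, true_attribute)[0, 1]
print(f"correlation with hidden attribute: {correlation:.3f}")
```

The point of the sketch is the cost structure: labels are requested for only 20 of the 200 items, and the remaining values are filled in from the space built from ratings.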