research-article

Pushing the boundaries of crowd-enabled databases with query-driven schema expansion

Published: 01 February 2012
Abstract

By incorporating human workers into the query execution process, crowd-enabled databases facilitate intelligent, social capabilities such as completing missing data at query time or performing cognitive operators. But despite all their flexibility, crowd-enabled databases still maintain rigid schemas. In this paper, we extend crowd-enabled databases with flexible query-driven schema expansion, allowing new attributes to be added to the database at query time. However, the number of crowd-sourced mini-tasks needed to fill in missing values may often be prohibitively large, and the resulting data quality is doubtful. Instead of simply crowd-sourcing all values individually, we leverage the user-generated data found in the Social Web: by exploiting user ratings, we build perceptual spaces, i.e., highly compressed representations of the opinions, impressions, and perceptions of large numbers of users. Using a few training samples obtained by expert crowd-sourcing, we can then extract all missing data automatically from the perceptual space with high quality and at low cost. Extensive experiments show that our approach can boost both the performance and the quality of crowd-enabled databases, while also providing the flexibility to expand schemas in a query-driven fashion.
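
The two-step idea in the abstract (compress ratings into a perceptual space, then learn a new attribute from a handful of labeled items) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `build_perceptual_space` and `expand_attribute` are hypothetical, truncated SVD stands in for whatever embedding the paper uses, ridge regression stands in for its learning step, and the tiny rating matrix is made up for the example.

```python
import numpy as np


def build_perceptual_space(ratings, dims=2):
    """Embed items in a low-dimensional space (assumed stand-in: truncated SVD).

    ratings: items x users matrix of rating values.
    Returns an items x dims matrix of item coordinates.
    """
    # Center each item's ratings so coordinates reflect relative preference.
    centered = ratings - ratings.mean(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    # Scale the leading left singular vectors by their singular values.
    return u[:, :dims] * s[:dims]


def expand_attribute(space, labeled_idx, labels, ridge=1e-3):
    """Predict a new attribute for all items from a few labeled items.

    Assumed stand-in learner: ridge regression on the item coordinates.
    labeled_idx: indices of items whose attribute value was crowd-sourced.
    labels: the crowd-sourced values for those items.
    """
    # Append a constant column so the model can learn an offset.
    X = np.hstack([space, np.ones((space.shape[0], 1))])
    Xl = X[labeled_idx]
    y = np.asarray(labels, dtype=float)
    # Closed-form ridge solution: (Xl^T Xl + ridge * I) w = Xl^T y
    w = np.linalg.solve(Xl.T @ Xl + ridge * np.eye(X.shape[1]), Xl.T @ y)
    return X @ w


# Made-up data: 8 movies rated by 6 users. Users 0-2 prefer movies 0-3,
# users 3-5 prefer movies 4-7.
ratings = np.array([[5, 5, 5, 1, 1, 1]] * 4 + [[1, 1, 1, 5, 5, 5]] * 4, dtype=float)

space = build_perceptual_space(ratings, dims=2)
# Pretend the crowd labeled only movie 0 (attribute = 1) and movie 4 (attribute = 0).
scores = expand_attribute(space, labeled_idx=[0, 4], labels=[1.0, 0.0])
is_positive = scores > 0.5  # predicted value of the new attribute for every movie
```

In this toy setting, two crowd-sourced labels are enough to fill in the new attribute for all eight items, because items with similar rating patterns end up close together in the space. Real rating data is sparse and noisy, so a practical system would need a factorization that handles missing ratings and a more robust learner.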

Published in

Proceedings of the VLDB Endowment, Volume 5, Issue 6
February 2012
96 pages

Publisher: VLDB Endowment
