research-article

Pushing the boundaries of crowd-enabled databases with query-driven schema expansion

Authors:
Joachim Selke, Technische Universität Braunschweig, Braunschweig, Germany
Christoph Lofi, Technische Universität Braunschweig, Braunschweig, Germany
Wolf-Tilo Balke, Technische Universität Braunschweig, Braunschweig, Germany

Proceedings of the VLDB Endowment, Volume 5, Issue 6, pp. 538–549. https://doi.org/10.14778/2168651.2168655

Published: 01 February 2012

Abstract

By incorporating human workers into the query execution process, crowd-enabled databases facilitate intelligent, social capabilities like completing missing data at query time or performing cognitive operators. But despite all their flexibility, crowd-enabled databases still maintain rigid schemas. In this paper, we extend crowd-enabled databases by flexible query-driven schema expansion, allowing the addition of new attributes to the database at query time. However, the number of crowdsourced mini-tasks needed to fill in missing values may often be prohibitively large, and the resulting data quality is doubtful. Instead of simple crowdsourcing to obtain all values individually, we leverage the user-generated data found in the Social Web: by exploiting user ratings, we build perceptual spaces, i.e., highly compressed representations of the opinions, impressions, and perceptions of large numbers of users. Using a few training samples obtained by expert crowdsourcing, we can then extract all missing data automatically from the perceptual space with high quality and at low cost. Extensive experiments show that our approach can boost both the performance and the quality of crowd-enabled databases, while also providing the flexibility to expand schemas in a query-driven fashion.
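The idea in the abstract (embed items in a space derived from user ratings, then fill a new attribute for all items from a handful of labeled ones) can be illustrated with a toy sketch. This is not the paper's implementation: the plain SGD matrix factorization, the nearest-labeled-neighbour rule, the function names `build_perceptual_space` and `expand_attribute`, the attribute `is_comedy`, and the rating data are all assumptions made up for illustration.

```python
import random

def build_perceptual_space(ratings, n_users, n_items, dims=2, epochs=300,
                           lr=0.05, reg=0.02, seed=0):
    """Factorize sparse (user, item, rating) triples with plain SGD.

    Returns one coordinate vector per item; in this sketch these vectors
    play the role of the perceptual space.
    """
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(dims)] for _ in range(n_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(dims)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = users[u], items[i]
            # Error of the current prediction on the mean-centered rating.
            err = (r - mean) - sum(a * b for a, b in zip(pu, qi))
            for k in range(dims):
                pu_k, qi_k = pu[k], qi[k]
                pu[k] += lr * (err * qi_k - reg * pu_k)
                qi[k] += lr * (err * pu_k - reg * qi_k)
    return items

def expand_attribute(space, labeled):
    """Fill a new attribute for every item from a few labeled items.

    Each unlabeled item copies the value of its nearest labeled item in
    the space (an assumed stand-in for a properly trained learner).
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    result = {}
    for i, vec in enumerate(space):
        if i in labeled:
            result[i] = labeled[i]
        else:
            nearest = min(labeled, key=lambda j: dist2(vec, space[j]))
            result[i] = labeled[nearest]
    return result

# Made-up toy data: users 0-3 like items 0-2, users 4-7 like items 3-5.
ratings = [(u, i, 5 if (u < 4) == (i < 3) else 1)
           for u in range(8) for i in range(6)
           if (u + i) % 5 != 0]  # drop some ratings to mimic sparsity
space = build_perceptual_space(ratings, n_users=8, n_items=6)
# Two hypothetical crowd answers for a new attribute "is_comedy".
is_comedy = expand_attribute(space, {0: True, 3: False})
print(is_comedy)
```

Only two items are labeled by hand here; the remaining values are inferred from item positions in the learned space, which is the cost-saving step the abstract describes. A realistic setup would use far more ratings and a stronger learner than nearest-neighbour lookup.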

Published in

Proceedings of the VLDB Endowment, Volume 5, Issue 6
February 2012
96 pages
ISSN: 2150-8097
Publisher: VLDB Endowment