research-article

Open Access

Ground Truth Inference for Weakly Supervised Entity Matching

Authors:
Renzhi Wu, Georgia Institute of Technology, Atlanta, GA, USA (ORCID 0000-0002-9144-8999)
Alexander Bendeck, Georgia Institute of Technology, Atlanta, GA, USA (ORCID 0000-0002-9799-2194)
Xu Chu, Georgia Institute of Technology, Atlanta, GA, USA (ORCID 0009-0007-3202-3767)
Yeye He, Microsoft Research, Redmond, WA, USA (ORCID 0000-0003-2824-5299)

Proceedings of the ACM on Management of Data, Volume 1, Issue 1, Article No. 32, pp. 1–28. https://doi.org/10.1145/3588712

Published: 30 May 2023

Abstract

Entity matching (EM) refers to the problem of identifying pairs of data records in one or more relational tables that refer to the same entity in the real world. Supervised machine learning (ML) models currently achieve state-of-the-art matching performance; however, they require a large number of labeled examples, which are often expensive or infeasible to obtain. This has inspired us to approach data labeling for EM using weak supervision. In particular, we use the labeling function abstraction popularized by Snorkel, where each labeling function (LF) is a user-provided program that can generate many noisy match/non-match labels quickly and cheaply. Given a set of user-written LFs, the quality of data labeling depends on how accurately a labeling model infers the ground-truth labels. In this work, we first propose a simple but powerful labeling model for general weak supervision tasks. Then, we tailor the labeling model specifically to the task of entity matching by considering the EM-specific transitivity property.
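
As a rough illustration of the labeling function abstraction described above, the following Python sketch defines three hypothetical LFs over record pairs and combines their votes by simple majority. The record fields (`name`, `city`), the similarity thresholds, and all function names are assumptions made for this example. Majority vote is only a naive stand-in baseline here, not the labeling model proposed in the paper.

```python
from typing import Callable, Dict, List

# Label values used by every labeling function (LF) in this sketch.
MATCH, NON_MATCH, ABSTAIN = 1, 0, -1

Record = Dict[str, str]
LabelingFunction = Callable[[Record, Record], int]


def lf_same_name(a: Record, b: Record) -> int:
    """Vote MATCH when the normalized names are identical, else abstain."""
    same = a["name"].strip().lower() == b["name"].strip().lower()
    return MATCH if same else ABSTAIN


def lf_different_city(a: Record, b: Record) -> int:
    """Vote NON_MATCH when the cities differ, else abstain."""
    return NON_MATCH if a["city"].lower() != b["city"].lower() else ABSTAIN


def lf_token_overlap(a: Record, b: Record) -> int:
    """Vote on the Jaccard similarity of name tokens; abstain in the middle."""
    ta, tb = set(a["name"].lower().split()), set(b["name"].lower().split())
    jaccard = len(ta & tb) / len(ta | tb) if ta | tb else 0.0
    if jaccard >= 0.6:  # hypothetical threshold
        return MATCH
    if jaccard <= 0.2:  # hypothetical threshold
        return NON_MATCH
    return ABSTAIN


def majority_vote(votes: List[int]) -> int:
    """Naive baseline: majority of non-abstaining votes; ties abstain."""
    matches = votes.count(MATCH)
    non_matches = votes.count(NON_MATCH)
    if matches == non_matches:
        return ABSTAIN
    return MATCH if matches > non_matches else NON_MATCH


LFS: List[LabelingFunction] = [lf_same_name, lf_different_city, lf_token_overlap]


def label_pair(a: Record, b: Record) -> int:
    """Apply every LF to the pair and aggregate the noisy votes."""
    return majority_vote([lf(a, b) for lf in LFS])
```

Each LF may abstain, so the votes for a pair are noisy and incomplete; a labeling model replaces `majority_vote` with a more careful way of inferring the ground-truth label from them.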

The general form of our labeling model is simple, yet it substantially outperforms the best existing method across ten general weak supervision datasets. To tailor the labeling model for EM, we formulate an approach that ensures the final predictions of the labeling model satisfy the transitivity property required in EM, using an exact solution where possible and an ML-based approximation in the remaining cases. On two single-table and nine two-table real-world EM datasets, we show that our labeling model achieves a 9% higher F1 score on average than the best existing method. We also show that a deep learning EM end model (DeepMatcher) trained on labels generated by our weak supervision approach is comparable to an end model trained on tens of thousands of ground-truth labels, demonstrating that our approach can significantly reduce the labeling effort required in EM.
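
To make the transitivity property concrete, the sketch below checks a set of pairwise match predictions within a single table for violations (a matches b and b matches c, but a is not predicted to match c) and repairs them with a plain transitive closure via union-find. The function names and the closure-based repair are assumptions chosen for illustration. The paper's exact solution and ML-based approximation are not reproduced here, and a naive closure can propagate false-positive matches across a whole cluster.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Pair = FrozenSet[str]


def find_violations(records: Sequence[str], matches: Set[Pair]) -> List[Tuple[str, str, str]]:
    """Return triples in which exactly two of the three pairs are matches."""
    violations = []
    for a, b, c in combinations(records, 3):
        n = sum(frozenset(p) in matches for p in ((a, b), (b, c), (a, c)))
        if n == 2:  # two matched edges imply the third by transitivity
            violations.append((a, b, c))
    return violations


def transitive_closure(records: Sequence[str], matches: Set[Pair]) -> Set[Pair]:
    """Cluster records with union-find and return every in-cluster pair."""
    parent: Dict[str, str] = {r: r for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair in matches:
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for r in records:
        clusters.setdefault(find(r), []).append(r)

    return {
        frozenset(p)
        for members in clusters.values()
        for p in combinations(members, 2)
    }
```

For example, with records `r1`..`r4` and predicted matches `{r1, r2}` and `{r2, r3}`, `find_violations` reports the triple `(r1, r2, r3)`; after `transitive_closure` adds `{r1, r3}`, no violation remains.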

Supplemental Material

PACMMOD-V1mod032.mp4

Presentation video for the paper "Ground Truth Inference for Weakly Supervised Entity Matching" at SIGMOD 2023



Index Terms

Information systems → Data management systems → Information integration → Entity resolution


Published in
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1, May 2023, 2807 pages
EISSN: 2836-6573
DOI: 10.1145/3603164
Editor: Divyakant Agrawal, UC Santa Barbara, United States
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
entity matching
labeling model
transitivity
weak supervision