research-article

Magellan: toward building entity matching management systems over data science stacks

Authors:
Pradap Konda, University of Wisconsin-Madison
Sanjib Das, University of Wisconsin-Madison
Paul Suganthan G. C., University of Wisconsin-Madison
AnHai Doan, University of Wisconsin-Madison
Adel Ardalan, University of Wisconsin-Madison
Jeffrey R. Ballard, University of Wisconsin-Madison
Han Li, University of Wisconsin-Madison
Fatemah Panahi, University of Wisconsin-Madison
Haojun Zhang, University of Wisconsin-Madison
Jeff Naughton, University of Wisconsin-Madison
Shishir Prasad, @WalmartLabs
Ganesh Krishnan, @WalmartLabs
Rohit Deep, @WalmartLabs
Vijay Raghavendra, @WalmartLabs
Proceedings of the VLDB Endowment, Volume 9, Issue 13, pp. 1581–1584. https://doi.org/10.14778/3007263.3007314

Published: 01 September 2016


Abstract

Entity matching (EM) has been a long-standing challenge in data management. Most current EM work, however, focuses only on developing matching algorithms. We argue that far more effort should be devoted to building EM systems. We discuss the limitations of current EM systems, then present Magellan, a new kind of EM system that addresses these limitations. Magellan is novel in four important aspects. (1) It provides a how-to guide that tells users what to do in each EM scenario, step by step. (2) It provides tools to help users do these steps; the tools seek to cover the entire EM pipeline, not just matching and blocking as current EM systems do. (3) The tools are built on top of the data science stacks in Python, allowing Magellan to borrow a rich set of capabilities in data cleaning, information extraction (IE), visualization, learning, etc. (4) Magellan provides a powerful scripting environment to facilitate interactive experimentation and allow users to quickly write code to "patch" the system. We have extensively evaluated Magellan with 44 students and users at various organizations. In this paper we propose demonstration scenarios that show the promise of the Magellan approach.
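
As a rough illustration of the blocking and matching steps that the abstract refers to, the following is a minimal sketch using only the Python standard library. It is not Magellan's API: the toy tables, the function names `block`, `name_similarity`, and `match`, and the 0.7 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical toy tables (not data from the paper).
table_a = [
    {"id": "a1", "name": "Dave Smith", "city": "Madison"},
    {"id": "a2", "name": "Joe Wilson", "city": "San Jose"},
]
table_b = [
    {"id": "b1", "name": "David Smith", "city": "Madison"},
    {"id": "b2", "name": "Daniel Smith", "city": "Middleton"},
    {"id": "b3", "name": "Joseph Wilson", "city": "San Jose"},
]


def block(left, right, key):
    """Blocking: keep only the pairs whose values for `key` are equal.

    This discards obviously non-matching pairs cheaply, so the more
    expensive matching step runs on fewer candidates.
    """
    return [(l, r) for l, r in product(left, right) if l[key] == r[key]]


def name_similarity(l, r):
    """Similarity in [0, 1] between the lower-cased names of two records."""
    return SequenceMatcher(None, l["name"].lower(), r["name"].lower()).ratio()


def match(candidates, threshold=0.7):
    """Matching: declare a pair a match if name similarity meets the threshold.

    The threshold is an arbitrary assumption; a learned classifier could
    take the place of this rule.
    """
    return [
        (l["id"], r["id"])
        for l, r in candidates
        if name_similarity(l, r) >= threshold
    ]


candidates = block(table_a, table_b, "city")
matches = match(candidates)
print(len(candidates), "candidate pairs;", "matches:", matches)
```

Blocking on `city` here reduces six possible pairs to two candidates before any string comparison is done, which is the usual motivation for separating the two steps.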


Published in
Proceedings of the VLDB Endowment, Volume 9, Issue 13, September 2016, 378 pages
ISSN: 2150-8097
Editor: Surajit Chaudhuri, Microsoft Research
Publisher: VLDB Endowment
