DOI: 10.1145/1978942.1979444
research-article

Wrangler: interactive visual specification of data transformation scripts

Published: 07 May 2011

ABSTRACT

Though data analysis tools continue to improve, analysts still expend an inordinate amount of time and effort manipulating data and assessing data quality issues. Such "data wrangling" regularly involves reformatting data values or layout, correcting erroneous or missing values, and integrating multiple data sources. These transforms are often difficult to specify and difficult to reuse across analysis tasks, teams, and tools. In response, we introduce Wrangler, an interactive system for creating data transformations. Wrangler combines direct manipulation of visualized data with automatic inference of relevant transforms, enabling analysts to iteratively explore the space of applicable operations and preview their effects. Wrangler leverages semantic data types (e.g., geographic locations, dates, classification codes) to aid validation and type conversion. Interactive histories support review, refinement, and annotation of transformation scripts. User study results show that Wrangler significantly reduces specification time and promotes the use of robust, auditable transforms instead of manual editing.
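To make the contrast between scripted transforms and manual editing concrete, the following is a minimal sketch of a reusable transformation script in plain Python. It is not Wrangler's transformation language or its inference engine; the sample rows, the step names (`split_on_comma`, `fill_down_state`, and so on), and the `is_iso_date` check are all hypothetical, chosen only to mirror the kinds of operations the abstract mentions: reformatting layout, correcting missing values, and validating typed values.

```python
from datetime import datetime

# Hypothetical raw input: a header row, a missing state, a bad count, a blank line.
RAW_LINES = [
    "State,Date,Count",
    "Alabama,2004-01-05,12",
    ",2004-01-06,n/a",
    "",
    "Alaska,2004-01-05,7",
]

def split_on_comma(lines):
    """Reformat layout: split each text line into a list of cells."""
    return [line.split(",") for line in lines]

def delete_empty_rows(rows):
    """Drop rows whose cells are all blank."""
    return [row for row in rows if any(cell.strip() for cell in row)]

def promote_header(rows):
    """Use the first row as column names for the remaining rows."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]

def fill_down_state(records):
    """Correct missing values: copy the last seen State into blank cells."""
    filled, last = [], None
    for rec in records:
        last = rec["State"] or last
        filled.append({**rec, "State": last})
    return filled

def parse_count(records):
    """Type conversion: Count becomes an int, or None when it is not numeric."""
    return [
        {**rec, "Count": int(rec["Count"]) if rec["Count"].isdigit() else None}
        for rec in records
    ]

def is_iso_date(value):
    """Validation: True when the value parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

# The script is an ordered list of steps, so it can be reviewed and rerun.
SCRIPT = [split_on_comma, delete_empty_rows, promote_header,
          fill_down_state, parse_count]

def run_script(data, script=SCRIPT):
    """Apply each step in order and return the transformed data."""
    for step in script:
        data = step(data)
    return data

records = run_script(RAW_LINES)
invalid_dates = [rec for rec in records if not is_iso_date(rec["Date"])]
```

Because the steps live in an ordered list rather than in hand edits, the same script can be rerun on a new file of the same shape and each step can be inspected or removed on its own.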


Supplemental Material

paper355.m4v (m4v, 12.5 MB)


Published in

CHI '11: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
May 2011
3530 pages
ISBN: 9781450302289
DOI: 10.1145/1978942

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CHI '11 Paper Acceptance Rate: 410 of 1,532 submissions, 27%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
