Abstract
We address the problem of performing semantic transformations on strings, which may represent a variety of data types (or their combination) such as a column in a relational table, time, date, currency, etc. Unlike syntactic transformations, which are based on regular expressions and which interpret a string as a sequence of characters, semantic transformations additionally require exploiting the semantics of the data type represented by the string, which may be encoded as a database of relational tables. Manually performing such transformations on a large collection of strings is error prone and cumbersome, while programmatic solutions are beyond the skill-set of end-users. We present a programming by example technology that allows end-users to automate such repetitive tasks.
We describe an expressive transformation language for semantic manipulation that combines table lookup operations and syntactic manipulations. We then present a synthesis algorithm that can learn all transformations in the language that are consistent with the user-provided set of input-output examples. We have implemented this technology as an add-in for the Microsoft Excel Spreadsheet system and have evaluated it successfully over several benchmarks picked from various Excel help-forums.
- A. Arasu, S. Chaudhuri, and R. Kaushik. Transformation-based framework for record matching. In ICDE, pages 40--49, 2008. Google ScholarDigital Library
- A. Arasu, S. Chaudhuri, and R. Kaushik. Learning string transformations from examples. PVLDB, 2(1):514--525, 2009. Google ScholarDigital Library
- A. Das Sarma, A. Parameswaran, H. Garcia-Molina, and J. Widom. Synthesizing view definitions from data. In ICDT, pages 89--103, 2010. Google ScholarDigital Library
- R. Dhamankar, Y. Lee, A. Doan, A. Y. Halevy, and P. Domingos. iMAP: Discovering complex mappings between database schemas. In SIGMOD, pages 383--394, 2004. Google ScholarDigital Library
- A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios. Duplicate record detection: A survey. IEEE Trans. Knowl. Data Eng., 19(1):1--16, 2007. Google ScholarDigital Library
- M. Gualtieri. Deputize end-user developers to deliver business agility and reduce costs. In Forrester Report for Application Development and Program Management Professionals, April 2009.Google Scholar
- S. Gulwani. Dimensions in program synthesis. In PPDP, pages 13--24, 2010. Google ScholarDigital Library
- S. Gulwani. Automating string processing in spreadsheets using input-output examples. In POPL, pages 317--330, 2011. Google ScholarDigital Library
- S. Gulwani, W. R. Harris, and R. Singh. Spreadsheet data manipulation using examples. In Communications of the ACM, 2012. To Appear. Google ScholarDigital Library
- W. R. Harris and S. Gulwani. Spreadsheet table transformations from examples. In PLDI, pages 317--328, 2011. Google ScholarDigital Library
- S. Jha, S. Gulwani, S. A. Seshia, and A. Tiwari. Oracle-guided component-based program synthesis. In ICSE, pages 215--224, 2010. Google ScholarDigital Library
- N. Koudas, S. Sarawagi, and D. Srivastava. Record linkage: similarity measures and algorithms. In SIGMOD, pages 802--803, 2006. Google ScholarDigital Library
- T. Lau. Why programming-by-demonstration systems fail: Lessons learned for usable ai. AI Magazine, 30(4):65--67, 2009.Google ScholarCross Ref
- T. Lau, S. Wolfman, P. Domingos, and D. Weld. Programming by demonstration using version space algebra. Machine Learning, 53(1-2):111--156, 2003. Google ScholarDigital Library
- R. C. Miller and B. A. Myers. Interactive simultaneous editing of multiple text regions. In USENIX Annual Technical Conference, General Track, pages 161--174, 2001. Google ScholarDigital Library
- R. Singh and S. Gulwani. Learning semantic string transformations from examples. Technical Report MSR-TR-2012-5, Microsoft Research, 2012.Google ScholarDigital Library
- R. Singh and S. Gulwani. Synthesizing number transformations from input-output examples. In CAV, 2012. To Appear. Google ScholarDigital Library
- Q. T. Tran, C.-Y. Chan, and S. Parthasarathy. Query by output. In SIGMOD, pages 535--548, 2009. Google ScholarDigital Library
- R. H. Warren and F. W. Tompa. Multi-column substring matching for database schema translation. In VLDB, pages 331--342, 2006. Google ScholarDigital Library
Recommendations
Learning syntactic program transformations from examples
ICSE '17: Proceedings of the 39th International Conference on Software EngineeringAutomatic program transformation tools can be valuable for programmers to help them with refactoring tasks, and for Computer Science students in the form of tutoring systems that suggest repairs to programming assignments. However, manually creating ...
Spreadsheet table transformations from examples
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and ImplementationEvery day, millions of computer end-users need to perform tasks over large, tabular data, yet lack the programming knowledge to do such tasks automatically. In this work, we present an automatic technique that takes from a user an example of how the ...
Synthesizing number transformations from input-output examples
CAV'12: Proceedings of the 24th international conference on Computer Aided VerificationNumbers are one of the most widely used data type in programming languages. Number transformations like formatting and rounding present a challenge even for experienced programmers as they find it difficult to remember different number format strings ...
Comments