ABSTRACT
We propose regular expression types as a foundation for XML processing languages. Regular expression types are a natural generalization of Document Type Definitions (DTDs), describing structures in XML documents using regular expression operators (i.e., *, ?, |, etc.) and supporting a simple but powerful notion of subtyping.The decision problem for the subtype relation is EXPTIME-hard, but it can be checked quite efficiently in many cases of practical interest. The subtyping algorithm developed here is a variant of Aiken and Murphy's set-inclusion constraint solver, to which are added several optimizations and two new properties: (1) our algorithm is provably complete, and (2) it allows a useful "subtagging" relation between nodes with different labels in XML trees.
- 1.A. Aiken and B. R. Murphy. Implementing regular tree expressions. In J. Hughes, editor, Functional Programming Languages and Computer Architecture1991, volume 523 of Lecture Notes in Computer Science. Springer-Verlag, 1991.]] Google ScholarDigital Library
- 2.A. Aiken and E. L. Wimmers. Solving systems of set constraints (extended abstract). In Proceedings, Seventh Annual IEEE Symposium on Logic in Computer Science, pages 329{340, Santa Cruz, California, 22-25 June 1992. IEEE Computer Society Press.]]Google ScholarCross Ref
- 3.R. M. Amadio and L. Cardelli. Subtyping recursive types. ACM Transactions on Programming Languages and Systems, 15(4):575-631, 1993. Preliminary version in POPL '91 (pp. 104-118); also DEC Systems Research Center Research Report number 62, August 1990.]] Google ScholarDigital Library
- 4.Amy Felty, Elsa Gunter, John Hannan, Dale Miller, Gopalan Nadathur and A. Scedrov. Lambda prolog: An extended logic programming language. In E. L. R. Overbeek, editor, Proceedings on the 9th International Conference onAutomated Deduction, volume 310 of LNCS, pages 754-755, Berlin, May 1988. Springer.]] Google ScholarDigital Library
- 5.L. Augustsson. Cayenne | a language with dependent types. In Proceedings of the ACM SIGPLAN International Conference onFunctional Programming (ICFP '98), volume 34(1) of ACM SIGPLAN Notices, pages 239-250. ACM, June 1999.]] Google ScholarDigital Library
- 6.M. Brandt and F. Henglein. Coinductive axiomatization of recursive type equality and subtyping. In R. Hindley, editor, Proc. 3d Int'l Conf. on Typed Lambda Calculi and Applications (TLCA), Nancy, France, April 2-4, 1997, volume 1210 of Lecture Notes in Computer Science (LNCS), pages 63-81. Springer-Verlag, Apr. 1997. Full version in Fundamenta Informaticae, Vol. 33, pp. 309-338, 1998.]] Google ScholarDigital Library
- 7.P. Buneman and B. Pierce. Union types for semistructured data. In Proceedings of the International Database Programming Languages Workshop, Sept. 1999. Also available as University of Pennsylvania Dept. of CIS technical report MS-CIS-99-09.]] Google ScholarDigital Library
- 8.S. S. Chawathe. Comparing hierarchical data in external memory. InProceedings of the Twenty-~fth International Conference onVery Large Data Bases, pages 90-101, Edinburgh, Scotland, U.K., Sept. 1999.]] Google ScholarDigital Library
- 9.S. Cluet and J. Simeon. Using YAT tobuildaweb server. In Intl. Workshop on the Web and Databases (WebDB), 1998.]] Google ScholarDigital Library
- 10.H. Common, M. Dauchet, R. Gilleron, F. Jacquemard, D. Lugiez, S. Tison, and M. Tommasi. Tree automata techniques and applications. Draft book; available electronically on http:// www.grappa.univ-lille3.fr/tata.]]Google Scholar
- 11.M. Fahndrich and A. Aiken. Making set-constraint program analyses scale. Technical Report CSD-96-917, University of California, Berkeley, Sept. 1996.]] Google ScholarDigital Library
- 12.V. Gapeyev, M. Levin, and B. Pierce. Recursive subtyping revealed. In Proceedings of the International Conference onFunctional Programming (ICFP), 2000.]] Google ScholarDigital Library
- 13.J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 1979.]] Google ScholarDigital Library
- 14.H. Hosoya and B. Pierce. Tree automata and pattern matching, July 2000. Available through http:// www.cis.upenn.edu/~hahosoya/papers/ tapat-full.ps.]]Google Scholar
- 15.H. Hosoya and B. C. Pierce. XDuce: A typed XML processing language. In Proceedings of Third International Workshop on the Web and Databases (WebDB2000), May 2000.]] Google ScholarDigital Library
- 16.H. Hosoya, J. Vouillon, and B. C. Pierce. Regular expression types for XML. Technical report, University ofPennsylvania, 2000.]]Google Scholar
- 17.X. Leroy, J. Vouillon, D. Doligez, et al. The Objective Caml system. Software and documentation available on the Web, http://pauillac.inria.fr/ocaml/, 1996.]]Google Scholar
- 18.E. Meijer and M. Shields. XMLambda: A Functional Programming Language for Constructing and Manipulating XML Documents. page 13. Submitted to USENIX 2000 Technical Conference.]]Google Scholar
- 19.T. Milo, D. Suciu, and V. Vianu. Typechecking for XML transformers. In Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 11-22. ACM, May 2000.]] Google ScholarDigital Library
- 20.H. Seidl. Deciding equivalence of ~nite tree automata. SIAM Journal of Computing, 19(3):424-437, June 1990.]] Google ScholarDigital Library
- 21.M. Wallace and C. Ranciman. Haskell and XML: Generic combinators or type-based translation? In Proceedings of the Fourth ACM SIGPLAN International Conference onFunctional Programming (ICFP`99), volume 34-9 of ACM Sigplan Notices, pages 148-159, N.Y., Sept. 27-29 1999. ACM Press.]] Google ScholarDigital Library
- 22.Extensible markup language (XMLTM). http://www.w3.org/XML/.]]Google Scholar
- 23.XML Schema Part 0: Primer, W3C Working Draft. http://www.w3.org/TR/xmlschema-0/, 2000.]]Google Scholar
Index Terms
- Regular expression types for XML
Recommendations
Regular expression types for XML
We propose regular expression types as a foundation for XML processing languages. Regular expression types are a natural generalization of Document Type Definitions (DTDs), describing structures in XML documents using regular expression operators (i.e., ...
Regular expression types for XML
We propose regular expression types as a foundation for statically typed XML processing languages. Regular expression types, like most schema languages for XML, introduce regular expression notations such as repetition (*), alternation (|), etc., to ...
Regular expression pattern matching for XML
POPL '01: Proceedings of the 28th ACM SIGPLAN-SIGACT symposium on Principles of programming languagesWe propose regular expression pattern matching as a core feature for programming languages for manipulating XML (and similar tree-structured data formats). We extend conventional pattern-matching facilities with regular expression operators such as ...
Comments