Abstract
PADS is a declarative data description language that lets data analysts describe both the physical layout of ad hoc data sources and the semantic properties of that data. From such descriptions, the PADS compiler generates libraries and tools for manipulating the data. These include parsing routines, statistical profiling tools, translation programs that produce well-behaved formats such as XML or the formats required for loading relational databases, and tools for running XQueries over raw PADS data sources. The descriptions are concise enough to serve as "living" documentation, yet flexible enough to describe most of the ASCII, binary, and COBOL formats that we have seen in practice. The generated parsing library provides robust, application-specific error handling.
Index Terms
- PADS: a domain-specific language for processing ad hoc data