Abstract
PSDG is a parallel synthetic data generator designed to generate "industrial sized" data sets quickly using cluster computing. PSDG depends on SDDL, a synthetic data description language that provides flexibility in the types of data we can generate.
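The abstract does not describe how PSDG divides work across a cluster, so the following is only a minimal sketch of one common approach to parallel generation, not PSDG's actual design. It assumes that each row can be derived from a base seed and its row index alone, so any node can generate any contiguous range independently. The column layout, the seeding scheme, and all function names are hypothetical, threads stand in for cluster nodes, and SDDL itself is not modeled.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def partition_ranges(total_rows, workers):
    """Split [0, total_rows) into `workers` contiguous, near-equal ranges."""
    base, extra = divmod(total_rows, workers)
    ranges, start = [], 0
    for w in range(workers):
        # The first `extra` workers each take one additional row.
        stop = start + base + (1 if w < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def make_row(seed, row_id):
    """Build one row from an RNG seeded only by (seed, row_id).

    The seeding formula and the columns are illustrative assumptions.
    """
    rng = random.Random(seed * 1_000_003 + row_id)
    return (row_id, rng.randint(18, 90), rng.choice(["AR", "TX", "OK"]))


def generate_partition(seed, start, stop):
    """Generate rows start..stop-1; needs no state from other partitions."""
    return [make_row(seed, i) for i in range(start, stop)]


def generate_all(seed, total_rows, workers):
    """Generate every partition concurrently and concatenate in row order."""
    ranges = partition_ranges(total_rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: generate_partition(seed, *r), ranges)
        return [row for part in parts for row in part]


if __name__ == "__main__":
    rows = generate_all(seed=42, total_rows=10, workers=3)
    for row in rows:
        print(row)
```

Because every row depends only on the seed and its index, the output is the same regardless of how many workers are used, which keeps large data sets reproducible when the cluster size changes.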