Abstract
The historical, current, and future trends in knowledge discovery from data in astronomy are presented here. The story begins with a brief history of data gathering and data organization. A description of the development of new information science technologies for astronomical discovery is then presented. Among these are e-Science and the virtual observatory, with its data discovery, access, display, and integration protocols; astroinformatics and data mining for exploratory data analysis, information extraction, and knowledge discovery from distributed data collections; databases from new sky surveys, including rich multivariate observational parameter sets for large numbers of objects; and the emerging discipline of data-oriented astronomical research, called astroinformatics. Astroinformatics is described as the fourth paradigm of astronomical research, following the three traditional research methodologies: observation, theory, and computation/modeling. Astroinformatics research areas include machine learning, data mining, visualization, statistics, semantic science, and scientific data management. Each of these areas is now an active research discipline, with significant science-enabling applications in astronomy. Research challenges and sample research scenarios are presented in these areas, in addition to sample algorithms for data-oriented research. These information science technologies enable scientific knowledge discovery from the increasingly large and complex data collections in astronomy. The education and training of the modern astronomy student must consequently include skill development in these areas, whose practitioners have traditionally been limited to applied mathematicians, computer scientists, and statisticians. Modern astronomical researchers must cross these traditional discipline boundaries, thereby borrowing the best-of-breed methodologies from multiple disciplines.
In the era of large sky surveys and numerous large telescopes, the potential for astronomical discovery is equally large, and so the data-oriented research methods, algorithms, and techniques that are presented here will enable the greatest discovery potential from the ever-growing data and information resources in astronomy.
Somewhere, something incredible is waiting to be known.
Carl Sagan
Abbreviations
- 2MASS: 2-Micron All-Sky Survey
- AAO: Anglo-Australian Observatory
- ADAC: Astronomical Data Archives Center (Japan)
- ADASS: Astronomical Data Analysis Software and Systems
- ADS: Astronomical Data Center
- ApJS: Astrophysical Journal Supplement
- ANN: Artificial neural network
- BD: Bonner Durchmusterung
- CADC: Canadian Astronomy Data Center
- CDS: Centre de Données astronomiques de Strasbourg (France)
- GCVS: General Catalog of Variable Stars
- DDM: Distributed data mining
- DMD: Distributed mining of data
- DOE: Department of Energy
- DSS: Digital Sky Survey
- EDA: Exploratory data analysis
- HD: Henry Draper
- HEASARC: High Energy Astrophysics Science Archive Research Center
- IPAC: Infrared Processing and Analysis Center
- IRSA: Infrared Science Archive
- IVOA: International Virtual Observatory Alliance
- KDD: Knowledge Discovery in Databases
- KNN: K-nearest neighbors
- LEDAS: Leicester Database and Archive Service (UK)
- LSST: Large Synoptic Survey Telescope
- MAST: Multimission Archive at Space Telescope
- MDD: Mining of distributed data
- ML: Machine learning
- NASA: National Aeronautics and Space Administration
- NED: NASA/IPAC Extragalactic Database
- NGC: New General Catalog
- NSF: National Science Foundation
- NVO: National Virtual Observatory
- Pan-STARRS: Panoramic Survey Telescope and Rapid Response System
- PB: Petabyte
- PDMP: Project Data Management Plan
- PI: Principal investigator
- RA/Dec: Right ascension and declination
- RDF: Resource Description Framework
- SAO: Smithsonian Astrophysical Observatory
- SIMBAD: Set of Identifications, Measurements, and Bibliography for Astronomical Data
- SDSS: Sloan Digital Sky Survey
- SVM: Support vector machine
- TB: Terabyte
- VAO: Virtual Astronomical Observatory
- TMSS: Two-Micron Sky Survey
- VO: Virtual observatory
- WWW: World Wide Web
- XML: eXtensible Markup Language
Acknowledgments
This research has been supported in part by NASA AISR grant number NNX07AV70G. The author thanks numerous colleagues for their significant and invaluable contributions to the ideas expressed in this chapter: Jogesh Babu, Douglas Burke, Andrew Connolly, Timothy Eastman, Eric Feigelson, Matthew Graham, Alexander Gray, Norman Gray, Suzanne Jacoby, Thomas Loredo, Ashish Mahabal, Robert Mann, Bruce McCollum, Misha Pesenson, M. Jordan Raddick, Keivan Stassun, Alex Szalay, Tony Tyson, and John Wallin. The author is grateful to Dr. Hillol Kargupta and his research associates for many years of productive collaborations in the field of distributed data mining in virtual observatories.
Copyright information
© 2013 Springer Science+Business Media Dordrecht
About this entry
Cite this entry
Borne, K. (2013). Virtual Observatories, Data Mining, and Astroinformatics. In: Oswalt, T.D., Bond, H.E. (eds) Planets, Stars and Stellar Systems. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-5618-2_9
DOI: https://doi.org/10.1007/978-94-007-5618-2_9
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-5617-5
Online ISBN: 978-94-007-5618-2
eBook Packages: Physics and Astronomy; Reference Module Physical and Materials Science; Reference Module Chemistry, Materials and Physics