Cultivating a research agenda for data science

Chris A Mattmann
I describe a research agenda for data science based on a decade of research and operational work in data-intensive systems at NASA and the University of Southern California, and on open source work at the Apache Software Foundation. My vision is predicated on understanding the architecture for grid computing; on flexible and automated approaches for selecting data movement technologies and on their use in data systems; on the recent emergence of cloud computing for processing and storage; and on the unobtrusive and automated integration of scientific algorithms into data systems. Advancements in each of these areas are a core need, and they will fundamentally improve our understanding of data science and big data. This paper identifies and highlights my own experience and opinions from growing into a data scientist.
