2018 | Book

Data Science Landscape

Towards Research Standards and Protocols

Editors: Dr. Usha Mujoo Munshi, Dr. Neeta Verma

Publisher: Springer Singapore

Book Series: Studies in Big Data

About this book

This edited volume deals with different contours of data science, with special reference to data management for the research innovation landscape. Data is becoming pervasive in all spheres of human, economic and developmental activity. In this context, it is important to take stock of what is being done in the data management area; to begin prioritizing, considering and formulating the adoption of a formal data management system, including citation protocols for use by research communities in different disciplines; and to address various technical research issues. The volume, thus, focuses on some of these issues, drawing typical examples from various domains.

The idea of this work germinated from the two-day international workshop “Big and Open Data – Evolving Data Science Standards and Citation Attribution Practices”, led by ICSU-CODATA and attended by over 300 domain experts. The workshop focused on two priority areas: (i) Big and Open Data: Prioritizing, Addressing and Establishing Standards and Good Practices and (ii) Big and Open Data: Data Attribution and Citation Practices. This important international event was part of a worldwide initiative led by ICSU and the CODATA-Data Citation Task Group.

In all, there are 21 chapters (the 21st chapter addressing four different core aspects), written by eminent researchers in the field. They deal with key S&T, institutional, financial, sustainability, legal, IPR, data protocol, community norm and other issues that need attention in order to improve data management practices and protocols, coordinate activities in the area, and promote common practices and standards across the global research community. In addition, the volume portrays national and international perspectives on data and its various contours through case studies.

Table of Contents

Frontmatter
Data Science Landscape: Tracking the Ecosystem
Abstract
The big data phenomenon is continuously evolving, and so is its entire ecosystem. In the recent past, owing to advancing technologies and resources cropping up on all fronts, we have moved from data deficit to data deluge. The real challenge lies in deriving benefits from this data tsunami for the public good. It is therefore imperative to build infrastructure to store and process huge volumes of data. It is equally important to evolve innovative mechanisms for data analytics to draw inferences that can facilitate smart research and good decision making. The paper dwells on some of the core elements of the big data ecosystem and endeavors to present the current scenario by identifying and portraying various initiatives that address the big data boom.
Usha Mujoo Munshi
Open Data Infrastructure for Research and Development
Abstract
Open data is an idea that originated from the philosophy that certain data should be freely available for everyone to use, reuse, and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The intent of the open data movement is in line with that of other “open” movements such as open source, open content, and open access. Open data has caught the attention of the research community, government, and industry across the world because of its huge potential to bring constructive change in the socioeconomic and scientific domains. It does so by developing and disseminating information within a vibrant mixed ecosystem of research and developer communities, government bodies, business houses, and hybrid solutions of various forms, fueled by the sharp rise of information and communications technologies (ICT) and digital governance. Open data, along with data analytics services, offers a huge opportunity for development and innovation in citizen service delivery. Open data has also become part of the core strategy of businesses across the world, big or small, digital or non-digital. The research community has also highlighted the potential of data for research and development. Reuse of research data enables it to be analyzed from a different perspective, and processing the data in combination with other related datasets may shed new light on the original research findings.
Neeta Verma, M. P. Gupta, Shubhadip Biswas
Managing Research Data by R&D Community in Nuclear Data Science in India
Abstract
Nuclear data science is part of big data science and uses several scientific and engineering databases. Many aspects of knowledge management in nuclear data science are generic and apply to numerical databases across disciplines. This write-up presents some of the author’s perspectives on the management of research data by the R&D community, drawing on the author’s experience in nuclear data science in the Indian context. In India, since 2004, many activities in nuclear data science for energy and non-energy applications have been successfully nucleated. Also presented are some remarks on citation practices for nuclear datasets in India. Internationally, good citation practices are by and large followed in the field of nuclear data science, and citation practices in India are also improving. The perspectives shared here are based on the author’s own positive experience in India.
S. Ganesan
Big Data in Astronomy and Beyond
Abstract
Developments in experimental and observational techniques over recent years have led to an enormous increase in the scientific data available across many disciplines. While scientists in earlier decades worked on a handful of objects and their properties to understand the whole, modern scientific discoveries are data driven. It is now possible to work with data on millions or even billions of objects to look for patterns and relationships that remained obscured in the past. But dealing with voluminous and varied data poses formidable technological challenges: implementing the required computer hardware and software, developing tools for data and statistical analysis, and comparing theoretical predictions with data. The Virtual Observatory techniques developed by astronomers over the last ten years or more provide a framework for meeting some of these challenges and can be adapted to other domains.
Ajit Kembhavi
Preserving for a More Just Future: Tactics of Activist Data Archiving
Abstract
The ownership of the life cycle of data—who creates it, processes it, visualizes it, preserves it—is a question of enormous power in our data-saturated society. The task of collecting, analyzing, and safeguarding large sets of data typically falls within the purview of data professionals: data scientists or statisticians tasked with deciphering the insights of large numbers or variables. Acting like modern-day oracles, they must know what metadata to collect, anticipate the needs of future users, and carefully track changes as data is worked upon and contexts change.
Morgan Currie, Joan Donovan, Brittany Paris
Little Data from Big Data for Disaster Risk Reduction in India
Abstract
The environment and natural disasters are closely related. Most disasters are due to mismanagement of our natural resources and environment. With the new threat of climate change, the frequency and impact of disasters are increasing substantially. The most affected are poor people with fewer resources and less capacity. The Sendai Framework for Disaster Risk Reduction (2015–2030) recognizes the importance of science and technology in disaster risk reduction (DRR) and emphasizes the need for a good database for proper disaster planning and preparedness. The paper deals with the importance of a good database for understanding disaster vulnerability and risk, for forecasting and warning systems, and for overall preparedness for disasters, particularly floods, to which about 40 million hectares of the country are prone and which affect about 8 million hectares annually. This data-centric approach to DRR is intended to support the UN Sendai Framework for DRR and to evolve a model for India and the South Asian region.
Vinod K. Sharma, Ashutosh Dev Kaushik
Data Marketplace as a Platform for Sharing Scientific Data
Abstract
Data marketplace is an emerging service model to facilitate data exchange between its producers and consumers. While the service has been motivated by a business model for data and has established itself in the commercial sector over the last few years, it is possible to build a data sharing platform for the scientific community on this model. This article analyzes the motivational and technical challenges for scientific data exchange and proposes use of data marketplace service model to address them.
Hiranmay Ghosh
ICSSR Data Service: A National Initiative for Sharing of Social Science Research Data
Abstract
The ICSSR Data Service emerged in response to a persistent global demand for the open availability and sharing of government and research data collected or developed through public funding. The objectives of the open data movement are closely aligned with those of other “open movements” such as open source, open access, open standards, open hardware and open content. The ICSSR Data Service was initiated by ICSSR, New Delhi, with the aim of promoting data sharing and reuse and providing open access to social science research data and datasets at the national level. The article elaborates on the background, genesis and implementation strategy adopted for setting up the ICSSR Data Service. It describes the datasets available in the repository, their types and formats, stakeholders, access and reuse policies for data and metadata, data versioning, and the current status of the ICSSR Data Service. The article also describes the organization of data in the repository and the features and functionality of the ICSSR Data Service, including search, browse and discovery options. It elaborates on the “Explore Online” tool, which facilitates online visualization of data as charts, tables, graphs and maps using various visualization tools and techniques, and on the built-in online “ICSSR Data Analytic” tool developed using the “R” language. Lastly, the article sets out practices to be followed for citing data so as to help other researchers identify and locate the source of referenced data.
Jagdish Arora, Pallab Pradhan, Yatrik Patel, Miteshkumar Pandya, Hiteshkumar Solanki, Divyakant Vaghela
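The abstract does not reproduce the citation practices the article recommends. As a hedged illustration only, the Python sketch below assembles a dataset citation from the elements most data citation guidelines ask for (creator, year, title, version, publisher, persistent identifier), in a simplified form of the element order recommended by DataCite. It is not the ICSSR Data Service's own format; the `DatasetRecord` class, the `format_citation` function and all sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Minimal metadata needed to cite a dataset (field set is an assumption)."""
    creators: List[str]   # personal or institutional authors, in citation order
    year: int             # year this version of the dataset was published
    title: str
    version: str          # cite the exact version used, since data may be revised
    publisher: str        # repository or data service holding the dataset
    identifier: str       # persistent identifier, e.g. a DOI URL


def format_citation(rec: DatasetRecord) -> str:
    """Return a citation string in a simplified DataCite-style element order:
    Creator (PublicationYear): Title. Version. Publisher. Identifier."""
    creators = "; ".join(rec.creators)
    return (
        f"{creators} ({rec.year}): {rec.title}. "
        f"Version {rec.version}. {rec.publisher}. {rec.identifier}"
    )


if __name__ == "__main__":
    # Every value below is hypothetical, for illustration only.
    record = DatasetRecord(
        creators=["Doe, J.", "Roe, R."],
        year=2017,
        title="Example Household Survey",
        version="1.0",
        publisher="Example Data Repository",
        identifier="https://doi.org/10.0000/example-dataset",
    )
    print(format_citation(record))
```

Run as a script, this prints `Doe, J.; Roe, R. (2017): Example Household Survey. Version 1.0. Example Data Repository. https://doi.org/10.0000/example-dataset`. Recording the version explicitly matters because a repository that supports data versioning may hold several revisions under related identifiers.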
Prismatic Consumer Insights Through Big Data: A Case Study of National Consumer Helpline
Abstract
The paper illustrates how big data helps consumers and how its analysis assists in understanding the changing tastes and preferences of consumers. It brings out the modus operandi of data management and argues that, to a large extent, analytics can also reveal the characteristics and other key issues of different business sectors. Systematic analysis of the data is of paramount importance to stakeholders such as industry associations, standards and law-making bodies, government and policy makers, and social science researchers. While highlighting several provisions of the National Consumer Helpline (NCH) for facilitating consumers, the paper shows that the NCH redressal system fosters networking through the collaborative endeavour of over 200 partner companies under the Convergence@NCH feature of the helpline. The data deluge requires enormous resources for standardization, interpretation, communication and presentation, and poses challenges, which are presented in the paper.
Suresh Misra, Deepika Sur
Big Data in the Context of Smart Cities: Exploring Urban Planning and Governance
Abstract
Big data are generated and employed for many ends, including governing societies, managing organizations, leveraging profit, and regulating places (Kitchin, 2014:165). In the last few decades, cities have embraced ICT as a principal component of their development strategies. The latest incarnation of such an ICT-led vision of urban development is the notion of smart cities, which conceives of places increasingly composed of and monitored by pervasive computing, whose economy and governance are driven by knowledge, innovation and entrepreneurship, enacted by smart people in smart ways (ibid.: 123). Against this backdrop, this chapter seeks to provide a synoptic, conceptual, and critical analysis of the role of big data in the rise of smart cities. It provides a descriptive account of the potential and opportunities that big data offers for urban planning and city development. In doing so, the chapter discusses the enabling technologies adopted by national, municipal, and local governments in the smart city. In a nutshell, the chapter highlights the role of big data in promoting urban governance.
Sushma Yadav, Gadadhara Mohapatra
Crowd Sourcing for Municipal Governance
Abstract
This paper examines the incidence and impact of crowd sourcing in the urban governance in India. It is noted that intergovernmental consensus and cooperation for crowd sourcing for municipal governance have produced a set of models. Yet the scope of expansion for crowd sourcing as a tool to improve municipal systems and citizen centricity is fairly wide. In this regard, the role of city government as a mother institution in the sector is crucial. It is also noted that mobile users hold the key to enhance the citizen access to crowd sourcing. It is worth noting that city-specific use of crowd sourcing in the areas of accounting and revenue administration, governance and management of roads and related services, solid waste and assets has improved governance significantly in terms of decentralization, accountability, transparency, equity, efficiency and civic participation. Finally focus areas are identified for intergovernmental agenda on use of crowd sourcing for municipal governance.
K. K. Pandey
Effective Business Development for In-Market IT Innovations with Industry-Driven API Composition
Abstract
Businesses around the world are recognizing the need for continuous in-market IT innovations to differentiate their products and services from the competition and drive business success. However, they find it difficult to invest in and incorporate innovative solutions because, by definition, the innovations have not been field tested enough to have become off-the-shelf offerings with a proven cost-versus-benefit business case. On the other hand, the providers of innovative solutions also face the problem of communicating business value while making their technology available for demonstration: they do not have access to proprietary client data and business processes with which to estimate their technology’s impact on the client’s dynamic environment, and they cannot part with the solution without adequate compensation for the intellectual property in their innovations. To resolve these twin issues, in this paper, we propose a framework to develop and incorporate innovative solutions in the context of industry business processes and metrics using (Web) API composition. We have implemented this approach in a prototype, and early experience shows that it is able to expedite business development for in-market innovations.
Biplav Srivastava, Malolan Chetlur, Sachin Gupta, Mitesh Vasa, Karthik Visweswariah
The Data that Get Forgotten
Abstract
Progress evolves. Like mortality, it has a unique timeline. New technology can enhance previous methodology and bring new insight to research; new interpretations build on the shoulders of what has already been learned or divined. In all matters pertaining to science and research, we move onwards and upwards and the arrow of time is always assumed to be pointing forwards. Nevertheless, there are occasions when critical information can only be gleaned from the past, when the inclusion of heritage data is pivotal to a research problem. Modern data management is not on our side in this respect, and the matter is growing more serious daily. In this chapter, we (a) examine the present state of affairs regarding those heritage data which deserve and need far more attention than they have been afforded until now, (b) outline some of the science that can only be made possible by including past observations along with new ones, and (c) discuss how we might focus a global effort on this problem before it is too late.
Elizabeth Griffin
Big Data and Predictive Analytics: A Facilitator for Talent Management
Abstract
Big data has become a business buzzword. This paper explores the role and significance of big data in talent management and delineates various challenges that HR professionals face in using big data. The convergence of talent and technology is bringing a paradigm shift in trends pertaining to the management of human resources, and the emergence of big data and predictive analytics is reverberating across the human resource domain. Today the workforce in organizations is multigenerational, multicultural, and multilingual, and the arrival of Gen Y and Gen Z has posed diverse challenges to HR professionals. Customizing policies and opportunities to cater to the needs of a diverse workforce has made technology an indispensable part of HRM strategy. The data-driven, changing business environment requires correct analytics for business survival and sustainability, and human resource professionals are ready to embrace the big data revolution in HR. A Towers Watson survey of more than 1,000 organizations conducted last year found that human resource data and analytics are among the top three areas of HR technology. Analytics can be adopted as a strategic tool for the enrichment, empowerment, and engagement of employees, and recruitment and retention can also be improved to win the war for talent. Business strategy realization depends on the available pool of talent; human capital brings business strategies to life. The role of human capital professionals is transforming into that of strategic business partners and change drivers. Talent management insights based on big data can be a big business differentiator, and the right human capital management can prove a competitive advantage for organizations. Manpower analytics provides agility to organizations, which results in their survival and growth. This paper attempts to throw light on the application of big data and predictive analytics tools to human resource management.
Inclusion of big data analytics can be a big deal for human capital management: big data, along with predictive analytics, can bring many opportunities. Discussions at international conferences and seminars helped in developing the content of the paper, and insights from various articles on the topic have also been incorporated. Key learnings are also derived from white papers of reputed organizations.
Neetu Jain, Maitri
Privacy Preserving Data Mining Techniques for Hiding Sensitive Data: A Step Towards Open Data
Abstract
Privacy Preserving Data Mining (PPDM) is an area that deals with data mining techniques that allow data to be mined while keeping its privacy intact. PPDM comes into play when different parties want to engage in collaborative data mining and wish to collectively mine their data to extract knowledge while keeping their data private. Preserving the privacy of data is also of utmost importance when publishing any data as open data. This study includes a detailed review of some of the recent techniques proposed in this area. The biggest challenge in devising any privacy preserving data mining technique is to achieve a good balance between the privacy and the utility of the privacy-preserved data.
Durga Toshniwal
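The abstract does not name the techniques reviewed. To make the privacy-utility trade-off concrete, here is a minimal Python sketch of one classic PPDM family, data perturbation by additive random noise, chosen as an assumption rather than taken from the chapter. Each value of a hypothetical sensitive attribute receives zero-mean Gaussian noise; the mean absolute change per record serves as a crude privacy proxy, and the error in the dataset mean as a crude utility loss. The function names and synthetic data are hypothetical.

```python
import random
import statistics


def perturb(values, noise_scale, seed=0):
    """Return a copy of `values` with zero-mean Gaussian noise added to each entry."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_scale) for v in values]


def privacy_utility(values, noise_scale, seed=0):
    """Measure a crude privacy proxy and a crude utility loss for one noise level.

    privacy proxy: mean absolute distortion of individual records (higher = more hidden)
    utility loss:  absolute error in the aggregate mean (lower = more useful)
    """
    noisy = perturb(values, noise_scale, seed)
    distortion = statistics.fmean(abs(a - b) for a, b in zip(values, noisy))
    mean_error = abs(statistics.fmean(noisy) - statistics.fmean(values))
    return distortion, mean_error


if __name__ == "__main__":
    # Hypothetical sensitive attribute: 10,000 synthetic incomes.
    data_rng = random.Random(42)
    incomes = [data_rng.uniform(20_000, 80_000) for _ in range(10_000)]
    for scale in (0, 1_000, 10_000):
        d, e = privacy_utility(incomes, scale)
        print(f"noise={scale:>6}: per-record distortion={d:9.1f}, error in mean={e:7.1f}")
```

Running it should show the per-record distortion growing roughly in proportion to the noise scale while the error in the mean stays comparatively small, because zero-mean noise largely cancels in aggregates. Additive noise of this kind carries no formal privacy guarantee; it only illustrates the balance the abstract describes.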
Role of Credible Data in Economic Decision Making
Abstract
Economic statistics play a critical role not only in understanding and tracking the economy but also in framing policy. The careful and transparent collection, reporting and analysis of credible economic data are essential to evidence-based policy making. It is important to analyze precisely what is going on in the economy, and statistics play a key role not only in analyzing problems but also in helping policy makers take suitable measures to address complex issues, set targets, and use indicators for monitoring and evaluation. However, in recent times, the credibility of economic data published by the government and other statistical agencies is increasingly being questioned, on the grounds that the data put out in the public domain is not factually correct and does not follow standard international practices in its compilation. Hence, this chapter attempts not only to highlight the importance of compiling credible data for effective decision making but also to identify data gaps, mainly in economic indicators, and recommends various measures that the government and other agencies would need to follow to ensure that the data compiled is credible and usable and leads to authentic results. The study also brings in the experiences of best practices followed in various countries that could be utilized in India for effective data management.
Geethanjali Nataraj, Ashwani Bishnoi
Big Data in the International System: Indian, American, and Other Perspectives
Abstract
The paper engages the reader with the objective of elaborating upon the nature and content of, and the attendant debates and situations related to, the cyber world. It attempts a phantasmagorical elucidation of the impact of the simulacra of Jean Baudrillard, the French philosopher. The realism-ordained nature of the legal, political and technological aspects of the larger cyber world is pithily brought out in order to understand the ground-zero scenario of the matrix of simulations and social media taken together. The constitutional aspects of the American situation are brought out tersely, balancing the liberty and individual-rights argument of American dissenters against the argument of the establishment, which seeks to strengthen its national security standpoint. As a backgrounder, the paper develops the phantasmagorical approach to the matrix-like cyber world and the social media space. It does not offer solutions, but attempts a delineation of various themes, such as anti-trolling legislation in the USA and the cinema narrative that emboldens the narrative of big data, along with a simple enumeration and elaboration of the idea and fact of big data.
Manan Dwivedi
Evolving an Industrial Digital Ecosystem: A Transformative Case of Leather Industry
Abstract
Advances in information technology have made it easier for subject experts to deliver e-content and to publish on the Web with minimal technical skills and assistance. Developing multimedia e-Learning content is no longer a labour-intensive process requiring a team of Web designers and developers responsible for the technical development of these resources, a requirement that once limited its widespread adoption. Tools such as Adobe Captivate, Camtasia, Snap, Articulate and Lectora provide an easy way of developing e-content. This exercise triggered the idea of creating a knowledge system for an industry, namely the leather industry. Digital libraries are large organized collections of digitized objects comprising texts, images, graphs and data, which in new technical terms are referred to as “Big Data”. Well-defined digital software has the potential to disseminate this information worldwide through Internet technologies. The emergence of the World Wide Web makes unprecedented volumes of data freely available to the entire society, which helps in building a domain-specific knowledge portal. The digital economy (also called the Internet economy, the new economy, or the Web economy) is an economy based on digital technologies, including communication networks (the Internet, intranets, and extranets), computers, software, and other related technologies. Digital infrastructures provide a global platform over which people and organizations interact, communicate, collaborate, and search for information. Creating new sources of revenue, rationalizing cost structures, and enhancing the speed of technology adaptation are considered the three wheels of digital ecosystems. Knowledge management is about creating an environment where information can be readily shared. This information includes not just text but also unit-level data generated by research endeavours. Creating a learning organization culture is critical.
The Web can now integrate learning and mission-critical business applications while delivering timely knowledge to each desktop. The end result is a knowledge management structure that includes an inventory of knowledge objects and a system through which these can be shared. This approach paper illustrates content development for a few aspects of the leather industry, taking advantage of available tools, and schematically portrays a knowledge portal for the industry. The paper considers a knowledge management system (KMS) and IT tools for the leather and allied industries, and the networking of academic institutes with industry to create a knowledge hub and technology innovation centre using a digital ecosystem, eventually leading to digital transformation.
Latha Anantharaman, M. R. Sridharan
Applying Big Data Analytics in Governance to Achieve Sustainable Development Goals (SDGs) in India
Abstract
One of the chief obligations of governance in democratic countries is creating and co-creating value in public service delivery, the process of which is undertaken in a participatory manner so as to ensure an accountable, responsive, and transparent ecosystem (Good Governance, The United Nations Economic and Social Commission for Asia and the Pacific—UNESCAP). To translate these good governance ideologies into practical implementation, it is necessary to ensure the achievement of development targets defined as Sustainable Development Goals such as no poverty, zero hunger, and complete gender equality (SDGs, UNDP, 2015). Further, to achieve these development targets in a sustained manner, converged governance efforts are required at the grassroots, which in turn would inevitably result in the generation of continuous baseline data. This colossal amount of data thus generated at the grassroots when coupled with unstructured citizens’ data generated through other digital devices holds immense potential to revolutionize governance processes by providing a foundation for data-backed decision making. Hence, such structured baseline data and unstructured citizens’ data must be continuously combined and analyzed by application of big data analytics and other emerging ICTs (information and communication technologies).
Charru Malhotra, Rashmi Anand, Shauryavir Singh
Open Data: India’s Initiative for Researchers, Research, and Innovation
Abstract
The idea of open data, in conjunction with the idea of Open Government Data, has gained a lot of prominence. Open data is freely available to anyone to use, reuse, and redistribute, whether commercially or non-commercially. It has the potential to increase transparency and accountability and to promote greater public participation for the social and economic growth of the country. Recognizing the benefits of Open Government Data, the Government of India introduced the National Data Sharing and Accessibility Policy (NDSAP) in 2012 and created the Open Government Data Platform (https://data.gov.in) for the proactive release of shareable government data. The platform is open to the public to explore the potential of the data and innovate with it using APIs and a visualization engine. This initiative has reached a stage where more than one lakh (100,000) open datasets have been published for exploration by data-hungry users. Besides its many benefits to various stakeholders, open data has huge potential for public research and private innovation. Opening up data for researchers removes many unproductive barriers and avoids duplication of effort. It provides greater access to data, accelerates the pace of discovery, helps researchers work smarter and faster by sharing and analyzing data, enhances the visibility of one’s research, and much more. Given shrinking budgets and increasing research costs, and with the objective of validating new techniques, calibrating computer models, and showing comparative progress, the availability of good open datasets becomes critical and is the need of the hour. As scientific research becomes increasingly data-driven and expensive, the potential of open data should be tapped to enhance the efficiency and quality of research by reducing the costs of data collection and by facilitating the exploitation of dormant or inaccessible data at low cost. Against this backdrop, the paper endeavors to explore the potential of open data for research.
Neeta Verma, Alka Mishra
Data Management, Sharing and Services: Issues of Attitude Towards Data Citation and Role of Data Stakeholders
Abstract
With advances in information technology, instances of researchers allowing others to reuse or refer to data that they were the first to use and publish have grown over time. Contributing as a data provider brings opportunities as well as challenges, and these have affected the pace of growth of data sharing in the public domain.
O. P. Wali
UK-India Research and Innovation Collaboration: Taking Forward Collaboration on Big Data and E-infrastructure
Abstract
We are living in an era of ‘Big Data’, that is, data sets so large, complex and rapidly generated that they cannot be processed by traditional information and communications technologies. In 2012, the USA announced the National Big Data Research and Development Initiative to address the challenge and opportunity of ‘Big Data’. In the UK, the Big Data Institute at the Li Ka Shing Centre for Health Information and Discovery was launched by Prime Minister David Cameron in May 2013.
Nafees Meah
The Urgent Need to Overcome a Generation Gap in Public Data Management Practices for Digitalized India
Abstract
It is a no-brainer in today’s knowledge-driven economy that data is king [1]. In this context, India has been collecting data for decades in areas as diverse as agriculture, weather, energy, rivers, health, and space. However, its data management practices are at least a generation behind where they need to be to match the global state of the art and deliver much-needed economic results.
Biplav Srivastava
Data Security in Cloud-Based Applications
Abstract
Global business challenges and expeditious growth of Internet services especially in the last decade have driven business organizations to seek an emerging technology for doing business.
Surabhi Pandey, G. N. Purohit, Usha Mujoo Munshi
Backmatter
Metadata
Title
Data Science Landscape
Editors
Dr. Usha Mujoo Munshi
Dr. Neeta Verma
Copyright Year
2018
Publisher
Springer Singapore
Electronic ISBN
978-981-10-7515-5
Print ISBN
978-981-10-7514-8
DOI
https://doi.org/10.1007/978-981-10-7515-5
