Introduction
Driven by globalisation and increasing market competition, various industries have turned to big data analytics (BDA) for its ability to transform enormous volumes of raw data into inputs for decision-making [
1]. BDA consists of a set of advanced analytical techniques adapted from related fields, such as artificial intelligence, statistics, and mathematics, which are used to identify trends, detect patterns, and unveil hidden knowledge from a huge amount of data [
2]. This technology has been applied in different fields, including finance [
3], insurance [
4], and cyber security [
5], to name a few. The emergence of BDA can be linked to the inability of traditional database management tools to handle structured and unstructured data simultaneously [
Structured data refer to data that have a schema, metadata, rules, and constraints to follow, whilst unstructured data have either no structure at all or an unknown structure [
7]. These types of data are collected or received from diverse platforms, such as network sensors, social media, and the Internet of Things.
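The distinction between structured and unstructured data can be made concrete with a small sketch. The schema, field names, and example records below are purely illustrative assumptions, not taken from the cited works:

```python
# Illustrative sketch: structured data carry a schema with rules and
# constraints to follow, while unstructured data (e.g. a free-text social
# media post) have no predefined structure to validate against.

STRUCTURED_SCHEMA = {"sensor_id": str, "reading": float}  # hypothetical schema

def conforms(record: dict, schema: dict) -> bool:
    """Check that a record has exactly the fields and types the schema demands."""
    return (record.keys() == schema.keys()
            and all(isinstance(record[k], t) for k, t in schema.items()))

structured = {"sensor_id": "s-01", "reading": 21.5}        # from a network sensor
unstructured = "Loved the new product, shipping was slow"  # from social media

print(conforms(structured, STRUCTURED_SCHEMA))  # True: the schema validates it
# The free-text post has no schema; analytics must first extract structure
# (e.g. tokens) before traditional database tooling can handle it.
print(unstructured.split()[:3])
```

Traditional database management tools assume the first kind of input; BDA techniques are needed precisely because both kinds must be handled at once.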
Although it is vital to exploit structured and unstructured data for BDA, they are usually incomplete, inaccurate, inconsistent, and vague or ambiguous, which could lead to false decisions [
8‐
11]. Salih et al. [
12] and Wamba et al. [
13] have highlighted the lack of data quality mechanisms being applied in BDA prior to data usage. Several studies have considered the potential of data quality for BDA application [
14‐
18], yet specific questions about what drives the dimensions of data quality remain unanswered. Moreover, studies on data quality and BDA are still underway and have not reached a good level of maturity [
7]. Thus, there is an urgent need to conduct an in-depth study on data quality to determine the most important dimensions for BDA application.
Several theories or models for understanding data quality problems have been suggested, such as resource-based theory (RBT), organisational learning theory (OLT), firm performance (FPER), and data quality framework (DQF). However, these theories or models do not fit into BDA application since they concentrate primarily on service quality as opposed to data quality [
19]. Moreover, most studies related to BDA are focused on the perspective held at the organisational or firm level [
8,
10,
20,
21] and studies focusing on the individual perspective are lacking. Since academics are encouraged to participate in research on pedagogical support for teaching about BDA [
22], this study has determined that university students can represent perspectives at the individual level. Students were chosen because it is crucial to prepare and expose them to BDA, especially in a mandatory setting [
23].
Meanwhile, numerous traits have been studied to explain the characteristics of big data, such as 3Vs [
24], 4Vs [
25], 5Vs [
26,
27], 7Vs [
28], 9Vs [
29], 10Vs [
30], 10Bigs [
31], and 17Vs [
These competing attempts to enumerate ever more characteristics of big data show the lack of a uniform consensus regarding the core characteristics of big data [
33]. Although big data characteristics and data quality are viewed as distinct domains, several studies have found that these two domains are interconnected and closely related [
9,
14,
17]. A better understanding of the core characteristics of big data and the dimensions of data quality is needed. Hence, this study seeks to expand the knowledge on big data characteristics, hereafter known as big data traits (BDT), and data quality dimensions (DQD), as well as to explore how they could affect the application of BDA.
Literature review
Big data and analytics are two different fields that are widely used to exploit the exponential growth of data in recent years. The term ‘big data’ represents a large volume of data, while the term ‘analytics’ indicates the application of mathematical and statistical tools on a collection of data [
34]. These two terms have been merged into ‘big data analytics’ to represent various advanced digital techniques that are formulated to identify hidden patterns of information within gigantic data sets [
35,
36]. Scholars have suggested varying definitions for BDA. For instance, Verma et al. [
23] defined BDA as a suite of data management and analytical techniques for handling complex data sets, which in turn lead to a better understanding of the underlying process. Faroukhi et al. [
37] defined BDA as the process of analysing raw data to obtain information that is understandable to humans but hard to observe using direct analysis. Davenport [
38] simply defined BDA as a “focus on very large, unstructured and fast moving data”.
Nowadays, BDA application has helped numerous organisations improve their performance because it can handle problems instantly and assist organisations in making better and smarter decisions [
35,
39]. The advantages of BDA application for organisational performance have been proven by numerous studies. For instance, Mikalef et al. [
20] found four alternative solution configurations surrounding BDA that can lead to higher performance, whereby different combinations of BDA resources play a greater or lesser role in organisational performance. Similarly, Wamba et al. [
40] applied the RBT and sociomaterialism theory to examine organisational performance. Their empirical work showed that the hierarchical BDA has both direct and indirect impacts on organisational performance. Based on this same set of views, Wamba et al. [
13] highlighted the importance of capturing the quality dimensions of BDA. Their findings proved the existence of a significant relationship between the quality of data in BDA and organisational performance.
Some scholars perceive data quality as equivalent to information quality [
41‐
44]. Data quality generally refers to the degree to which the data are fit for use [
45]. Meanwhile, the concept of information quality is defined as how well the information supports the task [
46]. Haryadi et al. [
14] asserted that data quality is focused on data that have not been analysed, while information quality is focused on the analysis that has been done on the data. This study, however, opines that data quality should concern the fitness and appropriateness of data both before and after analysis, such that the data meet the requirements of organisations [
12].
The notion of quality represents a multidimensional construct, whereby it is essential to combine its dimensions and express them in a solid structure [
46]. Initially, Wang and Strong [
45] used factor analysis to identify DQD and found 179 dimensions that were eventually reduced to 20. Then, they organised these dimensions into four primary categories, namely intrinsic, contextual, representational, and accessibility. The intrinsic category denotes datasets that have quality in their own right, while the contextual category highlights that data quality must be considered within the context of the task at hand. The representational category describes data quality in relation to the presentation of the data, and the accessibility category emphasises the importance of computer systems that provide access to data [
18]. Each category has several dimensions that are used as specific data quality measurements. For instance, accuracy and objectivity are the dimensions in the intrinsic category, while relevance and timeliness are the dimensions in the contextual category. Interpretability and understandability are the dimensions in the representational category, and access security and ease of operations are the dimensions in the accessibility category. Table
1 presents all DQD according to their categories.
Table 1
DQD and their categories [
17]
Intrinsic: Accuracy, Objectivity, Believability, Reputation
Contextual: Value-added, Relevancy, Timeliness, Completeness, Appropriate amount of data
Representational: Interpretability, Understandability, Concise representation, Consistent representation
Accessibility: Accessibility, Access security, Ease of operations
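To illustrate how such dimensions serve as specific data quality measurements, the sketch below computes two contextual dimensions, completeness and timeliness, over hypothetical records. The field names, dates, and freshness threshold are assumptions for illustration only:

```python
from datetime import datetime, timedelta

# Hypothetical sketch: turning contextual DQD (completeness, timeliness)
# into concrete ratio measurements over a small set of records.

records = [
    {"id": 1, "value": 10.0, "updated": datetime(2024, 1, 10)},
    {"id": 2, "value": None, "updated": datetime(2024, 1, 1)},
    {"id": 3, "value": 7.5,  "updated": datetime(2024, 1, 9)},
]

def completeness(rows, field):
    """Share of rows where the field is present (non-null)."""
    return sum(r[field] is not None for r in rows) / len(rows)

def timeliness(rows, now, max_age):
    """Share of rows updated within the allowed age window."""
    return sum(now - r["updated"] <= max_age for r in rows) / len(rows)

now = datetime(2024, 1, 11)
print(round(completeness(records, "value"), 2))               # 2 of 3 rows -> 0.67
print(round(timeliness(records, now, timedelta(days=5)), 2))  # 2 of 3 rows -> 0.67
```

Other dimensions (e.g. accuracy against a reference source, or believability via provenance checks) would need richer metadata, which is part of why operationalising DQD for big data remains an open problem.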
Various studies have been conducted to analyse the relationships between DQD and BDA application. For instance, Côrte-Real et al. [
8] analysed the direct and indirect effects of DQD on BDA capabilities in a multi-regional survey (European and American firms). Their findings showed that the DQD, primarily completeness, accuracy, and currency, have significant effects on BDA capabilities when process complexity was low. Thus, these authors have demonstrated the emergent need for firms to have effective data quality mechanisms to be able to derive sufficient value from BDA application. Ghasemaghaei and Calic [
47] used OLT and the DQD compiled by Wang and Strong [
45] to explain the effect of BDA on data quality categories. They found that while many organisations have invested in BDA application, they need to pay more attention to the quality of their data in order to enhance the quality of the solutions. Meanwhile, Ji-fan Ren et al. [
48] examined the quality dynamics (system quality and information quality) in BDA using business value and FPER theories. Their study revealed that system quality can enhance information quality, which in turn, would affect organisational values and performance in the BDA environment. While these studies offer insights into the relationship between DQD and BDA, they have not highlighted the critical DQD that could impact BDA application.
DQD are also associated with the characteristics of big data, which are commonly known as big data traits (BDT). The BDT were originally defined by 3Vs (volume, velocity, and variety) [
24]. These traits have been extended over the years, which include 4Vs (volume, velocity, variety, and value) [
25], 5Vs (volume, velocity, variety, value, and veracity) [
26,
27], 7Vs (volume, velocity, variety, veracity, validity, volatility, and value) [
28], 9Vs (veracity, variety, velocity, volume, validity, variability, volatility, visualisation, and value) [
29], 10Vs (volume, value, velocity, veracity, viscosity, variability, volatility, viability, validity, and variety) [
30], 10Bigs (big volume, big velocity, big variety, big veracity, big intelligence, big infrastructure, big service, big value, and big market) [
31], and 17Vs (volume, velocity, value, variety, veracity, validity, volatility, visualisation, virality, viscosity, variability, venue, vocabulary, vagueness, verbosity, voluntariness, and versatility) [
32].
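The original 3Vs can likewise be given simple operational readings. The sketch below quantifies volume, velocity, and variety for a hypothetical batch of events; the event fields and the chosen metrics are illustrative assumptions, not definitions from the cited taxonomies:

```python
# Hedged sketch: one possible quantification of the original 3Vs
# (volume, velocity, variety) for a hypothetical stream of events.

events = [
    {"source": "sensor", "bytes": 120, "t": 0.0},
    {"source": "social", "bytes": 480, "t": 0.5},
    {"source": "iot",    "bytes": 200, "t": 1.0},
    {"source": "sensor", "bytes": 150, "t": 1.5},
]

volume = sum(e["bytes"] for e in events)                 # total data size
span = events[-1]["t"] - events[0]["t"]                  # seconds covered
velocity = round(len(events) / span, 2) if span else 0.0 # events per second
variety = len({e["source"] for e in events})             # distinct source types

print(volume, velocity, variety)  # 950 2.67 3
```

The later extensions (value, veracity, variability, and so on) add traits that are harder to measure directly, which helps explain why no uniform consensus on the core characteristics has emerged.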
Several studies have investigated the influence of BDT on DQD. Noorwali et al. [
49] argued that there is a lack of scientific understanding of the general and specific requirements of BDT and DQD. They suggested that a more systematic analysis for both BDT and DQD is essential for reducing the number of missing quality requirements while accounting for BDT. Likewise, Lakshen et al. [
50] highlighted various technical challenges that must be addressed before the potential of BDT and DQD can be fully realised. Haryadi et al. [
14] confirmed that BDT and DQD are the central issues for implementing BDA.
Discussion
The present study has explored the BDT and DQD constructs for BDA application. The findings showed that the accessibility category of DQD (ease of operation) can significantly influence BDA application. This result suggests that the ease of obtaining data plays an important role in providing users with effective access, thereby reducing the digital divide in BDA application endeavours. This result is corroborated by the findings of Zhang et al. [
61], who considered that the ease of functional properties would help ensure the quality of BDA application. Janssen et al. [
9] similarly proposed that the easier BDA is to operate, the more application systems can be integrated and made sufficient for handling this technology.
Akter et al. [
57] found a significant influence of DQD (completeness, accuracy, format, and currency) on BDA application. In contrast, the results of this study showed that accuracy, believability, completeness, and timeliness have no significant influence on the decision to apply BDA. These results were unexpected. A possible explanation is that the respondents were novice users, who assumed that technical teams would be available to solve any accuracy, believability, completeness, and timeliness problems in BDA application.
Meanwhile, the four indicators of BDT (velocity, veracity, value, and variability) showed a significantly high impact on all constructs of DQD (accuracy, believability, completeness, timeliness, and ease of operation). These findings agree with the results obtained by Wahyudi et al. [
17], whereby a high correlation was found between BDT and both timeliness and ease of operation. The significant influence of BDT on DQD is an interesting result, demonstrating that users recognise the importance of BDT for assessing data quality. This observation agrees with Taleb et al. [
62], who claimed that BDT could drive quality evaluation management to achieve quality improvements. The findings also showed that, while many researchers have proposed numerous BDT, in this context velocity, veracity, value, and variability are the most critical for assessing data quality in BDA application.
Conclusion
This study has presented practical implications based on perspectives at the individual level. Individual perspectives are imperative since resistance to using technology commonly originates from this level of users. Hence, the results of this study may be beneficial for organisations that have not yet decided to implement BDA; they could use the results to gauge the possibilities of embracing this technology. This study has also shown theoretical implications based on the incorporation of BDT as a single construct and DQD as an underpinning theory for the development of a new BDA application model. This study is the first to investigate the influence of BDT and DQD on BDA application by individual-level users.
Several limitations apply to the interpretation of the results in this study. First, the intrinsic and contextual data quality categories are inadequately specified by the DQD included in the proposed model. Future studies may include other DQD, such as objectivity and reputation, to represent the intrinsic category; meanwhile, value-added, relevancy, and appropriate amount of data can be used for measuring the contextual category. Second, the chosen sample of undergraduate students with knowledge of BDA was insufficient to generalise individual-level perceptions towards BDA application. Hence, future studies could include more experienced respondents, such as lecturers or practitioners. Third, although the sample size was statistically sufficient, a larger sample may be useful to reinforce the results of this study. Finally, although this study has attempted to bridge the gaps between BDT and DQD, future studies are encouraged to explore other constructs for a better understanding of BDA application. For instance, future studies could explore the role of security and privacy concerns in BDA application since data protection is becoming more crucial due to recent big open data initiatives. Therefore, a novel BDA application model that can address security and privacy concerns may be worth exploring. Overall, the findings of this study have contributed to the body of knowledge in the BDA area and offered greater insights for BDA application initiators.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.