Open Access 10.04.2024 | Computer Assisted Approaches in Production Engineering

Machine learning implementation in small and medium-sized enterprises: insights and recommendations from a quantitative study

Written by: Peter Burggräf, Fabian Steinberg, Carl René Sauer, Philipp Nettesheim

Published in: Production Engineering

Abstract

Machine learning (ML) offers high potential in the manufacturing industry. For example, the effectiveness of quality prediction and evaluation can be greatly improved using ML, which can generate significant competitive advantages. However, the potential of ML is not fully exploited by small and medium-sized enterprises (SMEs). A quantitative empirical study was conducted with 60 companies from different industry sectors to determine when SMEs are more likely to use ML. Here, it is shown that the willingness to invest in applications is a substantial factor for the implementation of ML. Also, the availability of sufficient high-quality data within the SME is imperative for applying ML. Furthermore, recommendations for action are derived to close the technology adoption gap in SMEs and to leverage the benefits of ML.
Notes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

Machine learning (ML), as a subfield of artificial intelligence (AI), continues to grow in importance [1]. The consistent growth potential of ML is closely linked to the relentless increase in data volumes [2]. In 2012, the global data volume generated amounted to 6.5 zettabytes. By 2020, this figure had surged to 64.2 zettabytes, reflecting a persistent upward trend [3]. Data-driven intelligence has revolutionized the way companies do business in recent years. Informed data-driven decisions are being made and, with the help of Big Data, more and more information exists that can be processed [4]. In addition, hardware for ML has become more affordable [5]. There are many reasons for the significant increase in ML applications and their greater prevalence in industrial and scientific environments today compared to a few years ago [6].
However, despite these positive developments, real-world applications of ML in manufacturing are common in large enterprises but less frequent in small and medium-sized enterprises (SMEs) [7]. Companies with fewer than 500 employees use ML four times less frequently than companies with more than 500 employees, because SMEs often fail to apply ML technologies due to insufficient ML know-how [7]. Large companies, in contrast, typically focus on product development and process optimization using data-driven analytics [8]. For SMEs, it is often challenging to implement ML in the long term and to leverage its benefits. For example, a study by Thiée (2021) found that 22% of small companies do not deal with the topic of ML at all, only 10% have already implemented ML, and just 29% regard ML as an opportunity for better innovation or further development of existing products [8]. This highlights the disparity in ML adoption and the need for targeted strategies to support SMEs in embracing ML technologies. Despite this reserved attitude toward ML, two out of five SMEs are nevertheless starting to plan digitization projects [8]. Overall, it is apparent that SMEs have a deficit in the use of ML: they are still not sufficiently concerned with the topic and are not yet exploiting its full potential, as many large companies do [9]. This matters especially in Germany, where the performance of SMEs is responsible for the country's economic success and 99% of companies are small or medium-sized [10].
While the relevance of ML in the manufacturing industry is unquestioned, its application in small and medium-sized enterprises (SMEs) lags behind its potential. This research gap is particularly significant as SMEs form the backbone of the German economy [9]. Previous studies have explored the application of ML in SMEs, focusing mainly on identifying barriers and success factors. For example, Bauer et al. (2020) have already conducted an empirical study identifying the enablers and success factors of ML in SMEs [7]. Pazhayattil and Konyu-Fogel (2023) have used this methodology to confirm five factors delaying the implementation of AI in the pharmaceutical industry, and Jayashree et al. (2021) were able to propose a framework that significantly influences the implementation of Industry 4.0, aiming for economic, social and environmental sustainability [11, 12]. Furthermore, studies have mainly focused on surveys to investigate and quantify the use of ML. For example, according to a recent Bitkom study only 9% of companies in Germany are using AI, while 25% of the business community is planning or discussing the use of AI [13].
However, there is a research gap regarding the in-depth analysis of specific factors that influence the likelihood of ML implementation in SMEs. This gap is particularly crucial, considering the potential of effective ML technologies to provide SMEs with a competitive edge in the digital economy. Therefore, our study not only identifies the influencing factors but also quantifies their impact on the likelihood of ML implementation, extending beyond previous studies to examine these specific factors. Our approach utilizes a statistical model to evaluate the strength and significance of these factors, enabling us to offer more precise recommendations for SMEs. This study contributes to narrowing the gap between the theoretical recognition and practical implementation of ML in SMEs, highlighting key elements such as investment willingness and data availability, thereby offering a new perspective on the challenges and opportunities ML presents to SMEs.
To bridge this research gap, the empirical study aims to investigate the following research question: What are the influencing factors regarding the implementation probability of ML in SMEs? To conduct the empirical study, 60 SMEs from different industry sectors were interviewed using an online survey. The survey results were then statistically analyzed using multiple linear regression (MLR) to gain insights into the influencing factors regarding the implementation probability of ML in SMEs. Here, our research primarily discovered that the influencing factors machine learning importance, the willingness to pay, and data mining readiness increase the likelihood of ML implementation in SMEs, while the factor machine learning experience has no significant impact on the likelihood of implementation. This study emphasizes the importance of a preliminary and fundamental identification of pain points and the development of alternative solutions, advocating for the implementation of ML only when it provides tangible added value and is the optimal response to a specific business challenge.
This paper is organized as follows. Section 2 formulates the hypotheses whose factors serve as predictors for the research question. Section 3 introduces the study design and presents the data, which were collected quantitatively through a web survey. Section 4 covers the validation of the residuals and of the measurement data to ensure the validity of the study. Section 5 presents and interprets the results of the empirical study. Section 6 discusses the results and, building on the regression analysis, derives recommendations for action for SMEs. Section 7 contains concluding remarks and a summary of the study.

2 Hypothesis development

According to Hair (2010), an empirical study typically commences with the formulation of hypotheses [14]. The hypotheses serve as the basis for planning and conducting the study and for analyzing and interpreting the data collected. In this section, four research hypotheses are presented to investigate the relationship between the implementation of ML in SMEs and four factors: machine learning experience, willingness to pay, data mining readiness, and machine learning importance.

2.1 Machine learning experience

The term ML, which is often used in combination with artificial intelligence, is associated with high complexity and uncertainty about the new techniques [15]. Out of respect for this complexity, the technology is in many cases not implemented, perhaps because the benefits appear to be smaller than the potential risks. To generate a realistic awareness of risks and opportunities, a profound understanding of the strengths and weaknesses of ML technology is necessary.
Two main kinds of experience are distinguished: internal and external. Internal experience mainly comprises existing experience with available data in the context of the respective ML application, which is also described as a success factor of central relevance for the implementation of further ML applications [16]. External experience describes the recourse to external knowledge bases. This serves as a common entry point into ML and is, consequently, equally crucial. It includes, for example, the exchange of know-how with non-competing SMEs and with companies that have specialized in this subject area [16]. Through in-company experience, SMEs can develop in-depth knowledge of best practices, challenges, effective strategies, and potential solutions, and they know which competencies and existing relationships can be leveraged for implementation. This knowledge can inform decision-making, guide the implementation process, and increase the likelihood of success. Given the different levels of experience in a company and the assumption that the use of ML is probably related to the knowledge available in a company, the following hypothesis arises:
H1
The better the machine learning experience (EXP), the higher the implementation probability of ML in SMEs.

2.2 Willingness to pay

Experiencing the beneficial value of ML, such as more efficient processes, increased productivity, or reduced costs, is not the sole prerequisite for ML implementation. It also requires an awareness that implementation cannot be achieved without financial expenditure, sometimes substantial, for which there must be a willingness to pay. In this study, willingness to pay was defined to provide a reference for investing in ML. Large companies, which have greater financial resources at their disposal, often invest more capital in ML technologies, which favors and reinforces their higher usage of ML in comparison to SMEs [17]. There is also a fundamental correlation between the costs of an innovation and the probability of introducing it [18]. The less cost-intensive the implementation is, the higher the probability of willingness to pay for it [19]. However, new technologies with potential often require an initial investment that yields returns only after a specific period of productive use. Accordingly, a company must be willing to make the necessary investment in new technology. If the willingness to pay does not exist, no new technology like ML will be implemented. The hypothesis is therefore as follows:
H2
The greater the willingness to pay (WTP), the higher the implementation probability of ML in SMEs.

2.3 Data mining readiness

In addition to the existing wealth of experience in a company and the willingness to pay for innovation, the ability to mine data is an important factor in the use of ML. Within this study, the data mining readiness of a company was defined as a combination of psychological and structural readiness. Here, psychological readiness describes the beliefs and attitudes of organizational members [20]. Psychological readiness is essential for generating and analyzing company data, while structural readiness is more significant in direct comparison [20, 21]. For example, a corresponding open-mindedness and willingness to deal with hurdles and improvements in the context of implementing machine learning are decisive [16]. Generating high-quality data therefore seems to be of crucial importance for the implementation of ML as well as for the meaningfulness of the results. Structural readiness, on the other hand, describes the existence of sufficient data volumes, an existing IT infrastructure, and the availability of tools, which are indispensable for the implementation of ML approaches [21]. Therefore, it can be inferred that data mining readiness has a positive impact on the implementation of technologies such as ML, and thus the third hypothesis is:
H3
The more comprehensive the data mining readiness (DMR), the higher the implementation probability of ML in SMEs.

2.4 Machine learning importance

For the study, machine learning importance was created as a separate factor that might influence the probability that SMEs implement ML. Here, the focus is on the added value that SMEs can expect from ML. This expectation can exist at the personal level and at the corporate level: the higher the expected added value, the greater the chances of a successful ML implementation might be. As soon as incentives in the form of success are missing, no willingness to deal with a topic can be expected [22]. Consequently, to enhance the likelihood of adoption, SMEs need to perceive the introduction of ML as promising and economically viable for their business [21]. It is therefore assumed that the subjectively perceived importance of ML influences the implementation of ML in SMEs. This results in the fourth hypothesis:
H4
The more strongly the machine learning importance (IMT) is perceived, the higher the implementation probability of ML in SMEs.

3 Field study on implementation probability

In the following, the methodology used to conduct the empirical study is presented. In line with the guiding interest to increase the implementation probability of ML in SMEs, a quantitative web survey was conducted in which practitioners from SMEs in Europe were questioned. The web survey provided access to a wide variety of companies in a very short time and quickly generated anonymous responses to our questionnaire. The design and focus of the empirical study, which is described in detail in the following, places the predominantly scientific-theoretical analyses in an application-oriented context with real problems from industry. Data collected through this survey were analyzed using MLR.

3.1 Study design

The online survey method was chosen to collect the data relevant to the research interest. The survey was created with the help of the SoSci-Survey tool. Based on extensive literature research, content-relevant questions were formulated that completely map the dependent variable (the criterion, implementation probability) and the four independent variables (the predictors): EXP, WTP, DMR, and IMT (see Table 1).
Table 1
Hypothesis model: the relationship of predictors
Research question: What are the influencing factors regarding the implementation probability of ML in SMEs?

| Hypothesis | Statement | Description |
| --- | --- | --- |
| Hypothesis 1 (H1) | Machine learning experience | The more pronounced the machine learning experience, the higher the implementation probability |
| Hypothesis 2 (H2) | Willingness to pay | The greater the willingness to pay, the higher the implementation probability |
| Hypothesis 3 (H3) | Data mining readiness | The greater the data mining readiness, the higher the implementation probability |
| Hypothesis 4 (H4) | Machine learning importance | The higher the perceived machine learning importance, the higher the implementation probability |

These formulated questions, hereafter described as items, were combined in a questionnaire (see Table 2). The questionnaire comprised 21 items, with 13 items relating directly to the above variables. The remaining eight items were intended to collect data on participants and companies, such as the age of the participant or the industry of the company. These are marked as PQ in Table 2. Predominantly 5-point Likert scales were used, since their multilevel nature makes personal attitudes measurable in detail and they have therefore become firmly established in science [23]. The items were assigned to the predictors to guarantee a manageable analysis. The items EXP1 (engaged ML), EXP2 (current use of ML), and EXP3 (ML competencies) measure the construct EXP; the items WTP1 (budget), WTP2 (external service), and WTP3 (internal staff) measure the construct WTP; the items DMR1 (employees), DMR2 (departments), and DMR3 (self-equipped) measure the construct DMR; and the items IMT1 (general importance), IMT2 (potential areas), and IMT3 (relevant use cases) measure the construct IMT. The target item IMPL measures the implementation probability of ML. Fig. 1 summarizes the initial overall framework, including the one dependent variable, the four predictors and the underlying hypotheses, as well as the various items.
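Written out, a multiple linear regression on this framework takes the general form below. The notation (coefficients β, error term ε, index i for respondents) is the conventional one and is assumed here rather than taken from the study. Hypotheses H1 to H4 correspond to the expectation that β1 through β4 are positive.

```latex
\mathrm{IMPL}_i = \beta_0 + \beta_1\,\mathrm{EXP}_i + \beta_2\,\mathrm{WTP}_i
                + \beta_3\,\mathrm{DMR}_i + \beta_4\,\mathrm{IMT}_i + \varepsilon_i,
\qquad i = 1, \dots, n
```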
Table 2
Items, hypotheses, and questions of the web survey

| Segment | Item | Question | Answer |
| --- | --- | --- | --- |
| ML experience | EXP1 (engaged ML) | How intensively have you already dealt with the topic of "Machine Learning/ML" for your company, for example by participating in training courses or workshops? | A: not at all; B: very little; C: partly; D: intensive; E: very intensive |
| ML experience | EXP2 (current usage ML) | In how many departments of your company do you already use ML? | A: 1 department; B: 2 departments; C: 3 departments; D: 4 departments; E: 5 or more departments |
| ML experience | EXP3 (competencies) | How do you assess your ML competencies? For example, have you already learned about ML or have you already had some project experience? | A: very low; B: low; C: moderate; D: high; E: very high |
| Willingness to pay | WTP1 (budget) | Is a budget amount set for the development or implementation of ML solutions? | A: up to € 15,000; B: over € 15,000 to € 25,000; C: over € 25,000 to € 50,000; D: over € 50,000 up to € 100,000; E: over € 100,000 to € 500,000; F: over € 500,000 |
| Willingness to pay | WTP2 (external service) | Will you invest in external service providers for the implementation of the ML solution in the future? | A: definitely not; B: rather unlikely; C: not yet decided; D: rather likely; E: definitely |
| Willingness to pay | WTP3 (internal staff) | Will you invest in more internal staff to implement ML solutions? | A: definitely not; B: rather unlikely; C: not yet decided; D: rather likely; E: definitely |
| Data mining readiness | DMR1 (employees) | How many employees in the company have the necessary qualifications for ML and are responsible for it? | A: 1 employee; B: 2 employees; C: 3 employees; D: 4 employees; E: 5 or more employees |
| Data mining readiness | DMR2 (departments) | How many departments in your company are involved in the introduction or expansion of ML? | A: no departments; B: one department; C: two departments; D: three departments; E: four or more departments |
| Data mining readiness | DMR3 (self-equipped) | How well do you see yourself equipped for the introduction or expansion of ML in your company? Please think here, for example, of personnel, IT equipment, or similar | A: very poor; B: poor; C: satisfactory; D: good; E: very good |
| ML importance | IMT1 (general importance) | How important is ML for your company? | A: not at all important; B: not important; C: not yet decided; D: important; E: very important |
| ML importance | IMT2 (potential areas) | How great is the potential in your company for the ML deployment? | Average of the ranges, from A: no potential at all to I: extremely high potential |
| ML importance | IMT3 (relevant use cases) | Which use cases do you see as relevant for your company? | A: use case 1; B: use case 2; C: use case 3; D: use case 4; E: use case 5 |
| Implementation probability | IMPL | Would you like to establish further ML solutions in your company in the next few years? For example, for the prediction of machine failures, anomaly detection in machine conditions, or the prediction of building quality? | A: not planned at all; B: not planned; C: not yet decided; D: planned; E: already initiated |
| Personal questions | PQ1 | How old are you? | A: under 20 years; B: 20–29 years; C: 30–39 years; D: 40–49 years; E: 50–59 years; F: 60–69 years; G: over 70 years |
| Personal questions | PQ2 | What is your highest educational qualification? | A: master craftsman/technician; B: university graduate; C: vocational training; D: doctorate; E: habilitation; F: no school-leaving qualification |
| Personal questions | PQ3 | What is your position within your company? | A: manager with personnel responsibility; B: manager without personnel responsibility; C: employee |
| Personal questions | PQ4 | How long have you been working in your company? | A: 0–5 years; B: 6–10 years; C: 11–15 years; D: 16–20 years; E: over 20 years |
| Personal questions | PQ5 | In which department within your company do you work? | [Free text] |
| Personal questions | PQ6 | How many employees does your company have? | A: less than 10 employees; B: 10–49 employees; C: 50–249 employees; D: 250–499 employees; E: over 500 employees |
| Personal questions | PQ7 | What is the turnover of your company? | A: € 0 to € 2 mil; B: over € 2 mil up to € 10 mil; C: over € 10 mil to € 50 mil; D: over € 50 mil |
| Personal questions | PQ8 | In which industry does your company operate? | A: automotive suppliers; B: electrical industry; C: precision mechanics and optics; D: aircraft and spacecraft construction; E: mechanical engineering and plant construction; F: metal industry; G: other |

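Table 3
Participants and company data

| Characteristic | Expression | Frequency |
| --- | --- | --- |
| Age (in years) | < 20 | 1 |
| | 20–29 | 8 |
| | 30–39 | 18 |
| | 40–49 | 13 |
| | 50–59 | 12 |
| | 60–69 | 3 |
| | > 70 | 1 |
| Company turnover (in million €) | < 2 | 14 |
| | 2–10 | 22 |
| | 10–50 | 20 |
| | > 50 | 0 |
| Employees (number) | < 10 | 9 |
| | 10–49 | 19 |
| | 50–249 | 28 |
| | > 250 | 0 |
| Industry | Automotive | 8 |
| | Electrical industry | 6 |
| | Machine and plant construction | 24 |
| | Metal industry | 3 |
| | Others | 15 |
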
An overview of the criterion, the predictors, and the associated questions is given in Table 2. According to Allen (2022), the formulated questions must additionally be reviewed by scientists to ensure content validity [24]. Content validity was verified by four scientists as well as four experts from industry.

3.2 Respondent profile

The data from the web survey was collected in the period from March 2023 to September 2023. Overall, there was a response rate of 20% of the international SMEs contacted, which amounts to a net sample size of 60 participants.
Table 3 provides an overview of the participant and company data of the present sample. These data show a broad diversification of the 56 companies across industrial sectors. Mechanical engineering and plant construction is the largest sector (43%), followed by automotive (14%), the electrical industry (11%), and the metal industry (5%) as the smallest sector. Further industries, such as the medical, financial, and mining industries, were combined under others (27%, i.e. 15 of 56 companies). The evaluated parameters also provide information on turnover and number of employees, both of which are within the definition of an SME: the general conditions of a maximum of 249 employees and a maximum turnover of 50 million euros are met. In summary, the responses collected represent a valid sample of SMEs covering a wide spectrum of industries.
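The sector shares follow directly from the frequencies in Table 3. A minimal sketch of that arithmetic, in which the variable names are illustrative and not part of the study:

```python
# Frequencies by industry as listed in Table 3 (final sample of 56 companies).
industry_counts = {
    "Machine and plant construction": 24,
    "Automotive": 8,
    "Electrical industry": 6,
    "Metal industry": 3,
    "Others": 15,
}

total = sum(industry_counts.values())

# Share of each sector in percent, rounded to whole numbers.
industry_shares = {
    sector: round(100 * count / total)
    for sector, count in industry_counts.items()
}

for sector, share in industry_shares.items():
    print(f"{sector}: {industry_counts[sector]} of {total} ({share}%)")
```

Because each share is rounded separately, the rounded percentages need not add up to exactly 100.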

3.3 Model fit and survey bias

The data collected via the online survey was entered into an Excel spreadsheet and then analyzed with SPSS 28.0.1.1. In the first step, the dataset was checked for missing data. The item WTP1, which asks about the budget set for ML solutions, had to be excluded from the analysis, as only 30% of all respondents answered the corresponding question. All other items were answered by all participants, so no further action was necessary due to missing entries. Subsequently, an exploratory factor analysis was conducted to understand which items can be statistically combined into one factor [14]. In general, items with high factor loadings from the exploratory factor analysis were considered more representative of a given factor, indicating a stronger correlation to IMPL. Because the results of the exploratory factor analysis could not be meaningfully interpreted in terms of content, the single-item construct approach was used, which also takes the issue of multicollinearity into account. Multicollinearity can occur in the context of MLR, and the variance inflation factor (VIF) was used as a measure to test for it. Because high multicollinearity can distort the results of a regression analysis, making it difficult to isolate the effect of individual predictors, items with a VIF value greater than 10 were excluded from the regression analysis [25]. Accordingly, the items EXP2, EXP3, WTP3, DMR1, DMR3, IMT2 and IMT3 were excluded. This exclusion was necessary because these items had a high degree of overlap in their conceptual content, i.e. they essentially measured similar or redundant aspects of the predictors, potentially biasing the results of the analysis.
After the re-run, the VIF values for the remaining items were as follows: EXP: 2.24; WTP: 1.65; DMR: 1.74; IMT: 2.13. The highest value is 2.24, which is well within the acceptable range below a VIF value of 10 [25].
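The VIF of a predictor is commonly defined as 1 / (1 − R²), where R² is the share of that predictor's variance explained by all other predictors. Assuming this standard definition (the study itself only reports the VIF values), the reported values can be translated back into implied R² values. The function name below is hypothetical and serves illustration only.

```python
def implied_r_squared(vif: float) -> float:
    """Invert VIF = 1 / (1 - R^2) to obtain the R^2 of a predictor
    regressed on all other predictors."""
    if vif < 1:
        raise ValueError("A VIF cannot be smaller than 1.")
    return 1.0 - 1.0 / vif

# VIF values reported for the four retained items.
reported_vif = {"EXP": 2.24, "WTP": 1.65, "DMR": 1.74, "IMT": 2.13}
VIF_THRESHOLD = 10.0  # exclusion threshold used in the study

implied = {name: implied_r_squared(v) for name, v in reported_vif.items()}
for name, r2 in implied.items():
    print(f"{name}: VIF = {reported_vif[name]:.2f} -> implied R^2 = {r2:.2f}")

threshold_r2 = implied_r_squared(VIF_THRESHOLD)
print(f"Threshold VIF = {VIF_THRESHOLD:.0f} -> implied R^2 = {threshold_r2:.2f}")
```

Read this way, a VIF of 10 means that 90% of a predictor's variance is already explained by the other predictors, whereas each retained item shares at most roughly 55%.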
Only the four items EXP1 (engaged ML), WTP2 (external service), DMR2 (departments in the company), and IMT1 (general importance of ML) were found to be statistically significant and independent of other items after the explorative factor analysis (see Table 4).
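Table 4
Selection of items for the regression analysis

| Predictor | Item | VIF value | Included? |
| --- | --- | --- | --- |
| EXP | EXP1 | 2.24 | Yes |
| EXP | EXP2 | 17.34 | No |
| EXP | EXP3 | 19.31 | No |
| WTP | WTP1 | Too few responses | No |
| WTP | WTP2 | 1.65 | Yes |
| WTP | WTP3 | 21.65 | No |
| DMR | DMR1 | 15.18 | No |
| DMR | DMR2 | 1.74 | Yes |
| DMR | DMR3 | 13.47 | No |
| IMT | IMT1 | 2.13 | Yes |
| IMT | IMT2 | 12.92 | No |
| IMT | IMT3 | 15.45 | No |
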
Therefore, according to Hair's (2010) best practice, all requirements for applying MLR to the dataset have been fulfilled. Filtering out the confounding variables is essential to represent, describe, and interpret the system as accurately as possible [14]. The confounding variables typically either deviate from the actual pattern of the dataset or significantly affect the regression results, skewing them. Accordingly, the more precisely a system can be described, the better it can be controlled [26].
First, outliers were identified by analyzing the standardized residuals. As part of this review, four observations were removed that exceeded the threshold of 1.96 (the critical t-value at the 0.05 significance level) and were thus identified as outliers. The value of 1.96 is widely used in the literature when considering standardized residuals for small to medium sample sizes [14]. For sample sizes of 50 or more, the standardized residuals approximately follow the t-distribution, such that residuals exceeding the threshold of 1.96 are considered statistically significant. The reduction in the number of valid participants from 60 to 56 after excluding outliers does not fall below the recommended minimum sample size of 50 for MLR. Important parameters for the detection of further outliers are the Mahalanobis distances. To analyze the Mahalanobis distances, the data is examined for leverage values, which measure how far the value of an independent variable is from the other values. A general rule for a conspicuous leverage value is that the highest leverage value is twice as large as the lowest value [14]. In this empirical study, there was no evidence of conspicuous leverage values, which is why no other outliers were identified and the final sample size remains at 56 subjects. Finally, Cook's distances were used to test the model for unduly influential observations. According to Hair (2010), no Cook's distance should exceed one; otherwise there is a negative influence. No observation was found with a Cook's distance of more than one. Consequently, there is no reason to assume further influential confounding variables.
Non-response bias occurs when those who do not participate in a survey differ systematically from those who complete it in a way that is significant to the research study. Here, Armstrong and Overton's (1977) recommendation to divide the sample into early and late responders was followed [27]. The two groups are then tested for differences in known information about the population, such as age, using a t-test [27]. In this case, no significant differences were found (p = 0.836), and thus non-response bias was considered not present.
In summary, filtering out outliers reduced the 60 participants to the 56 considered in the empirical study.
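The screening of standardized residuals can be sketched as follows. The sketch assumes that residuals are standardized by their sample mean and standard deviation and that the standard normal distribution serves as a large-sample stand-in for the t-distribution; the residual values and the function name `flag_outliers` are invented for illustration and are not the study's data or procedure.

```python
from statistics import NormalDist, mean, stdev

# Two-sided 5% critical value of the standard normal distribution.
critical_value = NormalDist().inv_cdf(1 - 0.05 / 2)

def flag_outliers(residuals, threshold):
    """Return the indices whose standardized residual exceeds the
    threshold in absolute value."""
    centre = mean(residuals)
    spread = stdev(residuals)
    standardized = [(r - centre) / spread for r in residuals]
    return [i for i, z in enumerate(standardized) if abs(z) > threshold]

# Hypothetical raw residuals, invented for illustration only.
example_residuals = [0.2, -0.4, 0.1, 0.3, -0.2, 0.0, -0.1, 0.4, -0.3, 2.5]

outliers = flag_outliers(example_residuals, critical_value)
print(f"critical value: {critical_value:.2f}")
print(f"flagged observations: {outliers}")
```

In this toy data only the last observation lies more than 1.96 standard deviations from the mean, so it is the only one flagged.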

4 Validation and analysis

In the following chapter, the analysis data are presented and explained. Preliminary model goodness, regression coefficients, significance, and multicollinearity are discussed. In addition, the validity of the study is examined. For this purpose, the residuals are analyzed with respect to four regression assumptions: linearity of the relationship between criterion and predictors, constant variance of the residuals (homoscedasticity), normal distribution of the residuals, and independence of the residuals. This is done using scatterplots and normal probability plots.

4.1 Validation of the study

Several important steps were taken to ensure the validity of this study. To begin, content validity was checked, and excessive multicollinearity and bivariate correlations were ruled out. These were examined in the previous chapter as a limitation to conducting the regression analysis. In addition, the residuals are examined below to identify any erroneous data points that may be present, since endogeneity bias would negatively affect the analysis. The procedure that has proven successful for the residual analysis is that of Pagan and Hall (1983) [28]. Four essential criteria are considered in particular: linearity of the relationship between criterion and predictors, constant variance of the residuals (homoscedasticity), normal distribution of the residuals, and independence of the residuals (see Fig. 2). The tests are performed with the help of statistical and graphical diagnostics, and the residuals were standardized to make them more comparable.

4.1.1 Linearity of the relationship between criterion and predictors

First, the assumption of linearity was tested. For this purpose, residual plots were created and examined. In the scatter plots in Fig. 3, no pattern can be seen. The residuals are randomly distributed and do not cluster at any point. In addition, no curvilinear patterns or triangles can be seen. This indicates that the residuals vary randomly and that the linearity assumption is met.

4.1.2 The constant variance of the residuals

The constant variance of the residuals is one of the most frequently violated assumptions [14]. To test the overall model, the scatter plot was again examined. The plot shows a uniform distribution of the residuals. No pattern in the form of a triangle can be seen (Fig. 4), which would have resulted if the assumption had not been fulfilled. Thus, the homoscedasticity assumption is not violated, and the model is valid.

4.1.3 Normal distribution of the residuals

The third assumption tested relates to the normal distribution of the residuals. Especially for small samples, the usual verification by histograms is not recommended [14]. Instead, a normal probability plot of the overall model is assessed [14]. For the assumption to be fulfilled, the residuals must lie as closely as possible along the diagonal line representing the normal distribution. Figure 5 illustrates that this condition is fulfilled. The analysis shows that the residuals follow the normal distribution, so that overall the assumption of a normal distribution of the residuals can be considered fulfilled.
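The idea behind a normal probability plot can be illustrated numerically. The sketch below pairs sorted residuals with the normal quantiles expected at their rank and summarizes how closely they follow a straight line with a correlation coefficient. The study assessed the plot visually; the correlation summary, the plotting positions (i − 0.5) / n, and both residual sets are assumptions made here for illustration.

```python
import math
from statistics import NormalDist, mean

def normal_quantiles(n):
    """Standard normal quantiles at the plotting positions (i - 0.5) / n."""
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def probability_plot_correlation(residuals):
    """Correlate the sorted residuals with the expected normal quantiles.
    Values close to 1 mean the points lie close to a straight line."""
    return pearson_r(normal_quantiles(len(residuals)), sorted(residuals))

n = 56  # final sample size reported in the study
# Hypothetical residuals, constructed for illustration only:
# one set that is exactly normal-shaped, one that is strongly right-skewed.
normal_shaped = [0.5 * q for q in normal_quantiles(n)]
right_skewed = [-math.log(1 - (i - 0.5) / n) for i in range(1, n + 1)]

r_normal = probability_plot_correlation(normal_shaped)
r_skewed = probability_plot_correlation(right_skewed)
print(f"normal-shaped residuals: r = {r_normal:.3f}")
print(f"right-skewed residuals:  r = {r_skewed:.3f}")
```

The normal-shaped set lies on a straight line by construction, while the skewed set bends away from it and yields a visibly lower correlation.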

4.1.4 Independence of the residuals

The last assumption to be tested concerns the independence of the residuals. Whether autocorrelation occurs depends strongly on the design of the study. It is very unlikely that the measurements or the residuals are dependent if the data are not time series [29]. In this study, the residuals are not tracked over time, and no inferential time-series analysis or prediction of future trends is performed. In such cases, the test of independence of the residuals can be omitted, and thus the last assumption can also be regarded as fulfilled.

4.2 Analysis

The bivariate correlations from the Pearson correlation matrix (Table 5) were used to test for excess collinearity. This test is limited to the variance of the criterion that is jointly explained by only two predictors at a time. All pairwise combinations of predictors were below Hair's critical threshold of 0.7 [14], so disproportionate collinearity could be ruled out (Table 5). The system has the same accuracy for all reference values, so there are no problems with the linearity of the overall system. The F-test yielded a high value of 26.2, which allowed the model to be declared statistically significant. In addition, 65% of the variance of the criterion was explained by the examined combination of independent variables (corrected, i.e. adjusted, R2 = 0.65). This showed that the result did not arise by chance, nor was there a negative correlation between the criterion and the predictors. The information on the individual parameters is summarized in Table 5.
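The pairwise screening described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the survey answers are hypothetical, and only the 0.7 threshold is taken from the text.

```python
from itertools import combinations
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def collinear_pairs(predictors, threshold=0.7):
    """Return predictor pairs whose absolute correlation exceeds the threshold."""
    flagged = []
    for (a, xs), (b, ys) in combinations(predictors.items(), 2):
        r = pearson(xs, ys)
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged


# Hypothetical Likert-style answers, for illustration only (not study data).
survey = {
    "EXP": [1, 2, 3, 4, 5, 3],
    "WTP": [2, 1, 4, 3, 5, 2],
    "DMR": [5, 3, 1, 4, 2, 3],
}
print(collinear_pairs(survey))
```

An empty result means no pair exceeds the threshold. Because the check only compares two predictors at a time, it does not replace the variance inflation factors reported in Table 5.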
Table 5 Data of model goodness, regression coefficients, significance, and multicollinearity

| Var | Beta | Sig. (p) | VIF | R2 | Corr. R2 | F |
|---|---|---|---|---|---|---|
| Model | | | | 0.67 | 0.65 | 26.20 |
| EXP | −0.07 | 0.548 | 2.24 | | | |
| WTP | 0.33** | 0.002 | 1.65 | | | |
| DMR | 0.27* | 0.013 | 1.74 | | | |
| IMT | 0.44*** | < 0.001 | 2.13 | | | |

*p < 0.05; **p < 0.01; ***p < 0.001
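The goodness-of-fit figures in Table 5 are linked by standard formulas for multiple regression, which can be sketched as follows. The sample size n = 56 is an assumption taken from the number of companies named in the conclusion, and k = 4 is the number of predictors. Because the reported R2 is rounded, the results should be expected to agree with the table only approximately.

```python
def adjusted_r2(r2, n, k):
    """Adjusted (corrected) R2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_statistic(r2, n, k):
    """Overall F statistic of the regression, expressed through R2."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# R2 from Table 5; n = 56 is an assumed sample size (see the lead-in).
print(round(adjusted_r2(0.67, n=56, k=4), 2))
print(round(f_statistic(0.67, n=56, k=4), 1))
```

The adjustment penalizes additional predictors, which is why the corrected R2 is always at most as large as R2.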
Because all necessary requirements have been met, the model can be statistically analyzed using MLR.
$$Y = 0.33 \cdot \mathrm{WTP} + 0.27 \cdot \mathrm{DMR} + 0.44 \cdot \mathrm{IMT}$$
Within this model, WTP (beta = 0.33; p < 0.01), DMR (beta = 0.27; p < 0.05), and IMT (beta = 0.44; p < 0.001) proved to be statistically significant and positively related to the implementation probability of machine learning in SMEs; for these predictors, the null hypothesis of no effect can therefore be rejected. This means that the observed effects of the predictors WTP, DMR, and IMT on the criterion IMPL are significant and that a practical relationship exists. Read as standardized coefficients, a beta of 0.33 for WTP implies that an increase in WTP of one standard deviation is associated with an increase in the implementation probability of 0.33 standard deviations, with the other predictors held constant. Similarly, an increase of one standard deviation in DMR corresponds to an increase of 0.27 standard deviations, and an increase of one standard deviation in IMT corresponds to an increase of 0.44 standard deviations. The predictor EXP (beta = −0.07; p = 0.55) has a negative beta value, which would mean that the probability of implementing machine learning in SMEs decreases slightly as EXP increases. However, this effect is not statistically significant, as the p-value exceeds the threshold of 0.05. No statistically proven statements can therefore be made about EXP: its null hypothesis cannot be rejected, and H1 is not supported.
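How the regression equation translates predictor changes into changes of the criterion can be sketched as follows. The sketch assumes that all variables are z-scored and that the betas are standardized coefficients, which the missing intercept suggests but the source does not state explicitly. The function name and input values are hypothetical.

```python
# Coefficients of the three significant predictors from the equation above.
BETAS = {"WTP": 0.33, "DMR": 0.27, "IMT": 0.44}


def predict_impl(z_scores):
    """Predicted IMPL in standard-deviation units from z-scored predictors.

    Predictors missing from the input are treated as being at their mean (z = 0).
    """
    return sum(BETAS[name] * z_scores.get(name, 0.0) for name in BETAS)


# An SME at the sample mean on every predictor (assumed illustration).
baseline = predict_impl({"WTP": 0.0, "DMR": 0.0, "IMT": 0.0})
# The same SME with IMT one standard deviation above the mean.
raised_imt = predict_impl({"WTP": 0.0, "DMR": 0.0, "IMT": 1.0})
print(raised_imt - baseline)
```

The difference equals the IMT coefficient, which is the sense in which IMT has the largest effect of the three predictors.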
To emphasize the practical implications and relevance of the predictors, this analysis was enriched by extending the focus beyond the statistical significance of the findings. Therefore, a detailed examination of how the predictors align with the initial hypotheses was undertaken, providing a more nuanced understanding of the study's practical implications.
Answer to H2: Hypothesis 2 concerns WTP and could be confirmed. The more willing companies are to accept the risk of an investment, i.e., the stronger the impetus for investment, the more likely they are to invest in ML solutions (WTP, beta = 0.33; p < 0.01). SMEs assign a high priority to investing capital in the necessary technologies and in the further training of employees, and this predictor ranks second. This finding may guide SMEs to prioritize investments in technology and staff training.
Answer to H3: Hypothesis 3 could be confirmed as well. If SMEs are open-minded and willing to deal with hurdles and improvements in the company, the implementation of ML is likely (DMR, beta = 0.27; p < 0.05). This highlights the importance of a proactive and adaptable organizational mindset and can help SMEs foster a working environment that embraces innovation and change.
Answer to H4: Finally, hypothesis 4 was also confirmed. IMT (beta = 0.44; p < 0.001) has the strongest positive correlation and is therefore the most important confirmed factor for implementation. Accordingly, as soon as the introduction of ML is seen as promising for the company and the expected value creation increases, the likelihood of using ML technologies also increases. SMEs could use this finding to evaluate and improve their perception of ML, aligning it with their strategic goals to increase the likelihood of a successful ML implementation.
In summary, three of the four hypotheses from the empirical study have been confirmed in industrial practice. Willingness to pay, data mining, and the importance of ML are necessary foundations for the implementation of ML technologies. IMT has the greatest effect on SMEs, with a beta of 0.44. It is followed by WTP, with a beta of 0.33: an SME must be willing to make high investments in new technologies. Almost as meaningful is DMR, with a beta of 0.27, which reflects the quality and quantity of the data available in the company for the use of ML. Furthermore, these factors act in synergy with each other. The higher the perceived importance, the more the topic will be addressed, and the more data will be collected for a possible implementation. In addition, the higher the expected added value and the higher the chances of success, the more likely it is that investments will be made in ML technologies.
H1, on the other hand, must be regarded as not supported, since no statistically significant relationship between EXP and the implementation probability of ML in SMEs could be established in the context of the study.
In conclusion, the model confirmed hypotheses 2, 3, and 4 as significant; only hypothesis 1 could not be supported.

5 Discussion

Based on the results of the regression analysis, the following recommendations for action can be formulated to increase the likelihood of ML adoption in SMEs. To increase the perceived importance of ML, a detailed examination of the topic on a theoretical level is necessary. Only through active engagement is it possible to recognize both the potential and the current relevance of ML, which can serve as an important impulse for introducing ML systems in SMEs. First practical applications in one's own company, as well as cooperation with research institutions and other, non-competing SMEs, also promote an understanding of the importance of ML. It is helpful to identify a concrete problem or area to be improved with ML. Data collection and preparation are of particular importance: DMR should be considered, and the data to be used should be sufficient and representative. Depending on the problem that ML is intended to solve, data may come from a variety of sources, including internal data from enterprise systems, publicly available data, or specialized data sets compiled by third parties for specific tasks. Once the relevant data needed for training the ML system have been collected from internal or external sources, they must be cleaned, for example by removing outliers. The quality of the data has a significant impact on the performance of the ML model: the better the data quality and representativeness, the better the model can learn and make accurate predictions.
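The cleaning step mentioned above, removing outliers, could look like the following sketch. The interquartile-range rule with a factor of 1.5 is a common convention chosen here as an assumption; the text does not prescribe a specific method, and the sensor readings are hypothetical.

```python
from statistics import quantiles


def remove_outliers_iqr(values, k=1.5):
    """Keep only values within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the data
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical sensor readings with one implausible value (illustration only).
readings = [10, 11, 12, 11, 10, 12, 11, 95]
print(remove_outliers_iqr(readings))
```

Whether a flagged value is a measurement error or a rare but real event depends on the process, so such a filter is best reviewed with domain experts before it is applied to training data.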
In addition to DMR and IMT, WTP also benefits from a theoretical discussion of the topic: a realistic idea of the possibilities, and the realization that introducing ML is a realistic and by no means unaffordable scenario, increases the willingness to make financial investments. ML can help to improve the performance of systems and processes and thus greatly improve the efficiency of company operations. Using ML models, complex patterns and relationships can be identified in large amounts of data that are difficult or impossible for humans to perceive. SMEs should therefore examine their processes and how they can be improved, for which ML is a very good option.
Furthermore, SMEs need to realize the potential of ML applications and make ML adoption more likely by actively influencing the constructs IMT, WTP, and DMR, whose effect on adoption probability is statistically proven, because adoption leads to a significant increase in efficiency as well as business competitiveness [1].
In addition, in contrast to previous studies, which focus on the quantitative facts of implementation or non-implementation, it becomes clear that companies do not simply want to implement ML blindly but rather concentrate on identifying and solving real problems. The focus here is on the preparatory, fundamental identification of pain points and the development of corresponding alternative solutions. ML solutions should only be implemented if they offer demonstrable added value and represent the best solution for a real challenge for the respective company. A technology-pull approach should be followed instead of a technology-push approach.

6 Limitations and future research

The present study is not without limitations. First, the focus of the international empirical study was on SMEs. It is noteworthy that a substantial portion of the responses originated from Germany, which may have influenced the findings. As economic regions are not identical, the results cannot necessarily be transferred to manufacturing companies in other countries; other economic markets might be subject to different constraints. However, the results are demonstrably reliable and statistically free of bias. To address these limitations, future in-depth research is encouraged that includes larger sample sizes and covers more diverse, international areas. A larger survey could strengthen the statistical significance of the statements.
A single-item scale had to be used to evaluate the data. This was necessary because the results of the explorative factor analysis could not be interpreted in a meaningful way, and many factors had to be excluded as a result. Since each category consists of only one item, the regression analysis was based on a single-item scale, for which only one item per predictor is needed. Because single-item scales are less reliable than multi-item scales, the results are less meaningful but still reliable and informative. The reliability of many single-item measures is not quite as high as that of multi-item measures, but studies have shown that it is still sufficiently high [30]. Thus, adequate measurement accuracy can be assumed for these measures. Moreover, research shows that many single-item measures are highly valid, as they converge strongly with conventional multi-item measures or predict theoretically relevant criteria well [24, 30, 31].
Furthermore, because a single-item scale was chosen, conventional reliability and validity tests could not be used. The corresponding alternatives require correlation analyses between the results of the same survey at two measurement time points or between the present single-item scale and a corresponding multi-item scale. However, these data were not available in the context of this study, so the analyses could not be performed [24]. The results should therefore be interpreted with caution, as latent constructs are not reflected as well by a single-item scale as by a multi-item scale. Future work could therefore revalidate the measurement instrument with a multi-item scale, which offers higher reliability. It is crucial to maintain applied research with the participating companies.
Nevertheless, measures were taken to ensure a high level of knowledge among the participants in the online survey, and a representative group of industry practitioners was assembled for the online survey. All respondents are directly involved in the design of ML in their respective manufacturing companies. Regarding the hypotheses, it can be noted that they are compatible with the existing literature and the resulting findings are conclusive.

7 Conclusion

This paper addresses the critical research gap in the highly dynamic applied research field of ML implementation in SMEs, focusing on the factors that influence its likelihood and quantifying their individual impacts. Despite the growing recognition of the potential of ML in SMEs, there has been a lack of comprehensive analysis of the specific predictors that drive ML adoption in this sector. Our study fills this gap by combining scientific analysis with industrial practice through an extensive web survey, incorporating feedback from practitioners in 56 unique companies across key economic sectors and industries such as mechanical engineering, plant construction, automotive, and electrical industries.
In this paper, we considered a multiple linear regression model with four predictors and analyzed whether they have an impact on the implementation of ML in SMEs. It was observed that IMT, WTP, and DMR increase the implementation probability of ML in SMEs. The higher the perceived importance for an SME, the more the topic will be addressed, and the more data will be collected for a possible implementation. In addition, the higher the expected added value and the higher the chances of success, the more likely it is that investments will be made in ML technologies. Furthermore, it was found that machine learning experience did not have a significant impact on the probability of implementation. These findings can help SMEs set priorities. This study reflects the current situation of the companies surveyed, which may change in the coming years. Research can further facilitate SMEs' access to ML technologies through appropriate framework conditions that reduce the need for technical knowledge and are adapted to the SMEs' requirements. It is also recognized that the survey was conducted among a relatively small group of companies. Therefore, only qualitative statements can be derived. However, the results are consistent with surveys involving a larger number of respondents and are logically comprehensible.

Acknowledgements

The research leading to these results received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References
1. Iftikhar N, Nordbjerg FE (2022) Implementing machine learning in small and medium-sized manufacturing enterprises. In: Andersen A-L, Andersen R, Brunoe TD et al (eds) Towards sustainable customization: bridging smart products and manufacturing systems. Springer International Publishing, Cham, pp 448–456
2. Mahesh B (2019) Machine learning algorithms-a review. Int J Sci Res (IJSR) 9(1):381–386
6. Schölkopf B (2022) Causality for machine learning. In: Geffner H, Dechter R, Halpern JY (eds) Probabilistic and causal inference, vol 27. ACM, New York, NY, USA, pp 765–804
7. Bauer M, van Dinther C, Kiefer D (2020) Machine learning in SME: an empirical study on enablers and success factors
8. Thiée LW (2021) A systematic literature review of machine learning canvases. In: Gesellschaft für Informatik e.V. (GI) (ed) INFORMATIK 2021, Lecture Notes in Informatics (LNI)
10. Fahrenschon G, Kirchhoff AG, Simmert DB (2015) Mittelstand - Motor und Zukunft der deutschen Wirtschaft. Springer Fachmedien Wiesbaden, Wiesbaden
14. Hair JF (2010) Multivariate data analysis: a global perspective, 7th edn. Prentice Hall, Upper Saddle River
22. Hootstein EW (1994) Enhancing student motivation: make learning interesting and relevant. Education 3–13(114):475
26. Nicklas SJ, Paetzold K (2020) Informationsaustausch in Prototypingprozessen: Bestimmung und Beschreibung von Störgrößen. In: Proceedings of the 31st Symposium Design for X (DFX2020). The Design Society, pp 151–160
29. Allen MP (1997) Understanding regression analysis. Springer US, Boston, MA
Metadata
Title: Machine learning implementation in small and medium-sized enterprises: insights and recommendations from a quantitative study
Authors: Peter Burggräf, Fabian Steinberg, Carl René Sauer, Philipp Nettesheim
Publication date: 10.04.2024
Publisher: Springer Berlin Heidelberg
Published in: Production Engineering
Print ISSN: 0944-6524
Electronic ISSN: 1863-7353
DOI: https://doi.org/10.1007/s11740-024-01274-2
