Multi-objective Clustering Algorithm Applied to the MathE Categorization Problem

  • Open Access
  • 27-01-2026

Abstract

This article explores the application of a multi-objective clustering algorithm to categorize questions on the MathE platform by difficulty level, incorporating both student and lecturer perspectives. The study highlights the limitations of traditional two-level categorization and introduces a novel approach using a multi-objective clustering algorithm (MCA) to group questions more accurately. The MCA is compared with the k-means algorithm, demonstrating its superiority in balancing the distribution of questions across difficulty levels. The article also discusses the importance of active learning and personalized education, emphasizing the role of technology in enhancing mathematical learning. Key findings include the effectiveness of the MCA in providing diverse optimal solutions, its ability to automatically define the optimal number of centroids, and its potential to improve student engagement and academic performance. The study concludes with recommendations for future research and development in educational personalization and clustering algorithms.
Gabriel Leite, M. Fátima Pacheco, Florbela P. Fernandes, Ana Maria A. C. Rocha and Ana I. Pereira contributed equally to this work.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

Active learning encompasses strategies where students actively engage in activities beyond passively listening to lecturers (Alhawiti, 2023; Meyers & Jones, 1993). During active learning, students participate in the discovery process, information processing, and application tasks. Active learning is based on two fundamental assumptions: first, that learning inherently involves active participation, and second, that individuals who prefer diverse learning styles should be supported (Meyers & Jones, 1993). Research has consistently demonstrated enhanced learning outcomes when students actively participate in the learning process (Meyers & Jones, 1993). Nevertheless, it is crucial to acknowledge that traditional lectures remain significant and that active learning should always be accompanied by well-defined content and learning objectives (Meyers & Jones, 1993).
The development of learning skills and the learning process are strongly dependent on the active involvement of students in their education (Alhawiti, 2023). Case study research shows that active teaching strategies increase attendance and engagement, foster students’ acquisition of expert attitudes towards the subject, and improve students’ cognitive and learning skills (Alhawiti, 2023). Moreover, students expressed a high level of satisfaction with active learning methods (Alhawiti, 2023; Deslauriers et al., 2019).
Individuals differ in their profiles of strengths and weaknesses across different types of intelligence, and educational and professional success depends on the ability to enhance their strengths and compensate for their weaknesses (Gardner, 1999, 2011). The recognition of individual learning styles and the implementation of personalized learning resources have been shown to improve educational experiences and results in an increasing number of studies (Lwande et al., 2021). Nonetheless, there are still notable differences in the data about how well this method works across learning styles (Villegas-Ch et al., 2024). Some current research concentrates only on a particular style or on limited contexts, leaving a gap in the thorough understanding of its application and advantages (Pritalia et al., 2020). The work of Villegas-Ch et al. (2024) presents the implementation of a machine learning system designed to identify students’ learning styles and adapt educational content accordingly. Additionally, the authors assess the tangible effects of these customized approaches on students’ engagement with the subject matter and academic achievement. The findings imply that personalized learning is a potent and successful instrument that can enhance both the educational experience and student performance. As a result, the study provides a workable model for the effective application of artificial-intelligence-supported educational personalization while reiterating its importance.
Under this scenario, the MathE platform emerges as an innovative, dynamic, and intelligent digital tool for teaching and learning Mathematics. MathE is a collaborative e-learning platform that enhances students’ mathematical learning processes in higher education. Its core objective is to cultivate virtual learning and foster knowledge exchange (Azevedo et al., 2021, 2024). MathE is a pioneering platform that significantly departs from traditional learning, ushering in an interactive and engaged learning paradigm. MathE operates as a nonprofit initiative, offering unrestricted access 24 hours a day, serving as a valuable resource for anyone eager to increase their mathematical knowledge and comprehension.
Currently, the MathE platform has 1824 multiple-choice questions, organized into two difficulty levels: basic and advanced (Azevedo et al., 2024). This rating is assigned by the registered lecturer who inserted the question on the platform. Nevertheless, previous research (Azevedo et al., 2021) has shown that the two-level categorization is inadequate for effectively distinguishing the available content. Additionally, it has been verified that most of the students who use the MathE platform only answer the basic level questions and rarely attempt the advanced level ones, which demotivates them from using the platform. Moreover, from previous work (Azevedo et al., 2024) and students’ feedback, it is known that the opinions of students and lecturers about the complexity of the questions often diverge: a question that a lecturer considers very basic may be considered very difficult by a student, and vice versa. To group the questions and obtain a more accurate distribution by difficulty level, it is important to consider both perspectives: students’ and lecturers’.
The dataset used in this work is based on the answers stored in the MathE platform system, recorded as correct or incorrect. The historical data for each question is therefore evaluated, taking into account the number of correct and incorrect answers, to define two variables (the error rate and the cumulative score of each question) that express the difficulty of the question from the students’ point of view. The students’ opinion is then combined with the lecturers’ rating through a weighted average, defining a score for each question. This score is used to group the questions into different difficulty levels. To perform this procedure, the MCA is proposed, and the classical k-means algorithm (Arthur & Vassilvitskii, 2007) is used for results comparison. This work presents an improved version of the MCA in Azevedo et al. (2024), which depended on an initial parameterization and used the NSGA-II in the optimization process. In this paper, a refined mathematical problem formulation is put forth that does not require parameterization, and for the optimization procedure this work advocates the adoption of MOPSO (Coello-Coello & Lechuga, 2002), deeming it more robust and more aligned with the proposed approach.
The MCA proposed in this work is a bio-inspired clustering algorithm that uses a multi-objective strategy to provide a set of optimal solutions, allowing the decision maker to select the most fitting solution for the problem, since it recognizes that the decision maker possesses vital information essential to addressing the issue effectively (Azevedo et al., 2025). The MCA intends to minimize an intra-clustering measure and simultaneously maximize an inter-clustering measure to automatically define the optimal number of centroids and the elements’ optimal distribution. The MCA is detailed in Section 5. This paper represents a noteworthy contribution by introducing an automatic bio-inspired clustering algorithm. Through an optimization process, the algorithm autonomously determines the number of cluster centroids and the distribution of elements. This addresses major concerns commonly associated with clustering algorithms documented in the literature (Azevedo et al., 2025). Since the MCA is a multi-objective algorithm, unlike traditional clustering algorithms, it defines a set of optimal solutions. This provides decision-makers with various solutions, introducing flexibility to handle the complexities of the problem. It is often challenging to incorporate certain knowledge about the data into the mathematical model. Therefore, providing the decision-maker with a set of optimal solutions is valuable, allowing them to use their understanding of the data to choose the best solution for the problem. Additionally, one of the main advantages of the MCA is the ability to customize the algorithm according to specific problem constraints, such as the minimum and maximum number of clusters and the quantity of elements in each cluster. A realistic scenario is presented and discussed to demonstrate the added value of the approach.
This paper is organized as follows: after the introduction, Section 2 presents some concepts of active learning and research involving the development of algorithms and digital systems based on this concept. After that, the MathE platform is described in Section 3. Section 4 presents the problem statement and the methods used to evaluate the question score, combining the lecturers’ and students’ opinions regarding the question’s difficulty level; the same section also defines the considered dataset. Section 5 describes the MCA, and Section 6 presents the results and discussion achieved with the proposed methodology. Finally, this research’s main conclusions and future directions are described in Section 7.

2 Active Learning and e-learning Platforms

There are many differences in the way each student learns. Some students are more logically inclined and learn best through reasoning; others prefer to use their bodies, hands, and sense of touch to learn (Gardner, 1999). According to Gardner (1999, 2011), people exhibit varying combinations of strengths and weaknesses in different areas of intelligence, and educational and professional success depends on the ability to leverage their strengths and compensate for their weaknesses. Hence, achieving effective teaching tailored to each student’s unique learning style is a formidable challenge.
The conventional teaching approach, often called lecturer-centered or passive learning, may not be the most effective method for all students. In the passive learning model, the lecturer is the primary authority figure, while students act as passive recipients (Rodríguez, 2012). Consequently, they passively acquire information through lectures and direct instruction, with the primary objective being positive outcomes in tests and assessments. On the other hand, active learning emphasizes the active role of students in the learning process, although the lecturer remains an authority figure. In active learning, the lecturer’s primary duty is to guide and facilitate students’ learning and understanding of the material, assessing their progress through various assessment methods, including group projects, student portfolios, and class participation. In this educational model, the classroom, teaching, and assessment are interconnected, as students’ learning is consistently assessed during lecturers’ instruction. This approach fosters a more equitable relationship between the lecturer and the student, each playing a crucial role in the learning process (Alhawiti, 2023; Farrow & Wetzel, 2021).
The Principles and Standards for School Mathematics, published by NCTM in 2000 (NCTM, 2000), outlines the essential components of a high-quality school mathematics program relevant to 21st century skills. The document lists six principles (Equity, Curriculum, Teaching, Learning, Assessment, and Technology), which, although defined more than 20 years ago, continue to make perfect sense in the present. Among the principles, Equity and Technology stand out. Equity does not imply uniformity in instruction for every student. Rather, it requires providing reasonable and appropriate adaptations and including appropriately challenging content to ensure access and success for all students. High-quality Mathematics instruction can enable all students to learn Mathematics while respecting their individual characteristics, backgrounds, or physical challenges (NCTM, 2000). Technology is essential in teaching and learning since the students can develop a deeper understanding of mathematics with the appropriate use of technology (NCTM, 2000) and teacher support.
Recently, there has been a significant increase in the integration of emerging technologies in education, particularly during the COVID-19 pandemic, which lasted for over two years (Treve, 2021; Al-Kumaim, 2021). This transformation has profoundly impacted conventional educational practices, leading to greater diversity in teaching and learning methods and fundamentally altering the trajectory of higher education in the future (Rangel-de Lázaro & Duart, 2023).
With the pandemic, the discussion about e-learning, active learning, and related terms has been accelerated dramatically (Engelbrecht et al., 2023). Thus, it is recommended that the educational system and learning environments should adapt their teaching approaches to address the increasing demand for students to engage in critical thinking, problem-solving, and skill development. This can be accomplished by emphasizing active learning strategies, fostering students’ personal growth, and enhancing their professional abilities (Alhawiti, 2023; Rangel-de Lázaro & Duart, 2023; Lara-Lara et al., 2023).
Several cases of using e-learning and active learning have emerged in this context. Duolingo, one of the world’s largest language learning platforms (Portnoff et al., 2021), employs these techniques to personalize the user experience and achieve more satisfactory results. Through gamification and adaptive learning, this platform makes the process of learning a new language easier, addressing individual difficulties and customizing the level of challenge for each user.
In the healthcare field, it has become evident that the application of active learning is beneficial for exercise recommendations and user adaptation (Mahyari et al., 2022). Personalization, particularly for new profiles when the training dataset is unavailable, represents a challenge that can be overcome with an active methodology. The concept of specialized and real-time active learning has proven to be more accurate, using feedback to estimate the system’s uncertainty, and involving an expert when it falls below a certain threshold.
Furthermore, in research-oriented teaching, the use of e-learning has shown significant advantages, as highlighted in the study carried out on the e-learning course Mathematical Analysis at Borys Grinchenko University of Kiev (Astafieva et al., 2020). This example illustrates the potential of e-learning to promote Mathematical competence and highlights the importance of appropriate pedagogical and methodological approaches to take full advantage of its benefits. In addition to these advantages, it is worth highlighting that the effective development of mathematical competence is only achievable through the active participation of students and collaborative interaction, as demonstrated by the Moodle platform used in the Mathematical Analysis subject. This emphasizes the importance of promoting active engagement and partnership between students and lecturers. Consequently, it is clear that, despite the advantages of e-learning, there are challenges associated with its effective use. As a result, the study exposes the need for further research to explore additional e-learning opportunities in the development of mathematical skills among students.
Despite being in different domains, the three reported cases in references (Portnoff et al., 2021; Mahyari et al., 2022; Astafieva et al., 2020), use e-learning tools for individual enhancement and adaptability, making the learning experience more rewarding and tailored to the user. Consequently, applying these learning techniques, even in diverse fields, has proven highly advantageous, indicating significant potential and the need for further research and development.

3 MathE Platform

MathE is an international platform with the goal of enhancing the quality of teaching, learning, and assessment methods in higher education Mathematics content. This platform is a non-commercial tool, being completely free for those interested in improving their knowledge and understanding of Mathematics. MathE has been online since 2019 at mathe.ipb.pt. The platform is currently being used by a significant number of users; to be more precise, there are currently 1468 students enrolled on MathE from 21 nationalities: Portuguese, Brazilian, Turkish, Tunisian, Greek, German, Kazakh, Italian, Russian, Lithuanian, Irish, Spanish, Dutch, Slovenian, Swedish, Ukrainian, French, Finnish, Bulgarian, Algerian and Romanian. Additionally, there are 119 lecturers from 21 countries and 60 higher education institutions registered.
Moreover, MathE has a YouTube channel (youtube.com/@matheproject4778) where all videos available on the platform are linked to each corresponding topic. It should be mentioned that there are two types of videos available on the platform; those carefully selected from the internet for the MathE lecturers’ team members and those exclusively produced by the MathE partnership to meet the platform’s specific needs. For more details about the platform, refer to Azevedo et al. (2024, 2021).
At its current stage, the platform covers fifteen topics and twenty-two subtopics, among those that constitute the classical core of graduate courses: Analytic Geometry, Complex Numbers, Differential Equations, Differentiation (including 3 subtopics: Derivatives, Partial Differentiation and Implicit Differentiation and Chain Rule), Discrete Mathematics (with 2 subtopics: Recursivity and Set Theory), Fundamental Mathematics (with 2 subtopics: Elementary Geometry and Expressions and Equations), Graph Theory, Integration (with 5 subtopics: Integration Techniques, Surface Integrals, Triple Integration, Definite Integrals and Double Integration), Linear Algebra (including 5 subtopics: Matrices and Determinants, Eigenvalues and Eigenvectors, Linear Systems, Vector Spaces, and Linear Transformations), Numerical Methods, Optimization (with 2 subtopics: Linear Optimization and Nonlinear Optimization), Probability, Real Functions of a Single Variable (with 2 subtopics: Limits and Continuity and Domain, Image and Graphics), Real Functions of Several Variables (with 1 subtopic: Limits, Continuity, Domain and Image) and Statistics, as presented in Fig. 1.
Fig. 1
Contents of MathE platform
When selecting a topic and subtopic, students can answer a set of questions related to their chosen subject. At the end of the test, they receive feedback, and the questions are accompanied by supporting material such as videos and teaching materials produced by experts collaborating on the platform’s development. Figure 2 illustrates an example of the MathE supporting materials system. In this particular case, since the student provided an incorrect answer to the question, the platform showed the correct answer and also suggested a video and written material related to the content addressed in the question.
Fig. 2
Examples of MathE supporting materials

4 Problem Statement

This section presents more details regarding the problem addressed in this paper and the methods applied to combine the lecturers’ and students’ opinions on the questions’ difficulty level. Initially, the procedures to evaluate the question’s score are defined. After that, the dataset utilized is presented.

4.1 Question Score Definition

To group the questions according to their difficulty level, it is necessary to assign a score to each question q associated with a subtopic of the platform. This score must express the lecturers’ and students’ opinions about the question’s difficulty.
To develop the approach, when submitting each question, the lecturer assigns a score between 1 and 5 based on the question’s difficulty level. This score is denoted as \(Pscore_q\). The questions with the lowest difficulty level are assigned a score of 1, while those with the highest difficulty level receive a score of 5. After that, \(Pscore_q\) is normalized to the range [0, 1] to be used in this approach.
To evaluate the students’ opinions, historical data from the MathE platform collected between 2019 and 2023 was used. Based on this data, it is possible to define two variables (the question’s error rate and the question’s cumulative score) that reflect the students’ opinion. The average of these two variables will generate the students’ score \(Stdscore_q\), for each question q. More details can be seen in Azevedo et al. (2024).
Therefore, the error rate (\(ER_q\)) is defined as the ratio between the number of incorrect answers and the total number of answers for question q, as presented in Eq. 1.
$$\begin{aligned} ER_q=\frac{NIA_q}{NA_q} \end{aligned}$$
(1)
where \(NIA_q\) represents the number of incorrect answers obtained in question q, and \(NA_q\) is the total number of answers considered for question q.
To avoid the stagnation problem around the value 0.5, as described in Azevedo et al. (2024), a maximum number of answers \(NA_{max}\) is defined for the question evaluation. Thus, only the last \(NA_{max}\) answers recorded in the platform system are considered to define the question score. This guarantees variability in terms of different students’ perspectives.
On the other hand, the question cumulative score (\(S_q\)) represents the score achieved by the question over time. To better explain this concept, consider that each correct answer is represented by 1 and each incorrect answer by 0. Thus, 1 is added to the current question score for each correct answer, while incorrect answers add 0. The higher the number of correct answers, the higher the question’s score, with the maximum score obtained if all answers are correct. After obtaining the cumulative score of each question, \(S_q\), the values are normalized to the range [0, 1], yielding the value \(CS_q\) in Eq. 2. More information about the cumulative score variable can be found in Azevedo et al. (2024).
$$\begin{aligned} CS_q= 1 - \left( \frac{S_q \times NA_{max}}{NA_q}\right) \end{aligned}$$
(2)
Thus, the student score (\(Stdscore_q\)) for each question is defined as the average of the \(ER_q\) and \(CS_q\) values, as denoted by Eq. 3.
$$\begin{aligned} Stdscore_q= \frac{ER_q + CS_q}{2} \end{aligned}$$
(3)
Therefore, the final question score (\(Score_q\)) is calculated for each question q by weighting the opinions of both students and lecturers, as presented in Eq. 4.
$$\begin{aligned} Score_q = \frac{\alpha \times Pscore_q + \beta \times Stdscore_q}{\alpha + \beta } \end{aligned}$$
(4)
Note that \(\alpha\) and \(\beta\) are the weights assigned to the lecturers’ score (\(Pscore_q\)) and the students’ score (\(Stdscore_q\)), respectively.
When there are few answers, less than a fixed value \(NA_{min}\), the lecturer’s score prevails (\(\beta =0\)), i.e., the value \(Score_q\) is obtained considering only the lecturer’s opinion.
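The scoring pipeline above (the \(Pscore_q\) normalization, Eqs. 1, 3, and 4, and the \(NA_{min}\)/\(NA_{max}\) rules) can be sketched in Python. This is not the authors' implementation: the function name, the default parameter values, and the treatment of \(CS_q\) as a precomputed input in [0, 1] are illustrative assumptions.

```python
def question_score(pscore, answers, cs=None,
                   alpha=1.0, beta=1.0, na_min=10, na_max=50):
    """Combine lecturer and student opinions into one question score.

    pscore  -- lecturer rating in {1, ..., 5}
    answers -- chronological list of answers, 1 = correct, 0 = incorrect
    cs      -- normalized cumulative score CS_q in [0, 1] (Eq. 2), if known
    alpha, beta, na_min, na_max -- illustrative parameter choices
    """
    # Normalize the lecturer score Pscore_q from [1, 5] to [0, 1].
    p = (pscore - 1) / 4

    # With fewer than NA_min answers, the lecturer's opinion prevails
    # (beta = 0), so Score_q reduces to the normalized Pscore_q.
    if len(answers) < na_min:
        return p

    # Only the last NA_max answers are used for the error rate (Eq. 1).
    window = answers[-na_max:]
    er = window.count(0) / len(window)

    # Student score: average of error rate and cumulative score (Eq. 3).
    std = (er + cs) / 2 if cs is not None else er

    # Weighted combination of both perspectives (Eq. 4).
    return (alpha * p + beta * std) / (alpha + beta)
```

For instance, a question rated 3 by the lecturer, with twenty alternating correct and incorrect answers and \(CS_q = 0.5\), receives a combined score of 0.5 under equal weights.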

4.2 Datasets

Currently, the MathE platform contains 1824 questions, categorized into 15 topics and 22 subtopics. Among the subtopics, “Linear Transformations” and “Vector Spaces” stand out as the two most frequently accessed platform areas, so they were selected for this work. Additionally, the subtopic “Fundamental Mathematics” was also selected since it presents very basic Mathematics concepts, making it widely used by students struggling with Mathematics, who are the platform’s focus. The details of the three datasets are described in Table 1. In this table, the “N. Questions” column gives the number of questions examined in this study, the “N. Answers” column shows the total number of answers gathered, and the “N. Students” column indicates the number of students who answered these questions. It is important to note that a single question may receive multiple answers from the same student (Azevedo et al., 2021). Moreover, although all the questions are multiple-choice, the correctness of each answer (1 for correct, 0 for incorrect) is the only information available for analysis.
Table 1
Datasets description

Subtopic                  N. Questions   N. Answers   N. Students
Linear Transformation     40             2067         96
Fundamental Mathematics   100            349          50
Vector Space              40             2737         31
In this work, a bi-objective programming problem is analyzed and discussed. The bi-objective function is based on intra- and inter-clustering measures defined in Section 5.2.

5 Multi-objective Clustering Method

The Multi-objective Clustering Algorithm is an unsupervised learning method that aims to divide the dataset into groups (clusters) based on the similarities and dissimilarities of the elements, discovering underlying patterns in the data (Rehman & Belhaouari, 2022). The following sections begin by presenting the bi-objective problem and the notation used in the algorithm, followed by the clustering measures and the Multi-objective Clustering Algorithm itself.

5.1 Multi-objective Problem

In this work, a bi-objective programming problem is analyzed and discussed, whose objective functions are based on intra-clustering and inter-clustering measures.
Thus, the bi-objective programming problem can be defined as the simultaneous minimization of an intra-clustering measure and the maximization of an inter-clustering measure:
$$\begin{aligned} \min \left\{ f_i,-g_j\right\} \end{aligned}$$
(5)
where \(f_i\) (\(i=1,2\)) is an intra-clustering measure and \(g_j\) (\(j=1,2,3\)) is an inter-clustering measure.
To better understand the following sections, some notation needs to be defined, namely:
  • X is the dataset, in which \(X = \{x_1, x_2,...,x_m\}\) where \(x_i\) is an element of the dataset;
  • m is the number of elements of the set X;
  • c defines the set of centroids of the form \(c=\{c_1, c_2,...,c_k\}\);
  • k is the number of centroids in which X is partitioned;
  • \(c_j\) defines the centroid j;
  • \(x_i^j\) represents an element i that belongs to cluster j;
  • \(C_j\) defines the cluster j, in which \(C_j=\{x_1^j,x_2^j,...,x_{\#C_j}^j\}\);
  • \(\#C_j\) is the number of elements of cluster j.
The MCA flowchart can be observed in Fig. 3.
Fig. 3
Multi-objective clustering algorithm flowchart
Full size image

5.2 Clustering Measures

In order to group the dataset into distinct sets, it is essential to define criteria for calculating the distances between individual elements. The selection of distance measures plays a pivotal role in determining the algorithm’s effectiveness, significantly impacting the outcomes of clustering. Numerous established methods, including single linkage, complete linkage, and average linkage, among others (Institute, 2025; Sokal & Michener, 1958; Sorensen, 1948), have been extensively explored in the literature. This study analyzes several traditional measures and explores potential variations in intra- and inter-clustering measures, as outlined below. For this, the Euclidean distance was considered due to its popularity in clustering, its intuitiveness, and its easy computation. Additionally, Euclidean distance measures are effective when clusters are roughly spherical and features are on similar scales, which aligns with partitioning clustering strategies (Azevedo et al., 2024). In this work, the Euclidean distance is represented by \(D(\cdot ,\cdot )\).

5.2.1 Intra-clustering measures

Intra-clustering measures refer to the distance among elements of a given cluster. There are many ways to compute an intra-clustering measure; two of them are explored in this paper, as presented below:
  • SAxc: it is the sum, over the clusters, of the average distance between the elements and their centroid, where \(Sxc_j\) represents the sum of the distances between the elements of cluster j and the centroid j, and \(\#C_j\) is the number of elements of the cluster, as defined in Eq. 6.
    $$\begin{aligned} SAxc = \sum _{j=1}^{k} \frac{Sxc_j}{\#C_j} \end{aligned}$$
    (6)
  • FNc: it is the sum of the furthest-neighbor distance of each cluster \(C_j\), where \(x_i^j\) and \(x_l^j\) belong to the same cluster j, as described in Eq. 7.
    $$\begin{aligned} FNc = \sum _{j=1}^{k} \max \{ D(x_i^j, x_l^j) \} \ \ \text {for} \ \ {i=1,...,\#C_j, \ l=1,...,\#C_j, \ i\ne l} \end{aligned}$$
    (7)
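As an illustration, the two intra-clustering measures can be computed from a labeled dataset as follows. This is a minimal NumPy sketch, not the paper's code; the array-based representation of clusters (a label per element plus a centroid matrix) is an assumption.

```python
import numpy as np

def intra_measures(X, labels, centroids):
    """Intra-clustering measures SAxc (Eq. 6) and FNc (Eq. 7).

    X         -- (m, d) array of elements
    labels    -- (m,) array with the cluster index of each element
    centroids -- (k, d) array of centroid positions
    """
    saxc, fnc = 0.0, 0.0
    for j, c in enumerate(centroids):
        members = X[labels == j]
        if len(members) == 0:
            continue
        # Sxc_j / #C_j: average distance of the cluster's elements
        # to their centroid.
        saxc += np.linalg.norm(members - c, axis=1).sum() / len(members)
        # Furthest-neighbor distance (diameter) of cluster j.
        if len(members) > 1:
            pair = members[:, None, :] - members[None, :, :]
            fnc += np.linalg.norm(pair, axis=2).max()
    return saxc, fnc
```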

5.2.2 Inter-clustering measures

Inter-clustering measures define the distance between elements that belong to different clusters or the distance between different centroids \(c_j\). In this case, three inter-clustering measures were considered:
  • Acc: it is the average distance between all centroids, as presented in Eq. 8. When the clustering process is based on this measure, it is referred to in the literature as the centroid method (Sokal & Michener, 1958).
    $$\begin{aligned} Acc = \frac{1}{k} \displaystyle \sum _{t,j=1, t \ne j}^{k} \ D(c_t, c_j) \end{aligned}$$
    (8)
  • AFNcc: it is the average, in terms of the number of clusters, of the furthest-neighbor distances among the different clusters, as described in Eq. 9.
    $$\begin{aligned} AFNcc = \frac{1}{k} \sum _{j=1}^{k} \sum _{t>j}^{k} \max \{ D(x_{i}^j, x_{l}^t) \} \ \text {for} \ \ i=1,...,\#C_j, \ l=1,...,\#C_t, \ i \ne l \end{aligned}$$
    (9)
  • ANNcc: it is the average nearest neighbor distance between elements of the different clusters, which is defined in Eq. 10.
    $$\begin{aligned} ANNcc = \frac{1}{k} \sum _{j=1}^{k} \sum _{t>j}^{k} \min \{ D(x_{i}^j, x_{l}^t) \} \ \text {for} \ \ i=1,...,\#C_j, \ l=1,...,\#C_t, \ i \ne l \end{aligned}$$
    (10)
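A corresponding sketch for the three inter-clustering measures, under the same assumed array representation (again illustrative, not the authors' code):

```python
import numpy as np

def inter_measures(X, labels, centroids):
    """Inter-clustering measures Acc (Eq. 8), AFNcc (Eq. 9), ANNcc (Eq. 10)."""
    k = len(centroids)
    # Acc: sum of distances over all ordered centroid pairs t != j,
    # divided by k (Eq. 8).
    acc = sum(np.linalg.norm(centroids[t] - centroids[j])
              for t in range(k) for j in range(k) if t != j) / k
    afncc = anncc = 0.0
    for j in range(k):
        for t in range(j + 1, k):
            # Cross distances between every element of cluster j and
            # every element of cluster t.
            cross = np.linalg.norm(
                X[labels == j][:, None, :] - X[labels == t][None, :, :],
                axis=2)
            afncc += cross.max()  # furthest neighbors (Eq. 9)
            anncc += cross.min()  # nearest neighbors (Eq. 10)
    return acc, afncc / k, anncc / k
```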

5.3 Multi-objective Clustering Algorithm

The Multi-objective Clustering Algorithm evaluates several intra- and inter-clustering measures to automatically define the optimal number of centroids and their optimal positions (Azevedo et al., 2024, 2025). This is achieved by simultaneously minimizing intra-clustering distances and maximizing inter-clustering distances, integrating various pairs of intra- and inter-clustering measures throughout the evolutionary process.
Given a dataset \(X=\{x_1, x_2,..., x_m\}\) composed of m elements, where \(x_i \in \mathbb {R}^d\) (d is the number of variables of the dataset), the idea is to partition X into k optimal groups (clusters). The MCA can automatically define the optimal number of cluster partitions; however, the range of possible partitions must be set initially, that is, the minimum and the maximum number of centroids. For this purpose, \(k_{min}\) is defined as the minimum number of centroids and \(k_{max}\) as the maximum number of clusters into which the dataset can be partitioned.
The algorithm starts with the input of the dataset. After that, a pair of measures is automatically selected, one intra-measure and one inter-measure, among the ones presented in Section 5.2. Considering this information, the MCA uses Multi-objective Particle Swarm Optimization (Coello-Coello & Lechuga, 2002) to identify the Pareto front associated with the bi-objective function of the problem, where \(f_i\) represents the intra-clustering measure and \(g_j\) represents the inter-clustering measure. The procedure is repeated until all combinations of pairs of measures have been evaluated.
Note that a Pareto front is obtained for each pair of measures. As the measures considered are based on sums and averages with different magnitudes, they must be normalized so that the solutions can be compared fairly.
After that, all the Pareto front solutions generated are evaluated with respect to dominance, and the non-dominated solutions are selected to compose a Hybrid Pareto front (HPF), which is the output optimal set of the MCA.
For a better understanding of some procedures involved in the MCA, the Centroids Calculation (CC) is presented in Algorithm 1.
Algorithm 1: Centroids calculation
The Centroids Calculation procedure randomly selects k candidates to be evaluated as possible centroids, where k is an integer between \(k_{min}\) and \(k_{max}\). The values of \(k_{min}\) and \(k_{max}\) can be given by the decision-maker or set by default to \(k_{min}=2\) and \(k_{max}=[\sqrt{m}]\) (Pal & Bezdek, 1995). Defining \(k_{min}\) and \(k_{max}\) is required to limit the algorithm's runtime, and the decision-maker can choose the range according to their knowledge and preferences. Further discussion of these parametrizations can be found in Azevedo et al. (2024, 2025).
Next, the Euclidean distance from every element of X to each centroid j is evaluated, and the elements closest to each centroid j define a cluster set C. To avoid small clusters, centroids with fewer than \(\zeta\) associated elements are automatically removed from the set of centroids, and their elements are reassigned to the closest remaining centroid in terms of Euclidean distance (Rocha et al., 2021). By default, \(\zeta = [\sqrt{m}]\), with \(\zeta \in \mathbb {N}\). The remaining centroids are then definitively denoted as the centroids of the subsets \(c_j\) into which X is partitioned.
Once all the elements are associated with their centroid \(c_j\), the centroid positions are adjusted to improve the algorithm's performance: the coordinates of each centroid j are set to the barycenter of its cluster \(c_j\), computed from its elements \(x^{j}\).
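The steps of the Centroids Calculation procedure can be sketched as follows (a minimal Python sketch, not the authors' implementation; the defaults for \(k_{max}\) and \(\zeta\) follow the text above, and the fallback when every candidate cluster is too small is an assumption of this sketch):

```python
import numpy as np

def centroids_calculation(X, k_min=2, k_max=None, zeta=None, seed=None):
    """Sketch of the Centroids Calculation (CC) procedure of Algorithm 1.

    Draws a random k in [k_min, k_max], samples k candidate centroids from X,
    assigns every element to its closest centroid, removes centroids with
    fewer than zeta elements (reassigning their elements to the closest
    remaining centroid), and finally moves each centroid to the barycenter
    of its cluster.
    """
    rng = np.random.default_rng(seed)
    m = len(X)
    k_max = int(np.sqrt(m)) if k_max is None else k_max   # default [sqrt(m)]
    zeta = int(np.sqrt(m)) if zeta is None else zeta      # default [sqrt(m)]

    k = int(rng.integers(k_min, k_max + 1))
    centroids = X[rng.choice(m, size=k, replace=False)]

    def assign(cents):
        d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=-1)
        return d.argmin(axis=1)

    labels = assign(centroids)

    # drop centroids with fewer than zeta elements, reassign their elements
    keep = [j for j in range(k) if np.sum(labels == j) >= zeta]
    if not keep:          # safeguard (assumption): keep all if every set is small
        keep = list(range(k))
    centroids = centroids[keep]
    labels = assign(centroids)

    # move each centroid to the barycenter of its (non-empty) cluster
    centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                          else centroids[j] for j in range(len(centroids))])
    return centroids, labels
```

For the MathE datasets, X would hold the question scores; the sketch only mirrors the sequence of steps described for Algorithm 1.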
The pseudocode of the MCA is presented in Algorithm 2. The MCA starts by calculating the centroids for each pair of intra-measures (\(i=1,2\)) and inter-measures (\(j=1,2,3\)) using Algorithm 1, followed by the MOPSO to calculate a Pareto front \(P^{*,i,j}\).
Algorithm 2: Multi-objective clustering algorithm
Then, each Pareto front is normalized: each point x of the Pareto front is individually normalized to [0, 1] using Eq. 11, where \(x_{min}\) and \(x_{max}\) are, respectively, the smallest and largest solution values belonging to the Pareto front considered.
$$\begin{aligned} x'=\frac{x - x_{min}}{x_{max} - x_{min}} \end{aligned}$$
(11)
After that, all normalized solutions of the Pareto fronts are evaluated with respect to dominance, and the non-dominated ones are selected to compose the hybrid Pareto front (HPF), the output of the MCA. The HPF is thus the set of non-dominated solutions among all the (normalized) solutions of the Pareto fronts obtained for each pair of measures.
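The normalization of Eq. 11 and the dominance filtering that builds the HPF can be sketched as follows (a sketch assuming both normalized objectives are to be minimized; a maximized inter-clustering measure can be negated beforehand — this is an illustrative assumption, not the authors' code):

```python
import numpy as np

def normalize_front(front):
    """Min-max normalize each objective of a Pareto front to [0, 1] (Eq. 11)."""
    f = np.asarray(front, dtype=float)
    lo, hi = f.min(axis=0), f.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against constant objectives
    return (f - lo) / span

def hybrid_pareto_front(fronts):
    """Merge normalized fronts and keep only the non-dominated points (HPF)."""
    pool = np.unique(np.vstack([normalize_front(f) for f in fronts]), axis=0)
    hpf = [p for p in pool
           if not any(np.all(q <= p) and np.any(q < p) for q in pool)]
    return np.array(hpf)

# Two toy Pareto fronts from two different pairs of measures
fronts = [[(1, 5), (2, 3), (4, 1)], [(0, 10), (5, 0)]]
print(hybrid_pareto_front(fronts))
```

After normalization, both toy fronts collapse onto points in \([0,1]^2\); none of the merged points dominates another, so all distinct points survive into the HPF.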
The MCA (Azevedo et al., 2025) was compared with four other clustering algorithms: k-means (Arthur & Vassilvitskii, 2007), Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (Ester et al., 1996), Clustering based on Differential Evolution (CDE) (Heris, 2015; Storn & Price, 1997), and the Game-based k-means (GBK-means) algorithm (Jahangoshai Rezaee et al., 2021). That work also shows the advantage of combining measures through hybrid methods, especially with the HPF strategy, which provides greater flexibility and a variety of optimal solutions, adapting to the needs of the decision-maker. In all the comparisons presented in Azevedo et al. (2025), the MCA proved competitive, mainly for providing a set of optimal solutions to the decision-maker. Furthermore, Azevedo et al. (2025a) provide a performance comparison between the MCA, k-means, and DBSCAN on benchmark datasets, and Azevedo et al. (2024) present a comparison involving the same algorithms on a real case study. Both works highlight the novelty and efficiency of the HPF concept and the circumstances of its best usability. Finally, a recent, more robust version of the MCA, empowered with a split-and-merge strategy, is available in Azevedo et al. (2025b); this version outperforms classical algorithms such as k-means and DBSCAN in terms of clustering performance.

6 Results and Discussion

The MathE platform aims to offer a dynamic and compelling way of teaching and learning Mathematics, relying on interactive digital technologies that enable autonomous study (Azevedo et al., 2021). This work focuses on developing a strategy to group the questions considering students' and lecturers' opinions. For this, a score is assigned to each question, and the MCA is used to group the questions according to their difficulty level. In addition, k-means is used to compare the clustering results.
The parameters used in the question score evaluation are \(NA_{min} = 3\), \(NA_{max} = 30\), \(\alpha = 1\), and \(\beta = 1\). The values of \(NA_{min}\) and \(NA_{max}\) were set to 3 and 30, respectively, since these are the most appropriate values to obtain CS values that represent the students' opinions, according to the statistical test and evaluation presented in Azevedo et al. (2024).
Regarding the choice of \(\alpha = \beta = 1\): as previously mentioned, these parameters represent the lecturers' and students' weights, respectively, in the question categorization. In this work, both were set equal to 1 due to the dataset restrictions in terms of the number of answers recorded per question.
The main MCA parameters are: population size and maximum number of iterations equal to 100, MOPSO repository size equal to 30, minimum number of centroids \(k_{min} = 3\), and maximum number of centroids \(k_{max} = \left[ \sqrt{m} \right]\). The value \(k_{min} = 3\) is based on previous research reporting that two levels are not enough for the platform organization (Azevedo et al., 2021, 2024).
The following sections present the results for each subtopic considered: Linear Transformation (Section 6.1), Fundamental Mathematics (Section 6.2), and Vector Space (Section 6.3).
It is essential to state explicitly that each cluster generated by the algorithms corresponds to a distinct difficulty level for the questions in the dataset. Cluster 1 denotes the lowest difficulty level, with subsequent clusters progressively increasing in complexity. The question score represents the complexity: the higher the question score, the greater its complexity, as defined in Section 4. This hierarchical arrangement ensures a clear understanding of the difficulty progression within the clusters.

6.1 Results of the Linear Transformation Subtopic

The Linear Transformation subtopic comprises 40 questions; thus, considering the MCA parameters mentioned above, the number of difficulty levels can vary between three and six, the minimum and maximum number of clusters into which the dataset can be partitioned. In addition, the MCA makes it possible to define the minimum number of questions per cluster, which by default is the integer square root of the dataset size; so six is the minimum number of questions a cluster can have in this case.
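The cluster ranges used throughout Section 6 follow directly from the dataset sizes; as a quick check (a sketch of the stated defaults, not the authors' code):

```python
import math

def mca_defaults(m, k_min=3):
    """Cluster range and minimum cluster size as used in Section 6.

    k_min = 3 is fixed by the platform requirement; the maximum number of
    clusters and the minimum number of questions per cluster both default
    to the integer square root of the dataset size m.
    """
    k_max = int(math.sqrt(m))
    return k_min, k_max, k_max

print(mca_defaults(40))   # 40-question subtopics -> (3, 6, 6)
print(mca_defaults(100))  # Fundamental Mathematics -> (3, 10, 10)
```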
Figure 4 presents the Hybrid Pareto front obtained for the Linear Transformation subtopic, where the \(x\)-axis represents the intra-clustering measure and the \(y\)-axis the inter-clustering measure. This Hybrid Pareto front has 17 solutions, provided by six different pairs of measures, as detailed in Table 2.
Fig. 4: Hybrid Pareto front of the Linear Transformation subtopic
Table 2: Details of the Linear Transformation hybrid Pareto front

Pareto front measures | Num. solutions | Num. clusters
\(SAxc - Acc\) | 3 | 4 or 5
\(SAxc - AFNcc\) | 3 | 4 or 5
\(SAxc - ANNcc\) | 1 | 4
\(FNc - Acc\) | 4 | 5
\(FNc - AFNcc\) | 3 | 5
\(FNc - ANNcc\) | 3 | 5
As can be seen in Table 2, there are two suggestions for the number of difficulty levels: four or five, represented by the cluster divisions. However, there are 17 different optimal ways to perform this division, as suggested by the MCA and represented in the Hybrid Pareto front (Fig. 4). Thus, even if two solutions indicate an optimal cluster number of four (four difficulty levels), the distribution of questions among the clusters differs. Considering this, Fig. 5 presents three examples of optimal solutions chosen for closer analysis. The choice of these solutions is strongly influenced by the decision-maker's prior knowledge of the dataset. Figure 5a presents a solution suggesting four clusters, provided by the combination of measures \(SAxc - AFNcc\). Figure 5b and c show two MCA solutions with five clusters each, provided by the measures \(SAxc - AFNcc\) and \(FNc - Acc\), respectively.
Fig. 5: MCA results for the Linear Transformation subtopic
Figure 5 illustrates the main contribution of the MCA: providing the decision-maker with diversity among the optimal solutions. In Fig. 5a, there are more questions in the first level (cluster 1), and the remaining questions are distributed roughly equally across the other levels. Figure 5b follows the same pattern, but with five difficulty levels: the first cluster has the highest number of questions, and the remaining clusters have similar numbers. In Fig. 5c, on the other hand, the MCA distributes the questions roughly equally across five difficulty levels (clusters).
For comparison, the k-means algorithm was also applied to the Linear Transformation dataset, with \(k = 4\) and \(k = 5\); the results are presented in Fig. 6a and b, respectively. In both cases, k-means allocated the majority of questions to the intermediate clusters, that is, to the intermediate difficulty levels.
Fig. 6: k-means algorithm results for the Linear Transformation subtopic
From previous research, it is known that in the Linear Transformation subtopic students spend more time practicing the more basic questions. It is therefore preferable for this subtopic to have more questions at the initial levels, guaranteeing a greater diversity of questions for students at those levels, where demand is higher. Considering this, neither solution generated by k-means is satisfactory for the problem, since cluster 1 has the smallest number of questions in both.
As a decision-maker, one could choose the solutions presented in Fig. 5a or b for the Linear Transformation subtopic. However, since this subtopic has only 40 questions, four difficulty levels (Fig. 5a) are more appropriate until the dataset grows large enough to be redivided into more levels, if necessary.

6.2 Results of Fundamental Mathematics Subtopic

The Fundamental Mathematics subtopic comprises 100 questions covering elementary Mathematics content. According to the MCA parameters, the number of difficulty levels can vary between three and ten for this subtopic, and the minimum number of questions per cluster equals ten, the square root of the dataset size.
Figure 7 presents the Hybrid Pareto front obtained for this subtopic, provided by six different pairs of measures. According to the MCA results, it is possible to divide this dataset into three, four, or five clusters, with 23 ways of distributing the questions, as detailed in Table 3.
Fig. 7: Hybrid Pareto front of the Fundamental Mathematics subtopic
Table 3: Details of the Fundamental Mathematics hybrid Pareto front

Pareto front measures | Num. solutions | Num. clusters
\(SAxc - Acc\) | 3 | 3, 4 or 5
\(SAxc - AFNcc\) | 7 | 3, 4 or 5
\(SAxc - ANNcc\) | 5 | 3, 4 or 5
\(FNc - Acc\) | 3 | 4 or 5
\(FNc - AFNcc\) | 2 | 5
\(FNc - ANNcc\) | 3 | 4 or 5
Figure 8 presents three cluster results proposed by the MCA, considering three, four, and five clusters in Fig. 8a (using \(SAxc - Acc\) measures), Fig. 8b (using \(SAxc - Acc\) measures), and Fig. 8c (using \(SAxc - AFNcc\) measures), respectively.
Fig. 8: MCA results for the Fundamental Mathematics subtopic
In turn, Fig. 9 presents the k-means results for the Fundamental Mathematics dataset, also considering three clusters (Fig. 9a), four clusters (Fig. 9b), and five clusters (Fig. 9c).
Fig. 9: k-means results for the Fundamental Mathematics subtopic
As can be seen by comparing Figs. 8 and 9, the MCA solutions tend to be more balanced in the number of questions per cluster than those presented by k-means. This happens because the MCA can guarantee a minimum number of questions per cluster, which contributes to balancing the question distribution.
As decision-makers, we know that Fundamental Mathematics is relatively less complex than the other subtopics on the platform; thus, the recommendation that the majority of questions should be at the initial level need not be strictly followed here. If the decision-maker nevertheless wants to maintain the same pattern followed in the Linear Transformation subtopic, the most appropriate solutions are those presented by k-means with four or five clusters (Fig. 9b and c). However, both leave the last cluster with only a few questions, which may not suit the problem and the platform's logic system requirements.
If one were to choose among the solutions presented by the MCA, as a decision-maker, the chosen solution would be the one presented in Fig. 8c, with five clusters. Although this solution has only 13 questions in cluster 1, the subsequent medium-difficulty clusters present a great diversity of questions, around 30. From prior knowledge, students tend to spend more time at these medium-difficulty levels before acquiring sufficient knowledge to answer questions at a more complex level.

6.3 Results of the Vector Space Subtopic

Finally, this section presents the results for the Vector Space subtopic, which comprises 40 questions. Thus, the number of difficulty levels can vary between three and six, and six is also the minimum number of questions per cluster in the MCA.
Figure 10 presents the Hybrid Pareto front obtained for this subtopic. This Hybrid Pareto front comprises 18 solutions that are provided by five different pairs of measures, as detailed in Table 4.
Fig. 10: Hybrid Pareto front of the Vector Space subtopic
Table 4: Details of the Vector Space hybrid Pareto front

Pareto front measures | Num. solutions | Num. clusters
\(SAxc - Acc\) | 5 | 3 or 5
\(SAxc - AFNcc\) | 3 | 3 or 5
\(SAxc - ANNcc\) | 5 | 3 or 5
\(FNc - Acc\) | 3 | 3 or 5
\(FNc - AFNcc\) | 2 | 5
\(FNc - ANNcc\) | 0 | –
For the Vector Space dataset, there are two possibilities for dividing the dataset: three clusters or five clusters. Two MCA solutions were therefore chosen, presented in Fig. 11 with three clusters (Fig. 11a) and five clusters (Fig. 11b), both provided by the \(SAxc - ANNcc\) measures. In both cases, the MCA balances the number of questions per cluster, so the main decision is simply the desired number of difficulty levels: three or five.
Fig. 11: MCA results for the Vector Space subtopic
Figure 12 presents the results achieved with the k-means algorithm for three and five clusters, as suggested by the MCA. The k-means solution in Fig. 12a is very similar to the MCA solution in Fig. 11a: in both, the number of questions per cluster is similar. The k-means solution with five clusters (Fig. 12b) can be considered less satisfactory than the MCA solution with five clusters (Fig. 11b), since the latter balances the number of questions better.
Fig. 12: k-means results for the Vector Space subtopic
Due to prior knowledge of the questions that make up the dataset, it is known that this subtopic presents a high degree of complexity; therefore, a greater number of questions is needed at the initial levels. Considering this, the most appropriate solution may be one with three clusters, regardless of the algorithm (Fig. 11a or 12a), or the MCA solution with five clusters (Fig. 11b), which presents a better distribution of questions across the difficulty levels.

7 Conclusion

The use of e-learning platforms has become more prominent in recent years. Technological tools play a key role in facilitating teaching and learning, particularly in Mathematics, a subject often perceived as challenging; students can develop a deeper understanding with the appropriate use of technology (NCTM, 2000) and the support of their lecturer. This work addressed the MathE question categorization problem. Initially, all the questions available on the platform were divided into two difficulty levels (basic and advanced), but students reported several complaints (Azevedo et al., 2021, 2024) about using a platform with only two difficulty levels, defined solely by the lecturers' knowledge.
Thus, this work proposes a way to consider students' and lecturers' opinions in grouping the questions into different difficulty levels. A question score was established considering both viewpoints, and the scores were then clustered by the Multi-objective Clustering Algorithm, a novel multi-objective approach proposed in this work. The MCA results were subsequently compared with those of the k-means algorithm, one of the most well-known partitioning clustering algorithms.
The MCA offers a notable advantage through its application of multi-objective optimization, presenting a diverse set of optimal solutions. This versatility empowers decision-makers to select the most fitting solution for a given problem. When confronted with challenges related to the MathE platform, certain information proves challenging to encapsulate within a mathematical model. Consequently, the incorporation of multi-objective solutions becomes particularly compelling. This approach grants decision-makers the flexibility of choice and facilitates the integration of crucial insights through a human-in-the-loop collaborative system.
From the range and variability of the Hybrid Pareto fronts generated, it is possible to perceive the impact of combining different measures to solve a problem. If only one pair of measures were considered in the model, the solution would be restricted to the optimum provided by that combination and could be inappropriate for the decision-maker. The Hybrid Pareto front strategy thus enriches the model's final solution.
Furthermore, the MCA does not require the cluster number to be indicated in advance, a common complaint in the literature regarding k-means and other partitioning clustering algorithms (Azevedo et al., 2024). In general, the results presented by the MCA are very promising. Compared with the solutions proposed by k-means, the MCA achieved a better balance in the number of questions within the clusters, as shown by Fig. 5c in relation to Fig. 6b, and by Fig. 11b in relation to Fig. 12b.
Regarding future developments, the goal is to optimize the weights assigned to students and lecturers when calculating question scores. With respect to the MCA, the intention is to investigate alternative multi-objective strategies beyond MOPSO, to further refine the optimization of the proposed model. Additionally, the question categorization strategies need to be implemented and assessed on other MathE datasets.
Finally, future approaches involving hybrid algorithms are also necessary. The existing literature has focused on comparing bio-inspired or hybrid methods with traditional ones through mathematical analysis of runtime, convergence, and parameter configurations. Few of these studies have systematically compared the performance of different bio-inspired algorithms in machine learning tasks, or of different machine learning techniques within bio-inspired optimization algorithms. This leads to a lack of experimental evidence for selecting the most suitable method for a particular combination. The unavailability of such surveys may be due to the lack of publicly available source code and the variation in encoding techniques, objective functions, and evolutionary operators. As a result, there is a vast amount of published work, since numerous metaheuristic algorithms can be combined with machine learning; nevertheless, it remains difficult to point out which combinations are most appropriate, or why one is more advantageous than another. Thus, additional research effort is necessary to facilitate the integration of deep learning, increased explainability, meta-learning, cross-domain applications, real-time adaptation, and human expertise. These trends aim to improve the performance, efficiency, and versatility of hybrid algorithms in tackling complex real-world problems.

Declarations

Ethics approval

Not applicable

Conflict of interest

Not applicable
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


Beatriz Flamia Azevedo

is a researcher in Intelligent Systems Engineering at the Polytechnic Institute of Bragança (IPB). She is a member of the Research Center for Digitization and Intelligent Robotics (CeDRI - IPB). She obtained a Ph.D. degree in Industrial and Systems Engineering from the University of Minho (Portugal), a master's degree in Industrial Engineering from IPB (Portugal), and a bachelor's degree in Control and Automation Engineering from the Federal Technological University – Paraná (Brazil). She is a multidisciplinary researcher and has been participating in several research projects in the areas of Optimization, Mathematics Modeling, Machine Learning, Recommendation Systems, and STEM.

Gabriel A. Leite

is a PhD student in Intelligent Systems Engineering at Polytechnic Institute of Bragança (IPB). He obtained his Bachelor’s degree in Electronic Engineering at the Federal Technological University of Paraná (UTFPR), Brazil. He participated in the double degree program between UTFPR and IPB, Portugal, obtaining his Master’s in Industrial Engineering at IPB in 2023. He currently works as an associate researcher at the Center for Research in Digitization and Intelligent Robotics (CeDRI-IPB). His interests are focused on the biomedical area and projects involving Machine Learning and artificial vision.

Maria de Fatima Pacheco

received an M.Sc. and PhD in Mathematics from the University of Porto and the University of Aveiro, respectively. She is a member of the Center for Research & Development in Mathematics and Applications (CIDMA) of the University of Aveiro and a collaborator of the Research Centre in Digitalization and Intelligent Robotics (CeDRI) of the Polytechnic Institute of Bragança. She is an associate professor at the Technology and Management School of the Polytechnic Institute of Bragança. Her main research interests are Graph Theory and Combinatorial Optimization with a special focus on some classical NP-complete problems, such as the determination of graphs with perfect matchings and Hamiltonian cycles.

Florbela P. Fernandes

Coordinator Professor at the Department of Mathematics at Polytechnic Institute of Bragança, has been teaching several math subjects since 1998. She obtained her Bachelor in Mathematics (Education) at University of Coimbra, her MSc in Applied Mathematics at University of Minho, and her PhD in Sciences --- Mathematics at University of Minho. Her major research area is Optimization, focusing on developing algorithms to solve problems of nonlinear and nonconvex nature with mixed variables. Another major interest is the application of the developed algorithms in real-world problems that arise, for instance, in the industry. Her research results have been published in journals, chapters, and conference proceedings.

Ana Maria A. C. Rocha

Associate Professor at the Department of Production and Systems (DPS), School of Engineering, University of Minho. She has a PhD in Production and Systems Engineering from the University of Minho in 2005, a Master’s degree in Computer Engineering from the University of Minho in 1997, and a degree in Systems and Computer Engineering from the University of Minho in 1993. She has been developing her scientific activity in the areas of Systems Engineering, Optimization, and Operational Research. In particular, her research interests are in the Global optimization, Nonlinear optimization, and mixed-integer programming areas.

Ana Isabel Pereira

Coordinator Professor at the Department of Mathematics at Polytechnic Institute of Bragança, is vice-coordinator of Research Centre in Digitalization and Intelligent Robotics (CeDRI) - Polytechnic Institute of Bragança - and member of Algorithm Research Centre – Minho University - in the Systems Engineering and Operational Research (SEOR) R&D group. She is also Coordinator Member of Bragança “Ciência Viva” Science Centre. Ana I. Pereira got her PhD from Minho University (2006) in the ‘Numerical Optimization’ area. She is the author, or co-author, of more than one hundred and twenty journal papers, book chapters, and conference proceedings. She participated in more than twenty research projects in the areas of robotics, optimization, and innovative tools in teaching.
Title
Multi-objective Clustering Algorithm Applied to the MathE Categorization Problem
Authors
Beatriz Flamia Azevedo
Gabriel A. Leite
M. Fátima Pacheco
Florbela P. Fernandes
Ana Maria A. C. Rocha
Ana I. Pereira
Publication date
27-01-2026
Publisher
Springer US
Published in
Information Systems Frontiers
Print ISSN: 1387-3326
Electronic ISSN: 1572-9419
DOI
https://doi.org/10.1007/s10796-025-10674-3
Alhawiti, N. M. (2023). The influence of active learning on the development of learner capabilities in the college of applied medical sciences: Mixed-methods study. Advances in Medical Education and Practice, 14, 87–99. https://doi.org/10.2147/AMEP.S392875
Al-Kumaim, N. H. (2021). Exploring the impact of the COVID-19 pandemic on university students’ learning life: An integrated conceptual motivational model for sustainable and healthy online learning. Sustainability, 13, 2546. https://doi.org/10.3390/su13052546
Arthur, D., & Vassilvitskii, S. (2007). K-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms. SODA ’07 (pp. 1027–1035). Society for Industrial and Applied Mathematics, USA.
Astafieva, M. M., Zhyltsov, O. B., Proshkin, V. V., & Lytvyn, O. S. (2020). E-learning as a mean of forming students’ mathematical competence in a research-oriented educational process. CTE Workshop Proceedings, 7, 674–689. https://doi.org/10.55056/cte.421
Azevedo, B. F., Rocha, A. M. A. C., & Pereira, A. I. (2024). A collaborative multi-objective approach for clustering task based on distance measures and clustering validity indices. In Lecture notes in computer science - 6th international conference on dynamics of information systems (DIS 2023). https://doi.org/10.1007/978-3-031-50320-7_4
Azevedo, B. F., Rocha, A. M. A. C., & Pereira, A. I. (2025). A multi-objective clustering algorithm integrating intra-clustering and inter-clustering measures. In Dorronsoro, B., Zagar, M., & Talbi, E.-G.E. (Eds.) Optimization and learning. OLA 2024. Communications in Computer and Information Science (vol. 2311). Springer, Croatia. https://doi.org/10.1007/978-3-031-77941-1_8
Azevedo, B. F., Rocha, A. M. A. C., & Pereira, A. I. (2024). Hybrid approaches to optimization and machine learning methods: A systematic literature review. Machine Learning. https://doi.org/10.1007/s10994-023-06467-x
Azevedo, B. F., Rocha, A. M. A. C., Fernandes, F. P., Pacheco, M. F., & Pereira, A. I. (2024). Comparison between single and multi-objective clustering algorithms: MathE case study. In Pereira, A. I., et al. (Eds.) Optimization, learning algorithms and applications. OL2A 2024. Communications in computer and information science (Vol. 2280, pp. 65–80). Springer, Tenerife - Spain. https://doi.org/10.1007/978-3-031-77426-3_5
Azevedo, B. F., Rocha, A. M. A. C., & Pereira, A. I. (2025). A multi-objective clustering approach based on different clustering measures combinations. Computational & Applied Mathematics, 44. https://doi.org/10.1007/s40314-024-03004-x
Azevedo, B. F., Pereira, A. I., Fernandes, F. P., & Pacheco, M. F. (2021). Mathematics learning and assessment using MathE platform: A case study. Education and Information Technologies. https://doi.org/10.1007/s10639-021-10669-y
Azevedo, B. F., Souza, R. M., Pacheco, M. F., Fernandes, F. P., & Pereira, A. I. (2024). Application of pattern recognition techniques for MathE questions difficulty level definition. In A. I. Pereira, A. Mendes, F. P. Fernandes, M. F. Pacheco, J. P. Coelho, & J. Lima (Eds.), Optimization, learning algorithms and applications (pp. 300–315). Cham: Springer. https://doi.org/10.1007/978-3-031-53025-8_21
Azevedo, B. F., Rocha, A. M. A. C., & Pereira, A. I. (2025). A split and merge strategy for multi-objective clustering algorithms. SN Computer Science, 6(6), 711. https://doi.org/10.1007/s42979-025-04248-y
Coello-Coello, C. A., & Lechuga, M. S. (2002). MOPSO: a proposal for multiple objective particle swarm optimization. In Proceedings of the 2002 congress on evolutionary computation. CEC’02 (Cat. No.02TH8600) (vol. 2, pp. 1051–1056). https://doi.org/10.1109/CEC.2002.1004388
Deslauriers, L., McCarty, L.S., Miller, K., Callaghan, K., & Kestin, G. (2019). Measuring actual learning versus feeling of learning in response to being actively engaged in the classroom. Proceedings of the National Academy of Sciences, 116(39). https://doi.org/10.1073/pnas.1821936116
Engelbrecht, J., Borba, M. C., & Kaiser, G. (2023). Will we ever teach mathematics again in the way we used to before the pandemic? ZDM Mathematics Education, 55, 1–16. https://doi.org/10.1007/s11858-022-01460-5
Ester, M., Kriegel, H. -P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the second international conference on knowledge discovery in databases and data mining (pp. 226–231). Portland, OR, AAAI Press.
Farrow, C. B., & Wetzel, E. (2021). An active learning classroom in construction management education: Student perceptions of engagement and learning. International Journal of Construction Education and Research, 17(4), 299–317. https://doi.org/10.1080/15578771.2020.1757536
Gardner, H. (1999). Intelligence reframed: Multiple intelligences for the 21st century. New York: Basic Books, USA.
Gardner, H. (2011). Frames of mind: The theory of multiple intelligences (Vol. 3). New York: Basic Books, USA.
Heris, M. K. (2015). Evolutionary Data Clustering in MATLAB. https://yarpiz.com/64/ypml101-evolutionary-clustering.
Jahangoshai Rezaee, M., Eshkevari, M., Saberi, M., & Hussain, O. (2021). GBK-means clustering algorithm: An improvement to the k-means algorithm based on the bargaining game. Knowledge-Based Systems, 213, 106672. https://doi.org/10.1016/j.knosys.2020.106672
Lara-Lara, F., Santos-Villalba, M. J., Berral-Ortiz, B., & Martínez-Domingo, J. A. (2023). Inclusive active methodologies in Spanish higher education during the pandemic. Societies, 13(2). https://doi.org/10.3390/soc13020029
go back to reference Lwande, C., Oboko, R., & Muchemi, L. (2021). Learner behavior prediction in a learning management system. Education and Information Technologies, 26, 2743–2766. https://doi.org/10.1007/s10639-020-10370-6CrossRef
go back to reference Mahyari, A., Pirolli, P., & LeBlanc, J. A. (2022). Real-time learning from an expert in deep recommendation systems with application to mhealth for physical exercises. IEEE Journal of Biomedical and Health Informatics, 26(8), 4281–4290. https://doi.org/10.1109/JBHI.2022.3167314CrossRef
go back to reference Meyers, C., & Jones, T. B. (1993). Promoting active learning: Strategies for the college classroom. USA: Jossey-Bass Inc.
go back to reference NCTM (2000). Principles and Standards for School Mathematics vol. 1. National Council of Teachers of Mathematics (NCTM), 2000, USA.
go back to reference Pal, N. R., & Bezdek, J. C. (1995). On cluster validity for the fuzzy c-means model. IEEE Transactions on Fuzzy Systems, 3(3), 370–379. https://doi.org/10.1109/91.413225CrossRef
go back to reference Portnoff, L., Gustafson, E., Rollinson, J., & Bicknell, K. (2021). Methods for language learning assessment at scale: Duolingo case study. International Conference on Educational Data Mining, (14).
go back to reference Pritalia, G.L., Wibirama, S., Adji, T.B., & Kusrohmaniah, S. (2020). Classification of learning styles in multimedia learning using eye-tracking and machine learning. In Proceedings of the International Conference on Electrical Engineering (FORTEI-ICEE) (pp. 145–150). https://doi.org/10.1109/FORTEI-ICEE50915.2020.9249875.
go back to reference Rangel-de Lázaro, G., & Duart, J. M. (2023). You can handle, you can teach it: Systematic review on the use of extended reality and artificial intelligence technologies for online higher education. Sustainability, 15(4). https://doi.org/10.3390/su15043507
go back to reference Rehman, A. U., & Belhaouari, S. B. (2022). Divide well to merge better: A novel clustering algorithm. Pattern Recognition, 122, 108305. https://doi.org/10.1016/j.patcog.2021.108305CrossRef
go back to reference Rocha, A. M. A. C., Costa, M. F. P., & Fernandes, E. M. G. P. (2021). A simple clustering algorithm based on weighted expected distances. In Pereira, A.I., Fernandes, F.P., Coelho, J.P., Teixeira, J.P., Pacheco, M.F., Alves, P., & Lopes, R.P. (eds.) Optimization, learning algorithms and applications (pp. 86–101). Springer, Cham. https://doi.org/10.1007/978-3-030-91885-9_7
go back to reference Rodríguez, V. (2012). The teaching brain and the end of the empty vessel. Mind, Brain, and Education, 6, 177–185. https://doi.org/10.1111/j.1751-228X.2012.01155.xCrossRef
go back to reference Sokal, R. R., & Michener, C. D. (1958). A statistical method for evaluating systematic relationships. University of Kansas Scientific Bulletin, 38, 1409–1438.
go back to reference Sorensen, T. A. (1948). A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons. Biologiske Skar, 5, 1–34.
go back to reference Storn, R., & Price, K. (1997). Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341–359. https://doi.org/10.1023/A:1008202821328CrossRef
go back to reference Treve, M. (2021). What COVID-19 has introduced into education: Challenges facing higher education institutions (HEIs). Higher Education Pedagogies, 6, 212–227. https://doi.org/10.1080/23752696.2021.1951616CrossRef
go back to reference Villegas-Ch, W., García-Ortiz, J., & Sánchez-Viteri, S. (2024). Personalization of learning: Machine learning models for adapting educational content to individual learning styles. IEEE Access, 12, 121114–121130. https://doi.org/10.1109/ACCESS.2024.3452592CrossRef
