
2025 | Book

Software and Data Engineering

33rd International Conference, SEDE 2024, San Diego, CA, USA, October 21-22, 2024, Proceedings


About this book

This book constitutes the proceedings of the 33rd International Conference on Software and Data Engineering, SEDE 2024, held in San Diego, California, USA, during October 21-22, 2024.

The 14 full papers presented in these proceedings were carefully reviewed and selected from 25 submissions. They cover a wide range of topics in software and data engineering and are organized into two topical sections: Software Engineering and Data Science; and Artificial Intelligence.

Table of contents

Frontmatter

Software Engineering and Data Science

Frontmatter
Adversarial Attack Optimization and Evaluation for Machine Learning-Based Dark Web Traffic Analysis
Abstract
Machine learning (ML) is quickly becoming one of the most transformative technologies in computing. Applications of ML are widespread and growing rapidly, reshaping major industries such as finance, healthcare, and automotive. This makes it more necessary than ever to recognize the instability created by adversarial attacks, the deliberate manipulation of data to mislead ML models. This instability must be addressed by researching the effects of adversarial attacks and how they can be better recognized. Our research explored the use of adversarial attacks in dark web network traffic analysis, first by improving our understanding of how adversarial attacks can be optimized. We manipulated a dataset of dark web traffic through analysis of confusion matrices and Euclidean distances, aiming to cause maximum confusion for each of our models. We then trained and tested each model in a variety of scenarios to further our understanding of weaknesses in both the traffic data and the machine learning techniques employed.
Nyzaireyus Harrison, Heather Broome, Yaju Shrestha, Alexander Robles, Aayush Gautam, Nick Rahimi
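The abstract does not say exactly how Euclidean distances steered the manipulation. One plausible reading works as follows. Compute a centroid for each traffic class. Treat the nearest other class as the easiest confusion target. Then nudge samples toward that target's centroid. The sketch below illustrates this reading under assumptions: the function names and the step size `epsilon` are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict

def class_centroids(samples, labels):
    """Mean feature vector of each class."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def nearest_other_class(centroids):
    """Map each class to the other class whose centroid is closest (Euclidean)."""
    return {
        a: min((b for b in centroids if b != a),
               key=lambda b: math.dist(centroids[a], centroids[b]))
        for a in centroids
    }

def perturb_toward(x, target, epsilon):
    """Hypothetical perturbation: move x a fraction epsilon toward target."""
    return [v + epsilon * (t - v) for v, t in zip(x, target)]
```

Classes whose centroids lie close together are the ones a classifier is most likely to mix up. That is why the nearest centroid serves as the target here.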
Enhancing Software Requirements Classification with Machine Learning and Feature Selection Techniques
Abstract
Requirements engineers are responsible for classifying software requirements as functional or nonfunctional. Because software architects need to know the quality requirements to do their work, machine learning is employed to speed up the identification and categorization of requirements and to make it more consistent, so that effort can be spent more effectively. We experimented with different machine learning algorithms as well as different pre-processing and feature selection techniques. We found that, for this application, stop words should not be removed and that lemmatization provides the most effective features for classification. Furthermore, after finalizing our choice of pre-processing techniques and algorithm, we proposed a modification to the Extensive Feature Selector that gathers the most distinctive words in each category and uses a list of those words as the main features. Using a threshold of 0.013, we obtained an F1 score of 0.787, an improvement over the base Enhanced Feature Selector's F1 score of 0.761 with the same number of word features.
Daniel Lanfear, Mina Maleki, Shadi Banitaan
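The abstract does not define how a word's "distinctiveness" is compared against the 0.013 threshold. The sketch below assumes one simple score: a word's relative frequency in a category minus its highest relative frequency in any other category. A word is kept when that margin exceeds the threshold. Both the scoring rule and the function name are hypothetical.

```python
from collections import Counter

def distinctive_words(docs_by_category, threshold=0.013):
    """Words whose relative frequency in some category exceeds their highest
    relative frequency in every other category by more than threshold."""
    freqs = {}
    for cat, docs in docs_by_category.items():
        # Stop words are kept, matching the abstract's pre-processing finding.
        counts = Counter(w for doc in docs for w in doc.lower().split())
        total = sum(counts.values())
        freqs[cat] = {w: c / total for w, c in counts.items()}
    selected = set()
    for cat, f in freqs.items():
        for w, p in f.items():
            other = max((freqs[o].get(w, 0.0) for o in freqs if o != cat), default=0.0)
            if p - other > threshold:
                selected.add(w)
    return sorted(selected)
```

The resulting word list would then serve as the feature vocabulary, for example as the columns of a bag-of-words matrix.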
Embracing Residuality Theory in Software Architecture to Address Uncertainty: Key Challenges and Strategies
Abstract
The sources of uncertainty in software architecture are not impossible to predict, but predicting them is certainly challenging given the inherent complexity of software systems and the dynamic environments of technology, external factors, and events that can affect a system's operation and stability. Residuality theory, in particular, offers a new perspective that challenges conventional approaches to software design. In this paper, we propose a Residual Dynamic Management (RDM) framework for software architecture to manage the residual components and stressors that constitute a residual system. RDM ensures that the system not only remains robust but also thrives in the face of uncertainty, dynamic changes, and unpredictable conditions. Furthermore, we propose a model called Residual Finite State Machines (R-FSM) that incorporates residuality complexity into software architecture, enhancing the overall system's ability to manage unforeseen changes and benefit from them through the concept of antifragility.
Aziz Fellah
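The abstract does not give the formal definition of an R-FSM. As a loose illustration only, a finite state machine can treat states as residues (what remains of the architecture after a stressor hits) and inputs as stressors. An `unhandled` query then lists stressors with no designed response. The class name, state names and stressor names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ResidualFSM:
    """Toy FSM: states are residues, inputs are stressors (hypothetical model)."""
    state: str
    # (current residue, stressor) -> next residue
    transitions: dict = field(default_factory=dict)

    def add(self, src, stressor, dst):
        self.transitions[(src, stressor)] = dst

    def apply(self, stressor):
        # An unmodelled stressor leaves the state unchanged in this sketch;
        # in residual analysis it would signal a gap to design for.
        self.state = self.transitions.get((self.state, stressor), self.state)
        return self.state

    def unhandled(self, stressors):
        """Stressors with no modelled residue from the current state."""
        return [s for s in stressors if (self.state, s) not in self.transitions]
```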
Zoned Role-Based Approach to System Design, Implementation, and Access Control of Integrated Web Applications
Abstract
In today's world, almost all organizations depend heavily on web-based systems or web applications for their day-to-day operations. In this paper, we present a zoned role-based (ZRB) approach to the design and implementation of integrated web-based systems for organizations and enterprises. In contrast to Role-Based Access Control (RBAC), which is well known in computer security, this approach can be used throughout the entire life cycle of a web-based system. It can make the design, implementation, deployment, and maintenance of integrated web systems more efficient and effective for organizations and enterprises. In this approach, business areas, divisions, departments, or designated groups of employees with specific missions are called zones, and a set of roles is defined for each zone. For each role, web apps, each consisting of a set of operations, are designed and implemented so that users in that role can conduct their business in the associated zone. Control of user access to each operation can then be done explicitly by associating each operation with roles, by inference based on the relationships between roles. Within such a zoned role-based integrated system, once a user has been assigned roles in each zone with which he or she is affiliated, he or she can access precisely the apps and operations needed to fulfill his or her role or roles in the respective zone, with a single authentication. Such integration is especially important and convenient when users are affiliated with multiple zones or play multiple roles.
Harris Wang
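The zone, role, app and operation hierarchy described above maps naturally onto nested lookups. The sketch below is a minimal in-memory model under stated assumptions. Inference between roles is modelled as a senior role inheriting a junior role's grants. The class and method names are hypothetical and do not represent the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class ZonedRoleSystem:
    """Toy zoned role-based access model (hypothetical names throughout)."""
    # zone -> role -> set of (app, operation) pairs granted to that role
    grants: dict = field(default_factory=dict)
    # zone -> role -> set of junior roles whose grants it inherits
    inherits: dict = field(default_factory=dict)
    # user -> zone -> set of roles assigned in that zone
    assignments: dict = field(default_factory=dict)

    def grant(self, zone, role, app, operation):
        self.grants.setdefault(zone, {}).setdefault(role, set()).add((app, operation))

    def add_inheritance(self, zone, senior, junior):
        self.inherits.setdefault(zone, {}).setdefault(senior, set()).add(junior)

    def assign(self, user, zone, role):
        self.assignments.setdefault(user, {}).setdefault(zone, set()).add(role)

    def effective_roles(self, user, zone):
        # Depth-first walk from the assigned roles through inheritance edges.
        stack = list(self.assignments.get(user, {}).get(zone, set()))
        seen = set()
        while stack:
            role = stack.pop()
            if role not in seen:
                seen.add(role)
                stack.extend(self.inherits.get(zone, {}).get(role, set()))
        return seen

    def can(self, user, zone, app, operation):
        zone_grants = self.grants.get(zone, {})
        return any((app, operation) in zone_grants.get(r, set())
                   for r in self.effective_roles(user, zone))
```

After a single authentication establishes `user`, every per-zone check reuses that identity. This is how one login can cover several zones and roles.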
Enhancing IoT Network Defense: A Comparative Study of Machine Learning Algorithms for Attack Classification
Abstract
As the Internet of Things (IoT) continues to expand rapidly, securing these interconnected devices and networks from cyber threats has become a critical challenge. This research investigates the application of machine learning techniques for accurately classifying IoT network traffic data to discriminate between benign activities and various types of cyber-attacks targeting IoT systems. We propose a program that employs multiple machine learning algorithms, including Decision Tree, Logistic Regression, Naive Bayes, and Random Forest, trained on a comprehensive IoT network traffic dataset, CICIoTDataset2023. Through extensive experiments, we evaluate the performance of these classification models in detecting different IoT attack categories such as web-based attacks, spoofing, denial-of-service, Mirai, reconnaissance, distributed denial-of-service, and brute force attacks. Our results demonstrate the efficacy of machine learning approaches, with the Random Forest algorithm emerging as the top performer, achieving an overall accuracy of 98.41%. We also address challenges like class imbalance through hybrid sampling techniques and implement strategies like regularization and hyperparameter tuning to mitigate overfitting and enhance model generalization. Additionally, we conduct a performance analysis of the classification models on different IoT attack categories to gain insights into their specific strengths and weaknesses. By leveraging machine learning for accurate IoT attack classification, this research contributes to developing robust security solutions that can proactively identify and mitigate cyber threats, enabling a more secure IoT ecosystem. The findings pave the way for safeguarding interconnected devices, protecting user privacy, and fostering confidence in the widespread adoption of IoT technologies.
Alkendria McNair, Divine Precious-Esue, Soundra Newson, Nick Rahimi
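The abstract mentions hybrid sampling for class imbalance but does not name the exact methods. A common hybrid pairs undersampling of majority classes with oversampling of minority classes. The sketch below uses plain random oversampling with replacement rather than a synthetic method such as SMOTE. The function name and the `target_per_class` parameter are assumptions made for illustration.

```python
import random
from collections import defaultdict

def hybrid_resample(samples, labels, target_per_class, seed=0):
    """Undersample classes above target_per_class and randomly oversample
    (with replacement) classes below it, so every class ends at the target."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target_per_class:
            chosen = rng.sample(xs, target_per_class)  # undersample, no repeats
        else:
            extra = [rng.choice(xs) for _ in range(target_per_class - len(xs))]
            chosen = xs + extra  # keep all originals, then duplicate at random
        out_x.extend(chosen)
        out_y.extend([y] * target_per_class)
    return out_x, out_y
```

Resampling should be applied only to the training split. Applying it before the train/test split would leak duplicated minority samples into evaluation.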
A Survey and Insights on Modern Game Development Processes for Software Engineering Education
Abstract
The video game industry is a fast-growing, multi-billion-dollar market. Because game technologies and the industry evolve rapidly, there is a pressing need to survey and analyze current game development processes so that students interested in game development can gain better knowledge and skills for their projects in game software engineering education. In this paper, we present our survey and analysis of multiple aspects of modern game development and provide useful insights for students who want to work on game development. We also present a model of the common components of the game development process and the workload involved. This can help students who are interested in developing their own games craft a realistic plan for such projects.
Aakanksha Shrestha, Fei Zuo, Gang Qian, Junghwan Rhee
Evaluating the Impact of Combinatorial Interaction Testing on Test Automation: A Case Study from Industry
Abstract
Software testing regularly involves numerous configurations and user inputs, leading to a combinatorial explosion of test cases. While Combinatorial Interaction Testing (CIT) has been investigated theoretically, its effectiveness in real-world scenarios remains unclear [1]. This research addresses that gap by applying CIT to live software projects. We conducted two studies: the first focused on optimizing user input testing in jTrac, and the second focused on managing system configurations in Redmine, a comparable web application. We compared CIT with conventional testing strategies, analyzing factors such as test design time, test automation, test execution, suite size, and defect detection. The investigation yielded valuable insights into improving the implementation and adoption of CIT. The results are promising: with CIT, the number of required test cases is significantly reduced while defect detection improves. In the first study, the average time to detect a defect was 1.40 h (design, automation, execution, and evaluation) with conventional testing, compared to 0.35 h with CIT. Similar patterns emerged in the second study. These findings have important implications for both researchers and organizations. They highlight CIT's promise for software testing, including a reduced test case burden and potentially improved defect detection rates. This study provides practical evidence for organizations and testers looking to improve their testing procedures.
Feras Daoud, Miroslav Bures, Zdenek David, Petr Syrovatka
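To illustrate how CIT shrinks a suite, the sketch below builds a pairwise (2-way) suite with a simple greedy strategy. Each step picks, from the full Cartesian product, the test that covers the most value pairs not yet covered. This is a generic textbook approach, not the tooling used in the study. Because it scans the full product at each step, it only suits small parameter models. The parameter names are hypothetical.

```python
from itertools import combinations, product

def pairwise_suite(params):
    """Greedy all-pairs suite: repeatedly choose the candidate test that
    covers the most not-yet-covered value pairs."""
    names = list(params)
    uncovered = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va in params[a]
        for vb in params[b]
    }
    suite = []
    while uncovered:
        best, best_cov = None, set()
        for values in product(*(params[n] for n in names)):
            test = dict(zip(names, values))
            cov = {p for p in uncovered if all(test[k] == v for k, v in p)}
            if len(cov) > len(best_cov):
                best, best_cov = test, cov
        suite.append(best)
        uncovered -= best_cov  # each round covers at least one pair, so this ends
    return suite

# Three two-valued parameters: 8 exhaustive combinations.
params = {
    "browser": ["chrome", "firefox"],
    "db": ["mysql", "postgres"],
    "os": ["linux", "windows"],
}
suite = pairwise_suite(params)
```

With these three binary parameters, the greedy suite needs fewer tests than the 8 exhaustive combinations. It still exercises every pair of values at least once.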
JSMBox—A Runtime Monitoring Framework for Analyzing and Classifying Malicious JavaScript
Abstract
In recent years, there has been a notable increase in the prevalence of malicious websites, leading to a majority of cyber-attacks and data breaches. Malicious websites often incorporate JavaScript code to execute attacks on web browsers. Despite existing methodologies documented in the literature, the analysis and detection of malicious JavaScript pose significant challenges due to the dynamic nature of JavaScript and the use of advanced evasion techniques. These challenges motivate the need for an innovative and efficient approach to comprehensively analyze the code to identify its malicious intent. In this paper, we introduce a monitoring approach for analyzing JavaScript code, which can capture all of the code’s features at runtime. Our method leverages the security reference monitor technique to mediate JavaScript security-sensitive executions, including function calls and property accesses. Therefore, the proposed method can capture behaviors at runtime regardless of how the code is written, even with recent advanced evasion techniques like WebAssembly diversification. We have implemented our approach as a JavaScript dynamic analysis framework called JSMBox in a Chromium-based browser extension. Our experiments demonstrated that JSMBox is capable of effectively countering sophisticated evasion techniques found in modern malicious JavaScript code, including WebAssembly diversification. We have also evaluated the framework’s ability to classify malicious behaviors based on a large-scale raw dataset comprising about 20,000 malicious and benign webpages. Our developed tool automatically launches the browser to execute these webpages, records JavaScript code execution events, and captures their execution frequency as extracted features. We have tested the extracted dataset with various machine-learning models, yielding promising experimental results that confirm the effectiveness of our approach and achieve a high accuracy rate.
Phu H. Phung, Allen Varghese, Bojue Wang, Yu Zhao, Chong Yu
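JSMBox itself runs as JavaScript inside a Chromium extension. The Python sketch below only illustrates the reference-monitor idea the abstract describes. Security-sensitive functions are replaced by wrappers that record each call, consult a policy, and then forward to the original. The per-function call counts become a feature vector. The class, the policy signature and the monitored names are hypothetical.

```python
import functools
from collections import Counter

class ReferenceMonitor:
    """Mediates calls to selected functions, counts them, and can block them."""

    def __init__(self, policy=lambda name, args: True):
        self.counts = Counter()
        self.policy = policy

    def wrap(self, name, fn):
        @functools.wraps(fn)
        def mediated(*args, **kwargs):
            self.counts[name] += 1  # counted even if the policy then blocks it
            if not self.policy(name, args):
                raise PermissionError(f"blocked call to {name}")
            return fn(*args, **kwargs)
        return mediated

    def feature_vector(self, feature_names):
        """Execution frequency of each monitored function, in a fixed order."""
        return [self.counts[n] for n in feature_names]
```

Because mediation happens at call time, the counts reflect what actually executed, however the calling code was obfuscated. This is the property the abstract relies on against evasion techniques.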
Securing Wireless Sensor Network from Rank Attack Using Fast Sensor Data Encryption and Decryption Protocol
Abstract
Wireless sensor and actuator networks (WSANs) are of great significance in the realm of industrial automation systems. However, the aspect of security in WSANs has been somewhat overlooked. One particular security concern is the rank attack, where malicious actors actively manipulate the transmission of messages from neighboring nodes. This undermines the entire network's data collection and routing operations, resulting in a significant degradation of network performance. This attack adversely affects crucial metrics such as packet delivery ratio (PDR), latency, and power consumption, ultimately reducing the network's overall lifespan. In order to foster trust among nodes, ensure accurate delivery of data to end users, safeguard shared data in the cloud from security breaches, and prevent rank attacks within the network, it is crucial to protect the network against such malicious activities. This research paper aims to introduce an enhanced version of the Routing Protocol for Low-Power and Lossy Networks (RPL) protocol, specifically tailored to identify and eliminate rank attacks within existing WSANs. The effectiveness of the new protocol will be assessed through experimentation using Zolertia (Z1) sensors in the Cooja network simulator. To minimize network overhead on the sensors’ side, the proposed scheme limits cryptographic operations to symmetric key-based mechanisms such as XORing, hash functions, and encryption. These operations will be implemented using a C-compiler and verified through the ModelSIM Altera SE edition 11.0 simulator.
Eden Teshome Hunde
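The abstract restricts the scheme to symmetric primitives but does not give its message format. As a hedged illustration, the sketch authenticates RPL rank advertisements with a truncated HMAC-SHA256 tag under a shared key. It also rejects replays and ranks that no non-root node could legitimately claim. HMAC is a stand-in for the paper's XOR, hash and encryption construction, and the packet layout is hypothetical. The constants follow the RPL specification (RFC 6550), where `ROOT_RANK` equals `MinHopRankIncrease`. Note that this check stops outsiders from forging ranks, but it does not stop a compromised key holder from lying.

```python
import hashlib
import hmac
import struct

MIN_HOP_RANK_INCREASE = 256        # RFC 6550 default
ROOT_RANK = MIN_HOP_RANK_INCREASE  # RFC 6550: ROOT_RANK = MinHopRankIncrease
TAG_LEN = 8                        # truncated tag to save radio bytes (assumption)

def sign_rank(key, node_id, rank, seq):
    """Hypothetical packet: node_id (2 B), rank (2 B), seq (4 B), then a tag."""
    msg = struct.pack(">HHI", node_id, rank, seq)
    return msg + hmac.new(key, msg, hashlib.sha256).digest()[:TAG_LEN]

def verify_rank(key, packet, last_seq):
    """Return (node_id, rank) if the advertisement is acceptable, else None."""
    packet = bytes(packet)
    msg, tag = packet[:8], packet[8:]
    expected = hmac.new(key, msg, hashlib.sha256).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        return None  # forged or corrupted
    node_id, rank, seq = struct.unpack(">HHI", msg)
    if seq <= last_seq.get(node_id, -1):
        return None  # replayed advertisement
    if rank < ROOT_RANK + MIN_HOP_RANK_INCREASE:
        return None  # a non-root node cannot claim a rank this low
    last_seq[node_id] = seq
    return node_id, rank
```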

Artificial Intelligence

Frontmatter
Enhancing Transparency and Privacy in Financial Fraud Detection: The Integration of Explainable AI and Federated Learning
Abstract
The pervasive issue of fraudulent transactions presents a considerable challenge for financial institutions globally. Developing innovative fraud detection systems is critical to maintaining customer confidence. However, several factors complicate the creation of effective and efficient fraud detection systems. Notably, fraudulent transactions are infrequent, resulting in imbalanced transaction datasets in which legitimate transactions vastly outnumber instances of fraud. This imbalance can compromise fraud detection performance. Additionally, stringent data privacy regulations prevent the sharing of customer data, hindering the development of high-performing centralized models. Furthermore, fraud detection mechanisms must remain transparent to avoid impairing the user experience. This research proposes an approach that combines Federated Learning (FL) with Explainable Artificial Intelligence (XAI) to overcome these obstacles. FL allows financial organizations to train fraud detection models collaboratively without sharing data directly, so customer confidentiality and data privacy are not compromised. At the same time, incorporating XAI ensures that the model's predictions are interpretable by human experts. Experimental evaluations using real-time transaction datasets consistently demonstrate that the FL-based fraud detection system performs well. This study establishes the potential of FL as a reliable, privacy-preserving tool for combating fraud.
Waquar Ahmad, Aditya Vashist, Neel Sinha, Manisha Prasad, Vishesh Shrivastava, Junaid Hussain Muzamal
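The abstract does not name the aggregation rule used to combine the institutions' models. A common choice is federated averaging (FedAvg). In FedAvg, each client trains locally and sends only its model parameters. The server averages those parameters, weighting each client by its number of local samples. The sketch below shows that aggregation step on flat parameter lists. It is an assumed example, not the paper's code.

```python
def federated_average(client_weights, client_sizes):
    """FedAvg aggregation: average each parameter across clients, weighted by
    the number of local training samples. Raw data never leaves the clients."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical banks: bank 2 holds three times as many transactions.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Here `global_model` is `[2.5, 3.5]`, pulled toward the larger client's parameters. In practice this step repeats over many rounds, with the global model sent back to clients for further local training.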
Enhancing Generative AI Chatbot Accuracy Using Knowledge Graph
Abstract
In recent years, generative AI chatbots have significantly improved in their ability to simulate human-like conversations. However, ensuring the accuracy and contextual relevance of their responses remains a challenge. This paper presents an innovative approach to enhancing the accuracy of generative AI chatbots by integrating knowledge graphs using Neo4j. We demonstrate how combining structured data from Knowledge Graphs with advanced large language models can result in more accurate and context-aware chatbot interactions. By implementing this approach, we aim to provide a robust framework for developing intelligent chatbots that can deliver precise and contextually appropriate responses. We created three categories of test cases: Data-Relevant Inquiries, Non-Contextual Queries, and Contextually Relevant but Data-Irrelevant Questions. The accuracy obtained for the data-relevant test cases was 91.44%.
Ajay Bandi, Jameer Babu, Ruida Zeng, Sai Ram Muthyala
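The paper stores its knowledge graph in Neo4j. To keep the sketch self-contained, it instead holds triples in a plain Python list. The basic pattern is to retrieve facts about entities mentioned in the question and place them in the prompt, instructing the model to answer only from those facts. Entity matching here is a naive substring test. The triples, the prompt wording and the function names are hypothetical. A question that matches no facts yields a prompt inviting a refusal. That is one way to handle the abstract's non-contextual test category.

```python
def retrieve_facts(triples, question):
    """Triples whose subject or object is mentioned in the question
    (naive case-insensitive substring match)."""
    q = question.lower()
    return [t for t in triples if t[0].lower() in q or t[2].lower() in q]

def build_prompt(triples, question):
    facts = retrieve_facts(triples, question)
    if facts:
        context = "\n".join(f"{s} {r} {o}." for s, r, o in facts)
    else:
        context = "No matching facts in the knowledge graph."
    return (
        "Answer using only the facts below; say you don't know otherwise.\n"
        f"Facts:\n{context}\nQuestion: {question}"
    )

triples = [
    ("CS 101", "TAUGHT_BY", "Dr. Smith"),
    ("CS 101", "HAS_PREREQUISITE", "MATH 100"),
    ("BIO 200", "TAUGHT_BY", "Dr. Lee"),
]
```

With a graph database, `retrieve_facts` would become a parameterized query. Only the retrieval step changes; prompt assembly stays the same.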
ReVisE: Emulated Visual Outfit Generation from User Reviews Using Generative-AI
Abstract
The fashion industry faces significant challenges due to overproduction and waste, often driven by uncertainty about consumer preferences. This paper presents ReVisE, a novel framework that uses generative AI to address this issue by emulating outfit generation from user reviews. ReVisE combines a text-to-text Large Language Model (LLM) and a text-to-image Stable Diffusion (SD) model to create virtual outfits based on customer feedback. The LLM consolidates user reviews to extract desired improvements and feedback, and the SD model uses these insights to produce realistic visual representations of the improved product. Our framework allows designers to evaluate potential designs and identify areas for improvement without physically producing multiple prototypes, thereby reducing waste and accelerating the design process. Experiments conducted on Amazon fashion item reviews demonstrate the effectiveness of ReVisE, with promising results in both multimodal and human evaluations.
Samar Rahimi Rosas, Subash Neupane, Shaswata Mitra, Sudip Mittal
A Case Study on AI to Automate Simulation Modelling
Abstract
We explore the use of Large Language Models (LLMs) for Discrete Event Simulation (DES). While DES typically involves both domain and technical expertise, our study demonstrates the potential of LLMs in generating queueing models in Python. The code outputs generated by the LLMs are compared to solutions implemented in GPSS (General Purpose Simulation System), a simulation language for DES. Prompt engineering is also reviewed, showcasing its impact on the quality of code generated by LLMs. Our results show that while LLMs assist in speeding up DES, they are far from replacing human experts. However, considering the steady advancements in Artificial Intelligence (AI), there is a promising future for more sophisticated and capable models.
Uchechukwu Obinwanne, Wenying Feng
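For a sense of the kind of queueing model discussed above, here is a hand-written sketch of a single-server FIFO queue with exponential interarrival and service times (M/M/1). Its comments map each step loosely onto the GPSS blocks GENERATE, SEIZE, ADVANCE and RELEASE. The mean queueing delay can be checked against the M/M/1 formula $W_q = \lambda / (\mu(\mu - \lambda))$. This is an illustrative example and was not generated by any of the models in the study.

```python
import random

def simulate_single_server(arrival_rate, service_rate, customers, seed=1):
    """FIFO single-server queue with exponential interarrival and service
    times (M/M/1). Returns the mean time customers wait before service."""
    rng = random.Random(seed)
    clock = 0.0           # arrival time of the current customer
    server_free_at = 0.0  # time the server finishes its current job
    total_wait = 0.0
    for _ in range(customers):
        clock += rng.expovariate(arrival_rate)        # GENERATE: next arrival
        start = max(clock, server_free_at)            # SEIZE: wait if busy
        total_wait += start - clock
        server_free_at = start + rng.expovariate(service_rate)  # ADVANCE, RELEASE
    return total_wait / customers

def mm1_mean_wait(arrival_rate, service_rate):
    """Analytic mean queueing delay Wq for a stable M/M/1 queue."""
    return arrival_rate / (service_rate * (service_rate - arrival_rate))
```

At utilization 0.8 (arrival rate 0.8, service rate 1.0), the analytic mean wait is 4 time units. A long simulation run should land close to that value, which makes the formula a quick sanity check on generated simulation code.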
Racial Disparity in Breast Cancer Prognosis
Abstract
In this work, we examined the significance of race as a factor in breast cancer prognosis using the association rule mining technique. We used the XLMiner data mining tool for our experiments. The data used is the National Cancer Institute's SEER Public-Use Data. Several experiments were conducted on prognostic factors, including Age, Behavior code, Stage of cancer, Grade, and Marital status, with respect to Race. Our discovered association rules indicate that Japanese patients have a better survival rate than White patients, and White patients have a better survival rate than Black patients. The racial disparity in breast cancer prognosis is shown to be statistically significant.
M. Mehdi Owrang O, Fariba Jafari Horestani
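Association rules such as "group = A implies survived = yes" are usually ranked by support, confidence and lift. The sketch below computes these three standard measures for a single rule over records represented as sets of `attribute=value` items. The records are made-up toy data, not SEER data. Testing a disparity for statistical significance, as the paper reports, would need an additional test such as chi-square, which is not shown.

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent => consequent.
    Each record is a set of 'attribute=value' items."""
    n = len(records)
    a = sum(1 for r in records if antecedent <= r)
    c = sum(1 for r in records if consequent <= r)
    both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = both / n                          # P(A and C)
    confidence = both / a if a else 0.0         # P(C | A)
    lift = confidence / (c / n) if c else 0.0   # P(C | A) / P(C)
    return support, confidence, lift

records = [
    {"group=A", "survived=yes"}, {"group=A", "survived=yes"},
    {"group=A", "survived=no"}, {"group=B", "survived=yes"},
    {"group=B", "survived=no"}, {"group=B", "survived=no"},
]
support, confidence, lift = rule_metrics(records, {"group=A"}, {"survived=yes"})
```

A lift above 1 (here 4/3) means the outcome is more frequent within the group than in the data overall. Comparing lifts across groups is one way to read disparities from mined rules.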
Backmatter
Metadata
Title: Software and Data Engineering
Edited by: Wenying Feng, Nick Rahimi, Venkatasivakumar Margapuri
Copyright year: 2025
Electronic ISBN: 978-3-031-75201-8
Print ISBN: 978-3-031-75200-1
DOI: https://doi.org/10.1007/978-3-031-75201-8