Open Access 2025 | Original Paper | Book Chapter

VOCE: A Virtual On-Call Engineer for Automated Alert Incident Analysis Using a Large Language Model

Authors: Jia Chen, Xiaolei Chen, Jie Shi, Peng Wang, Wei Wang

Published in: Fundamental Approaches to Software Engineering

Publisher: Springer Nature Switzerland


Abstract

In online service systems, detecting and analyzing system faults is crucial for maintaining availability and customer satisfaction. This chapter delves into the complexity of alert incident analysis and highlights the limitations of existing methods that rely on manual intervention and statistical features. It introduces VOCE, a virtual on-call engineer equipped with large language models, designed to emulate the analytical processes of operations experts. VOCE's approach comprises comprehending the local anomalies recorded by alerts, inferring how these anomalies propagate, and suggesting the originating alert that triggers the other anomalies. The chapter provides an in-depth investigation of the key information factors considered in alert incident analysis, such as system layer, impact scope, and severity. It also presents a hierarchical causality mining strategy that breaks the analysis into manageable sub-tasks, improving the accuracy and robustness of the results. Through extensive experiments on real-world alert incidents, the chapter demonstrates the superior performance of VOCE in terms of accuracy and efficiency, making it a valuable tool for automated alert incident analysis. The evaluation includes a comparison with other large language models and a detailed case study showing that VOCE can precisely identify originating alerts and construct fault propagation graphs. The chapter concludes with a discussion of limitations and future directions for enhancing VOCE's capabilities, paving the way for more advanced and autonomous operational solutions.

1 Introduction

In an online service system, faults are usually inevitable due to the large scale and complexity of the system [30]. Many factors cause system faults, such as insufficient memory, hardware problems, configuration errors, and software bugs [58]. System faults can damage system availability and reduce customer satisfaction, resulting in huge economic losses for organizations [6]. Therefore, in order to detect or even predict system faults, operations engineers usually monitor various aspects of the service system, such as logs [14, 19, 32, 49], KPIs [8, 21, 37, 57], and traces [31, 33, 58]. When the monitoring data are abnormal, corresponding alerts, also referred to as alarms, recording local anomalies of the system are generated. Thus, based on alerts, operations engineers can diagnose system faults directly.
The components of a service system are interconnected through topological relationships; therefore, a fault in one component can impact the functionality of others, resulting in various local anomalies and generating multiple alerts across different components [9]. To reduce the workload of operations engineers, numerous research efforts [5, 9, 10, 52] have been dedicated to link alerts of the same fault into a group, which is also called an alert incident.
However, these methods can only help operations engineers link alerts and obtain alert incidents and cannot replace their role in conducting a thorough incident analysis. Due to the need for strong contextual understanding and logical reasoning abilities, the analysis of alert incidents still relies extensively on manual work. Operations engineers leverage anomaly information recorded in alerts in conjunction with experiential knowledge to analyze alert incidents. The process of analyzing an alert incident typically involves three main steps.
1.
Comprehending anomalies recorded by alerts.
 
2.
Inferring the propagation process of the anomalies.
 
3.
Identifying the originating alert, which records the originating anomaly that triggers other anomalies.
 
Table 1 illustrates an alert incident where the first alert indicates that the call to microservice A on host “21.99.218.233” failed. The second alert shows a drop in the success rate of microservice B accessing microservice A’s interface. The third alert reports the unexpected shutdown of host “21.99.218.233”. In this example, the originating anomaly of the system fault is the unexpected shutdown of host “21.99.218.233”, which triggered the anomaly of microservice A when invoked by “11.99.218.200”. Additionally, microservice B, deployed on “11.99.218.235”, encountered an anomaly while attempting to call microservice A. Therefore, the third alert is the originating alert.
Table 1.
An example of alert incident
Nowadays, with the ever-increasing scale of parameters and data in pre-trained models, large language models (LLMs for short) with tens or even hundreds of billions of parameters have emerged with capabilities beyond conventional models [36, 47]. These capabilities include in-context learning, instruction learning, and chain of thought (step-by-step reasoning) [55].
In-context learning allows large language models to generate the expected output without additional training or gradient updates, given natural language instructions and/or several task demonstrations [4]. Instruction learning empowers large language models to follow task instructions in the input text without explicit examples [35, 39, 46]. Chain-of-thought, or step-by-step, reasoning enables large language models to solve complex tasks via a prompt mechanism that involves intermediate reasoning steps to obtain the final answer [48, 55].
The emergent capabilities enable large language models to automatically analyze and infer the expected result based on the input text provided. Therefore, in this paper, we take the lead in employing large language models to emulate the process of operations engineers analyzing alert incidents.
We first use real alert incidents to investigate the key alert information that operations experts take into account when analyzing alert incidents. Then, according to the investigation result, we propose a large language model-based method, VOCE (Virtual On-Call Engineer), to emulate the process of operations experts analyzing alert incidents. More specifically, for an alert incident, VOCE can comprehend the local anomalies recorded by each alert within the incident, infer the propagation of the anomalies, and suggest the alert recording the most likely originating anomaly that triggers other anomalies.
The contributions of this paper are as follows.
1.
We use real incident data to investigate the analysis process carried out by operations experts regarding alert incidents. Based on the result of the investigation, we summarize the key alert information that operations experts take into account when analyzing an alert incident.
 
2.
We introduce an automated method, VOCE (Virtual On-Call Engineer), which adopts a large language model to automatically analyze an alert incident. VOCE emulates the process of operations experts analyzing alert incidents, enabling a large language model to comprehend alerts and automatically suggest the alert recording the most likely originating anomaly that triggers other anomalies.
 
3.
We conducted an experimental study to evaluate the performance of VOCE based on real-world alert incidents. Experimental results demonstrate the effectiveness of VOCE in automatically analyzing alert incidents.
 
2 Related Work

For alert analysis, there are many studies with different purposes. Some focus on predicting system faults [11, 53], while others focus on linking alerts of the same system fault [5, 9, 10, 29, 52].
Studies on system fault prediction focus on predicting system faults based on alert signals. Chen et al. propose AirAlert [11] to predict outage faults before they actually occur, minimizing service downtime and ensuring high system availability. Specifically, AirAlert analyzes the relationships between outage faults and alerting signals with a Bayesian network and predicts outage faults using a robust gradient-boosting-tree-based classification approach. Zhao et al. [53] propose eWarn to predict online whether a system fault will occur in the near future based on alerts. Specifically, eWarn first extracts textual and statistical features from alerts to represent omen alert patterns for a system fault and then builds a classification model to predict the occurrence of the fault. While proactive fault prediction can help operations engineers anticipate potential system issues, the responsibility for alert analysis still resides with the engineers themselves. Thus, in this paper, we present an approach that aims to reduce the workload of operations engineers in alert analysis.
Studies on alert linking focus on linking alerts triggered by the same system fault into the same incident. Lin et al. [29] link alerts by alert contents to gain insight into a system fault. Zhao et al. [52] propose AlertStorm to detect alert storm faults, which produce overwhelming numbers of alerts in a short time, and to link their alerts according to alert contents and system topology. Both approaches adopt the Jaccard coefficient to measure textual similarity between alert contents. LiDAR [10] and OAS [5] employ neural networks to mine common semantic information between alerts, thereby linking alerts of the same system fault; LiDAR additionally incorporates the topological relationships between the system components associated with alerts, while OAS further takes into account the behavioral information of alerts. In addition, Chen et al. present DyAlert [9], a dynamic graph neural network-based approach to linking alerts. These existing approaches primarily rely on statistical features or train neural networks on labeled data to learn potential relationships between alerts. However, their criteria may differ from those used by operations engineers during manual analysis, which can lead to inaccurate results. In this paper, the proposed approach leverages a large language model to simulate the alert analysis process of operations engineers and, as a result, achieves more accurate and expert-consistent outcomes.
Large language models have been shown to be effective for various tasks. Their success can largely be attributed to their ability to understand the intentions of users and complete tasks in a zero-shot or few-shot fashion [4, 35, 39]. Due to these emergent abilities, several frameworks have been proposed to prompt large language models to reason before reaching final conclusions. Kojima et al. [26] and Wei et al. [48] prompt large language models to write step-by-step solutions to math problems and other reasoning tasks. Gao et al. [18] apply this technique to writing Python programs. Chen et al. [7] and Shinn et al. [40] instruct large language models to self-debug their generated code. In the field of system operations, there are studies [3] indicating the potential of large language models, but none of them fully exploits their inherent reasoning abilities or leverages prior knowledge for alert incident analysis.

3 Background

In this section, we define alerts and alert incidents, then explore key information factors considered in alert analysis, and finally discuss the current applications and potential of large language models in system operations and maintenance.

3.1 Alert

An alert usually has three basic attributes: timestamp, source, and content, which are described below.
1.
Timestamp: The time at which the alert is generated.
 
2.
Source: The system component where the alert is generated, which is usually the IP of the system component.
 
3.
Content: The text that records an anomaly of the system component.
 
An alert incident is a group of alerts that are caused by the same system fault. There are some existing studies [5, 9, 10, 29, 52] to automatically link alerts into incidents. Some companies also have their own specific alert linking approaches. These approaches usually link alerts by measuring the semantic similarity of alert contents or the topological distance between alert sources. Since the research goal of this paper is to analyze incidents instead of linking alerts, we will not go into the details of linking alerts.
In this paper, we formally define an alert incident as \(I=[a_1,a_2,\cdots ,a_n]\), where \(a_i\) (\(1\le i\le n\)) is an alert in the incident. Moreover, \(a_i=(t_i, s_i, w_i, e_i)\), where \(t_i\) is the timestamp, \(s_i\) is the alert source, \(w_i\) is the alert content, and \(e_i\) is the template id of \(a_i\), parsed as described in Section 4.1, which represents the type of anomaly recorded by the alert.
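The definitions above can be expressed as a minimal data model. The field and class names here are our illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class Alert:
    t: float  # timestamp: when the alert was generated
    s: str    # source: IP of the system component
    w: str    # content: text recording the local anomaly
    e: int    # template id e_i assigned during preprocessing (Section 4.1)

# An incident I = [a_1, ..., a_n] is simply an ordered list of alerts.
incident = [
    Alert(t=1662000000.0, s="21.99.218.233", w="Host shutdown unexpectedly", e=3),
    Alert(t=1662000005.0, s="11.99.218.200", w="Call to microservice A failed", e=1),
]
```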

3.2 Key Alert Information

In addition to the above three basic attributes, some other inherent or deduced information factors are also considered in the process of alert incident analysis. To figure out how operations engineers analyze alerts, drawing from previous studies [41, 53, 54] and insights from operations engineers at Company A, we summarize the following four types of information factors that are typically taken into account during alert analysis.
1.
Order: The chronological order of an alert in an incident according to timestamps. It may reflect the propagation sequence of anomalies corresponding to the same system fault.
 
2.
System Layer: The infrastructure layer to which the anomaly of an alert belongs within a system. The structure of a service system is often divided into different layers [12], such as database layer, network layer, application layer, etc. System components in higher layers usually rely on the functionality of system components in lower layers during task execution. Typically, anomalies in lower-layer components can impact the proper functioning of higher-layer components. For example, a network connection anomaly at the network layer may lead to a service execution failure at the application layer.
 
3.
Impact Scope: The impact scope of the anomaly indicated by an alert. For example, a single microservice or the entire service system.
 
4.
Severity: The extent of damage to system services caused by the anomaly of an alert. Generally, more severe alerts result in greater damage.
 
In this paper, we investigate alert incidents from a real service system of Company A over a one-month period (2022/09/01 00:00 to 2022/09/30 23:59), which includes 10,680 alerts and 827 incidents. We enlist two operations engineers to retrospectively analyze each alert incident and label the originating alert, i.e., the alert recording the most likely originating anomaly that triggers the other anomalies in the incident. If a consensus cannot be reached, a third expert is engaged to review the annotations, with the minority deferring to the majority opinion. Then, for each incident, we assess whether the labeled originating alert has the first order, the lowest system layer, the broadest impact scope, and the highest severity. Table 2 presents the results, reporting the proportion of incidents in which the originating alert ranks highest for each type of information factor.
Table 2.
Statistical results for key information factors
Based on Table 2, we can find that, for the alert incidents of Company A over a one-month period, the originating alerts for more than 93% of incidents have the lowest system layers, the broadest impact scope, and the highest severity. Additionally, only about 45% of the originating alerts are the first generated. Therefore, the experimental findings reveal that system layer, impact scope, and severity are significant information factors in assessing whether an alert indicates the originating anomaly of a system fault. Nonetheless, the order of an originating alert does not necessarily precede other alerts. This is because, in a real production environment, due to varying sensitivities in monitoring mechanisms for different anomalies, originating anomalies may not activate monitoring mechanisms and generate alerts first.

3.3 Large Language Models

Large language models (LLMs) possess the ability to comprehend textual information provided by users and systematically analyze it according to user instructions, thereby progressively deriving reasonable outputs [55]. Such models have demonstrated notable achievements across various domains, such as chatbots [23], search engines [22], software testing [25], and system maintenance [3].
A study by Microsoft [3] shows that, when provided with a phenomenon description of a system fault, large language models can effectively infer a plausible fault reason. However, its research objective differs from ours: the model is fed a pre-summarized description of a system fault rather than raw alert data, and its output is a speculative explanation of a potential fault cause rather than a concrete suggestion of the originating alert and its source. Nonetheless, this study demonstrates the potential of large models in the field of system operations.
According to Section 3.2, operations engineers, when analyzing an alert incident, tend to extract several specific information factors from alerts. Therefore, in this paper, with the emergent abilities of LLMs, we use LLMs to extract these factors and to emulate real operations engineers in: comprehending anomalies recorded by alerts; inferring the propagation process of the anomalies; suggesting the originating alert.
Fig. 1.
The overview of VOCE (Virtual On-Call Engineer).

4 Approach

The objective of VOCE (Virtual On-Call Engineer) is to utilize the emergent capabilities of large language models [55] to emulate how operations engineers analyze alert incidents, thereby enhancing operational efficiency. Fig. 1 shows the overview of VOCE. It first preprocesses each alert in an incident as introduced in Section 4.1. Then, according to the investigation in Section 3.2, in key information factor extraction, VOCE utilizes a large language model to extract key information factors from alerts.
In causality mining, VOCE analyzes the propagation process of system faults underlying an alert incident by examining the topological relationships between different system components. It is important to note that while these relationships may be directional in some systems, the direction of fault propagation does not necessarily align with the direction of the topological relationships among the components.
Fig. 2.
A toy example of system fault propagation.
Fig. 2 shows a toy example. The topological relationship between “source 1” and “source 2” is directional. “Source 1”, a virtual machine, is deployed on “source 2”, a physical machine. Both “fault 1” and “fault 2” impact “source 1” and “source 2”. However, these two faults propagate in opposite directions between the two sources. The originating anomaly of “fault 1” is the memory leak anomaly of “source 1”, which triggers the high memory utilization anomaly of “source 2”. Conversely, the originating anomaly of “fault 2” is the network interruption anomaly of “source 2”, which triggers the service offline anomaly of “source 1”.
Therefore, in causality mining, VOCE combines system topology data and key information factors to mine the causalities between anomalies of different system components. In causality correction, VOCE further employs statistical information to validate the extracted causalities and correct inaccurate ones. Finally, in originating alert suggestion, based on the previous analysis results, VOCE suggests the alert recording the originating anomaly that triggers the other anomalies in the incident, called the originating alert.

4.1 Preprocessing

Alert content, which records the anomaly of the system component, commonly consists of two parts, variable parameters and invariant keywords [20]. Invariant keywords describe the phenomenon of the anomaly detected by a detection mechanism, and variable parameters record some system metrics, such as CPU usage and memory usage. The text composed of invariant keywords is also called the alert template [20]. For example, in Table 1, the template of the second alert is “The success rate of microservice (\(<*>\)) accessing the interface of microservice (\(<*>\)) \(\le \) \(<*>\)%”, where “\(<*>\)” is the placeholder for a variable parameter.
Thus, alerts belonging to the same template record the same type of system anomaly. To help an LLM distinguish between different types of anomalies, we first tokenize alert contents, filter out variable parameters, and then derive templates from the processed alert contents [2, 13, 20, 27, 45]. Since Drain [20] is a widely used online parser [5, 19, 49, 51], we adopt it to parse templates. After the parsing task, each alert is assigned a template, identified by a unique number \(e_i\) (\(1\le i \le n\)).
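As a rough illustration of template derivation, the following sketch masks variable parameters with regular expressions. The paper itself uses the Drain parser, so these two masking rules are only a simplified stand-in:

```python
import re

# Illustrative masking rules (our assumption, not Drain's algorithm):
# replace variable parameters with the "<*>" placeholder used in the paper.
PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<*>"),  # IP addresses
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<*>"),            # plain numbers
]

def to_template(content: str) -> str:
    """Keep invariant keywords, mask variable parameters."""
    for pattern, placeholder in PATTERNS:
        content = pattern.sub(placeholder, content)
    return content

templates = {}  # template text -> unique template id e_i

def template_id(content: str) -> int:
    tpl = to_template(content)
    return templates.setdefault(tpl, len(templates) + 1)
```

Two alerts whose contents differ only in their parameters then map to the same template id.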

4.2 Key Information Factor Extraction

According to the investigation in Section 3.2, in addition to the basic alert attributes, timestamp, source, and content, there are three key information factors that are typically taken into account during engineers analyzing alert incidents: system layer, impact scope, and severity. In an alert incident, the originating anomaly usually has a lower system layer, broader impact scope, and higher severity than other anomalies recorded by the alerts in the incident.
Fig. 3.
The process of key information factor extraction.
Therefore, we adopt a large language model to emulate the process of operations engineers analyzing alert incidents. Fig. 3 shows the chain-of-thought (CoT) prompts for querying the model. Specifically, we first provide the alert incident to be analyzed and related knowledge retrieved from documents by semantic similarity [28]. Then, we require the model to understand the anomaly indicated by each alert. Moreover, based on the investigation in Section 3.2, we prompt the model to carry out the following analyses step by step [26].
We instruct the model to extract and analyze three key factors for each alert: system layer, impact scope, and severity. Then, we instruct the model to compare and suggest the originating alert based on the finding that the originating anomaly typically has a lower system layer, broader impact scope, and higher severity. Finally, we design a fill-in-the-blank task with a standardized answering structure to facilitate the automated parsing of the output of the model.
Although the prompts in Fig. 3 can instruct a large language model to extract key information factors (system layer, impact scope, and severity) from alerts and suggest the originating alert, we refrain from directly employing them to analyze an alert incident, because it may be challenging for a large language model to accurately identify the anomalies recorded by alerts and infer precise causal relationships between them in a single task.
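The prompt structure described in this subsection might be sketched as follows. The exact wording and the `build_prompt` helper are our illustrative assumptions; the paper's actual prompts appear in Fig. 3:

```python
# Hypothetical CoT prompt template: related knowledge, alert list, stepwise
# analysis, and a fill-in-the-blank slot for automated answer parsing.
PROMPT = """You are an experienced on-call engineer.
Related knowledge:
{knowledge}

Alert incident:
{alerts}

Step 1: For each alert, explain the anomaly it records.
Step 2: For each alert, state its system layer, impact scope, and severity.
Step 3: The originating anomaly typically has a lower system layer, a broader
impact scope, and a higher severity. Compare the alerts accordingly.
Step 4: Fill in the blank with exactly one alert index.
The originating alert is: ___
"""

def build_prompt(alerts, knowledge):
    lines = [f"[{i}] time={a['t']} source={a['s']} content={a['w']}"
             for i, a in enumerate(alerts, start=1)]
    return PROMPT.format(knowledge=knowledge, alerts="\n".join(lines))
```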

4.3 Causality Mining

An alert incident can encompass multiple alerts from various sources, complicating the ability of a large language model to identify the originating alert in a single task. To address this challenge, we propose a hierarchical causality mining approach that breaks down alert incident analysis into several sub-tasks, each focusing on fewer alerts and sources.
Fig. 4.
The process of causality mining.
Usually, within an individual source (system component), various anomaly detection mechanisms are deployed. As a result, in an alert incident, an individual source may contain multiple alerts. Therefore, as shown in Fig. 4, we first focus the perspective of the large language model on an individual source. Based on the prompts in Section 4.2, we can instruct the model to only analyze the relationship between anomalies within an individual source and suggest the originating alert within the source.
Then, according to the system topology, we mine the causality between the anomalies of each two neighboring sources. We instruct the large language model using the prompts in Section 4.2 to determine which of the originating alerts from the two neighboring sources is more likely to record the true originating anomaly of the underlying system fault. Such a result indicates the propagation direction of the system fault between the two neighboring sources.
To stabilize the inference performance of the large language model, each analysis task is repeated k times [50], and the alert selected most often is taken as the final result. With the mined causalities, we can construct a propagation graph \(G=(V,E)\) of the system fault underlying a given alert incident, where V is the node set consisting of the sources involved in the incident and E is the set of directed edges. For two neighboring sources in G, the propagation direction of the system fault between them is the opposite of the direction of the edge between them.
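The repeated-query voting and edge construction can be sketched as follows; the `ask` and `compare` callables stand in for LLM queries and are assumptions:

```python
from collections import Counter

def majority_vote(ask, k=5):
    """Repeat an LLM analysis task k times; keep the most frequent answer."""
    votes = Counter(ask() for _ in range(k))
    return votes.most_common(1)[0][0]

def build_propagation_graph(pairs, compare, k=5):
    """For each pair of neighboring sources, ask which one more likely holds
    the originating anomaly; the resulting edge points toward that source,
    i.e. opposite to the fault's propagation direction."""
    edges = set()
    for u, v in pairs:
        winner = majority_vote(lambda: compare(u, v), k)
        loser = v if winner == u else u
        edges.add((loser, winner))
    return edges
```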

4.4 Causality Correction

To improve the robustness of VOCE, we validate and refine the propagation directions between sources in the propagation graph, G. Since alerts with the same template record the same type of anomaly, sources containing the same set of alert templates should have similar causal relationships in G.
Therefore, we first classify sources based on the alert templates they contain, grouping sources that share the same set of alert templates into the same type. The classifying result is denoted as \(T=[S_1,S_2,S_3,\ldots ,S_o]\), where o is the number of types, and \(S_i\) (\(1\le i\le o\)) is the set of sources that belong to the i-th type. Then, we calculate the number of edges directed from sources of the i-th type to sources of the j-th type, denoted as \(cnt_{i,j}=|\{(u,v)\mid u \in S_i, v\in S_j, (u,v)\in E\}|\), where E is the edge set of G. Therefore, if \(cnt_{i,j}>cnt_{j,i}\), the underlying system fault is more likely to propagate from the i-th source type to the j-th source type, and vice versa. Based on such statistical results, we can further correct the directed edges in G that violate them.
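A minimal sketch of this correction step, assuming the edges and a source-to-type mapping are available as plain Python collections:

```python
from collections import Counter

def correct_edges(edges, source_type):
    """Flip directed edges that contradict the dominant direction between
    the two source types they connect."""
    # cnt[(i, j)] counts edges from sources of type i to sources of type j.
    cnt = Counter((source_type[u], source_type[v]) for u, v in edges)
    corrected = set()
    for u, v in edges:
        i, j = source_type[u], source_type[v]
        if cnt[(j, i)] > cnt[(i, j)]:
            corrected.add((v, u))  # minority direction: flip the edge
        else:
            corrected.add((u, v))
    return corrected
```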

4.5 Originating Alert Suggestion

The fault propagation graph, G, reveals the propagation process of the system fault indicated by a given alert incident. Thus, the source with a higher centrality in G is more likely to be where the originating anomaly occurs. There are many approaches to calculate the centrality of nodes in a graph, such as degree centrality taking the degree of a node as the node centrality [17], betweenness centrality taking the number of shortest paths through a node as the node centrality [16], and closeness centrality taking the sum of the shortest distances from a node to other nodes as the node centrality [15].
In G, for an alert source, its centrality should be determined by the fault propagation relationships in the graph. The direction of the edge between two neighboring sources in G is opposite to the propagation direction of the system fault between the sources. Thus, the more other sources can reach a source through the directed edges in G, the more likely the source is where the originating anomaly occurs.
Eigenvector centrality [34] transmits the centrality score of each node to its neighboring nodes through the edges of a graph, so a node that can be reached by more other nodes is likely to receive a higher centrality score. Therefore, we choose eigenvector centrality to measure the centrality of the sources in G, and we suggest the originating alert of the source with the highest centrality as the alert recording the true originating anomaly.
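A sketch of this final step. The small uniform damping term is our own numerical stabilizer (plain power iteration degenerates on acyclic propagation graphs), so this approximates rather than reproduces the eigenvector centrality used in the paper:

```python
def origin_source(nodes, edges, iters=50, damping=0.15):
    """Score each source by how strongly other sources reach it through the
    directed edges of G (an eigenvector-centrality-style power iteration)."""
    score = {v: 1.0 for v in nodes}
    for _ in range(iters):
        new = {v: damping for v in nodes}   # uniform term: assumed stabilizer
        for u, v in edges:
            new[v] += score[u]              # in-edges accumulate centrality
        norm = sum(x * x for x in new.values()) ** 0.5
        score = {v: x / norm for v, x in new.items()}
    return max(score, key=score.get)        # most-reached source
```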

5 Evaluation

To evaluate the performance of VOCE, we exploit real-world datasets from Company A, a large commercial company, to address the following research questions.
  • RQ1: How does VOCE perform in analyzing alert incidents?
  • RQ2: How does VOCE perform in terms of efficiency?
  • RQ3: How does the parameter k in causality mining affect the performance?

5.1 Experiment Setup

Dataset We exploit real-world alerts from Company A within one month, 2022/09/01 00:00 to 2022/09/30 23:59. Company A is a large commercial company providing services for more than a billion users across hundreds of countries. The dataset contains 10,680 alerts and 827 incidents; on average, each incident involves 12.91 alerts. For each alert incident, we engaged two operations engineers to conduct a retrospective analysis and label the originating alert. If consensus is not achieved, a third expert is called in to review the annotations, with the minority deferring to the majority view.
Compared Approaches We utilize two popular large language models, GPT [1, 24] and LLaMA [43, 44], to evaluate the performance of VOCE.
1.
VOCE-GPT: We adopt GPT-4o from OpenAI [1, 24] as the base model of VOCE. We integrate VOCE with GPT-4o via the OpenAI API service.
 
2.
VOCE-LLaMA: Since LLaMA [43, 44] is a widely used open-source model for many language tasks [38, 42, 56], we also implement VOCE with LLaMA. We choose LLaMA-2 with 13 billion parameters from Meta [44].
 
3.
CoT-GPT: We adopt GPT-4o and the chain of thoughts in Section 4.2 to analyze an alert incident step by step.
 
4.
CoT-LLaMA: Similar to the above, we adopt LLaMA and the chain of thoughts in Section 4.2 to analyze an alert incident.
 
5.
Prompt-GPT: We instruct the model to straightforwardly suggest the originating alert for an alert incident using the prompts in Fig. 5.
 
6.
Prompt-LLaMA: Similar to the above, we adopt LLaMA as the base model to straightforwardly suggest the originating alert for an alert incident.
 
Fig. 5.
The prompts of naive approaches.
Implementation All experiments are conducted on a server with 2 Intel(R) Xeon(R) Platinum 8358 CPUs @ 2.60 GHz, 1007.0 GB of physical memory, and 8 Nvidia A40 GPUs, each with 46 GB of memory. The OS of the server is Ubuntu 22.04.2 LTS. All experimental approaches are implemented with Python 3.10, and the GPT-4o API service is provided by OpenAI [1, 24]. Meta has released three sizes of LLaMA-2 models: 7 billion, 13 billion, and 70 billion parameters [44]. Due to computing resource limitations, we choose the largest LLaMA-2 model our server can support, the 13-billion-parameter model. We deploy a LLaMA-2 model on each GPU and, during experiments, distribute requests for the LLaMA-2 model evenly across these 8 instances in parallel. The parameter k in causality mining is set to 5.
Metrics To evaluate the effectiveness of an experimental approach, we calculate the analyzing accuracy of the approach. Specifically, the accuracy is defined as \(\frac{N'}{N}\), where N is the total number of incidents and \(N'\) is the number of incidents whose originating alerts are correctly suggested by an experimental approach. In addition, to measure the efficiency of an experimental approach, we calculate the average time cost for the experimental approach to process an alert incident.

5.2 Evaluation Results

To answer the proposed research questions, we evaluate VOCE from three aspects, the effectiveness of VOCE, the efficiency of VOCE, and the impact of the parameter k in causality mining.
RQ1: the effectiveness of VOCE To address RQ1, Table 3 presents the performance of the experimental approaches, while Fig. 6 provides a comparison. For both base models, VOCE consistently achieves the highest accuracy (>80%), which demonstrates its effectiveness. More specifically, for each base model, the accuracy of VOCE exceeds that of the CoT-based approach, which in turn exceeds that of the prompt-based approach. These findings confirm that the key information factor extraction in Section 4.2 and the hierarchical causality mining strategy proposed in Section 4.3 can effectively instruct the large language model in analyzing an alert incident and suggesting the originating alert.
Table 3.
The performance of experimental approaches.
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-90900-9_4/MediaObjects/648501_1_En_4_Tab3_HTML.png
Fig. 6.
The comparison of the performance of different approaches.
In addition, from Fig. 6(a), we find that the accuracy of each type of GPT-based approach is higher than that of the corresponding LLaMA-based approach. This indicates that the commercial GPT-4o model exhibits greater emergent capabilities than LLaMA-2 with 13 billion parameters.
RQ2: the efficiency of VOCE To address RQ2, as shown in Table 3 and Fig. 6(b), we calculate the average time cost for each approach in analyzing an alert incident. We find that, for each base model, VOCE takes more time to analyze an alert incident than the other approaches, because its key information extraction and hierarchical causality mining strategy require more computation. Nevertheless, the average time cost of VOCE-GPT is less than 1 minute, and that of VOCE-LLaMA is less than 5 minutes. Moreover, in Company A, operations engineers typically take about 15 minutes to analyze an alert incident. Therefore, the efficiency of VOCE meets the needs of practical operational tasks.
Moreover, we also find that the time cost of each type of LLaMA-based approach is higher than that of the corresponding GPT-based approach. This is because the LLaMA API service is deployed on the 8 GPUs of our experimental environment, with one model instance per GPU, and we distribute requests to these 8 instances as evenly as possible. In contrast, GPT-4o is a commercial model whose API service is provided by OpenAI, which presumably has far more computing resources than our experimental environment. Therefore, due to this resource limitation, the time cost of the LLaMA-based approaches is greater than that of the GPT-based approaches. Nevertheless, the LLaMA-based approaches can still analyze an alert incident in less than five minutes on average, demonstrating the efficiency of our approaches.
RQ3: the impact of \({\boldsymbol{k}}\) To address RQ3, we evaluate the performance of VOCE under different values of k. We adopt GPT as the base model and vary k from 1 to 10. To achieve a margin of error of 5% at a 95% confidence level, we sample 263 incidents to evaluate the impact of k. The result is shown in Fig. 7, where Fig. 7(a) shows the accuracy and Fig. 7(b) shows the time cost.
Fig. 7.
The impact of k on the performance of VOCE.
From Fig. 7(a), we find that, as the value of k increases, the accuracy of VOCE increases at first and then gradually stabilizes. This is because a larger k allows VOCE to rely on more analytical results from the large language model for each originating alert suggestion, enhancing the robustness of the suggestions. In addition, as shown in Fig. 7(b), despite some slight fluctuations, the time cost of VOCE does not significantly increase with k. This is because the requests to the large language model service in VOCE are issued in parallel. Thus, the time cost of VOCE is not strongly related to k when there are sufficient computing resources to support the model service.
Overall, a larger k enables the model to analyze a greater number of outcomes, thereby enhancing the robustness of the final results. However, a larger k also incurs higher computational costs. In scenarios where the deployment resources are limited, this can lead to increased time costs. Consequently, we recommend that the selection of k should be carefully evaluated based on the available deployment resources and the desired model performance.
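One plausible way to aggregate the k analytical results is a simple majority vote over parallel LLM calls; the paper does not prescribe this exact aggregation, so the sketch below is an assumption for illustration:

```python
# Hedged sketch of how k parallel analyses could be aggregated: issue k
# requests concurrently and keep the most frequent suggestion, so a single
# noisy LLM response cannot flip the final originating alert.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def suggest_originating_alert(analyze, incident, k=5):
    """analyze(incident) returns one candidate originating alert."""
    with ThreadPoolExecutor(max_workers=k) as pool:
        futures = [pool.submit(analyze, incident) for _ in range(k)]
        votes = [f.result() for f in futures]
    return Counter(votes).most_common(1)[0][0]
```

Because the k calls run concurrently, wall-clock time stays roughly flat as k grows, matching the observation in Fig. 7(b).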

5.3 Case Study

Table 4 presents an incident involving an offline fault in a microservice. The alerts from “10.16.141.247” record that microservice A is offline and the call success rate for microservice D is below the threshold. The alerts from “10.16.127.162” record that the call success rate for microservice B is below the threshold, which has affected the service requests from consumer X. The alert from “10.16.150.106” also records a decreased call success rate for microservice C. These three sources belong to the same service system and have interdependent topological relationships. According to expert analysis, the fault of this alert incident stems from microservice A in “10.16.141.247” going offline, which in turn prevents microservices B, C, and D from providing services normally.
Table 4.
The alert incident of a microservice offline fault
https://static-content.springer.com/image/chp%3A10.1007%2F978-3-031-90900-9_4/MediaObjects/648501_1_En_4_Tab4_HTML.png
Fig. 8 shows the originating alert within each source and the fault propagation graph of the incident, both inferred by VOCE-GPT. The direction of fault propagation is opposite to the direction of the edges. We find that VOCE successfully mined the causal relationship whereby the offline status of microservice A triggers microservices B, C, and D to malfunction. In the fault propagation graph, since both “10.16.127.162” and “10.16.150.106” can ultimately reach “10.16.141.247”, the latter has a higher centrality than the other two sources. As a result, VOCE suggests that “10.16.141.247” is the source where the originating anomaly occurs, and thus the first alert in Table 4, which records the offline anomaly of microservice A, is the originating alert.
Fig. 8.
The fault propagation graph mined by VOCE.
However, VOCE mistakenly identified a causal link between the anomalies of microservice B and microservice C, as shown by the dotted line in Fig. 8. Due to the robustness of the eigenvector centrality [34] adopted by VOCE, such a redundant edge does not interfere with the final originating alert suggestion.
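The centrality-based selection of the originating source can be illustrated with the case-study topology. The damped power iteration below is only a convergence-friendly approximation of the eigenvector centrality [34] the paper adopts, not the exact computation:

```python
# Sketch: choose the originating source as the node with the highest
# centrality in the fault propagation graph. Edges point from an affected
# source toward its suspected cause (opposite to fault propagation), so
# the node reached by the other sources accumulates the most credit.
def originating_source(nodes, edges, damping=0.85, iters=100):
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, dst in edges:
            new[dst] += damping * score[src]   # credit flows along edges
        total = sum(new.values())
        score = {n: v / total for n, v in new.items()}
    return max(score, key=score.get)

nodes = ["10.16.141.247", "10.16.127.162", "10.16.150.106"]
edges = [("10.16.127.162", "10.16.141.247"),
         ("10.16.150.106", "10.16.141.247")]
print(originating_source(nodes, edges))  # 10.16.141.247
```

Since both other sources point to “10.16.141.247”, it receives the highest score, matching the expert analysis of the incident.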
Fig. 9.
The analysis process of VOCE.
Although eigenvector centrality is robust to redundant edges, practical deployment can incorporate strategies to filter them out, improving the accuracy of results. For example, predefined rules can be established to eliminate edges that are definitively not traversed by faults. Additionally, since fault propagation graphs are easily interpretable by engineers, redundant edges can be identified and removed through expert review.
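A predefined-rule filter of the kind described above might look like this minimal sketch; treating topology links as undirected is an assumption made here for illustration:

```python
# Hypothetical predefined-rule filter: drop any inferred causal edge
# between two sources that are not connected in the known topology.
# Topology links are treated as undirected for simplicity.
def filter_edges(inferred_edges, topology_links):
    allowed = {frozenset(link) for link in topology_links}
    return [e for e in inferred_edges if frozenset(e) in allowed]

inferred = [("B", "A"), ("C", "A"), ("B", "C")]   # "B -> C" is redundant
topology = [("B", "A"), ("C", "A"), ("D", "A")]
print(filter_edges(inferred, topology))           # [('B', 'A'), ('C', 'A')]
```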
To further illustrate how VOCE mines causalities between different sources, Fig. 9 shows the analysis process undertaken by VOCE-GPT to deduce the causal relationship between “10.16.141.247” and “10.16.127.162”. According to the underlined content in Fig. 9, we can see that a large language model does indeed exhibit the ability to emulate the way operations engineers analyze alert incidents.

6 Discussion

6.1 Threats to Validity

For the internal threat to validity, the parameter k in causality mining determines the stability of the performance of the large language model. In RQ3, we investigate the impact of k on VOCE and thus choose the optimal value of k for the other experiments. Moreover, the originating alert of each incident in the experimental data is manually labeled, introducing potential labeling noise. To reduce this threat, we invite two operations engineers to conduct a retrospective labeling. If they cannot reach a consensus, a third expert reviews the annotations, with the minority deferring to the majority. Although mislabeling is hard to avoid entirely during a manual process, we believe that the amount of labeling noise is small.
For the external threat to validity, since VOCE is based on a large language model, its performance relies on the emergent capabilities of that model. To mitigate this threat, we evaluate the performance of VOCE using two widely used large language models, GPT-4o [1, 24] and LLaMA [43, 44]. GPT-4o is a popular commercial large language model and LLaMA is a prominent open-source one. Therefore, the effectiveness and efficiency of VOCE based on these two mainstream models demonstrate its ability to analyze alert incidents.
Additionally, our dataset consists of one month of alerts from Company A, which may limit its diversity and the generalizability of our approaches. However, as a large commercial entity providing financial services to over a billion users across numerous countries, Company A operates a representative online service system that generates sufficiently complex data. The results of VOCE on the real dataset of Company A indicate that our approach is generalizable enough to benefit other companies. In the future, we will deploy VOCE on more service systems. We see no intrinsic limitations that would prevent VOCE from working reliably on other online service systems.

6.2 Limitations

To enhance the robustness of VOCE and facilitate its practical deployment in real-world operational environments, it is essential to address the following issues.
System Topology Incompleteness: VOCE leverages system topology to reveal the propagation process of system faults. However, in practice, system topology information may be incomplete or unavailable. To address this limitation, comprehensive system topology data must be properly maintained so that VOCE remains applicable.
Missing Originating Anomalies in Alerts: VOCE assumes that the originating anomaly of a fault is captured by monitoring systems and recorded by alerts. In cases where monitoring coverage is insufficient, identifying the originating anomaly solely based on alerts may be challenging. Therefore, robust monitoring and alerting mechanisms are essential.
Lack of Proactive Fault Resolution: While VOCE automates alert analysis, it cannot proactively resolve faults. Automated fault resolution requires standardized troubleshooting procedures. In practice, troubleshooting involves a diverse range of procedures such as SQL queries, Bash scripts, custom scripts, and specialized software tools, which often lack uniformity. This diversity poses challenges for LLMs in learning and executing such procedures. Addressing this limitation requires not only enabling LLMs to understand expert troubleshooting workflows but also establishing standardized troubleshooting procedures.
Hybrid Human-Machine Workflows: To ensure consistency between the analysis provided by VOCE and expert assessments, a hybrid human-machine workflow can be introduced. Since VOCE emulates the analysis processes of operations engineers, engineers can seamlessly intervene at various stages. For example, during Causality Mining, engineers can review and validate the causal relationships inferred by VOCE between system components, which is often more efficient than analyzing raw alerts. During Causality Correction, the directed edges in the fault propagation graph intuitively represent fault propagation, allowing engineers to refine any incorrect dependencies identified by VOCE.

7 Conclusion

In this paper, we propose VOCE (Virtual On-call Engineer), an approach that automatically analyzes alert incidents through a large language model. We first use real alert incidents to investigate the analysis process undertaken by operations experts. Then, according to the investigation results, VOCE uses a large language model to emulate this process: it first comprehends the local anomalies recorded by the alerts from different system components (sources), then deduces the propagation graph of the system fault indicated by an alert incident, and finally suggests the originating alert.
We conduct extensive experiments on real alert incidents, and the results demonstrate that VOCE can effectively and efficiently analyze alert incidents. Currently, VOCE can only analyze an alert incident; it lacks the ability to proactively address the underlying fault. In the future, we will endow VOCE with advanced reasoning and autonomous operational execution abilities, enabling it to independently resolve an alert incident after analysis.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Literatur
1.
Zurück zum Zitat Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023) Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: Gpt-4 technical report. arXiv preprint arXiv:​2303.​08774 (2023)
2.
Zurück zum Zitat Agrawal, A., Karlupia, R., Gupta, R.: Logan: A distributed online log parser. In: IEEE 35th International Conference on Data Engineering (ICDE). pp. 1946–1951. IEEE (2019) Agrawal, A., Karlupia, R., Gupta, R.: Logan: A distributed online log parser. In: IEEE 35th International Conference on Data Engineering (ICDE). pp. 1946–1951. IEEE (2019)
3.
Zurück zum Zitat Ahmed, T., Ghosh, S., Bansal, C., Zimmermann, T., Zhang, X., Rajmohan, S.: Recommending root-cause and mitigation steps for cloud incidents using large language models. In: Proceedings of the 45th International Conference on Software Engineering. p. 1737-1749. ICSE ’23, IEEE Press (2023) Ahmed, T., Ghosh, S., Bansal, C., Zimmermann, T., Zhang, X., Rajmohan, S.: Recommending root-cause and mitigation steps for cloud incidents using large language models. In: Proceedings of the 45th International Conference on Software Engineering. p. 1737-1749. ICSE ’23, IEEE Press (2023)
4.
Zurück zum Zitat Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language models are few-shot learners. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS’20, Curran Associates Inc., Red Hook, NY, USA (2020) Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language models are few-shot learners. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS’20, Curran Associates Inc., Red Hook, NY, USA (2020)
5.
Zurück zum Zitat Chen, J., Wang, P., Wang, W.: Online summarizing alerts through semantic and behavior information. In: 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE). pp. 1646–1657 (2022) Chen, J., Wang, P., Wang, W.: Online summarizing alerts through semantic and behavior information. In: 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE). pp. 1646–1657 (2022)
6.
Zurück zum Zitat Chen, J., He, X., Lin, Q., Xu, Y., Zhang, H., Hao, D., Gao, F., Xu, Z., Dang, Y., Zhang, D.: An empirical investigation of incident triage for online service systems. In: IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice. pp. 111–120. IEEE (2019) Chen, J., He, X., Lin, Q., Xu, Y., Zhang, H., Hao, D., Gao, F., Xu, Z., Dang, Y., Zhang, D.: An empirical investigation of incident triage for online service systems. In: IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice. pp. 111–120. IEEE (2019)
7.
Zurück zum Zitat Chen, X., Lin, M., Schärli, N., Zhou, D.: Teaching large language models to self-debug (2023) Chen, X., Lin, M., Schärli, N., Zhou, D.: Teaching large language models to self-debug (2023)
8.
Zurück zum Zitat Chen, X., Deng, L., Huang, F., Zhang, C., Zhang, Z., Zhao, Y., Zheng, K.: Daemon: Unsupervised anomaly detection and interpretation for multivariate time series. In: 2021 IEEE 37th International Conference on Data Engineering (ICDE). pp. 2225–2230 (2021) Chen, X., Deng, L., Huang, F., Zhang, C., Zhang, Z., Zhao, Y., Zheng, K.: Daemon: Unsupervised anomaly detection and interpretation for multivariate time series. In: 2021 IEEE 37th International Conference on Data Engineering (ICDE). pp. 2225–2230 (2021)
9.
Zurück zum Zitat Chen, Y., Zhang, C., Dong, Z., Yang, D., Peng, X., Ou, J., Yang, H., Wu, Z., Qu, X., Li, W.: Dynamic graph neural networks-based alert link prediction for online service systems. In: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). pp. 79–90 (2023) Chen, Y., Zhang, C., Dong, Z., Yang, D., Peng, X., Ou, J., Yang, H., Wu, Z., Qu, X., Li, W.: Dynamic graph neural networks-based alert link prediction for online service systems. In: 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE). pp. 79–90 (2023)
10.
Zurück zum Zitat Chen, Y., Yang, X., Dong, H., He, X., Zhang, H., Lin, Q., Chen, J., Zhao, P., Kang, Y., Gao, F., Xu, Z., Zhang, D.: Identifying linked incidents in large-scale online service systems. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. p. 304-314. ESEC/FSE 2020, Association for Computing Machinery, New York, NY, USA (2020) Chen, Y., Yang, X., Dong, H., He, X., Zhang, H., Lin, Q., Chen, J., Zhao, P., Kang, Y., Gao, F., Xu, Z., Zhang, D.: Identifying linked incidents in large-scale online service systems. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. p. 304-314. ESEC/FSE 2020, Association for Computing Machinery, New York, NY, USA (2020)
11.
Zurück zum Zitat Chen, Y., Yang, X., Lin, Q., Zhang, H., Gao, F., Xu, Z., Dang, Y., Zhang, D., Dong, H., Xu, Y., Li, H., Kang, Y.: Outage prediction and diagnosis for cloud service systems. In: The World Wide Web Conference. p. 2659-2665. ACM, New York, NY, USA (2019) Chen, Y., Yang, X., Lin, Q., Zhang, H., Gao, F., Xu, Z., Dang, Y., Zhang, D., Dong, H., Xu, Y., Li, H., Kang, Y.: Outage prediction and diagnosis for cloud service systems. In: The World Wide Web Conference. p. 2659-2665. ACM, New York, NY, USA (2019)
12.
Zurück zum Zitat Chen, Z., Liu, J., Su, Y., Zhang, H., Wen, X., Ling, X., Yang, Y., Lyu, M.R.: Graph-based incident aggregation for large-scale online service systems. In: 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). pp. 430–442 (2021) Chen, Z., Liu, J., Su, Y., Zhang, H., Wen, X., Ling, X., Yang, Y., Lyu, M.R.: Graph-based incident aggregation for large-scale online service systems. In: 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). pp. 430–442 (2021)
13.
Zurück zum Zitat Du, M., Li, F.: Spell: Streaming parsing of system event logs. In: 2016 IEEE 16th International Conference on Data Mining (ICDM). pp. 859–864 (2016) Du, M., Li, F.: Spell: Streaming parsing of system event logs. In: 2016 IEEE 16th International Conference on Data Mining (ICDM). pp. 859–864 (2016)
14.
Zurück zum Zitat Du, M., Li, F., Zheng, G., Srikumar, V.: Deeplog: Anomaly detection and diagnosis from system logs through deep learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. pp. 1285–1298. ACM (2017) Du, M., Li, F., Zheng, G., Srikumar, V.: Deeplog: Anomaly detection and diagnosis from system logs through deep learning. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. pp. 1285–1298. ACM (2017)
15.
Zurück zum Zitat Evans, T.S., Chen, B.: Linking the network centrality measures closeness and degree. Communications Physics 5(1), 172 (2022) Evans, T.S., Chen, B.: Linking the network centrality measures closeness and degree. Communications Physics 5(1),  172 (2022)
16.
Zurück zum Zitat Freeman, L.C.: A set of measures of centrality based on betweenness. Sociometry pp. 35–41 (1977) Freeman, L.C.: A set of measures of centrality based on betweenness. Sociometry pp. 35–41 (1977)
17.
Zurück zum Zitat Freeman, L.C.: Centrality in social networks conceptual clarification. Social Networks 1(3), 215–239 (1978) Freeman, L.C.: Centrality in social networks conceptual clarification. Social Networks 1(3), 215–239 (1978)
18.
Zurück zum Zitat Gao, L., Madaan, A., Zhou, S., Alon, U., Liu, P., Yang, Y., Callan, J., Neubig, G.: PAL: Program-aided language models. In: Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J. (eds.) Proceedings of the 40th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 202, pp. 10764–10799. PMLR (23–29 Jul 2023) Gao, L., Madaan, A., Zhou, S., Alon, U., Liu, P., Yang, Y., Callan, J., Neubig, G.: PAL: Program-aided language models. In: Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J. (eds.) Proceedings of the 40th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 202, pp. 10764–10799. PMLR (23–29 Jul 2023)
19.
Zurück zum Zitat Han, S., Wu, Q., Zhang, H., Qin, B., Hu, J., Shi, X., Liu, L., Yin, X.: Log-based anomaly detection with robust feature extraction and online learning. IEEE Transactions on Information Forensics and Security 16, 2300–2311 (2021) Han, S., Wu, Q., Zhang, H., Qin, B., Hu, J., Shi, X., Liu, L., Yin, X.: Log-based anomaly detection with robust feature extraction and online learning. IEEE Transactions on Information Forensics and Security 16, 2300–2311 (2021)
20.
Zurück zum Zitat He, P., Zhu, J., Zheng, Z., Lyu, M.R.: Drain: An online log parsing approach with fixed depth tree. In: 2017 IEEE International Conference on Web Services (ICWS). pp. 33–40. IEEE (2017) He, P., Zhu, J., Zheng, Z., Lyu, M.R.: Drain: An online log parsing approach with fixed depth tree. In: 2017 IEEE International Conference on Web Services (ICWS). pp. 33–40. IEEE (2017)
21.
Zurück zum Zitat Hundman, K., Constantinou, V., Laporte, C., Colwell, I., Soderstrom, T.: Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. p. 387-395. Association for Computing Machinery, New York, NY, USA (2018) Hundman, K., Constantinou, V., Laporte, C., Colwell, I., Soderstrom, T.: Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. p. 387-395. Association for Computing Machinery, New York, NY, USA (2018)
25.
Zurück zum Zitat Kang, S., Yoon, J., Yoo, S.: Large language models are few-shot testers: Exploring llm-based general bug reproduction. In: Proceedings of the 45th International Conference on Software Engineering. p. 2312-2323. ICSE ’23, IEEE Press (2023) Kang, S., Yoon, J., Yoo, S.: Large language models are few-shot testers: Exploring llm-based general bug reproduction. In: Proceedings of the 45th International Conference on Software Engineering. p. 2312-2323. ICSE ’23, IEEE Press (2023)
26.
Zurück zum Zitat Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Information Processing Systems. vol. 35, pp. 22199–22213. Curran Associates, Inc. (2022) Kojima, T., Gu, S.S., Reid, M., Matsuo, Y., Iwasawa, Y.: Large language models are zero-shot reasoners. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Information Processing Systems. vol. 35, pp. 22199–22213. Curran Associates, Inc. (2022)
27.
Zurück zum Zitat Le, V.H., Zhang, H.: Log parsing with prompt-based few-shot learning. In: 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). pp. 2438–2449 (2023) Le, V.H., Zhang, H.: Log parsing with prompt-based few-shot learning. In: 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). pp. 2438–2449 (2023)
28.
Zurück zum Zitat Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.t., Rocktäschel, T., Riedel, S., Kiela, D.: Retrieval-augmented generation for knowledge-intensive nlp tasks. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20, Curran Associates Inc., Red Hook, NY, USA (2020) Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.t., Rocktäschel, T., Riedel, S., Kiela, D.: Retrieval-augmented generation for knowledge-intensive nlp tasks. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20, Curran Associates Inc., Red Hook, NY, USA (2020)
29.
Zurück zum Zitat Lin, D., Raghu, R., Ramamurthy, V., Yu, J., Radhakrishnan, R., Fernandez, J.: Unveiling clusters of events for alert and incident management in large-scale enterprise it. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. p. 1630-1639. ACM, New York, NY, USA (2014) Lin, D., Raghu, R., Ramamurthy, V., Yu, J., Radhakrishnan, R., Fernandez, J.: Unveiling clusters of events for alert and incident management in large-scale enterprise it. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. p. 1630-1639. ACM, New York, NY, USA (2014)
30.
Zurück zum Zitat Lin, Q., Zhang, H., Lou, J.G., Zhang, Y., Chen, X.: Log clustering based problem identification for online service systems. In: Proceedings of the 38th International Conference on Software Engineering Companion. p. 102-111. ACM, New York, NY, USA (2016) Lin, Q., Zhang, H., Lou, J.G., Zhang, Y., Chen, X.: Log clustering based problem identification for online service systems. In: Proceedings of the 38th International Conference on Software Engineering Companion. p. 102-111. ACM, New York, NY, USA (2016)
31.
Zurück zum Zitat Liu, P., Xu, H., Ouyang, Q., Jiao, R., Chen, Z., Zhang, S., Yang, J., Mo, L., Zeng, J., Xue, W., Pei, D.: Unsupervised detection of microservice trace anomalies through service-level deep bayesian networks. In: IEEE 31st International Symposium on Software Reliability Engineering (ISSRE). pp. 48–58. IEEE (2020) Liu, P., Xu, H., Ouyang, Q., Jiao, R., Chen, Z., Zhang, S., Yang, J., Mo, L., Zeng, J., Xue, W., Pei, D.: Unsupervised detection of microservice trace anomalies through service-level deep bayesian networks. In: IEEE 31st International Symposium on Software Reliability Engineering (ISSRE). pp. 48–58. IEEE (2020)
32.
Zurück zum Zitat Meng, W., Liu, Y., Zhu, Y., Zhang, S., Pei, D., Liu, Y., Chen, Y., Zhang, R., Tao, S., Sun, P., Zhou, R.: Loganomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence. pp. 4739–4745. IJCAI Organization (7 2019) Meng, W., Liu, Y., Zhu, Y., Zhang, S., Pei, D., Liu, Y., Chen, Y., Zhang, R., Tao, S., Sun, P., Zhou, R.: Loganomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs. In: Proceedings of the 28th International Joint Conference on Artificial Intelligence. pp. 4739–4745. IJCAI Organization (7 2019)
33.
Zurück zum Zitat Nedelkoski, S., Cardoso, J., Kao, O.: Anomaly detection and classification using distributed tracing and deep learning. In: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). pp. 241–250. IEEE (2019) Nedelkoski, S., Cardoso, J., Kao, O.: Anomaly detection and classification using distributed tracing and deep learning. In: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). pp. 241–250. IEEE (2019)
34.
Zurück zum Zitat Negre, C.F., Morzan, U.N., Hendrickson, H.P., Pal, R., Lisi, G.P., Loria, J.P., Rivalta, I., Ho, J., Batista, V.S.: Eigenvector centrality for characterization of protein allosteric pathways. Proceedings of the National Academy of Sciences 115(52), E12201–E12208 (2018) Negre, C.F., Morzan, U.N., Hendrickson, H.P., Pal, R., Lisi, G.P., Loria, J.P., Rivalta, I., Ho, J., Batista, V.S.: Eigenvector centrality for characterization of protein allosteric pathways. Proceedings of the National Academy of Sciences 115(52), E12201–E12208 (2018)
35.
Zurück zum Zitat Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C.L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., Lowe, R.: Training language models to follow instructions with human feedback (2022) Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C.L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., Lowe, R.: Training language models to follow instructions with human feedback (2022)
36.
Rae, J.W., Borgeaud, S., Cai, T., Millican, K., Hoffmann, J., Song, F., Aslanides, J., Henderson, S., Ring, R., Young, S., Rutherford, E., Hennigan, T., Menick, J., Cassirer, A., Powell, R., van den Driessche, G., Hendricks, L.A., Rauh, M., Huang, P.S., Glaese, A., Welbl, J., Dathathri, S., Huang, S., Uesato, J., Mellor, J., Higgins, I., Creswell, A., McAleese, N., Wu, A., Elsen, E., Jayakumar, S., Buchatskaya, E., Budden, D., Sutherland, E., Simonyan, K., Paganini, M., Sifre, L., Martens, L., Li, X.L., Kuncoro, A., Nematzadeh, A., Gribovskaya, E., Donato, D., Lazaridou, A., Mensch, A., Lespiau, J.B., Tsimpoukelli, M., Grigorev, N., Fritz, D., Sottiaux, T., Pajarskas, M., Pohlen, T., Gong, Z., Toyama, D., de Masson d’Autume, C., Li, Y., Terzi, T., Mikulik, V., Babuschkin, I., Clark, A., de Las Casas, D., Guy, A., Jones, C., Bradbury, J., Johnson, M., Hechtman, B., Weidinger, L., Gabriel, I., Isaac, W., Lockhart, E., Osindero, S., Rimell, L., Dyer, C., Vinyals, O., Ayoub, K., Stanway, J., Bennett, L., Hassabis, D., Kavukcuoglu, K., Irving, G.: Scaling language models: Methods, analysis & insights from training Gopher (2021)
37.
Ren, H., Xu, B., Wang, Y., Yi, C., Huang, C., Kou, X., Xing, T., Yang, M., Tong, J., Zhang, Q.: Time-series anomaly detection service at Microsoft. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 3009–3017. Association for Computing Machinery, New York, NY, USA (2019)
38.
Rozière, B., Gehring, J., Gloeckle, F., Sootla, S., Gat, I., Tan, X.E., Adi, Y., Liu, J., Remez, T., Rapin, J., Kozhevnikov, A., Evtimov, I., Bitton, J., Bhatt, M., Ferrer, C.C., Grattafiori, A., Xiong, W., Défossez, A., Copet, J., Azhar, F., Touvron, H., Martin, L., Usunier, N., Scialom, T., Synnaeve, G.: Code Llama: Open foundation models for code (2023)
39.
Sanh, V., Webson, A., Raffel, C., Bach, S.H., Sutawika, L., Alyafeai, Z., Chaffin, A., Stiegler, A., Scao, T.L., Raja, A., Dey, M., Bari, M.S., Xu, C., Thakker, U., Sharma, S.S., Szczechla, E., Kim, T., Chhablani, G., Nayak, N., Datta, D., Chang, J., Jiang, M.T.J., Wang, H., Manica, M., Shen, S., Yong, Z.X., Pandey, H., Bawden, R., Wang, T., Neeraj, T., Rozen, J., Sharma, A., Santilli, A., Fevry, T., Fries, J.A., Teehan, R., Bers, T., Biderman, S., Gao, L., Wolf, T., Rush, A.M.: Multitask prompted training enables zero-shot task generalization (2021)
40.
Shinn, N., Cassano, F., Labash, B., Gopinath, A., Narasimhan, K., Yao, S.: Reflexion: Language agents with verbal reinforcement learning (2023)
41.
Tang, L., Li, T., Pinel, F., Shwartz, L., Grabarnik, G.: Optimizing system monitoring configurations for non-actionable alerts. In: 2012 IEEE Network Operations and Management Symposium. pp. 34–42 (2012)
43.
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., Rodriguez, A., Joulin, A., Grave, E., Lample, G.: LLaMA: Open and efficient foundation language models (2023)
44.
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., Bikel, D., Blecher, L., Ferrer, C.C., Chen, M., Cucurull, G., Esiobu, D., Fernandes, J., Fu, J., Fu, W., Fuller, B., Gao, C., Goswami, V., Goyal, N., Hartshorn, A., Hosseini, S., Hou, R., Inan, H., Kardas, M., Kerkez, V., Khabsa, M., Kloumann, I., Korenev, A., Koura, P.S., Lachaux, M.A., Lavril, T., Lee, J., Liskovich, D., Lu, Y., Mao, Y., Martinet, X., Mihaylov, T., Mishra, P., Molybog, I., Nie, Y., Poulton, A., Reizenstein, J., Rungta, R., Saladi, K., Schelten, A., Silva, R., Smith, E.M., Subramanian, R., Tan, X.E., Tang, B., Taylor, R., Williams, A., Kuan, J.X., Xu, P., Yan, Z., Zarov, I., Zhang, Y., Fan, A., Kambadur, M., Narang, S., Rodriguez, A., Stojnic, R., Edunov, S., Scialom, T.: Llama 2: Open foundation and fine-tuned chat models (2023)
45.
Wang, X., Zhang, X., Li, L., He, S., Zhang, H., Liu, Y., Zheng, L., Kang, Y., Lin, Q., Dang, Y., Rajmohan, S., Zhang, D.: SPINE: A scalable log parser with feedback guidance. In: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. pp. 1198–1208. ESEC/FSE 2022, Association for Computing Machinery, New York, NY, USA (2022)
46.
Wei, J., Bosma, M., Zhao, V.Y., Guu, K., Yu, A.W., Lester, B., Du, N., Dai, A.M., Le, Q.V.: Finetuned language models are zero-shot learners (2021)
47.
Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., Metzler, D., Chi, E.H., Hashimoto, T., Vinyals, O., Liang, P., Dean, J., Fedus, W.: Emergent abilities of large language models (2022)
48.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., ichter, b., Xia, F., Chi, E., Le, Q.V., Zhou, D.: Chain-of-thought prompting elicits reasoning in large language models. In: Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A. (eds.) Advances in Neural Information Processing Systems. vol. 35, pp. 24824–24837. Curran Associates, Inc. (2022)
49.
Yang, L., Chen, J., Wang, Z., Wang, W., Jiang, J., Dong, X., Zhang, W.: PLELog: Semi-supervised log-based anomaly detection via probabilistic label estimation. In: 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). pp. 230–231 (2021)
50.
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T.L., Cao, Y., Narasimhan, K.: Tree of thoughts: Deliberate problem solving with large language models (2023)
51.
Zhang, X., Xu, Y., Lin, Q., Qiao, B., Zhang, H., Dang, Y., Xie, C., Yang, X., Cheng, Q., Li, Z., Chen, J., He, X., Yao, R., Lou, J.G., Chintalapati, M., Shen, F., Zhang, D.: Robust log-based anomaly detection on unstable log data. In: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. pp. 807–817. ACM, New York, NY, USA (2019)
52.
Zhao, N., Chen, J., Peng, X., Wang, H., Wu, X., Zhang, Y., Chen, Z., Zheng, X., Nie, X., Wang, G., Wu, Y., Zhou, F., Zhang, W., Sui, K., Pei, D.: Understanding and handling alert storm for online service systems. In: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice. pp. 162–171. ACM, New York, NY, USA (2020)
53.
Zhao, N., Chen, J., Wang, Z., Peng, X., Wang, G., Wu, Y., Zhou, F., Feng, Z., Nie, X., Zhang, W., Sui, K., Pei, D.: Real-time incident prediction for online service systems. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. pp. 315–326. ACM, New York, NY, USA (2020)
54.
Zhao, N., Jin, P., Wang, L., Yang, X., Liu, R., Zhang, W., Sui, K., Pei, D.: Automatically and adaptively identifying severe alerts for online service systems. In: IEEE Conference on Computer Communications. pp. 2420–2429. IEEE (2020)
55.
Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., Liu, P., Nie, J.Y., Wen, J.R.: A survey of large language models (2023)
56.
Zheng, L., Chiang, W.L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E.P., Zhang, H., Gonzalez, J.E., Stoica, I.: Judging LLM-as-a-judge with MT-Bench and Chatbot Arena (2023)
57.
Zhou, B., Liu, S., Hooi, B., Cheng, X., Ye, J.: BeatGAN: Anomalous rhythm detection using adversarially generated time series. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. pp. 4433–4439. International Joint Conferences on Artificial Intelligence Organization (2019)
58.
Zhou, X., Peng, X., Xie, T., Sun, J., Ji, C., Liu, D., Xiang, Q., He, C.: Latent error prediction and fault localization for microservice applications by learning from system trace logs. In: Proceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. pp. 683–694. ACM, New York, NY, USA (2019)
Metadata
Title
VOCE: A Virtual On-Call Engineer for Automated Alert Incident Analysis Using a Large Language Model
Authors
Jia Chen
Xiaolei Chen
Jie Shi
Peng Wang
Wei Wang
Copyright year
2025
DOI
https://doi.org/10.1007/978-3-031-90900-9_4