Building an Adaptive AI O&M Agent: Large Language Models in Practice for Software Log Analysis
Yilun Liu, Huawei 2012 Laboratories

Agenda:
1. Perspectives on software log O&M
2. Gaps facing adaptive agents in the O&M domain
3. LLM prompt engines powering adaptive O&M agents
4. Building O&M-specialized models via LLM knowledge transfer
5. Future outlook

PART 01 Perspectives on software log O&M: the evolution of intelligent O&M runs from task-data-driven models toward adaptive O&M agents

Viewpoint 1: Software log O&M is the translation of machine language into natural language.
(1) Logs are machine language: large-scale networks and software systems generate PB-scale logs every day. These logs are quasi-natural-language texts that describe device states and anomalies in real time.
(2) Traditional network O&M is manual translation of machine language: to keep the network stable, O&M engineers continuously monitor device states, aiming to detect anomalies and incidents accurately and promptly. Network logs are the most important data source for device operation and maintenance; engineers typically read the natural-language, semantic content of logs to find problems and analyze root causes.
(3) Automated log analysis is machine translation of machine language: log texts are diverse, voluminous, and mostly unstructured, so it is impossible to monitor and inspect all logs manually. More importantly, analyzing device logs demands rich domain knowledge and is labor-intensive, and simple rule configurations cannot understand the semantics of the text.

Viewpoint 2: The evolution trend of intelligent O&M: from task-data-driven models to adaptive O&M agents.

Team publications:
[1] LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs (IJCAI 2019)
[2] LogParse: Making Log Parsing Adaptive through Word Classification (ICCCN 2020)
[3] LogStamp: Automatic Online Log Parsing Based on Sequence Labelling (WAIN Performance 2021)
[4] BigLog: Unsupervised Large-scale Pre-training for a Unified Log Representation (IWQoS 2023)
[5] DA-Parser: A Pre-trained Domain-Aware Parsing Framework for Heterogeneous Log Analysis (COMPSAC 2023)
[6] LogPrompt: Prompt Engineering towards Zero-shot and Interpretable Log Analysis
(ICSE 2024 & ICPC 2024)
Team repo: https://github.com/LogAIBox

PART 02 Gaps facing adaptive agents in the O&M domain: traditional automated O&M models are neither "adaptive" nor more than marginally "intelligent"

Gap 1: Traditional intelligent O&M algorithms depend on task-labeled data; they merely fit that data and cannot adapt to new domains.
• In online scenarios, frequent software updates, third-party plugins, and the like mean most newly generated logs are unseen by the model, and sufficient historical labeled data is hard to obtain, so the model must be able to adapt.
• When task training data shrinks, traditional methods uniformly lose prediction accuracy. Applying them to a private system therefore inevitably requires large amounts of labeled data.

Gap 2: Traditional O&M systems offer poor interpretability and weak interactivity; their intelligence is limited.
• Generating explanations for anomalous logs enables quick triage of false positives and missed detections.
• Based on each round of analysis results, the LLM automatically generates an analysis report and recommends solutions.

Vision for an O&M agent: driven by instructions rather than data, capable of root-cause search and self-correction, serving as a communication bridge between the device system and the engineer.
(1) Traditional log-analysis algorithms output only "alert/normal" and give no feedback on anomalous logs; experts must read the relevant log templates and manually compile analysis reports, which is time-consuming and laborious.
(2) They emit only a prediction; false alarms and missed alarms cannot be ruled out quickly and must be investigated against the raw logs.
Existing methods can automatically map fault symptoms from task data, but they still fail to complete the last step of intelligent O&M: root-cause analysis and automatic fault recovery. Their interaction design lacks feedback and dialogue, leaving them far from a true "agent."

PART 03 LLM prompt engines powering adaptive O&M agents:
LogPrompt: prompt engineering unlocks the O&M potential of LLMs, with zero-shot inference and interpretable output.

LogPrompt addresses the two gaps of traditional log analysis.
Traditional methods: depend on task data, with slow and costly expert labeling and poor adaptability; limited intelligence and interpretability, outputting only an alert verdict with no analysis of the alert event.
LogPrompt: needs no training resources and transfers flexibly across devices and applications.
• Relies on the general knowledge internalized during LLM pre-training, with no separate domain fine-tuning.
• Injects domain-expert alignment information via prompt strategies for fast, flexible transfer.
LogPrompt: enhances the interpretability and interactivity of analysis results.
• A chain-of-thought prompt engine elicits the LLM's domain-text analysis and root-cause reasoning abilities, organizing a chain of reasoning through the clutter of alert logs. The model generates end-to-end event-analysis summaries, enabling quick identification of missed alarms, false alarms, and root causes.
• Provides alert query, localization, and analysis services flexibly through multi-turn dialogue, according to the user's described needs.

The potential and challenge of LLMs as O&M agents: LLMs have strong language generalization and explanation abilities, but are sensitive to prompts.
• In our preliminary experiments, ChatGPT with the simple prompt achieved an F1-score of only 0.189 in anomaly detection. However, our best prompt strategy outperformed the simple prompt by 0.195 in F1-score.
• Large language models (LLMs) have powerful generalization ability to unseen user instructions (Gap 1), and may also be able to handle unseen logs in the online scenario of log analysis.
• Since log analysis is a domain-specific, non-general NLP task, directly applying a simple prompt to LLMs can result in poor performance.
• Unlike existing deep-learning models, LLMs (such as ChatGPT) have strong language-generation ability and can handle complex writing tasks (Gap 2) such as emails and reports. Log interpretation can be seen as a domain writing task.
• The primary objective of LogPrompt is to enhance the correctness and interpretability of log analysis in the online scenario by exploring a proper strategy for prompting LLMs.
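The prompt sensitivity described above can be made concrete with a small sketch. The wording of both templates below is illustrative only (not LogPrompt's actual prompts); the contrast shown is simply a bare instruction versus one that pins down the anomaly definition and output format, which is the kind of strategy difference the slide attributes the F1 gap to.

```python
# Hypothetical sketch: a bare prompt vs. a task-specified prompt for
# log anomaly detection. Prompt wording is made up for illustration.

def simple_prompt(logs: list[str]) -> str:
    """A bare prompt: just the logs and a one-line instruction."""
    joined = "\n".join(logs)
    return f"Classify each log as normal or abnormal:\n{joined}"

def specified_prompt(logs: list[str]) -> str:
    """A prompt that pins the task down: a definition of 'anomaly',
    a fixed output format, and a request for a brief reason."""
    joined = "\n".join(f"({i}) {log}" for i, log in enumerate(logs, 1))
    return (
        "You are an O&M assistant analyzing device logs.\n"
        "An anomaly is an alert explicitly expressed in the log text.\n"
        "For each numbered log, answer 'normal' or 'abnormal' and give "
        "a one-sentence reason.\n"
        f"Logs:\n{joined}"
    )

logs = [
    "Interface eth0 link up",
    "ERROR: heartbeat lost on node-3, failover initiated",
]
print(simple_prompt(logs))
print(specified_prompt(logs))
```

Either string would be sent to the LLM as the user message; only the second constrains what the model should treat as an anomaly and how to answer.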
There are many prompt philosophies proposed for NLP tasks, such as CoT, ToT, etc.

Introducing the chain-of-thought (CoT) prompt strategy can elicit the LLM's ability to tackle log-analysis challenges.
• The concept of chain of thought (CoT), a series of intermediate reasoning steps, was introduced by Wei et al. [1]. The CoT prompt emulates the human thought process by requiring the model to include thinking steps when addressing complex problems, and it can enhance the performance of LLMs on challenging tasks such as solving mathematical problems.
• The CoT prompt in the original CoT paper places an example with intermediate steps before an input math problem, so that the model is encouraged to follow the thinking style of the example.
Advantages of the CoT prompt:
• Breaks unseen problems down into manageable steps (Gap 1)
• Enhances the interpretability and transparency of LLM output (Gap 2)
• Unleashes the abilities learned in the pre-training phase

LogPrompt exploration: bringing the CoT prompt idea into log analysis.
In manual log analysis, practitioners also engage in a series of reasoning steps to reach a conclusion. For instance, without further definitions, the boundary between a normal log and an abnormal log is unclear. To emulate the thinking process of O&M engineers, we propose two variants of the CoT prompt for log analysis:
• Explicit CoT: we explicitly define intermediate steps to regulate the thinking process. For example, in the task of anomaly detection, we constrain the definition of an anomaly to be only "alerts explicitly expressed in textual content" and define four steps for anomaly detection.

LogPrompt exploration: other prompt strategies applicable to log analysis.
• In-context prompt: this approach uses several samples of labeled logs to set the context for the task. The LLM then predicts on new logs, using the context from the sample logs.
• Self-prompt: this strategy involves the LLM suggesting its own prompts. A meta-prompt describing the task asks the LLM to generate prompt-prefix candidates.
• Format control: we employ two functions, 𝑓x([X]) and 𝑓Z([
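An explicit-CoT prompt of the kind described above can be sketched as below. Note that the slide says four steps are defined for anomaly detection but does not enumerate them, so the four steps in this sketch are illustrative placeholders, not the actual LogPrompt steps; only the anomaly definition ("alerts explicitly expressed in textual content") comes from the text.

```python
# Sketch of an explicit-CoT prompt for log anomaly detection.
# The four steps are hypothetical stand-ins for the (unenumerated)
# steps mentioned in the slide.

STEPS = [
    "Identify the event that the log describes.",
    "Check whether the text explicitly expresses an alert.",
    "If it does, state the likely cause in one sentence.",
    "Conclude with a final label: 'abnormal' or 'normal'.",
]

def explicit_cot_prompt(log: str) -> str:
    """Build a prompt that constrains the anomaly definition and
    forces the model to show each intermediate reasoning step."""
    numbered = "\n".join(f"Step {i}: {s}" for i, s in enumerate(STEPS, 1))
    return (
        "Treat a log as abnormal only if an alert is explicitly "
        "expressed in its textual content.\n"
        "Follow these steps and show your reasoning for each:\n"
        f"{numbered}\n"
        f"Log: {log}"
    )

print(explicit_cot_prompt("ERROR: heartbeat lost on node-3"))
```

Because the steps are spelled out in the prompt, the model's answer arrives with its reasoning attached, which is what makes false alarms quick to triage.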

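The in-context prompt strategy listed above can likewise be sketched as a few labeled logs placed before the new log. The example logs and labels here are invented for illustration; the strategy itself (labeled samples as context, then prediction on a new log) is from the slide.

```python
# Sketch of the in-context prompt strategy: labeled example logs set
# the context, then the LLM labels a new log. Logs/labels are made up.

def in_context_prompt(examples: list[tuple[str, str]], new_log: str) -> str:
    """Prepend (log, label) demonstrations, ending at the blank label
    slot the LLM is expected to fill."""
    ctx = "\n".join(f"Log: {log}\nLabel: {label}" for log, label in examples)
    return (
        "Classify the last log as 'normal' or 'abnormal', "
        "following the labeled examples.\n"
        f"{ctx}\nLog: {new_log}\nLabel:"
    )

demo = [
    ("Session opened for user root", "normal"),
    ("Kernel panic - not syncing: fatal exception", "abnormal"),
]
print(in_context_prompt(demo, "Disk quota exceeded on /dev/sda1"))
```

The prompt deliberately ends at `Label:` so the model's completion is the prediction itself, which keeps the output easy to parse.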



