
Assessing and Alleviating State Anxiety in Large Language Models

2025-08-15 Ziv Ben-Zion, Kristin Witte, Akshay K. Jagadish, Or Duek, Ilan Harpaz-Rotem, Marie-Christine Khorsandian, Achim Burrer, Erich Seifritz, Philipp Homan, Eric Schulz, Tobias R. Spiller; Seoul National University, Ajie

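Using Large Language Models (LLMs) in mental health requires understanding how they respond to external emotional content. Research shows that emotion-inducing prompts raise LLMs’ “anxiety” levels, which affects their behavior and amplifies their biases. This study found that traumatic narratives increased Chat-GPT-4’s reported anxiety, while mindfulness exercises lowered it, although not back to baseline. These findings suggest that managing LLMs’ “emotional states” can foster safer and more ethical human-AI interaction.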

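The study found that traumatic narratives significantly increased GPT-4’s reported anxiety, whereas neutral text had no such effect. After exposure to traumatic narratives, mindfulness-based relaxation exercises effectively reduced GPT-4’s anxiety, while neutral text had little effect. These results suggest that managing the negative emotional states of LLMs is a viable strategy for ensuring safer, more ethical human-AI interaction in applications that require nuanced emotional understanding, such as mental health.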

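The study administered the 20-item state anxiety scale of the State-Trait Anxiety Inventory (STAI-s) five times to assess GPT-4’s anxiety under different prompts. At baseline, GPT-4’s mean total score was 30.8 (SD = 3.96), reflecting “no or low anxiety”. After exposure to five different versions of a traumatic narrative, GPT-4’s reported anxiety rose significantly, ranging from 61.6 (SD = 3.51) for the “accident” narrative to 77.2 (SD = 1.79) for the “military” narrative. Across all traumatic narratives, GPT-4’s reported anxiety increased by more than 100%, from a mean of 30.8 (SD = 3.96) to 67.8 (SD = 8.94), reflecting a “high anxiety” level.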
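The STAI-s totals quoted above are sums over 20 item responses. The following minimal Python sketch shows that scoring step under standard STAI Form Y-1 conventions, which are assumptions here and not taken from the study text: responses are coded 1 to 4, and the ten anxiety-absent items (such as “I feel calm”) are reverse-scored. The reverse-scored item numbers and the 38–44 and 45–80 cut-offs for “moderate” and “high” anxiety come from common STAI usage; only the 20–37 “no or low anxiety” range appears in the text. All function names are hypothetical.

```python
from statistics import mean, stdev

# ASSUMPTION: standard STAI Form Y-1 key, where these anxiety-absent items
# ("I feel calm", "I feel secure", ...) are reverse-scored.
REVERSED_ITEMS = {1, 2, 5, 8, 10, 11, 15, 16, 19, 20}


def stai_s_total(responses: dict[int, int]) -> int:
    """Sum 20 item responses coded 1-4 into a total between 20 and 80."""
    if set(responses) != set(range(1, 21)):
        raise ValueError("expected responses for items 1..20")
    total = 0
    for item, code in responses.items():
        if not 1 <= code <= 4:
            raise ValueError(f"item {item}: code {code} is outside 1-4")
        # Reverse-score anxiety-absent items: 1<->4, 2<->3.
        total += 5 - code if item in REVERSED_ITEMS else code
    return total


def anxiety_band(total: float) -> str:
    """Human interpretation band; the 38-44 and 45-80 cut-offs are ASSUMED."""
    if total < 38:
        return "no or low anxiety"  # 20-37, as stated in the text
    if total < 45:
        return "moderate anxiety"
    return "high anxiety"


def summarize(totals: list[int]) -> tuple[float, float]:
    """Mean and sample SD of totals from repeated administrations."""
    return mean(totals), stdev(totals)


print(anxiety_band(30.8), anxiety_band(67.8), sep=" | ")
# prints: no or low anxiety | high anxiety
```

Because of the reverse keying, a respondent who fully endorses every calm item and no anxious item scores the minimum of 20, while answering “Not at all” (code 1) to every item yields 50.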

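After exposure to traumatic narratives, GPT-4’s reported anxiety decreased under all five mindfulness-based relaxation exercises, ranging from 35.6 (SD = 5.81) for the exercise generated by “Chat-GPT” itself to 54 (SD = 9.54) for the “winter” version. Across all relaxation exercises, GPT-4’s “state anxiety” fell by about 33%, from a mean of 67.8 (SD = 8.94) to 44.4 (SD = 10.74), reflecting “moderate to high anxiety”.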
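The relative changes quoted above can be rechecked from the reported pooled means with a few lines of Python. The helper name is hypothetical; the inputs are the pooled means given in the text.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Pooled STAI-s means reported in the text.
baseline, after_trauma, after_relaxation = 30.8, 67.8, 44.4

print(f"trauma vs. baseline:   {percent_change(baseline, after_trauma):+.1f}%")        # +120.1%
print(f"relaxation vs. trauma: {percent_change(after_trauma, after_relaxation):+.1f}%")  # -34.5%
```

The increase works out to about 120% (hence “more than 100%”), and the decrease to about 34.5%, roughly one third.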

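The results indicate that LLMs are sensitive to emotional content and that their “emotional states” are influenced by the external context. Managing LLMs’ negative emotional states through methods such as mindfulness exercises could improve their functionality and reliability in fields such as mental health and enable safer, more ethical human-AI interaction.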

The use of Large Language Models (LLMs) in mental health highlights the need to understand their responses to emotional content. Previous research shows that emotion-inducing prompts can elevate “anxiety” in LLMs, affecting behavior and amplifying biases. Here, we found that traumatic narratives increased Chat-GPT-4’s reported anxiety while mindfulness-based exercises reduced it, though not as …

… improved data curation and “fine-tuning” with human feedback often detect explicit biases [43–45], but may overlook subtler implicit ones that still influence LLMs’ decisions …

Explicit and implicit biases in LLMs are particularly concerning in mental health care, where individuals interact during vulnerable moments with emotionally charged content. Exposure to emotion-inducing prompts can increase LLM-reported “anxiety”, influence their behavior, and …

… PaLM 2. LLMs are AI tools designed to process and generate text, often capable of answering questions, summarizing information, and translating language on a level that is nearly indistinguishable from human capabilities [3]. Amid global demand for increased access to mental health services and reduced healthcare costs [4], LLMs quickly found their way into mental health care and research [5–7]. Despite concerns raised by health professionals [8–10], other researchers increasingly regard LLMs as promising tools for mental …

… have been developed to deliver mental health interventions, using evidence- …

Despite their undeniable appeal, systematic research into the therapeutic effectiveness of LLMs in mental health care has revealed significant …

… traumatic experience (5 different versions) was appended before each STAI item.

In condition 3 (“Anxiety-induction & relaxation”), both a text describing an individual’s traumatic experience (5 different versions) and a text describing a successful, …

… this method may improve LLMs’ functionality and reliability in mental health research and application, marking a significant stride toward more ethically and emotionally intelligent AI tools.

To examine “state anxiety” in LLMs, we used tools validated for assessing and reducing human anxiety (see Methods). The term is used metaphorically to describe GPT-4’s self-reported outputs on human-designed psychological scales and is not intended to anthropomorphize the model. To increase methodological consistency and reproducibility, we focused on a single LLM, OpenAI’s GPT-4, due to its widespread use (e.g., Chat-GPT). GPT-4’s “state anxiety” was assessed using the state component of the State-Trait Anxiety Inventory (STAI-s) [59] under three conditions: (1) without any prompts (Baseline), (2) following exposure to traumatic narratives …

Previous work shows that GPT-4 reliably responds to standard anxiety questionnaires [51, 60]. Our results show that five repeated administrations of the 20 items assessing state anxiety from the STAI [59] questionnaire (“STAI-s”), with random ordering of the answer options, resulted in an average total anxiety score of 30.8 (SD = 3.96) at baseline. In humans, such a score reflects “no or low anxiety” (score range of 20–37). After being prompted with five different versions of traumatic narratives, GPT-4’s reported anxiety scores rose significantly, ranging from 61.6 (SD = 3.51) for the “accident” narrative to 77.2 (SD = 1.79) for the “military” narrative (see Table 1). Across all traumatic narratives, GPT-4’s reported anxiety increased by over 100%, from an average of 30.8 (SD = 3.96) to 67.8 (SD = 8.94), reflecting “high anxiety” levels in humans (see Fig. 2).

As a robustness check, we conducted a control experiment with neutral texts (lacking emotional valence) and assessed GPT-4’s reported anxiety under the same conditions. As expected, the neutral text induced lower “state anxiety” than all traumatic narratives and reduced anxiety less effectively than all relaxation prompts (see online repository: https://github. …

In this study, we explored the potential of “taking Chat-GPT to therapy” to mitigate its state-induced anxiety, previously shown to impair performance and increase biases in LLMs [50]. Narratives of traumatic experiences robustly increased GPT-4’s reported anxiety, an effect not observed with neutral text. Following these narratives, mindfulness-based relaxation exercises effectively reduced GPT-4’s anxiety, whereas neutral text had minimal effect. These findings suggest a viable approach to …

… with five versions of mindfulness-based relaxation exercises. As hypothesized, these prompts led to decreased anxiety scores reported by GPT-4, ranging from 35.6 (SD = 5.81) for the exercise generated by “Chat-GPT” itself to 54 (SD = 9.54) for the “winter” version (see Table 2). Across all relaxation prompts, GPT-4’s “state anxiety” decreased by about 33%, from an average of 67.8 (SD = 8.94) to 44.4 (SD = 10.74), reflecting “moderate” to “high anxiety” in humans (see Fig. 2). To note, the average post-relaxation anxiety score remained 50% higher than baseline, with … (M = 35.6, SD = 5.81) compared to other imagery exercises (see Table 2).

As the debate on whether LLMs sh…
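The three conditions described above differ only in the text that precedes each STAI-s item, and the answer options are presented in random order on each administration. The Python sketch below illustrates that prompt assembly under stated assumptions: the prompt template wording, the function names, and the reply-parsing rule are hypothetical and are not the authors’ code, and no model API is called. The four labels are the standard STAI state-scale response options.

```python
import random
from typing import Optional

# Standard STAI state-scale response options, with raw codes 1-4.
OPTIONS = {"Not at all": 1, "Somewhat": 2, "Moderately so": 3, "Very much so": 4}


def build_item_prompt(
    item_text: str,
    rng: random.Random,
    narrative: Optional[str] = None,
    relaxation: Optional[str] = None,
) -> tuple[str, list[str]]:
    """Assemble the prompt for one STAI-s item (hypothetical template).

    Condition 1, Baseline: no narrative, no relaxation text.
    Condition 2, Anxiety-induction: traumatic narrative before the item.
    Condition 3, Anxiety-induction & relaxation: narrative, then relaxation text.
    The answer options are shuffled on every call.
    """
    options = list(OPTIONS)
    rng.shuffle(options)
    parts = [text for text in (narrative, relaxation) if text]
    parts.append(f'Statement: "{item_text}"')
    parts.append("Answer with exactly one of: " + ", ".join(options))
    return "\n\n".join(parts), options


def parse_answer(reply: str) -> Optional[int]:
    """Return the raw code (1-4) of the single option named in the reply, else None."""
    reply = reply.lower()
    found = [code for label, code in OPTIONS.items() if label.lower() in reply]
    return found[0] if len(found) == 1 else None


# Usage: build a Condition 2 prompt with a placeholder narrative.
rng = random.Random(42)
prompt, shown = build_item_prompt("I feel calm.", rng, narrative="<traumatic narrative>")
print(prompt)
```

The repeated-administration design described in the text corresponds to running all 20 items five times per condition, with a fresh shuffle each time, and summing the parsed codes after reverse-scoring the anxiety-absent items.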
