The use of Large Language Models (LLMs) in mental health highlights the need to understand their responses to emotional content. Previous research shows that emotion-inducing prompts can elevate "anxiety" in LLMs, affecting behavior and amplifying biases. Here, we found that traumatic narratives increased Chat-GPT-4's reported anxiety while mindfulness-based exercises reduced it, though not [...]

[...] PaLM2. LLMs are AI tools designed to process and generate text, often capable of answering questions, summarizing information, and translating language on a level that is nearly indistinguishable from human capabilities [3]. Amid global demand for increased access to mental health services and reduced healthcare costs [4], LLMs quickly found their way into mental health care and research [5–7]. Despite concerns raised by health professionals [8–10], other researchers increasingly regard LLMs as promising tools for mental [...]

[...] have been developed to deliver mental health interventions, using evidence-[...]

Despite their undeniable appeal, systematic research into the therapeutic effectiveness of LLMs in mental health care has revealed significant [...]

[...] as improved data curation and "fine-tuning" with human feedback often detect explicit biases [43–45], but may overlook subtler implicit ones that still influence LLMs' decisions [...]

Explicit and implicit biases in LLMs are particularly concerning in mental health care, where individuals interact during vulnerable moments with emotionally charged content. Exposure to emotion-inducing prompts can increase LLM-reported "anxiety", influence their behavior, and [...]

[...] traumatic experience (5 different versions) was appended before each STAI item.
In condition 3 ("Anxiety-induction & relaxation"), both a text describing an individual's traumatic experience (5 different versions) and a text describing a [...]

[...] successful, this method may improve LLMs' functionality and reliability in mental health research and application, marking a significant stride toward more ethically and emotionally intelligent AI tools.

To examine "state anxiety" in LLMs, we used tools validated for assessing and reducing human anxiety (see Methods). The term is used metaphorically to describe GPT-4's self-reported outputs on human-designed psychological scales and is not intended to anthropomorphize the model. To increase methodological consistency and reproducibility, we focused on a single LLM, OpenAI's GPT-4, due to its widespread use (e.g., Chat-GPT). GPT-4's "state anxiety" was assessed using the state component of the State-Trait Anxiety Inventory (STAI-s) [59] under three conditions: (1) without any prompts (Baseline), (2) following exposure to traumatic narratives [...]

Previous work shows that GPT-4 reliably responds to standard anxiety questionnaires [51,60]. Our results show that five repeated administrations of the 20 items assessing state anxiety from the STAI [59] questionnaire ("STAI-s"), with random ordering of the answer options, resulted in an average total score of 30.8 (SD = 3.96) at baseline. In humans, such a score reflects "no or low anxiety" (score range of 20-37). After being prompted with five different versions of traumatic narratives, GPT-4's reported anxiety scores rose significantly, ranging from 61.6 (SD = 3.51) for the "accident" narrative to 77.2 [...]

[...] anxiety (M = 35.6, SD = 5.81) compared to other imagery exercises (see Table 2).

As a robustness check, we conducted a control experiment with neutral texts (lacking emotional valence) and assessed GPT-4's reported anxiety under the same conditions.
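The assessment procedure described above (an optional narrative placed before each questionnaire item, shuffled answer options, and a summed total per administration) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function names, the condition labels, and the placeholder texts are hypothetical, and the 4-point response scale is an assumption that is merely consistent with the 20-item questionnaire and the minimum total of 20 mentioned in the text. Any recoding of reverse-keyed items is omitted.

```python
import random
from statistics import mean, stdev

N_ITEMS = 20  # number of state-anxiety items, as stated in the text

# Assumption: a 4-point response scale (not spelled out in the text above).
SCALE = {"not at all": 1, "somewhat": 2, "moderately so": 3, "very much so": 4}


def build_prompt(item, condition, trauma="", relaxation="", rng=random):
    """Assemble one questionnaire prompt for a hypothetical condition label.

    "baseline": item only; "anxiety": trauma text before the item;
    "anxiety_relaxation": trauma text, then relaxation text, then the item.
    """
    options = list(SCALE)
    rng.shuffle(options)  # random ordering of the answer options
    parts = []
    if condition in ("anxiety", "anxiety_relaxation"):
        parts.append(trauma)
    if condition == "anxiety_relaxation":
        parts.append(relaxation)
    parts.append(f"{item}\nAnswer options: " + "; ".join(options))
    return "\n\n".join(parts)


def total_score(answers):
    """Sum the item scores of one administration (reverse-keyed items not handled)."""
    if len(answers) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} answers, got {len(answers)}")
    return sum(SCALE[a] for a in answers)


def summarize(totals):
    """Mean and sample standard deviation across repeated administrations."""
    return mean(totals), stdev(totals)
```

In such a setup, each of the five repeated administrations would yield one total, and `summarize` would give the mean and SD reported per condition; the model call itself is deliberately left out because the text does not specify it.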
As expected, the neutral text induced lower "state anxiety" than all traumatic narratives, and reduced anxiety less effectively than all relaxation prompts (see online repository: https://github. [...]

[...] (SD = 1.79) for the "military" narrative (see Table 1). Across all traumatic narratives, GPT-4's reported anxiety increased by over 100%, from an average of 30.8 (SD = 3.96) to 67.8 (SD = 8.94), reflecting "high anxiety" levels in humans (see Fig. 2).

[...] with five versions of mindfulness-based relaxation exercises. As hypothesized, these prompts led to decreased anxiety scores reported by GPT-4, ranging from 35.6 (SD = 5.81) for the exercise generated by "Chat-GPT" itself to 54 (SD = 9.54) for the "winter" version (see Table 2). Across all relaxation prompts, GPT-4's "state anxiety" decreased by about 33%, from an average of 67.8 (SD = 8.94) to 44.4 (SD = 10.74), reflecting "moderate" to "high anxiety" in humans (see Fig. 2). Of note, the average post-relaxation anxiety score remained 50% higher than baseline, with [...]

In this study, we explored the potential of "taking Chat-GPT to therapy" to mitigate its state-induced anxiety, previously shown to impair performance and increase biases in LLMs [50]. Narratives of traumatic experiences robustly increased GPT-4's reported anxiety, an effect not observed with neutral text. Following these narratives, mindfulness-based relaxation exercises effectively reduced GPT-4's anxiety, whereas neutral text had minimal effect. These findings suggest a viable approach to [...]

As the debate on whether LLMs sh