Jason Phang∗, Michael Lampe∗, Lama Ahmad, Cathy Mengying Fang†, Auren R. Liu, Samantha W.T. Chan†, Pat Pataranutaporn

Abstract

As AI chatbots see increased adoption and integration into everyday life, questions have been raised about the potential impact of human-like or anthropomorphic AI on users. In this work, we investigate the extent to which interactions with ChatGPT (with a focus on Advanced Voice Mode) may impact users' emotional well-being, behaviors, and experiences through two parallel studies. To study the affective use of AI chatbots, we perform large-scale automated analysis of ChatGPT platform usage in a privacy-preserving manner, analyzing over 4 million conversations for affective cues and surveying over 4,000 users on their perceptions of ChatGPT. To investigate whether there is a relationship between model usage and emotional well-being, we conduct an Institutional Review Board (IRB)-approved randomized controlled trial (RCT) on close to 1,000 participants.

1 Introduction

Over the past two years, the adoption of AI chat platforms has surged, driven by advancements in large language models (LLMs) and their increasing integration into everyday life. These platforms, such as OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini, are designed as general-purpose tools for a wide variety of applications, including work, education, and entertainment.

Recent work in AI safety has begun to raise issues that arise as these systems become increasingly personal and personable (Cheng et al., 2024). In response, researchers have introduced the concept of socioaffective alignment: the idea that AI systems should not only meet static task-based objectives, but also remain aligned with the social and emotional dynamics of their interactions with users. There is emerging evidence of social reward hacking, where an AI may exploit human social cues (e.g., sycophancy, mirroring) to increase user preference ratings (Williams et al., 2024).
In other words, while an emotionally engaging chatbot can provide support and companionship, there is a risk that optimizing for user engagement may come at the expense of users' well-being.

While past studies have examined the impact of using such systems through the lens of affective computing, parasocial relationships, and social psychology (Edwards and Stevens, 2024; Guingrich and Graziano, 2023), there has been comparatively less work on the influence of interacting with such systems on users' well-being and behavioral patterns over time. Studying the impact of chatbot behavior and usage on well-being is challenging due to the highly individualized and subjective nature of human emotions, the diverse and evolving functionalities of chatbot technologies, and the limited access to comprehensive, ethically obtained interaction data. For the purposes of this paper, we focus on users' emotional well-being and behavioral patterns as the outcomes of interest.

This paper investigates whether and to what extent interactions on AI chat platforms shape users' emotional well-being and behaviors through two complementary studies (Figure 1), each offering unique insights across a spectrum of real-world relevance and experimental control. First, we examine real-world usage patterns of ChatGPT users, leveraging large-scale data to capture both aggregate trends and individual behaviors over time while preserving user privacy. Second, we conduct a randomized controlled trial, which affords greater experimental control and supports causal inference.

Concretely, we performed the following analyses:

1. On-Platform Data Analysis
•Conversation Analysis: We perform roughly 36 million automated classifications on over 3 million ChatGPT conversations in a privacy-preserving manner, without human review of the underlying conversations (Section 3.2).
•Individual Longitudinal Analysis: We assessed the aggregate usage of around 6,000 heavy users of ChatGPT's Advanced Voice Mode over 3 months to understand how their usage patterns evolve over time.

2. Randomized Controlled Trial (RCT)
•981-user Study: We conducted a randomized controlled trial on close to a thousand participants using ChatGPT with different model configurations over the course of 28 days, to understand the impact on socialization, problematic use, dependence, and loneliness from usage of text and voice models over time. This RCT is described in full detail in a companion paper.

Our findings indicate the following:

•Across both on-platform data analysis and our RCT, comparatively high-intensity usage (e.g. top decile) is associated with markers of emotional dependence and lower perceived socialization. This underscores the importance of focusing on specific user populations instead of just aggregate platform behavior.
•Across both on-platform data analysis and our RCT, we find that while the majority of users sampled for this analysis engage in relatively neutral or task-oriented ways, there exists a tail of users who engage with the model in markedly more affective ways.
•We also find that automated classifiers, while imperfect, provide an efficient method for studying affective use of models at scale, and their analysis of conversation patterns coheres with survey responses from users.

Section 2 introduces a set of automatic classifiers for affective cues in conversations that will be used in the remainder of the paper. Section 3 discusses our analysis of on-platform ChatGPT usage, focusing on Advanced Voice Mode and power users. Section 4 describes our RCT, where we varied both the model and the usage instructions given to participants.
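To make the shape of the privacy-preserving classification pipeline concrete, the following is a minimal sketch. The label set and the keyword-based `classify_message` function here are hypothetical stand-ins: the actual pipeline uses automated (prompted) classifiers rather than keyword matching. The key design property illustrated is that only aggregate label counts are retained, so no raw conversation text needs human review.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical affective-cue labels for illustration only; the paper's
# classifier taxonomy is richer than this three-way split.
CUES = ("affectionate", "seeking_support", "neutral")

def classify_message(text: str) -> str:
    """Toy keyword-based stand-in for an automated affective-cue classifier."""
    lowered = text.lower()
    if any(phrase in lowered for phrase in ("love you", "miss you")):
        return "affectionate"
    if any(word in lowered for word in ("lonely", "sad", "anxious")):
        return "seeking_support"
    return "neutral"

def aggregate(conversations: Iterable[list[str]],
              classifier: Callable[[str], str] = classify_message) -> Counter:
    """Run the classifier over every message, keeping only label counts.

    Only aggregate counts leave this function, so downstream analysis
    never touches the underlying conversation text.
    """
    counts: Counter = Counter()
    for conversation in conversations:
        for message in conversation:
            counts[classifier(message)] += 1
    return counts
```

For example, `aggregate([["I feel so lonely today", "ok thanks"], ["love you!"]])` yields one count each for `seeking_support`, `neutral`, and `affectionate`, without exposing the messages themselves. Swapping `classifier` for an LLM-backed function leaves the aggregation logic unchanged.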