
How Effectively Can Current LLMs Analyze Macrofinancial Issues?

Paola Ganum and Tohid Atashbar*

IMF Working Paper WP/26/35
Strategy, Policy, and Review
Authorized for distribution by Eugenio Cerutti
February 2026

IMF Working Papers describe research in progress by the author(s) and are published to elicit comments and to encourage debate. The views expressed in IMF Working Papers are those of the author(s) and do not necessarily represent the views of the IMF, its Executive Board, or IMF management.

ABSTRACT: This paper empirically evaluates the ability of current Large Language Models (LLMs) to analyze macrofinancial coverage in IMF Article IV staff reports, using human economists' assessments as a benchmark. We test several GPT models on reports from 2016-2024, assessing their performance on both qualitative ratings and binary questions. Our findings indicate that the latest models can meaningfully assist economists, achieving an average accuracy of 71-75% on ratings and an average exact match rate of 76-81% on binary questions in 2024 across advanced GPT models. However, we find that LLMs tend to assign higher, less-dispersed ratings than human experts and struggle with open-ended questions that require deep contextual judgment. The paper provides quantitative evidence on current LLM accuracy in this domain, explores the drivers of its performance, and discusses key limitations such as optimistic bias.

RECOMMENDED CITATION: Ganum, Paola, and Tohid Atashbar. 2026. "How Effectively Can Current LLMs Analyze Macrofinancial Issues?" IMF Working Paper WP/26/35, International Monetary Fund, Washington, D.C.

Contents

Section I. Introduction
Section II. Related Literature
Section III. Data, GPT Models, and Methodology
Section IV. Empirical Results across GPT Models
Section V. Discussion of Results and LLM issues
Section VI. Conclusions
References

Section I. Introduction

The rapid development of Generative Artificial Intelligence (gen-AI) tools, and in particular the public release of Large Language Models (LLMs) exemplified by ChatGPT, Claude, and Gemini, has marked an inflection point. This technology promises to be productivity-enhancing, particularly for occupations where it can be complementary to human work (IMF 2024). LLMs offer unparalleled abilities to help economic research by conducting literature reviews, analyzing data, writing code, and preparing manuscripts (Korinek, 2023).

In this paper, we empirically evaluate the ability of current Generative Pre-trained Transformer (GPT) models to review the coverage of macrofinancial issues in Article IV staff reports, using human economists’ assessments as a benchmark. Since the Global Financial Crisis, the IMF has taken substantial steps to strengthen macrofinancial surveillance and analysis (IMF 2017).¹ We explore how well different LLMs could review macrofinancial coverage in Article IV staff reports if put to the task. While LLMs are effective at textual tasks, analyzing macrofinancial coverage in staff reports requires technical knowledge, reasoning, and judgement. A human with the appropriate technical expertise can do this, but it may be more challenging for LLMs to achieve.

Our research involves feeding staff reports one at a time, in a standardized PDF format, to an LLM, asking the model to answer a set of questions on each report, and producing a separate Excel file with its answers for each report. We tried different GPT models: starting with GPT-4o, we then moved on to more advanced models such as GPT-4.1 (we also tried GPT-4.1-mini), GPT-o1, and GPT-5 (medium and high effort). Our primary interest is to understand the LLM’s ability to answer questions and ass