
How Effectively Can Current LLMs Analyze Macrofinancial Issues?

2026-02-27 · International Monetary Fund

How Effectively Can Current LLMs Analyze Macrofinancial Issues?

IMF Working Paper WP/26/35, February 2026
Strategy, Policy, and Review Department
Prepared by Paola Ganum and Tohid Atashbar
Authorized for distribution by Eugenio Cerutti

IMF Working Papers describe research in progress by the author(s) and are published to elicit comments and to encourage debate. The views expressed in IMF Working Papers are those of the author(s) and do not necessarily represent the views of the IMF, its Executive Board, or IMF management.

ABSTRACT: This paper empirically evaluates the ability of current Large Language Models (LLMs) to analyze macrofinancial coverage in IMF Article IV staff reports, using human economists' assessments as a benchmark. We test several GPT models on reports from 2016-2024, assessing their performance on both qualitative ratings and binary questions. Our findings indicate that the latest models can meaningfully assist economists, achieving an average accuracy of 71-75% on ratings and an average exact match rate of 76-81% on binary questions.

RECOMMENDED CITATION: Ganum, Paola, and Tohid Atashbar. 2026. "How Effectively Can Current LLMs Analyze Macrofinancial Issues?" IMF Working Paper WP/26/35, International Monetary Fund, Washington, DC.
Contents

Section I. Introduction
Section II. Related Literature
Section III. Data, GPT Models, and Methodology
Annex I. Additional Tables and Figures
References

Section I. Introduction

The rapid development of Generative Artificial Intelligence (gen-AI) tools, and in particular the public release of Large Language Models (LLMs) exemplified by ChatGPT, Claude, and Gemini, have marked an inflection point.
This technology promises to be productivity-enhancing, particularly for occupations where it can be complementary to human work (IMF 2024). These models offer unparalleled abilities to help economic […]

In this paper, we empirically evaluate the ability of current Generative Pre-trained Transformer (GPT) models to review the coverage of macrofinancial issues in Article IV staff reports, using human economists' assessments as a benchmark. Since the Global Financial Crisis, the IMF has taken substantial steps to strengthen macrofinancial surveillance and analysis (IMF 2017).¹ In this paper, we explore how well different GPT models can perform this review.

While LLMs are effective at textual tasks, analyzing macrofinancial coverage in staff reports requires technical knowledge, reasoning, and judgement, something the human mind with the appropriate technical […]

Our research involves feeding staff reports, one at a time and in a standardized PDF format, to an LLM, asking the model to answer a set of questions about each report, and producing an Excel file for each report individually with its answers. We tried different GPT models, starting with GPT-4o; we then moved on to more advanced models such as GPT-4.1 (we also tried GPT-4.1-mini), GPT-o1, and GPT-5 (medium and high effort). Our primary interest is to understand the LLM's ability to answer questions and assign ratings, comparing it to the human benchmark.

In pursuing this evaluation, this paper generates a novel dataset of LLM answers to explore how well the LLM can perform the review of macrofinancial coverage in staff reports. We will call "accuracy" (see Section III for its definition) the LLM's performance relative to the human benchmark.
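The per-report workflow described above can be sketched in a few lines. This is a minimal illustration only: `ask_model`, `QUESTIONS`, and the file names are hypothetical stand-ins, not the authors' actual tooling, and CSV output is used in place of Excel to keep the sketch dependency-free.

```python
# Sketch of the one-report-at-a-time evaluation loop: ask every question
# about a report, collect the answers, and write one output file per report.
# All names here are illustrative placeholders, not the paper's pipeline.
import csv

QUESTIONS = [
    "Rate the depth of macrofinancial coverage (1-5).",
    "Does the report discuss bank balance-sheet risks? (yes/no)",
]

def ask_model(report_text: str, question: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completion request with the
    report attached). Returns a canned answer so the sketch is runnable."""
    return "yes" if "(yes/no)" in question else "3"

def review_report(report_text: str, report_id: str, out_path: str) -> list:
    """Answer every question for one report and write one file per report,
    mirroring the per-report output design described in the text."""
    rows = [{"report": report_id, "question": q,
             "answer": ask_model(report_text, q)} for q in QUESTIONS]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["report", "question", "answer"])
        writer.writeheader()
        writer.writerows(rows)
    return rows

rows = review_report("…staff report text…", "ART-IV-2024-XYZ", "answers.csv")
print(rows[1]["answer"])  # "yes"
```

Running the real pipeline would replace `ask_model` with an actual API call and loop `review_report` over every staff report in the 2016-2024 sample.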
To our knowledge, this represents a novel systematic evaluation.

A key observation from this paper is that certain GPT models achieve useful, but still limited, levels of accuracy in analyzing IMF staff reports, particularly on structured, fact-based questions. We found that GPT-o1,² GPT-4.1, and GPT-5 offer an improvement over GPT-4o, with better justifications and more agreement with economists' assessments.³ Our findings indicate that the LLM can meaningfully assist economists in […]

² This model is better at reasoning problems. See "Introducing OpenAI o1" (OpenAI).
³ GPT-4.1-mini was also tested on the 2024 staff reports, but its results resembled those of GPT-4o, comparing unfavorably to those of the more advanced models.
⁴ As defined in Section III, accuracy measures the proportion of correct predictions.
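The "accuracy" notion used above, the proportion of LLM answers that agree with the human economist's answers, can be made concrete with a small worked example. This is a hedged sketch of the general idea only; the paper's precise Section III definition may differ in detail, and the answer lists below are hypothetical.

```python
# Illustrative computation of accuracy as the share of LLM answers that
# exactly match the human benchmark ("proportion of correct predictions").

def accuracy(llm_answers, human_answers):
    """Fraction of items where the LLM's answer equals the human's."""
    assert len(llm_answers) == len(human_answers)
    matches = sum(a == b for a, b in zip(llm_answers, human_answers))
    return matches / len(llm_answers)

# Hypothetical example: five binary questions on one staff report.
llm   = ["yes", "no", "yes", "yes", "no"]
human = ["yes", "no", "no",  "yes", "no"]
print(accuracy(llm, human))  # 0.8
```

The same calculation applies to the qualitative ratings: an exact match on the rating scale counts as correct, and the reported 71-75% and 76-81% figures are averages of this kind of agreement rate across reports.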