
How Effectively Can Current LLMs Analyze Macrofinancial Issues?

2026-02-27 · International Monetary Fund

How Effectively Can Current LLMs Analyze Macrofinancial Issues?

IMF Working Paper WP/26/35, February 2026
Strategy, Policy, and Review Department
Prepared by Paola Ganum and Tohid Atashbar
Authorized for distribution by Eugenio Cerutti

IMF Working Papers describe research in progress by the author(s) and are published to elicit comments and to encourage debate. The views expressed in IMF Working Papers are those of the author(s) and do not necessarily represent the views of the IMF, its Executive Board, or IMF management.

ABSTRACT: This paper empirically evaluates the ability of current Large Language Models (LLMs) to analyze macrofinancial coverage in IMF Article IV staff reports, using human economists' assessments as a benchmark. We test several GPT models on reports from 2016-2024, assessing their performance on both qualitative ratings and binary questions. Our findings indicate that the latest models can meaningfully assist economists, achieving an average accuracy of 71-75% on ratings and an average exact match rate of 76-81% on binary questions.

RECOMMENDED CITATION: Ganum, Paola, and Tohid Atashbar. 2026. "How Effectively Can Current LLMs Analyze Macrofinancial Issues?" IMF Working Paper WP/26/35, International Monetary Fund, Washington, DC.
Contents

Section I. Introduction
Section II. Related Literature
Section III. Data, GPT Models, and Methodology
Annex I. Additional Tables and Figures
References

Section I. Introduction

The rapid development of Generative Artificial Intelligence (gen-AI) tools, and in particular the public release of Large Language Models (LLMs) exemplified by ChatGPT, Claude, and Gemini, have marked an inflection point.
This technology promises to be productivity-enhancing, particularly for occupations where it can be complementary to human work (IMF 2024). These models offer unparalleled abilities to help economic […]

In this paper, we empirically evaluate the ability of current Generative Pre-trained Transformer (GPT) models to review the coverage of macrofinancial issues in Article IV staff reports, using human economists' assessments as a benchmark. Since the Global Financial Crisis, the IMF has taken substantial steps to strengthen macrofinancial surveillance and analysis (IMF 2017).¹ In this paper, we explore how well different GPT models can perform this review.

While LLMs are effective at textual tasks, analyzing macrofinancial coverage in staff reports requires technical knowledge, reasoning, and judgement, something the human mind with the appropriate technical […]

Our research involves feeding staff reports, one at a time and in a standardized PDF format, to an LLM, asking the model to answer a set of questions about each report, and producing an Excel file for each report individually with its answers. We tried different GPT models, starting with GPT-4o; we then moved on to more advanced models such as GPT-4.1 (we also tried GPT-4.1-mini), GPT-o1, and GPT-5 (medium and high effort). Our primary interest is to understand the LLM's ability to answer questions and assign ratings, comparing it to the human benchmark.

In pursuing this evaluation, this paper generates a novel dataset of LLM answers to explore how well the LLM can perform the review of macrofinancial coverage in staff reports. We will call "accuracy" (see Section III for its definition) the LLM's performance relative to the human benchmark.
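The per-report workflow described above can be sketched in a few lines. This is a minimal illustration only: `ask_model`, `QUESTIONS`, and the file names are hypothetical stand-ins, not the authors' actual tooling, and CSV output is used in place of Excel to keep the sketch dependency-free.

```python
# Sketch of the one-report-at-a-time evaluation loop: ask every question
# about a report, collect the answers, and write one output file per report.
# All names here are illustrative placeholders, not the paper's pipeline.
import csv

QUESTIONS = [
    "Rate the depth of macrofinancial coverage (1-5).",
    "Does the report discuss bank balance-sheet risks? (yes/no)",
]

def ask_model(report_text: str, question: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completion request with the
    report attached). Returns a canned answer so the sketch is runnable."""
    return "yes" if "(yes/no)" in question else "3"

def review_report(report_text: str, report_id: str, out_path: str) -> list:
    """Answer every question for one report and write one file per report,
    mirroring the per-report output design described in the text."""
    rows = [{"report": report_id, "question": q,
             "answer": ask_model(report_text, q)} for q in QUESTIONS]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["report", "question", "answer"])
        writer.writeheader()
        writer.writerows(rows)
    return rows

rows = review_report("…staff report text…", "ART-IV-2024-XYZ", "answers.csv")
print(rows[1]["answer"])  # "yes"
```

Running the real pipeline would replace `ask_model` with an actual API call and loop `review_report` over every staff report in the 2016-2024 sample.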
To our knowledge, this represents a novel systematic evaluation.

A key observation from this paper is that certain GPT models achieve useful, but still limited, levels of accuracy in analyzing IMF staff reports, particularly on structured, fact-based questions. We found that GPT-o1,² GPT-4.1, and GPT-5 offer an improvement over GPT-4o, with better justifications and more agreement with economists' assessments.³ Our findings indicate that the LLM can meaningfully assist economists in […]

² This model is better at reasoning problems. See "Introducing OpenAI o1" (OpenAI).
³ GPT-4.1-mini was also tested on the 2024 staff reports, but its results resembled those of GPT-4o, comparing unfavorably to those of the more advanced models.
⁴ As defined in Section III, accuracy measures the proportion of correct predictions.
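The "accuracy" notion used above, the proportion of LLM answers that agree with the human economist's answers, can be made concrete with a small worked example. This is a hedged sketch of the general idea only; the paper's precise Section III definition may differ in detail, and the answer lists below are hypothetical.

```python
# Illustrative computation of accuracy as the share of LLM answers that
# exactly match the human benchmark ("proportion of correct predictions").

def accuracy(llm_answers, human_answers):
    """Fraction of items where the LLM's answer equals the human's."""
    assert len(llm_answers) == len(human_answers)
    matches = sum(a == b for a, b in zip(llm_answers, human_answers))
    return matches / len(llm_answers)

# Hypothetical example: five binary questions on one staff report.
llm   = ["yes", "no", "yes", "yes", "no"]
human = ["yes", "no", "no",  "yes", "no"]
print(accuracy(llm, human))  # 0.8
```

The same calculation applies to the qualitative ratings: an exact match on the rating scale counts as correct, and the reported 71-75% and 76-81% figures are averages of this kind of agreement rate across reports.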