Policy Research Working Paper 11298

Does LLM Assistance Improve Healthcare Delivery? An Evaluation Using On-Site Physicians and Laboratory Tests

Jason Abaluck, Robert Pless, Nirmal Ravi, Anja Sautmann, Aaron Schwartz

Abstract

This study tests the effects of large language model (LLM) decision support on patient care at two outpatient clinics in Nigeria. Health workers were given the option to make revisions to their initial care plan based on LLM feedback. The unassisted and assisted plans are evaluated using (1) comparisons with independent care plans created by on-site physicians, (2) laboratory tests for malaria, anemia, and [...] In a selected sample, retrospective review by academic physicians also suggested improvements in care related to long-term risk management. However, the three metrics show mixed effects of LLM assistance, with on average no significant improvement in diagnostic alignment with physicians, detection rates for the tested conditions, or physician subjective assessments. Health workers follow LLM [...]

This paper is a product of the Development Research Group, Development Economics. It is part of a larger effort by the World Bank to provide open access to its research and make a contribution to development policy discussions around the world. Policy Research Working Papers are also posted on the Web at http://www.worldbank.org/prwp. The authors [...]

The Policy Research Working Paper Series disseminates the findings of work in progress to encourage the exchange of ideas about development issues. An objective of the series is to get the findings out quickly, even if the presentations are less than fully polished. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those [...]

JEL: I10, O12, O15
Keywords: LLM, primary care, health care quality

1 Introduction

Large language models (LLMs) have the potential to improve health provider decision-making without requiring substantial infrastructure to train and deploy (Pressman et al., 2024). LLMs have shown diagnostic performance comparable to general physicians on written tests and in simulated patient encounters such as vignettes or model patients (Takita et al., 2025; Huang et al., 2024; Tu et al., 2025). A smaller number of studies use real patient information, and very few test LLM decision support in a realistic clinical environment [...] et al., 2024). In both, the evaluation of quality of care with and without LLM is done via retrospective chart [...]

We build on the existing literature by evaluating the (subjective and objective) alignment of LLM-assisted care with care provided by higher-level providers, as well as with medical test results. We evaluate a prototype intervention in two outpatient clinics in Kano, Nigeria, in which an LLM gives health workers an instant “second opinion” on their care plans. We purposely did not modify the LLM beyond prompt [...]

Our design allows us to compare unassisted and LLM-assisted care plans through three complementary lenses: (1) concordance with independent care plans created by on-site physicians, (2) agreement of testing and treatment decisions with laboratory test results for the three most commonly tested conditions (malaria, [...]

In any of our metrics of care plan quality, health workers’ care plans prior to LLM assistance show substantial deficits. LLM assistance led health workers to meaningfully revise diagnoses (41% of notes), test ordering (33% of notes), and prescribing decisions (54% of notes). The health workers themselves overwhelmingly reported that they found the LLM feedback helpful.
However, the on-site physicians did not evaluate the assisted care plans more positively, and the assisted plans did not objectively resemble physician [...]

Three academic physicians (MDs) who teach in health worker degree programs retrospectively reviewed the case records of a selected subsample with high physician-assessed patient harms in the unassisted care plan. For these cases, the academic reviewers rated the LLM-assisted notes relatively more favorably than [...]

When analyzing the LLM feedback itself, we find that the LLM makes 3.75 recommendations per patient, and these recommendations were about equally likely to increase or decrease alignment with physician testing decisions and prescriptions, although they better aligned with physician behavioral advice. Health workers [...]

In summary, our results show that LLM assistance is welcomed and accepted by frontline health workers, who make significant changes to their care plans in response. We also find some indication of improvements in care from retrospective chart reviews (supplemented by access to the physician’s patient record and care plan) by academic physicians, possibly driven by better care for chronic degenerative