Diana Goldemberg1, Luke Jordan2,∗, Thomas Kenyon1

Abstract

This paper applies novel techniques to long-standing questions of aid effectiveness. It constructs a new dataset using machine learning methods to encode aspects of development project documents that would be infeasible with manual methods. It then uses that dataset to show that the strongest predictor of these projects’ contribution to development outcomes is not the self-evaluation ratings assigned by donors, but their degree of adaptation to country context, and that the largest differences between ratings and actual impact occur in large projects in institutionally weak settings. It also finds suggestive evidence that the content of ex post reviews of project effectiveness may predict sector outcomes, even if ratings do not.

Keywords: aid effectiveness, machine learning, World Bank projects

JEL Codes: O12, O15, O19

1. Introduction

Many empirical studies have explored whether foreign aid effectively improves development outcomes in recipient countries. This research typically takes one of two approaches. The first looks at the macro-level effect of aid at the country level, assessing its impact on economic growth or sectoral outcomes. The second focuses on the micro-level effect of individual development projects or interventions, often using donors’ self-evaluation ratings or randomized control trials to assess their success. In this study, we bridge these approaches by examining the relationship between donor-funded projects and measurable development outcomes at the sectoral level. We begin by asking whether project ratings provide a link between project-level and sector-level outcomes and find that they only provide limited insight. We then turn to machine-learning methods, constructing a new dataset utilizing large language models and the texts of over a thousand World Bank projects. We find that this text analysis produces measurable project features that do predict sector outcomes.

We first replicate previous findings of positive effects of aid on sector outcomes. We then use ratings from projects undertaken in 183 developing countries by eight donors since the 1990s, concentrating on a few service delivery sectors with readily available data on beneficiary-level outcomes, introducing aggregates of the ratings as independent variables to the sector specifications. For World Bank projects, for which more granular data is available, we create what are called “text embeddings” of project documents using recent advances in machine learning models, turning texts into numerical representations of their similarities and differences. These replicate expert human assessments of project characteristics but with greater accuracy and far greater efficiency. The dataset we construct of the embeddings of project documents is likely to have additional uses and is publicly available3. Its construction is explained in some detail in the Methods section below. With this dataset, we are able to quantify the degree to which a project’s core description differs from others in its sector and country, deriving a new measure of project “contextualization”.

We then use non-linear methods to predict projects’ sector outcomes, and probe what features of the projects the model paid most attention to. We find that projects with what appear to be high degrees of tailoring to country context and concentration of funds in fewer sectors are associated with stronger outcomes. To our knowledge, this is the first attempt to quantify the importance of project contextualization to development effectiveness. Our findings have actionable implications for the system through which the World Bank and other development institutions evaluate project performance, as well as implications for the design and staffing of these projects.

2. Literature and Theory

2.1. Development Effectiveness

Aid effectiveness has been evaluated at several levels (see Table 1). At the most aggregate level, cross-country studies have focused on the volume of aid as input and economic growth as the outcome, with institutional quality and political environment as other explanatory variables. A second approach has examined the relationship between aid and outcomes in sectors such as education, health, water and sanitation. A third has concentrated on the self-evaluated outcomes, or ratings, of donor-financed projects, typically those of multilateral development banks. At the lowest level, a large literature has used randomized control trials to evaluate the impact of interventions and project components. The focus of this paper is at the sector and project level and the relationship between them.

[insert Table 1 here]

The project-level approach has burgeoned recently, examining the relationship between project characteristics and country-level features as independent variables, and donors’ ratings of project outcomes, which are considered a noisy but valid measure of project performance (Denizer, Kaufmann, and Kraay 2013). Explanatory factors for