
cache: A General Mechanism for Storing and Replaying Any Stata Command

Information Technology 2025-07-13 World Bank WEN

GLOBAL POVERTY MONITORING TECHNICAL NOTE 43

Abstract

In this paper we describe the Stata program cache, which allows for the full output of any Stata command to be cached to disk, enabling easy recovery of command output in the future without the need for re-computation. The cache program interacts with any native Stata or user-written command, allowing for caching of any elements returned by Stata commands, including matrices, scalars, graphs, data and frames, as well as command output itself. This command is useful for improving programming practices and efficiency, particularly in cases where the underlying Stata commands are computationally intensive and slow to run.

All authors are with the World Bank. Clarke is also affiliated with the University of Chile and University of Exeter. Corresponding author: acastanedaa@worldbank.org. The authors would like to thank Christoph Lakner, Daniel Mahler, and Nishant Yonzan for comments and useful suggestions. The authors gratefully acknowledge financial support from the UK government through the Data and Evidence for Tackling Extreme Poverty (DEEP) Research Programme.

The Global Poverty Monitoring Technical Note Series publishes short papers that document methodological aspects of the World Bank's global poverty estimates. The papers carry the names of the authors and should be cited accordingly. The findings, interpretations, and conclusions expressed in this paper are entirely those of the authors. They do not necessarily represent the views of the International Bank for Reconstruction and Development/World Bank and its affiliated organizations, or those of the Executive Directors of the World Bank or the governments they represent. Global Poverty Monitoring Technical Notes are available at https://pip.worldbank.org/publication.

1 Introduction

Despite advances in Stata's analytical capacity, increasingly complex datasets and statistical algorithms often require significant computation time.
Thus, tools that enhance computational efficiency can significantly reduce processing time and resource consumption. In this article, we introduce one such general programming tool which can greatly improve the efficiency of programming in Stata: cache.

cache is a prefix command which allows the output of any Stata command to be cached, so that it can be retrieved later without re-executing the underlying statistical computations. cache can be used as a precursor to any native or user-written Stata program, and automatically saves all output, including elements in return lists, graphs, console output, and data and frames if such elements are altered by the command of interest. If cache finds that a requested command has been previously run, rather than re-running the original command, it reloads all required elements into return lists, graphical output, data and frames, and re-echoes the original command output.

The cache utilities work by examining both the exact command typed by the user and the data signature of all data in memory. Thus, if the command, the data used as an input to the command, or both have not been seen in a previous execution, cache runs and caches the command. Otherwise, if the command-data combination has been previously cached, results are reloaded nearly costlessly in terms of computation.

cache is programmed to work as efficiently as possible, saving only the minimum elements required on the user's system such that command output can be loaded in the future without re-running the command. At times, specifically if a command alters data and requires saving of the altered data to the user's system, the footprint of cache may be somewhat large.
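
The lookup on the command text and data signature described above can be pictured with a short sketch. This is a conceptual illustration in Python, not the authors' Stata implementation: the helper names (`cache_key`, `cached_run`, `slow_mean`) are hypothetical, and a SHA-256 hash of the pickled data is used here only as a stand-in for Stata's data signature.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def cache_key(command, data_rows):
    """Combine the exact command text with a signature of the data (assumed scheme)."""
    data_signature = hashlib.sha256(pickle.dumps(data_rows)).hexdigest()
    return hashlib.sha256(f"{command}|{data_signature}".encode()).hexdigest()


def cached_run(command, data_rows, compute, cache_dir):
    """Return (result, was_cached); call `compute` only when no cached entry exists."""
    path = Path(cache_dir) / f"{cache_key(command, data_rows)}.pkl"
    if path.exists():
        # Same command and same data: reload the stored result from disk.
        return pickle.loads(path.read_bytes()), True
    # New command, new data, or both: run the computation and store its result.
    result = compute(data_rows)
    path.write_bytes(pickle.dumps(result))
    return result, False


calls = []  # records how many times the "slow" computation actually runs


def slow_mean(rows):
    calls.append(1)
    return sum(rows) / len(rows)


cache_dir = tempfile.mkdtemp()
first = cached_run("summarize x", [1, 2, 3], slow_mean, cache_dir)   # computed
second = cached_run("summarize x", [1, 2, 3], slow_mean, cache_dir)  # reloaded
third = cached_run("summarize x", [1, 2, 4], slow_mean, cache_dir)   # data changed
```

In this sketch the second call is served from disk because both the command string and the data are unchanged, while the third call recomputes because the data differ. Storing only the result, rather than the data, is what keeps the on-disk footprint small in the simple case.
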
To keep its disk footprint manageable, cache has a suite of options, such as requesting that data (and, if relevant, frames) not be saved, as well as a sub-command to simply remove all cached information from local drives.

While not previously available as a general-usage command in Stata, caching command output is widely available in other languages and computational architectures. For example, similar functionalities exist as DiskCache (among others) in Python (Jenks 2023), R.cache and memoise in R (Bengtsson 2023; Wickham et al. 2021) and CachedCalls in Julia, and operating systems generally cache the output of processes for reloading in the future even without user input (see e.g. Bottomley 2004). The development of cache brings this functionality to Stata, integrating seamlessly with its command structure while preserving Stata's native environment.

While the primary advantage of these procedures lies in reducing computational time and resource consumption for time- or resource-intensive Stata commands, cache also benefits users operating in net-aware Stata. When used with commands requiring an internet connection, such as netuse, cache allows subsequent executions to proceed even without connectivity, provided the command was successfully run once with an active connection. This highlights an additional advantage of cache beyond computational efficiency. The cachec