
An AI system to help scientists write expert-level empirical software


Eser Aygün1,*, Anastasiya Belyaeva2,*, Gheorghe Comanici1,*, Marc Coram2,*, Hao Cui2,*, Jake Garrison3,*, Renee Johnston2,*, Anton Kast2,*, Cory Y. McLean2,*, Peter Norgaard2,*, Zahra Shamsi2,*, David Smalling1,*, James Thompson2,*, Subhashini Venugopalan2,*, Brian P. Williams2,*, Chujun He2,4,**, Sarah Martinson2,5,**, Martyna Plomecka2,6,**, Lai Wei2, Yuchen Zhou2, Qian-Ze Zhu2,5,**, Matthew Abraham2, Erica Brand2, Anna Bulanova1, Jeffrey A. Cardille2,7, Chris Co2, Scott Ellsworth2, Grace Joseph2, Malcolm Kane2, Ryan Krueger2,5,**, Johan Kartiwa2, Dan Liebling2, Jan-Matthis Lueckmann2, Paul Raccuglia2, Xuefei (Julie) Wang2,8,**, Katherine Chou2, James Manyika2, Yossi Matias2, John C. Platt2, Lizzie Dorfman2, Shibl Mourad1,‡ and Michael P. Brenner2,5,‡

1 Google DeepMind, 2 Google Research, 3 Google Platforms and Devices, 4 Massachusetts Institute of Technology, 5 School of Engineering and Applied Sciences, Harvard University, 6 Google Cloud, 7 Faculty of Agricultural and Environmental Sciences, McGill University, 8 California Institute of Technology

The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments. To address this, we present an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a Large Language Model (LLM) and Tree Search (TS) to systematically improve the quality metric and intelligently navigate the large space of possible solutions. The system achieves expert-level results when it explores and integrates complex research ideas from external sources. The effectiveness of tree search is demonstrated across a wide range of benchmarks. In bioinformatics, it discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, it generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations.
Our method also produced state-of-the-art software for geospatial analysis, neural activity prediction in zebrafish, time series forecasting and numerical solution of integrals. By devising and implementing novel solutions to diverse tasks, the system represents a significant step towards accelerating scientific progress.

Keywords: Tree Search, Generative AI, Scorable Scientific Tasks, Empirical Software

Introduction

Scientists need diverse information to advance their scientific agendas. Some needs are simple questions for which a search engine can supply a perfunctory answer. However, performing computational experiments often demands deeper information. For example, the research of one of the authors involves deforestation analyses, assessing land cover change [1] using global spatially-resolved measurements, past and present. This is carried out using a satellite-based deforestation detector, built with code to answer a scientific question. A deforestation detector is one of many thousands of examples of empirical software in science. We use the term empirical software to mean software that is designed to maximize a definable or measurable quality score, typically a fit to existing observations. If a task can be solved with empirical software, we call this a scorable task.

We have two hypotheses about scorable tasks and empirical software in science. First, scorable tasks are ubiquitous in science. Almost every sub-field of science, applied mathematics, and engineering now relies on software. In the combined experience of the authors, we have found that much of this software is empirical software solving a scorable task. Often such empirical software is at the heart of a scientist's work. Empirical software has recently enabled a number of Nobel Prizes in Chemistry: in 1998 for Density Functional Theory [2,3], in 2013 for molecular dynamics simulation [4] and in 2024 for protein structure prediction [5,6].
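
To make the definition of a scorable task concrete, the following is a minimal sketch, not the system described in this paper. It assumes a toy task in which the quality score is the negative mean squared error against a handful of observations, and it simply picks the highest-scoring candidate. The observations, the candidate models, and the names `quality_score` and `best_candidate` are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical observations: (x, y) pairs that roughly follow y = 2x + 1.
OBSERVATIONS: List[Tuple[float, float]] = [
    (0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8),
]


def quality_score(model: Callable[[float], float]) -> float:
    """Return the negative mean squared error on OBSERVATIONS (higher is better)."""
    squared_errors = [(model(x) - y) ** 2 for x, y in OBSERVATIONS]
    return -sum(squared_errors) / len(squared_errors)


# Hypothetical candidate "programs" that all attempt the same scorable task.
CANDIDATES: Dict[str, Callable[[float], float]] = {
    "constant": lambda x: 4.0,
    "identity": lambda x: x,
    "linear": lambda x: 2.0 * x + 1.0,
}


def best_candidate(
    candidates: Dict[str, Callable[[float], float]],
) -> Tuple[str, Dict[str, float]]:
    """Score every candidate and return the best name plus all scores."""
    scores = {name: quality_score(fn) for name, fn in candidates.items()}
    best_name = max(scores, key=lambda name: scores[name])
    return best_name, scores


if __name__ == "__main__":
    name, scores = best_candidate(CANDIDATES)
    for candidate, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{candidate:>8}: {score:.3f}")
    print("best:", name)
```

Because every candidate is reduced to a single number, candidates can be compared automatically. That property is what makes a task scorable in the sense used here, whatever procedure is used to generate the candidates.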
Empirical software underlies our ability to create models of complex systems, ranging from parameterizations of a vertical column of the earth's atmosphere for weather modeling [7], to the parameterization of stress response in a turbulent fluid flow [8], to the prediction of social systems [9–11].

Second, empirical software for science is slow and difficult to create. Domain-specific empirical software requires tedious work, often over many years. When empirical software is used to test complex hypotheses, it becomes ever more difficult to write purely from first principles. There usually is no systematic search for alternative approaches. Design choices are often governed by intuition or expediency, rather than exhaustive experimentation. Creating the software is so time-consuming that it severely limits the possibilities that can be productively explored.

This paper presents an AI-based system that systematically and automatically creates empirical software to solve scorable tasks. Our method is based on an LLM that rewrites software to attempt to improve its quality score. The system create