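An AI System to Help Scientists Write Expert-Level Empirical Software

2025-09-09 Google DeepMind & Google Research, Zhang Bing (张兵)

Key Points

This research report presents an AI system, built on a large language model (LLM) and tree search (TS), that automatically creates expert-level scientific software for scorable scientific tasks. The LLM rewrites the software to try to raise its quality metric, and TS decides which candidates are worth exploring further.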
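The rewrite-and-search loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the report does not specify the selection rule, so the sketch uses a plain best-first expansion, and `tree_search`, `toy_rewrite`, and `toy_score` are hypothetical names. The "programs" are just integer strings, and `toy_rewrite` stands in for the LLM call.

```python
import heapq
from typing import Callable, List, Tuple


def tree_search(
    root: str,
    rewrite: Callable[[str], List[str]],
    score: Callable[[str], float],
    budget: int = 20,
) -> Tuple[str, float]:
    """Best-first search over candidate programs; a higher score is better."""
    best, best_score = root, score(root)
    seen = {root}
    # heapq is a min-heap, so scores are negated; the counter breaks ties.
    frontier = [(-best_score, 0, root)]
    counter = 1
    for _ in range(budget):
        if not frontier:
            break
        # Expand the most promising candidate found so far.
        _, _, node = heapq.heappop(frontier)
        for child in rewrite(node):
            if child in seen:
                continue  # do not score the same candidate twice
            seen.add(child)
            child_score = score(child)
            if child_score > best_score:
                best, best_score = child, child_score
            heapq.heappush(frontier, (-child_score, counter, child))
            counter += 1
    return best, best_score


def toy_rewrite(candidate: str) -> List[str]:
    # Hypothetical stand-in for the LLM: propose two neighbouring variants.
    k = int(candidate)
    return [str(k - 1), str(k + 1)]


def toy_score(candidate: str) -> float:
    # Hypothetical quality metric: closer to the hidden target 7 is better.
    return -abs(int(candidate) - 7)


best, best_score = tree_search("0", toy_rewrite, toy_score, budget=20)
print(best, best_score)
```

In a real setting, `rewrite` would prompt an LLM with the current code and `score` would run that code against the task's benchmark; the search skeleton stays the same.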

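Key Data and Research Conclusions

Scorable tasks are ubiquitous in science: The report argues that scorable tasks are common across science. Almost every field of science, applied mathematics, and engineering relies on software, and much of that software is empirical software solving a scorable task.
The challenge of creating empirical software: Creating empirical software is usually slow and difficult. It requires tedious work, and there is rarely a systematic search over alternative approaches, so design choices are often driven by intuition or expediency.
Effectiveness of the AI system: The system achieved expert-level results in bioinformatics, epidemiology, geospatial analysis, neural activity prediction, time series forecasting, and numerical integration.
- Bioinformatics: It discovered 40 new methods for single-cell data analysis; one of them, BBKNN(TS), outperformed the state-of-the-art human-developed methods on a public leaderboard.
- Epidemiology: It generated 14 models, 10 of which outperformed the U.S. Centers for Disease Control and Prevention (CDC) ensemble and all other individual models at forecasting COVID-19 hospitalizations.
- Geospatial analysis: On the DLRSD benchmark, three generated solutions significantly outperformed results from recently published academic papers on semantic segmentation.
- Neuroscience: On the ZAPBench benchmark, the generated models outperformed all other baselines at predicting whole-brain neural activity in zebrafish.
- Time series forecasting: On the GIFT-Eval benchmark, the generated models were competitive, and the system was able to create a general-purpose forecasting library.
- Numerical analysis: The generated code correctly evaluated 19 integrals that standard algorithms could not solve, 17 of them with an error below 3%.
How the system works: The system uses an LLM to rewrite software code and TS to explore the solution space. The LLM can also draw research ideas from highly cited papers, specialized textbooks, and search engines, and inject them into the code generation process.
Comparison with other approaches: Compared with traditional genetic programming, generative programming, automated machine learning (AutoML), and agents for scientific problems, the system is more general, can solve a broader range of scientific problems, and integrates complex research ideas more effectively.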
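To make the notion of a scorable task concrete, the sketch below scores two candidate models by how well they fit a set of observations. It is an assumption-laden toy: the data are made up, and `quality_score` and the two candidates are hypothetical names chosen for illustration, not anything taken from the report.

```python
from statistics import mean
from typing import Callable, Sequence


def quality_score(
    model: Callable[[float], float],
    inputs: Sequence[float],
    observed: Sequence[float],
) -> float:
    """Negative mean absolute error: higher is better, 0 is a perfect fit."""
    errors = [abs(model(x) - y) for x, y in zip(inputs, observed)]
    return -mean(errors)


# Made-up observations that follow y = 2x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Two hypothetical pieces of "empirical software" competing on the same task.
candidates = {
    "constant": lambda x: 5.0,
    "linear": lambda x: 2.0 * x,
}

scores = {name: quality_score(f, xs, ys) for name, f in candidates.items()}
winner = max(scores, key=scores.get)
print(scores, winner)
```

Any task that admits a score of this kind, whether a leaderboard metric, a forecast error, or an integration error, can in principle be handed to a search procedure that keeps the higher-scoring candidate.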

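Research Significance

The AI system represents an important step for scientific progress: it can create and improve scientific software quickly and systematically, which accelerates the process of scientific discovery.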

Eser Aygün1,*, Anastasiya Belyaeva2,*, Gheorghe Comanici1,*, Marc Coram2,*, Hao Cui2,*, Jake Garrison3,*, Renee Johnston2,*, Anton Kast2,*, Cory Y. McLean2,*, Peter Norgaard2,*, Zahra Shamsi2,*, David Smalling1,*, James Thompson2,*, Subhashini Venugopalan2,*, Brian P. Williams2,*, Chujun He2,4,**, Sarah Martinson2,5,**, Martyna Plomecka2,6,**, Lai Wei2, Yuchen Zhou2, Qian-Ze Zhu2,5,**, Matthew Abraham2, Erica Brand2, Anna Bulanova1, Jeffrey A. Cardille2,7, Chris Co2, Scott Ellsworth2, Grace Joseph2, Malcolm Kane2, Ryan Krueger2,5,**, Johan Kartiwa2, Dan Liebling2, Jan-Matthis Lueckmann2, Paul Raccuglia2, Xuefei (Julie) Wang2,8,**, Katherine Chou2, James Manyika2, Yossi Matias2, John C. Platt2, Lizzie Dorfman2, Shibl Mourad1,‡ and Michael P. Brenner2,5,‡

1 Google DeepMind, 2 Google Research, 3 Google Platforms and Devices, 4 Massachusetts Institute of Technology, 5 School of Engineering and Applied Sciences, Harvard University, 6 Google Cloud, 7 Faculty of Agricultural and Environmental Sciences, McGill University, 8 California Institute of Technology

The cycle of scientific discovery is frequently bottlenecked by the slow, manual creation of software to support computational experiments. To address this, we present an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. The system uses a Large Language Model (LLM) and Tree Search (TS) to systematically improve the quality metric and intelligently navigate the large space of possible solutions. The system achieves expert-level results when it explores and integrates complex research ideas from external sources. The effectiveness of tree search is demonstrated across a wide range of benchmarks. In bioinformatics, it discovered 40 novel methods for single-cell data analysis that outperformed the top human-developed methods on a public leaderboard. In epidemiology, it generated 14 models that outperformed the CDC ensemble and all other individual models for forecasting COVID-19 hospitalizations. Our method also produced state-of-the-art software for geospatial analysis, neural activity prediction in zebrafish, time series forecasting and numerical solution of integrals. By devising and implementing novel solutions to diverse tasks, the system represents a significant step towards accelerating scientific progress.

Keywords: Tree Search, Generative AI, Scorable Scientific Tasks, Empirical Software

Introduction

Scientists need diverse information to advance their scientific agendas. Some are simple questions for which perfunctory answers can be fulfilled by a search engine. However, performing computational experiments often demands deeper information. For example, one of the authors’ research involves deforestation analyses, assessing land cover change [1] using global spatially-resolved measurements, past and present. This is carried out using a satellite-based deforestation detector, built with code to answer a scientific question. A deforestation detector is one of many thousands of examples of empirical software in science. We use the term empirical software to mean software that is designed to maximize a definable or measurable quality score, typically a fit to existing observations. If a task can be solved with empirical software, we call this a scorable task.

We have two hypotheses about the scorable tasks and empirical software in science. First, scorable tasks are ubiquitous in science. Almost every sub-field of science, applied mathematics, and engineering now relies on software. In the combined experience of the authors, we have found that much of this software is empirical software solving a scorable task. Often such empirical software is at the heart of a scientist’s work. Empirical software has recently enabled a number of Nobel Prizes in Chemistry: in 1998 for Density Functional Theory [2,3], in 2013 for molecular dynamics simulation [4] and in 2024 for protein structure prediction [5,6]. Empirical software underlies our ability to create models of complex systems, ranging from parameterizations of a vertical column of the earth’s atmosphere for weather modeling [7], to the parameterization of stress response in a turbulent fluid flow [8], to the prediction of social systems [9–11].

Second, empirical software for science is slow and difficult to create. Domain-specific empirical software requires tedious work, often over many years. When empirical software is used to test complex hypotheses, it becomes ever more difficult to write purely from first principles. There usually is no systematic search for alternative approaches. Design choices are often governed by intuition or expediency, rather than exhaustive experimentation. Creating the software is so time-consuming that it severely limits the possibilities that can be productively explored.

This paper presents an AI-based system that systematically and automatically creates empirical software to solve scorable tasks. Our method is based on an LLM that rewrites software to attempt to improve its quality score. The system create
