
QuarkMed Medical Foundation Model Technical Report

2025-08-19 Alibaba Zhang Yannan Tim

Ao Li1, Bin Yan1, Bingfeng Cai1, Chenxi Li1, Cunzhong Zhao1, Fugen Yao1, Gaoqiang Liu1, Guanjun Jiang1, Jian Xu1, Liang Dong1, Liansheng Sun1, Rongshen Zhang1, Xiaolei Gui1, Xin Liu1, Xin Shang1, Yao Wu1, Yu Cao1, Zhenxin Ma1 and Zhuang Jia1

1Quark Medical Team, Alibaba Group

Recent advancements in large language models have significantly accelerated their adoption in healthcare applications, including AI-powered medical consultations, diagnostic report assistance, and medical search tools. However, medical tasks often demand highly specialized knowledge, professional accuracy, and customization capabilities, necessitating a robust and reliable foundation model. QuarkMed addresses these needs by leveraging curated medical data processing, medical-content Retrieval-Augmented Generation (RAG), and a large-scale, verifiable reinforcement learning pipeline to develop a high-performance medical foundation model. The model achieved 70% accuracy on the Chinese Medical Licensing Examination, demonstrating strong generalization across diverse medical benchmarks. QuarkMed offers a powerful yet versatile personal medical AI solution, already serving millions of users at https://ai.quark.cn.

1. Introduction

The advent of large language models (LLMs) has marked a pivotal moment in artificial intelligence, demonstrating remarkable capabilities in understanding and generating human-like text across a multitude of domains. This progress has catalyzed significant interest in their application to specialized fields, particularly medicine, where they hold the potential to revolutionize medical information retrieval, enhance early diagnostic accuracy, and support personalized healthcare requirements.

However, the medical domain presents unique and formidable challenges [47]. Unlike general-domain text, medical language is characterized by a highly specialized vocabulary, complex clinical concepts, and a nuanced syntax that is often ambiguous and context-dependent.
As a result, general-purpose LLMs, which are typically fine-tuned on broad, non-medical corpora, often lack the deep, specialized knowledge required for high-stakes medical applications [1]. This knowledge gap can lead to unsatisfactory, and at times unsafe, performance when these models are directly applied to medical tasks.

Recognizing these limitations, the research community has shifted towards developing domain-specific foundation models for medicine. This endeavor began with the adaptation of Transformer-based architectures like BERT (Bidirectional Encoder Representations from Transformers). Early pioneering work led to the creation of models such as BioBERT [24], which was pre-trained on large-scale biomedical literature, and ClinicalBERT [18], which was trained on unstructured clinical notes from electronic health records (EHRs). These models demonstrated that domain-specific pre-training significantly improves performance on various biomedical text mining tasks. Following this trend, models like BEHRT were developed to specifically model structured EHR data for predicting clinical events [27].

The success of these earlier models paved the way for the development of generative models tailored for medicine. BioGPT, for instance, was a generative pre-trained transformer that excelled at creating fluent biomedical text and improving performance on downstream tasks [29]. As model scaling became a key driver of performance, the field saw the emergence of significantly larger and more powerful medical LLMs. Models like GatorTron, with billions of parameters trained on massive clinical text datasets, demonstrated the benefits of scale in capturing the long-range dependencies and intricate relationships within clinical narratives [49].
More recently, the landscape has been defined by even larger and more sophisticated models that integrate extensive medical knowledge with robust instruction-following capabilities. Med-PaLM and its successor were among the first to approach expert-level performance on medical licensing examination-style questions, leveraging a combination of improved base models, medical domain fine-tuning, and advanced prompting strategies [1, 39]. Concurrently, the open-source community has produced a variety of powerful medical LLMs. Models like PMC-LLaMA [45], MEDITRON-70B [7], BioMedLM [33], and BioMistral [23] have been developed by pre-training on vast corpora of biomedical literature and clinical data, showing performance competitive with proprietary models. This proliferation of models has been accompanied by the creation of more comprehensive and challenging benchmarks, such as MedExQA [12] and MedS-Bench [46], which evaluate LLMs on more complex, long-form question answering and a wider array of clinical tasks.

Beyond supervised learning, Reinforcement Learning (RL) has emerged as a powerful paradigm for optimizing sequential decision-making, making it a promising approach for healthcare applications such as developing dynamic treatment regimes [19]. Concurren