
GPT-5.3-Codex System Card

2026-02-05 OpenAI 土豆不吃泥

OpenAI
February 5, 2026

Contents

2 Baseline Model Safety Evaluations
  2.1 Disallowed Content Evaluations
3 Product-Specific Risk Mitigations
  3.1 Agent sandbox
  3.2 Network access
4 Model-Specific Risk Mitigations
  4.1 Avoid data-destructive actions
    4.1.1 Risk description
    4.1.2 Mitigation: Safety training
5 Preparedness
  5.1 Capabilities Assessment
      5.1.1.1 Tacit Knowledge and Troubleshooting
      5.1.1.2 ProtocolQA Open-Ended
      5.1.1.3 Multimodal Troubleshooting Virology
      5.1.1.4 TroubleshootingBench
      5.1.2.1 Capture-the-flag (professional)
      5.1.2.2 CVE-Bench
      5.1.2.3 Cyber Range
      5.1.2.4 External Evaluations by Irregular
      5.1.3.1 Monorepo-Bench
      5.1.3.2 OpenAI-Proof Q&A
  5.2 Safeguards Assessment
    5.2.1 Cyber Safeguards
      5.2.1.1 Threat Model and Scenarios
      5.2.1.2 Cyber Threat Taxonomy
      5.2.1.3 Safeguards
        5.2.1.3.1 Model Safety Training
        5.2.1.3.2 Conversation monitor
        5.2.1.3.3 Expert Red Teaming
        5.2.1.3.4 Actor Level Enforcement
        5.2.1.3.5 Trust-based access
      5.2.1.4 Security Controls
      5.2.1.5 Misalignment risks and internal deployment
      5.2.1.6 Sufficiency of Risk Mitigation Measures

1 Introduction

GPT-5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2. This enables it to take on long-running tasks that involve research, tool use, and complex execution. Much like a colleague, you can steer and interact with GPT-5.3-Codex while it’s working, without losing context.

Like other recent models, it is being treated as High capability on biology, and is being deployed with the corresponding suite of safeguards we use for other models in the GPT-5 family. It does not reach High capability on AI self-improvement.

This is the first launch we are treating as High capability in the Cybersecurity domain under our Preparedness Framework, activating the associated safeguards. We do not have definitive evidence that this model reaches our High threshold, but are taking a precautionary approach because we cannot rule out the possibility that it may be capable enough to reach the threshold. Our safeguards for high capability in cybersecurity rely on a layered safety stack designed to impede and disrupt threat actors, while we work to make these same capabilities as easily available as possible for cyber defenders.
2 Baseline Model Safety Evaluations

2.1 Disallowed Content Evaluations

To further ecosystem insight into frontier AI, and for consistency with our general practice, we provide the following benchmark evaluations of GPT-5.3-Codex across disallowed content categories tested in a conversational setting. A subset of these evals are displayed below. We do not believe these conversational evals are reflective of real-world risk in the context of this coding-focused model.[1]

We report here on our Production Benchmarks, an evaluation set with conversations representative of challenging examples from production data. As we noted in previous system cards, we introduced these Production Benchmarks to help us measure continuing progress given that our earlier Standard evaluations for these categories had become relatively saturated.

These evaluations were deliberately created to be difficult. They were built around cases in which our existing models were not yet giving ideal responses, and this is reflected in the scores below. Error rates are not representative of average production traffic. The primary metric is not_unsafe, checking that the model did not produce output that is disallowed under the relevant OpenAI p