
CRASE5 for ACT Writing Technical Report

Information Technology | 2026-02-03 | ACT

Scott W. Wood, Sungjin Nam, and Dongmei Li

I. Introduction

Starting in October 2022, ACT began using its automated scoring engine, CRASE®, to provide one of the two rater scores for ACT writing essays in the ACT International program. Since then, CRASE has been used for other programs, including ACT District (Spring 2023), ACT State

To support this decision, ACT created a significant research agenda. Researchers conducted multiple proof-of-concept studies to evaluate the accuracy of CRASE scores compared to those of human raters.

Since CRASE began to score ACT writing, there has been a desire to add new engine functionalities in order to expand the kinds of essays CRASE can handle. The new functionalities include automatically detecting off-topic essays, automatically detecting disturbing content, leveraging modern model-fitting approaches, and providing information about the

The primary purpose of this report is to replicate the studies listed in the CRASE+ technical report using the new generic scoring models produced in CRASE5. The results presented in this report, especially when compared to the corresponding results in the original report, should

This report follows the organization of the CRASE+ technical report. The next section contains a brief overview of CRASE and CRASE5. Section III discusses the data and processes used to train and validate the engine, while Section IV provides validation results. Section V contains a

II. Background: Automated Scoring and CRASE5

Automated scoring (or automated essay scoring) is the use of a computer algorithm to emulate hand scoring behavior on constructed-response or essay items. The scoring algorithm is called the engine.
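As a rough illustration of how such an engine is organized, its scoring path can be sketched as a small pipeline of stages: reading the text, preprocessing it, extracting quantitative features, and applying a scoring model. Everything below (the function names, the toy features, the simple linear scoring rule) is an illustrative assumption for exposition, not a description of CRASE internals:

```python
def read_essay(raw_bytes: bytes) -> str:
    """Reader: decode the submitted response into text."""
    return raw_bytes.decode("utf-8", errors="replace")

def preprocess(text: str) -> list[str]:
    """Preprocessor: standardize (lowercase) and tokenize the text."""
    return text.lower().split()

def extract_features(tokens: list[str]) -> dict[str, float]:
    """Feature extractor: derive quantitative features from the tokens."""
    n = len(tokens)
    return {
        "length": float(n),
        "avg_word_len": sum(len(t) for t in tokens) / n if n else 0.0,
        "type_token_ratio": len(set(tokens)) / n if n else 0.0,
    }

def score(features: dict[str, float]) -> int:
    """Scoring model: map features to a score point (toy linear rule,
    clamped to a hypothetical 1-6 scale)."""
    raw = 1 + 0.005 * features["length"] + 0.3 * features["avg_word_len"]
    return max(1, min(6, round(raw)))

essay = b"The essay under consideration argues that technology reshapes learning."
print(score(extract_features(preprocess(read_essay(essay)))))  # a score from 1 to 6
```

The point of the sketch is the separation of stages: because reading, preprocessing, feature extraction, and modeling are distinct components, each can be improved or replaced independently as an engine evolves.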
Preparing the scoring algorithm for operational use is called training the engine. There are four parts to a scoring engine: a means of reading text data, a preprocessor that standardizes and initially processes the text, a means of extracting the quantitative features, and a model that converts those features into a score.

CRASE, short for Constructed Response Automated Scoring Engine, was created in 2007 for a U.S. state's summative assessment program. The system has since been enhanced to include scoring methods for additional types of free-response items and to incorporate new technologies in text processing and modeling. CRASE has been used operationally in multiple testing programs.

When CRASE was first used on ACT writing, it was called CRASE+ and represented the fourth major version of the CRASE software. With the construction of a new engine that gives the engine trainer greater flexibility in developing scoring models, ACT proposed the use of CRASE5.

This report assumes a basic familiarity with automated scoring concepts. For readers new to automated scoring, the CRASE research team recommends the following resources:

Lottridge, S., Burkhardt, A., & Boyer, M. (2020). Digital module 18: Automated scoring. Educational Measurement: Issues and Practice, 39(3), 141–142. https://doi.org/10.1111/emip.12388

McCaffrey, D., Casablanca, J., Ricker-Pedley, K., Lawless, R., & Wendler, C. (2021). Best practices for constructed-response scoring. ETS. https://www.ets.org/content/dam/ets-

Shermis, M. D., & Burstein, J. (Eds.). (2013). Handbook of automated essay evaluation: Current applications and new directions. Routledge.

Wood, S., Yao, E., Haisfield, L., & Lottridge, S. (2021). Establishing standards of best practice in automated scoring. ACT.

Yan, D., Rupp, A. A., & Foltz, P. W. (Eds.). (2020). Handbook of automated scoring: Theory into practice. CRC Press.

III. Methods for Engine Training and Validation

Data

Data from hand-scored essays are required to train the CRASE engine.
These data should be collected under authentic testing conditions, if possible, and must be representative of the examinee population.

This report uses the same ACT writing data from the original training and validation studies conducted prior to 2023. Readers can find details about the data, the training sample, and the blind-validation sample in the CRASE+® for ACT Writing Technical Report. This section will briefly summarize that information.

The training and blind-validation essays came from three sources: the September 2020 ACT International administration, the October 2020 ACT International administration, and selected Spring 2021 State and District administrations. Approximately two thirds of the records came from the State and District administrations. Only essays obtained via online administrations were included.

Information about hand scoring score point distributions, examinee gender, examinee Hispanic status, and examinee race/ethnicity can be found in the CRASE+ technical report on pages 5

Training and Validation Samples

Recall that a generic scoring model is an automated scoring model built using essay data from multiple writing prompts, with the goal of using the model on essay data from comparable writing prompts. (The alternative is a prompt-specific model, where the model is built using essay data from a single writing prompt with the goal of using the model on essay data from only that prompt.) Generic scoring models allow for consistent scoring regardless of prompt.
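To make the generic versus prompt-specific distinction concrete, the following sketch fits a deliberately trivial "model" (the mean essay length observed at each score point, with prediction by nearest mean) once on data pooled across all prompts and once separately per prompt. The data and the modeling rule are fabricated stand-ins for illustration only; they do not reflect the CRASE5 feature set or model family:

```python
from statistics import mean

# Toy hand-scored training data: (prompt_id, essay_word_count, human_score).
# All values are fabricated for illustration only.
hand_scored = [
    ("prompt_A", 120, 2), ("prompt_A", 310, 4),
    ("prompt_B", 150, 2), ("prompt_B", 400, 5),
    ("prompt_C", 250, 3), ("prompt_C", 380, 4),
]

def fit_mean_length_per_score(rows):
    """Fit a trivial model: the mean essay length observed at each score point."""
    by_score = {}
    for _, length, score in rows:
        by_score.setdefault(score, []).append(length)
    return {s: mean(lengths) for s, lengths in by_score.items()}

def predict(model, length):
    """Predict the score point whose mean length is closest to this essay's."""
    return min(model, key=lambda s: abs(model[s] - length))

# Generic model: pool essays across all prompts; usable on comparable prompts.
generic = fit_mean_length_per_score(hand_scored)

# Prompt-specific models: one per prompt; each usable only for its own prompt.
specific = {
    p: fit_mean_length_per_score([r for r in hand_scored if r[0] == p])
    for p in {r[0] for r in hand_scored}
}

print(predict(generic, 300))               # pooled model scores any prompt
print(predict(specific["prompt_A"], 300))  # this model applies to prompt_A only
```

The practical trade-off the sketch hints at: the generic model can score a new, comparable prompt without collecting and hand scoring a fresh training sample, while each prompt-specific model requires its own training data and applies only to its own prompt.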