
Wang Junjie - User Interface Interaction and Testing Based on Multimodal Large Models

Information Technology 2024-11-17 2024 AI R&D Digital Summit (AiDD), Beijing LM

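Current State and Challenges of User Interface Testing
- Existing testing tools (such as Monkey and Fastbot2) struggle with generating valid text inputs, carrying out long sequences of consecutive operations, performing compound operations, understanding page functionality, and uncovering logic errors.
- Results: the related research has been published in flagship conferences and journals such as ICSE, TSE, and CHI, and has been applied to, or is being integrated with, the Beike Zhaofang (贝壳找房) app, the Douyin app, the Huawei HarmonyOS ecosystem, and in-vehicle systems of new energy vehicles.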
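Test Input Generation
- Linguistic patterns are set up to generate prompts from the content of the current page, and the LLM is asked to fill in the blanks in those prompts to produce text inputs.
- Evaluation: passing rate of 0.87; when added to GUI testing tools, the generated inputs gave a significant boost in activities explored and found 122% more bugs (51 vs. 23).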
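The prompt-building step described above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the `InputWidget` structure, the `build_fill_in_prompt` function, and the wording of the pattern are all assumptions made for the example. It only shows the idea of turning page context into a fill-in-the-blank prompt.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputWidget:
    """A text input field on the current page (hypothetical structure)."""
    label: str                  # nearby label text, e.g. "From"
    hint: Optional[str] = None  # placeholder text shown in the field, if any


def build_fill_in_prompt(app_name: str, page_name: str,
                         widgets: List[InputWidget]) -> str:
    """Render page context plus one fill-in-the-blank line per input widget."""
    lines = [
        f'This is the "{page_name}" page of the app "{app_name}".',
        f"It has {len(widgets)} text input field(s).",
    ]
    for widget in widgets:
        hint_part = f' (hint: "{widget.hint}")' if widget.hint else ""
        lines.append(
            f'The input "{widget.label}"{hint_part} should be filled with: ____'
        )
    lines.append("Fill in each blank with a realistic value.")
    return "\n".join(lines)
```

The returned string would then be sent to an LLM, and the filled blanks mapped back to the corresponding input fields; that part depends on the model API in use and is left out here.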
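Automated GUI Testing with Test Path Planning
- GPTDroid: formulates automated GUI testing as an interactive question-answering task, letting the LLM understand the semantic information of the GUI and infer the next operation steps automatically.
  - Core techniques: GUI context extraction, GUI prompting and execution command generation, and functionality-aware memory prompting.
  - Evaluation: 75% average activity coverage, 32% higher than the best baseline; 95 bugs detected across 93 apps, 31% more than the best baseline.
- VisionDroid: a vision-driven, multi-agent collaborative automated GUI testing approach for detecting non-crash functional bugs.
  - Agent roles:
    - Explorer: navigates the app, captures view hierarchies and screenshots, and guides the exploration.
    - Monitor: records the exploration history and triggers the Detector.
    - Detector: identifies potential functional bugs by checking the logic of transitions between GUI pages.
  - Challenges and solutions:
    - Aligning vision and text: integrate text properties with visual context and annotate actionable widgets.
    - Functionality-oriented exploration: abstract the current functionality from the detailed exploration sequence, which avoids exceeding the LLM's token limit.
    - Test oracle inference: functionality-aware chain-of-thought, which first infers the oracle and then detects bugs.
  - Evaluation: 50%-72% precision and 42%-65% recall; compared with the best baseline, average recall improves by 14%-112% and average precision by 108%-147%.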
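The three GPTDroid techniques listed above (context extraction, command generation with a feedback retry, and a testing memory) can be sketched as one query step. This is an illustrative sketch under assumptions: the operation set, the command template, the `llm` callable (prompt in, answer out), and every function and class name are hypothetical and not taken from the GPTDroid implementation.

```python
import re
from typing import Callable, Dict, List, Optional

OPERATIONS = ("click", "input", "scroll", "back")  # assumed operation set
TEMPLATE = "Reply as: Operation: <click|input|scroll|back> Widget: <name> [Text: <value>]"
COMMAND_RE = re.compile(r"Operation:\s*(\w+)\s+Widget:\s*(.+?)(?:\s+Text:\s*(.+))?$")


def gui_context(app: str, activity: str, widgets: List[str]) -> str:
    """Macro-level app info plus micro-level page and widget info."""
    return f"App: {app}. Current page: {activity}. Widgets: {', '.join(widgets)}."


class TestingMemory:
    """Records executed steps and the function believed to be under test."""

    def __init__(self) -> None:
        self.steps: List[str] = []
        self.current_function = "unknown"

    def record(self, activity: str, command: Dict[str, str]) -> None:
        self.steps.append(f"{command['operation']} '{command['widget']}' on {activity}")

    def as_prompt(self) -> str:
        recent = "; ".join(self.steps[-3:]) or "none"
        return f"Function under test: {self.current_function}. Recent steps: {recent}."


def parse_command(answer: str) -> Optional[Dict[str, str]]:
    """Return the parsed command, or None if the answer ignores the template."""
    match = COMMAND_RE.match(answer.strip())
    if not match or match.group(1) not in OPERATIONS:
        return None
    return {"operation": match.group(1), "widget": match.group(2),
            "text": match.group(3) or ""}


def next_command(llm: Callable[[str], str], app: str, activity: str,
                 widgets: List[str], memory: TestingMemory,
                 max_retries: int = 2) -> Optional[Dict[str, str]]:
    """Ask for the next operation; on an unusable answer, send a feedback prompt."""
    prompt = "\n".join([gui_context(app, activity, widgets),
                        memory.as_prompt(), TEMPLATE])
    for _ in range(max_retries + 1):
        command = parse_command(llm(prompt))
        if command is not None:
            memory.record(activity, command)
            return command
        prompt += "\nThe previous answer could not be executed. Answer again using the template."
    return None
```

In a real tool the parsed command would be executed on a device and the resulting page fed back into the next call; the sketch stops at the parsed command so that it stays self-contained.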
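The VisionDroid division of labour between the Monitor and the Detector can be sketched as below. This is a simplified illustration under stated assumptions: page screenshots are replaced by short text summaries, the `detector` callable stands in for a multimodal LLM call, and the names `Step`, `Monitor`, and `detector_prompt` are hypothetical. It shows two ideas from the summary: the Monitor triggers the Detector when the exploration of one functionality ends, and the Detector prompt asks for the oracle first and the verdict second.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    function: str   # functionality the explorer believes it is exercising
    action: str     # e.g. "click 'Add to cart'"
    page_text: str  # text summary of the resulting page (stands in for a screenshot)


def detector_prompt(function: str, steps: List[Step]) -> str:
    """Functionality-aware chain of thought: infer the oracle, then judge."""
    trace = "\n".join(f"{i + 1}. {s.action} -> {s.page_text}"
                      for i, s in enumerate(steps))
    return (f"Functionality: {function}\nTrace:\n{trace}\n"
            "Step 1: State what a correct app should show after each action.\n"
            "Step 2: Compare with the trace and report any functional bug, or 'no bug'.")


@dataclass
class Monitor:
    detector: Callable[[str], str]  # hypothetical MLLM call: prompt -> verdict
    history: List[Step] = field(default_factory=list)
    reports: List[str] = field(default_factory=list)
    segment_start: int = 0          # index where the current functionality began

    def observe(self, step: Step) -> None:
        """Record a step; a change of functionality closes the previous segment."""
        if self.history and step.function != self.history[-1].function:
            self._trigger_detector()
        self.history.append(step)

    def finish(self) -> None:
        """Close the last open segment at the end of testing."""
        self._trigger_detector()

    def _trigger_detector(self) -> None:
        segment = self.history[self.segment_start:]
        if segment:
            prompt = detector_prompt(segment[0].function, segment)
            self.reports.append(self.detector(prompt))
        self.segment_start = len(self.history)
```

Passing only the steps of one functionality to the Detector keeps each prompt short, which matches the stated goal of staying within the model's token limit.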
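Fuzz Testing for Text Inputs
- Unusual Text Inputs Generation: uses an LLM to generate unusual text inputs that expose crash bugs.
  - Generation rules: inserting special characters, among others.
  - Evaluation: detection rate of 72%-78%, significantly higher than the baselines.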
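To make the idea of a generation rule concrete, the sketch below applies a few fixed mutation rules to a valid input. Note the substitution: the approach in the talk has an LLM produce the unusual inputs, whereas this example uses hand-written rules as a stand-in. Only "insert special characters" is named in the summary; the other rules, and all names in the code, are assumptions for illustration.

```python
from typing import Callable, Dict, List

# Only "insert_special_chars" comes from the summary; the remaining rules
# are illustrative assumptions.
MUTATION_RULES: Dict[str, Callable[[str], str]] = {
    "insert_special_chars": lambda s: s[: len(s) // 2] + "@#$%" + s[len(s) // 2:],
    "empty_string": lambda s: "",
    "overlong": lambda s: (s or "a") * 500,
    "leading_whitespace": lambda s: "   " + s,
}


def unusual_inputs(valid_input: str) -> List[str]:
    """Apply every rule once and return the distinct mutated strings."""
    results: List[str] = []
    for rule in MUTATION_RULES.values():
        candidate = rule(valid_input)
        if candidate != valid_input and candidate not in results:
            results.append(candidate)
    return results
```

Each returned string would be typed into the input field under test while the app is watched for crashes.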
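Improving Interaction with Text Input Components
- HintDroid: predicts hint text for text input fields.
  - Core modules:
    - GUI entity extraction and prompting.
    - Example-enriched prompting.
    - Feedback extraction and prompting.
  - Evaluation:
    - Accuracy: outperforms all baselines.
    - Usefulness: helps visually impaired users fill in inputs and improves input accuracy.
    - Use: generating high-quality hint text.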
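One way to picture how the three HintDroid modules could feed a single prompt is sketched below. This is an assumption-laden illustration, not HintDroid's actual prompt format: the function `build_hint_prompt`, its parameters, and the prompt wording are hypothetical. The three sections correspond to GUI entities, examples, and feedback from the app.

```python
from typing import List, Optional, Tuple


def build_hint_prompt(page_title: str,
                      field_label: str,
                      nearby_texts: List[str],
                      examples: List[Tuple[str, str]],
                      feedback: Optional[str] = None) -> str:
    """Assemble entity, example, and feedback sections into one prompt."""
    parts = [
        # GUI entities: the page, the field, and the text around it.
        f'Page: "{page_title}". Input field: "{field_label}". '
        f"Nearby text: {', '.join(nearby_texts) or 'none'}.",
        # Examples: (field label, hint text) pairs that show the expected style.
        "Examples of good hint text:",
    ]
    parts += [f'- Field "{label}" -> hint "{hint}"' for label, hint in examples]
    if feedback:
        # Feedback: what the app said after an earlier input was rejected.
        parts.append(f'The app rejected an earlier input with: "{feedback}". '
                     "Make the hint state this requirement.")
    parts.append(f'Write a short hint text for the field "{field_label}".')
    return "\n".join(parts)
```

The feedback section is optional, so a first query can be made before any input has been tried and a refined query after the app reports an error.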
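Summary and Outlook
- Challenges: insufficient test coverage, complex page elements, vague descriptions, and mismatches between the index labels and the descriptions in LLM output.
- Outlook: further improve functionality-oriented exploration, vision-text alignment, and test oracle inference.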

User Interface Interaction and Testing Based on Multimodal Large Models
Wang Junjie, Institute of Software, Chinese Academy of Sciences

Speaker
Wang Junjie is a researcher and doctoral supervisor at the Institute of Software, Chinese Academy of Sciences, holds a CAS Special Research Position, and is a member of the Youth Innovation Promotion Association. Wang works on intelligent software engineering and software quality, with a recent focus on intelligent software testing and LLM-driven software testing. Wang has published more than 60 papers in leading international journals and conferences and has received the ACM/IEEE Distinguished Paper Award four times. Wang has led or taken part in projects funded by the National Natural Science Foundation of China, the Ministry of Science and Technology key R&D program, and the CCF-Huawei Populus Grove Fund (胡杨林基金), and serves as an Associate Editor of TSE (a CCF class A journal), a PC member of ICSE, FSE, and ISSRE, and a reviewer for TOSEM, EMSE, AUSE, and the Journal of Software (软件学报).

Outline
1. Current state and challenges of user interface testing
2. Test input generation
3. Automated GUI testing with test path planning
4. Automated GUI testing based on multimodal large models
5. Fuzz testing for text inputs
6. Improving interaction with text input components
7. Summary and outlook

Current state and challenges of user interface testing
- Related results:
  - Multiple papers published in flagship software engineering and human-computer interaction conferences and journals, including ICSE, TSE, and CHI.
  - Applied to, or being integrated with, the Beike Zhaofang (贝壳找房) app, the Douyin app, the Huawei HarmonyOS ecosystem, and in-vehicle systems of new energy vehicles.
  - Venue labels shown on the slide: ICSE 2024 (twice), ICSE 2023, TSE 2024, FSE 2024-SE2030, and one paper under submission.

Test input generation
- Set up linguistic patterns to generate prompts based on the current page.
- Ask the LLM to fill in the blanks according to the generated prompts.
- Evaluation:
  - Passing rate: 0.87.
  - Significant activity boost and 122% more bugs (51 vs. 23) when added to GUI testing tools.

GPTDroid: Function-aware Automatic GUI Testing
- Automatic GUI testing with an LLM:
  - Formulate the automatic GUI testing problem as an interactive question-answering task.
  - Let the LLM conduct the whole app test by understanding the GUI semantic information and automatically inferring possible operation steps.
- GUI context extraction: accurately depict the GUI page currently under test, the information of the widgets it contains from a micro perspective, and the app information from a macro perspective.
- GUI prompting and execution command generation:
  - Feedback prompt: inform the LLM that an error occurred and retry the query for the next operation.
  - Command generation: provide the LLM with the output template, including the available operations and operation primitives.
- Functionality-aware memory prompting:
  - Build a testing sequence memorizer to record all detailed interactive testing information, e.g., the explored activities and widgets.
  - Query the LLM about the function-level progress of the testing.
  - Functionality-aware memory prompt (example shown on the slide).
- Evaluation:
  - 75% average activity coverage, 32% higher than the best baselines.
  - Detects 95 bugs in the 93 apps, 31% higher than the best baselines.
- Capability of GPTDroid:
  - Function-aware exploration through long, meaningful testing traces.
  - Prioritization.
  - Valid text inputs and compound operations.

VisionDroid: Vision-driven Automated Mobile GUI Testing
- Crash bugs vs. non-crash functional bugs.
- A vision-driven, multi-agent collaborative automated GUI testing approach for detecting non-crash functional bugs.
- Agents:
  - Explorer Agent: navigates through the app, captures view hierarchies and screenshots, and guides the exploration towards diverse GUI pages while focusing on the app's functionalities.
  - Monitor Agent: supervises the testing process, records the exploration history, and triggers the Detector Agent at the appropriate time.
  - Detector Agent: identifies potential functional bugs by examining whether there are any issues in the logical transitions that occur during GUI page changes.
- Challenge 1: aligning visual and text information for MLLM input.
  - An alignment method that integrates text properties with visual context.
  - A screenshot annotation method that pays attention to different types of actionable widgets and resolves the issue of overlapping.
- Challenge 2: functionality-oriented exploration.
  - Infer and abstract the current functionality from detailed exploration sequences, which avoids exceeding token limits when interacting with the LLM and keeps the exploration focused on functionality.
- Challenge 3: inferring the test oracle.
  - Let the Monitor Agent trigger the Detector Agent at the end of each functionality exploration.
  - Use functionality-aware Chain-of-Thought (CoT) so that the MLLM first explicitly infers oracles and then detects functional bugs based on these inferences.
- Enriching the Detector prompt with examples: a bug description, a bug screenshot, and a bug reproduction path described in natural language, which help the MLLM understand what non-crash functional bugs are.
- Evaluation:
  - 50%-72% precision and 42%-65% recall.
  - More than 14%-112% and 108%-147% boost in average recall and precision, respectively, compared with the best baseline.

Test case migration from natural-language descriptions
- Test requirement: My Home (我家), add-asset flow
