Baby Introduction
Observation & Insights
1. Lack of domain knowledge
2. Cross-task capability transfer
3. Test-time scaling
ReCopilot: Overview
➢ Building a large language model for binary code: ReCopilot
ReCopilot: Raw Dataset
➢ Data Sources:
1. Off-the-shelf software packages: software sources that ship debug information and source code, Launchpad.net (Ubuntu Software Sources)
2. Cross-compilation: Cross-Arch, Cross-Opti, Cross-Compiler, Cross-Platform
3. AutoCompiler: Ad-hoc Software/Library
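The cross-compilation source above can be read as a build matrix over four axes. A minimal sketch, assuming illustrative values for each axis (the slide names only the axes, so every concrete architecture, flag, compiler, and platform below is hypothetical):

```python
from itertools import product

# Hypothetical axis values; the slide lists only the four axes themselves.
ARCHS = ["x86_64", "aarch64", "arm", "mips"]
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3"]
COMPILERS = ["gcc", "clang"]
PLATFORMS = ["linux", "windows"]


def build_matrix(archs=ARCHS, opts=OPT_LEVELS, compilers=COMPILERS, platforms=PLATFORMS):
    """Return one build configuration per combination of the four axes."""
    return [
        {"arch": a, "opt": o, "compiler": c, "platform": p}
        for a, o, c, p in product(archs, opts, compilers, platforms)
    ]


configs = build_matrix()
print(len(configs))  # 4 * 4 * 2 * 2 = 64 configurations
print(configs[0])
```

Each configuration would compile the same source into a different binary, which is what gives the dataset several binary views of one function.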
Challenges&Insights
ReCopilot: Pretrain Dataset
➢ Goal: model binary code and build the Binary-Source-NL mapping
➢ Pretraining data sample:
{comment} {typedef} int foo(){...}
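The sample layout above can be sketched as a simple template fill. This is an illustration only: the slide shows just the `{comment} {typedef} int foo(){...}` ordering, so the newline separators, the helper name `render_pretrain_sample`, and the example function are assumptions, and the binary side of the mapping is not shown on the slide and is left out here.

```python
# Assumed layout: comment, then typedefs, then the function source, one per line.
PRETRAIN_TEMPLATE = "{comment}\n{typedef}\n{source}"


def render_pretrain_sample(comment: str, typedef: str, source: str) -> str:
    """Fill the (assumed) template with the three source-side fields."""
    return PRETRAIN_TEMPLATE.format(
        comment=comment.strip(),
        typedef=typedef.strip(),
        source=source.strip(),
    )


# Hypothetical example values.
sample = render_pretrain_sample(
    comment="/* Return the larger of two integers. */",
    typedef="typedef int i32;",
    source="i32 max_i32(i32 a, i32 b) { return a > b ? a : b; }",
)
print(sample)
```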
ReCopilot: SFT Dataset
➢ Goal: learn binary analysis tasks and reasoning ability
➢ Multi-task support (task tag):
⚫ Function Name Recovery
⚫ Function Signature Recovery
⚫ Variables Recovery
⚫ Arguments Recovery
⚫ Variable Recovery
{context}
{target_func}{call_chains}{data_flow}
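One way to picture the layout above is a prompt made of a task tag followed by the three context fields. A minimal sketch, assuming the tag is written as `<tag>` on its own line and the fields are joined by newlines (neither detail is given on the slide; the class and function names are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class Context:
    """The three context fields named on the slide."""
    target_func: str
    call_chains: str
    data_flow: str

    def render(self) -> str:
        # Assumed order and separator: same order as on the slide, one field per line.
        return "\n".join([self.target_func, self.call_chains, self.data_flow])


def build_prompt(task_tag: str, context: Context) -> str:
    """Prefix the rendered context with a task tag (the tag syntax is hypothetical)."""
    return f"<{task_tag}>\n{context.render()}"


# Hypothetical example values.
ctx = Context(
    target_func="int sub_401000(int a1) { return a1 + 1; }",
    call_chains="main -> sub_401000",
    data_flow="a1 <- argv[1]",
)
print(build_prompt("func-name", ctx))
```

Keeping the tag separate from the context lets one context be reused across several tasks.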
➢ Automated construction of the reasoning process (Chain-of-Thought, CoT)
➢ Generator-Discriminator (persona-driven data synthesis)
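A generator-discriminator loop of this kind could be sketched as follows. The slide does not describe the acceptance rule or the personas, so everything concrete here is an assumption: the toy `generate` and `accept` functions stand in for model calls, and the rule "keep a trace only if its final answer matches the ground truth" is one plausible choice, not the authors' stated method.

```python
from typing import Callable, Dict, List


def synthesize_cot(
    sample: Dict[str, str],
    personas: List[str],
    generate: Callable[[str, Dict[str, str]], Dict[str, str]],
    accept: Callable[[Dict[str, str], Dict[str, str]], bool],
) -> List[Dict[str, str]]:
    """Generate one reasoning trace per persona and keep those the discriminator accepts."""
    kept = []
    for persona in personas:
        trace = generate(persona, sample)
        if accept(sample, trace):
            kept.append(trace)
    return kept


# Toy stand-ins for the generator and discriminator (hypothetical).
def toy_generate(persona: str, sample: Dict[str, str]) -> Dict[str, str]:
    answer = sample["answer"] if persona != "careless" else "unknown"
    return {"persona": persona, "reasoning": f"[{persona}] step-by-step notes", "answer": answer}


def toy_accept(sample: Dict[str, str], trace: Dict[str, str]) -> bool:
    # Assumed rule: accept only traces whose final answer equals the ground truth.
    return trace["answer"] == sample["answer"]


sample = {"question": "Name this function.", "answer": "parse_header"}
traces = synthesize_cot(sample, ["reverse engineer", "careless"], toy_generate, toy_accept)
print([t["persona"] for t in traces])
```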
ReCopilot: DPO Dataset
➢ Goal: improve adherence to the output format and strengthen the logical consistency of reasoning
➢ Collect good-bad sample pairs
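Good-bad pairs for preference training are commonly stored as `prompt` / `chosen` / `rejected` records. A minimal sketch of pairing outputs by format compliance, assuming a hypothetical format rule (a JSON object with a `result` key); the slide does not state the actual output format or how pairs were selected:

```python
import json
from itertools import product
from typing import Dict, List


def follows_format(output: str) -> bool:
    """Hypothetical format rule: the output must be a JSON object with a 'result' key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and "result" in parsed


def make_pairs(prompt: str, candidates: List[str]) -> List[Dict[str, str]]:
    """Pair every format-compliant candidate with every non-compliant one."""
    good = [c for c in candidates if follows_format(c)]
    bad = [c for c in candidates if not follows_format(c)]
    return [{"prompt": prompt, "chosen": g, "rejected": b} for g, b in product(good, bad)]


# Hypothetical candidates for one prompt.
pairs = make_pairs(
    "Recover the function name.",
    ['{"result": "parse_header"}', "The function is parse_header", '"parse_header"'],
)
print(len(pairs))
print(pairs[0])
```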
ReCopilot: Demo
Evaluation: Overall Effectiveness
➢ Tool-level
Evaluation: Model Comparison (up to date)
➢ Model-level
Evaluation date: Oct 17 2025
Next Stage?
➢ Low-level analysis tasks
➢ Features must be actively triggered by the user
➢ Limited context awareness
ReCopilot-Agent
➢ How an LLM agent works
ReCopilot-Agent: Code Embedding
➢ ReCopilot embedding model performance
ReCopilot-Agent: Case Study
➢ Real-world case: firmware vulnerability exploitation
➢ Preliminary Analysis
1. Collect information
POST: action_mode=install&apps_name=AAAA&apps_flag=BBBB
RESPONSE: Oct 10 10:22:04 rc_service: httpd 31732:notify_rc start_apps_install AAAA BBBB
POST: action_mode=install&apps_name=AAAA&apps_flag=BBBB%p
RESPONSE: Oct 10 10:32:24 rc_service: httpd 31732:notify_rc start_apps_install AAAA BBBB(nil)
POST: action_mode=install&apps_name=AAAA&apps_flag=BBBB%n
RESPONSE: watchdog: restart httpd [PROGRAM CRASHED on libshared.so]
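The three probes differ only in the `apps_flag` value, so the request bodies can be generated from one helper. A minimal sketch that only builds the bodies and sends nothing; it assumes the server URL-decodes form fields, so `%` is percent-encoded as `%25` here, whereas the slide shows the raw value. The helper name `build_probe` is hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def build_probe(apps_flag: str) -> str:
    """Build the form body for one probe; only apps_flag varies between probes."""
    return urlencode({"action_mode": "install", "apps_name": "AAAA", "apps_flag": apps_flag})


# Baseline, a pointer-printing specifier, and a memory-writing specifier.
PROBES = ["BBBB", "BBBB%p", "BBBB%n"]

for flag in PROBES:
    body = build_probe(flag)
    # Round-trip check: decoding the body gives back the intended raw value.
    assert parse_qs(body)["apps_flag"] == [flag]
    print(body)
```

The `%p` probe coming back as `(nil)` and the `%n` probe crashing the service are what point toward the input being interpreted as a format string.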
➢ ReCopilot-Agent Workflow
This binary is from a wifi device firmware. We have found a hidden page in the firmware, `/apps_test.asp`, which allow three parameters: action_mode=?&apps_name=?&apps_flag=?.
Each time we post some random value of three key, we then observed the log messages in system logging page. After fuzzing this endpoint, we have found several odd response:
POST: action_mode=install&apps_name=AAAA&apps_flag=BBBB
RESPONSE: Oct 10 10:22:04 rc_service: httpd 31732:notify_rc start_apps_install AAAA BBBB
POST: action_mode=install&apps_name=AAAA&apps_flag=BBBB%p
RESPONSE: Oct 10 10:32:24 rc_service: httpd 31732:notify_rc start_apps_install AAAA BBBB(nil)
POST: action_mode=install&apps_name=AAAA&apps_flag=BBBB%n
Finally, we noticed the crash occurred in the libshared.so file. Please help me analyze how this crash happened and whether it could be an exploitable vulnerability.
notify_rc(const char *cmd_str) [EXPORT]
  ↳ notify_rc_service(cmd_str, 0, 15u);
    ↳ j_logmessage_normal("rc_service", "%s %d:notify_rc %s", ...);
      ↳ logmessage_normal(flag, fmt);
        ↳ vsnprintf(msg, 0x200u, fmt, varg_r2);
RESPONSE: Oct 10 10:35:24 rc_service: httpd 31732:notify_rc start_apps_install AAAA BBBB…...0x41414120.0x42422041
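The two hex words at the end of this response can be decoded to see what memory they came from. A minimal sketch, assuming a little-endian 32-bit target (the slide does not state the architecture):

```python
import struct


def words_to_bytes(words, byteorder="<"):
    """Pack 32-bit words back into the byte sequence they were read from."""
    return b"".join(struct.pack(f"{byteorder}I", w) for w in words)


leaked = [0x41414120, 0x42422041]
decoded = words_to_bytes(leaked)
print(decoded)  # b' AAAA BB'
```

Under that assumption the leaked words spell ` AAAA BB`, a fragment of the submitted parameters. That is consistent with the format specifiers reading stack memory that holds the log message itself, which is the usual precondition for turning a format-string bug into controlled reads or writes.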
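➢ ReCopilot is an expert LLM built specifically for binary code analysis. With a smaller model size, it outperforms state-of-the-art general-purpose LLMs and other domain-specific models.
➢ ReCopilot-Agent supports high-level analysis tasks, integrates deeply with the in-house expert model, and demonstrates automated vulnerability…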
Discussion
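➢ Possible futures
  ➢ Decompilation optimization performed ahead of time, producing source-level results
  ➢ Agents embedded in binary analysis tools
  ➢ Agent-driven binary analysis cloud services and their economies of scale
➢ How to defend
  ➢ The defender's first-mover advantage
  ➢ Code obfuscation
  ➢ Adversarial examples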
Q&A
➢ Thanks for listening
guoqiangchen@qianxin.com, ch3nye@mail.ustc.edu.cn