[QiAnXin Technology Research Institute]: ReCopilot: An LLM-Based Binary Reverse Engineering Assistant

ReCopilot: An LLM-Based Binary Reverse Engineering Assistant


Baby Introduction

Observation & Insights
1. Lack of domain knowledge
2. Cross-task capability transfer
3. Test-time scaling

ReCopilot: Overview
➢ Building a large model for binary code: recopilot

ReCopilot: Raw Dataset
➢ Data Sources:
1. Off-the-shelf software packages: software sources that ship debug information and source code, Launchpad.net (Ubuntu Software Sources)
2. Cross compilation: Cross-Arch, Cross-Opti, Cross-Compiler, Cross-Platform
3. AutoCompiler: Ad-hoc Software/Library

Challenges & Insights

ReCopilot: Pretrain Dataset
➢ Goal: model binary code and build the Binary-Source-NL mapping
➢ Pretraining data sample:
<sourcecode>{comment} {typedef} int foo(){...}</sourcecode>

ReCopilot: SFT Dataset
➢ Goal: learn binary analysis tasks and reasoning ability
➢ Multi-task support (task tag)
⚫ Function Name Recovery <funcname>
⚫ Function Signature Recovery <signature> <signature-str>
⚫ Variables Recovery <vars>
⚫ Arguments Recovery <args>
⚫ Variable Recovery <var:var_name>
<context-pseudocode>{context}</context-pseudocode>
<pseudocode>{target_func}</pseudocode>
<Call-Chains>{call_chains}</Call-Chains>
<Data-Flow>{data_flow}</Data-Flow>
➢ Automated construction of the reasoning process (Chain-of-Thought, CoT)
➢ Generator-Discriminator (persona-driven data synthesis)

ReCopilot: DPO Dataset
➢ Goal: improve output-format compliance and strengthen the logical consistency of reasoning
➢ Collect good-bad sample pairs

ReCopilot: Demo

Evaluation: Overall Effectiveness
➢ Tool-level

Evaluation: Model Comparison (up to date)
➢ Model-level; evaluation date: Oct 17 2025

Next Stage?
➢ Low-level analysis tasks
➢ Features must be actively triggered by the user
➢ Limited context awareness

ReCopilot-Agent
➢ How LLM agent works

ReCopilot-Agent: Code Embedding
➢ ReCopilot embedding model performance

ReCopilot-Agent: Case Study
➢ Real-world case: Firmware vulnerability exploitation
➢ Preliminary Analysis
1. Collect information

POST: action_mode=install&apps_name=AAAA&apps_flag=BBBB
RESPONSE: Oct 10 10:22:04 rc_service: httpd 31732:notify_rc start_apps_install AAAA BBBB

POST: action_mode=install&apps_name=AAAA&apps_flag=BBBB%p
RESPONSE: Oct 10 10:32:24 rc_service: httpd 31732:notify_rc start_apps_install AAAA BBBB(nil)

POST: action_mode=install&apps_name=AAAA&apps_flag=BBBB%n
RESPONSE: watchdog: restart httpd [PROGRAM CRASHED on libshared.so]

ReCopilot-Agent: Case Study
➢ ReCopilot-Agent Workflow

This binary is from a wifi device firmware. We have found a hidden page in the firmware, `/apps_test.asp`, which allow three parameters: action_mode=?&apps_name=?&apps_flag=?. Each time we post some random value of three key, we then observed the log messages in system logging page.
After fuzzing this endpoint, we have found several odd response:

POST: action_mode=install&apps_name=AAAA&apps_flag=BBBB
RESPONSE: Oct 10 10:22:04 rc_service: httpd 31732:notify_rc start_apps_install AAAA BBBB
POST: action_mode=install&apps_name=AAAA&apps_flag=BBBB%p
RESPONSE: Oct 10 10:32:24 rc_service: httpd 31732:notify_rc start_apps_install AAAA BBBB(nil)
POST: action_mode=install&apps_name=AAAA&apps_flag=BBBB%n

Finally, we noticed the crash occurred in the libshared.so file. Please help me analyze how this crash happened and whether it could be an exploitable vulnerability.

ReCopilot-Agent: Case Study
➢ ReCopilot-Agent Workflow

notify_rc(const char *cmd_str) [EXPORT]
↳ notify_rc_service(cmd_str, 0, 15u);
  ↳ j_logmessage_normal("rc_service", "%s %d:notify_rc %s", ...);
    ↳ logmessage_normal(flag, fmt);
      ↳ vsnprintf(msg, 0x200u, fmt, varg_r2);

RESPONSE: Oct 10 10:35:24 rc_service: httpd 31732:notify_rc start_apps_install AAAA BBBB…...0x41414120.0x42422041

➢ ReCopilot is an expert LLM built specifically for binary code analysis; with a smaller model size, it outperforms advanced general-purpose LLMs and other domain-specific models.
➢ ReCopilot-Agent supports high-level analysis tasks and is deeply integrated with the in-house expert model, demonstrating automated vulnerability …

Discussion
➢ Possible futures
  ➢ Decompilation optimization performed ahead of time, producing source-code-level results
  ➢ Agents embedded in binary analysis tools
  ➢ Agent-led binary analysis cloud services and their scale effects
➢ How to defend
  ➢ Defender's first-mover advantage
  ➢ Code obfuscation
  ➢ Adversarial examples

Q&A
➢ Thanks for listening
guoqiangchen@qianxin.com, ch3nye@mail.ustc.edu.cn
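
Supplementary sketch for the case study: the three probes differ only in the printf-style conversion appended to apps_flag, and the recovered call chain ends in vsnprintf with a caller-influenced fmt. The Python helper below is an illustrative triage aid under that assumption, not part of ReCopilot or ReCopilot-Agent. The names PRINTF_SPEC and classify_probe and the labels "read", "deref" and "write" are hypothetical. It scans a parameter value for conversions and labels each by what it would do if the value reached a format-string argument.

```python
import re
from typing import List, Tuple

# Hypothetical helper, not taken from the talk.
# Matches one printf-style conversion:
#   %[position$][flags][width][.precision][length]type
PRINTF_SPEC = re.compile(
    r"%(?:\d+\$)?[-+ #0]*(?:\d+|\*)?(?:\.(?:\d+|\*))?"
    r"(?:hh|h|ll|l|L|j|z|t)?(?P<type>[diouxXeEfFgGaAcspn%])"
)


def classify_probe(value: str) -> List[Tuple[str, str]]:
    """Return (conversion, label) pairs found in a request parameter value.

    Labels (assumed naming):
      "write" - %n stores a count through a pointer argument
      "deref" - %s reads memory through a pointer argument
      "read"  - any other conversion prints an argument value
    """
    findings = []
    for match in PRINTF_SPEC.finditer(value):
        conv = match.group("type")
        if conv == "%":
            continue  # "%%" is a literal percent sign, not a conversion
        if conv == "n":
            label = "write"
        elif conv == "s":
            label = "deref"
        else:
            label = "read"
        findings.append((match.group(0), label))
    return findings


if __name__ == "__main__":
    # The three apps_flag values used in the case study probes.
    for probe in ("BBBB", "BBBB%p", "BBBB%n"):
        print(probe, classify_probe(probe))
```

Applied to the probes above, BBBB produces no findings, BBBB%p produces a "read" finding, and BBBB%n produces a "write" finding. That lines up with the observed plain echo, the (nil) suffix, and the crash, but the helper only inspects the input string and says nothing about whether a given binary actually passes it as a format string.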