1. Introduction

… of bimanual collaboration. Consequently, the policy must account for geometry, contact-rich assembly processes, and bimanual coordination.

We propose our BiAssemble framework for this challenging task. For geometric awareness, we utilize point-level affordance, which is trained to focus on local geometry. This approach has demonstrated strong geometric generalization in diverse tasks (Wu et al., 2022; 2023b), including short-term bimanual manipulation (Zhao et al., 2022), such as pushing a box or lifting a basket. To enhance the affordance model with an understanding of subsequent long-horizon bimanual assembly actions, we draw inspiration from how humans intuitively assemble fragments: after picking up two fragments, we align them at the seam, deliberately leaving a gap (since directly placing them in the target pose often causes geometric collisions), with the part poses at this stage denoted as alignment poses. We then gradually move the fragments toward each other to fit them together precisely. The alignment poses of the two fragments can be obtained by disassembling the assembled parts in opposite directions (see the sketch at the end of this section). With this information, it becomes straightforward to extend the geometry-aware affordance to further be aware of whether the controller can move the fragments into the alignment poses without collisions.

We develop a simulation environment in which robots can be controlled to assemble broken parts. This environment bridges the gap between vision-based pose prediction for broken parts and real-world robotic geometric assembly. Moreover, since broken parts exhibit varied geometries (e.g., the same bowl falling from different heights breaks into different groups of fragments), it is challenging to fairly assess policy performance in real-world settings. To address this, we further introduce a real-world benchmark featuring globally available objects with reproducible broken parts, along with their corresponding 3D meshes, which can be integrated into the simulation environment. This benchmark enables consistent and fair evaluation of robotic geometric assembly policies. Extensive experiments on diverse categories demonstrate the superiority of our method both quantitatively and qualitatively.
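To make the alignment-pose construction above concrete, the following minimal Python sketch derives alignment poses from the ground-truth assembled poses of two fragments: each fragment is pulled back from its assembled pose along a shared collision-free disassembly axis, in opposite directions, leaving a small gap at the seam. This is an illustrative sketch under stated assumptions, not the paper's implementation; all names (alignment_poses, disassembly_axis, gap) are hypothetical, and how the disassembly axis is found (e.g., by trial separation in simulation) is left abstract.

```python
import numpy as np

def alignment_poses(T_a, T_b, disassembly_axis, gap=0.02):
    """Derive alignment poses from assembled poses (illustrative sketch).

    T_a, T_b          : 4x4 homogeneous assembled poses of the two fragments.
    disassembly_axis  : 3-vector along which the parts separate without
                        colliding (assumed known, e.g., found in simulation).
    gap               : distance (meters) each fragment backs off at the seam.
    """
    axis = np.asarray(disassembly_axis, dtype=float)
    axis /= np.linalg.norm(axis)

    def shifted(T, direction):
        # Translate the assembled pose in the world frame along `direction`.
        offset = np.eye(4)
        offset[:3, 3] = gap * direction
        return offset @ T

    # Fragment A backs off along +axis, fragment B along -axis, so the final
    # "fit" motion is a pure translation of the fragments toward each other.
    return shifted(T_a, axis), shifted(T_b, -axis)

# Toy usage: two fragments assembled 10 cm apart along x.
T_a = np.eye(4)
T_b = np.eye(4); T_b[0, 3] = 0.10
Ta_align, Tb_align = alignment_poses(T_a, T_b, disassembly_axis=[1, 0, 0])
```

Under this construction, checking whether the controller can reach the alignment poses without collisions reduces to validating two straight-line translations, which is the feasibility signal the affordance model is extended to capture.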
2. Related Work

2.1. 3D Shape Assembly

Shape assembly is a well-established problem in visual manipulation, with many studies focusing on constructing a complete shape from given parts, typically involving pose prediction of each part for accurate placement (Zhan et al., 2020; Wang et al., 2022a). Further work (Heo et al., 2023; Tian et al., 2022; Jones et al., 2021; Willis et al., 2022; Tie et al., 2025) studied assembly with robotic execution, requiring robots to carry out each step, with benchmarks spanning applications from home furniture assembly (Lee et al., 2021) to factory-based nut-and-bolt interactions (Narang et al., 2022). We categorize these tasks into two types: furniture assembly and geometric assembly. This paper focuses on geometric assembly, where pieces are irregular and lack semantic definitions, such as the fragments of a broken bowl, making categorization difficult. This contrasts with furniture assembly, which involves parts such as nuts, bolts, and screws, each with specific functions and clear categorization.

Previous work on geometric assembly (Sellán et al., 2022; Chen et al., 2022; Lu et al., 2024c) includes Wu et al. (2023c), which learns SE(3)-equivariant part representations by capturing part correlations for assembly, and Lee et al. (2024), which introduces a low-complexity, high-order feature transform layer that refines feature-pair matching. However, these methods primarily focus on synthesizing parts into a cohesive object based on pose considerations alone, without incorporating robotic execution; this is impractical in real-world scenarios, where collisions may occur if the assembly process ignores actions. To tackle this challenge, we introduce a robotic bimanual geometric assembly framework that leverages two robots to collaboratively assemble pieces, enhancing stability in real-world execution.

2.2. Bimanual Manipulation

Bimanual manipulation (Chen et al., 2023; Grannen et al., 2023; Mu et al., 2021; Chitnis et al., 2020; Lee et al., 2015; Xie et al., 2020; Ren et al., 2024b; Liu et al., 2024; 2022; Li et al., 2023; Mu et al., 2024) offers several advantages, particularly in tasks requiring stable control or a wide action space. Current research primarily focuses on planning and collaboration. ACT (Fu et al., 2024; Zhao et al., 2023) introduces a transformer-based encoder-decoder architecture that leverages semantic knowledge from image inputs to predict bimanual actions. PerAct2 (Grotz et al., 2024) learns features at both the voxel and language levels, utilizing shared and private transformer blocks to coordinate two robotic arms based on semantic instructions. However, in tasks rich in geometric complexity, where objects have limited semantic …