
Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Attacks

Information Technology, 2026-01-08, X***

Thai Duong Nguyen¹, Ngoc-Tan Nguyen¹, Thanh-Dao Nguyen¹, Nguyen Van Huynh², Dinh-Hieu Tran³, and Symeon Chatzinotas³
¹ VNU - University of Engineering and Technology (VNU - UET), Hanoi, Vietnam
² School of Computer Science and Informatics, University of Liverpool, United Kingdom
³ Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Luxembourg

arXiv:2512.08341v1 [cs.NI] 9 Dec 2025

Abstract—The deployment of Unmanned Aerial Vehicle (UAV) swarms as dynamic communication relays is critical for next-generation tactical networks. However, operating in contested environments requires solving a complex trade-off, including maximizing system throughput while ensuring collision avoidance and resilience against adversarial jamming. Existing heuristic-based approaches often struggle to find effective solutions due to the dynamic and multi-objective nature of this problem. This paper formulates this challenge as a cooperative Multi-Agent Reinforcement Learning (MARL) problem, solved using the Centralized Training with Decentralized Execution (CTDE) framework. Our approach employs a centralized critic that uses global state information to guide decentralized actors which operate using only local observations. Simulation results show that our proposed framework significantly outperforms heuristic baselines, increasing the total system throughput by approximately 50% while simultaneously achieving a near-zero collision rate. A key finding is that the agents develop an emergent anti-jamming strategy without explicit programming. They learn to intelligently position themselves to balance the trade-off between mitigating interference from jammers and maintaining effective communication links with ground users.

Index Terms—Multi-Agent Deep Reinforcement Learning (MADRL), Centralized Training with Decentralized Execution (CTDE), UAV, Trajectory Planning, Resource Allocation.

I. INTRODUCTION

Unmanned Aerial Vehicles (UAVs) have emerged as a cornerstone of modern network architectures, poised to provide agile and on-demand connectivity for next-generation systems, ranging from 6G cellular networks to tactical edge computing [1]. In military and disaster-response scenarios, their ability to rapidly deploy as aerial base stations for ground assets, such as Ground Combat Vehicles (GCVs), is transformative. However, these deployments face considerable challenges, including both physical and spectral issues. In particular, a swarm of UAVs must not only navigate complex terrains and avoid collisions but also withstand complex electronic warfare, such as adversarial jamming, which can cripple communication links and compromise mission objectives [2]. This creates a fundamental trade-off between maximizing communication performance and ensuring multi-faceted operational survivability.

Prior research has typically addressed these challenges from two main perspectives. One significant body of work has focused on communication-centric optimization, employing mathematical programming to determine optimal static placements for UAVs that maximize throughput or coverage [3]. While theoretically sound, this static paradigm creates predictable and high-value targets, rendering the network highly vulnerable to kinetic or electronic attacks. Another line of research has pursued survivability through motion, proposing pre-computed or reactive trajectories to evade threats [4]. Yet, such approaches are often rigid, failing to adapt in real-time to the unpredictable mobility of allied ground units or the dynamic nature of emergent threats. The fundamental limitation of these prior works is the tendency to treat these challenges as separable sub-problems. This overlooks the deeply coupled nature of an agent’s physical positioning, its resource allocation, and the overall spectral resilience of the network.

This paper argues that the key to resilient UAV networks lies in learning an optimal behavioral policy rather than finding an optimal position. We propose that robust operational patterns should be an emergent property of a holistic and multi-objective learning process. Accordingly, we formulate the problem within the framework of Multi-Agent Reinforcement Learning (MARL) [5]. We model the UAV swarms as a cooperative team of intelligent agents tasked with learning a complex and decentralized policy that dynamically co-optimizes network performance and survivability in response to real-time environmental feedback. Our approach utilizes the Centralized Training with Decentralized Execution (CTDE) architecture, which is well-suited for this task [6]. It enables the team to learn complex cooperative strategies from global information during an offline training phase, while executing actions based solely on local observations during deployment, thereby removing the need for high-bandwidth inter-agent communication in the field.

The primary contributions of this work are threefold:

• We present a comprehensive MARL formulation for the joint UAV operation problem that inte