
Artificial Intelligence in Combat Decision-making: Weapon Target Assignment via Reinforcement Learning and Graph Neural Networks

Information Technology | 2025-03-04 | Seoul National University & Hanwha Ocean | 华***

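AI Summary

Key Points

This report addresses the weapon-target assignment (WTA) problem in battlefield environments and proposes an intelligent decision-making model based on deep reinforcement learning (DRL) and graph neural networks (GNN), with the aim of maximizing the platform's survival probability. Traditional WTA methods are limited by oversimplified models, heavy computational burden, and poor adaptability, while existing DRL methods fall short in representing battlefield topological relationships, in scalability, and in the practicality of the objective function.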

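Research Method

The WTA problem is modeled as a partially observable Markov decision process (POMDP) and validated in a high-fidelity wargame simulation environment.
The POMDP design comprises observation features, a graph-based action representation, and a reward structure.
A GNN-DRL model is proposed in which the GNN strengthens the representation of battlefield topological relationships and DRL optimizes the decision policy.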
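
To make the idea of a graph-based action representation more concrete, the following is a minimal sketch and not the authors' architecture. It assumes a bipartite graph whose nodes are weapons and targets and whose edges are the feasible engagements; every name, feature, and number here is hypothetical. One round of mean-aggregation message passing updates the node features, and each edge is then scored so that the policy is a distribution over edges. A trained model would use learned weights in place of the fixed 50/50 mix and the plain dot product.

```python
import math


def build_neighbors(edges):
    """Undirected adjacency lists from (weapon, target) edges."""
    neighbors = {}
    for w, t in edges:
        neighbors.setdefault(w, []).append(t)
        neighbors.setdefault(t, []).append(w)
    return neighbors


def mean_aggregate(features, neighbors):
    """One synchronous round of mean-aggregation message passing."""
    updated = {}
    for node, feat in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            agg = [sum(features[n][k] for n in nbrs) / len(nbrs)
                   for k in range(len(feat))]
        else:
            agg = [0.0] * len(feat)
        # Assumption: fixed 50/50 mix instead of learned weight matrices.
        updated[node] = [0.5 * own + 0.5 * msg for own, msg in zip(feat, agg)]
    return updated


def edge_action_distribution(features, edges):
    """Score every feasible edge, then softmax over edges to get a policy."""
    scores = [sum(a * b for a, b in zip(features[w], features[t]))
              for w, t in edges]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {edge: e / total for edge, e in zip(edges, exps)}


# Hypothetical 2-dimensional node features (illustrative values only).
features = {
    "w0": [0.8, 0.2], "w1": [0.3, 0.9],
    "t0": [0.6, 0.1], "t1": [0.2, 0.7], "t2": [0.5, 0.5],
}
# Feasible engagements: each edge is one candidate action.
edges = [("w0", "t0"), ("w0", "t2"), ("w1", "t1"), ("w1", "t2")]

hidden = mean_aggregate(features, build_neighbors(edges))
policy = edge_action_distribution(hidden, edges)
```

Because the action set is simply the edge list, the same functions run unchanged when weapons or targets are added or removed, which is one plausible reason a graph formulation helps with changing problem sizes.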

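Key Data

Experiments cover multiple military domains, including naval and ground combat.
The performance metric is the complete survival probability (PCS).
The model is compared against nine baseline methods, including deterministic and stochastic policies.
GNN models of different depths are compared, and t-SNE visualizations are analyzed.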
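
This summary does not define PCS. As a rough illustration of what a survival-probability objective can look like, and not the paper's definition, suppose engagements are independent, $x_{ij} \in \{0,1\}$ indicates that interceptor type $i$ fires at threat $j$, $p_{ij}$ is a single-shot hit probability, $k_j$ is the probability that a leaking threat $j$ destroys the platform, and $c_i$ is the inventory of interceptor type $i$. All of these symbols are assumptions introduced here.

```latex
% Probability that threat j survives every shot assigned to it
q_j(x) = \prod_{i} (1 - p_{ij})^{x_{ij}}

% Probability that the platform survives all threats
P_{\text{surv}}(x) = \prod_{j} \bigl(1 - k_j \, q_j(x)\bigr)

% Inventory constraint for each interceptor type
\sum_{j} x_{ij} \le c_i \quad \text{for all } i
```

Under these assumptions, maximizing $P_{\text{surv}}$ trades off shots across threats: an extra shot at threat $j$ multiplies $q_j$ by $(1 - p_{ij})$, so its benefit shrinks as $q_j$ gets smaller.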

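Conclusions

The GNN-DRL model outperforms the baseline methods in generalization and scalability.
Incorporating the GNN significantly improves performance.
The learned decision policy accounts for interceptor inventory status and distance-dependent hit probability, producing a sophisticated engagement strategy.
Future work will integrate hyper-heuristic-based DRL and explore cooperative multi-agent reinforcement learning frameworks.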

Seung Heon Oh¹, Geon Woong Byeon¹, Young In Cho¹, Seungmin Kwon¹, and Jong Hun Woo¹
¹Affiliation not available
March 04, 2025

Abstract

Selecting a threat to attack is one of the most important decisions on the battlefield. The decision problem is represented as a Weapon-Target Assignment (WTA) problem. In previous studies, dynamic programming, linear programming, metaheuristics, and heuristic methods have been applied to solve this problem. However, previous studies have been limited by oversimplified models, computational burden, lack of adaptability to disruptive events, and recalculation when the problem size changes. To overcome these limitations, this study aims to solve WTA by using reinforcement learning and graph neural networks. The proposed method has high practicality by reflecting the real-world decision-making framework, the OODA loop (Observe-Orient-Decide-Act). Experiments are conducted in various environments, and the effectiveness of the proposed method is demonstrated by comparing it with existing heuristic and meta-heuristic methodologies. The proposed method introduces a groundbreaking methodology for intelligent decision-making in tactical command and control, traditionally considered exclusive to human experts.

Artificial Intelligence in Combat Decision-making: Weapon Target Assignment via Reinforcement Learning and Graph Neural Networks

Seung Heon Oh, Geon Woong Byeon, Young In Cho, Seungmin Kwon and Jong Hun Woo

Abstract: Selecting a target to attack is one of the most critical decisions on the battlefield. The decision problem is represented as a dynamic weapon-target assignment (DWTA) problem. While deep reinforcement learning (DRL) is the state-of-the-art approach for DWTA, previous studies have limitations in three key aspects: representing topological relationships on the battlefield, scalability to increased problem sizes, and the practicality of the objective function. To overcome these limitations, this study aims to solve the DWTA problem by leveraging DRL and graph neural networks (GNN), with a novel partially observable Markov decision process (POMDP) design including graph-based action representation, observation features, and reward structuring. Experiments are conducted across multiple military domains, including naval and ground combat, comparing the proposed approach with existing heuristic and meta-heuristic methodologies. The effectiveness of the GNN and the decision-making pattern is extensively analyzed through comprehensive experimental validation.

Index Terms: Weapon Target Assignment Problem, Reinforcement Learning, Graph Neural Network

I. Introduction

Combat commanders must make decisions under extreme uncertainty, which stems from incomplete enemy information and unpredictable events. The OODA (Observe-Orient-Decide-Act) loop emphasizes that combat commanders must rapidly adapt their decision-making to evolving battlefield conditions through cyclic information processing and action under uncertainty. Weapon Target Assignment (WTA), a key element in combat decision [...]

[...] solutions based solely on initial conditions. [4] applied a greedy algorithm, and [5] utilized stochastic programming to solve WTA problems, while [6] combined greedy heuristics with nonlinear network flow. However, these open-loop approaches have limitations in adapting to rapidly evolving combat situations. They require computationally expensive replanning to react to unpredictable or stochastic events such as new threat insertion, decoying events, or target hits. Their computational inefficiency contradicts the OODA loop's rapid decision-making principle.

To address this issue, the closed-loop approach performs real-time decision making, which includes methods such as exact, two-stage, and heuristic approaches. Exact methods like dynamic programming [7] and mixed-integer linear programming [8] adopt state-based sequential decision-making. Despite their optimality guarantee, they face the curse of dimensionality and computational burden. Meta-heuristics offer efficient alternatives, with [3] combining constructive heuristics and tabu search, and [9] applying a genetic algorithm (GA). Adopting anytime frameworks, meta-heuristic methods gradually improve solutions until reaching user-defined time limits, allowing real-time implementation. Two-stage approaches decompose WTA into sequencing and assignment problems to enhance computational efficiency. [10] adopts the Hungarian algorithm for assignment and particle swarm optimization for sequencing. Both meta-heuristic and two-stage methods remain sensitive to computation time and problem scale. Heuristic approaches [11], [12], [13] provide quick adaptation with minimal computation, despite suboptimal solutions. Notably, recent studies [13], [14] emphasize the integration of high-fidelity wargame simulations to enhance real-world applicability beyond lab-scale combinatorial optimization research.

Deep Reinforcement Learning (DRL) approaches, includ- [...]
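
The open-loop greedy baselines mentioned in the excerpt can be pictured with a small sketch. This is a generic textbook marginal-gain greedy for static WTA, not necessarily the algorithm of [4]; the function name, the threat-value objective, and all numbers are assumptions made for illustration. At each step it fires the single shot that most reduces the expected surviving threat value, and it plans once from the initial conditions, which is exactly the property the excerpt criticizes.

```python
def greedy_wta(hit_prob, threat_value, rounds_available):
    """Greedy static weapon-target assignment (illustrative sketch).

    hit_prob[i][j]      : probability that one shot from weapon i destroys target j
    threat_value[j]     : value (danger) of target j
    rounds_available[i] : number of shots weapon i may fire
    Returns the firing plan and each target's final survival probability.
    """
    survive = [1.0] * len(threat_value)  # current survival probability per target
    rounds = list(rounds_available)
    plan = []
    while True:
        best, best_gain = None, 0.0
        for i, left in enumerate(rounds):
            if left == 0:
                continue  # weapon i has no rounds left
            for j, value in enumerate(threat_value):
                # Expected threat value removed by one more shot from i at j.
                gain = value * survive[j] * hit_prob[i][j]
                if gain > best_gain:
                    best, best_gain = (i, j), gain
        if best is None:
            break  # no rounds left, or no shot yields positive gain
        i, j = best
        plan.append(best)
        survive[j] *= 1.0 - hit_prob[i][j]
        rounds[i] -= 1
    return plan, survive


# Hypothetical scenario: two weapons with one round each, two targets.
hit_prob = [[0.9, 0.4],
            [0.5, 0.8]]
threat_value = [10.0, 6.0]
plan, survive = greedy_wta(hit_prob, threat_value, rounds_available=[1, 1])
print(plan)  # [(0, 0), (1, 1)]
```

If a new threat appeared after planning, this function would have to be rerun from scratch on the updated inputs, which is the replanning cost that closed-loop and learned policies aim to avoid.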

