[Seoul National University & Hanwha Ocean] Artificial Intelligence in Combat Decision-making: Weapon Target Assignment via Reinforcement Learning and Graph Neural Networks


Seung Heon Oh¹, Geon Woong Byeon¹, Young In Cho¹, Seungmin Kwon¹, and Jong Hun Woo¹
¹Affiliation not available
March 04, 2025

Abstract
Selecting a threat to attack is one of the most important decisions on the battlefield. The decision problem is represented as a Weapon-Target Assignment (WTA) problem. In previous studies, dynamic programming, linear programming, metaheuristics, and heuristic methods have been applied to solve this problem. However, these studies have been limited by oversimplified models, computational burden, lack of adaptability to disruptive events, and the need for recalculation when the problem size changes. To overcome these limitations, this study aims to solve WTA by using reinforcement learning and graph neural networks. The proposed method is highly practical because it reflects the real-world decision-making framework, the OODA loop (Observe-Orient-Decide-Act). Experiments are conducted in various environments, and the effectiveness of the proposed method is demonstrated by comparing it with existing heuristic and meta-heuristic methodologies. The proposed method introduces a groundbreaking methodology for intelligent decision-making in tactical command and control, traditionally considered exclusive to human experts.

Artificial Intelligence in Combat Decision-making: Weapon Target Assignment via Reinforcement Learning and Graph Neural Networks
Seung Heon Oh, Geon Woong Byeon, Young In Cho, Seungmin Kwon and Jong Hun Woo

Abstract—Selecting a target to attack is one of the most critical decisions on the battlefield. The decision problem is represented as a dynamic weapon-target assignment (DWTA) problem. While deep reinforcement learning (DRL) is the state-of-the-art approach for DWTA, previous studies have limitations in three key aspects: representing topological relationships on the battlefield, scalability to increased problem sizes, and the practicality of the objective function. To overcome these limitations, this study aims to solve the DWTA problem by leveraging DRL and graph neural networks (GNN), with a novel partially observable Markov decision process (POMDP) design including graph-based action representation, observation features, and reward structuring. Experiments are conducted across multiple military domains, including naval and ground combat, comparing the proposed approach with existing heuristic and meta-heuristic methodologies. The effectiveness of the GNN and the decision-making pattern is extensively analyzed through comprehensive experimental validation.

Index Terms—Weapon Target Assignment Problem, Reinforcement Learning, Graph Neural Network

solutions based solely on initial conditions. [4] applied a greedy algorithm, and [5] utilized stochastic programming to solve WTA problems, while [6] combined greedy heuristics with nonlinear network flow. However, these open-loop approaches have limitations in adapting to rapidly evolving combat situations. They require computationally expensive replanning to react to unpredictable or stochastic events such as new threat insertion, decoying events, or target hits. Their computational inefficiency contradicts the OODA loop's rapid decision-making principle.

To address this issue, the closed-loop approach performs real-time decision making, which includes methods such as exact, two-stage, and heuristic approaches. Exact methods like dynamic programming [7] and mixed-integer linear programming [8] adopt state-based sequential decision-making. Despite their optimality guarantee, they face the curse of dimensionality and a heavy computational burden. Meta-heuristics offer efficient alternatives, with [3] combining constructive heuristics and tabu search, and [9] applying a genetic algorithm (GA). Adopting anytime frameworks, meta-heuristic methods gradually improve solutions until reaching user-defined time limits, allowing real-time implementation. Two-stage approaches decompose WTA into sequencing and assignment problems to enhance computational efficiency. [10] adopts the Hungarian algorithm for assignment and particle swarm optimization for sequencing. Both meta-heuristic and two-stage methods remain sensitive to computation time and problem scale. Heuristic approaches [11], [12], [13] provide quick adaptation with minimal computation, despite suboptimal solutions. Notably, recent studies [13], [14] emphasize the integration of high-fidelity wargame simulations to enhance real-world applicability beyond lab-scale combinatorial optimization research.

Deep Reinforcement Learning (DRL) approaches, includ-

I. Introduction
Combat commanders must make decisions under extreme uncertainty, which stems from incomplete enemy information and unpredictable events. The OODA (Observe-Orient-Decide-Act) loop emphasizes that combat commanders must rapidly adapt their decision-making to evolving battlefield conditions through cyclic information processing and action under uncertainty. Weapon Target Assignment (WTA), a key element in combat decision