
DGP: A Dual Granularity Prompting Framework for Fraud Detection with Graph-Enhanced LLMs


Yuan Li¹, Jun Hu¹, Bryan Hooi¹, Bingsheng He¹, Cheng Chen²
¹National University of Singapore, ²ByteDance Inc.
li.yuan@u.nus.edu, {jun.hu, dcsbhk, dcsheb}@nus.edu.sg, chencheng.sg@bytedance.com

Abstract

Real-world fraud detection applications benefit from graph learning techniques that jointly exploit node features (often rich in textual data) and graph structural information. Recently, Graph-Enhanced LLMs have emerged as a promising graph learning approach that converts graph information into prompts, exploiting LLMs' ability to reason over both textual and structural information. Among them, text-only prompting, which converts graph information to prompts consisting solely of text tokens, offers a solution that relies only on LLM tuning without requiring additional graph-specific encoders. However, text-only prompting struggles on heterogeneous fraud-detection graphs: multi-hop relations expand exponentially with each additional hop, leading to rapidly growing neighborhoods associated with dense textual information. These neighborhoods may overwhelm the model with long, irrelevant content in the prompt and suppress key signals from the target node, thereby degrading performance. To address this challenge, we propose Dual Granularity Prompting (DGP), which mitigates information overload by preserving fine-grained textual details for the target node while summarizing neighbor information into coarse-grained text prompts. DGP introduces tailored summarization strategies for different data modalities (bi-level semantic abstraction for textual fields and statistical aggregation for numerical features), enabling effective compression of verbose neighbor content into concise, informative prompts. Experiments across public and industrial datasets demonstrate that DGP operates within a manageable token budget while improving fraud detection performance by up to 6.8% (AUPRC) over state-of-the-art methods, showing the potential of Graph-Enhanced LLMs for fraud detection.
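The dual-granularity idea described above can be illustrated with a minimal sketch. This is not the paper's actual implementation: it assumes review-style nodes with a `text` field and a numeric `rating`, and simple keyword counting stands in for the LLM-based bi-level semantic abstraction of textual fields, while `mean`/`min`/`max` illustrate the statistical aggregation of numerical features.

```python
from statistics import mean
from collections import Counter

def build_dual_granularity_prompt(target, neighbors):
    """Compose a prompt that keeps the target node verbatim (fine-grained)
    while compressing its neighbors into short summaries (coarse-grained)."""
    # Fine-grained: preserve the target node's full textual detail.
    lines = [f"Target review: \"{target['text']}\" (rating={target['rating']})"]

    # Coarse-grained, numerical fields: statistical aggregation
    # replaces one line per neighbor with a few summary statistics.
    ratings = [n["rating"] for n in neighbors]
    lines.append(
        f"{len(neighbors)} neighbors; rating mean={mean(ratings):.2f}, "
        f"min={min(ratings)}, max={max(ratings)}"
    )

    # Coarse-grained, textual fields: a crude stand-in for semantic
    # abstraction -- surface only the most frequent neighbor keywords.
    words = Counter(w.lower() for n in neighbors for w in n["text"].split())
    top = ", ".join(w for w, _ in words.most_common(3))
    lines.append(f"Frequent neighbor terms: {top}")

    lines.append("Is the target fraudulent? Answer yes or no.")
    return "\n".join(lines)
```

The point of the design is the asymmetry: the prompt length grows only by a constant number of summary lines per neighborhood, rather than linearly in the (exponentially growing) number of multi-hop neighbors.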
arXiv:2507.21653v1 [cs.LG] 29 Jul 2025

1 Introduction

Graph-based fraud detection has emerged as a critical research direction, driven by its effectiveness in capturing the complex relational patterns inherent in real-world data (Xu et al. 2024; Akoglu, Tong, and Koutra 2015; Rayana and Akoglu 2015). The intricate structural properties of graphs, combined with the rich semantic and numerical information on nodes, present unique opportunities and challenges for effectively identifying fraudulent entities. Real-world applications such as anomaly detection in social networks (Chen et al. 2024; Sharma et al. 2018), fake account identification (Li et al. 2022; Hooi et al. 2017), and the detection of malicious user-generated content (Rayana and Akoglu 2015; McAuley and Leskovec 2013) benefit from advanced graph learning techniques.

[...] to 6.8% (AUPRC), demonstrating the effectiveness of our dual-granularity design with reasonable token budgets.

The key contribution of this work is three-fold:

• We propose DGP, a novel graph prompting framework that integrates fine-grained textual details for target nodes with coarse-grained semantic summaries for their neighbors, thereby overcoming limitations faced by existing graph-to-prompt methods.

• We introduce specialized summarization strategies for compressing neighborhoods associated with textual and numerical features into concise, semantically meaningful prompts tailored for LLM processing.

• Extensive experiments on public and industry datasets demonstrate the superior empirical performance of DGP, achieving manageable prompt lengths while improving fraud detection performance by up to 6.8% in AUPRC compared to state-of-the-art approaches.

2 Related Work

Graph-Enhanced LLMs for Fraud Detection. In recent years, various Graph Neural Networks (GNNs) have been proposed for graph-based fraud detection, achieving notable success by leveraging neighborhood information and structural patterns to enhance detection accuracy (Duan et al. 2024; Li et al. 2024). More recently, graph-enhanced Large Language Models (LLMs) have emerged as a promising alternative for graph-based fraud detection tasks, leveraging their generalizable language capabilities and demonstrating competitive performance across a range of tasks (Tang et al. 2024a,b; Liu et al. 2024b). These approaches have shown potential in analyzing the rich semantics associated with fraudulent nodes, as well as the diverse relationships among them (as illustrated in Figure 1a), by exploiting the semantic nuances within the graph (Tang et al. 2024a). Notably, we distinguish these methods from LLM-enhanced GNNs such as TAPE (He et al. 2024) and FLAG (Yang et al. 2025), which incorporate LLM-encoded features and rely heavily on the classification capabilities of GNNs. In this work, we focus on leveraging graph-enhanced LLMs as standalone classifiers to fully explore their potential in graph-based fraud detection.

To bridge the gap between graph-structured data and LLMs, graph-enhanced LLMs transform graph data into tex-