Augmenting GenAI Workloads on IBM Fusion HCI
Purnanand Kumar / Dipali Chatterjee
San Jose, CA
28 July 2025

Augmenting GenAI Workloads with Content-Aware Storage on IBM Fusion HCI for Scalable, Trustworthy, and Accelerated Enterprise AI

The objective of the Content-Aware Storage (CAS) architecture is to enable smooth interaction between LLMs and large volumes of unstructured data, improving insight derivation and recommendation capabilities.

IBM Fusion CAS uses IBM Storage Scale CNSA to remote-mount a cluster where documents are parsed by NVIDIA NeMo services. Parsed content and metadata are embedded and indexed into a vector database. Watch folders and optional AFM enable scalable, incremental ingestion. CAS provides semantic, keyword, and hybrid search APIs with optional reranking, and these results can be integrated into enterprise RAG pipelines.

Introducing IBM Content-Aware Storage (CAS): a software-defined storage data service that alleviates the knowledge-base challenges of GenAI implementations

IBM CAS combines the power of AI document processing with IBM's AI storage software and research innovations to jointly bring to market a state-of-the-art, storage-based knowledge solution that is:
• Simple: an automated RAG solution that enables GenAI capabilities on unstructured data in any on-prem location.
• Efficient: (a) cost/performance, (b) works with legacy data. Processes only incrementally changed data; high-performance shared storage for data processing; GPU-optimized storage for fast document processing and search.
• Secure: preserves data ACLs; encrypts data for embedding.

Content-Aware Storage (CAS)

Improved performance and reduced cost
• Caches intermediate processing artifacts for later use
• Simplified deployment and enablement of storage for AI that is scalable and turnkey
• Smart detection of metadata versus content changes for processing

Enables rapid data insights
• Augments the storage with a prompt-ready interface for GenAI applications
• Spans disparate data stores across data silos in the storage infrastructure
• Presents a unified view of knowledge by aggregating data pipelines from multiple data sources in your storage infrastructure (file and object)
• Brings more content into the knowledge base in minutes
• Preserves security

Reduces copies of data
• Includes data from legacy storage through Scale AFM

Keeps the knowledge base up to date
• Leverages change tracking to perform incremental updates to the searchable knowledge base
• Keeps the knowledge base in sync with changes in storage

Storage for AI Integrations

Advancing AI: Next Stage of Innovation

A Data-Driven Approach to LLM Optimization Through Online Experimentation
Jimin (Anna) Yoon
20 August 2025

From Master Disks to Daily Deploys: A Historical Analogy
● Recap of 1990s software: slow, centralized, one-shot releases.
● Comparison to early AI development: long offline cycles, little real user feedback.
● Just as software moved to agile and CI/CD, AI is moving to continuous online testing.
● Takeaway: you wouldn't ship code without telemetry or testing; don't do it with AI.

Why AI Needs a New Testing Paradigm
● Offline testing still matters, but it is insufficient alone: costly, slow, and disconnected from user outcomes.
● New foundation models drop week after week; precision without speed is obsolete.
● Evaluating generative AI requires understanding real-world impact: engagement, task success, cost, latency.
● The only way to truly test a generative model is to put it in front of users (see the sketch below).
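To make "put it in front of users" concrete, here is a minimal, hypothetical sketch of serving two LLM configurations as experiment arms: deterministic user bucketing plus per-request logging. The arm names, the experiment name, the call_llm stub, and the log schema are illustrative assumptions, not any specific vendor's SDK.

```python
import hashlib
import json
import time

# Hypothetical experiment arms: each arm is a full LLM configuration
# (model + params + prompt). Names and values are illustrative only.
ARMS = {
    "control":   {"model": "gpt-3.5-turbo", "temperature": 0.2},
    "treatment": {"model": "gpt-4-turbo", "temperature": 0.2},
}

def assign_arm(user_id: str, experiment: str, treatment_pct: int = 10) -> str:
    """Deterministically bucket a user so they always see the same arm."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 100 < treatment_pct else "control"

def call_llm(config: dict, user_message: str) -> tuple[str, int]:
    """Placeholder for a real model call; returns (reply, tokens_used)."""
    return f"[{config['model']} reply]", 42  # stubbed for the sketch

def handle_request(user_id: str, user_message: str) -> str:
    arm = assign_arm(user_id, "helpbot-model-upgrade", treatment_pct=10)
    config = ARMS[arm]
    start = time.monotonic()
    reply, tokens = call_llm(config, user_message)
    latency_ms = (time.monotonic() - start) * 1000
    # Log everything the analysis will need: arm, config, I/O, latency, tokens.
    print(json.dumps({
        "experiment": "helpbot-model-upgrade", "arm": arm,
        "model": config["model"], "input": user_message, "output": reply,
        "latency_ms": round(latency_ms, 1), "tokens": tokens,
    }))
    return reply

handle_request("user-123", "How do I reset my password?")
```

Downstream, per-arm outcome metrics (resolution rate, retention, cost per session) can be joined on the experiment and arm fields; the hash-based bucketing keeps each user's experience consistent across sessions.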
Online Experimentation as the Core Engine
The evaluation flywheel:
● Build an AI feature → test variants (prompt/model/params) → observe behavior → improve.
● How A/B testing can evaluate not just product UX, but LLM system quality and alignment.
● Tradeoffs in real deployments: speed vs. safety, cost vs. performance.
● Key questions teams can answer:
○ Which prompt leads to more retained users?
○ Does upgrading to GPT-4 Turbo increase engagement enough to justify the cost?
○ Does a new reranking logic reduce hallucinations?

Case Studies from the Field
● Example: testing prompt variations in a helpbot for resolution rate.
● Example: evaluating a smaller model's latency vs. quality tradeoff.
● What failed: measuring token-level metrics without understanding user goals.
● Real-life tooling setup (Statsig or others): feature gates, logging, standardized inputs/outputs, cost tracking.

Best Practices and Guardrails
● Embrace progressive rollout; don't go 0 → 100% in production (a rollout sketch follows this outline).
● Log everything: input/output pairs, model config, latency, token count.
● Treat LLMs like experiment arms, not static APIs.
● Use a culture of experimentation to manage risk and improve outcomes.

Final Takeaways
● Online testing is the CI/CD of AI development: fast, safe, user-centric.
● This shift isn't just for big tech: modern tooling (like Statsig) levels the playing field.
● AI product success will depend not on who builds the best prompt, but on who iterates fastest and safest.

Thank You
Anna (Jimin) Yoon
linkedin.com/in/anaanna417/
anna@statsig.co
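The progressive-rollout guardrail mentioned above can be sketched as a staged ramp with automatic rollback. Everything below is an assumption for illustration: the stage percentages, the thresholds, and the read_guardrail_metrics / set_rollout_pct placeholders stand in for whatever metrics store and feature-gating system a team actually uses.

```python
import time

# Illustrative ramp schedule and guardrail thresholds; every number here
# is an assumption for the sketch, not a recommendation.
RAMP_STAGES = [1, 5, 25, 50, 100]          # percent of traffic on the new arm
GUARDRAILS = {"error_rate": 0.02, "p95_latency_ms": 2500}

def read_guardrail_metrics(arm: str) -> dict:
    """Placeholder: in practice, query your metrics store for this arm."""
    return {"error_rate": 0.01, "p95_latency_ms": 1800}  # stubbed values

def guardrails_healthy(metrics: dict) -> bool:
    return all(metrics[name] <= limit for name, limit in GUARDRAILS.items())

def set_rollout_pct(experiment: str, pct: int) -> None:
    """Placeholder: in practice, update the feature gate / experiment config."""
    print(f"{experiment}: new arm now serving {pct}% of traffic")

def progressive_rollout(experiment: str, soak_seconds: int = 3600) -> None:
    """Ramp the new arm stage by stage, rolling back on a guardrail breach."""
    for pct in RAMP_STAGES:
        set_rollout_pct(experiment, pct)
        time.sleep(soak_seconds)            # let metrics accumulate at this stage
        metrics = read_guardrail_metrics("treatment")
        if not guardrails_healthy(metrics):
            set_rollout_pct(experiment, 0)  # roll back rather than push through
            raise RuntimeError(f"Guardrail breach at {pct}%: {metrics}")

progressive_rollout("helpbot-model-upgrade", soak_seconds=1)
```

The point of the staged loop is exactly the talk's warning: exposure only grows after each stage's metrics clear the guardrails, so a regression is caught at 1% or 5% of traffic instead of at 100%.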