
Building an Interoperable Data Lakehouse: Data Strategy for AI Leaders

Information Technology 2026-04-06 Snowflake 黄崇贵-中国医药城15189901173

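Key Points

Data architecture dilemma: Enterprises face data silos, fragmentation and duplicated pipelines, which make data hard to manage and use and hold back AI initiatives.
Interoperable lakehouse: The report proposes a new data architecture, the interoperable lakehouse, which combines the flexibility of a data lake with the governance of a data warehouse and achieves cross-engine interoperability through open table formats such as Apache Iceberg.
Three pillars: The interoperable lakehouse rests on bidirectional interoperability, streamlining for scale and universal governance for AI. Together they give enterprises an open, flexible, scalable and easy-to-manage data platform.
Snowflake's solution: Snowflake offers a set of tools and services, such as Snowflake Horizon Catalog, Cortex Code CLI, Dynamic Tables and zero-copy data integration, to help enterprises build and operate an interoperable lakehouse.
Business cases: Several customer cases, including Goldman Sachs, Affirm and Indeed, illustrate the benefits of the interoperable lakehouse: by adopting Snowflake and Apache Iceberg they gained faster insights, stronger control, lower costs and higher efficiency.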
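The cross-engine interoperability described above can be pictured with a toy model. The sketch below is only an illustration, not Snowflake's or Apache Iceberg's actual API: `ToyCatalog`, `ToyEngine`, `Snapshot` and the `s3://` paths are all hypothetical names invented here. It assumes the general idea behind open table formats, namely that a table is a sequence of immutable snapshots tracked by a shared catalog, so that several engines can read and write the same table without copying data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """An immutable table version: a snapshot id plus the data files it covers."""
    snapshot_id: int
    data_files: tuple[str, ...]


@dataclass
class ToyCatalog:
    """Hypothetical catalog mapping each table name to its snapshot history."""
    tables: dict[str, list[Snapshot]] = field(default_factory=dict)

    def current(self, table: str) -> Snapshot:
        return self.tables[table][-1]

    def commit(self, table: str, base_id: int, new_files: list[str]) -> Snapshot:
        # Optimistic concurrency: reject the commit if another writer got in first.
        head = self.current(table)
        if head.snapshot_id != base_id:
            raise RuntimeError("stale base snapshot; re-read and retry")
        snap = Snapshot(head.snapshot_id + 1, head.data_files + tuple(new_files))
        self.tables[table].append(snap)
        return snap


@dataclass
class ToyEngine:
    """Stands in for any query engine that understands the shared table format."""
    name: str
    catalog: ToyCatalog

    def read(self, table: str) -> tuple[str, ...]:
        return self.catalog.current(table).data_files

    def append(self, table: str, files: list[str]) -> Snapshot:
        base = self.catalog.current(table).snapshot_id
        return self.catalog.commit(table, base, files)


# Two engines share one catalog and therefore one copy of the table.
catalog = ToyCatalog({"orders": [Snapshot(0, ())]})
engine_a = ToyEngine("engine_a", catalog)
engine_b = ToyEngine("engine_b", catalog)

engine_a.append("orders", ["s3://bucket/orders/part-0.parquet"])
engine_b.append("orders", ["s3://bucket/orders/part-1.parquet"])

# Both engines see the same files; nothing was copied between systems.
files_seen_by_a = engine_a.read("orders")
files_seen_by_b = engine_b.read("orders")
```

In this model a write from either engine becomes visible to the other as soon as it is committed, which is the "bidirectional" part of the first pillar. Governance and access control are deliberately left out of the sketch.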

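Key Data

Goldman Sachs shortened its data processing cycle from 15 days to 1 day.
Affirm cut its monthly service costs by a factor of 6 with a high-performance change data capture (CDC) pipeline.
Indeed saves 43-74% by querying Iceberg tables with Snowflake compared with its previous approach.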
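The three figures above use different units (days, a multiplier and a percentage range), which makes them hard to compare directly. The short sketch below converts between them. It assumes that "by a factor of 6" means the new cost is one sixth of the old cost; the function names are illustrative and do not come from the report.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


def factor_to_savings_pct(factor: float) -> float:
    """A cost cut 'by a factor of N' leaves 1/N of the original cost."""
    return (1 - 1 / factor) * 100


def savings_pct_to_factor(savings_pct: float) -> float:
    """Inverse conversion: a saving of p percent leaves (100 - p) percent of the cost."""
    return 100 / (100 - savings_pct)


cycle_reduction = reduction_pct(15, 1)      # Goldman Sachs: 15 days -> 1 day
affirm_savings = factor_to_savings_pct(6)   # Affirm: cost cut by a factor of 6 (assumed meaning)
indeed_low = savings_pct_to_factor(43)      # Indeed: low end of the 43-74% range
indeed_high = savings_pct_to_factor(74)     # Indeed: high end of the 43-74% range

print(f"Cycle time reduction: {cycle_reduction:.1f}%")
print(f"Affirm savings: {affirm_savings:.1f}%")
print(f"Indeed cost ratio: {indeed_low:.2f}x to {indeed_high:.2f}x")
```

Under these assumptions the Goldman Sachs figure corresponds to a cycle-time reduction of roughly 93%, the Affirm figure to savings of roughly 83%, and the Indeed range to costs that are about 1.75 to 3.85 times lower than before.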

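Conclusions

The interoperable lakehouse is key infrastructure for enterprises pursuing an AI strategy.
Snowflake provides a comprehensive solution for building and operating an interoperable lakehouse.
Enterprises that adopt an interoperable lakehouse can gain significant business value, including faster insights, stronger control, lower costs and higher efficiency.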

Table of Contents

Foreword: The AI Moment Demands a Connected Architecture
The Data Architecture Dilemma: Why Enterprises Are Stuck (or Think They Are)
Rethinking Architecture: From Data Silos to the Open, Interoperable Lakehouse
Connecting Data Without Compromise

The AI Moment Demands a Connected Architecture

Businesses today are confronting the AI imperative, the pressure to find measurable value and efficiencies from using AI. They are looking beyond experimentation and onto production, ultimately finding that successful AI doesn't really start with models; it starts with data. That is precisely why the way, [...] a best-of-both-worlds architecture that combines the flexibility to empower each data customer with their preferred setup, while maintaining central data [...] approaches lack the foundational mechanism to let organizations realize full agency over their data.

Bidirectional interoperability means the ability to securely bring any engine to your data or access any data for both read and [...] considerations. It involves standardizing metadata, lineage tracking and quality checks across all data sources, all in an effort to make data consistently clean, curated, labeled, accessible and governed across systems. But in practice, many businesses end up with a fragmented and duplicative data estate that spans systems, clouds and regions, making it difficult to scale and near-impossible to govern. It's not unusual to see companies utilize different data warehouses, [...]

The Data Architecture Dilemma: Why Enterprises Are Stuck (or Think They Are)

[...] for the right pathway to get to the problem itself. In some cases, they just end up building a new path, a short-term fix that often results in brittle, expensive and time-consuming pipelines that rely on copying data. That not only causes needless confusion as to the source of truth but it limits the [...] processes to address their specific needs. Marketing and sales may turn to a data warehouse, while data science prioritizes a data lake.

Adding to the complexity is the large ecosystem of enterprise SaaS tools, like Workday and Salesforce, all generating valuable data businesses need to power their AI [...] proverbial firefighters, just hoping to keep pace with each new blaze before an all-out inferno breaks out. And with complex architectures that sprawl across multiple engines and multiple clouds, these systems can create tremendous operational drag.

[...] operational overhead or a managed solution that forces some degree of vendor lock-in. Either option comes with significant tradeoffs, most notably in reliability and cost.

What legacy lakehouse approaches are missing is actually something attainable: a connected yet open data and governance foundation that provides the flexibility for teams to choose their preferred engines and tools while centralizing governance and semantics.

In this book, we will explore the open and interoperable data lakehouse: as a concept, its origins and some best practices designed to help enterprises succeed with AI, not just react to it. We will go over the three essential pillars holding up this architecture (bidirectional interoperability, streamlining for scale and universal governance for AI), all to demonstrate the near limitless power [...] fewer headaches and lower costs. We'll show how enterprises can gain full ownership of their data estates again, freed from the fears of getting locked into that proverbial maze.

Rethinking Architecture: From Data Silos to the Open, Interoperable Lakehouse

Organizing it all can easily feel like a Sisyphean undertaking, endlessly managing data warehouses for structured data alongside data lakes for unstructured images, videos or documents. In practice, this often created more silos and forced architects into complexity. But then, the lakehouse emerged to help [...]

This type of storage, capable of holding all sorts of data regardless of structure, is called a data lake.

Data warehouse vs. lakehouse vs. data lake

Then, there is the third type: the data lakehouse. To extend the metaphor, this garage has adjustable metal racks along the walls and a few plastic bins that are well organized, but not everything fits neatly into a box, and that's OK. By and large, though, the garage is orderly, with each item having [...]

To help you understand the differences between traditional approaches to a lakehouse and an interoperable lakehouse, we first need to review the three core architectures found in most data systems today. Consider how people use their garages for storage. Think of those hyperorganized folks who have constructed built-in shelving along every wall and have organized all their belongings [...]

As one would imagine, each architecture has its own virtues (even the data lake, with its flexibility and willingness to hold all kinds of stuff!), so it becomes important to assess which is right for any given circumstance. But what's becoming clear for many forward-thinking organizations is that the interoperable lakehouse is proving to be the modern option that de
