Gaurav Chadha
Senior Development Manager
MySQL HeatWave
May 1, 2024

Safe harbor statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.

Data comes in different flavors and volumes

MySQL HeatWave

Process ALL workloads with HeatWave Lakehouse

Lowest cost in industry for data warehouse
Price performance comparison: 10 TB TPC-H
According to 10 TB TPC-H benchmarks as of May 23, 2023. Redshift, Snowflake, Databricks, and BigQuery numbers for 10 TB TPC-H are provided by a third party. Benchmark queries are derived from the TPC-H benchmarks, but results are not comparable to published TPC-H benchmark results since these do not comply with the TPC-H specifications.
Analytic functions: CUBE, HLL
Facilitates migration of non-MySQL workloads

99.5% of collected data remains unused

HeatWave Lakehouse table interface
Easy interface for data in object store as external table
• Provides Lakehouse-specific functionality with existing syntax and is extensible
External source file locations specified in extensible JSON interface
• Files can be distributed across multiple object store buckets
100% compliant with standard MySQL syntax

MySQL Autopilot: Auto Parallel Load in action
Automatically generated from files
DDL to create non-existing DBs
DDL to create non-existing tables
• Using inferred column types
• Length
• Precision
• Setting engines
• Setting engine attribute
• Can extract column names

Same performance for data in DB or in object store
Develop applications with data on object store without any performance impact

HeatWave Lakehouse scales all the way to 500 TB

HeatWave Lakehouse extends support to semi-structured data
• JSON data in CSV, Parquet, and Avro file formats can now be processed by HeatWave
• Support extended to newline-delimited JSON (NDJSON) files
• Ease of parsing and streaming has made it the most popular JSON format
• NDJSON data ingestion and processing scales similarly to structured file formats

{ "name": "Jane", "academics": { "undergraduate": "MIT", "graduate": "UT Austin" }, "age": 24 }
{ "name": "Jill", "academics": { "undergraduate": "Madison", "graduate": "Stanford" }, "age": 27 }
…

JSON acceleration with HeatWave
Query processing and real-time analytics on JSON documents
• Data compressed up to 3X
• Scales across nodes
DMLs propagated in real time

Incremental data load in Lakehouse tables
Features
• Provides 1-to-1 mapping between user data and Lakehouse table data at any point in time
• Only the delta in user data is applied incrementally over existing table data
• Incremental load is triggered manually through a SQL command
• Read-committed & snapshot isolation: queries on Lakehouse tables are never blocked
• Queries run on the version of the data that is committed as of the query start time
• Integrated into existing AutoLoad interface

Scale-out delta ingestion
• Granularity of data update is an object corresponding to thousands of records
• Objects in user buckets can be added, deleted, or updated
• Delta is computed by comparing the current list of objects with the list from the last table load or incremental load
• Delta apply design: treat each object as a new horizontal slice of the table
• Objects added or updated are transformed and ingested in a scale-out manner across the HeatWave cluster, like a table load
• Bulk-inserts scale: HeatPump parallelism at inter-file & intra-file levels
• Objects deleted: fast in-memory operation of dropping a table slice by updating the table version

Partial query execution in HeatWave for data in object store
Execute part of the query in HeatWave, rest in MySQL

HeatWave AutoML: In-database machine learning
• Eliminates tedious and laborious steps
• Simple-to-use interface for beginners or advanced ML users
• Automatically selects algorithm and tunes it
• Explainable model behavior and predictions
• Fast training allows users to iterate quickly and achieve the desired outcome
• ML on data in InnoDB and Object Store (Lakehouse)

Native Vector Processing in MySQL HeatWave
Vector Datatype
• MySQL & HeatWave support new Vector data type
• In-memory hybrid-columnar storage format for vector columns
Vector Processing
• Leverage SIMD instructions for vector processing
• Processes at near memory bandwidth
Data Management
• End-to-end data management including embedding generation
• Integrated with features like in-bound replication

Unstructured data is transformed in HeatWave Vector Store
Automatically generate embedding for text from multiple file formats

Scale out Vector Store creation with HeatWave Lakehouse
Parse source files with OutsideIn (OIT) and concurrent embedding generation across nodes

Exact Nearest Neighbor Search using SQL
SELECT digit, imagena