OPENING

BUILD OPEN SOURCE COMPATIBLE LAKEHOUSE ON ALIBABA CLOUD
李钰 (绝顶), ASF Member; Apache Celeborn/Flink/HBase/Paimon PMC Member; Head of EMR, Alibaba Cloud Intelligence

AIGC further promotes the explosion of big data
➢ Data Volume: AI drives a massive data explosion that far exceeds the data growth of the previous era.
➢ Data Diversity: Multimodal data processing will become standard for future data processing, including storage, computation, and management.
[Diagram labels: Data Warehouse, Data Lake]

Serverless Spark
Serverless Spark transforms data management with one-stop, fully managed services for seamless development, scheduling, and maintenance. 100% compatible with open-source Spark, and 3X faster with Fusion, an enterprise native engine.

• Native engine supported, 3X faster than open-source Spark
• Enhanced RSS supplies 1.5X throughput for IO-intensive apps

Resilient
• Enterprise remote shuffle service (RSS) solution to support better elasticity
• On-demand and seamless rescaling
• Native integration with DLF and OSS

• Rich Open API supplied for integration
• 100% compatible with open-source usage, at both the API and binary level

Easy to Use
• One-stop data engineering support
• Visualized job and workflow monitoring
• Convenient resource and session management

[Diagram label: Session Management (Resource for Interactive Query)]

Fusion is an enterprise native engine that is 3X faster than the open-source Spark Java engine

Vectorized Execution Engine
• Native operators
• SIMDJson optimization

Fast Columnar Shuffle
• Enterprise RSS based on Apache Celeborn
• Data shuffle reduced by up to 40%

x86 (Intel/AMD) and ARM support
• Hardware-aware optimization
• SVE SIMD acceleration
• zstd-ptg compression acceleration

Native C++ Integration
• OSS-HDFS support
• Deep Parquet and ORC integration
• Paimon, Delta Lake, and Iceberg support

Testing Environment
• 6 d3s.16xlarge ECS servers
• Alibaba Cloud Linux 3
• OpenJDK 1.8.0

RSS removes the dependency on local disk for shuffle data and enables 100% disaggregation of compute and storage

• Apache Top Level Project, donated by Alibaba Cloud
• De facto RSS choice, used by Alibaba, LinkedIn, etc.

• Enterprise security assurance with data encryption
• Enhanced IO scheduling, flow control, and quota management

• Widely adopted in Alibaba, used by both Spark and Flink
• Successfully supports jobs with 600TB+ of shuffle data

• 69% performance boost over YARN external shuffle
• Performance gain increases with shuffle data scale

• Supports Spark DRA
• Supports Spark AQE

Tools
• spark-submit compatible job submission
• Notebook
• Git integration (planned)

Open API
• Workspace
• Job Runs
• SQL Editor
• Workflows

[Diagram labels: Open Source Workflow Integration, Alibaba Cloud Product Integration]

Serverless StarRocks
Serverless StarRocks offers a high-performance, all-scenario, blazing-fast, and unified data lakehouse analytics service. 100% compatible with open-source StarRocks, and 3X faster than traditional OLAP engines (Presto/Trino, ClickHouse, Druid, etc.).

Easy-to-use
• Maintenance-free with a high SLA
• Compatible with the MySQL protocol
• Compatible with multiple BI tools
• Supports slow query diagnosis
• Visual metadata management
• Easy migration with the cluster link tool

Unified
• Multi-dimensional lakehouse analytics with rich lake data format support
• Materialized views and ETL support
• High-concurrency support (10k per sec)
• Real-time data analysis
• Diverse data model support

Fast
• Large-scale data analytics
• SIMD-optimized query engine
• High-speed real-time data ingestion
• Innovative pipeline execution engine
• Full-stack vectorized technology
• Innovative CBO technology

Cloud-native
• Out-of-the-box, minute-level delivery
• Efficient resilience support
• Deep integration with DLF and VVP
• DisAgg and Virtual Warehouse

Fast and unified
• A comprehensive vectorized execution engine and a modernized cost-based optimizer (CBO), with concurrency reaching tens of thousands of queries per second (QPS).
• Fully compatible with data lake formats, offering more than a 3X performance improvement relative to Trino.
• Supports materialized view ELT scenarios, enabling one-step data tier processing.

Separation of storage and compute
• Optimized computational elasticity for on-demand usage, with the potential to reduce storage costs by up to 60%.
• Offers multi-computing cluster capabilities, ensuring resource isolation between different business units without interference.
• Various caching strategies available, allowing customers to flexibly configure according to their business needs.

Use with ease
• Out of the box, the StarRocks Manager offers a wide range of enterprise-level features.
• Intelligent diagnostics and analysis, providing comprehensive analysis in conjunction with customer business operations.

[Diagram labels]
• One-Stop SQL Edit and Query
• Instance Monitor
• Fully Managed
• Extreme Elasticity
• Slow SQL Profile and Diagnose
• Dis-aggregation Support
• One-stop Dev and Analyze
• Lakehouse
• Hierarchy Elasticity
• Lake Query Acceleration
• On-demand Second-level Elasticity with Low Cost
• Comprehensive load analysis and diagnostics
• High Perf: 3x-5x faster than Trino; significantly faster than ClickHouse and Apache Doris
• Maturity
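
The StarRocks slides list compatibility with the MySQL protocol, which in practice means a stock MySQL client can be pointed at the service. The following is a minimal sketch of that idea, not the product's documented procedure: the helper name, hostname, user, and database are hypothetical, and port 9030 is the default frontend query port in open-source StarRocks, so the endpoint of a Serverless StarRocks instance may well differ.

```python
import shlex


def starrocks_mysql_command(host, user, port=9030, database=None):
    """Build the argv for the standard `mysql` CLI aimed at a StarRocks frontend.

    Assumption: 9030 is the open-source StarRocks default query port;
    a managed instance may expose a different endpoint.
    """
    # -p with no value makes the client prompt for the password,
    # so no secret ends up in the command line.
    argv = ["mysql", "-h", host, "-P", str(port), "-u", user, "-p"]
    if database:
        argv += ["-D", database]
    return argv


# Hypothetical endpoint and credentials, for illustration only.
cmd = starrocks_mysql_command(
    "starrocks.example.internal", "analyst", database="sales"
)
print(shlex.join(cmd))
```

Because only the MySQL wire protocol is involved, the same host and port would be what a BI tool or a MySQL driver is configured with.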