Speaker: Chen Junjie (陈俊杰), Senior R&D Engineer, Tencent

Part 01: Advanced Features from the Iceberg Community

Branch and Tag

    -- Create a branch/tag for a table
    ALTER TABLE table CREATE TAG/BRANCH tagName
      [AS OF {VERSION snapshotId}]
      [RETAIN interval {DAYS | HOURS | MINUTES}]

    -- Read from a branch
    SELECT * FROM table BRANCH/TAG branch_name

    -- Insert into a branch
    INSERT INTO table BRANCH branch_name SELECT ...

New Table API

    createBranch(String name, long snapshotId);
    createTag(String name, long snapshotId);

Snapshot lineage with branches and a tag:

    A -> B -> C (master)
    \ (tag1)
      D -> E (archive branch)
      \
        F -> G (test branch)

Reading from and writing to a branch in Spark:

    spark().read()
        .format("iceberg")
        .option("branch", branchName)
        .load(table)

    spark().write()
        .format("iceberg")
        .option("branch", branchName)
        .mode(SaveMode.Append)
        .save(table)

Puffin format

A file format designed to store information such as indexes and statistics about data managed in an Iceberg table that cannot be stored directly within the Iceberg manifest.

    public interface UpdateStatistics extends PendingUpdate<List<StatisticsFile>> {
      /**
       * Remove the table's statistics file for given snapshot.
       *
       * @return this for method chaining
       */
      UpdateStatistics removeStatistics(long snapshotId);
    }

Statistics

● Table statistics
    ● Number of rows
    ● Number of distinct values in a column
    ● The fraction of NULL values in a column
    ● Min/max value in a column
    ● The average data size of a column
● How do statistics help the CBO?

View

A view is a logical table that can be referenced by future queries. The Iceberg view definition standardizes the view metadata so that views can be shared easily across engines.

Part 02: New Scenarios Unlocked by Advanced Iceberg Features

Scenario unlocked by branches (1): CDC ingestion into the lake

Write raw CDC events to the change branch, and produce a changelog feed from the branch.
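To make the change-branch idea concrete, here is a minimal in-memory sketch in Java of how a current snapshot could be derived from a raw change feed: keep the event with the highest transaction id per key, then drop keys whose latest event is a delete. It uses neither Iceberg nor Spark. The class `ChangeFeedSketch`, the `ChangeEvent` fields, and the sample data are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for rows stored on a "changes" branch; not an Iceberg API.
final class ChangeFeedSketch {

    /** One raw CDC event. Field names are illustrative assumptions. */
    static final class ChangeEvent {
        final long txId;         // incremental transaction id (or timestamp)
        final String operation;  // "insert", "update" or "delete"
        final long id;           // primary key of the changed row
        final String name;       // a single payload column, for brevity

        ChangeEvent(long txId, String operation, long id, String name) {
            this.txId = txId;
            this.operation = operation;
            this.id = id;
            this.name = name;
        }

        @Override
        public String toString() {
            return "ChangeEvent{txId=" + txId + ", op=" + operation
                + ", id=" + id + ", name=" + name + "}";
        }
    }

    /** Returns the latest non-deleted event per key, ordered by key. */
    static List<ChangeEvent> snapshotOf(List<ChangeEvent> changes) {
        // Keep only the event with the highest txId for each key.
        Map<Long, ChangeEvent> latestPerKey = new TreeMap<>();
        for (ChangeEvent event : changes) {
            latestPerKey.merge(event.id, event,
                (kept, candidate) -> candidate.txId > kept.txId ? candidate : kept);
        }
        // A key whose latest event is a delete is absent from the snapshot.
        List<ChangeEvent> snapshot = new ArrayList<>();
        for (ChangeEvent event : latestPerKey.values()) {
            if (!"delete".equals(event.operation)) {
                snapshot.add(event);
            }
        }
        return snapshot;
    }

    public static void main(String[] args) {
        List<ChangeEvent> changes = Arrays.asList(
            new ChangeEvent(1L, "insert", 1L, "alice"),
            new ChangeEvent(2L, "insert", 2L, "bob"),
            new ChangeEvent(3L, "update", 1L, "alice-v2"),
            new ChangeEvent(4L, "delete", 2L, "bob"));
        for (ChangeEvent event : snapshotOf(changes)) {
            System.out.println(event);
        }
    }
}
```

Because the raw events stay untouched, the same feed can still serve downstream consumers that need the full changelog, while readers who only want current state see the deduplicated result.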
    -- Create a snapshot view for users
    CREATE VIEW users AS
    SELECT
      user_cols.*,  -- the columns of the original table
      txId          -- the incremental transaction id, or timestamp
    FROM (
      SELECT
        ROW_NUMBER() OVER (PARTITION BY row.id ORDER BY txId DESC) AS row_number,
        operation,
        txId,
        row AS user_cols
      FROM users BRANCH changes
    )
    WHERE row_number = 1 AND operation != 'delete'

    MERGE INTO users BRANCH optimized AS t
    USING incr_changes AS s
    ON s.id = t.id
    WHEN MATCHED [AND (time cond)] THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *

Scenario unlocked by branches (2): multi-stream stitching

Write partial inserts to one branch, and merge them incrementally into the merged branch.

    // step 1: define window
    WindowSpec windowSpec = Window.partitionBy(primaryKey)
        .orderBy(functions.desc(orderColumn))
        .rangeBetween(Window.unboundedPreceding(), Window.unboundedFollowing());

    // step 2: compact via window aggregations
    //   Primary key  -> col(key column)
    //   Order column -> max(order column)
    //   Data column  -> first(data column, true)

    // step 3: merge into the target branch
    MERGE INTO table BRANCH optimized AS t
    USING aggDf AS s
    ON t.key = s.key
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *

Scenario unlocked by Puffin (1): asynchronous statistics building

Store table stats.

Scenario unlocked by Puffin: index building

Scenario unlocked by views: materialized views (MV)

A materialized view is a pre-computed data set derived from a query specification (the SELECT in the view definition) and stored for later use.

Part 03: New Iceberg Features in Practice at Tencent

CBO

● Build table statistics asynchronously, and update partition-level statistics incrementally via theta sketch.

Indexing

● Async indexing, with support for Bloom filter and bitmap indexes

    CREATE INDEX index_name ON [TABLE] table_name
      USING BLOOMFILTER ( { colName1 [ options ] } [, ...] )
      [ options ]

    options: OPTIONS ( { key1 [ = ] val1 } [, ...] )

Authorization

● Thousands of columns in a table
● Different departments focus on separate columns
● Use an authorized view instead of the table

A/B testing

● Async indexing upon query analysis
● Async z-order clustering upon query analysis
● Effect validation on the branch

Thanks for watching!