[China Mobile] Automatic Fault Monitoring System for Internet TV Services

2023-11-08, China Mobile, Hu ***

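Research Institute of China Mobile Communications Co., Ltd., 2023-11

• Traditional network and network-element quality indicators cannot fully and accurately reflect user experience.
• The relationship between user experience and network quality indicators differs from service to service, and there is no complete indicator system.
• Users' actual video experience cannot be obtained in time; relying on user feedback is subjective and lagging.
• Network packet analysis alone, with no correlation to user experience, has poor accuracy.
• The end-to-end path is long; troubleshooting separately per segment and per domain is inefficient, methods and tools are lacking, and fault demarcation is difficult and slow.
• Fault diagnosis spans many interwoven dimensions and layers, so its complexity is high.
• As of June 2023, the number of online video users reached 1.044 billion, and video is expected to exceed 70% of network traffic.
• Video services take many forms: IPTV, Internet TV, long-form video, short video, live streaming, VR/AR, metaverse, and others.

[Diagram: user experience quality collection and AI fault detection and diagnosis feed a fault monitoring platform. Terminals: cameras, phones/tablets, set-top boxes, VR glasses, computers, smart TVs, smart devices. Customer segments: home, individual, government, industry.]

This system is already used in the Internet TV field and is equally applicable to other video service scenarios.

• Precise capture of key service and network indicators related to user experience
• Fast, accurate intelligent fault monitoring technology

Building a comprehensive video quality evaluation indicator system

• Follows domestic and international standards such as ITU-T, 3GPP, and CCSA.
• Uses big-data modeling and analysis to select the indicators that correlate strongly with user experience.
• A "four layers, six categories" system with more than 300 indicators.

[Diagram: indicator system. The grouping of indicators under each label is not recoverable from the extracted text; the labels and indicators appearing in the diagram are listed below.]
• Indicator types: network quality indicators, service quality indicators, user experience indicators, terminal indicators
• Layer and domain labels: physical/link layer, TCP/IP layer, application layer; home network, access network, content/platform, user terminal
• Network-related indicators: received optical power, PPPoE dial-up count, PON port optical-receive bit error rate, LAN port data bit error rate, WAN port average/peak traffic, negotiated rate, TCP retransmission rate, TCP handshake latency, packet loss rate, Ping latency and jitter, Traceroute latency, DNS resolution latency, DNS resolution success rate, Wi-Fi signal strength, RSRP/SINR, network connection type, type of attached devices, access method of attached devices, devices attached to the smart router
• Service-related indicators: EPG success rate/response latency, M3U8 request latency/success rate, media segment request latency/success rate, RTSP latency, HTTP/RTSP response codes, HTTP download rate, source resolution, bitrate
• Experience-related indicators: share of time with stalling or screen corruption, initial loading time, video playback success rate, seek latency, playback duration
• Terminal-related indicators: CPU usage, RAM usage, share of excessively long online time
• Aggregate scores: video user experience good-or-better rate, video quality score, video playback experience score, video interaction experience score
• Services covered: TV, long-form and short video, live video, metaverse, VR/AR, video calls, ...

• Precisely collects the network indicators associated with user experience degradation.
• Adapted to dozens of vendors, hundreds of device models, and hundreds of millions of user terminals.
• Precisely captures user experience indicators across the full video playback cycle.

Pain point: traditional techniques collect video experience indicators incompletely or imprecisely, which easily leads to missed and false detections.

[Diagram: playback stages. Click to play, first frame, stalling/loading, screen corruption.]

First to propose a root-cause demarcation model built on a cutting-edge "heuristic search" algorithm, with the advantages of precise demarcation and fast retrieval.

Fig. 2. Visualisation of metrics across various devices. (a) shows two performance metrics of three devices from an Internet company, while (b) shows two service metrics of three devices from China Mobile.

[...] applied to all other devices providing the same service. There are essentially two anomaly detection scenarios: Self and Cross. In Self, both the training data and the test data originate from the same device. Conversely, in Cross, the training data and the test data stem from different devices. We directly employ the model (and scaler, threshold, etc.) derived from the training data to monitor the target device without any fine-tuning. In comparison to few-shot learning methods for anomaly detection, there is no need for a substantial period to accumulate new training data for each additional target device. As a result, the unmonitored phase for new devices can be avoided.

Anomaly detection commonly employs unsupervised learning due to the scarcity of failure data samples and the high costs associated with labeling. Over the past few years, various classical unsupervised methods [1]–[6] have been developed. These methods typically treat each timestamp as an individual data point and employ relatively simplistic models to capture the correlation among metrics. Recently, deep learning techniques [7]–[12] have been applied successfully to capture complex temporal-spatial dependencies in time series in a non-linear manner. However, directly applying these methods to the Self and Cross scenarios in an industrial environment still presents challenges. The challenges mainly come from three distinct aspects:

• Substantial noise. Data collected from the real world often contains a significant amount of noise, as depicted in Fig. 2(a). Excessive noise can obscure the typical patterns present in time series.
• Variable period component. Due to the distinct characteristics of the users served, the behavior of each device can vary within each period. For example, in Fig. 2(a), the first device demonstrates a sharp increase followed by a gradual decrease, whereas the second device exhibits two distinct peaks.
• Drifting data distribution. Distribution drift can occur both intra-device and inter-device. Intra-device drift is a common occurrence. As data is collected from a non-stationary environment, temporal patterns can change over time. For instance, as the popularity of a service increases, the request count gradually escalates. Inter-device drift primarily arises from distinct regional characteristics. For instance, as illustrated in Fig. 2(b), the second province, with a larger population, exhibits approximately three times the number of active users compared to the first province, which has a sparser population.

In response to the above challenges, we introduce GenAD, a General Anomaly Detection framework for multivariate time series. GenAD efficiently monitors diverse devices in cross-regional large-scale systems using a monolithic model. In our approach, we begin by employing Singular Value Decomposition (SVD) based information compression to reduce noise. Following that, we utilize window normalization to address distribution drift. To handle the variable distribution and period components, we introduce Metric-Patch-Wise (MPW) embedding to extract high-level features. Within MPW embedding, each metric is divided into multiple patches, and each patch is independently embedded into the feature vectors. Finally, to leverage the observed temporal-spatial consistencies, we incorporate a series prediction task to capture temporal dependency and a metric reconstruction task to capture spatial dependency.

In general, our contributions are summarized as follows:

• We propose a novel time series embedding method, Metric-Patch-Wise, to ensure generalization among devices with variable distribution and period components.
• We integrate the tasks of series prediction and metric reconstruction to model temporal-spatial consistencies among devices across different regions.
• We introduce GenAD, which achieves accurate anomaly detection for all devices providing the same service after being trained solely on a single device in cross-regional large-scale systems.
• We validate the superiority of the proposed method through extensive experiments on two real datasets, from an Internet company and China Mobile.

Innovatively developed a time-series representation method based on multi-dimensional indicators, which models the "temporal-spatial consistency" of cross-province networks and builds inter-province generalization capability.

Pain point 1: during service peak hours, "non-stationary time-series drift" tends to occur and produces false alarms.
Pain point 2: time-series data patterns vary from province to province, so labeling and modeling province by province is costly.
Pain point 3: long processes and multi-layer, multi-dimensional network structures cause a "dimension explosion" that hurts demarcation efficiency.

A deep-learning anomaly detection algorithm dynamically decouples the time series and learns the intrinsic variation patterns of the indicators, addressing the non-stationarity problem.

6 items, 2 items, 3 papers, 7 approved projects

• CN109769131A: A video quality monitoring method and set-top box
• CN109729051A 一种信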
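
The preprocessing steps named in the GenAD excerpt above (SVD-based compression, window normalization, and splitting each metric into patches) can be sketched roughly as follows. This is a minimal illustration that assumes a window shaped (time steps, metrics); the function names, the rank, and the patch length are hypothetical choices, not taken from the report. The learned patch embedding and the prediction and reconstruction tasks are left out.

```python
import numpy as np

def svd_denoise(window: np.ndarray, rank: int) -> np.ndarray:
    """Keep only the top-`rank` singular components of a (time, metrics) window."""
    u, s, vt = np.linalg.svd(window, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

def window_normalize(window: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Z-score every metric using the statistics of this window only,
    so that level and scale differences between devices are removed."""
    mean = window.mean(axis=0, keepdims=True)
    std = window.std(axis=0, keepdims=True)
    return (window - mean) / (std + eps)

def split_patches(window: np.ndarray, patch_len: int) -> np.ndarray:
    """Cut each metric into consecutive non-overlapping patches.
    Returns an array shaped (metrics, n_patches, patch_len)."""
    n_steps, n_metrics = window.shape
    n_patches = n_steps // patch_len
    trimmed = window[: n_patches * patch_len]  # drop the incomplete tail
    return trimmed.T.reshape(n_metrics, n_patches, patch_len)

def preprocess(window: np.ndarray, rank: int = 2, patch_len: int = 8) -> np.ndarray:
    """Denoise, normalize, then patch (rank and patch_len are assumed values)."""
    return split_patches(window_normalize(svd_denoise(window, rank)), patch_len)

if __name__ == "__main__":
    t = np.arange(64)
    s, c = np.sin(2 * np.pi * t / 16), np.cos(2 * np.pi * t / 16)
    clean = np.stack([s, c, s + c], axis=1)  # three metrics, rank 2
    noisy = clean + 0.1 * np.random.default_rng(0).normal(size=clean.shape)
    print(preprocess(noisy).shape)  # (3, 8, 8)
```

Normalizing within each window means a device whose traffic is, for example, three times larger yields the same normalized input, which is one plausible reading of how window normalization helps a single model serve several devices.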