
Generative AI and Agent Unstructured Data Essential Guide

Information Technology 2025-12-31 Snowflake

UNSTRUCTURED DATA FOR GENERATIVE AI AND AGENTS ESSENTIAL GUIDE

TABLE OF CONTENTS

Introduction
The New Data Foundation
Getting Your Data Ready for AI

INTRODUCTION

In 2024, the total volume of information generated by humans and machines was estimated to reach 147 zettabytes, or 147 trillion gigabytes. If you converted all that data to recorded speech, it would generate more than 400 million years

We live inside a sea of data. A recent report highlights how much data human beings create in every minute of each day:

• Asking Siri 1 million questions
• Viewing 3.4 million YouTube videos
• Performing 5.9 million Google searches
• Sending 18.8 million text messages

each year. By the end of 2025, global data production is expected to exceed 180 zettabytes. This data represents an enormous untapped opportunity for insights and innovation, once we uncover the right ways to unlock it.

for use in training language models and building innovative AI applications.

individuals and organizations create each day that’s stored on personal devices and network servers, or data used for monitoring systems and

THE NEW DATA FOUNDATION

As much as 90% of global data is unstructured, meaning that this information is not organized in a manner that is easily understood by computers.
Unstructured (or semi-structured) data includes:

• Word processor documents
• Spreadsheets
• PDF files
• Presentation slides
• Chat and instant message logs
• Online forums
• Wiki content
• Photos
• Illustrations
• Screenshots
• Music
• Media metadata
• Medical imaging
• IoT sensor data

Buried inside this unstructured data is a largely untapped trove of potential insights. For example, a consumer products company could scan online feedback, product reviews and social media posts to identify new features that customers are

are organized by schema, a framework that defines what each data field should contain and how different data elements relate to each other. Information that can be broken down into smaller components and labeled is ideal for use in relational

lowers the cost of analyzing unstructured data, unlocking vast troves of enterprise assets that were previously too expensive

years because of its relevance as an asset to fuel generative AI applications and agents. This then opens some of the benefits of AI: specifically, cost efficiency, its ability to get new insights

isn’t easily broken down into smaller components that fit inside row-and-column database structures. That makes it harder to search and analyze the data.

largely manual, complex and costly. It required organizations to translate unstructured data into formats that machines could understand or use specialized tools to extract meaningful insights.

tools and processes before they can be fed into a machine learning model.

unstructured elements but doesn’t conform to the rigid requirements of relational databases. These semi-structured files include things like email messages, spreadsheets,

semi-structured data

Let’s look at the three main types of data that businesses contend with.

including rule-based analytics to utilize unstructured data, with limited success. But the volume, velocity and variety of unstructured formats consistently overwhelmed traditional

address, phone, credit card number and so on. These databases

and thus even more valuable to organizations. Gen AI dramatically

Both unstructured and semi-structured data need additional processing before they can be fed to an AI model. Here’s how the three types of data compare:

A WORD ABOUT HALLUCINATIONS

from perfect.
Sometimes, in an effort to respond to user prompts, a gen AI chatbot will produce authoritative-sounding answers that aren’t actually true. These are known as hallucinations. Techniques like retrieval-augmented generation (RAG) can reduce the

AI agents are changing the way unstructured data is leveraged

Today, most gen AI tools are limited to retrieving information and producing content in response to user prompts. The next step in the evolution of gen AI tools is performing simple tasks on behalf of users without explicit instruction. These tools, called AI agents, can autonomously plan, execute and adapt workflows

This technology can also be deployed to analyze unstructured data. For example, an AI agent could automatically scan social media networks for shifts in customer sentiment, extract key information from contracts and other legal documents, flag internal communications for compliance policy violations, or analyze news reports to predict market movements. As AI agents evolve, more specialized tools will emerge. They will begin to coordinate their actions, handing tasks off from one agent to another. For example, a team of software development agents will closely replicate human workflows: A product manager agent will define project requirements, a software architect agent will design how the system is structured, a developer agent will write the code, another agent will test it, and the final agent will handle

these tools have already been used to automatically:

• Send follow-up emails after meetings
• Process insurance claims
• Analyze code and fix bugs
• Schedule interviews with job candidates

actions for human approval. Over time they may be entrusted to take a majority of actions autonomously, leaving humans to weigh in when recommendations are unclear or the proposed actions

GETTING YOUR DATA READY FOR AI

information, you need to get your data ready for AI. As noted in the previous chapter, AI excels at ide