Augmenting GenAI Workloads on IBM Fusion HCI
Purnanand Kumar / Dipali Chatterjee
San Jose, CA
28 July 2025

Augmenting GenAI Workloads with Content-Aware Storage on IBM Fusion HCI for Scalable, Trustworthy, and Accelerated Enterprise AI

The objective of the Content-Aware Storage (CAS) architecture is to enable smooth interaction between LLMs and large volumes of unstructured data, improving insight derivation and recommendation capabilities.

IBM Fusion CAS uses IBM Storage Scale CNSA to remote-mount a cluster where documents are parsed by NVIDIA NeMo services. Parsed content and metadata are embedded and indexed into a vector database. Watch folders and optional AFM enable scalable, incremental ingestion. CAS provides semantic, keyword, and hybrid search APIs with optional reranking, and these results can be integrated into enterprise RAG pipelines.

Introducing IBM Content-Aware Storage (CAS): a software-defined storage data service that alleviates the knowledge-base challenges of GenAI implementations

IBM CAS combines the power of AI document processing with IBM's AI storage software and research innovations to jointly bring to market a state-of-the-art, storage-based knowledge solution that is:
• Simple: an automated RAG solution that enables GenAI capabilities on unstructured data in any on-prem location.
• Efficient: (a) cost/performance, (b) works with legacy data. Processes only incrementally changed data; high-performance shared storage for data processing; GPU-optimized storage for fast document processing and search.
• Secure: preserves data ACLs; encrypts data for embedding.

Content-Aware Storage (CAS)

Improved performance and reduced cost
• Caches intermediate processing artifacts for later use
• Simplified deployment and enablement of storage for AI that is scalable and turnkey
• Smart detection of metadata versus content changes for processing

Enables rapid data insights
• Augments the storage with a prompt-ready interface for GenAI applications
• Spans disparate data stores across data silos in the storage infrastructure
• Presents a unified view of knowledge by aggregating data pipelines from multiple data sources in your storage infrastructure (file and object)
• Brings more content into the knowledge base in minutes
• Preserves security

Reduces copies of data
• Includes data from legacy storage through Scale AFM

Keeps the knowledge base up to date
• Leverages change tracking to perform incremental updates to the searchable knowledge base
• Keeps the knowledge base in sync with changes in storage

Storage for AI Integrations

Advancing AI: Next Stage of Innovation

A Data-Driven Approach to LLM Optimization Through Online Experimentation
Jimin (Anna) Yoon
20 August 2025

From Master Disks to Daily Deploys: A Historical Analogy
● Recap of 1990s software: slow, centralized, one-shot releases.
● Comparison to early AI development: long offline cycles, little real user feedback.
● Just as software moved to agile and CI/CD, AI is moving to continuous online testing.
● Takeaway: you wouldn't ship code without telemetry or testing; don't do it with AI.

Why AI Needs a New Testing Paradigm
● Offline testing still matters, but it is insufficient alone: costly, slow, and disconnected from user outcomes.
● New foundation models drop week after week; precision without speed is obsolete.
● Evaluating generative AI requires understanding real-world impact: engagement, task success, cost, latency.
● The only way to truly test a generative model is to put it in front of users (see the sketch below).
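To make "put it in front of users" concrete, here is a minimal, hypothetical sketch of serving two LLM configurations as experiment arms: deterministic user bucketing plus per-request logging. The arm names, the experiment name, the call_llm stub, and the log schema are illustrative assumptions, not any specific vendor's SDK.

```python
import hashlib
import json
import time

# Hypothetical experiment arms: each arm is a full LLM configuration
# (model + params + prompt). Names and values are illustrative only.
ARMS = {
    "control":   {"model": "gpt-3.5-turbo", "temperature": 0.2},
    "treatment": {"model": "gpt-4-turbo", "temperature": 0.2},
}

def assign_arm(user_id: str, experiment: str, treatment_pct: int = 10) -> str:
    """Deterministically bucket a user so they always see the same arm."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 100 < treatment_pct else "control"

def call_llm(config: dict, user_message: str) -> tuple[str, int]:
    """Placeholder for a real model call; returns (reply, tokens_used)."""
    return f"[{config['model']} reply]", 42  # stubbed for the sketch

def handle_request(user_id: str, user_message: str) -> str:
    arm = assign_arm(user_id, "helpbot-model-upgrade", treatment_pct=10)
    config = ARMS[arm]
    start = time.monotonic()
    reply, tokens = call_llm(config, user_message)
    latency_ms = (time.monotonic() - start) * 1000
    # Log everything the analysis will need: arm, config, I/O, latency, tokens.
    print(json.dumps({
        "experiment": "helpbot-model-upgrade", "arm": arm,
        "model": config["model"], "input": user_message, "output": reply,
        "latency_ms": round(latency_ms, 1), "tokens": tokens,
    }))
    return reply

handle_request("user-123", "How do I reset my password?")
```

Downstream, per-arm outcome metrics (resolution rate, retention, cost per session) can be joined on the experiment and arm fields; the hash-based bucketing keeps each user's experience consistent across sessions.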
Online Experimentation as the Core Engine
The evaluation flywheel:
● Build an AI feature → test variants (prompt/model/params) → observe behavior → improve.
● How A/B testing can evaluate not just product UX, but LLM system quality and alignment.
● Tradeoffs in real deployments: speed vs. safety, cost vs. performance.
● Key questions teams can answer:
○ Which prompt leads to more retained users?
○ Does upgrading to GPT-4 Turbo increase engagement enough to justify the cost?
○ Does a new reranking logic reduce hallucinations?

Case Studies from the Field
● Example: testing prompt variations in a helpbot for resolution rate.
● Example: evaluating a smaller model's latency vs. quality tradeoff.
● What failed: measuring token-level metrics without understanding user goals.
● Real-life tooling setup (Statsig or others): feature gates, logging, standardized inputs/outputs, cost tracking.

Best Practices and Guardrails
● Embrace progressive rollout; don't go 0 → 100% in production (a rollout sketch follows this outline).
● Log everything: input/output pairs, model config, latency, token count.
● Treat LLMs like experiment arms, not static APIs.
● Use a culture of experimentation to manage risk and improve outcomes.

Final Takeaways
● Online testing is the CI/CD of AI development: fast, safe, user-centric.
● This shift isn't just for big tech: modern tooling (like Statsig) levels the playing field.
● AI product success will depend not on who builds the best prompt, but on who iterates fastest and safest.

Thank You
Anna (Jimin) Yoon
linkedin.com/in/anaanna417/
anna@statsig.co
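The progressive-rollout guardrail mentioned above can be sketched as a staged ramp with automatic rollback. Everything below is an assumption for illustration: the stage percentages, the thresholds, and the read_guardrail_metrics / set_rollout_pct placeholders stand in for whatever metrics store and feature-gating system a team actually uses.

```python
import time

# Illustrative ramp schedule and guardrail thresholds; every number here
# is an assumption for the sketch, not a recommendation.
RAMP_STAGES = [1, 5, 25, 50, 100]          # percent of traffic on the new arm
GUARDRAILS = {"error_rate": 0.02, "p95_latency_ms": 2500}

def read_guardrail_metrics(arm: str) -> dict:
    """Placeholder: in practice, query your metrics store for this arm."""
    return {"error_rate": 0.01, "p95_latency_ms": 1800}  # stubbed values

def guardrails_healthy(metrics: dict) -> bool:
    return all(metrics[name] <= limit for name, limit in GUARDRAILS.items())

def set_rollout_pct(experiment: str, pct: int) -> None:
    """Placeholder: in practice, update the feature gate / experiment config."""
    print(f"{experiment}: new arm now serving {pct}% of traffic")

def progressive_rollout(experiment: str, soak_seconds: int = 3600) -> None:
    """Ramp the new arm stage by stage, rolling back on a guardrail breach."""
    for pct in RAMP_STAGES:
        set_rollout_pct(experiment, pct)
        time.sleep(soak_seconds)            # let metrics accumulate at this stage
        metrics = read_guardrail_metrics("treatment")
        if not guardrails_healthy(metrics):
            set_rollout_pct(experiment, 0)  # roll back rather than push through
            raise RuntimeError(f"Guardrail breach at {pct}%: {metrics}")

progressive_rollout("helpbot-model-upgrade", soak_seconds=1)
```

The point of the staged loop is exactly the talk's warning: exposure only grows after each stage's metrics clear the guardrails, so a regression is caught at 1% or 5% of traffic instead of at 100%.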