Xiaoqian Wang, PhD, Assistant Professor, Electrical and Computer Engineering, Purdue University, joywang@purdue.edu
Ghassan AlRegib, PhD, Professor, and Mohit Prabhushankar, PhD, Postdoctoral Fellow, Omni Lab for Intelligent Visual Engineering and Science (OLIVES), School of Electrical and Computer Engineering, Georgia Institute of Technology, {alregib,mohit.p}@gatech.edu
February 26, 2025, Philadelphia, USA
Tutorial materials are accessible online at https://alregib.ece.gatech.edu/courses-and-tutorials/aaai-2025-tutorial/
Contact: {alregib,mohit.p}@gatech.edu, joywang@purdue.edu

Foundation Models: Expectation vs. Reality
Expectation vs. reality of foundation models.

Foundation Models: Segment Anything Model
• The Segment Anything Model (SAM), released by Meta on April 5, 2023, was trained on the Segment Anything 1 Billion (SA-1B) dataset of 1.1 billion high-quality segmentation masks from 11 million images.
• For comparison, semantic segmentation annotation for the Cityscapes dataset took ~90 minutes per image.
Kirillov, Alexander, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, et al. "Segment Anything." arXiv preprint arXiv:2304.02643 (2023).

Foundation Models: 'Trial and Error' Interventions in the Segment Anything Model
(A point-prompting sketch appears at the end of this section.)
[1] Quesada, Jorge, et al. "PointPrompt: A Multi-modal Prompting Dataset for Segment Anything Model." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.

Foundation Models: Vision-Language Models are 'Doomed to Choose'
• Goal: given a long video sequence, vision-language models (VLMs) can process, interpret, and answer questions about it.
• VLMs (and all other deep learning-based systems) are 'doomed to choose': they have no mechanism to determine whether sufficient information is available at inference time.
• Demo created by running inference with the LLaVA-v1.5-13B model on the Daily Activity Recognition (DARai) dataset [1] (see the inference sketch at the end of this section).
[1] Ghazal Kaviani, Yavuz Yarici, Mohit Prabhushankar, Ghassan AlRegib, Mashhour Solh, Ameya Patil, June 12, 2024, "DARai: Daily Activity Recordings for AI and ML applications," IEEE Dataport, doi: https://dx.doi.org/10.21227/ecnr-hy49.

Foundation Models: DARai Dataset
• Vision-language models are sensitive to the granularity of tasks: VLMs (with the encoder finetuned on the dataset) fail when recognizing fine-grained hierarchical activities.
• Other findings: vision-language models are also sensitive to the experimental setup; VLMs (with the encoder finetuned on the dataset) fail when recognizing domain-shifted inputs.

Foundation Models: Debiasing VLMs
• Vision-language models are biased towards societal stereotypes: uncurated training data invariably reflects biases present in society, and utilizing such models in downstream tasks perpetuates those biases.
Jung, Hoin, Taeuk Jang, and Xiaoqian Wang. "A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks." In NeurIPS. 2024.
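The following is a minimal point-prompting sketch for the Segment Anything slides above, assuming Meta's public segment-anything package. The checkpoint filename, image path, and click coordinates are placeholders, and this is a generic single-point prompt, not the PointPrompt protocol of Quesada et al.

```python
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Load a pretrained SAM backbone (checkpoint filename is a placeholder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Read an RGB image (path is hypothetical) and compute its embedding once.
image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground click (label 1 = foreground, 0 = background).
point_coords = np.array([[500, 375]])
point_labels = np.array([1])

# SAM returns several candidate masks with confidence scores; in practice the
# user iterates on the prompt ('trial and error') until the mask is acceptable.
masks, scores, logits = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]
```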
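The 'doomed to choose' behavior above can be approximated with an off-the-shelf VLM. Below is a hedged sketch, assuming the Hugging Face llava-hf/llava-1.5-13b-hf checkpoint and a hypothetical frame sampled from an activity video; it is not the original demo. The prompt forces the model to pick one of the listed activities, and nothing in the pipeline lets it abstain when the frame carries insufficient information.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-13b-hf"  # assumed public checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# A single frame from a long activity video (path is hypothetical).
frame = Image.open("activity_frame.jpg")

# The prompt forces a choice; the model has no built-in way to answer
# "not enough information in this frame".
prompt = (
    "USER: <image>\nWhich activity is shown: cooking, cleaning, "
    "or exercising? Answer with one word. ASSISTANT:"
)
inputs = processor(images=frame, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output_ids = model.generate(**inputs, max_new_tokens=10)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

Whatever frame is supplied, the decoder emits one of the offered labels, which is the failure mode highlighted on the slide.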
Foundation Models: Requirements and Challenges for Deep Learning
Requirements: foundation model-enabled systems must predict correctly and fairly on novel data and explain their outputs.
Novel data sources:
• Test distributions
• Anomalous data
• Out-of-distribution data
• Adversarial data
• Corrupted data
• Noisy data
• New classes
• …
Temel, Dogancan, et al. "CURE-TSD: Challenging Unreal and Real Environments for Traffic Sign Detection." IEEE Transactions on Intelligent Transportation Systems (2017).

Deep Learning at Training: Overcoming Challenges at Training, Part 1
The most novel/aberrant samples should not be used in early training.
• The first instance of training must occur with less informative samples.
• Example: for autonomous vehicles, 'less informative' means highway scenarios, parking, no accidents, and no aberrant events.
Benkert, R., Prabhushankar, M., AlRegib, G., Pacharmi, A., & Corona, E. (2023). "Gaussian Switch Sampling: A Second Order Approach to Active Learning." IEEE Transactions on Artificial Intelligence.

Deep Learning at Training: Overcoming Challenges at Training, Part 2
Subsequent training must not focus only on novel data (a replay-style sketch follows this list).
• Otherwise the model performs well on the new scenarios while forgetting the old ones.
• Several techniques exist to counteract this trend; however, they affect overall performance in large-scale settings.
• It is not always clear if and when to incorporate novel scenarios into training.
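To make the Part 2 point concrete, here is a minimal experience-replay sketch in PyTorch. The dataset names old_scenarios and new_scenarios are assumptions, and this is a generic replay illustration, not the Gaussian Switch Sampling method cited above: a fraction of previously seen data is mixed into every epoch so that old scenarios are not forgotten while novel scenarios are incorporated.

```python
import random
from torch.utils.data import ConcatDataset, DataLoader, Subset

def make_replay_loader(old_scenarios, new_scenarios, replay_fraction=0.3, batch_size=64):
    """Build a loader over the novel data plus a random replay subset of old data.

    old_scenarios / new_scenarios are assumed to be map-style PyTorch Datasets;
    replay_fraction controls how much previously seen data is revisited per epoch.
    """
    n_replay = min(int(replay_fraction * len(new_scenarios)), len(old_scenarios))
    replay_indices = random.sample(range(len(old_scenarios)), k=n_replay)
    mixed = ConcatDataset([new_scenarios, Subset(old_scenarios, replay_indices)])
    return DataLoader(mixed, batch_size=batch_size, shuffle=True)

# Usage (datasets are hypothetical): fine-tune on the mixed loader instead of
# new_scenarios alone, so earlier scenarios are not forgotten.
# loader = make_replay_loader(old_scenarios, new_scenarios)
# for images, labels in loader:
#     ...  # standard training step
```

Choosing the replay fraction, and deciding if and when to trigger such an update at all, remain the open questions raised on the slide.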