Zhenyu Bi¹, Minghao Xu², Jian Tang², Xuan Wang¹
¹Department of Computer Science, Virginia Tech, USA
²Mila - Quebec AI Institute, Canada

About the Tutors
•Zhenyu Bi, PhD Student, Department of Computer Science, Virginia Tech
•Minghao Xu, PhD Student, Mila - Quebec AI Institute
•Xuan Wang, Assistant Professor, Department of Computer Science, Virginia Tech
•Jian Tang, Associate Professor, Mila - Quebec AI Institute

Tutorial Outline
•9:00 am–9:10 am: Introduction
•9:10 am–10:00 am: Part I: Scientific Text
•10:00 am–10:10 am: Break
•10:10 am–11:10 am: Part II: Brain Signals
•11:10 am–11:20 am: Break
•11:20 am–12:20 pm: Part III: Biological Sequences
•12:20 pm–12:30 pm: Summary and Q&A

AI for Sciences
Large Language Models (LLMs):
•BERT (Devlin et al., 2019)
•GPT-3 (Brown et al., 2020)
•T5 (Raffel et al., 2020)
Applications: Machine Translation; Dialog Systems, Chatbots, Digital Assistants; Natural Language Generation

Resemblance Between Scientific Data and Language
•Sequences!
•Scientific Textual Data: Scientific Literature, Electronic Health Records
•Sensor Data: Brain Electroencephalogram (EEG) Signals
•Biological Sequences: DNA, RNA, protein

This Tutorial: Can we harness the potential of these recent LLMs to drive scientific progress?

Scientific Large Language Models: Challenges and Opportunities
Xuan Wang, Assistant Professor, Department of Computer Science, Sanghani Center for AI and Data Analytics, Virginia Tech

Outline
•Scientific Large Language Models
•Future Directions:
  •Complex Reasoning and Planning
  •Multi-modal Learning
  •Trustworthiness of LLMs

Large Language Models (LLMs)
Yang, J., Jin, H., Tang, R., Han, X., Feng, Q., Jiang, H., ... & Hu, X. (2024). Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond. ACM Transactions on Knowledge Discovery from Data, 18(6), 1-32.
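The "sequences" analogy above can be made concrete: biological sequences can be tokenized and mapped to integer ids much like natural-language text, which is why LLM architectures transfer to DNA, RNA, and protein data. The sketch below is illustrative only (not from the tutorial) and uses overlapping k-mer tokenization, a common choice in DNA language models; the function names are our own.

```python
# Illustrative sketch: tokenizing a DNA sequence the way an LLM
# tokenizer handles text, via overlapping k-mers (stride 1).

def kmer_tokenize(sequence: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Map each distinct token to an integer id, as a tokenizer vocabulary would."""
    vocab: dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

dna = "ATGCGTAC"
tokens = kmer_tokenize(dna, k=3)
ids = [build_vocab(tokens)[t] for t in tokens]
print(tokens)  # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC']
print(ids)     # [0, 1, 2, 3, 4, 5]
```

The resulting id sequence is exactly the kind of input a Transformer language model consumes, whether the underlying symbols are words or nucleotides.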
Non-comprehensive evolutionary tree for protein/RNA/DNA language models
Jian Ma, "Large Language Models in Computational Biology – A Primer (2024 Update)", 2024.

A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery (Zhang et al., EMNLP 2024)
•Surveys over 260 scientific LLMs
•Across fields: 1) general science, 2) mathematics, 3) physics, 4) chemistry and materials science, 5) biology and medicine, and 6) geography, geology, and environmental science
•Across modalities: 1) text, 2) graph, 3) vision, and 4) time series
•Website: https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models

Towards Expert-Level Medical Question Answering with Large Language Models (Med-PaLM 2, Google, 2023)
Figure 1 | Med-PaLM 2 performance on MultiMedQA. Left: Med-PaLM 2 achieved an accuracy of 86.5% on USMLE-style questions in the MedQA dataset. Right: In a pairwise ranking study on 1066 consumer medical questions, Med-PaLM 2 answers were preferred over physician answers by a panel of physicians across eight of nine axes in our evaluation framework.
Singhal, K., Tu, T., Gottweis, J., Sayres, R., Wulczyn, E., Hou, L., ... & Natarajan, V. (2023). Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617.
OpenAI o1 Surpasses Human Performance on PhD-Level Science Questions (OpenAI, 2024)

Outline
•Scientific Large Language Models
•Future Directions:
  •Complex Reasoning and Planning
  •Multi-modal Learning
  •Trustworthiness of LLMs

Evaluation and Mitigation of the Limitations of Large Language Models in Clinical Decision-Making (Hager et al., Nature Medicine 2024)
https://doi.org/10.1038/s41591-024-03097-1
[Figure: the evaluation covers 2,400 cases across four pathologies (appendicitis, cholecystitis, diverticulitis, pancreatitis); input tools include the history of present illness, physical examination, laboratory results, and radiologist reports.]

LLMs Diagnose Significantly Worse than Clinicians
Due to the data usage agreement of MIMIC-IV, only open-access models that can be downloaded can be used with the data; thus, only LLMs based on Llama 2 were used in this evaluation.

Diagnostic Accuracy of LLMs Decreased in an Autonomous Clinical Decision-Making Scenario

LLMs Do Not Consistently Recommend Essential and Patient-Specific Treatment

LLMs Are Sensitive to the Quantity of Information Provided

LLMs Are Sensitive to the Order of Information
[Truncated figure caption: "…on the MIMIC-CDM-FI dataset. This suggests that LLMs cannot … key facts and degrade in performance when too much information…"]