Towards Robust Machine Learning under Distribution Shift and Adversarial Attack

October 27, 2021 (Xintao Wu)

As big data and AI technologies are deployed to make critical decisions that potentially affect individuals (e.g., employment, college admissions, credit, and health insurance), there are increasing public concerns about the privacy, fairness, safety, and robustness of data analytics, collection, sharing, and decision making. In this talk, we first give an overview of our social awareness research, in particular how to mitigate the side effects of enforcing one social concern on another and how to address multiple social concerns simultaneously. We then focus on the robustness of machine learning under two representative scenarios: distribution shift and adversarial attack. In the former scenario, we present robust learning based on kernel reweighting and the Heckman model. In the latter scenario, we present an adaptive defense that purposely leverages multiple types of adversarial samples to learn context information during training. We conclude the talk with some future research directions.

Data Curation and SARS-CoV-2: population genomics of 2.5 million genomes

COVID-19 has become a global pandemic, and Arkansas has recently seen a dramatic increase in the number of cases, mainly due to a new variant (“Delta”). A population genomics approach shows that we are in the third wave, with the current Delta variant accounting for about 83% of the strains sequenced. It was preceded by the Alpha variant, which peaked in March of this year (2021), and by another, less well characterized variant (Janus), which peaked in September 2020. Each of these variants has become better adapted for infecting and spreading within the human population.

Education, Outreach, and Workforce Development

Through DART, we plan to implement a wide range of professional development and data science education activities to engage K-20 learners in Arkansas. Our vision is that Arkansas will have a statewide educational ecosystem in which learners of any age can receive a designed, consistent, scaffolded education in data science, with further educational or job opportunities at appropriate points in their careers. To accomplish this, our mission is to create a model data science and analytics program for Arkansas schools that will promote problem-based and experiential pedagogy in critical thinking and analysis, technology familiarity, and a foundation in math and statistics.

Learning-based Approaches to Data-driven Predictions

A major challenge in building secure and widely adopted deep learning systems is that they sometimes make unexplainable and/or unpredictable misclassifications. This talk gives an overview of initial efforts toward techniques that use large-scale deep learning with multi-source integrated data sets. In addition, we introduce the integration of statistical learning approaches with learning-based frameworks.

Media Matters: Innovations to Improve the Value Derived from Social Media Networks

Social media platforms have billions of active users and have significantly impacted our society. New types of platforms and new features in existing platforms continue to be developed to meet users’ demands. With an increasingly large amount of unstructured social data on these platforms, social media and networking analysis research aims to develop efficient, reliable, scalable, explainable, reproducible, and theoretically grounded data science approaches to understand our digital behaviors and to make these platforms safer and more valuable places. Our talk will focus on collective opinions and their evolution, deviant behavior modeling, automatic annotation of multimedia data, and informing disaster response with social media.