Data Curation and SARS-CoV-2: population genomics of 2.5 million genomes

Covid-19 has become a global pandemic, and Arkansas has recently seen a dramatic increase in the number of cases, mainly due to a new variant (“Delta”). A population genomics approach shows that we are in the third wave, with the current Delta variant accounting for about 83% of the strains sequenced. It was preceded by the Alpha variant, which peaked in March of this year (2021), and by another, less well characterized variant (Janus), which peaked in September 2020. Each of these variants has become better adapted for infecting and spreading within the human population.

Education, Outreach, and Workforce Development

Through DART, we plan to implement a wide range of professional development and data science education activities to engage K-20 learners in Arkansas. Our vision is that Arkansas will have a statewide educational ecosystem in which learners of any age can receive a designed, consistent, scaffolded education in data science, with further educational or job opportunities at appropriate points in their careers. To accomplish this, our mission is to create a model data science and analytics program for Arkansas schools that will promote problem-based and experiential pedagogy in critical thinking and analysis, technology familiarity, and a foundation in math and statistics.

Learning-based Approaches to Data-driven Predictions

A major challenge in building secure and widely adopted deep learning systems is that they sometimes produce misclassifications that are unexplainable and/or unpredictable. This talk gives an overview of initial efforts toward techniques that use large-scale deep learning with multi-source integrated data sets. In addition, we introduce the integration of statistical learning approaches with learning-based frameworks.

Media Matters: Innovations to Improve the Value Derived from Social Media Networks

Social media platforms have billions of active users and have significantly impacted our society. New types of platforms and new features in existing platforms continue to be developed to meet users’ demands. With an increasingly large amount of unstructured social data on these platforms, social media and networking analysis research aims to develop efficient, reliable, scalable, explainable, reproducible, and theoretically grounded data science approaches to understand our digital behaviors and make these platforms safer and more valuable. Our talk will focus on collective opinions and their evolution, deviant behavior modeling, automatic annotation of multimedia data, and informing disaster response with social media.

Socially Aware Data Analytics

There are increasing public concerns about privacy, fairness, safety, and robustness in data analytics, data collection, data sharing, and decision making. The social awareness thrust team will present their cutting-edge research on socially aware data analytics that can address social concerns and enable big data analytics to promote social good and prevent social harm.

First Steps toward a Data Washing Machine

Data has a life cycle that runs from planning through acquisition, cleansing, storage and sharing, integration, and application to disposal. While AI and machine learning have taken the application of data to new levels, the other phases remain largely manually mediated processes. The research goal for the Data Life Cycle and Curation thrust is to develop fully automated processes for the other phases of the data life cycle. The presentation today describes some of the progress made in finding ways to automate the data cleansing and data integration phases of the data life cycle.