Why is DART important?

The growing set of tools (powerful high-level programming languages, distributed data storage and computation, visualization tools, statistical modeling, and machine learning), together with a staggering range of big data sources, has the potential to help people make better and more timely decisions in science, business, and society. However, four fundamental barriers still stand in the way of the practical application and acceptance of data analytics, and any one of them could derail or impede its full development and contributions.

DART research will systematically investigate key aspects of these four barriers and develop novel, integrated solutions to address them by operating as a true, multi-institution, multi-disciplinary data science research center and educational ecosystem, in which faculty and students from campuses across the state work together on targeted problems important to the research community and the economy of Arkansas.

The 4 Integrative Barriers to Wide Implementation of Big Data Analytics

Big Data Management

Before data streams and datasets can be used in the many kinds of learning models, they must often be curated manually, or at the very least curated for a specific problem. We still rely on teams of analysts to assess the content and quality of source data, engineer features, define and transform data models, annotate training data, and track data processes and movement.

Security and Privacy

Government agencies and private entities collect and integrate large amounts of data, process it in real time, and deliver products or services based on these data to consumers and constituents. There is growing concern that neither the acquisition of big data nor the subsequent application of analytics to it is adequately secured or managed. These shortcomings create a risk of privacy breaches, can enable discrimination, and can negatively impact diversity in our society.

Model Interpretability

Machine learning models often sacrifice interpretability for predictive power and are difficult to generalize beyond their training and test data. Yet the interpretability and generalizability of trained models are critical in many decision-making systems and processes, especially when learning from multi-modal and heterogeneous big data sources. There is a continuing need to better balance the predictive power of complex machine learning models with the strengths of statistical models, and to configure deep learning models so that humans can see the reasoning behind their predictions.

Data-Skilled Workforce

As data-driven science and decision making become commonplace, our state and nation will need a well-educated workforce that, at almost every level of responsibility, understands both the power and the pitfalls of using data in decision making. The gap between the demand for data-skilled workers and the available degree programs is so large that companies are eager for more flexible credentialing and for alternatives to traditional baccalaureate degrees. A range of offerings, from certificates and apprenticeships to advanced degrees, must be implemented nationwide to address this gap.

