Machine learning models are increasingly used in critical real-world applications such as self-driving cars, loan processing, and fake news detection. However, these models are highly complex and have a reputation for being “black boxes”: when they make a prediction, it is unclear how the decision was made. Similarly, it is not clear what the model is using to make a prediction or how changes in the data would affect its predictions.

To this end, our lab develops algorithms to interpret complex machine learning models (e.g., deep neural networks and random forests) by identifying the training data that affected a prediction, describing what parts of the model are learning, and showing how user-generated inputs can be improved to better help the model.

Deep Neural Inspection

Deep neural networks are revolutionizing many domains and are increasingly employed in production and real-world environments. But how do we ensure that learned models behave reliably and as intended? Software engineering principles such as abstraction and modularity help us build and understand reliable systems through principled construction. Neural networks, by contrast, are black boxes akin to a block of assembly code.

The Deep Neural Inspection (DNI) project aims to develop software primitives that identify whether subsets of a neural network have learned developer-understandable logic. This serves as a basis for introducing software engineering concepts such as abstraction, modularity, and assertions to the development and understanding of neural network models.
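
As a rough illustration of what such a primitive could look like, the sketch below scores each hidden unit by how strongly its recorded activations correlate with a developer-written hypothesis function, and then asserts that at least one unit tracks the hypothesis. This is a minimal sketch under stated assumptions, not the DNI implementation: the function names (`inspect_units`, `assert_learned`), the use of Pearson correlation as the score, the 0.9 threshold, and the activation values are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def inspect_units(activations, hypothesis, inputs):
    """Rank hidden units by |correlation| between activations and a hypothesis.

    activations: dict mapping a unit name to one activation value per input.
    hypothesis:  developer-written function mapping an input to a number.
    """
    target = [float(hypothesis(x)) for x in inputs]
    scores = {unit: abs(pearson(acts, target)) for unit, acts in activations.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def assert_learned(activations, hypothesis, inputs, threshold=0.9):
    """Assertion-style check: fail unless some unit tracks the hypothesis."""
    best_unit, best_score = inspect_units(activations, hypothesis, inputs)[0]
    assert best_score >= threshold, f"no unit matches the hypothesis (best={best_score:.2f})"
    return best_unit


# Hypothetical recorded activations of two hidden units over six input tokens.
inputs = ["a", "1", "b", "2", "3", "c"]
activations = {
    "unit_0": [0.1, 0.9, 0.2, 0.8, 0.95, 0.05],
    "unit_1": [0.5, 0.4, 0.6, 0.5, 0.5, 0.5],
}


def is_digit(token):
    """Hypothesis: the unit detects digit tokens."""
    return token.isdigit()


ranked = inspect_units(activations, is_digit, inputs)
matching_unit = assert_learned(activations, is_digit, inputs)
```

Here `ranked` lists the units from most to least correlated with the hypothesis, and `assert_learned` behaves like a unit test over the model's internals: it fails when no unit appears to have learned the stated logic.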

FACE PALM

When a deep neural network makes a misprediction, it can be challenging for a developer to understand why. While there are many models for interpretability in terms of predictive features, it may be more natural to isolate a small set of training examples that have the greatest influence on the prediction. However, every training example often contributes to a prediction in some way, with varying degrees of responsibility.

Partition Aware Local Models (PALM) is a tool that learns and summarizes this responsibility structure to aid machine learning debugging. PALM approximates a complex model (e.g., a deep neural network) using a two-part surrogate model: a meta-model that partitions the training data, and a set of sub-models that approximate the patterns within each partition. These sub-models can be arbitrarily complex to capture intricate local patterns. However, the meta-model is constrained to be a decision tree. This way, the user can examine the structure of the meta-model, determine whether the rules match intuition, and link problematic test examples to responsible training data efficiently. Queries to PALM are nearly 30x faster than nearest neighbor queries for identifying relevant data, which is a key property for interactive applications.
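
The two-part structure can be illustrated with a toy example. The sketch below is not PALM's training procedure; it assumes one-dimensional inputs with distinct values, uses a depth-1 decision tree (a single threshold) as the meta-model, fits a least-squares line as each sub-model (the text allows arbitrarily complex sub-models), and picks the threshold by exhaustive search. All names (`fit_surrogate`, `responsible_training_data`, `black_box`) are hypothetical.

```python
def fit_line(points):
    """Sub-model: least-squares line y = a*x + b through (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    var = sum((x - mx) ** 2 for x, _ in points)
    a = sum((x - mx) * (y - my) for x, y in points) / var if var else 0.0
    return a, my - a * mx


def sse(points, line):
    """Sum of squared errors of a line on a set of points."""
    a, b = line
    return sum((y - (a * x + b)) ** 2 for x, y in points)


def fit_surrogate(xs, black_box):
    """Fit a depth-1 tree meta-model plus one line per partition.

    The targets are the black-box model's own predictions, so the
    surrogate approximates the model rather than the ground truth.
    """
    points = sorted((x, black_box(x)) for x in xs)
    best = None
    for i in range(2, len(points) - 1):  # keep at least two points per side
        threshold = (points[i - 1][0] + points[i][0]) / 2
        left, right = points[:i], points[i:]
        lines = (fit_line(left), fit_line(right))
        err = sse(left, lines[0]) + sse(right, lines[1])
        if best is None or err < best[0]:
            best = (err, threshold, lines, (left, right))
    _, threshold, lines, partitions = best
    return {"threshold": threshold, "lines": lines, "partitions": partitions}


def responsible_training_data(surrogate, x):
    """Route x through the meta-model and return its partition's training inputs."""
    side = 0 if x < surrogate["threshold"] else 1
    return [px for px, _ in surrogate["partitions"][side]]


def surrogate_predict(surrogate, x):
    """Predict with the sub-model of the partition that x falls into."""
    side = 0 if x < surrogate["threshold"] else 1
    a, b = surrogate["lines"][side]
    return a * x + b


def black_box(x):
    """Hypothetical stand-in for a complex model with two regimes."""
    return 2 * x if x < 5 else 20 - x


surrogate = fit_surrogate(list(range(10)), black_box)
suspects = responsible_training_data(surrogate, 7.5)
```

Because the meta-model is a tree, answering "which training data is responsible for this test example?" is a walk down the tree rather than a scan over all training points, which is the intuition behind the speedup over nearest neighbor queries. The learned rule (`x < threshold`) is also small enough for a user to read and check against intuition.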

Segment-Predict-Explain

Segment-Predict-Explain is a pattern for generating content-specific feedback for users writing text content such as product reviews, housing listings, and posts. It uses a novel perturbation-based technique to generate Prescriptive Explanations. This technique uses a quality prediction model and the features of the user’s input text, and assigns responsibility to each feature in proportion to the amount that it will contribute to improving the model’s predicted quality. This can be used to generate feedback that explains why the user’s writing is low quality and offers specific suggestions on how to improve it.
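
The perturbation idea can be sketched in a few lines. The example below is one plausible reading of the description, not the published algorithm: it assumes a hypothetical quality model (`predicted_quality`), hypothetical features and perturbation step sizes, and it normalizes the non-negative quality gains so that responsibilities sum to 1.

```python
def predicted_quality(features):
    """Hypothetical stand-in for a trained quality prediction model (score in [0, 1])."""
    length = min(features["word_count"] / 100, 1.0)
    detail = min(features["num_details"] / 5, 1.0)
    clean = 1.0 - min(features["typos"] / 10, 1.0)
    return 0.5 * length + 0.3 * detail + 0.2 * clean


def assign_responsibility(features, perturbations, model):
    """Perturb one feature at a time and credit it with the predicted quality gain.

    perturbations: dict mapping a feature name to the change to try (assumed values).
    Returns responsibilities proportional to each feature's non-negative gain.
    """
    base = model(features)
    gains = {}
    for name, delta in perturbations.items():
        perturbed = dict(features)
        perturbed[name] = max(0, features[name] + delta)  # counts cannot go negative
        gains[name] = max(0.0, model(perturbed) - base)
    total = sum(gains.values())
    if total == 0:
        return {name: 0.0 for name in gains}
    return {name: gain / total for name, gain in gains.items()}


# Hypothetical features extracted from a short product review.
features = {"word_count": 40, "num_details": 1, "typos": 0}
perturbations = {"word_count": 20, "num_details": 1, "typos": -1}

responsibility = assign_responsibility(features, perturbations, predicted_quality)
top_suggestion = max(responsibility, key=responsibility.get)
```

The feature with the highest responsibility (`top_suggestion`) is the one whose improvement the model predicts will help most, so it is a natural candidate for the first piece of feedback shown to the writer. A feature that cannot be improved further, such as a typo count that is already zero, receives no responsibility.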

Publications

Kitana: A Data-as-a-Service Platform
Zachary Huang, Pranav Subramaniam, Raul Fernandez, Eugene Wu
In Review 2023
Calibration: A Simple Trick for Fast Interactive Join Analytics
Zachary Huang, Eugene Wu
arXiv 2022
How I Stopped Worrying About Training Data Bugs and Started Complaining
Lampros Flokas, Weiyuan Wu, Jiannan Wang, Nakul Verma, Eugene Wu
DEEM Workshop 2022
A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More
Iddo Drori, Sunny Tran, Roman Wang, Newman Cheng, Kevin Liu, Leonard Tang, Elizabeth Ke, Nikhil Singh, Taylor L. Patti, Jayson Lynch, Avi Shporer, Nakul Verma, Eugene Wu, Gilbert Strang
PNAS 2022 (in review)
Complaint-Driven Training Data Debugging at Interactive Speeds
Lampros Flokas, Young Wu, Jiannan Wang, Nakul Verma, Eugene Wu
SIGMOD 2022
Enabling SQL-based training data debugging for federated learning
Young Wu, Yejia Liu, Lampros Flokas, Jiannan Wang, Eugene Wu
VLDB 2022
Explaining SQL-ML Queries with Bayesian Optimization
Brandon Lockhard, Jiannan Wang, Eugene Wu
VLDB 2021
From Cleaning Before ML to Cleaning For ML
Felix Neutatz, Binger Chen, Ziawasch Abedjan, Eugene Wu
Invited, IEEE Data Engineering Bulletin 2021
Complaint-driven Training Data Debugging for Query 2.0
Young Wu, Lampros Flokas, Jiannan Wang, Eugene Wu
SIGMOD 2020
Towards Complaint-driven ML Workflow Debugging
Lampros Flokas, Young Wu, Jiannan Wang, Eugene Wu
MLOps 2020
AlphaClean: Automatic Generation of Data Cleaning Pipelines
Sanjay Krishnan, Eugene Wu
arXiv 2019
DeepBase: Deep Inspection of Neural Networks
Thibault Sellam, Kevin Lin, Ian Yiran Huang, Michelle Yang, Carl Vondrick, Eugene Wu
SIGMOD 2019
Deep Neural Inspection Using DeepBase
Yiru Chen, Yiliang Shi, Boyuan Chen, Thibault Sellam, Carl Vondrick, Eugene Wu
LearnSys 2018 Workshop at NIPS
CIDR2: Crazier Innovations in Databases JOIN Reinforcement-learning Research
Eugene Wu
CIDR 2019 Abstract
Leveraging Quality Prediction Models for Automatic Writing Feedback
Hamed Nilforoshan, Eugene Wu
ICWSM 2018
"I Like the Way You Think!" Inspecting the Internal Logic of Recurrent Neural Networks
Thibault Sellam, Kevin Lin, Ian Yiran Huang, Carl Vondrick, Eugene Wu
SysML 2018
PALM: Machine Learning Explanations For Iterative Debugging
Sanjay Krishnan, Eugene Wu
HILDA 2017
Segment-Predict-Explain for Automatic Writing Feedback
Hamed Nilforoshan, James Sands, Kevin Lin, Rahul Khanna, Eugene Wu
Collective Intelligence 2017
Indexing Cost Sensitive Prediction
Leilani Battle, Edward Benson, Aditya Parameswaran, Eugene Wu
Technical Report 2016
