WuLab
Columbia University

goals

The future of industry relies on the ability to make data-driven decisions. However, that ability is currently accessible only to technical and statistical experts who can program, clean and combine data, visualize large datasets, and debug complex analysis pipelines.

The goal of the WuLab is to dramatically accelerate the democratization of data, and to train high-quality, world-class researchers.

projects

Our overarching mission is to work on 🔥💣 projects, with a leaning towards addressing three bottlenecks in the future of data analysis: data cleaning, creating interactive data exploration and visualization interfaces, and understanding analysis results. These slides describe our lab’s vision and a few recent projects.

Several of our systems are named after Mortal Kombat ninjas.

Data Cleaning

Data analysis and machine learning are increasingly reliant on the quality of the input data: spurious errors and systematic corruptions can result in misleading and incorrect results. We work on automated data cleaning algorithms that are tailored towards data science applications, as well as crowdsourcing systems for collecting high-quality new data.

Explanation & Interpretation

Data analysis is never one-shot; it is an iterative process where analysis results spur new analyses or ways to debug the analysis. We work on data explanation systems that enable analysts to highlight anomalies in analysis results and that suggest potential causes to investigate, as well as machine learning explanation techniques that explain what machine learning models (e.g., deep neural networks) learn and how they make their predictions.

Interactive Data Analysis System

The current interface for data analysis is predominantly code. We are studying techniques to improve how we design, architect, and build scalable interactive visual analysis applications. The Data Visualization Management System makes it significantly easier to build and scale interactive data visualization systems. Precision Interfaces extends this technology to automatically generate new visual exploration interfaces tailored to a long tail of data analysis tasks.

join

We are always looking for hard-working, smart, driven students who are excited about pushing forward how humans interact with data. If you are a prospective graduate student or postdoc, read our application document. If you are an undergraduate, master's student, or potential intern, please fill out our questionnaire.

contact

Email us at ewu@cs.columbia.edu

people
Fotis Psallidas Grad Student
Thibault Sellam Postdoc
Haneen Mohammed Grad Student
Yiliang Shi Grad Student
Yiru Chen Grad Student
Yifan Wu Collab (Cal)
Sanjay Krishnan Collab (UChicago)
Young Wu Collab (SFU)
Robbie Netzorg Undergrad
Hamed Nilforoshan Undergrad
HaoCi Zhang Masters
Qianrui Zhang Intern
Lauren Arnett Undergrad
Conder Shou Undergrad
Amita Shukla Undergrad
Bill Sun Intern
Alumni and Past Collaborators
Kevin Lin Undergrad (now @AI2)
Ian Huang Undergrad
Tejas Dharamsi Masters (now @Trifacta)
Lily-Xiaoxuan Liu Intern
Xiaolan Wang Collab (UMass)
Daniel Haas Collab (Cal)
Lilong Jiang Collab (OSU)
Daniel Alabi Masters
Zhengjie Miao Masters
Larry Xu Undergrad
James Sands Undergrad
Naina Sahrawat Undergrad
Rahul Khanna Undergrad
Mengyang Lyu Intern
Ziyun Wei Intern
Alex Studer High School
Gabriel Ryan Masters
Salim M'jahad Undergrad
publications
  1. DeepBase: Deep Inspection of Neural Networks
    Thibault Sellam, Kevin Lin, Ian Yiran Huang, Michelle Yang, Carl Vondrick, Eugene Wu
    SIGMOD 2019
  2. Deep Neural Inspection Using DeepBase
    Yiru Chen, Yiliang Shi, Boyuan Chen, Thibault Sellam, Carl Vondrick, Eugene Wu
    LearnSys 2018 Workshop at NIPS
  3. CIDR2: Crazier Innovations in Databases JOIN Reinforcement-learning Research
    Eugene Wu
    CIDR 2019 Abstract
  4. Ten Years of Web Tables
    Michael Cafarella, Alon Halevy, Daisy Zhe Wang, Hongrae Lee, Jayant Madhavan, Cong Yu, Eugene Wu
    PVLDB 2018 Invited Paper
  5. DeepBase: Deep Inspection of Neural Networks
    Thibault Sellam, Kevin Lin, Ian Yiran Huang, Michelle Yang, Carl Vondrick, Eugene Wu
    Technical Report
  6. At a Glance: Approximate Entropy as a Measure of Line Chart Visualization Complexity
    Gabriel Ryan, Abigail Mosca, Remco Chang, Eugene Wu
    InfoVIS 2018
  7. Provenance in Interactive Visualizations
    Fotis Psallidas, Eugene Wu
    HILDA 2018
  8. Leveraging Quality Prediction Models for Automatic Writing Feedback
    Hamed Nilforoshan, Eugene Wu
    ICWSM 2018
  9. Precision Interfaces for Different Modalities
    HaoCi Zhang, Viraj Rai, Thibault Sellam, Eugene Wu
    SIGMOD (demo) 2018
  10. Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications
    Fotis Psallidas, Eugene Wu
    SIGMOD (demo) 2018
  11. Deeper: A Data Enrichment System Powered by Deep Web
    Pei Wang, Yongjun He, Ryan Shea, Jiannan Wang, Eugene Wu
    SIGMOD (demo) 2018
  12. “I Like the Way You Think!” Inspecting the Internal Logic of Recurrent Neural Networks
    Thibault Sellam, Kevin Lin, Ian Yiran Huang, Carl Vondrick, Eugene Wu
    SysML 2018
  13. A “Probabilistic” Model of Research
    Eugene Wu
    Blog Post 2018
  14. Smoke: Fine-grained Lineage at Interactive Speeds
    Fotis Psallidas, Eugene Wu
    VLDB 2018 Preprint
  15. Mining Precision Interfaces From Query Logs
    Haoci Zhang, Thibault Sellam, Eugene Wu
    Tech Report 2017
  16. BoostClean: Automated Error Detection and Repair for Machine Learning
    Sanjay Krishnan, Michael J. Franklin, Ken Goldberg, Eugene Wu
    Tech Report 2017
  17. Load-n-Go: Fast Approximate Join Visualizations That Improve Over Time
    Marianne Procopio, Carlos Scheidegger, Eugene Wu, Remco Chang
    DSIA 2017
  18. Approximate Entropy as a Measure of Line Chart Complexity
    Gabriel Ryan, Abigail Mosca, Eugene Wu, Remco Chang
    InfoVIS Poster 2017
  19. Towards a Bayesian Model of Data Visualization Cognition
    Yifan Wu, Larry Xu, Remco Chang, Eugene Wu
    DECISIVE 2017
  20. PreCog: Improving Crowdsourced Data Quality Before Acquisition
    Hamed Nilforoshan, Jiannan Wang, Eugene Wu
    arXiv 2017
  21. Precision Interfaces
    Haoci Zhang, Thibault Sellam, Eugene Wu
    HILDA 2017
  22. PALM: Machine Learning Explanations For Iterative Debugging
    Sanjay Krishnan, Eugene Wu
    HILDA 2017
  23. Segment-Predict-Explain for Automatic Writing Feedback
    Hamed Nilforoshan, James Sands, Kevin Lin, Rahul Khanna, Eugene Wu
    Collective Intelligence 2017
  24. Dialectic: Enhancing Text Input Fields with Automatic Feedback to Improve Social Content Writing Quality
    Hamed Nilforoshan, James Sands, Kevin Lin, Rahul Khanna, Eugene Wu
    arXiv 2017
  25. Skipping-oriented Partitioning for Columnar Layouts
    Liwen Sun, Michael J. Franklin, Jiannan Wang, Eugene Wu
    VLDB 2017
  26. Combining Design and Performance in a Data Visualization Management System
    Eugene Wu, Fotis Psallidas, Zhengjie Miao, Haoci Zhang, Laura Rettig, Yifan Wu, Thibault Sellam
    CIDR 2017
  27. CIDR: Chat-oriented Innovations in Database Research
    Eugene Wu
    CIDR 2017 Abstract
  28. QFix: Diagnosing errors through query histories
    Xiaolan Wang, Alexandra Meliou, Eugene Wu
    SIGMOD 2017
  29. A DeVIL-ish Approach to Inconsistency in Interactive Visualizations
    Yifan Wu, Joe Hellerstein, Eugene Wu
    HILDA 2016
  30. PFunk-H: Approximate Query Processing using Perceptual Models
    Daniel Alabi, Eugene Wu
    HILDA 2016
  31. Towards Reliable Interactive Data Cleaning: A User Survey and Recommendations
    Sanjay Krishnan, Daniel Haas, Michael J. Franklin, Eugene Wu
    HILDA 2016
  32. TrendQuery: A System for Interactive Exploration of Trends
    Niranjan Kamat, Eugene Wu, Arnab Nandi
    HILDA 2016
  33. ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning
    Sanjay Krishnan, Michael Franklin, Ken Goldberg, Jiannan Wang, Eugene Wu
    SIGMOD 2016 Demo (Demo Award Winner!)
  34. Graphical Perception in Animated Bar Charts
    Eugene Wu, Lilong Jiang, Larry Xu, Arnab Nandi
    arXiv 2016
  35. QFix: Demonstrating error diagnosis in query histories
    Xiaolan Wang, Alexandra Meliou, Eugene Wu
    SIGMOD 2016 Demo
  36. QFix: Diagnosing errors through query histories
    Xiaolan Wang, Alexandra Meliou, Eugene Wu
    arXiv 2016
  37. ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models
    Sanjay Krishnan, Jiannan Wang, Eugene Wu, Michael J. Franklin, Ken Goldberg
    arXiv 2016
  38. Towards Perception-aware Interactive Data Visualization Systems
    Eugene Wu, Arnab Nandi
    DSIA 2015 Slides
  39. SampleClean: Fast and Reliable Analytics on Dirty Data (overview paper)
    Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Ken Goldberg, Tim Kraska, Tova Milo, Eugene Wu
  40. CLAMShell: Speeding up Crowds for Low-latency Data Labeling
    Daniel Haas, Jiannan Wang, Eugene Wu, Michael J. Franklin
    VLDB 2016
  41. Automated Metadata Construction to Support Portable Building Applications
    Arka A. Bhattacharya, Dezhi Hong, David Culler, Jorge Ortiz, Kamin Whitehouse, Eugene Wu
    BuildSys 2015
  42. Wisteria: Nurturing Scalable Data Cleaning Infrastructure (Demo)
    Daniel Haas, Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Eugene Wu
    VLDB 2015
  43. Collaborative Data Analytics with Datahub (Demo)
    Anant Bhardwaj, Amol Deshpande, Aaron Elmore, David Karger, Sam Madden, Aditya Parameswaran, Harihar Subramanyam, Eugene Wu, and Rebecca Zhang
    VLDB 2015
  44. Indexing Cost Sensitive Prediction
    Leilani Battle, Edward Benson, Aditya Parameswaran, Eugene Wu
    Technical Report
  45. Explaining Data in Visual Analytic Systems
    Eugene Wu
    Doctoral Thesis
  46. The Case for Data Visualization Management Systems
    Eugene Wu, Leilani Battle, Samuel Madden
    VLDB 2014
  47. Data In Context: Aiding News Consumers while Taming Dataspaces
    Eugene Wu, Adam Marcus and Sam Madden
    DBCrowd 2013
  48. Mobile applications need Targeted Micro-updates
    Alvin Cheung, Lenin Ravindranath, Eugene Wu, Samuel Madden, Hari Balakrishnan
    APSYS 2013
  49. Scorpion: Explaining Away Outliers in Aggregate Queries
    Eugene Wu, Samuel Madden
    VLDB 2013 (Selected as one of the best papers of the conference!) Slides
  50. SubZero: a Fine-Grained Lineage System for Scientific Databases
    Eugene Wu, Samuel Madden, Michael Stonebraker
    ICDE 2013 (Best of conference)
  51. A Demonstration of DBWipes: Clean as You Query
    Eugene Wu, Samuel Madden, Michael Stonebraker
    VLDB 2012
  52. Human-powered Sorts and Joins
    Adam Marcus, Eugene Wu, David Karger, Samuel Madden, Robert Miller
    VLDB 2012
  53. Partitioning Techniques for Fine-Grained Indexing
    Eugene Wu, Sam Madden
    ICDE 2011
  54. Demonstration of Qurk: A Query Processor for Human Operators
    Adam Marcus, Eugene Wu, David Karger, Samuel Madden, Robert Miller
    SIGMOD 2011
  55. No Bits Left Behind
    Eugene Wu, Carlo Curino, Sam Madden
    CIDR 2011
  56. Crowdsourced Databases: Query Processing with People
    Adam Marcus, Eugene Wu, Sam Madden, Robert Miller
    CIDR 2011
  57. Relational Cloud: A Database-as-a-Service for the Cloud
    Carlo Curino, Evan Jones, Raluca Popa, Nirmesh Malviya, Eugene Wu, Sam Madden, Hari Balakrishnan, Nickolai Zeldovich
    CIDR 2011
  58. Relational Cloud: The Case for a Database Service
    Carlo Curino, Evan Jones, Yang Zhang, Eugene Wu, Sam Madden
  59. TrajStore: An Adaptive Storage System for Very Large Trajectory Data Sets
    Philippe Cudre-Mauroux, Eugene Wu, Sam Madden
    ICDE 2010
  60. Demonstration of the TrajStore System
    Eugene Wu, Philippe Cudre-Mauroux, Sam Madden
    VLDB 2009
  61. The Case for RodentStore: An Adaptive, Declarative Storage System
    Philippe Cudre-Mauroux, Eugene Wu, Sam Madden
    CIDR 2009
  62. WebTables: Exploring the Power of Tables on the Web
    Michael Cafarella, Alon Halevy, Daisy Wang, Eugene Wu, Yang Zhang
    VLDB 2008
  63. Uncovering the Relational Web
    Michael Cafarella, Nodira Khoussainova, Daisy Wang, Eugene Wu, Yang Zhang, Alon Halevy
    WebDB 2008
  64. SASE: Complex Event Processing over Streams (Demo)
    Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg, Gordon Anderson
    CIDR 2007
  65. High-performance complex event processing over streams
    Eugene Wu, Yanlei Diao, Shariq Rizvi
    SIGMOD 2006
  66. SASE: Complex Event Processing over Streams
    Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg, Gordon Anderson
    CoRR 2006
  67. Probabilistic Data Management for Pervasive Computing: The Data Furnace Project
    Minos N. Garofalakis, Kurt P. Brown, Michael J. Franklin, Joseph M. Hellerstein, Daisy Zhe Wang, Eirinaios Michelakis, Liviu Tancau, Eugene Wu, Shawn R. Jeffery, Ryan Aipperspach
    IEEE Data Eng. Bull.
  68. Design Considerations for High Fan-In Systems: The HiFi Approach
    Michael J. Franklin, Shawn R. Jeffery, Sailesh Krishnamurthy, Frederick Reiss, Shariq Rizvi, Eugene Wu, Owen Cooper, Anil Edakkunni, Wei Hong
    CIDR 2005
  69. HiFi: A Unified Architecture for High Fan-in Systems
    Owen Cooper, Anil Edakkunni, Michael J. Franklin, Wei Hong, Shawn R. Jeffery, Sailesh Krishnamurthy, Frederick Reiss, Shariq Rizvi, Eugene Wu
    VLDB 2004 Demo