WuLab
Columbia University

goals

The future of industry relies on the ability to make data-driven decisions, but that ability is currently accessible only to technical and statistical experts who can program, clean and combine data, visualize large datasets, and debug complex analysis pipelines.

The goal of the WuLab is to dramatically accelerate the democratization of data, and to train high-quality, world-class researchers.

projects

Our overarching mission is to work on 🔥💣 projects, with a leaning towards addressing three bottlenecks in the future of data analysis: data cleaning, creating interactive data exploration and visualization interfaces, and understanding analysis results. These slides describe our lab’s vision and a few recent projects.

Several of our systems are named after Mortal Kombat ninjas

Data Cleaning

Data analysis and machine learning are increasingly reliant on the quality of the input data: spurious errors and systematic corruptions can lead to misleading and incorrect results. We work on automated data cleaning algorithms that are tailored towards data science applications, as well as crowdsourcing systems for collecting high-quality new data.

Explanation & Interpretation

Data analysis is never one-shot; it is an iterative process in which analysis results spur new analyses or ways to debug the analysis. We work on data explanation systems that enable analysts to highlight anomalies in analysis results and that propose potential reasons to investigate, as well as machine learning explanation techniques that explain how and what machine learning models (e.g., deep neural networks) learn in order to make their predictions.

Interactive Data Analysis System

The current interface for data analysis is predominantly code. We are studying techniques to improve how scalable interactive visual analysis applications are designed, architected, and built. The Data Visualization Management System makes it significantly easier to build and scale interactive data visualization systems. Precision Interfaces extends this technology to automatically generate new visual exploration interfaces tailored to a long tail of data analysis tasks.

join

We are always looking for hard-working, smart, driven students who are excited about pushing forward how humans interact with data. If you are a prospective graduate student or postdoc, read our application document. If you are an undergraduate, master's student, or potential intern, please fill out our questionnaire.

contact

Email us at ewu@cs.columbia.edu

people
Zachary Huang Grad Student
Haneen Mohammed Grad Student
Yiru Chen Grad Student
Yifan Wu Collab (Cal)
Young Wu Collab (SFU)
Lana Ramjit Collab (UCLA)
Lauren Arnett Undergrad
Qianrui Zhang Intern
Bill Sun Intern
Jake Fisher Undergrad
Alumni and Past Collaborators
Fotis Psallidas Grad Student
Thibault Sellam Postdoc
Robbie Netzorg Undergrad
Hamed Nilforoshan Undergrad
Conder Shou Undergrad
Amita Shukla Undergrad
HaoCi Zhang Masters
Kevin Lin Undergrad (now @AI2)
Ian Huang Undergrad
Tejas Dharamsi Masters (now @Trifacta)
Lily-Xiaoxuan Liu Intern
Xiaolan Wang Collab (UMass)
Daniel Haas Collab (Cal)
Lilong Jiang Collab (OSU)
Daniel Alabi Masters
Zhengjie Miao Masters
Larry Xu Undergrad
James Sands Undergrad
Naina Sahrawat Undergrad
Rahul Khanna Undergrad
Mengyang Lyu Intern
Ziyun Wei Intern
Alex Studer High School
Gabriel Ryan Masters
Salim M'jahad Undergrad
publications
  1. Facilitating Exploration with Interaction Snapshots under High Latency
    Yifan Wu, Remco Chang, Joe Hellerstein, Eugene Wu
    InfoVIS (short paper) 2020
  2. Continuous Prefetch for Interactive Data Applications
    Haneen Mohammed, Ziyun Wei, Ravi Netravali, Eugene Wu
    VLDB 2020
  3. Complaint-driven Training Data Debugging for Query 2.0
    Young Wu, Lampros Flokas, Jiannan Wang, Eugene Wu
    SIGMOD 2020
  4. Physical Visualization Design
    Lana Ramjit, Zhaoning Kong, Ravi Netravali, Eugene Wu
    SIGMOD (demo) 2020
  5. Towards Complaint-driven ML Workflow Debugging
    Lampros Flokas, Young Wu, Jiannan Wang, Eugene Wu
    MLOps 2020
  6. Monte Carlo Tree Search for Generating Interactive Data Analysis Interfaces
    Yiru Chen, Eugene Wu
    Intelligent Process Automation (IPA) 2020
  7. AlphaClean: Automatic Generation of Data Cleaning Pipelines
    Sanjay Krishnan, Eugene Wu
    ArXiv 2019
  8. Towards Democratizing Relational Data Visualization
    Nan Tang, Eugene Wu, Guoliang Li
    SIGMOD 2019 Tutorial
  9. Precision Interfaces
    Qianrui Zhang, Haoci Zhang, Viraj Rai, Thibault Sellam, Eugene Wu
    SIGMOD 2019
  10. Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment
    Pei Wang, Jiannan Wang, Ryan Shea, Eugene Wu
    SIGMOD 2019
  11. Cross-platform Interactions and Popularity in the Live-streaming Community
    Lauren Arnett, Robert Netzorg, Augustin Chaintreau, Eugene Wu
    CHI Latebreaking 2019
  12. DeepBase: Deep Inspection of Neural Networks
    Thibault Sellam, Kevin Lin, Ian Yiran Huang, Michelle Yang, Carl Vondrick, Eugene Wu
    SIGMOD 2019
  13. Deep Neural Inspection Using DeepBase
    Yiru Chen, Yiliang Shi, Boyuan Chen, Thibault Sellam, Carl Vondrick, Eugene Wu
    LearnSys 2018 Workshop at NIPS
  14. CIDR2: Crazier Innovations in Databases JOIN Reinforcement-learning Research
    Eugene Wu
    CIDR 2019 Abstract
  15. Ten Years of Web Tables
    Michael Cafarella, Alon Halevy, Daisy Zhe Wang, Hongrae Lee, Jayant Madhavan, Cong Yu, Eugene Wu
    PVLDB 2018 Invited Paper
  16. At a Glance: Approximate Entropy as a Measure of Line Chart Visualization Complexity
    Gabriel Ryan, Abigail Mosca, Remco Chang, Eugene Wu
    InfoVIS 2018 Code
  17. Provenance in Interactive Visualizations
    Fotis Psallidas, Eugene Wu
    HILDA 2018
  18. Leveraging Quality Prediction Models for Automatic Writing Feedback
    Hamed Nilforoshan, Eugene Wu
    ICWSM 2018
  19. Precision Interfaces for Different Modalities
    Haoci Zhang, Viraj Rai, Thibault Sellam, Eugene Wu
    SIGMOD (demo) 2018
  20. Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications
    Fotis Psallidas, Eugene Wu
    SIGMOD (demo) 2018
  21. Deeper: A Data Enrichment System Powered by Deep Web
    Pei Wang, Yongjun He, Ryan Shea, Jiannan Wang, Eugene Wu
    SIGMOD (demo) 2018
  22. "I Like the Way You Think!" Inspecting the Internal Logic of Recurrent Neural Networks
    Thibault Sellam, Kevin Lin, Ian Yiran Huang, Carl Vondrick, Eugene Wu
    SysML 2018
  23. A "Probabilistic" Model of Research
    Eugene Wu
    Blog Post 2018
  24. Smoke: Fine-grained Lineage at Interactive Speeds
    Fotis Psallidas, Eugene Wu
    VLDB 2018
  25. Mining Precision Interfaces From Query Logs
    Haoci Zhang, Thibault Sellam, Eugene Wu
    Tech Report 2017
  26. BoostClean: Automated Error Detection and Repair for Machine Learning
    Sanjay Krishnan, Michael J. Franklin, Ken Goldberg, Eugene Wu
    Tech Report 2017
  27. Load-n-Go: Fast Approximate Join Visualizations That Improve Over Time
    Marianne Procopio, Carlos Scheidegger, Eugene Wu, Remco Chang
    DSIA 2017
  28. Approximate Entropy as a Measure of Line Chart Complexity
    Gabriel Ryan, Abigail Mosca, Eugene Wu, Remco Chang
    InfoVIS Poster 2017
  29. Towards a Bayesian Model of Data Visualization Cognition
    Yifan Wu, Larry Xu, Remco Chang, Eugene Wu
    DECISIVE 2017
  30. PreCog: Improving Crowdsourced Data Quality Before Acquisition
    Hamed Nilforoshan, Jiannan Wang, Eugene Wu
    ArXiv 2017
  31. Precision Interfaces
    Haoci Zhang, Thibault Sellam, Eugene Wu
    HILDA 2017
  32. PALM: Machine Learning Explanations For Iterative Debugging
    Sanjay Krishnan, Eugene Wu
    HILDA 2017
  33. Segment-Predict-Explain for Automatic Writing Feedback
    Hamed Nilforoshan, James Sands, Kevin Lin, Rahul Khanna, Eugene Wu
    Collective Intelligence 2017
  34. Dialectic: Enhancing Text Input Fields with Automatic Feedback to Improve Social Content Writing Quality
    Hamed Nilforoshan, James Sands, Kevin Lin, Rahul Khanna, Eugene Wu
    ArXiv 2017
  35. Skipping-oriented Partitioning for Columnar Layouts
    Liwen Sun, Michael J. Franklin, Jiannan Wang, Eugene Wu
    VLDB 2017
  36. Combining Design and Performance in a Data Visualization Management System
    Eugene Wu, Fotis Psallidas, Zhengjie Miao, Haoci Zhang, Laura Rettig, Yifan Wu, Thibault Sellam
    CIDR 2017
  37. CIDR: Chat-oriented Innovations in Database Research
    Eugene Wu
    CIDR 2017 Abstract
  38. QFix: Diagnosing errors through query histories
    Xiaolan Wang, Alexandra Meliou, Eugene Wu
    SIGMOD 2017
  39. A DeVIL-ish Approach to Inconsistency in Interactive Visualizations
    Yifan Wu, Joe Hellerstein, Eugene Wu
    HILDA 2016
  40. PFunk-H: Approximate Query Processing using Perceptual Models
    Daniel Alabi, Eugene Wu
    HILDA 2016
  41. Towards Reliable Interactive Data Cleaning: A User Survey and Recommendations
    Sanjay Krishnan, Daniel Haas, Michael J. Franklin, Eugene Wu
    HILDA 2016
  42. TrendQuery: A System for Interactive Exploration of Trends
    Niranjan Kamat, Eugene Wu, Arnab Nandi
    HILDA 2016
  43. ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning
    Sanjay Krishnan, Michael Franklin, Ken Goldberg, Jiannan Wang, Eugene Wu
    SIGMOD 2016 Demo (Demo Award Winner!)
  44. Graphical Perception in Animated Bar Charts
    Eugene Wu, Lilong Jiang, Larry Xu, Arnab Nandi
    ArXiv 2016
  45. QFix: Demonstrating error diagnosis in query histories
    Xiaolan Wang, Alexandra Meliou, Eugene Wu
    SIGMOD 2016 Demo
  46. QFix: Diagnosing errors through query histories
    Xiaolan Wang, Alexandra Meliou, Eugene Wu
    ArXiv 2016
  47. ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models
    Sanjay Krishnan, Jiannan Wang, Eugene Wu, Michael J. Franklin, Ken Goldberg
    ArXiv 2016
  48. Towards Perception-aware Interactive Data Visualization Systems
    Eugene Wu, Arnab Nandi
    DSIA 2015 Slides
  49. SampleClean: Fast and Reliable Analytics on Dirty Data (overview paper)
    Sanjay Krishnan, Jiannan Wang, Michael J Franklin, Ken Goldberg, Tim Kraska, Tova Milo, Eugene Wu
    IEEE Data Eng. Bulletin 2015
  50. CLAMShell: Speeding up Crowds for Low-latency Data Labeling
    Daniel Haas, Jiannan Wang, Eugene Wu, Michael J. Franklin
    VLDB 2016
  51. Automated Metadata Construction to Support Portable Building Applications
    Arka A. Bhattacharya, Dezhi Hong, David Culler, Jorge Ortiz, Kamin Whitehouse, Eugene Wu
    BuildSys 2015
  52. Wisteria: Nurturing Scalable Data Cleaning Infrastructure (Demo)
    Daniel Haas, Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Eugene Wu
    VLDB 2015
  53. Collaborative Data Analytics with Datahub (Demo)
    Anant Bhardwaj, Amol Deshpande, Aaron Elmore, David Karger, Sam Madden, Aditya Parameswaran, Harihar Subramanyam, Eugene Wu, Rebecca Zhang
    VLDB 2015
  54. Indexing Cost Sensitive Prediction
    Leilani Battle, Edward Benson, Aditya Parameswaran, Eugene Wu
    Technical Report 2016
  55. Explaining Data in Visual Analytic Systems
    Eugene Wu
    Doctoral Thesis 2015
  56. The Case for Data Visualization Management Systems
    Eugene Wu, Leilani Battle, Samuel Madden
    VLDB 2014
  57. Data In Context: Aiding News Consumers while Taming Dataspaces
    Eugene Wu, Adam Marcus, Sam Madden
    DBCrowd 2013
  58. Mobile applications need Targeted Micro-updates
    Alvin Cheung, Lenin Ravindranath, Eugene Wu, Samuel Madden, Hari Balakrishnan
    APSYS 2013
  59. Scorpion: Explaining Away Outliers in Aggregate Queries
    Eugene Wu, Samuel Madden
    VLDB 2013 (Best-of) Slides
  60. SubZero: a Fine-Grained Lineage System for Scientific Databases
    Eugene Wu, Samuel Madden, Michael Stonebraker
    ICDE 2013 (Best-of)
  61. A Demonstration of DBWipes: Clean as You Query
    Eugene Wu, Samuel Madden, Michael Stonebraker
    VLDB 2012
  62. Human-powered Sorts and Joins
    Adam Marcus, Eugene Wu, David Karger, Samuel Madden, Robert Miller
    VLDB 2012
  63. Partitioning Techniques for Fine-Grained Indexing
    Eugene Wu, Sam Madden
    ICDE 2011
  64. Demonstration of Qurk: A Query Processor for Human Operators
    Adam Marcus, Eugene Wu, David Karger, Samuel Madden, Robert Miller
    SIGMOD 2011
  65. No Bits Left Behind
    Eugene Wu, Carlo Curino, Sam Madden
    CIDR 2011
  66. Crowdsourced Databases: Query Processing with People
    Adam Marcus, Eugene Wu, Sam Madden, Robert Miller
    CIDR 2011
  67. Relational Cloud: A Database-as-a-Service for the Cloud
    Carlo Curino, Evan Jones, Raluca Popa, Nirmesh Malviya, Eugene Wu, Sam Madden, Hari Balakrishnan, Nickolai Zeldovich
    CIDR 2011
  68. Relational Cloud: The Case for a Database Service
    Carlo Curino, Evan Jones, Yang Zhang, Eugene Wu, Sam Madden
    MIT Tech Report 2010
  69. TrajStore: An Adaptive Storage System for Very Large Trajectory Data Sets
    Philippe Cudre-Mauroux, Eugene Wu, Sam Madden
    ICDE 2010
  70. Demonstration of the TrajStore System
    Eugene Wu, Philippe Cudre-Mauroux, Sam Madden
    VLDB 2009
  71. The Case for RodentStore: An Adaptive, Declarative Storage System
    Philippe Cudre-Mauroux, Eugene Wu, Sam Madden
    CIDR 2009
  72. WebTables: Exploring the Power of Tables on the Web
    Michael Cafarella, Alon Halevy, Daisy Wang, Eugene Wu, Yang Zhang
    VLDB 2008
  73. Uncovering the Relational Web
    Michael Cafarella, Nodira Khoussainova, Daisy Wang, Eugene Wu, Yang Zhang, Alon Halevy
    WebDB 2008
  74. SASE: Complex Event Processing over Streams (Demo)
    Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg, Gordon Anderson
    CIDR 2007
  75. High-performance complex event processing over streams
    Eugene Wu, Yanlei Diao, Shariq Rizvi
    SIGMOD 2006
  76. SASE: Complex Event Processing over Streams
    Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg, Gordon Anderson
    CoRR 2006
  77. Probabilistic Data Management for Pervasive Computing: The Data Furnace Project
    Minos N. Garofalakis, Kurt P. Brown, Michael J. Franklin, Joseph M. Hellerstein, Daisy Zhe Wang, Eirinaios Michelakis, Liviu Tancau, Eugene Wu, Shawn R. Jeffery, Ryan Aipperspach
    IEEE Data Eng. Bull. 2006
  78. Design Considerations for High Fan-In Systems: The HiFi Approach
    Michael J. Franklin, Shawn R. Jeffery, Sailesh Krishnamurthy, Frederick Reiss, Shariq Rizvi, Eugene Wu, Owen Cooper, Anil Edakkunni, Wei Hong
    CIDR 2005
  79. HiFi: A Unified Architecture for High Fan-in Systems
    Owen Cooper, Anil Edakkunni, Michael J. Franklin, Wei Hong, Shawn R. Jeffery, Sailesh Krishnamurthy, Frederick Reiss, Shariq Rizvi, Eugene Wu
    VLDB 2004 Demo