WuLab
Columbia University

goals

The future of industry relies on the ability to make data-driven decisions. However, that ability is currently accessible only to technical and statistical experts who can program, clean and combine data, visualize large datasets, and debug complex analysis pipelines.

The goal of the WuLab is to dramatically accelerate the democratization of data, and to train high-quality, world-class researchers.

projects

Our overarching mission is to work on 🔥💣 projects, with a leaning towards addressing three bottlenecks in the future of data analysis: data cleaning, creating interactive data exploration and visualization interfaces, and understanding analysis results. These slides describe our lab’s vision and a few recent projects.

Several of our systems are named after Mortal Kombat ninjas.

Data Cleaning

Data analysis and machine learning are increasingly reliant on the quality of the input data: spurious errors and systematic corruptions can produce misleading or incorrect results. We work on automated data cleaning algorithms that are tailored towards data science applications, as well as crowdsourcing systems for collecting high-quality new data.

Explanation & Interpretation

Data analysis is never one-shot; it is an iterative process in which analysis results spur new analyses or new ways to debug the analysis. We work on data explanation systems that let analysts highlight anomalies in analysis results and that suggest potential causes to investigate, as well as machine learning explanation techniques that explain what machine learning models (e.g., deep neural networks) learn and how they use it to make their predictions.

Interactive Data Analysis System

The current interface for data analysis is predominantly code. We study techniques for designing, architecting, and building scalable interactive visual analysis applications. The Data Visualization Management System makes it significantly easier to build and scale interactive data visualization systems. Precision Interfaces extends this technology to automatically generate new visual exploration interfaces tailored to a long tail of data analysis tasks.

join

We are always looking for hard-working, smart, driven students who are excited about pushing forward how humans interact with data. If you are a prospective graduate student or postdoc, read our application document. If you are an undergraduate, master's student, or potential intern, please fill out our questionnaire.

contact

Email us at ewu@cs.columbia.edu

people
Lampros Flokas Grad Student
Zachary Huang Grad Student
Haneen Mohammed Grad Student
Yiru Chen Grad Student
Young Wu Collab (SFU)
Jake Fisher Undergrad
Alumni and Past Collaborators
Fotis Psallidas Grad Student
Thibault Sellam Postdoc
Robbie Netzorg Undergrad
Hamed Nilforoshan Undergrad
Conder Shou Undergrad
Amita Shukla Undergrad
HaoCi Zhang Masters
Kevin Lin Undergrad (now @AI2)
Ian Huang Undergrad
Tejas Dharamsi Masters (now @Trifacta)
Lily-Xiaoxuan Liu Intern
Xiaolan Wang Collab (UMass)
Daniel Haas Collab (Cal)
Yifan Wu Collab (Cal)
Lana Ramjit Collab (UCLA)
Lauren Arnett Undergrad
Qianrui Zhang Intern
Bill Sun Intern
Lilong Jiang Collab (OSU)
Daniel Alabi Masters
Zhengjie Miao Masters
Larry Xu Undergrad
James Sands Undergrad
Naina Sahrawat Undergrad
Rahul Khanna Undergrad
Mengyang Lyu Intern
Ziyun Wei Intern
Alex Studer High School
Gabriel Ryan Masters
Salim M'jahad Undergrad
publications
  1. NL2INTERFACE: Interactive Visualization Interface Generation from Natural Language Queries
    Yiru Chen, Ryan Li, Austin Mac, Tianbao Xie, Tao Yu, Eugene Wu
    VIS nlviz workshop 2022
  2. How Do Captions Affect Visualization Reading?
    Shelly Cheng, Hazel Zhu, Eugene Wu
    VIS Viscomm 2022
  3. Extending the View Composition Algebra to Hierarchical Data
    Eugene Wu
    arXiv 2022
  4. FaDE: Answering "Why?" Made Fast
    Alexander Yao, Lampros Flokas, Eugene Wu
    In Review 2023
  5. Kitana: A Data-as-a-Service Platform
    Zachary Huang, Pranav Subramaniam, Raul Fernandez, Eugene Wu
    In Review 2023
  6. Calibration: A Simple Trick for Fast Interactive Join Analytics
    Zachary Huang, Eugene Wu
    arXiv 2022
  7. A Grammar for Hypothesis-Driven Visual Analysis
    Ashley Suh, Yilan Jiang, Ab Mosca, Eugene Wu, Remco Chang
    In Review 2023
  8. A Sensorless Drone-based System for Mapping Indoor 3D Airflow Gradients
    Yanchen Liu, Minghui Zhao, Stephen Xia, Eugene Wu, Xiaofan Jiang
    MobiSys 2022 Demo
  9. ConnectorX: Accelerating Data Loading From Databases to Dataframes
    Xiaoying Wang, Weiyuan Wu, Jinze Wu, Yizhou Chen, Nick Zrymiak, Changbo Qu, Lampros Flokas, George Chow, Jiannan Wang, Tianzheng Wang, Eugene Wu, Qingqing Zhou
    In Revision 2023
  10. How I Stopped Worrying About Training Data Bugs and Started Complaining
    Lampros Flokas, Weiyuan Wu, Jiannan Wang, Nakul Verma, Eugene Wu
    DEEM Workshop 2022
  11. Interactive Interface Generation in Notebooks
    Jeffrey Tao, Yiru Chen, Eugene Wu
    SIGMOD 2022 demo
  12. PI2: Generating Visual Analysis Interfaces From Queries
    Yiru Chen, Eugene Wu
    SIGMOD 2022
  13. View Composition Algebra for Ad Hoc Comparisons
    Eugene Wu
    TVCG 2022
  14. Reptile: Aggregation-level Explanations for Hierarchical Data
    Zachary Huang, Eugene Wu
    SIGMOD 2022
  15. A Neural Network Solves and Generates Mathematics Problems by Program Synthesis: Calculus, Differential Equations, Linear Algebra, and More
    Iddo Drori, Sunny Tran, Roman Wang, Newman Cheng, Kevin Liu, Leonard Tang, Elizabeth Ke, Nikhil Singh, Taylor L. Patti, Jayson Lynch, Avi Shporer, Nakul Verma, Eugene Wu, Gilbert Strang
    PNAS 2022 (in review)
  16. Complaint-Driven Training Data Debugging at Interactive Speeds
    Lampros Flokas, Young Wu, Jiannan Wang, Nakul Verma, Eugene Wu
    SIGMOD 2022
  17. Dynamic Breakpoints for Y-axis Scales
    Jacob Fisher, Remco Chang, Eugene Wu
    InfoVIS 2021 (short paper)
  18. Enabling SQL-based training data debugging for federated learning
    Young Wu, Yejia Liu, Lampros Flokas, Jiannan Wang, Eugene Wu
    VLDB 2022
  19. Explaining SQL-ML Queries with Bayesian Optimization
    Brandon Lockhart, Jiannan Wang, Eugene Wu
    VLDB 2021
  20. DIEL: Interactive Visualization Beyond the Here and Now
    Yifan Wu, Remco Chang, Joseph Hellerstein, Arvind Satyanarayan, Eugene Wu
    VIS 2021
  21. PopFactor: Live-Streamer Behavior and Popularity
    Robert Netzorg, Lauren Arnett, Augustin Chaintreau, Eugene Wu
    ICWSM 2021
  22. Impact of Cognitive Biases on Progressive Visualization
    Marianne Procopio, Ab Mosca, Carlos Scheidegger, Eugene Wu, Remco Chang
    TVCG 2021
  23. From Cleaning Before ML to Cleaning For ML
    Felix Neutatz, Binger Chen, Ziawasch Abedjan, Eugene Wu
    Invited, IEEE Data Engineering Bulletin 2021
  24. Facilitating Exploration with Interaction Snapshots under High Latency
    Yifan Wu, Remco Chang, Joe Hellerstein, Eugene Wu
    InfoVIS (short paper) 2020
  25. ActiveDeeper: A Model-based Active Data Enrichment system
    Liang Zhao, Qingcan Li, Pei Wang, Jiannan Wang, Eugene Wu
    VLDB 2020 demo
  26. Continuous Prefetch for Interactive Data Applications
    Haneen Mohammed, Ziyun Wei, Ravi Netravali, Eugene Wu
    VLDB 2020
  27. Complaint-driven Training Data Debugging for Query 2.0
    Young Wu, Lampros Flokas, Jiannan Wang, Eugene Wu
    SIGMOD 2020
  28. Physical Visualization Design
    Lana Ramjit, Zhaoning Kong, Ravi Netravali, Eugene Wu
    SIGMOD (demo) 2020
  29. Towards Complaint-driven ML Workflow Debugging
    Lampros Flokas, Young Wu, Jiannan Wang, Eugene Wu
    MLOps 2020
  30. Monte Carlo Tree Search for Generating Interactive Data Analysis Interfaces
    Yiru Chen, Eugene Wu
    Intelligent Process Automation (IPA) 2020
  31. Acorn: Aggressive Result Caching in Spark SQL
    Lana Ramjit, Matteo Interlandi, Eugene Wu, Ravi Netravali
    SOCC 2019
  32. AlphaClean: Automatic Generation of Data Cleaning Pipelines
    Sanjay Krishnan, Eugene Wu
    arXiv 2019
  33. Towards Democratizing Relational Data Visualization
    Nan Tang, Eugene Wu, Guoliang Li
    SIGMOD 2019 Tutorial
  34. Precision Interfaces
    Qianrui Zhang, Haoci Zhang, Viraj Rai, Thibault Sellam, Eugene Wu
    SIGMOD 2019
  35. Progressive Deep Web Crawling Through Keyword Queries For Data Enrichment
    Pei Wang, Jiannan Wang, Ryan Shea, Eugene Wu
    SIGMOD 2019
  36. Cross-platform Interactions and Popularity in the Live-streaming Community
    Lauren Arnett, Robert Netzorg, Augustin Chaintreau, Eugene Wu
    CHI Latebreaking 2019
  37. DeepBase: Deep Inspection of Neural Networks
    Thibault Sellam, Kevin Lin, Ian Yiran Huang, Michelle Yang, Carl Vondrick, Eugene Wu
    SIGMOD 2019
  38. Deep Neural Inspection Using DeepBase
    Yiru Chen, Yiliang Shi, Boyuan Chen, Thibault Sellam, Carl Vondrick, Eugene Wu
    LearnSys 2018 Workshop at NIPS
  39. CIDR2: Crazier Innovations in Databases JOIN Reinforcement-learning Research
    Eugene Wu
    CIDR 2019 Abstract
  40. Ten Years of Web Tables
    Michael Cafarella, Alon Halevy, Daisy Zhe Wang, Hongrae Lee, Jayant Madhavan, Cong Yu, Eugene Wu
    PVLDB 2018 Invited Paper
  41. At a Glance: Approximate Entropy as a Measure of Line Chart Visualization Complexity
    Gabriel Ryan, Abigail Mosca, Remco Chang, Eugene Wu
    InfoVIS 2018
  42. Provenance in Interactive Visualizations
    Fotis Psallidas, Eugene Wu
    HILDA 2018
  43. Leveraging Quality Prediction Models for Automatic Writing Feedback
    Hamed Nilforoshan, Eugene Wu
    ICWSM 2018
  44. Precision Interfaces for Different Modalities
    Haoci Zhang, Viraj Rai, Thibault Sellam, Eugene Wu
    SIGMOD (demo) 2018
  45. Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications
    Fotis Psallidas, Eugene Wu
    SIGMOD (demo) 2018
  46. Deeper: A Data Enrichment System Powered by Deep Web
    Pei Wang, Yongjun He, Ryan Shea, Jiannan Wang, Eugene Wu
    SIGMOD (demo) 2018
  47. "I Like the Way You Think!" Inspecting the Internal Logic of Recurrent Neural Networks
    Thibault Sellam, Kevin Lin, Ian Yiran Huang, Carl Vondrick, Eugene Wu
    SysML 2018
  48. Smoke: Fine-grained Lineage at Interactive Speeds
    Fotis Psallidas, Eugene Wu
    VLDB 2018
  49. Mining Precision Interfaces From Query Logs
    Haoci Zhang, Thibault Sellam, Eugene Wu
    Tech Report 2017
  50. BoostClean: Automated Error Detection and Repair for Machine Learning
    Sanjay Krishnan, Michael J. Franklin, Ken Goldberg, Eugene Wu
    Tech Report 2017
  51. Load-n-Go: Fast Approximate Join Visualizations That Improve Over Time
    Marianne Procopio, Carlos Scheidegger, Eugene Wu, Remco Chang
    DSIA 2017
  52. Approximate Entropy as a Measure of Line Chart Complexity
    Gabriel Ryan, Abigail Mosca, Eugene Wu, Remco Chang
    InfoVIS Poster 2017
  53. Towards a Bayesian Model of Data Visualization Cognition
    Yifan Wu, Larry Xu, Remco Chang, Eugene Wu
    DECISIVE 2017
  54. PreCog: Improving Crowdsourced Data Quality Before Acquisition
    Hamed Nilforoshan, Jiannan Wang, Eugene Wu
    arXiv 2017
  55. Precision Interfaces
    Haoci Zhang, Thibault Sellam, Eugene Wu
    HILDA 2017
  56. PALM: Machine Learning Explanations For Iterative Debugging
    Sanjay Krishnan, Eugene Wu
    HILDA 2017
  57. Segment-Predict-Explain for Automatic Writing Feedback
    Hamed Nilforoshan, James Sands, Kevin Lin, Rahul Khanna, Eugene Wu
    Collective Intelligence 2017
  58. Dialectic: Enhancing Text Input Fields with Automatic Feedback to Improve Social Content Writing Quality
    Hamed Nilforoshan, James Sands, Kevin Lin, Rahul Khanna, Eugene Wu
    arXiv 2017
  59. Skipping-oriented Partitioning for Columnar Layouts
    Liwen Sun, Michael J. Franklin, Jiannan Wang, Eugene Wu
    VLDB 2017
  60. Combining Design and Performance in a Data Visualization Management System
    Eugene Wu, Fotis Psallidas, Zhengjie Miao, Haoci Zhang, Laura Rettig, Yifan Wu, Thibault Sellam
    CIDR 2017
  61. CIDR: Chat-oriented Innovations in Database Research
    Eugene Wu
    CIDR 2017 Abstract
  62. QFix: Diagnosing errors through query histories
    Xiaolan Wang, Alexandra Meliou, Eugene Wu
    SIGMOD 2017
  63. A DeVIL-ish Approach to Inconsistency in Interactive Visualizations
    Yifan Wu, Joe Hellerstein, Eugene Wu
    HILDA 2016
  64. PFunk-H: Approximate Query Processing using Perceptual Models
    Daniel Alabi, Eugene Wu
    HILDA 2016
  65. Towards Reliable Interactive Data Cleaning: A User Survey and Recommendations
    Sanjay Krishnan, Daniel Haas, Michael J. Franklin, Eugene Wu
    HILDA 2016
  66. TrendQuery: A System for Interactive Exploration of Trends
    Niranjan Kamat, Eugene Wu, Arnab Nandi
    HILDA 2016
  67. ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning
    Sanjay Krishnan, Michael Franklin, Ken Goldberg, Jiannan Wang, Eugene Wu
    SIGMOD 2016 Demo (Demo Award Winner!)
  68. Graphical Perception in Animated Bar Charts
    Eugene Wu, Lilong Jiang, Larry Xu, Arnab Nandi
    arXiv 2016
  69. QFix: Demonstrating error diagnosis in query histories
    Xiaolan Wang, Alexandra Meliou, Eugene Wu
    SIGMOD 2016 Demo
  70. QFix: Diagnosing errors through query histories
    Xiaolan Wang, Alexandra Meliou, Eugene Wu
    arXiv 2016
  71. ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models
    Sanjay Krishnan, Jiannan Wang, Eugene Wu, Michael J. Franklin, Ken Goldberg
    arXiv 2016
  72. Towards Perception-aware Interactive Data Visualization Systems
    Eugene Wu, Arnab Nandi
    DSIA 2015
  73. SampleClean: Fast and Reliable Analytics on Dirty Data (overview paper)
    Sanjay Krishnan, Jiannan Wang, Michael J Franklin, Ken Goldberg, Tim Kraska, Tova Milo, Eugene Wu
    IEEE Data Eng. Bulletin 2015
  74. CLAMShell: Speeding up Crowds for Low-latency Data Labeling
    Daniel Haas, Jiannan Wang, Eugene Wu, Michael J. Franklin
    VLDB 2016
  75. Automated Metadata Construction to Support Portable Building Applications
    Arka A. Bhattacharya, Dezhi Hong, David Culler, Jorge Ortiz, Kamin Whitehouse, Eugene Wu
    BuildSys 2015
  76. Wisteria: Nurturing Scalable Data Cleaning Infrastructure
    Daniel Haas, Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Eugene Wu
    VLDB 2015 demo
  77. Collaborative Data Analytics with Datahub
    Anant Bhardwaj, Amol Deshpande, Aaron Elmore, David Karger, Sam Madden, Aditya Parameswaran, Harihar Subramanyam, Eugene Wu, Rebecca Zhang
    VLDB 2015 demo
  78. Indexing Cost Sensitive Prediction
    Leilani Battle, Edward Benson, Aditya Parameswaran, Eugene Wu
    Technical Report 2016
  79. Explaining Data in Visual Analytic Systems
    Eugene Wu
    Doctoral Thesis 2015
  80. The Case for Data Visualization Management Systems
    Eugene Wu, Leilani Battle, Samuel Madden
    VLDB 2014
  81. Vertexica: Your Relational Friend for Graph Analytics!
    Alekh Jindal, Praynaa Rawlani, Eugene Wu, Samuel Madden, Amol Deshpande, Mike Stonebraker
    SIGMOD 2014 demo
  82. Data In Context: Aiding News Consumers while Taming Dataspaces
    Eugene Wu, Adam Marcus, Sam Madden
    DBCrowd 2013
  83. Mobile applications need Targeted Micro-updates
    Alvin Cheung, Lenin Ravindranath, Eugene Wu, Samuel Madden, Hari Balakrishnan
    APSYS 2013
  84. Scorpion: Explaining Away Outliers in Aggregate Queries
    Eugene Wu, Samuel Madden
    VLDB 2013 (Best-of)
  85. SubZero: a Fine-Grained Lineage System for Scientific Databases
    Eugene Wu, Samuel Madden, Michael Stonebraker
    ICDE 2013 (Best-of)
  86. A Demonstration of DBWipes: Clean as You Query
    Eugene Wu, Samuel Madden, Michael Stonebraker
    VLDB 2012
  87. Human-powered Sorts and Joins
    Adam Marcus, Eugene Wu, David Karger, Samuel Madden, Robert Miller
    VLDB 2012
  88. Partitioning Techniques for Fine-Grained Indexing
    Eugene Wu, Sam Madden
    ICDE 2011
  89. Demonstration of Qurk: A Query Processor for Human Operators
    Adam Marcus, Eugene Wu, David Karger, Samuel Madden, Robert Miller
    SIGMOD 2011
  90. No Bits Left Behind
    Eugene Wu, Carlo Curino, Sam Madden
    CIDR 2011
  91. Crowdsourced Databases: Query Processing with People
    Adam Marcus, Eugene Wu, Sam Madden, Robert Miller
    CIDR 2011
  92. Relational Cloud: A Database-as-a-Service for the Cloud
    Carlo Curino, Evan Jones, Raluca Popa, Nirmesh Malviya, Eugene Wu, Sam Madden, Hari Balakrishnan, Nickolai Zeldovich
    CIDR 2011
  93. Relational Cloud: The Case for a Database Service
    Carlo Curino, Evan Jones, Yang Zhang, Eugene Wu, Sam Madden
    MIT Tech Report 2010
  94. TrajStore: An Adaptive Storage System for Very Large Trajectory Data Sets
    Philippe Cudre-Mauroux, Eugene Wu, Sam Madden
    ICDE 2010
  95. Demonstration of the TrajStore System
    Eugene Wu, Philippe Cudre-Mauroux, Sam Madden
    VLDB 2009 demo
  96. The Case for RodentStore: An Adaptive, Declarative Storage System
    Philippe Cudre-Mauroux, Eugene Wu, Sam Madden
    CIDR 2009
  97. WebTables: Exploring the Power of Tables on the Web
    Michael Cafarella, Alon Halevy, Daisy Wang, Eugene Wu, Yang Zhang
    VLDB 2008
  98. Uncovering the Relational Web
    Michael Cafarella, Nodira Khoussainova, Daisy Wang, Eugene Wu, Yang Zhang, Alon Halevy
    WebDB 2008
  99. SASE: Complex Event Processing over Streams (Demo)
    Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg, Gordon Anderson
    CIDR 2007
  100. High-performance complex event processing over streams
    Eugene Wu, Yanlei Diao, Shariq Rizvi
    SIGMOD 2006
  101. SASE: Complex Event Processing over Streams
    Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg, Gordon Anderson
    CoRR 2006
  102. Probabilistic Data Management for Pervasive Computing: The Data Furnace Project
    Minos N. Garofalakis, Kurt P. Brown, Michael J. Franklin, Joseph M. Hellerstein, Daisy Zhe Wang, Eirinaios Michelakis, Liviu Tancau, Eugene Wu, Shawn R. Jeffery, Ryan Aipperspach
    IEEE Data Eng. Bulletin 2006
  103. Design Considerations for High Fan-In Systems: The HiFi Approach
    Michael J. Franklin, Shawn R. Jeffery, Sailesh Krishnamurthy, Frederick Reiss, Shariq Rizvi, Eugene Wu, Owen Cooper, Anil Edakkunni, Wei Hong
    CIDR 2005
  104. HiFi: A Unified Architecture for High Fan-in Systems
    Owen Cooper, Anil Edakkunni, Michael J. Franklin, Wei Hong, Shawn R. Jeffery, Sailesh Krishnamurthy, Frederick Reiss, Shariq Rizvi, Eugene Wu
    VLDB 2004 Demo