WuLab
Columbia University

goals

The future of industry relies on the ability to make data-driven decisions; however, this ability is currently accessible only to technical and statistical experts who can program, clean and combine data, visualize large datasets, and debug complex analysis pipelines.

The goal of the WuLab is to dramatically accelerate the democratization of data, and to train high-quality, world-class researchers.

projects

Our overarching mission is to work on 🔥💣 projects, with an emphasis on three bottlenecks in the future of data analysis: data cleaning, interactive data exploration and visualization interfaces, and understanding analysis results. These slides describe our lab’s vision and a few recent projects.

Several of our systems are named after Mortal Kombat ninjas.

Data Cleaning

Data analysis and machine learning are increasingly reliant on the quality of the input data: spurious errors and systematic corruptions can lead to misleading and incorrect results. We work on automated data cleaning algorithms tailored to data science applications, as well as crowdsourcing systems for collecting high-quality new data.

Explanation & Interpretation

Data analysis is never one-shot: it is an iterative process in which analysis results spur new analyses or new ways to debug the analysis. We work on data explanation systems that let analysts highlight anomalies in analysis results and suggest potential reasons to investigate, as well as machine learning explanation techniques that explain what machine learning models (e.g., deep neural networks) learn and how they use it to make their predictions.

Interactive Data Analysis Systems

The current interface for data analysis is predominantly code. We study techniques for designing, architecting, and building scalable interactive visual analysis applications. The Data Visualization Management System makes it significantly easier to build and scale interactive data visualization systems. Precision Interfaces extends this technology to automatically generate new visual exploration interfaces tailored to the long tail of data analysis tasks.

join

We are always looking for hard-working, smart, driven students who are excited about pushing forward how humans interact with data. If you are a prospective graduate student or postdoc, read our application document. If you are an undergraduate, master's student, or prospective intern, please fill out our questionnaire.

contact

Email us at ewu@cs.columbia.edu

people
Fotis Psallidas Grad Student
Thibault Sellam Postdoc
Xiaolan Wang Collab (UMass)
Yifan Wu Collab (Cal)
Sanjay Krishnan Collab (Cal)
Young Wu Collab (SFU)
Conder Shou Undergrad
Robbie Netzorg Undergrad
Kevin Lin Undergrad
Ian Huang Undergrad
Hamed Nilforoshan Undergrad
HaoCi Zhang Masters
Alumni and Past Collaborators
Tejas Dharamsi Masters
Lily-Xiaoxuan Liu Intern
Daniel Haas Collab (Cal)
Lilong Jiang Collab (OSU)
Daniel Alabi Masters
Zhengjie Miao Masters
Larry Xu Undergrad
James Sands Undergrad
Naina Sahrawat Undergrad
Rahul Khanna Undergrad
Mengyang Lyu Intern
Ziyun Wei Intern
Alex Studer High School
Gabriel Ryan Masters
Salim M'jahad Undergrad

publications
  1. At a Glance: Approximate Entropy as a Measure of Line Chart Visualization Complexity
    Gabriel Ryan, Abigail Mosca, Remco Chang, Eugene Wu
    InfoVIS 2018
  2. Provenance in Interactive Visualizations
    Fotis Psallidas, Eugene Wu
    HILDA 2018
  3. Leveraging Quality Prediction Models for Automatic Writing Feedback
    Hamed Nilforoshan, Eugene Wu
    ICWSM 2018
  4. Precision Interfaces for Different Modalities
    HaoCi Zhang, Viraj Rai, Thibault Sellam, Eugene Wu
    SIGMOD (demo) 2018
  5. Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications
    Fotis Psallidas, Eugene Wu
    SIGMOD (demo) 2018
  6. Deeper: A Data Enrichment System Powered by Deep Web
    Pei Wang, Yongjun He, Ryan Shea, Jiannan Wang, Eugene Wu
    SIGMOD (demo) 2018
  7. “I Like the Way You Think!” Inspecting the Internal Logic of Recurrent Neural Networks
    Thibault Sellam, Kevin Lin, Ian Yiran Huang, Carl Vondrick, Eugene Wu
    SysML 2018
  8. A “Probabilistic” Model of Research
    Eugene Wu
    Blog Post 2018
  9. Smoke: Fine-grained Lineage at Interactive Speeds
    Fotis Psallidas, Eugene Wu
    VLDB 2018 Preprint
  10. Mining Precision Interfaces From Query Logs
    Haoci Zhang, Thibault Sellam, Eugene Wu
    Tech Report 2017
  11. BoostClean: Automated Error Detection and Repair for Machine Learning
    Sanjay Krishnan, Michael J. Franklin, Ken Goldberg, Eugene Wu
    Tech Report 2017
  12. Load-n-Go: Fast Approximate Join Visualizations That Improve Over Time
    Marianne Procopio, Carlos Scheidegger, Eugene Wu, Remco Chang
    DSIA 2017
  13. Approximate Entropy as a Measure of Line Chart Complexity
    Gabriel Ryan, Abigail Mosca, Eugene Wu, Remco Chang
    InfoVIS Poster 2017
  14. Towards a Bayesian Model of Data Visualization Cognition
    Yifan Wu, Larry Xu, Remco Chang, Eugene Wu
    DECISIVE 2017
  15. PreCog: Improving Crowdsourced Data Quality Before Acquisition
    Hamed Nilforoshan, Jiannan Wang, Eugene Wu
    arXiv 2017
  16. Precision Interfaces
    Haoci Zhang, Thibault Sellam, Eugene Wu
    HILDA 2017
  17. PALM: Machine Learning Explanations For Iterative Debugging
    Sanjay Krishnan, Eugene Wu
    HILDA 2017
  18. Segment-Predict-Explain for Automatic Writing Feedback
    Hamed Nilforoshan, James Sands, Kevin Lin, Rahul Khanna, Eugene Wu
    Collective Intelligence 2017
  19. Dialectic: Enhancing Text Input Fields with Automatic Feedback to Improve Social Content Writing Quality
    Hamed Nilforoshan, James Sands, Kevin Lin, Rahul Khanna, Eugene Wu
    arXiv 2017
  20. Skipping-oriented Partitioning for Columnar Layouts
    Liwen Sun, Michael J. Franklin, Jiannan Wang, Eugene Wu
    VLDB 2017
  21. Combining Design and Performance in a Data Visualization Management System
    Eugene Wu, Fotis Psallidas, Zhengjie Miao, Haoci Zhang, Laura Rettig, Yifan Wu, Thibault Sellam
    CIDR 2017
  22. CIDR: Chat-oriented Innovations in Database Research
    Eugene Wu
    CIDR 2017 Abstract
  23. QFix: Diagnosing errors through query histories
    Xiaolan Wang, Alexandra Meliou, Eugene Wu
    SIGMOD 2017
  24. A DeVIL-ish Approach to Inconsistency in Interactive Visualizations
    Yifan Wu, Joe Hellerstein, Eugene Wu
    HILDA 2016
  25. PFunk-H: Approximate Query Processing using Perceptual Models
    Daniel Alabi, Eugene Wu
    HILDA 2016
  26. Towards Reliable Interactive Data Cleaning: A User Survey and Recommendations
    Sanjay Krishnan, Daniel Haas, Michael J. Franklin, Eugene Wu
    HILDA 2016
  27. TrendQuery: A System for Interactive Exploration of Trends
    Niranjan Kamat, Eugene Wu, Arnab Nandi
    HILDA 2016
  28. ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning
    Sanjay Krishnan, Michael Franklin, Ken Goldberg, Jiannan Wang, Eugene Wu
    SIGMOD 2016 Demo (Demo Award Winner!)
  29. Graphical Perception in Animated Bar Charts
    Eugene Wu, Lilong Jiang, Larry Xu, Arnab Nandi
    arXiv 2016
  30. QFix: Demonstrating error diagnosis in query histories
    Xiaolan Wang, Alexandra Meliou, Eugene Wu
    SIGMOD 2016 Demo
  31. QFix: Diagnosing errors through query histories
    Xiaolan Wang, Alexandra Meliou, Eugene Wu
    arXiv 2016
  32. ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models
    Sanjay Krishnan, Jiannan Wang, Eugene Wu, Michael J. Franklin, Ken Goldberg
    arXiv 2016
  33. Towards Perception-aware Interactive Data Visualization Systems
    Eugene Wu, Arnab Nandi
    DSIA 2015 Slides
  34. SampleClean: Fast and Reliable Analytics on Dirty Data (overview paper)
    Sanjay Krishnan, Jiannan Wang, Michael J Franklin, Ken Goldberg, Tim Kraska, Tova Milo, Eugene Wu
  35. CLAMShell: Speeding up Crowds for Low-latency Data Labeling
    Daniel Haas, Jiannan Wang, Eugene Wu, Michael J. Franklin
    VLDB 2016
  36. Automated Metadata Construction to Support Portable Building Applications
    Arka A. Bhattacharya, Dezhi Hong, David Culler, Jorge Ortiz, Kamin Whitehouse, Eugene Wu
    BuildSys 2015
  37. Wisteria: Nurturing Scalable Data Cleaning Infrastructure (Demo)
    Daniel Haas, Sanjay Krishnan, Jiannan Wang, Michael J. Franklin, Eugene Wu
    VLDB 2015
  38. Collaborative Data Analytics with Datahub (Demo)
    Anant Bhardwaj, Amol Deshpande, Aaron Elmore, David Karger, Sam Madden, Aditya Parameswaran, Harihar Subramanyam, Eugene Wu, Rebecca Zhang
    VLDB 2015
  39. Indexing Cost Sensitive Prediction
    Leilani Battle, Edward Benson, Aditya Parameswaran, Eugene Wu
    Technical Report
  40. Explaining Data in Visual Analytic Systems
    Eugene Wu
    Doctoral Thesis
  41. The Case for Data Visualization Management Systems
    Eugene Wu, Leilani Battle, Samuel Madden
    VLDB 2014
  42. Data In Context: Aiding News Consumers while Taming Dataspaces
    Eugene Wu, Adam Marcus, Sam Madden
    DBCrowd 2013
  43. Mobile Applications Need Targeted Micro-updates
    Alvin Cheung, Lenin Ravindranath, Eugene Wu, Samuel Madden, Hari Balakrishnan
    APSYS 2013
  44. Scorpion: Explaining Away Outliers in Aggregate Queries
    Eugene Wu, Samuel Madden
    VLDB 2013 (Selected as one of the best papers of the conference!) Slides
  45. SubZero: a Fine-Grained Lineage System for Scientific Databases
    Eugene Wu, Samuel Madden, Michael Stonebraker
    ICDE 2013 (Best of conference)
  46. A Demonstration of DBWipes: Clean as You Query
    Eugene Wu, Samuel Madden, Michael Stonebraker
    VLDB 2012
  47. Human-powered Sorts and Joins
    Adam Marcus, Eugene Wu, David Karger, Samuel Madden, Robert Miller
    VLDB 2012
  48. Partitioning Techniques for Fine-Grained Indexing
    Eugene Wu, Sam Madden
    ICDE 2011
  49. Demonstration of Qurk: A Query Processor for Human Operators
    Adam Marcus, Eugene Wu, David Karger, Samuel Madden, Robert Miller
    SIGMOD 2011
  50. No Bits Left Behind
    Eugene Wu, Carlo Curino, Sam Madden
    CIDR 2011
  51. Crowdsourced Databases: Query Processing with People
    Adam Marcus, Eugene Wu, Sam Madden, Robert Miller
    CIDR 2011
  52. Relational Cloud: A Database-as-a-Service for the Cloud
    Carlo Curino, Evan Jones, Raluca Popa, Nirmesh Malviya, Eugene Wu, Sam Madden, Hari Balakrishnan, Nickolai Zeldovich
    CIDR 2011
  53. Relational Cloud: The Case for a Database Service
    Carlo Curino, Evan Jones, Yang Zhang, Eugene Wu, Sam Madden
  54. TrajStore: An Adaptive Storage System for Very Large Trajectory Data Sets
    Philippe Cudre-Mauroux, Eugene Wu, Sam Madden
    ICDE 2010
  55. Demonstration of the TrajStore System
    Eugene Wu, Philippe Cudre-Mauroux, Sam Madden
    VLDB 2009
  56. The Case for RodentStore: An Adaptive, Declarative Storage System
    Philippe Cudre-Mauroux, Eugene Wu, Sam Madden
    CIDR 2009
  57. WebTables: Exploring the Power of Tables on the Web
    Michael Cafarella, Alon Halevy, Daisy Wang, Eugene Wu, Yang Zhang
    VLDB 2008
  58. Uncovering the Relational Web
    Michael Cafarella, Nodira Khoussainova, Daisy Wang, Eugene Wu, Yang Zhang, Alon Halevy
    WebDB 2008
  59. SASE: Complex Event Processing over Streams (Demo)
    Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg, Gordon Anderson
    CIDR 2007
  60. High-performance complex event processing over streams
    Eugene Wu, Yanlei Diao, Shariq Rizvi
    SIGMOD 2006
  61. SASE: Complex Event Processing over Streams
    Daniel Gyllstrom, Eugene Wu, Hee-Jin Chae, Yanlei Diao, Patrick Stahlberg, Gordon Anderson
    CoRR 2006
  62. Probabilistic Data Management for Pervasive Computing: The Data Furnace Project
    Minos N. Garofalakis, Kurt P. Brown, Michael J. Franklin, Joseph M. Hellerstein, Daisy Zhe Wang, Eirinaios Michelakis, Liviu Tancau, Eugene Wu, Shawn R. Jeffery, Ryan Aipperspach
    IEEE Data Eng. Bull.
  63. Design Considerations for High Fan-In Systems: The HiFi Approach
    Michael J. Franklin, Shawn R. Jeffery, Sailesh Krishnamurthy, Frederick Reiss, Shariq Rizvi, Eugene Wu, Owen Cooper, Anil Edakkunni, Wei Hong
    CIDR 2005
  64. HiFi: A Unified Architecture for High Fan-in Systems
    Owen Cooper, Anil Edakkunni, Michael J. Franklin, Wei Hong, Shawn R. Jeffery, Sailesh Krishnamurthy, Frederick Reiss, Shariq Rizvi, Eugene Wu
    VLDB 2004 Demo