The fundamentals of database design and application development using databases: entity-relationship modeling, logical design of relational databases, relational data definition and manipulation languages, SQL, XML, query processing, physical database tuning, transaction processing, security. Programming projects are required.
Continuation of COMS W4111. Covers the latest trends in both database research and industry: information retrieval, web search, data mining, data warehousing, OLAP, decision support, multimedia databases, and XML and databases. Programming projects are required.
Human Data Interaction studies the interfaces between humans and data that help users accomplish data-oriented tasks. Creating such interfaces is extremely challenging because their responsiveness depends directly on the system architecture as well as on the interface design. What system innovations are needed to simplify how effective human data interfaces are created and used? Human Data Interaction is a nascent field, and we will study modern research in data visualization, HCI, data analysis, and data management systems. We will first survey research in visualization interfaces and data management, then take a deep dive into the interplay between the two fields, and finally explore how these ideas apply in different application domains.
An introduction to large-scale distributed systems with an emphasis on big-data processing and storage infrastructures. Topics include fundamental tradeoffs in distributed systems, techniques for exploiting parallelism, big-data computation and storage models, the design and implementation of various well-known distributed systems infrastructures, and concrete exposure to programming big-data applications on top of popular open-source infrastructures for data processing and storage.
The principles and practice of building large-scale database management systems. Storage methods and indexing, query processing and optimization, materialized views, transaction processing and recovery, object-relational databases, parallel and distributed databases, performance considerations. Programming projects are required.
This course surveys modern research in data management, from large-scale data processing and modern database engines to data cleaning, visualization, and secure data management. To ground the discussion, we will host invited speakers who have transitioned (or are transitioning) their research from academia to industry. Depending on timing and interest, select students may be invited to join the speakers for more in-depth discussions over dinner after class.
Human beings rely on summarizing and visualizing data to make informed decisions. The number and volume of datasets continue to grow at exponential rates, and new user-facing systems and modalities are needed to handle the scale and heterogeneity of future data. This course surveys the landscape of interactive data exploration systems along several axes.