The fundamentals of database design and application development using databases: entity-relationship modeling, logical design of relational databases, relational data definition and manipulation languages, SQL, XML, query processing, physical database tuning, transaction processing, security. Programming projects are required.
The principles and practice of building large-scale database management systems. Storage methods and indexing, query processing and optimization, materialized views, transaction processing and recovery, object-relational databases, parallel and distributed databases, performance considerations. Programming projects are required.
An introduction to large-scale distributed systems with an emphasis on big-data processing and storage infrastructures. Topics include fundamental tradeoffs in distributed systems, techniques for exploiting parallelism, big-data computation and storage models, design and implementation of various well-known distributed systems infrastructures, and concrete exposure to programming big-data applications on top of popular, open-source infrastructures for data processing and storage systems.
A continuation of COMS W4111 that covers the latest trends in both database research and industry: information retrieval, web search, data mining, data warehousing, OLAP, decision support, multimedia databases, and XML and databases. Programming projects are required.
This course will survey modern research in data management, from large-scale data processing and modern database engines to data cleaning and visualization to secure data management. To ground the discussion, we will host invited speakers who have transitioned (or are transitioning) their research work from academia to industry. Depending on timing and interest, select students may be invited to join the speakers for more in-depth discussions over dinner after class.
Human beings rely on summarizing and visualizing data to make informed decisions. The number and volume of datasets continue to increase at exponential rates, and new user-facing systems and modalities are needed to handle the scale and heterogeneity of future data. This course surveys the landscape of interactive data exploration systems along several axes.