Welcome to the Columbia University Database Research Group! We are interested in high-performance database architectures, web-scale information extraction, social media, data analysis, visualization tools, and database theory.
The database group holds regular meetings on Tuesdays, 3-4:30PM, in 417 Mudd. We typically discuss a relevant paper or host outside speakers. If you are interested in attending or speaking, please get in contact with us. Below is a list of outside speakers.
Abstract: Apache Kafka is a high-throughput, distributed messaging system. Since it was open sourced, various companies such as Twitter, Netflix, Uber, Cisco and Goldman Sachs have adopted Kafka in their big data ecosystems. In this talk, Jun will first describe the journey of how Kafka was started and used at LinkedIn. Next, Jun will describe how Kafka delivers high throughput and supports redundancy, both of which have helped make Kafka popular. Finally, Jun will introduce what we are doing at Confluent to build a real-time data platform based on Kafka.
Bio: Jun Rao is a co-founder of Confluent, a company that provides a stream data platform on top of Apache Kafka. Before Confluent, he was a senior staff engineer at LinkedIn, where he led the development of Kafka. Before LinkedIn, he was a researcher at IBM's Almaden Research Center, where he conducted research on databases and distributed systems. Jun Rao is the PMC chair of Apache Kafka and a committer of Apache Cassandra. He is the co-author of more than 20 refereed research papers and the co-inventor of more than a dozen U.S. software patents.