Events

[DB Seminar] Spring 2020 DB Group: Rockset: Realtime Indexing for fast queries on massive semi-structured data

Event Date: Monday July 20, 2020
Event Time: 04:30pm EDT
Location: https://cmu.zoom.us/j/562649242?pwd=djhicnFKWHdJM1o0MlFvYzg3SzB5Zz09
Speaker: Dhruba Borthakur

Title: Rockset: Realtime Indexing For Fast Queries On Massive Semi-structured Data

Rockset is a realtime indexing database that powers fast SQL over semi-structured data such as JSON, Parquet, or XML without requiring a predefined schema. All data loaded into Rockset is automatically indexed, and a fully featured SQL engine runs fast queries over it without requiring any database tuning. Rockset exploits the hardware fluidity available in the cloud, automatically growing and shrinking the cluster footprint based on demand. Available as a serverless cloud service, Rockset is used by developers to build data-driven applications and microservices.

In this talk, we discuss some of the key aspects of Rockset such as:

  1. Smart Schema: a type system that allows for ingesting any semi-structured data set and presenting it as SQL tables,
  2. Converged indexing: a data indexing strategy that builds inverted indexes and columnar indexes on all fields in the data set, and
  3. The Aggregator Leaf Tailer architecture: a design that scales storage, indexing compute, and query compute separately and provides elastic storage management using RocksDB-Cloud.
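
To make the converged indexing idea in item 2 concrete, here is a minimal in-memory sketch in Python. It is not Rockset's implementation: the `ConvergedIndex` class, the `flatten` helper, and the dotted field paths are all hypothetical names chosen for illustration, and the sketch assumes every leaf value is a hashable scalar. It only shows the general shape of maintaining an inverted index and a columnar index for every field of schemaless documents.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Yield (field_path, value) pairs for a nested JSON-like dict.

    Assumption for this sketch: leaf values are scalars (no arrays).
    """
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


class ConvergedIndex:
    """Hypothetical toy index: every field goes into both index types."""

    def __init__(self):
        self.inverted = defaultdict(set)   # (field, value) -> set of doc ids
        self.columnar = defaultdict(dict)  # field -> {doc id: value}

    def add(self, doc_id, doc):
        # No schema is declared up front; fields are discovered per document.
        for field, value in flatten(doc):
            self.inverted[(field, value)].add(doc_id)
            self.columnar[field][doc_id] = value

    def lookup(self, field, value):
        """Point lookup served by the inverted index."""
        return sorted(self.inverted.get((field, value), set()))

    def column(self, field):
        """Full-column scan served by the columnar index."""
        return list(self.columnar.get(field, {}).values())


index = ConvergedIndex()
index.add(1, {"name": "ada", "address": {"city": "Pittsburgh"}})
index.add(2, {"name": "bob", "age": 41, "address": {"city": "Pittsburgh"}})
```

In this sketch, a selective predicate such as `index.lookup("address.city", "Pittsburgh")` is answered from the inverted index, while an aggregation over one field would read `index.column("age")` from the columnar index. Documents with different sets of fields coexist without any schema declaration.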

Zoom Link: https://cmu.zoom.us/j/562649242 (Password 264771)

Bio:
Dhruba Borthakur is CTO and co-founder of Rockset. Previously, he was an engineer on the database team at Facebook, where he was the founding engineer of the RocksDB data store. Earlier, at Yahoo, he was one of the founding engineers of the Hadoop Distributed File System. He was also a contributor to the open source Apache HBase project. Dhruba previously held various roles at Veritas Software, founded an e-commerce startup, Oreceipt.com, and contributed to the Andrew File System (AFS) at IBM-Transarc Labs.