News & Events
[DB Seminar] Spring 2020 DB Group: YugabyteDB: Bringing Together the Best of Amazon Aurora and Google Spanner
PostgreSQL, a single-node open-source RDBMS, is widely adopted for its powerful set of features. However, PostgreSQL is not built to be used as a cloud-native database, and therefore cannot inherently survive failures, scale horizontally, or support geo-distributed deployments. While Amazon Aurora has modified the subsystem of PostgreSQL that writes to disk and simplified async replication to make the database resilient to failures, it does not address horizontal scalability or geo-distribution. Google Spanner addresses all of these features; however, it does Read More
MS Thesis Defense: Filter Representation in Vectorized Query Execution (Amadou Ngom)
Advances in memory capacity have allowed Database Management Systems (DBMSs) to store large amounts of data in memory, thereby shifting the performance bottleneck of query execution from disk accesses to CPU efficiency (i.e., instruction count and cycles per instruction). One technique used to achieve such efficiency in analytical applications is batch-oriented processing or vectorization: it reduces interpretation overhead, improves cache locality, and allows for efficient loop optimizations (e.g., loop unrolling, SIMD vectorization). For each vector (i.e., a batch of tuples), Read More
[DB Seminar] Spring 2020 DB Group: Rockset: Realtime Indexing for fast queries on massive semi-structured data
Rockset is a realtime indexing database that powers fast SQL over semi-structured data such as JSON, Parquet, or XML without requiring any schematization. All data loaded into Rockset are automatically indexed and a fully featured SQL engine powers fast queries over semi-structured data without requiring any database tuning. Rockset exploits the hardware fluidity available in the cloud and automatically grows and shrinks the cluster footprint based on demand. Available as a serverless cloud service, Rockset is used by developers to Read More
[DB Seminar] Spring 2020 DB Group: Linux 4.x Tracing (Pre-Recorded)
There is no invited speaker today. We will instead watch this video together: "Linux 4.x Tracing: Performance Analysis with bcc/BPF (eBPF)" by Brendan Gregg, https://youtu.be/w8nFRoFJ6EQ Zoom Password: 264771 Read More
[DB Seminar] Spring 2020 DB Group: Black-box Isolation Checking with Elle
Databases are awful. They lose information, corrupt state, and do other terrible things, both by design and by accident. You'd think that *testing* databases to see how awful they are would help make them better, but it turns out that testing most of the useful database safety properties is *also* awful. We came up with a better way to test databases, called Elle. Elle finds, graphs, and explains a wealth of isolation violations by mapping observed histories to Adya-style dependency Read More
[DB Seminar] Spring 2020 DB Group: Testing Cloud-Native Databases with Chaos Mesh
In the world of distributed computing, faults happen to clusters unpredictably, especially when they run in the cloud. To make a distributed database like TiDB resilient enough, chaos engineering is the way to go. At PingCAP, we use Chaos Mesh®, an open-source chaos engineering platform for Kubernetes, to improve the resiliency of TiDB. Chaos Mesh adopts a cloud-native design and currently supports more than 10 chaos types. This talk will mainly introduce Chaos Mesh and how we use it to test Read More
[DB Seminar] Spring 2020 DB Group: Astra: How we built a Cassandra-as-a-Service
At DataStax, we’ve been on a multi-year journey to bring a Cassandra DBaaS to the market, culminating in the GA of Astra in May 2020. In this talk, we’ll share our successes and failures through the iterative journey to GA, our current Kubernetes-based architecture, how we built scalability and reliability into the platform, and how Cassandra’s architecture and implementation affected our design choices for current features like multi-tenancy and influence our future initiatives. Zoom Link: https://cmu.zoom.us/j/562649242 (Password 264771) Read More
[DB Seminar] Spring 2020 DB Group: Another Relational Database, Why and How
There are a lot of relational databases, so a fair question is why we decided to create a new one. The primary reason is trade-offs. Relational databases are optimized for storing a single version of the truth and providing it or updating it with maximum efficiency. More succinctly, they are optimized for being good OLTP stores. They are not optimized to meet the increasingly common need to move structured data from one party (person or entity) to another. The existing Read More
[DB Seminar] Spring 2020 DB Group: Deepgreen DB: Greenplum at Speed
Greenplum is an open-source Postgres-based MPP solution that can scale to hundreds of nodes and petabytes of data. Deepgreen DB is an optimized version of Greenplum. On top of a mature, market-tested data warehouse, Deepgreen DB adds data-centric code generation for speed, a columnar external data engine, a new interconnect, and SQL-level integration with Go/Python. This talk will mainly recount the challenges of LLVM codegen on PG/GP while maintaining 100% compatibility, a necessity for market acceptance. Zoom Link: https://cmu.zoom.us/j/562649242 Read More
[DB Seminar] Spring 2020 DB Group: Building Materialize, a Streaming SQL Database powered by Timely Dataflow
Materialize (Materialize.io, GitHub) is a streaming database. Instead of being optimized for processing ad-hoc transactional or analytical queries, it is optimized for view maintenance on an ongoing basis over streams of already processed transactions. Although OLTP and OLAP systems often have support for views, they are not architected to efficiently maintain these views as the data change. Systems designed for view maintenance can often handle substantially higher load for workloads that re-issue the same questions against changing data: they perform Read More