Archived Events

Aug 10 2020
04:30pm EDT
Quarantine DB Talk 2020: Splice Machine – An HTAP DB at Scale
Daniel Gómez Ferro, Yi Xia

Emerging modern applications routinely depend on data at scale for AI and ML and therefore require HTAP (Hybrid Transactional/Analytical Processing) database systems to make real-time business decisions. In this talk, we introduce Splice Machine, an HTAP DB at scale, along with our mission and architecture. We will also dive into our transaction mechanism, query optimization, and dual execution engine support. Zoom Link: https://cmu.zoom.us/j/562649242 (Password 264771)

Aug 3 2020
04:30pm EDT
[DB Seminar] Spring 2020 DB Group: YugabyteDB: Bringing Together the Best of Amazon Aurora and Google Spanner

PostgreSQL, a single-node open-source RDBMS, is widely adopted for its powerful set of features. However, PostgreSQL is not built to be used as a cloud-native database, and therefore cannot inherently survive failures, scale horizontally, or support geo-distributed deployments. While Amazon Aurora has modified the subsystem of PostgreSQL that writes to disk, along with simplifying async replication, to make the database resilient to failures, it does not address horizontal scalability or geo-distribution. Google Spanner addresses all of these features; however, it does...

Jul 27 2020
04:30pm EDT
[DB Seminar] Spring 2020 DB Group: Black-box Isolation Checking with Elle

Databases are awful. They lose information, corrupt state, and do other terrible things, both by design and by accident. You'd think that *testing* databases to see how awful they are would help make them better, but it turns out that testing most of the useful database safety properties is *also* awful. We came up with a better way to test databases, called Elle. Elle finds, graphs, and explains a wealth of isolation violations by mapping observed histories to Adya-style dependency...

Jul 24 2020
10:00am EDT
MS Thesis Defense: Filter Representation in Vectorized Query Execution (Amadou Ngom)

Advances in memory capacity have allowed Database Management Systems (DBMSs) to store large amounts of data in memory, thereby shifting the performance bottleneck of query execution from disk accesses to CPU efficiency (i.e., instruction count and cycles per instruction). One technique used to achieve such efficiency in analytical applications is batch-oriented processing or vectorization: it reduces interpretation overhead, improves cache locality, and allows for efficient loop optimizations (e.g., loop unrolling, SIMD vectorization). For each vector (i.e., a batch of tuples),...

Jul 20 2020
04:30pm EDT
[DB Seminar] Spring 2020 DB Group: Rockset: Realtime Indexing for fast queries on massive semi-structured data

Rockset is a realtime indexing database that powers fast SQL over semi-structured data such as JSON, Parquet, or XML without requiring any schematization. All data loaded into Rockset are automatically indexed and a fully featured SQL engine powers fast queries over semi-structured data without requiring any database tuning. Rockset exploits the hardware fluidity available in the cloud and automatically grows and shrinks the cluster footprint based on demand. Available as a serverless cloud service, Rockset is used by developers to...

Jul 13 2020
04:30pm EDT
[DB Seminar] Spring 2020 DB Group: Astra: How we built a Cassandra-as-a-Service
Jim McCollom & Jeff Carpenter

At DataStax, we’ve been on a multi-year journey to bring a Cassandra DBaaS to the market, culminating in the GA of Astra in May 2020. In this talk, we’ll share our successes and failures through the iterative journey to GA, our current Kubernetes-based architecture, how we built scalability and reliability into the platform, and how Cassandra’s architecture and implementation affected our design choices for current features like multi-tenancy and influence our future initiatives. Zoom Link: https://cmu.zoom.us/j/562649242 (Password 264771)

Jul 6 2020
04:30pm EDT
[DB Seminar] Spring 2020 DB Group: Another Relational Database, Why and How
Oscar Batori & Zach Musgrave

There are a lot of relational databases, so a fair question is why we decided to create a new one. The primary reason is trade-offs. Relational databases are optimized for storing a single version of the truth and providing it or updating it with maximum efficiency. More succinctly, they are optimized for being good OLTP stores. They are not optimized to meet the increasingly common need to move structured data from one party (person or entity) to another. The existing...

Jun 29 2020
04:30pm EDT
[DB Seminar] Spring 2020 DB Group: Linux 4.x Tracing (Pre-Recorded)

There is no invited speaker today. We will instead watch this video together: "Linux 4.x Tracing: Performance Analysis with bcc/BPF (eBPF)" by Brendan Gregg (https://youtu.be/w8nFRoFJ6EQ). Zoom Password: 264771

Jun 22 2020
04:30pm EDT
[DB Seminar] Spring 2020 DB Group: Testing Cloud-Native Databases with Chaos Mesh

In the world of distributed computing, faults happen to clusters unpredictably, especially when they run in the cloud. To make a distributed database like TiDB resilient enough, chaos engineering is the way to go. At PingCAP, we use Chaos Mesh®, an open-source chaos engineering platform for Kubernetes, to improve the resiliency of TiDB. Chaos Mesh adopts a cloud-native design and currently supports more than 10 chaos types. This talk will mainly introduce Chaos Mesh and how we use it to test...

Jun 15 2020
04:30pm EDT
[DB Seminar] Spring 2020 DB Group: Deepgreen DB: Greenplum at Speed
CK Tan

Greenplum is an open-source, Postgres-based MPP solution that can scale to hundreds of nodes and petabytes of data. Deepgreen DB is an optimized version of Greenplum. On top of a mature, market-tested data warehouse, Deepgreen DB adds data-centric code generation for speed, a columnar external data engine, a new interconnect, and SQL-level integration with Go/Python. This talk will mainly recount the challenges of LLVM codegen on PG/GP while maintaining 100% compatibility, a necessity for market acceptance. Zoom Link: https://cmu.zoom.us/j/562649242