Archived Events

Jul 13 2020
Astra: How we built a Cassandra-as-a-Service
Speakers:
Jim McCollom, Jeff Carpenter
System:
Cassandra
Video:
YouTube

At DataStax, we’ve been on a multi-year journey to bring a Cassandra DBaaS to the market, culminating in the GA of Astra in May 2020. In this talk, we’ll share our successes and failures along the iterative journey to GA, our current Kubernetes-based architecture, how we built scalability and reliability into the platform, and how Cassandra’s architecture and implementation...

Jul 6 2020
Another Relational Database, Why and How
Speakers:
Oscar Batori & Zach Musgrave
System:
Dolt
Video:
YouTube

There are a lot of relational databases, so a fair question is why we decided to create a new one. The primary reason is trade-offs. Relational databases are optimized for storing a single version of the truth and for reading or updating it with maximum efficiency. More succinctly, they are optimized for being good OLTP stores. They are not optimized...

Jun 29 2020
[DB Seminar] Spring 2020 DB Group: Linux 4.x Tracing (Pre-Recorded)
Speaker:
Brendan Gregg

There is no invited speaker today. Instead, we will watch this video together: "Linux 4.x Tracing: Performance Analysis with bcc/BPF (eBPF)" by Brendan Gregg (https://youtu.be/w8nFRoFJ6EQ). Zoom password: 264771

Jun 22 2020
Testing Cloud-Native Databases with Chaos Mesh
Speaker:
Siddon Tang
System:
Chaos Mesh
Video:
YouTube

In the world of distributed computing, faults happen to clusters unpredictably, especially when they run in the cloud. To make a distributed database like TiDB resilient enough, chaos engineering is the way to go. At PingCAP, we use Chaos Mesh®, an open-source chaos engineering platform for Kubernetes, to improve the resiliency of TiDB. Chaos Mesh adopts a cloud-native design and currently...

Jun 15 2020
Deepgreen DB: Greenplum at Speed
Speaker:
CK Tan
System:
Vitesse
Video:
YouTube

Greenplum is an open-source, Postgres-based MPP solution that can scale to hundreds of nodes and petabytes of data. Deepgreen DB is an optimized version of Greenplum. On top of a mature, market-tested data warehouse, Deepgreen DB adds data-centric code generation for speed, a columnar external data engine, a new interconnect, and SQL-level integration with Go/Python. This talk will mainly recount the...

Jun 8 2020
Finding Logic Bugs in Database Management Systems
Speaker:
Manuel Rigger
System:
SQLancer
Video:
YouTube

Database management systems (DBMSs) are used ubiquitously for storing and retrieving data. It is critical that they function correctly: incorrectly computed result sets (e.g., ones that omit a row) can cause serious loss or damage. We refer to such defects as logic bugs. Despite their importance, finding logic bugs in production DBMSs is a longstanding challenge. Existing techniques such as...

Jun 1 2020
Building Materialize, a Streaming SQL Database powered by Timely Dataflow
Speaker:
Arjun Narayan
System:
Materialize
Video:
YouTube

Materialize (Materialize.io, GitHub) is a streaming database. Instead of being optimized for processing ad-hoc transactional or analytical queries, it is optimized for view maintenance on an ongoing basis over streams of already processed transactions. Although OLTP and OLAP systems often have support for views, they are not architected to efficiently maintain these views as the data change. Systems designed for...

May 18 2020
APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems
Speaker:
Jinho Jung
System:
APOLLO
Video:
YouTube

The practical art of constructing database management systems (DBMSs) involves a morass of trade-offs among query execution speed, query optimization speed, standards compliance, feature parity, modularity, portability, and other goals. It is no surprise that DBMSs, like all complex software systems, contain bugs that can adversely affect their performance. The performance of DBMSs is an important metric as it determines...

May 11 2020
Introducing ClickHouse–the fastest data warehouse you’ve never heard of
Speaker:
Robert Hodges
System:
ClickHouse
Video:
YouTube

The market for scalable SQL data warehouses is dominated by proprietary products. ClickHouse is one of the first open-source projects to give those products a run for their money. ClickHouse scales to hundreds of nodes with ingest measured in millions of events per second. The user community includes Cloudflare, Cisco, and numerous financial services companies. This talk briefly recounts...

May 4 2020
[DB Seminar] Spring 2020 DB Group: Active Learning for ML Enhanced Database Systems
Speaker:
Lin Ma

Abstract: Recent research has shown promising results by using machine learning (ML) techniques to improve the performance of database systems, e.g., in query optimization or index recommendation. However, in many production deployments, the ML models’ performance degrades significantly when the test data diverges from the data used to train these models.

In this talk, I will present a solution to...