News & Events
Quarantine DB Talk 2020: FoundationDB or: How I Learned to Stop Worrying and Trust the Database
Getting multiple entities to work nicely together is a difficult task. This is true for machines as much as it is true for humans. This is why testing and debugging distributed systems is so hard. Even if well-known algorithms are used, subtle bugs can cause catastrophic failures. FoundationDB uses deterministic simulation to test for these failures. This is the secret sauce that makes FoundationDB one of the most robust databases on the market. FoundationDB is a distributed key Read More
Quarantine DB Talk 2020: EraDB: Designing Systems for Cardinality and Dimensionality
EraDB is a distributed database designed for petabyte-scale, schemaless data that leverages cloud-native object storage for global persistence. In this talk, Todd will discuss the historical origins of EraDB and delve into how it is designed to handle high-cardinality and high-dimensionality data within a flexible, horizontally-scalable architecture. This talk is part of the Quarantine Database Tech Talk Seminar Series. Zoom Link: https://cmu.zoom.us/j/562649242 (Password 264771) Read More
Quarantine DB Talk 2020: Datometry Hyper-Q: Virtualizing the World’s Enterprise Data Warehouses
Enterprises worldwide are looking to move their database applications to the cloud. However, conventional migration from an on-premises data warehouse to a cloud-native one is a costly, labor-intensive task, laden with many risks. According to Gartner, the majority of these migrations are late, run over budget, or fail altogether. Datometry has developed a virtualization platform that enables applications written for an on-premises data warehouse to run on a cloud data warehouse without major rewrites or rearchitecting. Instead, Datometry Hyper-Q Read More
Quarantine DB Talk 2020: Apache Arrow Flight: Accelerating Columnar Dataset Transport
In this talk I will discuss the role that Apache Arrow and Arrow Flight are playing to provide a faster and more efficient approach to building data services that transport large datasets. We'll look at the technical details of why the Arrow protocol is an attractive choice and look at specific examples of where Arrow has been employed for better performance and resource efficiency. Finally, I will discuss the implications for databases and the upcoming generation of data systems. This Read More
Quarantine DB Talk 2020: Fauna: Lessons Learned Building a Real World, Calvin-based System
Fauna is a NoSQL Database-as-an-API service that supports consistent, global database access for OLTP workloads. While many aspects of Fauna make it unique among similar systems, one in particular is its use of Calvin, a deterministic transaction resolution protocol that underpins its strict-serializability guarantees. This talk will give an overview of Fauna's architecture, why we chose Calvin and the benefits thereby attained, and some lessons learned evolving our system in a real world, production environment where the Read More
Quarantine DB Talk 2020: CockroachDB’s Query Optimizer
We live in an increasingly interconnected world, with many organizations operating across countries or even continents. To serve their global user base, organizations are replacing their legacy DBMSs with cloud-based systems capable of scaling OLTP workloads to millions of users. CockroachDB is a scalable SQL DBMS that was built from the ground up to support these global OLTP workloads while maintaining high availability and strong consistency. Just like its namesake, CockroachDB is resilient to disasters through replication and automatic recovery Read More
Quarantine DB Talk 2020: Query Optimization at Snowflake
In this talk, I will give an introduction to Snowflake's query optimizer. I will talk about the main features of Snowflake's optimizer, explain the main philosophy behind the design decisions, and delve into some unique aspects of the implementation. Later, I will also cover our infrastructure for facilitating optimizer development and discuss the opportunities and challenges of implementing and rolling out query optimizations for cloud-based databases. This talk is part of the Quarantine Database Tech Talk Seminar Series. Zoom Read More
Quarantine DB Talk 2020: ScyllaDB — No-Compromise Performance
ScyllaDB is a distributed NoSQL database that provides high availability, multiple consistency models, and high performance. This talk will cover how ScyllaDB approaches performance: thread-per-core, fully asynchronous operation, and kernel bypass. This talk is part of the Quarantine Database Tech Talk Seminar Series. Zoom Link: https://cmu.zoom.us/j/562649242 (Password 264771) Read More
Quarantine DB Talk 2020: Splice Machine – An HTAP DB at Scale
Emerging modern applications routinely depend on data at scale for AI and ML and therefore require HTAP (Hybrid Transactional/Analytical Processing) database systems to make real-time business decisions. In this talk, we would like to introduce Splice Machine, an HTAP DB at scale, along with our mission and architecture. We will also dive into our transaction mechanism, query optimization, and dual execution engine support. Zoom Link: https://cmu.zoom.us/j/562649242 (Password 264771) Read More
Quarantine DB Talk 2020: TerminusDB: Building a Native Revision Control DB from Scratch
Revision control and CI/CD have completely transformed the way we write and deliver software. Yet data management has not kept pace with these changes. Data assets are still managed using RDBMSs, or worse, with CSVs or Excel spreadsheets. Since many software applications rely on data assets, this presents a serious problem for data-driven software. TerminusDB is a graph database that provides distributed revision control features natively, enabling us to lift best practices for CI/CD to data-driven applications. It supports a Read More