[Vaccination 2022] Practical Considerations for ACID/MVCC Storage Engines (Oren Eini)
In this talk, Oren Eini, founder of RavenDB, will discuss the design decisions and the manner in which RavenDB stores data on disk. Achieving a highly concurrent and transactional system can be a challenging task. RavenDB solves this issue using a storage engine called Voron. We'll go over the design of Voron and how it is able to achieve... Read More
[Vaccination 2021] How We Build Firebolt (Benjamin Wagner)
Data-driven companies are increasingly building customer-facing analytics products. These workloads demand lower latency, higher concurrency, and more predictable query performance than ever before - demands that traditional data warehouses struggle with. In this talk, Benjamin explains how Firebolt is designed to welcome this new generation of data challenges. This talk is part of the Vaccination Database (Second Dose) Tech Talk... Read More
[Vaccination 2021] Apache Arrow: High-Performance Columnar Data Framework (Wes McKinney)
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing. With the aim of making the data ecosystem modular and connected, Wes will talk about Apache Arrow’s vision for a more unified future data analytics ecosystem. In this talk, Wes will discuss the underlying interfaces and protocols powering the project, trends in the Apache Arrow ecosystem, and... Read More
An Overview of Google BigQuery (Justin Levandoski)
Google BigQuery is a serverless, scalable, and cost-effective cloud data warehouse. Having evolved from internal Google infrastructure (Dremel), BigQuery is unique in a number of dimensions. In this talk, we provide a look at some of the key architectural aspects of BigQuery and how it provides a true serverless and multi-tenant warehousing solution to customers. We then provide an... Read More
[Vaccination 2021] Convex: Life Without a Backend Team (James Cowling)
Many of us have devoted decades of our lives to making databases faster, cheaper, more scalable, more reliable... much of which is completely irrelevant to the average developer. The main obstacle for the next generation of app developers is convenience. The "serverless revolution" and architectures like Jamstack demonstrate a desire for life without a backend team, yet current technologies in... Read More
[Vaccination 2021] Query Optimization and Acceleration at Dremio (Steven Phillips)
This talk is part of the Vaccination Database (Second Dose) Tech Talk Seminar Series. Zoom Link: https://cmu.zoom.us/j/95002789605 (Passcode 982149) Read More
[Vaccination 2021] Fluree – Cloud-Native Ledger Graph Database (Brian Platz)
Fluree is an immutable RDF graph database that, beyond performing typical modern database functions, emphasizes security, trust, provenance, privacy, and interoperability. Fluree is open source under the AGPL license and is built on Clojure and open W3C standards. Fluree natively supports JSON and JSON-LD and can leverage/enforce any RDF ontology. The Fluree system consists of a cryptographically-secure ledger to handle... Read More
[Vaccination 2021] Vertica – High Performance Over Varying Terrain (Stephen Walkauskas)
Vertica is an OLAP database, originally designed when a "beefy" server had 16GB of memory and 8 CPUs, and a 5TB data warehouse was considered to be HUGE. A lot has changed since then. Though CPUs have gotten only a bit faster, there are more cores on a die. Applications are commonly run in virtualized environments these days. Data volumes... Read More
[Vaccination 2021] The Pinecone Vector Database System (Edo Liberty)
Pinecone is a fully managed vector database that makes it easy to add vector search to production applications. Learn how we combine state-of-the-art vector indexing with infrastructure orchestration to provide high-performance vector search at any scale. This talk is part of the Vaccination Database (Second Dose) Tech Talk Seminar Series. Zoom Link: https://cmu.zoom.us/j/95002789605 (Passcode 982149) Read More
Rethinking Systems for Data-Intensive Computing (Matei Zaharia)
A growing fraction of applications today, from basic business processing to machine learning, are data-intensive: they need to correctly process and produce massive datasets that are too large for any human to inspect. These applications pose many systems challenges, from programming interfaces, to monitoring and debugging (how can a human make sure these applications are working well?), to performance. I’ll... Read More