Archived Events

Mar 24 2025
04:30pm EDT
[SQL Death] PRQL: Pipelined Relational Query Language

The past 50 years have seen a great evolution in programming languages — except in the world of databases. There, SQL still reigns supreme; but is it a shark, enduring due to its perfection, or a dinosaur, one impact away from extinction? We argue that SQL is an amalgamation of both: relational algebra — the shark; and the language — the bloated dinosaur. PRQL incorporates the beauty of relational algebra while also embracing the best from libraries like dplyr, Pandas,... Read More

Mar 18 2025
01:00pm EDT
NSH 4305
Building Novel Abstractions for a Declarative Cloud (Tianyu Li)

As the cloud evolves in capability, it has also become increasingly complex and difficult to program. New abstractions are necessary to ensure next-generation cloud applications are correct, simple, and efficient. In this talk, I will first describe Resilient Composition, a new abstraction that ensures fault-tolerance in applications composed from independent, distributed components. The key insight is to rely on atomic, fault-tolerant “steps” that span component operations and messages. I will present DARQ, an efficient execution engine for such steps, and... Read More

Mar 17 2025
04:30pm EDT
[SQL Death] Malloy: A Modern Open Source Language for Analyzing, Transforming, and Modeling Data

In software we express our ideas through tools. In data, those tools think in rectangles. From spreadsheets to data warehouses, to do any analytical calculation, you must first go through a rectangle. Forcing data through a rectangle shapes the way we solve problems (for example, dimensional fact tables, OLAP Cubes). But really, most data isn’t rectangular. Most data exists in hierarchies (orders, items, products, users). Most query results are better returned as a hierarchy (category, brand, product). Can we... Read More

Mar 13 2025
12:00pm EDT
NSH 3305
Redesigning Blockchains: SSD-optimized Verifiable Databases and Beyond (Daniel Lin-Kit Wong)

Blockchains are decentralized ledgers that replace trusted central authorities with verifiable distributed consensus. This decentralization has resulted in blockchains effectively becoming ‘slow and expensive computers’, but there are huge opportunities for architectural optimization across the entire blockchain software stack. We begin this talk by outlining the scaling challenges from a systems researcher’s perspective, and discussing the bottlenecks faced in computation, storage, and network bandwidth. We then discuss how we optimized the blockchain storage layer using our novel verifiable database, the... Read More

Mar 10 2025
04:30pm EDT
[SQL Death] Pipe Syntax in SQL: SQL for the 21st Century

SQL has been extremely successful as the de facto standard language for working with data. Virtually all mainstream database-like systems use SQL as their primary query language. But SQL is an old language with significant design problems, making it difficult to learn, difficult to use, and difficult to extend. Many have observed these challenges with SQL, and proposed solutions involving new languages. New language adoption is a significant obstacle for users, and none of the potential replacements have been successful... Read More

Feb 24 2025
04:30pm EDT
[SQL Death] Apache Pinot Query Optimizer
Yash Mayya, Gonzalo Ortiz

Apache Pinot is a distributed real-time OLAP database, part of a fast-growing segment designed for large-scale, user-facing analytics. Its primary query language is SQL, and it excels at low-latency queries, high throughput, and fresh data. Currently, Pinot supports two SQL dialects, and we are building a compatibility layer to enable pluggable time-series query languages, with Uber's M3 and PromQL as the first integrations. But these languages are just interfaces. Once parsed and validated, queries are transformed into Pinot's relational algebra,... Read More

Feb 17 2025
04:30pm EDT
[SQL Death] Towards Sanity in Query Languages
Viktor Leis, Thomas Neumann

The relational model has stood the test of time and is the foundation of most database systems. But let's be honest -- its success is not because of SQL, but in spite of it. SQL's syntax is arcane, inconsistent, and bears little resemblance to the actual execution semantics of queries. Worse yet, SQL is not even a true standard -- every system implements its own incompatible dialect, creating a fractured ecosystem where portability and interoperability are afterthoughts. This lack of standardization... Read More

Feb 10 2025
04:30pm EDT
[SQL Death] Larry Ellison was Right (kinda)! TypeScript Stored Procedures for the Modern Age

No one uses SQL to write business logic. It's written in a programming language with libraries, tests, type safety, and expressive syntax. Traditionally this was the domain of a backend team, who’d try to build enough functionality to keep the frontend team happy without breaking the database. This model hasn’t kept up with the needs of full stack developers though, so they’ve turned to platforms that expose the database directly to code running on client devices. This introduces a host... Read More

Jan 21 2025
12:00pm EDT
GHC 8115
SplitSQL: Practical Pushdown Cache for DataLake Analytics (Xiangpeng Hao)

Modern data analytics embrace a disaggregated architecture which decouples storage, cache, and compute into network-connected independent components. With disaggregated cache, a key design decision is whether to push down query predicates to the cache server. Without predicate pushdown, the cache must send all data to compute nodes, creating network bottlenecks. With predicate pushdown, the cache server evaluates predicates on cached data, but its limited computational resources become the bottleneck. In this talk, we introduce SplitSQL, a pushdown cache system with... Read More

Dec 9 2024
04:30pm EDT
[Building Blocks] Implement, Integrate and Extend a Query Engine (Ruihang Xia)

GreptimeDB uses Apache DataFusion and many other common building blocks in its implementation. This talk will focus on managing the query aspect of a (time-series) database across various parts. We have extended DataFusion to implement PromQL, add syntactic sugar to SQL, cooperate with external secondary indexes, and write domain-specific optimizer rules, etc. Each of the above is extended in a different stage of query execution. In addition to new features, we'll also discuss using DataFusion and Arrow as frameworks for implementing... Read More