[DB Seminar] Spring 2020 DB Group: Building Materialize, a Streaming SQL Database powered by Timely Dataflow
Materialize (Materialize.io, GitHub) is a streaming database. Rather than being optimized for ad-hoc transactional or analytical queries, it is optimized for continuously maintaining views over streams of already-processed transactions.
Although OLTP and OLAP systems often support views, they are not architected to maintain those views efficiently as the data change. Systems designed for view maintenance can often handle substantially higher load for workloads that re-issue the same questions against changing data: they perform work proportional to the volume of changes in the source data, rather than to the number of times the results are inspected.
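As a rough illustration of the "work proportional to changes" idea, here is a minimal Python sketch of a maintained count-per-key view. It is not Materialize's implementation or API; the class `CountByKeyView`, its methods, and the `(key, diff)` change format are all hypothetical names chosen for this example.

```python
from collections import defaultdict


class CountByKeyView:
    """Hypothetical maintained view, roughly SELECT key, COUNT(*) GROUP BY key."""

    def __init__(self):
        self.counts = defaultdict(int)

    def apply(self, changes):
        # Each change is (key, diff): diff = +1 for an insert, -1 for a delete.
        # The loop touches only the changed records, so the cost scales with
        # len(changes) rather than with the total size of the input.
        for key, diff in changes:
            self.counts[key] += diff
            if self.counts[key] == 0:
                del self.counts[key]

    def read(self):
        # Reading returns already-maintained results; nothing is recomputed.
        return dict(self.counts)


view = CountByKeyView()
view.apply([("apple", +1), ("pear", +1), ("apple", +1)])
view.apply([("apple", -1)])
result = view.read()
```

Under these assumptions, calling `read()` any number of times costs the same regardless of how much source data has been processed, which is the property the paragraph above describes.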
To maintain views, Materialize internally expresses them as differential dataflow computations. Dataflow computations are substantially different from standard query processing. New challenges include the inability of dataflows to re-plan dynamically or recursively. Query optimization also has different constraints: plans need to be both efficient to compute initially and efficient to maintain incrementally. In this talk I will cover some of the challenges that we encounter in building Materialize, contrasting them with the approaches taken by traditional database architectures.
Zoom Link: https://cmu.zoom.us/j/562649242