Noria: dynamic, partially-stateful data-flow for high-performance web applications

Jon Gjengset, Malte Schwarzkopf,

Jonathan Behrens, Lara Timbo Araujo, Martin Ek, Eddie Kohler, M. Frans Kaashoek, Robert Morris

In Rust

Background

Story

id  title
1   Peloton is the best DBMS
2   Terrier: Son of Peloton

Vote

story_id  user
1         Andy
1         Lin
2         Andy

SELECT Story.id, Story.title, count(*) FROM Story
JOIN Vote ON Story.id = Vote.story_id
GROUP BY Story.id, Story.title

COUNT

JOIN
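To make the query concrete, here is a minimal sketch that runs it against the two example tables using Python's built-in sqlite3 module. The in-memory database and the added ORDER BY (used only to get a stable row order) are assumptions for illustration and are not part of the talk.

```python
import sqlite3

# In-memory database holding the Story and Vote tables from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Story (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Vote (story_id INTEGER, user TEXT);
    INSERT INTO Story VALUES (1, 'Peloton is the best DBMS'),
                             (2, 'Terrier: Son of Peloton');
    INSERT INTO Vote VALUES (1, 'Andy'), (1, 'Lin'), (2, 'Andy');
""")

# The query from the slide; ORDER BY is added only for deterministic output.
QUERY = """
    SELECT Story.id, Story.title, count(*) FROM Story
    JOIN Vote ON Story.id = Vote.story_id
    GROUP BY Story.id, Story.title
    ORDER BY Story.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Every execution recomputes the JOIN and the COUNT from scratch, which is the cost the following slides try to avoid.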

What if we run this query many times?

SELECT Story.id, Story.title, count(*) FROM Story
JOIN Vote ON Story.id = Vote.story_id
GROUP BY Story.id, Story.title

Save

Query

Peloton ***, 2

Problems

  • Consistency?
    • Two clients may update the cache simultaneously
  • Thundering herd
    • After the cache is invalidated,
    • a flood of queries goes to MySQL
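The thundering-herd problem can be sketched with a cache-aside read path. This is a hedged illustration only: the dict stands in for memcached, `query_database` stands in for MySQL, and the barrier is a hypothetical device that forces the ten queries to overlap the way slow queries would in practice.

```python
import threading

cache = {}                      # stands in for memcached; empty = just invalidated
stats = {"db_queries": 0}       # how many times the "database" was hit
stats_lock = threading.Lock()
NUM_CLIENTS = 10
# Hypothetical helper: no query finishes until all clients have started one,
# which models queries that are slow enough to overlap.
slow_query = threading.Barrier(NUM_CLIENTS)

def query_database(story_id):
    """Pretend to run the expensive JOIN + COUNT query."""
    with stats_lock:
        stats["db_queries"] += 1
    slow_query.wait(timeout=5)
    return ("Peloton is the best DBMS", 2)

def read_story(story_id):
    """Cache-aside read: check the cache, fall back to the database on a miss."""
    value = cache.get(story_id)
    if value is None:
        value = query_database(story_id)
        cache[story_id] = value
    return value

# Ten clients read the same story right after its cache entry was invalidated.
clients = [threading.Thread(target=read_story, args=(1,)) for _ in range(NUM_CLIENTS)]
for t in clients:
    t.start()
for t in clients:
    t.join()

print(stats["db_queries"])
```

All ten clients miss the cache before any of them refills it, so the database runs the same expensive query ten times.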
id title
1 Peloton is the best DBMS
2 Terrier: Son of Peloton 
story_id user
1 Andy
1 Lin
2 Andy

Story

Vote

SELECT Story.id, Story.title, count(*) FROM Story
JOIN Vote ON Story.id = Vote.story_id
GROUP BY Story.id, Story.title

COUNT

JOIN

StoryWithVoteCount

Materialized View!

story_id count
1 2
2 1

Operator is stateful

id title
1 Peloton is the best DBMS
2 Terrier: Son of Peloton 
story_id user
1 Andy
1 Lin
2 Andy

Story

Vote

COUNT

JOIN

StoryWithVoteCount

story_id count
1 2
2 1
2 Chenyao
id title count
1 Peloton 2
2 Terrier 1

2

2
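The update shown above, where the new vote (2, Chenyao) flows through COUNT and JOIN into the view, can be sketched as follows. The class and function names are hypothetical and this is not Noria's actual operator interface; it only illustrates that a stateful operator can update its result incrementally instead of recomputing it.

```python
class CountOperator:
    """Keeps one running count per story_id; this dict is the operator's state."""

    def __init__(self):
        self.counts = {}

    def on_insert(self, story_id):
        """Process one new Vote row and return the update sent downstream."""
        old = self.counts.get(story_id, 0)
        self.counts[story_id] = old + 1
        return (story_id, old, old + 1)


titles = {1: "Peloton", 2: "Terrier"}   # what the JOIN looks up in Story
view = {}                               # StoryWithVoteCount: id -> (title, count)

def join_and_store(update):
    """Join the count update with the story title and overwrite the view row."""
    story_id, _old_count, new_count = update
    view[story_id] = (titles[story_id], new_count)

count = CountOperator()
for story_id, _user in [(1, "Andy"), (1, "Lin"), (2, "Andy")]:
    join_and_store(count.on_insert(story_id))

# The new vote (2, Chenyao) touches only the state for story 2.
join_and_store(count.on_insert(2))
print(view)
```

Only the row for story 2 changes; story 1 is never rescanned.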

Two Challenges

  • Limit the size of its state and views
  • Adapt the data-flow without downtime

Partially-stateful data-flow

id title
1 Peloton is the best DBMS
2 Terrier: Son of Peloton 
story_id user
1 Andy
1 Lin
2 Andy

Story

Vote

COUNT

JOIN

StoryWithVoteCount

story_id count
1 2
2 ⊥

WHERE id = 2

id title count
1 Peloton 2
2 ⊥ ⊥
id title
1 Peloton is the best DBMS
2 Terrier: Son of Peloton 
story_id user
1 Andy
1 Lin
2 Andy

Story

Vote

COUNT

JOIN

StoryWithVoteCount

story_id count
1 2
2 1

WHERE id = 2

id title count
1 Peloton 2
2 ⊥ ⊥
id title
1 Peloton is the best DBMS
2 Terrier: Son of Peloton 
story_id user
1 Andy
1 Lin
2 Andy

Story

Vote

COUNT

JOIN

StoryWithVoteCount

story_id count
1 2
2 1

WHERE id = 2

id title count
1 Peloton 2
2 Terrier 1
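The three steps above (a read for id = 2 finds ⊥, the missing count is recomputed from the Vote table, and the view row is filled in) can be sketched like this. All names are hypothetical; the sketch assumes a missing dictionary key plays the role of ⊥ and that a miss is resolved by scanning the base table.

```python
votes = [(1, "Andy"), (1, "Lin"), (2, "Andy")]   # Vote base table
titles = {1: "Peloton", 2: "Terrier"}            # Story base table (shortened titles)


class PartialCount:
    """COUNT operator whose state may be missing entries (a missing key is ⊥)."""

    def __init__(self, base_rows):
        self.base_rows = base_rows
        self.state = {}
        self.recomputations = 0

    def lookup(self, story_id):
        """Return the count, recomputing it from the base table on a miss."""
        if story_id not in self.state:
            self.recomputations += 1
            self.state[story_id] = sum(
                1 for sid, _user in self.base_rows if sid == story_id
            )
        return self.state[story_id]


count = PartialCount(votes)
count.state[1] = 2                  # story 1 is already present, as on the slide
view = {1: ("Peloton", 2)}          # StoryWithVoteCount; story 2 is still ⊥

def read_view(story_id):
    """Serve WHERE id = story_id, filling the missing row on demand."""
    if story_id not in view:
        view[story_id] = (titles[story_id], count.lookup(story_id))
    return view[story_id]

print(read_view(2))   # first read fills the hole
print(read_view(2))   # second read is served from the view
```

Only keys that are actually read take up space, which is how partial state addresses the first challenge (limiting the size of state and views).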

Dynamic data-flow

id title author
1 Peloton is the best DBMS Andy
2 Terrier: Son of Peloton  Tianyu
3 Poker Night Andy
story_id user
1 Andy
1 Lin
2 Andy
3 Chenyao

Story

Vote

COUNT

JOIN

StoryWithVoteCount

UserKarma

user karma
Andy ⊥
Tianyu ⊥

Just create a new operator with empty state!

Lazy Approach!

id title author
1 Peloton is the best DBMS Andy
2 Terrier: Son of Peloton  Tianyu
3 Poker Night Andy
story_id user
1 Andy
1 Lin
2 Andy
3 Chenyao

Story

Vote

COUNT

JOIN

StoryWithVoteCount

UserKarma

user karma
Andy ⊥
Tianyu ⊥

2
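A minimal sketch of the lazy approach: the new UserKarma view is added with empty state while the existing view keeps serving reads, and each karma value is computed only when first requested. The slides do not define karma, so the definition used here (the number of votes on stories the user authored) is an assumption, as are all function names.

```python
stories = [
    (1, "Peloton is the best DBMS", "Andy"),
    (2, "Terrier: Son of Peloton", "Tianyu"),
    (3, "Poker Night", "Andy"),
]
votes = [(1, "Andy"), (1, "Lin"), (2, "Andy"), (3, "Chenyao")]

# Existing view, already populated and still serving reads.
story_with_vote_count = {1: 2, 2: 1, 3: 1}

# New view added at runtime: it starts empty, so adding it costs nothing up front.
user_karma = {}

def compute_karma(user):
    """Assumed definition: total votes on the stories this user authored."""
    authored = {sid for sid, _title, author in stories if author == user}
    return sum(1 for sid, _voter in votes if sid in authored)

def read_karma(user):
    """Fill the missing entry on the first read, then serve it from the view."""
    if user not in user_karma:
        user_karma[user] = compute_karma(user)
    return user_karma[user]

print(user_karma)            # empty right after the view is added
print(read_karma("Tianyu"))  # computed on demand
print(user_karma)            # only the requested key is filled
```

No existing state is rebuilt and no reads are blocked while the new view is introduced, which is the point of the second challenge (adapting the data-flow without downtime).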

evmap (Shadowing)

id title
1 Peloton is the best DBMS
2 Terrier: Son of Peloton 
story_id user
1 Andy
1 Lin
2 Andy

Story

Vote

COUNT

JOIN

StoryWithVoteCount

evmap

Readers

Writer

root

Map

Copy #1

Map

 Copy #2

A = 4

A = 4

A = 5

Readers

Writer

root

Map

Copy #1

Map

Copy #2

A = 4

A = 5

A = 5
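The two diagrams above can be sketched as a double-buffered map: readers follow a root pointer to one copy, the writer changes the other copy, and a swap makes the new values visible. This is a single-threaded illustration with hypothetical names, not evmap's actual API. A concurrent implementation would also have to wait until no reader is still using the old copy before modifying it, which this sketch leaves out.

```python
class DoubleBufferedMap:
    """Two copies of a map: readers use one, the writer mutates the other."""

    def __init__(self):
        self.copies = [{}, {}]
        self.read_index = 0     # the "root" pointer that readers follow
        self.pending = []       # writes not yet applied to the readers' copy

    def get(self, key):
        """Readers only ever touch the copy the root currently points at."""
        return self.copies[self.read_index].get(key)

    def insert(self, key, value):
        """The writer changes only the hidden copy and logs the write."""
        self.copies[1 - self.read_index][key] = value
        self.pending.append((key, value))

    def publish(self):
        """Swap the root, then replay the logged writes on the old copy."""
        self.read_index = 1 - self.read_index
        old_copy = self.copies[1 - self.read_index]
        for key, value in self.pending:
            old_copy[key] = value
        self.pending.clear()


m = DoubleBufferedMap()
m.insert("A", 4)
m.publish()
m.insert("A", 5)
before_swap = m.get("A")   # readers still see the old value
m.publish()
after_swap = m.get("A")    # readers now see the new value
print(before_swap, after_swap)
```

Readers never take a lock on the write path; the price is that a write stays invisible until the next swap, which fits the eventual consistency listed under the features.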

Features & Implementation

  • Written in Rust
  • MySQL protocol compatible
    • Via a MySQL adapter
  • RocksDB for base-table storage
  • Eventually consistent
  • Distributed
    • Each operator is hash-partitioned on a key, and the partitions are assigned to different instances
    • All instances hold the same data-flow graph

  • No transaction support (in the paper)
    • The authors claim the design is compatible with OCC
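The hash-partitioning bullet can be illustrated with a small routing sketch. The function names and the use of CRC32 as the hash are assumptions chosen for determinism; the slides do not say which hash function Noria uses.

```python
import zlib

def shard_for(key, num_shards):
    """Map a key to a shard index with a deterministic hash (CRC32 here)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_shards

def partition(rows, key_index, num_shards):
    """Split rows across shards by hashing the column at key_index."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key_index], num_shards)].append(row)
    return shards

votes = [(1, "Andy"), (1, "Lin"), (2, "Andy"), (3, "Chenyao")]

# Partition the Vote rows on story_id so each COUNT instance owns whole groups.
shards = partition(votes, key_index=0, num_shards=2)
for index, rows in enumerate(shards):
    print(index, rows)
```

Because rows with the same story_id always land on the same shard, each instance can maintain the counts for its keys without coordinating with the others.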

Evaluation

Question & Thoughts

  • I did not know about materialized views before
  • Many people use memcached to address this problem (including me)
  • So I think this is an amazing idea