Master Thesis Talk: Non-blocking Lazy Schema Changes in Multi-version Database Management Systems
The relational schema of a table in a database management system (DBMS) describes its logical attribute information and constraints. Despite the aim of separation between logical schema and physical data storage, in practice, the schema often dictates how a DBMS organizes data on disk or in memory. This tight coupling is because the database’s physical schema must match its logical schema. The problem with this is that applications that incur frequent schema changes (e.g., add a column, change column type) may become slower or even unavailable during a change due to data migration. A better approach is to support non-blocking schema changes by storing multiple versions of tables and allow data migration happens lazily.
In this thesis, we introduce multi-version schemas that are based on multi-version concurrency control policies (MVCC) to support fast online schema changes. This approach maintains multiple tables of different schemas and allows transactions to see the correct versions of tuples. It migrates tuples from old schema to new schema lazily on demand. We show that the overhead of maintaining multiple tables is small, the recovery from the performance degeneration caused by schema change is fast, and the system can control the “slow” period with version compaction.