[Vaccination 2022] HTAP with Azure Cosmos DB: Hybrid Transaction & Analytical Processing (Hari Sudan S)
Azure Cosmos DB is a multi-tenant, globally distributed database service for managing JSON documents at Internet scale. As the amount of data managed by the service has grown severalfold over the past five years, customers have shown an increasing need to run efficient analytics on top of this operational data store. Their requests include reducing cost, removing the need to manage separate data storage or ETL, and being able to query data using familiar languages such as Spark SQL and T-SQL. Customers also want to retain the schema-free data model, elasticity, geo-replication, and multi-region-write capabilities of Azure Cosmos DB. This talk will cover how Azure Cosmos DB solved this problem with a unique HTAP database model: an incrementally updatable column store table format that is powered by an open source format and retains transactional consistency with the operational (OLTP) data store. The column store is stored in a decoupled fashion while still providing the full range of high availability and elasticity features that Azure Cosmos DB offers today.
This talk is part of the Vaccination Database (Booster) Tech Talk Seminar Series.
Hari Sudan S is a Group Engineering Manager on the Azure Cosmos DB team at Microsoft. Hari’s teams own the core backend infrastructure for Azure Cosmos DB, including Storage, Indexing, and High Availability. Hari’s engineering contributions to Azure Cosmos DB include Bw-Tree, Log Structured Storage, and Geo Replication/Geo Failovers in the HA space. Since joining Microsoft in 2008, Hari has also worked on the SQL Server Database Engine and Business Intelligence products. Hari earned his MS in Computer Science and Engineering from the University of Washington.