TopK: Billion-Scale Hybrid Retrieval from the Ground Up (Marek Galovic)
- Speaker:
- Marek Galovic
- Date:
- Mon Feb 16, 2026 @ 04:30pm EST
- Location:
- https://cmu.zoom.us/j/99830697483?pwd=RLKiHNDLPvOyoCMHSBGmWfbba4ZAb4.1
- Title:
- TopK: Billion-Scale Hybrid Retrieval from the Ground Up
- System:
- TopK
Talk Info:
TopK is a search engine built from the ground up for unstructured retrieval. It combines dense/sparse/multi-vector search, lexical search, powerful filtering, and customizable scoring in a single cloud-native system that scales to billions of documents with high ingest throughput and O(10ms) p99 query latencies. In this talk, I'll describe how TopK is designed at a high level, including our disaggregated read-write path and distributed compaction. I'll then dive deep into our columnar file format (.bob) and query engine (reactor), which we built from the ground up to support search at scale.
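To make "hybrid retrieval with filtering and customizable scoring" concrete, here is a minimal brute-force sketch in Python. It is not TopK's API or implementation. The `Doc` type, the `hybrid_top_k` function, the weighted-sum scoring rule, and the `alpha` parameter are all hypothetical choices made for illustration.

The sketch works in three steps:

1. Each document must first pass a metadata filter.
2. It is then scored as a weighted sum of two parts: a dense-embedding dot product and a sparse term-weight dot product.
3. A bounded min-heap keeps the k best results.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Doc:
    # Hypothetical document record, for illustration only.
    id: str
    dense: list[float]          # dense embedding vector
    sparse: dict[str, float]    # sparse term -> weight map
    attrs: dict = field(default_factory=dict)  # filterable metadata


def dot(a: list[float], b: list[float]) -> float:
    # Dense similarity as a plain dot product.
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(q: dict[str, float], d: dict[str, float]) -> float:
    # Iterate over the smaller map and look up terms in the larger one.
    if len(q) > len(d):
        q, d = d, q
    return sum(w * d.get(t, 0.0) for t, w in q.items())


def hybrid_top_k(docs, q_dense, q_sparse, predicate, k, alpha=0.7):
    """Return up to k (score, id) pairs, best first, among docs passing predicate.

    Hypothetical scoring: alpha * dense_score + (1 - alpha) * sparse_score.
    """
    heap: list[tuple[float, str]] = []  # min-heap holding the current best k
    for doc in docs:
        if not predicate(doc.attrs):  # filter before scoring
            continue
        score = alpha * dot(q_dense, doc.dense) + (1 - alpha) * sparse_dot(q_sparse, doc.sparse)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc.id))
        elif score > heap[0][0]:
            # Replace the weakest of the current top k.
            heapq.heapreplace(heap, (score, doc.id))
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    docs = [
        Doc("a", [1.0, 0.0], {"db": 1.0}, {"year": 2024}),
        Doc("b", [0.0, 1.0], {"search": 2.0}, {"year": 2025}),
        Doc("c", [0.6, 0.8], {"search": 1.0, "db": 1.0}, {"year": 2025}),
    ]
    results = hybrid_top_k(docs, [0.0, 1.0], {"search": 1.0},
                           lambda a: a.get("year", 0) >= 2025, k=2, alpha=0.5)
    print([doc_id for _, doc_id in results])
```

The example query filters out document "a" and ranks "b" above "c". A system at billion-document scale would not scan every document like this. It would rely on vector and inverted indexes to prune candidates, and the weighted sum shown here is only one of many possible scoring functions.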
This talk is part of the PostgreSQL vs. The World Seminar Series.
Bio:
Marek is the CEO and co-founder of TopK, an AI-native search engine. Before founding TopK, Marek led data/control plane engineering teams at Pinecone and worked on fraud detection and financial forecasting at Shopify. He holds a degree in computer science and artificial intelligence from CTU Prague, where he researched game theory and adversarial machine learning algorithms applied to computer security (published at NeurIPS).