# Algorithms

## QATBE
Query-Aware Token-Budgeted Extraction. QATBE solves the core problem of LLM context window management: given a token budget, how do you pack the most query-relevant content?
QATBE uses BM25 scoring to rank content segments by relevance, then applies a greedy knapsack algorithm to select a near-optimal subset within the token budget. The result: maximum relevance per token spent.
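For reference, the standard BM25 score of a segment $S$ for query $Q$ is:

$$
\mathrm{score}(S, Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot \frac{f(t, S)\,(k_1 + 1)}{f(t, S) + k_1\left(1 - b + b \cdot \frac{|S|}{\mathrm{avgdl}}\right)}
$$

where $f(t, S)$ is the frequency of term $t$ in the segment, $|S|$ is the segment length in tokens, $\mathrm{avgdl}$ is the average segment length, and $k_1$, $b$ are tuning parameters (typically $k_1 \approx 1.2$, $b \approx 0.75$).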
### How it works
- SCS segmentation — The extracted text is split into typed segments: headings, paragraphs, code blocks, lists, tables, quotes.
- BM25 scoring — Each segment is scored against the original query using BM25, a TF-IDF-family ranking function. Query terms are expanded with synonyms first.
- Type-aware weighting — Scores are multiplied by a type efficiency factor. Code blocks and tables get a bonus (high information density per token); navigation and metadata get a penalty.
- Greedy knapsack — Segments are sorted by score/tokens ratio, then greedily selected until the budget is reached.
- Coherence restoration — Selected segments are reordered by document position to maintain reading flow.
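The steps above can be sketched in Python. This is a minimal illustration, not the production implementation: the `Segment` type, `relevance` helper, and `qatbe_select` name are assumptions, and a plain term-frequency count stands in for real BM25 scoring.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    seg_type: str  # "code", "table", "heading", "paragraph", ...
    tokens: int    # token count of this segment
    position: int  # original document order

# Type efficiency factors, as in the segment-type weights table.
TYPE_WEIGHT = {
    "code": 1.5, "table": 1.4, "heading": 1.3,
    "list": 1.1, "paragraph": 1.0, "quote": 0.9, "metadata": 0.5,
}

def relevance(seg: Segment, query_terms: set) -> float:
    # Stand-in for BM25: fraction of words that are query terms,
    # scaled by the segment-type efficiency weight.
    words = seg.text.lower().split()
    hits = sum(1 for w in words if w in query_terms)
    base = hits / (len(words) or 1)
    return base * TYPE_WEIGHT.get(seg.seg_type, 1.0)

def qatbe_select(segments, query_terms, budget):
    # Greedy knapsack: sort by relevance-per-token, fill the budget.
    scored = [(relevance(s, query_terms), s) for s in segments]
    scored.sort(key=lambda p: p[0] / p[1].tokens, reverse=True)
    chosen, used = [], 0
    for score, seg in scored:
        if score > 0 and used + seg.tokens <= budget:
            chosen.append(seg)
            used += seg.tokens
    # Coherence restoration: back to document order.
    chosen.sort(key=lambda s: s.position)
    return chosen
```

Sorting by the score/tokens ratio is the classic greedy heuristic for the knapsack problem; it is near-optimal when individual segments are small relative to the budget, which holds for typical web content.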
### Segment type weights
| Segment type | Efficiency weight | Rationale |
|---|---|---|
| Code block | 1.5× | High information density, directly actionable |
| Table | 1.4× | Structured data is very token-efficient |
| Heading | 1.3× | Provides context for surrounding content |
| Paragraph | 1.0× | Baseline |
| List | 1.1× | Slightly more efficient than prose |
| Quote | 0.9× | Often secondary content |
| Metadata | 0.5× | Rarely query-relevant |
### Detail tiers
The tier parameter in the Search and Scrape APIs maps to a QATBE token budget:
| Tier | Token budget | Best for |
|---|---|---|
| key_facts | ~200 tokens | Quick answers, chatbots, speed-critical |
| summary | ~1,000 tokens | General RAG, AI context injection |
| detailed | ~5,000 tokens | Thorough research, long-form generation |
| complete | ~20,000 tokens | Full extraction, document analysis |
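The tier-to-budget mapping amounts to a simple lookup. The dict and helper below are illustrative, not the actual API; only the tier names and budgets come from the table above.

```python
# Illustrative mapping of tier names to QATBE token budgets.
TIER_BUDGETS = {
    "key_facts": 200,
    "summary": 1_000,
    "detailed": 5_000,
    "complete": 20_000,
}

def budget_for(tier: str) -> int:
    # Resolve a tier name to its token budget, rejecting unknown tiers.
    try:
        return TIER_BUDGETS[tier]
    except KeyError:
        raise ValueError(
            f"unknown tier {tier!r}; expected one of {sorted(TIER_BUDGETS)}"
        )
```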
### Performance
- BM25 scoring: ~0.5ms per 1,000 segments
- Knapsack selection: O(n log n) in the number of segments, dominated by the sort
- Runs entirely on CPU; no GPU required
- Scales linearly with document length
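A quick way to sanity-check the scoring-throughput figure on your own hardware (the scorer here is a stand-in term-frequency count rather than the real BM25 implementation, so absolute times will differ):

```python
import time

def score(text: str, query_terms: set) -> float:
    # Stand-in scorer: fraction of words that are query terms.
    words = text.lower().split()
    return sum(w in query_terms for w in words) / (len(words) or 1)

segments = [f"segment {i} about token budgets and extraction" for i in range(1_000)]
query = {"token", "budget", "extraction"}

start = time.perf_counter()
scores = [score(s, query) for s in segments]
elapsed_ms = (time.perf_counter() - start) * 1_000

print(f"scored {len(segments)} segments in {elapsed_ms:.2f} ms")
```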
### Example: token budget in practice
For a 10,000-token web page with a 1,000-token budget, QATBE:
- Splits into ~80 segments across 8 types
- Scores each segment (takes ~1ms total)
- Selects top 15–20 segments by relevance/token ratio
- Returns ~1,000 tokens of maximally relevant content
- Typical relevance retention: 85–95% of key information