Algorithms

QATBE

Query-Aware Token-Budgeted Extraction. QATBE solves the core problem of LLM context window management: given a token budget, how do you pack the most query-relevant content?

QATBE uses BM25 scoring to rank content segments by relevance, then applies a greedy knapsack heuristic to select a near-optimal subset within the token budget. The result: maximum relevance density per token.
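A minimal sketch of the BM25 scoring idea, assuming segments are already tokenized into word lists (QATBE's actual tokenization and parameter values are internal):

```python
import math
from collections import Counter

def bm25_scores(query_terms, segments, k1=1.5, b=0.75):
    """Score each segment (a list of tokens) against the query with BM25."""
    n = len(segments)
    avgdl = sum(len(s) for s in segments) / n
    # Document frequency: how many segments contain each query term.
    df = {t: sum(1 for s in segments if t in s) for t in query_terms}
    scores = []
    for seg in segments:
        tf = Counter(seg)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            # Term frequency saturates via k1; b normalizes for segment length.
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(seg) / avgdl)
            )
        scores.append(score)
    return scores
```

Segments that repeat the query terms score higher, while segments with no query terms score zero, which is what lets the later knapsack step drop them cheaply.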

How it works

  1. SCS segmentation — The extracted text is split into typed segments: headings, paragraphs, code blocks, lists, tables, quotes.
  2. BM25 scoring — Each segment is scored against the original query using BM25, a TF-IDF-family ranking function. Query terms are expanded with synonyms first.
  3. Type-aware weighting — Scores are multiplied by a type efficiency factor. Code blocks and tables get a bonus (high information density per token); navigation and metadata get a penalty.
  4. Greedy knapsack — Segments are sorted by score/tokens ratio, then greedily selected until the budget is reached.
  5. Coherence restoration — Selected segments are reordered by document position to maintain reading flow.

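Steps 4 and 5 can be sketched as a ratio sort followed by a positional reorder. The segment fields used here (pos, tokens, score) are assumed names, not QATBE's actual schema:

```python
def select_segments(segments, budget):
    """Greedy knapsack: pick segments by score-per-token until the budget is spent.

    Each segment is a dict with 'pos' (document order), 'tokens', and 'score'.
    """
    # Step 4: sort by relevance density and take what fits in the budget.
    ranked = sorted(segments, key=lambda s: s["score"] / s["tokens"], reverse=True)
    chosen, used = [], 0
    for seg in ranked:
        if used + seg["tokens"] <= budget:
            chosen.append(seg)
            used += seg["tokens"]
    # Step 5: coherence restoration — re-sort by original document position.
    return sorted(chosen, key=lambda s: s["pos"])
```

Note the greedy pass skips a segment that would overflow the budget but keeps scanning, so smaller later segments can still fill the remaining room.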
Segment type weights

Segment type   Efficiency weight   Rationale
Code block     1.5×                High information density, directly actionable
Table          1.4×                Structured data is very token-efficient
Heading        1.3×                Provides context for surrounding content
Paragraph      1.0×                Baseline
List           1.1×                Slightly more efficient than prose
Quote          0.9×                Often secondary content
Metadata       0.5×                Rarely query-relevant
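Type-aware weighting is just a multiplier on the BM25 score. A sketch using the values from the table above (the dictionary keys are assumed names, and unknown types fall back to the paragraph baseline):

```python
# Weights taken from the segment-type table; the real values live in QATBE's config.
TYPE_WEIGHTS = {
    "code_block": 1.5,
    "table": 1.4,
    "heading": 1.3,
    "list": 1.1,
    "paragraph": 1.0,
    "quote": 0.9,
    "metadata": 0.5,
}

def weighted_score(bm25_score, segment_type):
    """Apply the type efficiency factor; unknown types get the 1.0 baseline."""
    return bm25_score * TYPE_WEIGHTS.get(segment_type, 1.0)
```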

Detail tiers

The tier parameter in the Search and Scrape APIs maps to a QATBE token budget:

Tier        Token budget      Best for
key_facts   ~200 tokens       Quick answers, chatbots, speed-critical
summary     ~1,000 tokens     General RAG, AI context injection
detailed    ~5,000 tokens     Thorough research, long-form generation
complete    ~20,000 tokens    Full extraction, document analysis
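The tier-to-budget mapping implied by the table can be sketched as a lookup (names and error handling here are assumptions, not the actual API surface):

```python
# Approximate budgets from the tier table above.
TIER_BUDGETS = {
    "key_facts": 200,
    "summary": 1_000,
    "detailed": 5_000,
    "complete": 20_000,
}

def budget_for(tier: str) -> int:
    """Resolve a tier name to its QATBE token budget."""
    try:
        return TIER_BUDGETS[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None
```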

Performance

  • BM25 scoring: ~0.5ms per 1,000 segments
  • Knapsack selection: O(n log n) in the number of segments, dominated by the sort
  • Runs entirely on CPU; no GPU required
  • Scales linearly with document length

Example: token budget in practice

For a 10,000-token web page with a 1,000-token budget, QATBE:

  • Splits into ~80 segments across 8 types
  • Scores each segment (takes ~1ms total)
  • Selects top 15–20 segments by relevance/token ratio
  • Returns ~1,000 tokens of maximally relevant content
  • Typical relevance retention: 85–95% of key information

Next steps