# Algorithms

## QATBE
Query-Aware Token-Budgeted Extraction. QATBE solves the core problem of LLM context window management: given a token budget, how do you pack the most query-relevant content?
QATBE uses BM25 scoring to rank content segments by relevance, then applies a greedy knapsack algorithm to select a near-optimal subset within the token budget. The result: maximum relevance per token spent.
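For reference, the standard BM25 score of a segment $S$ for query $Q$ is:

$$
\mathrm{score}(S, Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot \frac{f(t, S)\,(k_1 + 1)}{f(t, S) + k_1\left(1 - b + b \cdot \frac{|S|}{\mathrm{avgdl}}\right)}
$$

where $f(t, S)$ is the frequency of term $t$ in the segment, $|S|$ is the segment length in tokens, $\mathrm{avgdl}$ is the average segment length, and $k_1$, $b$ are tuning parameters (typically $k_1 \approx 1.2$, $b \approx 0.75$).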
### How it works
- SCS segmentation — The extracted text is split into typed segments: headings, paragraphs, code blocks, lists, tables, quotes.
- BM25 scoring — Each segment is scored against the original query using BM25, a TF-IDF-family ranking function. Query terms are expanded with synonyms first.
- Type-aware weighting — Scores are multiplied by a type efficiency factor. Code blocks and tables get a bonus (high information density per token); navigation and metadata get a penalty.
- Greedy knapsack — Segments are sorted by score/tokens ratio, then greedily selected until the budget is reached.
- Coherence restoration — Selected segments are reordered by document position to maintain reading flow.
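The steps above can be sketched in Python. This is a minimal illustration, not the production implementation: the `Segment` type, `relevance` helper, and `qatbe_select` name are assumptions, and a plain term-frequency count stands in for real BM25 scoring.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str
    seg_type: str  # "code", "table", "heading", "paragraph", ...
    tokens: int    # token count of this segment
    position: int  # original document order

# Type efficiency factors, as in the segment-type weights table.
TYPE_WEIGHT = {
    "code": 1.5, "table": 1.4, "heading": 1.3,
    "list": 1.1, "paragraph": 1.0, "quote": 0.9, "metadata": 0.5,
}

def relevance(seg: Segment, query_terms: set) -> float:
    # Stand-in for BM25: fraction of words that are query terms,
    # scaled by the segment-type efficiency weight.
    words = seg.text.lower().split()
    hits = sum(1 for w in words if w in query_terms)
    base = hits / (len(words) or 1)
    return base * TYPE_WEIGHT.get(seg.seg_type, 1.0)

def qatbe_select(segments, query_terms, budget):
    # Greedy knapsack: sort by relevance-per-token, fill the budget.
    scored = [(relevance(s, query_terms), s) for s in segments]
    scored.sort(key=lambda p: p[0] / p[1].tokens, reverse=True)
    chosen, used = [], 0
    for score, seg in scored:
        if score > 0 and used + seg.tokens <= budget:
            chosen.append(seg)
            used += seg.tokens
    # Coherence restoration: back to document order.
    chosen.sort(key=lambda s: s.position)
    return chosen
```

Sorting by the score/tokens ratio is the classic greedy heuristic for the knapsack problem; it is near-optimal when individual segments are small relative to the budget, which holds for typical web content.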
### Segment type weights
| Segment type | Efficiency weight | Rationale |
|---|---|---|
| Code block | 1.5× | High information density, directly actionable |
| Table | 1.4× | Structured data is very token-efficient |
| Heading | 1.3× | Provides context for surrounding content |
| Paragraph | 1.0× | Baseline |
| List | 1.1× | Slightly more efficient than prose |
| Quote | 0.9× | Often secondary content |
| Metadata | 0.5× | Rarely query-relevant |
### Detail tiers
The tier parameter in the Search and Scrape APIs maps to a QATBE token budget:
| Tier | Token budget | Best for |
|---|---|---|
| key_facts | ~200 tokens | Quick answers, chatbots, speed-critical |
| summary | ~1,000 tokens | General RAG, AI context injection |
| detailed | ~5,000 tokens | Thorough research, long-form generation |
| complete | ~20,000 tokens | Full extraction, document analysis |
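The tier-to-budget mapping amounts to a simple lookup. The dict and helper below are illustrative, not the actual API; only the tier names and budgets come from the table above.

```python
# Illustrative mapping of tier names to QATBE token budgets.
TIER_BUDGETS = {
    "key_facts": 200,
    "summary": 1_000,
    "detailed": 5_000,
    "complete": 20_000,
}

def budget_for(tier: str) -> int:
    # Resolve a tier name to its token budget, rejecting unknown tiers.
    try:
        return TIER_BUDGETS[tier]
    except KeyError:
        raise ValueError(
            f"unknown tier {tier!r}; expected one of {sorted(TIER_BUDGETS)}"
        )
```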
### Performance
- BM25 scoring: ~0.5ms per 1,000 segments
- Knapsack selection: O(n log n) in the number of segments, dominated by the sort
- Runs entirely on CPU; no GPU required
- Scales linearly with document length
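A quick way to sanity-check the scoring-throughput figure on your own hardware (the scorer here is a stand-in term-frequency count rather than the real BM25 implementation, so absolute times will differ):

```python
import time

def score(text: str, query_terms: set) -> float:
    # Stand-in scorer: fraction of words that are query terms.
    words = text.lower().split()
    return sum(w in query_terms for w in words) / (len(words) or 1)

segments = [f"segment {i} about token budgets and extraction" for i in range(1_000)]
query = {"token", "budget", "extraction"}

start = time.perf_counter()
scores = [score(s, query) for s in segments]
elapsed_ms = (time.perf_counter() - start) * 1_000

print(f"scored {len(segments)} segments in {elapsed_ms:.2f} ms")
```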
### Example: token budget in practice
For a 10,000-token web page with a 1,000-token budget, QATBE:
- Splits into ~80 segments across 8 types
- Scores each segment (takes ~1ms total)
- Selects top 15–20 segments by relevance/token ratio
- Returns ~1,000 tokens of maximally relevant content
- Typical relevance retention: 85–95% of key information