Hybrid Search

Every query in TeamLoop passes through a multi-stage hybrid search pipeline that combines the strengths of semantic understanding and exact keyword matching. The result is retrieval that catches both conceptual relationships and precise terminology, then refines the top candidates with a cross-encoder reranker for maximum precision.

When you run a query, TeamLoop executes the following stages:

  1. Vector search — finds semantically similar entities using embedding cosine similarity
  2. BM25 full-text search — finds entities containing matching keywords
  3. Reciprocal Rank Fusion — merges the two result sets into a unified ranking
  4. Cross-encoder reranking — rescores the top candidates for precision refinement
  5. Score normalization — normalizes final scores to a 0-1 range

Both search pipelines run in parallel, so hybrid retrieval adds minimal latency compared to vector-only search.
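The five stages above can be sketched end to end. Every function in this sketch is an illustrative stand-in for TeamLoop internals, not its actual API; the in-memory results are fabricated for the example.

```python
import asyncio

async def vector_search(query):            # stage 1: cosine similarity
    return ["jwt-migration", "session-overhaul"]          # ranked ids

async def text_search(query):              # stage 2: keyword match
    return ["session-overhaul", "proj-1234"]              # ranked ids

def rrf_fuse(result_lists, k=60):          # stage 3: Reciprocal Rank Fusion
    scores = {}
    for hits in result_lists:
        for rank, eid in enumerate(hits, start=1):
            scores[eid] = scores.get(eid, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

async def rerank(query, candidates):       # stage 4: cross-encoder stand-in
    return candidates[:5]

def normalize(results):                    # stage 5: scale scores to 0-1
    top = results[0][1]
    return [(eid, score / top) for eid, score in results]

async def hybrid_search(query):
    # Stages 1 and 2 run concurrently, so the hybrid pipeline costs
    # little more than the slower of the two searches.
    vector_hits, text_hits = await asyncio.gather(
        vector_search(query), text_search(query))
    fused = rrf_fuse([vector_hits, text_hits])
    return normalize(await rerank(query, fused[:20]))

results = asyncio.run(hybrid_search("auth token refactor"))
```

Note how "session-overhaul", which appears in both candidate lists, ends up on top after fusion.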

Vector search embeds your query using the same model that embedded your entities, then retrieves the top 50 candidates by cosine similarity from pgvector.

This means a query for “auth token refactor” will match entities about “JWT migration” or “session management overhaul” even when they share no keywords — the embedding model understands semantic equivalence.
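Under the hood, the vector stage amounts to a single pgvector query of roughly this shape. The `entities` table and `embedding` column names are assumptions, not TeamLoop's actual schema; `<=>` is pgvector's cosine-distance operator, so similarity is `1 - distance`.

```python
# Hypothetical shape of the vector-search query; table/column names assumed.
VECTOR_QUERY = """
    SELECT id, 1 - (embedding <=> %(query_embedding)s) AS similarity
    FROM entities
    ORDER BY embedding <=> %(query_embedding)s
    LIMIT 50
"""
```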

Embedding models by deployment:

| Deployment | Model | Dimensions |
| --- | --- | --- |
| SaaS | Voyage AI (voyage-3) | 1024 |
| AWS Marketplace | Amazon Titan V2 (default) | 1024 |

The embedding model is locked per-organization after the first embedding is created. Both models use 1024 dimensions for schema compatibility. Titan embeddings are generated with bounded parallelism (up to 5 concurrent requests per batch) for high throughput.

Full-text search uses PostgreSQL’s built-in tsvector indexing with ts_rank_cd scoring. It tokenizes your query and matches against pre-indexed entity text, ranking results by term frequency and document coverage.

This catches cases where exact terms matter — searching for “PROJ-1234” or “CockroachDB” will surface entities containing those literal strings, even if the embedding model doesn’t place them close in vector space.

Full-text search retrieves up to 50 candidates, ranked by ts_rank_cd relevance (the pipeline's lexical, BM25-style score).
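The text stage likewise reduces to one query. This sketch assumes a precomputed `tsvector` column named `search_tsv` backed by a GIN index (an assumption about the schema); `websearch_to_tsquery` tokenizes the user's query and `ts_rank_cd` scores matches by term frequency and cover density.

```python
# Hypothetical shape of the full-text query; table/column names assumed.
TEXT_QUERY = """
    SELECT id, ts_rank_cd(search_tsv, q) AS rank
    FROM entities, websearch_to_tsquery('english', %(query_text)s) AS q
    WHERE search_tsv @@ q
    ORDER BY rank DESC
    LIMIT 50
"""
```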

After both pipelines return their candidates, TeamLoop merges them using Reciprocal Rank Fusion with k=60. For each entity, the RRF score is:

score = sum(1 / (k + rank)) across all lists where the entity appears

An entity that appears in both vector and text results receives contributions from both, boosting it above entities that only appear in one. The constant k=60 prevents top-ranked results from dominating — a well-established default from information retrieval research.

After fusion, results are sorted by descending RRF score. The top 20 candidates are forwarded to the reranking stage.
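The fusion step is only a few lines of code. This sketch is illustrative rather than TeamLoop's internal implementation, but it computes exactly the formula above:

```python
def rrf_fuse(result_lists, k=60):
    """Reciprocal Rank Fusion: each list contributes 1/(k + rank)
    for every entity it contains (ranks start at 1)."""
    scores = {}
    for hits in result_lists:
        for rank, entity_id in enumerate(hits, start=1):
            scores[entity_id] = scores.get(entity_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending RRF score; the top 20 go on to reranking.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

vector_hits = ["a", "b", "c"]   # ranked output of vector search
text_hits   = ["b", "d"]        # ranked output of text search
fused = rrf_fuse([vector_hits, text_hits])
# "b" is ranked 2nd by vector and 1st by text: 1/62 + 1/61 ≈ 0.0325,
# which puts it above "a" (1/61 ≈ 0.0164), which appears in only one list.
```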

The reranker is a cross-encoder model that scores each (query, document) pair independently. Unlike the embedding model (which encodes query and document separately), the cross-encoder sees both together and can make fine-grained relevance judgments.

TeamLoop sends the top 20 RRF candidates to the reranker and keeps the top 5 results.

Reranking models by deployment:

| Deployment | Provider | Model |
| --- | --- | --- |
| SaaS | Voyage AI | rerank-2 |
| AWS Marketplace | Cohere via Bedrock | rerank-v3.5 |

Reranking runs with a 300ms timeout. If the reranker is unavailable, returns an error, or exceeds the timeout, TeamLoop logs a warning and returns results in their original RRF-fused order. This means reranking improves precision when available but never blocks a query from returning results.
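The rerank-or-fall-back behavior can be sketched with a deadline wrapper. `call_reranker` is a hypothetical stand-in for the Voyage/Cohere request; here it simply sleeps past the deadline to demonstrate the fallback path.

```python
import asyncio

async def call_reranker(query, candidates):
    await asyncio.sleep(0.5)          # simulate a reranker that is too slow
    return candidates[:5]

async def rerank_with_fallback(query, fused, timeout=0.3):
    try:
        top = await asyncio.wait_for(call_reranker(query, fused[:20]), timeout)
        return top, True
    except Exception:
        # Real code would log a warning here; the query still returns,
        # just in the original RRF-fused order.
        return fused[:5], False

results, reranked = asyncio.run(
    rerank_with_fallback("q", [f"entity-{i}" for i in range(30)]))
```

Because the fused candidates are already a reasonable ranking, degrading to them costs precision but never availability.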

When entities have been decomposed into atomic facts (via teamloop_save_facts), those facts participate in search alongside regular entities. Both vector and text search index facts as first-class entities.

This means a query for “who approved the database migration?” may surface a specific fact like “Sarah approved the PostgreSQL migration on January 15” rather than the entire parent document. Fact results link back to their parent entity via PART_OF relationships, so you can always navigate to the full context.

See Fact Extraction for details on how facts are created.

The pipeline is designed to return useful results even when components fail:

| Scenario | Behavior | Method |
| --- | --- | --- |
| Both pipelines succeed | Full hybrid fusion + reranking | hybrid |
| Embeddings unavailable | Text search only (BM25 ranking) | text_only |
| Text search fails | Vector search only (cosine similarity) | vector_only |
| Reranker fails or times out | Results returned in RRF order | hybrid (reranked: false) |
| Both pipelines fail | Error returned to caller | — |

The method field in search metadata tells you which path was taken, so you can understand the quality of your results.
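The table above maps to a simple selection rule. The exact metadata shape here is an assumption for illustration:

```python
def search_metadata(vector_ok: bool, text_ok: bool, reranked: bool) -> dict:
    # Mirrors the degradation table; field names are assumed.
    if vector_ok and text_ok:
        return {"method": "hybrid", "reranked": reranked}
    if text_ok:
        return {"method": "text_only", "reranked": False}
    if vector_ok:
        return {"method": "vector_only", "reranked": False}
    raise RuntimeError("both search pipelines failed")
```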

| Parameter | Default | Description |
| --- | --- | --- |
| Vector candidates | 50 | Max results from vector search |
| Text candidates | 50 | Max results from full-text search |
| RRF k | 60 | Fusion constant (higher = more equal weighting) |
| Rerank candidates | 20 | How many RRF results go to the reranker |
| Rerank top-k | 5 | How many results survive reranking |
| Rerank enabled | true | Whether reranking runs at all |
| Rerank timeout | 300ms | Max time to wait for the reranker |

These defaults work well for most workloads. Per-query overrides are available through the search API for advanced use cases.

Final scores are normalized to a 0-1 range by dividing all scores by the maximum score in the result set. The top result always has a score of 1.0, and subsequent results express their relevance relative to it.

This makes scores comparable across different queries and result set sizes, regardless of whether the results were reranked or returned in RRF order.
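The normalization rule is a direct division by the maximum:

```python
def normalize(scores):
    """Scale scores so the top result is exactly 1.0."""
    top = max(scores)
    return [s / top for s in scores]

normalize([4.0, 2.0, 1.0])  # -> [1.0, 0.5, 0.25]
```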