Auto-Clustering

TeamLoop automatically detects natural clusters in your knowledge graph and suggests them as subgraph starting points. No manual curation required — clusters form incrementally as you save entities.

How It Works

When you save entities via teamloop_save_knowledge or teamloop_remember, each entity is asynchronously assigned to a cluster:

Compute similarity — The entity’s embedding is compared to existing cluster centroids using cosine similarity
Temporal weighting — Recent entities get up to a 20% similarity boost, making fresh clusters tighter
Assign or create — If similarity exceeds the threshold (0.75), the entity joins the nearest cluster; otherwise a new cluster is created
Update centroid — The cluster centroid is updated via running average (O(1) per assignment)
Generate label — A human-readable label is derived from top entity names and date range

This is online centroid-based clustering: there is no upfront K to choose, no batch recomputation, and each centroid update is O(1) in the number of cluster members.
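The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TeamLoop's implementation: the names `Cluster` and `assign` are hypothetical, a linear scan stands in for the pgvector lookup, the threshold is compared against the boosted similarity with `>=`, and the 50-cluster cap and label generation are left out.

```python
import math
from dataclasses import dataclass

SIMILARITY_THRESHOLD = 0.75  # default documented on this page
TEMPORAL_BOOST = 0.2         # up to a 20% boost for recent entities
DECAY_DAYS = 90              # boost decays to zero over this period


@dataclass
class Cluster:
    centroid: list
    size: int = 1


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign(clusters, embedding, age_days=0.0):
    """Assign one embedding to the nearest cluster or create a new one.

    Returns the index of the cluster the embedding ended up in.
    """
    boost = 1.0 + TEMPORAL_BOOST * max(0.0, 1.0 - age_days / DECAY_DAYS)
    best_idx, best_sim = -1, -1.0
    for i, cluster in enumerate(clusters):
        sim = cosine(embedding, cluster.centroid) * boost
        if sim > best_sim:
            best_idx, best_sim = i, sim

    # Assumption: the threshold applies to the boosted similarity.
    if best_idx >= 0 and best_sim >= SIMILARITY_THRESHOLD:
        cluster = clusters[best_idx]
        cluster.size += 1
        # Running average: new_mean = old_mean + (x - old_mean) / n
        cluster.centroid = [
            m + (x - m) / cluster.size for m, x in zip(cluster.centroid, embedding)
        ]
        return best_idx

    clusters.append(Cluster(centroid=list(embedding)))
    return len(clusters) - 1


clusters = []
first = assign(clusters, [1.0, 0.0])   # no clusters yet: creates cluster 0
second = assign(clusters, [0.9, 0.1])  # close to cluster 0: joins it
third = assign(clusters, [0.0, 1.0])   # nearly orthogonal: creates cluster 1
```

The centroid update touches only the stored mean and the member count, which is why its cost does not grow with the size of the cluster.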

Viewing Clusters

Dashboard

Navigate to Clusters in the sidebar to see all detected clusters with 3+ entities. Each card shows:

Cluster label (auto-generated from top entity terms)
Entity count
Date range of member entities

MCP Tool

teamloop_list_clusters

Returns a formatted list of clusters with entity counts and date ranges.

Parameter	Type	Required	Description
`min_size`	number	No	Minimum entities to show (default: 3)
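
MCP clients normally issue this call for you. For reference, a tool invocation in the Model Context Protocol is a JSON-RPC 2.0 `tools/call` request, so the payload would look roughly like the one built below. The helper name `list_clusters_request` and the request `id` are illustrative, and the transport (stdio or HTTP) depends on how your client is connected.

```python
import json


def list_clusters_request(min_size=3, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` payload for teamloop_list_clusters."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "teamloop_list_clusters",
            "arguments": {"min_size": min_size},
        },
    }


# Only show clusters with at least 5 entities.
payload = list_clusters_request(min_size=5)
print(json.dumps(payload, indent=2))
```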

REST API

GET /v1/clusters              # List clusters
GET /v1/clusters/:id          # Get cluster with entity IDs
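
A minimal sketch of calling these endpoints with the Python standard library. The host `teamloop.example.com`, the bearer-token header, and the `CLUSTER_ID` and `YOUR_TOKEN` placeholders are assumptions for illustration; substitute the base URL and authentication scheme of your deployment. The response shape is not specified here, so the sketch simply parses whatever JSON comes back.

```python
import json
import urllib.request

BASE_URL = "https://teamloop.example.com"  # hypothetical host


def build_get(path, token):
    """Build a GET request. Bearer-token auth is an assumption."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_json(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


list_req = build_get("/v1/clusters", token="YOUR_TOKEN")
detail_req = build_get("/v1/clusters/CLUSTER_ID", token="YOUR_TOKEN")

# Against a real deployment:
# clusters = fetch_json(list_req)
# cluster = fetch_json(detail_req)
```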

Converting to Subgraphs

Found a useful cluster? Convert it to a subgraph from the dashboard or the REST API:

Dashboard

Click Create Subgraph on any cluster card.

REST API

POST /v1/clusters/:id/create-subgraph

Returns the new subgraph ID and entity count.
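
As a rough sketch, the request could be built like this in Python. The host, the bearer-token header, the empty request body, and the placeholders are assumptions; the names of the fields in the response are not documented on this page, so the example prints the raw body instead of guessing them.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://teamloop.example.com"  # hypothetical host


def create_subgraph_request(cluster_id, token):
    """Build the POST request that converts a cluster into a subgraph."""
    safe_id = urllib.parse.quote(cluster_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/v1/clusters/{safe_id}/create-subgraph",
        data=b"",  # assumption: the endpoint needs no request body
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="POST",
    )


req = create_subgraph_request("CLUSTER_ID", token="YOUR_TOKEN")

# Against a real deployment:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```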

Recalculating Clusters

If clusters drift over time, you can force a full recalculation:

Dashboard

Click Recalculate at the top of the clusters page.

REST API

POST /v1/clusters/recalculate

This deletes all existing clusters, then re-clusters from scratch every entity that has an embedding.

Configuration

Clustering uses these defaults:

Parameter	Default	Description
Similarity threshold	0.75	Minimum cosine similarity to join a cluster
Minimum cluster size	3	Clusters below this are hidden from listings
Maximum clusters	50	Cap per user to prevent fragmentation
Temporal decay days	90	Recent entities get a boost that decays over this period

Temporal Weighting

The effective similarity formula:

effective_similarity = cosine_similarity * (1.0 + 0.2 * max(0, 1 - age_days / 90))

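Entities less than 90 days old get a boost of up to 20%, largest for brand-new entities and decaying linearly to zero at 90 days
Entities 90 days old or older get no boost and no penalty; they cluster on raw cosine similarity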
This keeps recent work grouped more tightly without penalizing older knowledge
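
A worked example of the formula, with the defaults from the configuration table. The function name `effective_similarity` mirrors the formula, but the Python signature is illustrative.

```python
def effective_similarity(cosine_similarity, age_days, boost=0.2, decay_days=90):
    """Apply the temporal boost from the formula above."""
    recency = max(0.0, 1.0 - age_days / decay_days)
    return cosine_similarity * (1.0 + boost * recency)


# Same raw cosine similarity (0.70) at different entity ages.
for age in (0, 45, 90, 180):
    print(age, round(effective_similarity(0.70, age), 4))
# 0 0.84
# 45 0.77
# 90 0.7
# 180 0.7
```

A raw similarity of 0.70 is below the 0.75 threshold on its own. If the threshold is compared against the effective value, as the name suggests, the same entity would join the cluster when it is new (0.84) or 45 days old (0.77), but not at 90 days or later.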

Design Decisions

Centroids stored in PostgreSQL — Multi-instance deploys (AgentCore Runtime) need shared state; pgvector HNSW gives fast nearest-cluster lookup
Running average centroid — O(1) per assignment; full recompute only on explicit recalculate
No LLM for labeling — Simple term extraction from top entity names + date range; avoids latency and cost
Async assignment — Goroutines with background context; never blocks the save flow
Graceful degradation — If no embedding provider is configured, clustering is silently disabled
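
To make the labeling decision concrete, here is one way a label could be built from entity names and a date range without an LLM. The exact extraction TeamLoop uses is not described on this page, so the word-frequency approach, the stopword list, the label format, and the name `make_label` are all assumptions.

```python
from collections import Counter
from datetime import date

STOPWORDS = {"the", "a", "an", "of", "for", "and", "to", "in"}  # illustrative


def make_label(names, dates, top_n=2):
    """Join the most frequent words in the entity names with the date range."""
    counts = Counter(
        word
        for name in names
        for word in name.lower().split()
        if word not in STOPWORDS
    )
    terms = [word for word, _ in counts.most_common(top_n)]
    span = f"{min(dates).isoformat()} to {max(dates).isoformat()}"
    return f"{', '.join(terms)} ({span})"


label = make_label(
    ["Auth service migration", "Auth token rotation", "Migration runbook for auth"],
    [date(2025, 1, 6), date(2025, 2, 14), date(2025, 3, 3)],
)
print(label)  # auth, migration (2025-01-06 to 2025-03-03)
```

Counting words is cheap and deterministic, which fits the stated goal of avoiding LLM latency and cost.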

Next Steps

Subgraphs — Manage curated subgraphs created from clusters
Synthesis — Generate documents from subgraphs
Knowledge Graphs — Understand entity types and relationships