
Choosing memory sources

Every search() call can selectively enable or disable each memory store. By default, all three are active.

# Only search episodic memory (Qdrant vector store):
# turn the other two stores off explicitly
results = brain.search(
    "What did the user say about Postgres?",
    k=5,
    include_episodic=True,    # default
    include_semantic=False,   # default is True
    include_working=False,    # default is True
)

Via the REST API you have explicit control:

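import requests

response = requests.post(
    "https://api.bsyncs.com/brain/retrieve",
    headers={"x-api-key": "atlas_..."},  # your Atlas API key
    json={
        "query": "What database does Acme use?",
        "user_id": "user-123",
        "k": 5,
        "include_episodic": True,
        "include_semantic": True,
        "include_working": False,   # skip working memory for this query
        "session_id": None,         # no session routing for this query
    },
    timeout=30,
)
response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
results = response.json()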
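The three flags map one-to-one onto the request body, so it can help to build that body in one place. A minimal sketch, assuming (the docs do not say this) that a request with every store disabled is never useful and should be rejected on the client; `build_retrieve_payload` is a hypothetical helper, not part of the Atlas SDK:

```python
from typing import Any, Optional


def build_retrieve_payload(
    query: str,
    user_id: str,
    k: int = 5,
    include_episodic: bool = True,
    include_semantic: bool = True,
    include_working: bool = True,
    session_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build a /brain/retrieve request body (hypothetical helper, not part of the SDK)."""
    # Assumption: with every store disabled there is nothing to search, so fail early.
    if not (include_episodic or include_semantic or include_working):
        raise ValueError("enable at least one memory store")
    return {
        "query": query,
        "user_id": user_id,
        "k": k,
        "include_episodic": include_episodic,
        "include_semantic": include_semantic,
        "include_working": include_working,
        "session_id": session_id,
    }


# Episodic-only retrieval: the other two stores are switched off explicitly.
payload = build_retrieve_payload(
    "What did the user say about Postgres?",
    "user-123",
    include_semantic=False,
    include_working=False,
)
```

The result can be passed straight to `requests.post(..., json=payload)` as in the REST example above.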

Graph traversal depth

Control how many relationship hops the graph reasoner follows with max_hops. Higher values surface deeper connections but add latency.

# Shallow: direct neighbours only (fast)
results = brain.search("What team is Sarah on?", max_hops=1)

# Default: two hops (balanced)
results = brain.search("What infrastructure does Sarah's team own?", max_hops=2)

# Deep: four hops, for complex relational queries (the ceiling is 5)
results = brain.search(
    "Which engineers indirectly depend on the PostgreSQL migration?",
    max_hops=4,
)

Values of max_hops above 3 significantly increase latency on large graphs, so reserve them for queries that genuinely need multi-hop reasoning. The hard ceiling is 5 hops.

| `max_hops` | Typical latency | Best for |
|---|---|---|
| 1 | ~50 ms | Direct attribute lookup |
| 2 | ~150 ms | Two-step relationship queries |
| 3 | ~400 ms | Standard multi-hop reasoning |
| 4–5 | ~800 ms+ | Deep graph exploration |
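The latency column grows quickly because every extra hop expands another frontier of neighbours. A minimal sketch of hop-bounded breadth-first traversal over a toy in-memory adjacency list, clamped to the documented ceiling of 5; this illustrates the technique only and is not Atlas's graph reasoner. `neighbours_within` and the sample graph are hypothetical:

```python
from collections import deque

MAX_HOPS_CEILING = 5  # the documented hard ceiling


def neighbours_within(graph: dict[str, list[str]], start: str, max_hops: int) -> dict[str, int]:
    """Return every node reachable from `start` in at most `max_hops` edges, with its hop distance."""
    max_hops = max(0, min(max_hops, MAX_HOPS_CEILING))  # clamp to [0, 5]
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # hop budget spent: do not expand this node further
        for nxt in graph.get(node, []):
            if nxt not in dist:  # first visit is the shortest path in BFS
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Toy graph (hypothetical data): person -> team -> infrastructure -> dependent service
graph = {
    "Sarah": ["Platform team"],
    "Platform team": ["PostgreSQL cluster"],
    "PostgreSQL cluster": ["Billing service"],
}
one_hop = neighbours_within(graph, "Sarah", 1)   # Sarah and her team
two_hops = neighbours_within(graph, "Sarah", 2)  # adds the team's infrastructure
```

With `max_hops=2` the traversal reaches the team's infrastructure but stops before the dependent service, mirroring the "What infrastructure does Sarah's team own?" query above.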

Result count and score filtering

# Return more results
results = brain.search("project decisions", k=10)

To require a minimum score, pass min_score (0.0 to 1.0) through the REST API; only facts above that confidence threshold are returned:

{
  "query": "project decisions",
  "user_id": "user-123",
  "k": 10,
  "min_score": 0.4
}
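The threshold and k interact as "filter, then cap". A minimal client-side sketch of the same idea, assuming each result is a dict with a `score` between 0.0 and 1.0 (the real response shape may differ, and whether the server treats a score exactly equal to min_score as passing is not documented); `filter_by_score` is hypothetical:

```python
def filter_by_score(results: list[dict], min_score: float, k: int) -> list[dict]:
    """Keep results scoring above `min_score`, best first, capped at `k` (hypothetical helper)."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    kept = [r for r in results if r["score"] > min_score]  # strict "above", per the docs' wording
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:k]


# Hypothetical results; the actual Atlas response shape may differ.
results = [
    {"text": "Chose Postgres for billing", "score": 0.82},
    {"text": "Lunch order from Friday", "score": 0.12},
    {"text": "Deferred the Kafka rollout", "score": 0.55},
]
top = filter_by_score(results, min_score=0.4, k=10)
```

Raising k without a threshold pads the list with weaker matches; pairing a larger k with min_score keeps only the results that clear the bar.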

Contextual compression (LLM reranking)

Enable use_compression to have an LLM extract only the query-relevant portions of each retrieved document before returning them. This reduces context window usage at the cost of one extra LLM call.

Contextual compression requires OPENAI_API_KEY to be set on the server. It is disabled by default.

{
  "query": "database migration risks",
  "user_id": "user-123",
  "k": 5,
  "use_compression": true
}
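Atlas performs this step with an LLM. As a rough illustration of what contextual compression does, here is a sketch that swaps the LLM for simple word overlap: each retrieved document is cut down to the sentences that share a term with the query. `compress` is hypothetical and much cruder than an LLM extractor, which can keep passages that are relevant without sharing any words with the query (note that "risk" does not match "risks" here):

```python
import re


def compress(document: str, query: str, min_overlap: int = 1) -> str:
    """Keep only the sentences that share at least `min_overlap` words with the query."""
    query_terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    kept = [
        s for s in sentences
        if len(query_terms & set(re.findall(r"[a-z0-9]+", s.lower()))) >= min_overlap
    ]
    return " ".join(kept)


doc = (
    "The migration moves billing to PostgreSQL 16. "
    "The team offsite is in May. "
    "The main risk is lock contention during the migration."
)
summary = compress(doc, "database migration risks")
```

The off-topic sentence about the offsite is dropped, which is the context-window saving the feature is for.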

Session routing

When a session_id is provided, Atlas automatically:

1. Reads the rolling topic vector from Redis (an EMA of recent turn embeddings).
2. Blends it with the query vector: q_blended = 0.6 · q_query + 0.4 · topic_vec.
3. Caches the top-k results as hot facts for fast re-access in the same session.

This means follow-up questions in the same session automatically favour topic-consistent memories without any extra configuration.
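The routing arithmetic is small enough to sketch directly. This version uses plain Python lists for vectors; the 0.6/0.4 blend weights come from the formula above, while the EMA smoothing factor `EMA_ALPHA` and the helper names are hypothetical. The sketch leaves out the Redis round trip and any vector normalisation Atlas may apply:

```python
from typing import Optional

Vector = list[float]

# Blend weights from q_blended = 0.6 · q_query + 0.4 · topic_vec
BLEND_QUERY, BLEND_TOPIC = 0.6, 0.4
# Hypothetical EMA smoothing factor; the real value is not documented here
EMA_ALPHA = 0.3


def update_topic(topic_vec: Optional[Vector], turn_embedding: Vector, alpha: float = EMA_ALPHA) -> Vector:
    """Fold one turn embedding into the rolling topic vector (exponential moving average)."""
    if topic_vec is None:
        return list(turn_embedding)  # the first turn seeds the topic vector
    return [alpha * t + (1 - alpha) * o for t, o in zip(turn_embedding, topic_vec)]


def blend(q_query: Vector, topic_vec: Vector) -> Vector:
    """Mix the query embedding with the session topic using the documented weights."""
    return [BLEND_QUERY * q + BLEND_TOPIC * t for q, t in zip(q_query, topic_vec)]


# Two turns on different topics, then a query close to the second turn
topic = None
for turn in ([1.0, 0.0], [0.0, 1.0]):
    topic = update_topic(topic, turn)
q_blended = blend([0.0, 1.0], topic)
```

Because the topic term carries 40% of the weight, a short follow-up such as "Which database should we use?" is pulled toward whatever the session has been discussing.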

# First message — establishes session context
brain.add("We are designing the payment microservice.", session_id="sess-1")

# Second message — query is automatically blended with payment context
results = brain.search("Which database should we use?", session_id="sess-1")
# Will prefer payment-relevant database facts over unrelated ones

Filtering by persona

Pass persona in the request to restrict retrieval to a specific agent role. Memories stored under the "shared" persona are always included.

# Only retrieve analyst memories (+ shared)
results = brain.search("revenue projections", persona="analyst")

# Only retrieve assistant memories (+ shared)
results = brain.search("user preferences", persona="assistant")
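The rule amounts to a membership test on each memory's persona tag. A minimal sketch, assuming each memory record carries a `persona` field; the record shape and `filter_by_persona` are hypothetical, and in Atlas the restriction happens inside retrieval rather than on the client:

```python
SHARED = "shared"  # persona whose memories are visible to every role


def filter_by_persona(memories: list[dict], persona: str) -> list[dict]:
    """Keep memories stored under `persona` plus those under the shared persona."""
    return [m for m in memories if m["persona"] in (persona, SHARED)]


# Hypothetical memory records
memories = [
    {"text": "Q3 revenue forecast is flat", "persona": "analyst"},
    {"text": "User prefers dark mode", "persona": "assistant"},
    {"text": "Company fiscal year starts in April", "persona": "shared"},
]
analyst_view = filter_by_persona(memories, "analyst")
```

The analyst sees its own forecast and the shared fiscal-year fact, but not the assistant's note about display preferences.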
