Choosing memory sources
Every search() call can selectively enable or disable each memory store. By default, all three stores (episodic, semantic, and working) are active.
# All three stores are searched by default. To search only episodic memory
# (Qdrant vector store), pass include_semantic=False and include_working=False.
results = brain.search(
    "What did the user say about Postgres?",
    k=5,
    # include_episodic=True, ← default
    # include_semantic=True, ← default
    # include_working=True, ← default
)
Via the REST API you have explicit control:
import requests

response = requests.post(
    "https://api.bsyncs.com/brain/retrieve",
    headers={"x-api-key": "atlas_..."},
    json={
        "query": "What database does Acme use?",
        "user_id": "user-123",
        "k": 5,
        "include_episodic": True,
        "include_semantic": True,
        "include_working": False,  # skip working memory for this query
        "session_id": None,
    },
)
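To keep the store flags consistent across calls, the request body can be assembled by a small helper. The following is a minimal sketch: `build_retrieve_payload` is a hypothetical name (not part of the SDK), and it only mirrors the fields shown in the request above. Rejecting a request with every store disabled is an assumption about sensible client behaviour, not documented server behaviour.

```python
from typing import Any, Dict, Optional


def build_retrieve_payload(
    query: str,
    user_id: str,
    k: int = 5,
    include_episodic: bool = True,
    include_semantic: bool = True,
    include_working: bool = True,
    session_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a JSON body for POST /brain/retrieve (hypothetical helper)."""
    if not (include_episodic or include_semantic or include_working):
        # Assumption: disabling every store is a caller mistake.
        raise ValueError("at least one memory store must be enabled")
    return {
        "query": query,
        "user_id": user_id,
        "k": k,
        "include_episodic": include_episodic,
        "include_semantic": include_semantic,
        "include_working": include_working,
        "session_id": session_id,
    }


# Same request as above: skip working memory for this query.
payload = build_retrieve_payload(
    "What database does Acme use?", "user-123", include_working=False
)
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`.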
Graph traversal depth
Control how many relationship hops the graph reasoner follows with max_hops. Higher values surface deeper connections at the cost of higher latency.
# Shallow — direct neighbours only (fast)
results = brain.search("What team is Sarah on?", max_hops=1)
# Default — two hops (balanced)
results = brain.search("What infrastructure does Sarah's team own?", max_hops=2)
# Deep: four hops (use for complex relational queries; the hard ceiling is 5)
results = brain.search(
    "Which engineers indirectly depend on the PostgreSQL migration?",
    max_hops=4,
)
Setting max_hops above 3 significantly increases latency on large graphs, so use higher values only when a query genuinely needs multi-hop reasoning. The hard ceiling is 5 hops.
| max_hops | Typical latency | Best for |
|---|---|---|
| 1 | ~50ms | Direct attribute lookup |
| 2 | ~150ms | 1-degree relationship queries |
| 3 | ~400ms | Standard multi-hop reasoning |
| 4–5 | ~800ms+ | Deep graph exploration |
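The table can also be used on the client side to pick a hop count that fits a latency budget. The sketch below is illustrative only: `clamp_max_hops` and `pick_max_hops` are hypothetical helpers, the latency figures are the indicative values from the table, and the lower bound of 1 is an assumption (the source only states the ceiling of 5). Whether the server clamps or rejects out-of-range values is not specified here.

```python
MAX_HOPS_CEILING = 5  # hard ceiling stated in the text
MIN_HOPS = 1          # assumption: at least one hop is required

# Indicative latencies (ms) from the table. The 4-5 tier is listed as
# "~800ms+", so 4 is treated as the deepest tier with a usable estimate.
TYPICAL_LATENCY_MS = {1: 50, 2: 150, 3: 400, 4: 800}


def clamp_max_hops(requested: int) -> int:
    """Keep a requested hop count inside the supported range."""
    return max(MIN_HOPS, min(requested, MAX_HOPS_CEILING))


def pick_max_hops(budget_ms: int) -> int:
    """Return the deepest hop count whose typical latency fits the budget."""
    fitting = [hops for hops, ms in TYPICAL_LATENCY_MS.items() if ms <= budget_ms]
    # Fall back to the shallowest setting when nothing fits.
    return max(fitting, default=MIN_HOPS)


hops_for_fast_path = pick_max_hops(200)
hops_for_batch_job = pick_max_hops(1000)
```

With these figures, a 200 ms budget selects two hops and a 1000 ms budget selects four.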
Result count and score filtering
# Return more results
results = brain.search("project decisions", k=10)
# Require a minimum score (0.0 to 1.0) so that only facts above
# this confidence threshold are returned.
results = brain.search(
    "project decisions",
    k=10,
    # min_score is set in the REST request body rather than here
)
Via REST:
{
  "query": "project decisions",
  "user_id": "user-123",
  "k": 10,
  "min_score": 0.4
}
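When results have already been fetched without a threshold, the same cut-off can be approximated on the client. This is a minimal sketch under two assumptions that the source does not confirm: each result is a dictionary with a numeric `"score"` field, and the threshold is inclusive. `filter_by_score` is a hypothetical helper.

```python
from typing import Any, Dict, List


def filter_by_score(
    results: List[Dict[str, Any]], min_score: float
) -> List[Dict[str, Any]]:
    """Keep results whose score meets the threshold (assumed result shape)."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    # Assumption: a result without a score is treated as scoring 0.0.
    return [r for r in results if r.get("score", 0.0) >= min_score]


sample = [
    {"text": "Chose Postgres for billing", "score": 0.82},
    {"text": "Lunch order", "score": 0.12},
    {"text": "Deferred the cache rewrite", "score": 0.40},
]
kept = filter_by_score(sample, 0.4)
```

Filtering on the server with `min_score` is preferable when available, since low-scoring facts then never count against `k`.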
Contextual compression (LLM reranking)
Enable use_compression to have an LLM extract only the query-relevant portions of each retrieved document before returning them. This reduces context window usage at the cost of one extra LLM call.
Contextual compression requires OPENAI_API_KEY to be set on the server. It is disabled by default.
{
  "query": "database migration risks",
  "user_id": "user-123",
  "k": 5,
  "use_compression": true
}
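Conceptually, compression maps each retrieved document to the portion that is relevant to the query and drops documents with nothing relevant. The sketch below shows only that shape and is not the server's implementation: the extraction step is a pluggable callable, and `keyword_extract` is a deliberately crude stand-in for the LLM call so the example runs without an API key. All names here are hypothetical.

```python
from typing import Callable, List


def compress_documents(
    query: str,
    documents: List[str],
    extract: Callable[[str, str], str],
) -> List[str]:
    """Apply an extractor to each document and drop empty results."""
    compressed = []
    for doc in documents:
        snippet = extract(query, doc).strip()
        if snippet:  # nothing relevant was found, so omit the document
            compressed.append(snippet)
    return compressed


def keyword_extract(query: str, document: str) -> str:
    """Stand-in for the LLM: keep sentences sharing a word with the query."""
    terms = {word.lower() for word in query.split()}
    kept = [
        sentence.strip()
        for sentence in document.split(".")
        if terms & {word.lower() for word in sentence.split()}
    ]
    return ". ".join(kept)


docs = [
    "The migration carries downtime risks. Lunch is at noon.",
    "The office is closed Friday.",
]
snippets = compress_documents("database migration risks", docs, keyword_extract)
```

In a real deployment the `extract` argument would wrap an LLM call, which is where the extra latency and cost mentioned above come from.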
Session routing
When a session_id is provided, Atlas automatically:
- Reads the rolling topic vector from Redis (EMA of recent turn embeddings)
- Blends it with the query vector:
q_blended = 0.6 · q_query + 0.4 · topic_vec
- Caches top-k results as hot facts for fast re-access in the same session
This means follow-up questions in the same session automatically favour topic-consistent memories without any extra configuration.
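The blending step above can be sketched in a few lines of plain Python. The 0.6 and 0.4 weights come from the formula; everything else is an assumption made for illustration: the function names are hypothetical, the EMA smoothing factor `alpha` is not documented here, and the sketch does not re-normalise vectors (the source does not say whether Atlas does).

```python
from typing import List, Sequence

QUERY_WEIGHT = 0.6  # weights from the formula above
TOPIC_WEIGHT = 0.4


def blend(query_vec: Sequence[float], topic_vec: Sequence[float]) -> List[float]:
    """Compute q_blended = 0.6 * q_query + 0.4 * topic_vec element-wise."""
    if len(query_vec) != len(topic_vec):
        raise ValueError("vectors must have the same dimension")
    return [QUERY_WEIGHT * q + TOPIC_WEIGHT * t for q, t in zip(query_vec, topic_vec)]


def update_topic(
    topic_vec: Sequence[float], turn_vec: Sequence[float], alpha: float = 0.3
) -> List[float]:
    """EMA update of the rolling topic vector; alpha is an assumed value."""
    if len(topic_vec) != len(turn_vec):
        raise ValueError("vectors must have the same dimension")
    return [(1 - alpha) * t + alpha * x for t, x in zip(topic_vec, turn_vec)]


# Toy 2-d example: the topic pulls the query towards the second axis.
blended = blend([1.0, 0.0], [0.0, 1.0])
topic = update_topic([0.0, 0.0], [1.0, 1.0], alpha=0.5)
```

Because the query keeps the larger weight, an explicit change of subject in a follow-up question still dominates the stored topic.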
# First message — establishes session context
brain.add("We are designing the payment microservice.", session_id="sess-1")
# Second message — query is automatically blended with payment context
results = brain.search("Which database should we use?", session_id="sess-1")
# Will prefer payment-relevant database facts over unrelated ones
Filtering by persona
Pass persona in the request to restrict retrieval to a specific agent role. Memories stored under the "shared" persona are always included.
# Only retrieve analyst memories (+ shared)
results = brain.search("revenue projections", persona="analyst")
# Only retrieve assistant memories (+ shared)
results = brain.search("user preferences", persona="assistant")
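The inclusion rule (requested persona plus "shared") amounts to a simple filter. The sketch below is illustrative only: it assumes each memory is a dictionary with a `"persona"` field, which the source does not confirm, and `visible_to_persona` is a hypothetical helper rather than part of the SDK.

```python
from typing import Any, Dict, List

SHARED_PERSONA = "shared"  # persona that is always included, per the text


def visible_to_persona(
    memories: List[Dict[str, Any]], persona: str
) -> List[Dict[str, Any]]:
    """Keep memories stored under the given persona or under "shared"."""
    allowed = {persona, SHARED_PERSONA}
    return [m for m in memories if m.get("persona") in allowed]


store = [
    {"text": "Q3 revenue forecast", "persona": "analyst"},
    {"text": "User prefers dark mode", "persona": "assistant"},
    {"text": "Company fiscal year ends in June", "persona": "shared"},
]
analyst_view = visible_to_persona(store, "analyst")
```

Here the analyst view contains the forecast and the shared fiscal-year fact, but not the assistant's preference note.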