Choosing memory sources

Every search() call can selectively enable or disable each memory store. By default, all three are active.
# Search all three memory stores (every include_* flag defaults to True)
results = brain.search(
    "What did the user say about Postgres?",
    k=5,
    # include_episodic=True,    ← default
    # include_semantic=True,    ← default
    # include_working=True,     ← default
)
Via the REST API you have explicit control:
import requests

response = requests.post(
    "https://api.bsyncs.com/brain/retrieve",
    headers={"x-api-key": "atlas_..."},
    json={
        "query": "What database does Acme use?",
        "user_id": "user-123",
        "k": 5,
        "include_episodic": True,
        "include_semantic": True,
        "include_working": False,   # skip working memory for this query
        "session_id": None,
    }
)
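The request body above can be assembled by a small helper so that every include flag is sent explicitly. This is a minimal sketch, not part of the Atlas SDK: `build_retrieve_payload` is a hypothetical name, it uses only the field names shown in the request above, and the check that at least one store stays enabled is an assumption rather than documented server behaviour.

```python
from typing import Any, Dict, Optional


def build_retrieve_payload(
    query: str,
    user_id: str,
    k: int = 5,
    include_episodic: bool = True,
    include_semantic: bool = True,
    include_working: bool = True,
    session_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a JSON body for the retrieve endpoint (hypothetical helper)."""
    # Assumption: a query with every store disabled is a caller mistake.
    if not (include_episodic or include_semantic or include_working):
        raise ValueError("at least one memory store must be enabled")
    return {
        "query": query,
        "user_id": user_id,
        "k": k,
        "include_episodic": include_episodic,
        "include_semantic": include_semantic,
        "include_working": include_working,
        "session_id": session_id,
    }


# Skip working memory, as in the request above.
payload = build_retrieve_payload(
    "What database does Acme use?", "user-123", include_working=False
)
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`.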

Graph traversal depth

Control how many relationship hops the graph reasoner follows with max_hops. Higher values surface deeper connections but cost more latency.
# Shallow — direct neighbours only (fast)
results = brain.search("What team is Sarah on?", max_hops=1)

# Default — two hops (balanced)
results = brain.search("What infrastructure does Sarah's team own?", max_hops=2)

# Deep: four hops (use for complex relational queries; the hard ceiling is 5)
results = brain.search(
    "Which engineers indirectly depend on the PostgreSQL migration?",
    max_hops=4,
)
Setting max_hops above 3 significantly increases latency on large graphs. Use values of 4 or 5 only when a query explicitly needs multi-hop reasoning. The hard ceiling is 5 hops.
| max_hops | Typical latency | Best for |
|----------|-----------------|----------|
| 1 | ~50ms | Direct attribute lookup |
| 2 | ~150ms | 1-degree relationship queries |
| 3 | ~400ms | Standard multi-hop reasoning |
| 4–5 | ~800ms+ | Deep graph exploration |
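One way to apply these figures is to choose max_hops from a latency budget. The sketch below is illustrative only: `pick_max_hops` is a hypothetical helper, the numbers are the typical latencies quoted above rather than guarantees, and 5 hops is left out because the documentation gives only a lower bound ("~800ms+") for the 4–5 range.

```python
# Typical latencies quoted in the documentation, in milliseconds.
# Assumption: 4 hops is treated as 800 ms, the lower bound of "~800ms+".
TYPICAL_LATENCY_MS = {1: 50, 2: 150, 3: 400, 4: 800}


def pick_max_hops(latency_budget_ms: float) -> int:
    """Return the largest max_hops whose typical latency fits the budget.

    Falls back to 1 when even a single hop exceeds the budget.
    """
    best = 1
    for hops in sorted(TYPICAL_LATENCY_MS):
        if TYPICAL_LATENCY_MS[hops] <= latency_budget_ms:
            best = hops
    return best


hops_for_200ms = pick_max_hops(200)  # 150 ms fits, 400 ms does not
```

The chosen value would then be passed as `max_hops` to `brain.search(...)`.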

Result count and score filtering

# Return more results
results = brain.search("project decisions", k=10)

# A minimum score (0.0 to 1.0) restricts results to facts above that
# confidence threshold. This call does not set one:
results = brain.search(
    "project decisions",
    k=10,
    # min_score is passed through the REST API
)
Via REST:
{
  "query": "project decisions",
  "user_id": "user-123",
  "k": 10,
  "min_score": 0.4
}
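When results have already been fetched without a threshold, the same cut can be approximated on the client. This is a sketch under stated assumptions, not Atlas behaviour: it assumes each result is a dictionary with a numeric `"score"` key, and it treats the threshold as inclusive; `filter_by_min_score` is a hypothetical name.

```python
from typing import Any, Dict, List


def filter_by_min_score(
    results: List[Dict[str, Any]], min_score: float
) -> List[Dict[str, Any]]:
    """Keep results whose score is at least min_score (hypothetical helper)."""
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0.0 and 1.0")
    # Assumption: each result carries its relevance under the "score" key.
    return [r for r in results if r["score"] >= min_score]


# Made-up sample data for illustration only.
sample = [
    {"text": "Chose Postgres for billing", "score": 0.82},
    {"text": "Lunch order on Friday", "score": 0.12},
    {"text": "Deferred the Kafka decision", "score": 0.40},
]
kept = filter_by_min_score(sample, 0.4)
```

Server-side filtering via `min_score` remains preferable, since it avoids transferring results that will be discarded.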

Contextual compression (LLM reranking)

Enable use_compression to have an LLM extract only the query-relevant portions of each retrieved document before returning them. This reduces context window usage at the cost of one extra LLM call.
Contextual compression requires OPENAI_API_KEY to be set on the server. It is disabled by default.
{
  "query": "database migration risks",
  "user_id": "user-123",
  "k": 5,
  "use_compression": true
}

Session routing

When a session_id is provided, Atlas automatically:
  1. Reads the rolling topic vector from Redis (EMA of recent turn embeddings)
  2. Blends it with the query vector: q_blended = 0.6 · q_query + 0.4 · topic_vec
  3. Caches top-k results as hot facts for fast re-access in the same session
This means follow-up questions in the same session automatically favour topic-consistent memories without any extra configuration.
# First message — establishes session context
brain.add("We are designing the payment microservice.", session_id="sess-1")

# Second message — query is automatically blended with payment context
results = brain.search("Which database should we use?", session_id="sess-1")
# Will prefer payment-relevant database facts over unrelated ones
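
The blending arithmetic from step 2 can be sketched in plain Python. Only the 0.6 and 0.4 weights come from the text above; the EMA smoothing factor `alpha`, the function names, and the use of plain lists are assumptions, and the sketch does not cover whether Atlas normalises the blended vector.

```python
from typing import List, Sequence

QUERY_WEIGHT = 0.6  # from the formula above
TOPIC_WEIGHT = 0.4  # from the formula above


def blend(query_vec: Sequence[float], topic_vec: Sequence[float]) -> List[float]:
    """Compute q_blended = 0.6 * q_query + 0.4 * topic_vec element-wise."""
    if len(query_vec) != len(topic_vec):
        raise ValueError("vectors must have the same dimension")
    return [QUERY_WEIGHT * q + TOPIC_WEIGHT * t for q, t in zip(query_vec, topic_vec)]


def update_topic(
    topic_vec: Sequence[float], turn_vec: Sequence[float], alpha: float = 0.3
) -> List[float]:
    """Exponential moving average of turn embeddings.

    Assumption: alpha = 0.3 is illustrative; the actual factor is not documented here.
    """
    if len(topic_vec) != len(turn_vec):
        raise ValueError("vectors must have the same dimension")
    return [alpha * new + (1.0 - alpha) * old for old, new in zip(topic_vec, turn_vec)]


# Toy 2-dimensional vectors: the query points along x, the topic along y.
blended = blend([1.0, 0.0], [0.0, 1.0])
topic = update_topic([0.0, 1.0], [1.0, 0.0], alpha=0.5)
```

With these toy inputs the blended vector leans 60% toward the query and 40% toward the session topic, which is why a generic question such as "Which database should we use?" drifts toward payment-related memories.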

Filtering by persona

Pass persona in the request to restrict retrieval to a specific agent role. Memories stored under the "shared" persona are always included.
# Only retrieve analyst memories (+ shared)
results = brain.search("revenue projections", persona="analyst")

# Only retrieve assistant memories (+ shared)
results = brain.search("user preferences", persona="assistant")
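
The inclusion rule can be pictured as a simple membership test. This is a sketch of the rule as described, not the server's implementation: it assumes each memory is a dictionary with a `"persona"` key, and that omitting the persona disables filtering; `filter_by_persona` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional

SHARED_PERSONA = "shared"


def filter_by_persona(
    memories: List[Dict[str, Any]], persona: Optional[str]
) -> List[Dict[str, Any]]:
    """Keep memories for the given persona plus shared ones (hypothetical helper)."""
    if persona is None:
        # Assumption: no persona means no persona filtering.
        return list(memories)
    allowed = {persona, SHARED_PERSONA}
    return [m for m in memories if m.get("persona") in allowed]


# Made-up sample data for illustration only.
memories = [
    {"text": "Q3 revenue forecast", "persona": "analyst"},
    {"text": "User prefers dark mode", "persona": "assistant"},
    {"text": "Company fiscal year starts in April", "persona": "shared"},
]
analyst_view = filter_by_persona(memories, "analyst")
```

Here `analyst_view` holds the analyst memory and the shared one, while the assistant memory is excluded.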