Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.maximem.ai/llms.txt

Use this file to discover all available pages before exploring further.

General

Synap is a managed memory platform for AI agents. It provides a complete pipeline for ingesting conversations and documents, extracting structured knowledge (facts, preferences, episodes, emotions, temporal events), resolving entities across conversations, and retrieving relevant context when your agent needs it.Instead of building and maintaining your own vector database, retrieval pipeline, and entity resolution system, you integrate the Synap SDK into your application and let the platform handle the rest. Your agent gets long-term, structured memory with a few lines of code.
Traditional RAG systems retrieve raw document chunks based on similarity search. Synap goes several steps further:
  • Structured extraction: Synap does not just store chunks. It extracts typed knowledge — facts, preferences, episodes, emotions, and temporal events — with confidence scores.
  • Entity resolution: Mentions of the same entity across conversations (e.g., “John”, “my manager”, “John Smith”) are linked to a single canonical entity.
  • Scoped retrieval: Memories are scoped to users, customers, and organizations. Each user gets their own memory without manual isolation logic.
  • Context compaction: Long conversation histories are automatically summarized while preserving key information, reducing token usage.
  • Managed pipeline: No vector databases to deploy, no embedding models to tune, no retrieval pipelines to build.
All of these are memory layers for AI agents, and at a 30,000-ft view they overlap. The differences that matter in practice:
CapabilitySynapMem0ZepLettaSuperMemory
Typed memories (facts / preferences / episodes / emotions / temporal)Native, per-type retrievalSingle “memory” typeFacts + episodesSingle “memory” typeSingle “memory” type
Entity resolution across conversationsYes (graph store, automatic)LimitedYes (graph store)NoNo
Multi-scope (user / customer / client / world)Native scope chainUser onlyUser onlyUser onlyUser only
Customized Memory Architecture (MACA)Yes — generated from a Use-Case Markdown specManual prompt tuningManual configManual schemaManual config
Context compaction (auto-summarize long history)Built-in (context.compact)NoLimitedNoNo
Self-host optionCloud only (managed)Self-host + cloudSelf-host + cloudSelf-hostCloud only
Anticipation cache (gRPC stream of likely-needed memories)YesNoNoNoNo
B2B-native (customer/org isolation, MACA-per-instance)YesNo (user-only)No (user-only)No (user-only)No (user-only)
When Synap is the right choice: you’re building a B2B agent product where each customer org has shared context (policies, runbooks, product data) on top of per-user memory; you want typed extraction so the LLM can reason over preferences vs. facts vs. temporal events distinctly; you don’t want to run a vector DB.When another tool is a better fit: you need self-host today (Mem0 / Zep / Letta); you only have a single-user consumer app and don’t need scope chain or entity graph (Mem0 / SuperMemory are simpler); you want agent state + memory in one SDK (Letta).
Build your own if any of these apply:
  • You have strict data residency or air-gap requirements that managed cloud can’t meet, and your prospective scale doesn’t justify Synap’s self-hosted licensing.
  • Your memory model is highly domain-specific (e.g., medical records with regulated taxonomies) and you’d end up reimplementing extraction anyway.
  • You’re at single-digit MAU and a pgvector table + a few prompt-engineered extraction calls is genuinely cheaper than the integration overhead.
Don’t build your own if you’re just worried about “lock-in” or “wanting control.” The honest cost of running a production memory pipeline — embeddings, vector store, graph store, entity resolution, compaction, eviction, observability — is a multi-engineer-quarter project that no agent team has gotten right on a side budget.
Yes. Synap is designed with a zero-trust security model:
  • Encryption in transit: All connections use TLS 1.3.
  • Encryption at rest: All stored data is encrypted at rest using AES-256.
  • Instance isolation: Each instance has its own storage namespace. Memories from one instance are never accessible from another.
  • Scope isolation: Within an instance, memories are scoped to users and customers. A user can only access memories in their scope chain.
  • Credential management: API keys are hashed (SHA-256) before storage. Plaintext keys are never stored on the server.
See Authentication for details on the credential lifecycle.
Synap Cloud is currently available in US East (Virginia) and EU West (Frankfurt). Additional regions are planned based on demand. Contact sales@maximem.ai for region-specific requirements or data residency needs.

SDK

The official Synap SDK is available for Python 3.11+. It is fully async, built on asyncio, and available via pip:
pip install maximem-synap
A JavaScript/TypeScript SDK is available (@maximem/synap-js-sdk) — note it requires a Python 3.11+ runtime on the host since it wraps the Python SDK as a subprocess. Native TypeScript and Go SDKs are on the roadmap. Check the Changelog for updates on new language support.
Yes. All SDK methods are async and must be called with await inside an async context. This design ensures your application never blocks on network I/O.If you need to call the SDK from synchronous code, use asyncio.run():
import asyncio
from maximem_synap import MaximemSynapSDK

sdk = MaximemSynapSDK(api_key="synap_your_key_here")

# From synchronous code
result = asyncio.run(sdk.memories.create(
    document="User prefers dark mode.",
    document_type="ai-chat-conversation",
    user_id="user_123",
    customer_id="acme_corp",
    mode="fast",
))
The SDK raises typed exceptions that map to HTTP error codes. Catch specific exceptions for fine-grained error handling:
import uuid
from maximem_synap import (
    AuthenticationError,       # 401, 403
    ContextNotFoundError,      # 404
    RateLimitError,            # 429
    ServiceUnavailableError,   # 500, 503
    InvalidInputError,         # 400
)

try:
    context = await sdk.conversation.context.fetch(
        conversation_id=str(uuid.uuid4()),
        search_query=["user preferences"],
    )
except RateLimitError as e:
    # Automatic retry with backoff is built into the SDK.
    # This exception is raised only after all retries are exhausted.
    print(f"Retry after {e.retry_after}s")
except ContextNotFoundError:
    print("Conversation not found")
The SDK automatically retries 429, 500, and 503 errors with exponential backoff. See Error Handling for the full reference.
Yes. Synap is framework-agnostic. The SDK operates independently of your LLM orchestration layer. Common integration patterns:
  • LangChain: Use sdk.conversation.context.fetch() in a custom retriever, then pass the context to your chain.
  • LlamaIndex: Use sdk.conversation.context.compacted() with format="system_prompt" and inject it into your query engine.
  • Direct: Call the SDK from your application code and pass context to any LLM API.
See the First Integration guide for detailed examples.

Memory

Each Instance has a retention policy that Synap chooses automatically based on your use-case file. Compliance-sensitive agents get longer retention with archive-on-expiry; consumer agents get shorter retention with automatic eviction. Frequently accessed memories are kept longer; rarely accessed ones are aged out sooner.You can also delete individual memories at any time via the Delete Memory endpoint, regardless of the retention policy.
Yes. Use the DELETE /v1/memories/{memory_id} endpoint to permanently delete a specific memory. Deletion removes the memory from both the vector store and graph store. Entity references are updated but the entities themselves are not deleted, as they may be referenced by other memories.
await sdk.memories.delete("mem_a1b2c3d4e5f67890")
Deletion is permanent and cannot be undone. See the Memory API for details.
The mode parameter controls a speed-quality tradeoff. Ingestion and retrieval use distinct mode value sets:Ingestion (sdk.memories.create()) — values: "fast" or "long-range" (default).
ModeSpeedQualityBest For
fastHighestGoodReal-time chat ingestion, high-volume streams
long-rangeModerateHighestImportant documents, support tickets, onboarding conversations
Retrieval (sdk.conversation.context.fetch()) — values: "fast" (default) or "accurate".
ModeLatencyMethodBest For
fast~50-100msVector similarity onlyReal-time chat, single-topic queries
accurate~200-500msVector + graph + re-rankingRelationship-aware queries, multi-entity context
The two value spaces are not interchangeable. Passing "accurate" to memories.create() or "long-range" to context.fetch() will be rejected.
If you provide a document_id in the create memory request, Synap checks for duplicates. If a document with the same ID has already been ingested, the request is rejected with a 409 Conflict error.If you do not provide a document_id, the document is ingested as a new record. The extraction pipeline may produce duplicate memories if the content overlaps with previously ingested documents. Entity resolution helps by linking entities across documents, but the memories themselves are stored independently.For production use, we recommend always providing a document_id for deduplication.

Configuration

Synap auto-generates each Instance’s memory configuration from the Use-Case Markdown file you upload. To change behavior — enable different memory categories, shift the primary scope, update retention guidance — re-upload an updated use-case file in the Dashboard. Synap re-evaluates and applies the new configuration. The previous version is retained so you can roll back if needed.
No. Configuration updates are zero-downtime: in-flight requests complete on the previous configuration and new requests pick up the new one. There is no traffic interruption.
Existing memories keep their original scope and category assignments. The updated configuration governs new memories ingested after it takes effect, and it tunes retrieval/ranking behavior going forward. Memory data is not retroactively rewritten.

Billing and Usage

Synap usage is measured across three dimensions:
  • API calls: Each HTTP request to the API counts as one API call. Batch endpoints count as a single call regardless of batch size.
  • Token usage: LLM tokens consumed during ingestion (extraction, categorization) and retrieval (re-ranking, compaction). Input and output tokens are tracked separately.
  • Storage: Total memories stored across all instances. Measured as a monthly peak.
Use the Dashboard Analytics to monitor your usage in real time.
Each HTTP request to any Synap API endpoint counts as one API call, including:
  • Memory ingestion (single and batch)
  • Context fetch and compaction
  • Configuration operations
  • Dashboard queries
  • Analytics queries
  • Status checks
Webhook deliveries do not count as API calls.

Troubleshooting

Common causes and solutions:
  1. Missing or malformed API key: Ensure the header is Authorization: Bearer synap_... with the Bearer prefix.
  2. Revoked key: Check the Dashboard to verify the key is still active.
  3. Wrong instance: The API key may not have access to the instance you are targeting.
See Error Codes for the full list of auth-related errors.
If context fetch returns empty results when you expect matches:
  1. Check ingestion status: Verify the ingestion completed successfully via GET /v1/memories/{ingestion_id}/status. Memories are not retrievable until ingestion completes.
  2. Check scope: Memories are scoped to the user/customer that was specified during ingestion. Context fetch only returns memories within the conversation’s scope chain.
  3. Check confidence threshold: Memories with confidence below the MACA threshold (default 0.7) are discarded during ingestion.
  4. Check memory types: If you are filtering by types in the fetch request, ensure the desired types are included.
  5. Check context budget: If the budget is very small, only the highest-ranked memories may fit.
Use the Dashboard monitoring tools to inspect the ingestion pipeline and stored memories for debugging.
Steps for diagnosing retrieval problems:
  1. Get the correlation ID: Note the X-Correlation-Id from the fetch response.
  2. Check analytics: Use GET /v1/analytics/latency?operation=context_fetch to see if latency is abnormal.
  3. Try different modes: Switch from fast to accurate mode to see if graph traversal finds additional results.
  4. Broaden the query: Try more general search queries or remove type filters.
  5. Check compaction: If the context was recently compacted, some memories may have been summarized away. Use format: "full" to see both the narrative and structured extractions.
If the issue persists, contact support with the correlation ID and instance ID.