Documentation Index
Fetch the complete documentation index at: https://docs.maximem.ai/llms.txt
Use this file to discover all available pages before exploring further.
General
What is Synap?
What is Synap?
How is Synap different from RAG (Retrieval-Augmented Generation)?
How is Synap different from RAG (Retrieval-Augmented Generation)?
- Structured extraction: Synap does not just store chunks. It extracts typed knowledge — facts, preferences, episodes, emotions, and temporal events — with confidence scores.
- Entity resolution: Mentions of the same entity across conversations (e.g., “John”, “my manager”, “John Smith”) are linked to a single canonical entity.
- Scoped retrieval: Memories are scoped to users, customers, and organizations. Each user gets their own memory without manual isolation logic.
- Context compaction: Long conversation histories are automatically summarized while preserving key information, reducing token usage.
- Managed pipeline: No vector databases to deploy, no embedding models to tune, no retrieval pipelines to build.
How does Synap compare to Mem0, Zep, Letta, and SuperMemory?
How does Synap compare to Mem0, Zep, Letta, and SuperMemory?
| Capability | Synap | Mem0 | Zep | Letta | SuperMemory |
|---|---|---|---|---|---|
| Typed memories (facts / preferences / episodes / emotions / temporal) | Native, per-type retrieval | Single “memory” type | Facts + episodes | Single “memory” type | Single “memory” type |
| Entity resolution across conversations | Yes (graph store, automatic) | Limited | Yes (graph store) | No | No |
| Multi-scope (user / customer / client / world) | Native scope chain | User only | User only | User only | User only |
| Customized Memory Architecture (MACA) | Yes — generated from a Use-Case Markdown spec | Manual prompt tuning | Manual config | Manual schema | Manual config |
| Context compaction (auto-summarize long history) | Built-in (context.compact) | No | Limited | No | No |
| Self-host option | Cloud only (managed) | Self-host + cloud | Self-host + cloud | Self-host | Cloud only |
| Anticipation cache (gRPC stream of likely-needed memories) | Yes | No | No | No | No |
| B2B-native (customer/org isolation, MACA-per-instance) | Yes | No (user-only) | No (user-only) | No (user-only) | No (user-only) |
When should I build my own memory layer instead of using Synap?
When should I build my own memory layer instead of using Synap?
- You have strict data residency or air-gap requirements that managed cloud can’t meet, and your prospective scale doesn’t justify Synap’s self-hosted licensing.
- Your memory model is highly domain-specific (e.g., medical records with regulated taxonomies) and you’d end up reimplementing extraction anyway.
- You’re at single-digit MAU and a
pgvectortable + a few prompt-engineered extraction calls is genuinely cheaper than the integration overhead.
Is my data secure?
Is my data secure?
- Encryption in transit: All connections use TLS 1.3.
- Encryption at rest: All stored data is encrypted at rest using AES-256.
- Instance isolation: Each instance has its own storage namespace. Memories from one instance are never accessible from another.
- Scope isolation: Within an instance, memories are scoped to users and customers. A user can only access memories in their scope chain.
- Credential management: API keys are hashed (SHA-256) before storage. Plaintext keys are never stored on the server.
What regions is Synap available in?
What regions is Synap available in?
SDK
Which programming languages are supported?
Which programming languages are supported?
asyncio, and available via pip:@maximem/synap-js-sdk) — note it requires a Python 3.11+ runtime on the host since it wraps the Python SDK as a subprocess. Native TypeScript and Go SDKs are on the roadmap. Check the Changelog for updates on new language support.Is the SDK async-only?
Is the SDK async-only?
async and must be called with await inside an async context. This design ensures your application never blocks on network I/O.If you need to call the SDK from synchronous code, use asyncio.run():How do I handle SDK errors?
How do I handle SDK errors?
429, 500, and 503 errors with exponential backoff. See Error Handling for the full reference.Can I use Synap with LangChain, LlamaIndex, or other frameworks?
Can I use Synap with LangChain, LlamaIndex, or other frameworks?
- LangChain: Use
sdk.conversation.context.fetch()in a custom retriever, then pass the context to your chain. - LlamaIndex: Use
sdk.conversation.context.compacted()withformat="system_prompt"and inject it into your query engine. - Direct: Call the SDK from your application code and pass context to any LLM API.
Memory
How long are memories stored?
How long are memories stored?
Can I delete specific memories?
Can I delete specific memories?
DELETE /v1/memories/{memory_id} endpoint to permanently delete a specific memory. Deletion removes the memory from both the vector store and graph store. Entity references are updated but the entities themselves are not deleted, as they may be referenced by other memories.What is the difference between fast and long-range / accurate mode?
What is the difference between fast and long-range / accurate mode?
mode parameter controls a speed-quality tradeoff. Ingestion and retrieval use distinct mode value sets:Ingestion (sdk.memories.create()) — values: "fast" or "long-range" (default).| Mode | Speed | Quality | Best For |
|---|---|---|---|
fast | Highest | Good | Real-time chat ingestion, high-volume streams |
long-range | Moderate | Highest | Important documents, support tickets, onboarding conversations |
sdk.conversation.context.fetch()) — values: "fast" (default) or "accurate".| Mode | Latency | Method | Best For |
|---|---|---|---|
fast | ~50-100ms | Vector similarity only | Real-time chat, single-topic queries |
accurate | ~200-500ms | Vector + graph + re-ranking | Relationship-aware queries, multi-entity context |
"accurate" to memories.create() or "long-range" to context.fetch() will be rejected.What happens if I ingest the same document twice?
What happens if I ingest the same document twice?
document_id in the create memory request, Synap checks for duplicates. If a document with the same ID has already been ingested, the request is rejected with a 409 Conflict error.If you do not provide a document_id, the document is ingested as a new record. The extraction pipeline may produce duplicate memories if the content overlaps with previously ingested documents. Entity resolution helps by linking entities across documents, but the memories themselves are stored independently.For production use, we recommend always providing a document_id for deduplication.Configuration
How do I change my Instance's memory behavior?
How do I change my Instance's memory behavior?
Does updating the configuration cause downtime?
Does updating the configuration cause downtime?
What happens to existing memories when configuration changes?
What happens to existing memories when configuration changes?
Billing and Usage
How is usage calculated?
How is usage calculated?
- API calls: Each HTTP request to the API counts as one API call. Batch endpoints count as a single call regardless of batch size.
- Token usage: LLM tokens consumed during ingestion (extraction, categorization) and retrieval (re-ranking, compaction). Input and output tokens are tracked separately.
- Storage: Total memories stored across all instances. Measured as a monthly peak.
What counts as an API call?
What counts as an API call?
- Memory ingestion (single and batch)
- Context fetch and compaction
- Configuration operations
- Dashboard queries
- Analytics queries
- Status checks
Troubleshooting
Why am I getting AuthenticationError?
Why am I getting AuthenticationError?
- Missing or malformed API key: Ensure the header is
Authorization: Bearer synap_...with theBearerprefix. - Revoked key: Check the Dashboard to verify the key is still active.
- Wrong instance: The API key may not have access to the instance you are targeting.
Why are my memories not being retrieved?
Why are my memories not being retrieved?
- Check ingestion status: Verify the ingestion completed successfully via
GET /v1/memories/{ingestion_id}/status. Memories are not retrievable until ingestion completes. - Check scope: Memories are scoped to the user/customer that was specified during ingestion. Context fetch only returns memories within the conversation’s scope chain.
- Check confidence threshold: Memories with confidence below the MACA threshold (default 0.7) are discarded during ingestion.
- Check memory types: If you are filtering by
typesin the fetch request, ensure the desired types are included. - Check context budget: If the budget is very small, only the highest-ranked memories may fit.
How do I debug context retrieval issues?
How do I debug context retrieval issues?
- Get the correlation ID: Note the
X-Correlation-Idfrom the fetch response. - Check analytics: Use
GET /v1/analytics/latency?operation=context_fetchto see if latency is abnormal. - Try different modes: Switch from
fasttoaccuratemode to see if graph traversal finds additional results. - Broaden the query: Try more general search queries or remove type filters.
- Check compaction: If the context was recently compacted, some memories may have been summarized away. Use
format: "full"to see both the narrative and structured extractions.