LlamaIndex - Maximem Synap

Give LlamaIndex chat engines and RAG pipelines persistent memory backed by Synap. Conversations survive restarts, and Synap-stored memories sit alongside your document retrieval as first-class context.

Overview

This guide shows how to add Synap to a LlamaIndex application to build pipelines that:

Maintain chat history across sessions and processes
Retrieve user-scoped memories alongside document chunks in a RAG flow
Fuse memory-based and document-based retrieval into a single ranked result set

The Synap LlamaIndex integration ships two drop-in components — each one implements a native LlamaIndex interface, so you can use it anywhere a vanilla LlamaIndex memory or retriever is accepted.

Component	LlamaIndex interface	Purpose
`SynapChatMemory`	`BaseMemory`	Persistent chat history per `conversation_id`
`SynapRetriever`	`BaseRetriever`	Returns `NodeWithScore` objects for RAG pipelines

Setup

Install the package alongside LlamaIndex:

pip install maximem-synap-llamaindex llama-index llama-index-llms-openai

Configure your API key. Generate one from the Synap Dashboard.

.env

SYNAP_API_KEY=synap_your_key_here
OPENAI_API_KEY=your-openai-api-key

Initialize the SDK once at application startup:

from maximem_synap import MaximemSynapSDK

sdk = MaximemSynapSDK()
await sdk.initialize()

See SDK Initialization for the full lifecycle and configuration options.
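The snippet above uses top-level `await`, which works in a notebook or async REPL but not in a plain script. A minimal sketch of a script entry point follows, assuming only that the SDK exposes an async `initialize()` as shown. `_StubSDK` is a hypothetical stand-in so the example runs without the package installed; in your application, pass a real `MaximemSynapSDK()` instead.

```python
import asyncio


class _StubSDK:
    """Hypothetical stand-in for MaximemSynapSDK, used only to make this sketch runnable."""

    def __init__(self) -> None:
        self.initialized = False

    async def initialize(self) -> None:
        self.initialized = True


async def main(sdk) -> str:
    # Initialize once, before any memory or retriever is constructed.
    await sdk.initialize()
    # ... build SynapChatMemory / SynapRetriever here and serve requests ...
    return "ready"


stub_sdk = _StubSDK()
status = asyncio.run(main(stub_sdk))
```

Keeping all Synap calls inside one `asyncio.run(...)` entry point means the SDK is initialized on the same event loop that later uses it.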

Basic integration

The smallest useful integration plugs SynapChatMemory into any LlamaIndex chat engine. Past turns are loaded automatically on each call, and new turns are persisted on the way out:

from llama_index.core.chat_engine import CondensePlusContextChatEngine
from synap_llamaindex import SynapChatMemory

memory = SynapChatMemory(
    sdk=sdk,
    conversation_id="conv-001",
    user_id="alice",
    customer_id="acme",   # optional — required for B2B instances
)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=your_doc_retriever,   # placeholder: any existing LlamaIndex retriever
    memory=memory,
)

response = await chat_engine.achat("What were my action items from last week?")

SynapChatMemory loads prior messages on get() and writes new turns back to Synap on put(). A failed read returns an empty buffer and logs an error; a failed write raises, so callers know that persistence failed. To make user-specific memories retrievable inside the chat engine (alongside or in place of documents), add SynapRetriever, described under Core concepts.

Core concepts

Persistent chat memory

SynapChatMemory implements BaseMemory. Every LlamaIndex chat engine accepts a memory object — drop this one in to make the conversation durable:

from synap_llamaindex import SynapChatMemory

memory = SynapChatMemory(
    sdk=sdk,
    conversation_id="conv-001",
    user_id="alice",
    customer_id="acme",
)

Each conversation_id maps one-to-one to a Synap conversation. Restart your process, instantiate SynapChatMemory again with the same conversation_id, and the chat engine resumes with the prior history.
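Resuming after a restart only works if your application can reproduce the same conversation_id. One way to do that is to derive it deterministically from values you already have. The sketch below is an illustration, not part of the Synap integration: the helper name, the namespace string, and the `thread_key` parameter are all hypothetical, and it assumes Synap accepts arbitrary string IDs like the `"conv-001"` used in this guide.

```python
import uuid

# Hypothetical application namespace; any fixed UUID works as long as it never changes.
_APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "support.example.com")


def stable_conversation_id(user_id: str, thread_key: str) -> str:
    """Return the same conversation ID for the same user and thread on every run."""
    return f"conv-{uuid.uuid5(_APP_NAMESPACE, f'{user_id}:{thread_key}')}"


first_run = stable_conversation_id("alice", "billing-ticket-17")
after_restart = stable_conversation_id("alice", "billing-ticket-17")
other_thread = stable_conversation_id("alice", "billing-ticket-18")
```

Because `uuid5` is a pure function of its inputs, no ID needs to be stored locally for the conversation to resume.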

Semantic retrieval

SynapRetriever implements BaseRetriever and returns NodeWithScore objects — the same shape every LlamaIndex RAG component expects. Use it as the retriever of a RetrieverQueryEngine, or as a sub-retriever inside a RouterRetriever / QueryFusionRetriever:

from synap_llamaindex import SynapRetriever

retriever = SynapRetriever(
    sdk=sdk,
    user_id="alice",
    customer_id="acme",
    max_results=6,
    mode="accurate",   # "fast" (50-100ms) or "accurate" (graph + rerank)
)

nodes = await retriever.aretrieve("What are the user's project preferences?")
# node.text = memory text
# node.score = relevance score

The two retrieval modes trade latency against comprehensiveness:

	`fast`	`accurate`
Latency	50-100ms	200-500ms
Search	Vector similarity	Vector + graph + re-ranking
Best for	Real-time chat	Multi-entity queries

See Context Fetch for the full retrieval contract.
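If you want to choose a mode programmatically, a small helper can encode the trade-off in the table. This is a sketch under stated assumptions: the function name and the `multi_entity` flag are hypothetical, and the 500 ms threshold is simply the upper bound quoted above for `accurate`.

```python
ACCURATE_WORST_CASE_MS = 500  # upper latency bound quoted for "accurate" in the table


def choose_mode(latency_budget_ms: int, multi_entity: bool = False) -> str:
    """Pick "accurate" only when the query needs it and the budget can absorb it."""
    if multi_entity and latency_budget_ms >= ACCURATE_WORST_CASE_MS:
        return "accurate"
    return "fast"


chat_turn_mode = choose_mode(latency_budget_ms=150)
report_mode = choose_mode(latency_budget_ms=800, multi_entity=True)
```

The result can be passed as the `mode` argument when constructing SynapRetriever.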

Complete example: support assistant with memory + docs

The following pipeline gives a chat engine both Synap-backed conversation memory and a fused retriever that blends user-specific memories with document chunks:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.llms.openai import OpenAI
from synap_llamaindex import SynapChatMemory, SynapRetriever


class SupportAssistant:
    def __init__(self, sdk, user_id: str, customer_id: str | None = None):
        self.sdk = sdk
        self.user_id = user_id
        self.customer_id = customer_id

        # Document retriever — your existing RAG corpus
        docs = SimpleDirectoryReader("./knowledge_base").load_data()
        doc_index = VectorStoreIndex.from_documents(docs)
        doc_retriever = doc_index.as_retriever(similarity_top_k=4)

        # Synap retriever — user-scoped memories
        memory_retriever = SynapRetriever(
            sdk=sdk,
            user_id=user_id,
            customer_id=customer_id,
            max_results=4,
            mode="fast",
        )

        # Fuse the two so a single retrieve call returns ranked results from both.
        # num_queries=1 turns off LLM query generation, so only the original query runs.
        self.retriever = QueryFusionRetriever(
            retrievers=[doc_retriever, memory_retriever],
            similarity_top_k=6,
            num_queries=1,
        )

        self.llm = OpenAI(model="gpt-4o")

    async def ask(self, conversation_id: str, message: str) -> str:
        memory = SynapChatMemory(
            sdk=self.sdk,
            conversation_id=conversation_id,
            user_id=self.user_id,
            customer_id=self.customer_id,
        )

        chat_engine = CondensePlusContextChatEngine.from_defaults(
            retriever=self.retriever,
            memory=memory,
            llm=self.llm,
        )

        response = await chat_engine.achat(message)
        return str(response)


# Usage
assistant = SupportAssistant(sdk, user_id="alice", customer_id="acme")
reply = await assistant.ask("conv-001", "Has my refund been processed?")

Three things to notice in this pattern:

SynapChatMemory is constructed per-conversation so multiple sessions can run side-by-side without interfering.
SynapRetriever is fused with the document retriever via QueryFusionRetriever, so user-specific facts and corpus documents come back as one ranked list.
Memory and retrieval are independent — drop either and the pipeline still works; together they cover both the “what did we say” and “what do I know about this user” axes.
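To build intuition for what "one ranked list" means, here is a standalone sketch of reciprocal rank fusion, one common way to merge ranked lists from different sources. It illustrates the idea only and is not LlamaIndex's implementation; QueryFusionRetriever does its own fusion internally, and its reference documentation lists the strategies it supports. The IDs below are made up.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    ranked_lists: Sequence[Sequence[str]], k: int = 60, top_k: int = 6
) -> List[str]:
    """Merge ranked ID lists; each item scores 1 / (k + rank) per list it appears in."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, item_id in enumerate(ranked, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for a stable order.
    ordered = sorted(scores, key=lambda item_id: (-scores[item_id], item_id))
    return ordered[:top_k]


# Made-up IDs standing in for document chunks and Synap memories.
doc_hits = ["doc-refund-policy", "doc-shipping"]
memory_hits = ["mem-refund-requested", "mem-prefers-email"]
fused = reciprocal_rank_fusion([doc_hits, memory_hits])
```

With no overlap between the two sources, rank-based fusion interleaves them by position, so the top memory and the top document both land near the head of the list regardless of how each source scales its scores.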

Advanced patterns

Multi-tenant scoping

Both components accept the same scoping triple — user_id (required), optional customer_id, optional conversation_id. customer_id is required on B2B Synap instances and ignored on single-tenant ones. See Memory Scopes.

retriever = SynapRetriever(
    sdk=sdk,
    user_id="alice",
    customer_id="acme",   # scope retrievals to acme's tenancy
)

Tuning retrieval mode per query

SynapRetriever takes a default mode at construction, but you can swap it temporarily for a single high-recall lookup:

retriever.mode = "accurate"
nodes = await retriever.aretrieve("Summarize everything about the Acme account.")
retriever.mode = "fast"
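If the retrieval call raises, the three-line swap above leaves the retriever stuck in the temporary mode. A small context manager restores the previous value even on error. This is a sketch that assumes only what the snippet above shows, namely that `mode` is a plain writable attribute; the helper name is hypothetical, and a `SimpleNamespace` stands in for the retriever so the example runs on its own.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def retrieval_mode(retriever, mode: str):
    """Temporarily set retriever.mode, restoring the previous value on exit."""
    previous = retriever.mode
    retriever.mode = mode
    try:
        yield retriever
    finally:
        retriever.mode = previous


# Stand-in object with a `mode` attribute, in place of a real SynapRetriever.
demo_retriever = SimpleNamespace(mode="fast")

with retrieval_mode(demo_retriever, "accurate"):
    mode_inside = demo_retriever.mode  # a real call would await aretrieve(...) here

mode_after = demo_retriever.mode
```

Mutating `mode` affects every caller that shares the instance, so if one retriever serves concurrent requests, constructing a second retriever per mode may be the safer choice.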

Composing with other retrievers

SynapRetriever is a regular BaseRetriever, so it composes cleanly with LlamaIndex’s RouterRetriever, QueryFusionRetriever, or any custom retriever you build. Combine it with a document retriever (as in the example above), or route between Synap memories and a vector store based on the query.

Failure semantics

The integration follows the Synap-wide contract:

Retrieval failures degrade gracefully — SynapRetriever.aretrieve returns [] and logs an error
Memory reads degrade gracefully — SynapChatMemory.get returns an empty buffer and logs an error
Memory writes surface failures — SynapChatMemory.put raises SynapIntegrationError so callers know persistence failed

This is by design: read failures should never break a user-facing turn, while write failures must be visible to callers.
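In practice this means only the write path needs explicit handling. The sketch below shows one way to wrap a chat call so a persistence failure is logged and reported instead of crashing the request. It assumes the error propagates unchanged out of the chat call. The `SynapIntegrationError` class defined here is a local stand-in (this guide does not state its import path), and the wrapper name is hypothetical.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, Tuple

logger = logging.getLogger("support_assistant")


class SynapIntegrationError(Exception):
    """Local stand-in; import the integration's real error type in your application."""


async def chat_with_persistence_flag(
    achat: Callable[[str], Awaitable[object]], message: str
) -> Tuple[Optional[str], bool]:
    """Return (reply, persisted). `achat` is any async callable, e.g. chat_engine.achat."""
    try:
        reply = await achat(message)
        return str(reply), True
    except SynapIntegrationError:
        # The write failed, so this turn is not in Synap; let the caller decide what to do.
        logger.exception("Synap write failed; this turn was not persisted")
        return None, False


async def _failing_achat(message: str) -> str:
    # Simulates a chat call whose memory write fails.
    raise SynapIntegrationError("simulated write failure")


result = asyncio.run(chat_with_persistence_flag(_failing_achat, "hello"))
```

A caller can use the `persisted` flag to retry the turn or to tell the user that the conversation may not be saved.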

Next steps

LangChain

Memory, retriever, and tools for LangChain chains and agents.

Haystack

Retriever and memory-writer pipeline components for Haystack.

Context Fetch

The retrieval API that powers SynapRetriever — modes, scopes, and response shapes.

Memory Scopes

How user_id, customer_id, and conversation_id interact across retrievals.
