Configuring Memory Architecture

This guide covers every section of the MACA config in detail, with practical recipes for common use cases.

Understanding the Config Structure

A MACA configuration has three top-level sections: storage, ingestion, and retrieval. Here is a fully annotated example:

maca-config.yaml

# ============================================================
# MACA Configuration — Memory Architecture Configuration Artifact
# ============================================================
# Version: 1.0.0
# Instance: inst_a1b2c3d4e5f67890
# ============================================================

version: "1.0.0"

# --- Storage: Where and how memories are persisted ---
storage:
  vector:
    namespace: "prod-memories"          # Logical partition in the vector store
    embedding_dimension: 1536           # Must match your embedding model (e.g., 1536 for OpenAI text-embedding-3-small)
    enabled: true                       # Set false to disable vector storage entirely
  graph:
    namespace: "prod-knowledge"         # Logical partition in the graph store
    enabled: true                       # Set false to disable graph/entity storage
  scoping:
    primary_scope: "user"               # user | customer | instance
  retention:
    max_memory_age_days: 365            # Memories older than this are auto-purged (0 = no limit)

# --- Ingestion: What gets extracted from incoming content ---
ingestion:
  categories:
    facts: true                         # Factual statements about users/topics
    preferences: true                   # User likes, dislikes, choices
    temporal_events: true               # Time-bound events and plans
    relationships: true                 # Connections between entities
    procedures: false                   # Step-by-step instructions (off by default)
    emotions: false                     # Emotional states and sentiment
  extraction:
    mode: "standard"                    # standard | enhanced
    confidence_threshold: 0.7           # 0.0-1.0 — minimum confidence to persist a memory
  chunking:
    strategy: "semantic"                # semantic | fixed
    max_chunk_tokens: 512               # Maximum tokens per chunk
  pii:
    handling: "redact"                  # redact | mask | passthrough
    categories: ["email", "phone", "ssn", "credit_card"]
  agent_hints:
    - "Focus on extracting product preferences and purchase history"
    - "Treat project deadlines as high-priority temporal events"

# --- Retrieval: How memories are searched and ranked ---
retrieval:
  modes:
    fast: true                          # ~50-100ms, vector-only search
    accurate: true                      # ~200-500ms, vector + graph + re-ranking
  ranking:
    recency_weight: 0.3                 # 0.0-1.0 — how much to favor recent memories
    relevance_weight: 0.5               # 0.0-1.0 — how much to favor semantic match
    confidence_weight: 0.2              # 0.0-1.0 — how much to favor high-confidence memories
  anticipation:
    enabled: false                      # Predictive pre-fetch for common query patterns
    cache_ttl_seconds: 300              # How long anticipated results stay cached
  context_budget:
    max_tokens: 4096                    # Maximum tokens returned in a single retrieval
  agent_hints:
    - "Prioritize the user's most recent preferences over older ones"
    - "Include relationship context when entities are mentioned"

The version field is required. Synap uses it to track config history and enable rollback. Always increment the version when making changes.
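If you bump versions by hand, a small helper can keep you from accidentally reusing or lowering a number. The sketch below assumes the version string follows MAJOR.MINOR.PATCH (semantic versioning), which matches the "1.0.0" format used throughout this guide. The guide does not say how Synap compares versions, and `bump_version` and `check_version_increased` are hypothetical helpers, not part of any Synap SDK.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a "MAJOR.MINOR.PATCH" string into a tuple that compares numerically."""
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def bump_version(version: str, part: str = "patch") -> str:
    """Return the next version, resetting the lower-order parts to zero."""
    major, minor, patch = parse_version(version)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def check_version_increased(old: str, new: str) -> None:
    """Raise if the new version does not sort strictly after the old one."""
    # Tuple comparison is numeric per field, so 1.0.10 > 1.0.9 (a string compare would get this wrong).
    if parse_version(new) <= parse_version(old):
        raise ValueError(f"version {new} must be greater than {old}")


print(bump_version("1.0.0"))  # 1.0.1
```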

Storage Configuration

The storage section controls where memories are persisted and how they are organized.

Vector Store

The vector store powers semantic search — finding memories that are conceptually similar to a query, even if they do not share exact keywords.

storage:
  vector:
    namespace: "prod-memories"
    embedding_dimension: 1536
    enabled: true

Parameter	Description	Default
`namespace`	Logical partition name. Use different namespaces for different environments (e.g., `dev-memories`, `staging-memories`, `prod-memories`).	`"default"`
`embedding_dimension`	Must match your embedding model. Common values: `1536` (OpenAI text-embedding-ada-002 and text-embedding-3-small), `1024` (Cohere embed v3 models), `768` (BERT-base-sized models such as all-mpnet-base-v2), `384` (MiniLM).	`1536`
`enabled`	Set to `false` to disable vector storage entirely. Retrieval will rely on graph-only queries.	`true`

Changing embedding_dimension on an existing instance does not re-embed existing memories. If you switch embedding models, you will need to re-index. See the Migration Guide for details.
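Because `embedding_dimension` must equal the length of the vectors your model produces, it can help to check a few sample embeddings before you change the setting or switch models. A minimal sketch, assuming you already have vectors from your embedding model as lists of floats; `check_embedding_dimension` is a hypothetical helper and does not call Synap.

```python
from typing import Sequence


def check_embedding_dimension(vectors: Sequence[Sequence[float]], expected_dim: int) -> None:
    """Raise ValueError if any vector's length differs from the configured embedding_dimension."""
    for index, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector {index} has dimension {len(vector)}, "
                f"but embedding_dimension is {expected_dim}; "
                "update the config (and re-index) or switch models"
            )


# Example: a 384-dimensional MiniLM-style vector checked against a 1536 config.
sample = [[0.0] * 384]
try:
    check_embedding_dimension(sample, 1536)
except ValueError as err:
    print(err)
```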

Graph Store

The graph store maintains a knowledge graph of entities and their relationships. It powers entity resolution, relationship queries, and structured lookups.

storage:
  graph:
    namespace: "prod-knowledge"
    enabled: true

Parameter	Description	Default
`namespace`	Logical partition for the graph store.	`"default"`
`enabled`	Set to `false` to disable graph storage. Entity resolution and relationship extraction will be skipped.	`true`

For most applications, keep both vector and graph stores enabled. Combining semantic search (vector) with structured queries (graph) typically produces better retrieval results than either store alone.

Scoping

Scoping determines the default isolation boundary for memories. This controls how memories are organized and who can access them.

storage:
  scoping:
    primary_scope: "user"

Value	Description	Best For
`"user"`	Each user gets an isolated memory space. The most common setting.	Multi-user applications, chatbots, personal assistants
`"customer"`	Memories are shared across all users within a customer (organization).	Team-based tools, enterprise apps with shared context
`"instance"`	All memories are shared across the entire instance.	Single-user agents, knowledge bases, internal tools

See the Multi-User Scoping Guide for a detailed explanation of scope hierarchy and isolation patterns.
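One way to reason about the three scopes is to ask which identifiers two requests must share in order to see the same memories. The sketch below is a conceptual model only. It assumes users nest under customers, which nest under an instance, and it does not describe how Synap partitions data internally. `Caller` and `scope_key` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    instance_id: str
    customer_id: str
    user_id: str


def scope_key(primary_scope: str, caller: Caller) -> tuple[str, ...]:
    """Return the partition key a caller's memories fall under.

    Conceptual model: two callers share memories if and only if their keys are equal.
    """
    if primary_scope == "instance":
        return (caller.instance_id,)
    if primary_scope == "customer":
        return (caller.instance_id, caller.customer_id)
    if primary_scope == "user":
        return (caller.instance_id, caller.customer_id, caller.user_id)
    raise ValueError(f"unknown primary_scope: {primary_scope!r}")


alice = Caller("inst_a1b2c3d4e5f67890", "acme", "alice")
bob = Caller("inst_a1b2c3d4e5f67890", "acme", "bob")
print(scope_key("user", alice) == scope_key("user", bob))          # False: isolated
print(scope_key("customer", alice) == scope_key("customer", bob))  # True: shared
```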

Retention

Retention controls how long memories are kept before automatic cleanup.

storage:
  retention:
    max_memory_age_days: 365

Parameter	Description	Default
`max_memory_age_days`	Memories older than this threshold are automatically purged during the nightly cleanup cycle. Set to `0` to disable automatic purging.	`0` (no limit)

Consider your use case carefully when setting retention:

Customer support: 90-180 days is usually sufficient. Older tickets lose relevance.
Personal assistant: 365+ days. Users expect long-term memory.
Compliance-sensitive: Match your data retention policy. Consult your legal team.
High-volume analytics: 30-90 days to control storage costs.
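The retention rule can be written as a small predicate. This sketch assumes that age is measured from the memory's creation time and that a memory exactly `max_memory_age_days` old is kept. The guide specifies neither detail, and `is_expired` and `memories_to_purge` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone


def is_expired(created_at: datetime, now: datetime, max_memory_age_days: int) -> bool:
    """True if the memory is older than the retention window; 0 disables purging."""
    if max_memory_age_days == 0:
        return False
    return now - created_at > timedelta(days=max_memory_age_days)


def memories_to_purge(
    created_at_by_id: dict[str, datetime], now: datetime, max_memory_age_days: int
) -> list[str]:
    """IDs that a cleanup pass would remove under this rule."""
    return [
        memory_id
        for memory_id, created_at in created_at_by_id.items()
        if is_expired(created_at, now, max_memory_age_days)
    ]


# Example: with a 180-day window (customer support), a 200-day-old memory is purged.
now = datetime(2025, 6, 1, tzinfo=timezone.utc)
ages = {"mem_a": now - timedelta(days=200), "mem_b": now - timedelta(days=30)}
print(memories_to_purge(ages, now, 180))  # ['mem_a']
```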

Ingestion Configuration

The ingestion section controls what the pipeline extracts from incoming documents and how content is processed.

Categories

Categories control which types of memory the extraction pipeline looks for. Each category can be enabled or disabled independently.

ingestion:
  categories:
    facts: true
    preferences: true
    temporal_events: true
    relationships: true
    procedures: false
    emotions: false

Category	What It Extracts	Example
`facts`	Factual statements about users, topics, or the world	“User works at Acme Corp as a software engineer”
`preferences`	Likes, dislikes, choices, and stated preferences	“User prefers dark mode and concise responses”
`temporal_events`	Time-bound events, deadlines, plans, and schedules	“User has a dentist appointment next Tuesday”
`relationships`	Connections between entities (people, organizations, concepts)	“Alice manages the engineering team at Acme”
`procedures`	Step-by-step instructions and workflows	“To deploy, run tests first, then build, then push”
`emotions`	Emotional states, sentiment, and tone	“User expressed frustration about the billing issue”

Extraction

Extraction settings control the quality and aggressiveness of the extraction pipeline.

ingestion:
  extraction:
    mode: "standard"
    confidence_threshold: 0.7

Parameter	Description	Default
`mode`	`"standard"` — balanced speed and accuracy. `"enhanced"` — deeper extraction with multi-pass analysis (slower, higher quality).	`"standard"`
`confidence_threshold`	Minimum confidence score (0.0-1.0) required to persist an extracted memory. Higher values mean fewer but more reliable memories.	`0.7`

Tuning the confidence threshold:

Threshold	Effect	Use When
`0.5`	More permissive — captures more memories, including lower-confidence ones	Recall is more important than precision (e.g., research, brainstorming)
`0.7`	Balanced default — good tradeoff between coverage and quality	Most applications
`0.9`	Very strict — only high-confidence memories are stored	Precision is critical (e.g., medical, legal, financial)
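The effect of `confidence_threshold` amounts to a filter on extraction scores. A minimal sketch, assuming each extracted memory carries a 0.0-1.0 confidence and that a score equal to the threshold is kept (the guide calls it a "minimum" score). `ExtractedMemory` and `filter_by_confidence` are illustrative names, not Synap types.

```python
from dataclasses import dataclass


@dataclass
class ExtractedMemory:
    text: str
    category: str
    confidence: float  # 0.0-1.0, as scored by the extraction pipeline


def filter_by_confidence(
    memories: list[ExtractedMemory], threshold: float
) -> list[ExtractedMemory]:
    """Keep only memories whose confidence meets or exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0.0 and 1.0")
    return [m for m in memories if m.confidence >= threshold]


extracted = [
    ExtractedMemory("User might prefer dark mode", "preferences", 0.55),
    ExtractedMemory("User works at Acme Corp", "facts", 0.72),
    ExtractedMemory("User's order ID is 12345", "facts", 0.95),
]
for threshold in (0.5, 0.7, 0.9):
    print(threshold, len(filter_by_confidence(extracted, threshold)))  # 3, 2, 1
```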

Chunking

Chunking determines how documents are split before embedding and storage.

ingestion:
  chunking:
    strategy: "semantic"
    max_chunk_tokens: 512

Strategy	Description	Best For
`"semantic"`	Splits at natural boundaries (paragraphs, topic shifts). Produces higher-quality embeddings.	Conversations, articles, documents
`"fixed"`	Splits at fixed token intervals. Faster but may cut across ideas.	Structured data, logs, uniform-format content
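To see the difference between the two strategies, here is a simplified sketch. It makes two assumptions: whitespace-separated words stand in for model tokens, and blank-line paragraph breaks stand in for the "natural boundaries" that semantic chunking detects (real topic-shift detection is more involved). Neither function reflects Synap's actual chunker.

```python
def fixed_chunks(text: str, max_chunk_tokens: int) -> list[str]:
    """Cut text into consecutive windows of at most max_chunk_tokens words."""
    words = text.split()
    return [
        " ".join(words[i : i + max_chunk_tokens])
        for i in range(0, len(words), max_chunk_tokens)
    ]


def paragraph_chunks(text: str, max_chunk_tokens: int) -> list[str]:
    """Pack whole paragraphs into chunks, never splitting a paragraph that fits."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())
        if n > max_chunk_tokens:
            # A single oversized paragraph: flush what we have, then fall back to fixed splits.
            if current:
                chunks.append("\n\n".join(current))
                current, current_len = [], 0
            chunks.extend(fixed_chunks(para, max_chunk_tokens))
            continue
        if current_len + n > max_chunk_tokens:
            # Adding this paragraph would overflow the chunk, so start a new one.
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

`fixed_chunks` cuts wherever the window fills, even mid-sentence, which is why the fixed strategy suits uniform content such as logs better than prose.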

PII Handling

Controls how Personally Identifiable Information is handled during ingestion.

ingestion:
  pii:
    handling: "redact"
    categories: ["email", "phone", "ssn", "credit_card"]

Handling Mode	Description
`"redact"`	PII is detected and permanently removed before storage. Original content cannot be recovered.
`"mask"`	PII is replaced with tokens (e.g., `[EMAIL]`, `[PHONE]`). The mapping is stored separately for authorized unmasking.
`"passthrough"`	No PII processing. Content is stored as-is. Use only when you handle PII externally.

If your application handles user data subject to GDPR, CCPA, or similar regulations, do not use "passthrough". Use "redact" or "mask" and ensure your retention policy complies with applicable data protection requirements.
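The difference between "redact" and "mask" comes down to whether a reversible mapping is kept. The sketch below uses deliberately simple regular expressions for two categories (email, phone) and a numbered token format such as [EMAIL_1]. Both are assumptions for illustration, not Synap's detector or token scheme, and these patterns are not suitable for real PII detection.

```python
import re

# Deliberately simple patterns for illustration only; real PII detection needs far more.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def _pattern(category: str) -> re.Pattern:
    if category not in PII_PATTERNS:
        raise ValueError(f"no pattern sketched for category {category!r}")
    return PII_PATTERNS[category]


def redact(text: str, categories: list[str]) -> str:
    """Irreversibly replace detected PII with a generic marker; nothing is kept to restore it."""
    for category in categories:
        text = _pattern(category).sub("[REDACTED]", text)
    return text


def mask(text: str, categories: list[str]) -> tuple[str, dict[str, str]]:
    """Replace PII with numbered tokens and return a token -> original mapping."""
    mapping: dict[str, str] = {}
    for category in categories:
        counter = 0

        def replace(match: re.Match) -> str:
            nonlocal counter
            counter += 1
            token = f"[{category.upper()}_{counter}]"
            mapping[token] = match.group(0)
            return token

        text = _pattern(category).sub(replace, text)
    return text, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore original values; in practice only authorized callers would hold the mapping."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```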

Agent Hints

Agent hints are natural-language instructions that guide the extraction pipeline. They help Synap understand domain-specific context that the general-purpose pipeline might miss.

ingestion:
  agent_hints:
    - "Focus on extracting product preferences and purchase history"
    - "Treat project deadlines as high-priority temporal events"
    - "When users mention team members, extract the reporting relationship"

Agent hints are powerful for domain-specific tuning. Write them as clear, specific instructions. Avoid vague hints like “extract everything important” — instead, name the specific types of information that matter for your use case.

Retrieval Configuration

The retrieval section controls how memories are searched, ranked, and delivered.

Modes

Synap supports two retrieval modes. You can enable one or both.

retrieval:
  modes:
    fast: true
    accurate: true

Mode	Latency	Method	Best For
`fast`	~50-100ms	Vector-only semantic search	Real-time chat, interactive UIs, latency-sensitive paths
`accurate`	~200-500ms	Vector + graph search with cross-encoder re-ranking	Complex queries, research, cases where precision outweighs speed

When both modes are enabled, choose the mode for each request with the mode parameter of context.fetch():

# Runs inside an async function; `sdk` is an initialized Synap SDK client.
async def load_context(sdk):
    # Fast mode for real-time chat
    chat_context = await sdk.conversation.context.fetch(
        conversation_id="conv_789",
        search_query=["user preferences"],
        mode="fast",
    )

    # Accurate mode for a detailed research query
    research_context = await sdk.conversation.context.fetch(
        conversation_id="conv_789",
        search_query=["all interactions with Acme Corp in Q3"],
        mode="accurate",
    )
    return chat_context, research_context

Ranking

Ranking weights control how retrieved memories are scored and ordered. All three weights should be between 0.0 and 1.0. They do not need to sum to 1.0 — they are normalized internally.

retrieval:
  ranking:
    recency_weight: 0.3
    relevance_weight: 0.5
    confidence_weight: 0.2

Signal	Description	Increase When
`recency_weight`	Favors recently created/updated memories	User context changes frequently, recent info is more valuable
`relevance_weight`	Favors memories semantically closest to the search query	Query accuracy matters most, user history is stable
`confidence_weight`	Favors memories with higher extraction confidence scores	Operating in high-stakes domains where accuracy is critical

A good starting point for most applications is relevance: 0.5, recency: 0.3, confidence: 0.2. This prioritizes relevance while giving a moderate boost to recent memories and a light preference for high-confidence extractions. Tune from there based on your retrieval quality observations.
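The guide states that the weights are normalized internally but does not document how each signal is computed. The sketch below assumes normalization by the sum of the weights, relevance and confidence already on a 0.0-1.0 scale, and an exponential recency decay with a hypothetical 30-day half-life. It shows how shifting weight between recency and relevance can reverse the order of two memories.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    relevance: float   # semantic similarity, assumed already scaled to 0.0-1.0
    confidence: float  # extraction confidence, 0.0-1.0
    age_days: float    # days since the memory was created or last updated


def recency_score(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def rank(
    candidates: list[Candidate],
    recency_weight: float,
    relevance_weight: float,
    confidence_weight: float,
    half_life_days: float = 30.0,
) -> list[Candidate]:
    """Order candidates by a weighted average of the three signals."""
    total = recency_weight + relevance_weight + confidence_weight
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    def score(c: Candidate) -> float:
        weighted = (
            recency_weight * recency_score(c.age_days, half_life_days)
            + relevance_weight * c.relevance
            + confidence_weight * c.confidence
        )
        return weighted / total  # normalization: weights need not sum to 1.0

    return sorted(candidates, key=score, reverse=True)


old = Candidate("prefers email updates", relevance=0.9, confidence=0.8, age_days=300)
fresh = Candidate("asked about SMS alerts", relevance=0.6, confidence=0.8, age_days=1)
print([c.text for c in rank([old, fresh], 0.1, 0.7, 0.2)])  # old memory first
print([c.text for c in rank([old, fresh], 0.6, 0.3, 0.1)])  # fresh memory first
```

With relevance-heavy weights (0.1 / 0.7 / 0.2), the old but highly relevant memory wins. With recency-heavy weights (0.6 / 0.3 / 0.1), the fresh but less relevant memory overtakes it.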

Anticipation

Anticipation enables predictive pre-fetching of context for common query patterns. When enabled, Synap analyzes retrieval patterns and pre-caches likely-needed context before the SDK requests it.

retrieval:
  anticipation:
    enabled: false
    cache_ttl_seconds: 300

Parameter	Description	Default
`enabled`	Turn predictive pre-fetch on or off.	`false`
`cache_ttl_seconds`	How long anticipated results stay in the pre-fetch cache.	`300` (5 minutes)

Anticipation adds a small amount of background processing and storage overhead. Enable it when you observe repetitive retrieval patterns (e.g., a support bot that frequently looks up the same customer context). For applications with highly diverse queries, the hit rate may be too low to justify the overhead.
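`cache_ttl_seconds` behaves like the time-to-live of any expiring cache, and the hit rate determines whether anticipation pays for its overhead. A minimal sketch of TTL semantics with hit-rate tracking follows. Synap's pre-fetch internals are not documented here, and `TTLCache` is a hypothetical class. The clock is injectable so that expiry can be tested without waiting.

```python
import time
from typing import Any, Callable, Hashable, Optional


class TTLCache:
    """Minimal time-to-live cache: each entry expires ttl seconds after it was stored."""

    def __init__(self, cache_ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = cache_ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._entries: dict = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def put(self, key: Hashable, value: Any) -> None:
        self._entries[key] = (self.clock() + self.ttl, value)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None or entry[0] <= self.clock():
            self._entries.pop(key, None)  # lazily evict expired entries
            self.misses += 1
            return None
        self.hits += 1
        return entry[1]

    def hit_rate(self) -> float:
        """Fraction of lookups served from the cache (0.0 if there were none)."""
        lookups = self.hits + self.misses
        return self.hits / lookups if lookups else 0.0
```

A low `hit_rate()` over a representative window of traffic is the signal the guide describes: queries are too diverse for pre-fetching to help.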

Context Budget

The context budget controls the maximum volume of content returned in a single retrieval call.

retrieval:
  context_budget:
    max_tokens: 4096

Parameter	Description	Default
`max_tokens`	Maximum number of tokens across all returned memories. Synap truncates and prioritizes to stay within budget.	`4096`

Align max_tokens with your LLM’s context window:

LLM Context Window	Recommended `max_tokens`	Reasoning
4K tokens	1024-1536	Leave room for system prompt + user message + response
8K tokens	2048-3072	Comfortable budget for most conversations
32K+ tokens	4096-8192	Generous context, but more is not always better

Setting max_tokens too high can flood your LLM prompt with marginally relevant context, reducing response quality. Start conservative and increase only if you observe retrieval misses.
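A simple way to picture staying within `max_tokens` is greedy packing in rank order. This sketch assumes memories arrive already ranked with known token counts, and it skips any memory that would overflow the remaining budget. The guide says Synap also truncates content, which this sketch does not model. `pack_context` is a hypothetical helper.

```python
def pack_context(ranked_memories: list[tuple[str, int]], max_tokens: int) -> list[str]:
    """Take (text, token_count) pairs in rank order while they fit within max_tokens."""
    selected: list[str] = []
    used = 0
    for text, token_count in ranked_memories:
        if used + token_count > max_tokens:
            continue  # too large for what is left; a smaller, lower-ranked memory may still fit
        selected.append(text)
        used += token_count
    return selected


ranked = [
    ("Open ticket #123: refund pending", 600),
    ("Prefers email over phone", 500),
    ("Plan: Pro", 300),
]
print(pack_context(ranked, 1024))  # first and third memories fit; the second would overflow
```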

Agent Hints (Retrieval)

Similar to ingestion hints, retrieval agent hints guide how Synap ranks and filters results.

retrieval:
  agent_hints:
    - "Prioritize the user's most recent preferences over older ones"
    - "Include relationship context when entities are mentioned"
    - "Deprioritize procedural memories unless the user asks 'how to'"

Common Configuration Recipes

These recipes are starting points for common application patterns. Adjust them based on your own retrieval quality and cost observations.

Customer Support Bot

High recall, user-scoped, all core categories enabled. Optimized for quickly retrieving a customer’s full context during a support interaction.

customer-support-maca.yaml

version: "1.0.0"

storage:
  vector:
    namespace: "support-memories"
    embedding_dimension: 1536
    enabled: true
  graph:
    namespace: "support-knowledge"
    enabled: true
  scoping:
    primary_scope: "user"
  retention:
    max_memory_age_days: 180

ingestion:
  categories:
    facts: true
    preferences: true
    temporal_events: true
    relationships: true
    procedures: false
    emotions: true            # Track frustration/satisfaction for escalation
  extraction:
    mode: "standard"
    confidence_threshold: 0.6  # Lower threshold for broader recall
  chunking:
    strategy: "semantic"
    max_chunk_tokens: 512
  pii:
    handling: "mask"
    categories: ["email", "phone", "ssn", "credit_card"]
  agent_hints:
    - "Extract product names, order IDs, and issue descriptions as facts"
    - "Track customer sentiment and frustration level"
    - "Note any escalation requests or supervisor mentions"

retrieval:
  modes:
    fast: true
    accurate: true
  ranking:
    recency_weight: 0.4       # Recent interactions matter most in support
    relevance_weight: 0.4
    confidence_weight: 0.2
  anticipation:
    enabled: true             # Support queries are repetitive
    cache_ttl_seconds: 600
  context_budget:
    max_tokens: 3072
  agent_hints:
    - "Always include the customer's most recent open issue"
    - "Include past resolution history for similar problems"

Personal Assistant

Preferences-heavy, user-scoped with long retention. Designed for applications where the AI builds a deep understanding of individual users over time.

personal-assistant-maca.yaml

version: "1.0.0"

storage:
  vector:
    namespace: "assistant-memories"
    embedding_dimension: 1536
    enabled: true
  graph:
    namespace: "assistant-knowledge"
    enabled: true
  scoping:
    primary_scope: "user"
  retention:
    max_memory_age_days: 0    # No limit -- long-term memory

ingestion:
  categories:
    facts: true
    preferences: true         # Core value for personalization
    temporal_events: true     # Calendars, deadlines, plans
    relationships: true       # Who the user knows
    procedures: false
    emotions: false
  extraction:
    mode: "enhanced"          # Deeper extraction for richer personalization
    confidence_threshold: 0.7
  chunking:
    strategy: "semantic"
    max_chunk_tokens: 512
  pii:
    handling: "redact"
    categories: ["ssn", "credit_card"]
  agent_hints:
    - "Extract dietary preferences, travel preferences, and hobbies"
    - "Track family members, colleagues, and friends as relationships"
    - "Note recurring commitments and routine schedules"

retrieval:
  modes:
    fast: true
    accurate: true
  ranking:
    recency_weight: 0.2       # Older preferences still matter
    relevance_weight: 0.6     # Relevance is king for personalization
    confidence_weight: 0.2
  anticipation:
    enabled: false
  context_budget:
    max_tokens: 4096
  agent_hints:
    - "Preferences should always outrank facts in personalization queries"
    - "Include temporal events only if they are upcoming (within 30 days)"

Knowledge Base Agent

Facts-focused, instance-scoped, large context budget. Designed for agents that answer questions from a shared knowledge base rather than tracking individual users.

knowledge-base-maca.yaml

version: "1.0.0"

storage:
  vector:
    namespace: "kb-memories"
    embedding_dimension: 1536
    enabled: true
  graph:
    namespace: "kb-knowledge"
    enabled: true
  scoping:
    primary_scope: "instance"   # Shared knowledge, not per-user
  retention:
    max_memory_age_days: 0      # Knowledge should persist indefinitely

ingestion:
  categories:
    facts: true                 # Primary category
    preferences: false
    temporal_events: false
    relationships: true         # Useful for "who owns what" queries
    procedures: true            # How-to documentation
    emotions: false
  extraction:
    mode: "enhanced"            # Thorough extraction for documentation
    confidence_threshold: 0.8   # High threshold -- quality over quantity
  chunking:
    strategy: "semantic"
    max_chunk_tokens: 1024      # Larger chunks for coherent passages
  pii:
    handling: "passthrough"     # KB content typically has no PII
    categories: []
  agent_hints:
    - "Extract definitions, technical specifications, and API details"
    - "Preserve code examples and configuration snippets as procedures"
    - "Link related concepts and components as relationships"

retrieval:
  modes:
    fast: false                 # Accuracy matters more than speed for KB
    accurate: true
  ranking:
    recency_weight: 0.1        # KB content is usually stable
    relevance_weight: 0.7      # Find the most relevant answer
    confidence_weight: 0.2
  anticipation:
    enabled: false
  context_budget:
    max_tokens: 8192           # Large budget for comprehensive answers
  agent_hints:
    - "Include procedural steps when the query asks 'how to'"
    - "Include related concept definitions for technical queries"

High-Volume Analytics

Minimal extraction, fast mode only, strict retention. Designed for applications that process high volumes of data and need to control costs.

high-volume-maca.yaml

version: "1.0.0"

storage:
  vector:
    namespace: "analytics-memories"
    embedding_dimension: 384    # Smaller model for cost efficiency
    enabled: true
  graph:
    namespace: "analytics-knowledge"
    enabled: false              # Skip graph to reduce processing
  scoping:
    primary_scope: "customer"
  retention:
    max_memory_age_days: 30     # Aggressive retention limit

ingestion:
  categories:
    facts: true
    preferences: false
    temporal_events: false
    relationships: false
    procedures: false
    emotions: false
  extraction:
    mode: "standard"            # Faster processing
    confidence_threshold: 0.8   # Only high-confidence extractions
  chunking:
    strategy: "fixed"           # Faster than semantic chunking
    max_chunk_tokens: 256       # Small chunks for quick processing
  pii:
    handling: "redact"
    categories: ["email", "phone", "ssn", "credit_card"]
  agent_hints:
    - "Extract only key metrics and quantitative facts"

retrieval:
  modes:
    fast: true
    accurate: false             # Fast mode only
  ranking:
    recency_weight: 0.6        # Most recent data is most relevant
    relevance_weight: 0.3
    confidence_weight: 0.1
  anticipation:
    enabled: true
    cache_ttl_seconds: 120
  context_budget:
    max_tokens: 2048
  agent_hints:
    - "Prioritize quantitative facts and metrics"

Applying Changes Safely

Configuration changes can have significant impact on how your instance processes and retrieves memories. Always follow this workflow:

Write your config

Create or modify your MACA YAML file locally. Validate the YAML syntax before proceeding.

Dry run

Open Dashboard → Instance → Memory Configuration, paste the new YAML into the editor, and click Dry run. The validator returns:

Schema and business-rule errors (must fix before applying)
Warnings (recommended fixes; Apply still accepts a config that has warnings)
A diff showing exactly what will change

Review the diff

Examine the diff carefully. Pay particular attention to:

Categories being disabled (extraction will stop for that type)
Scope changes (may affect retrieval behavior)
Retention changes (may trigger cleanup of older memories)
Embedding dimension changes (requires re-indexing)

Apply the config

Once satisfied, click Apply. The new version moves through the configured approval workflow (pending → approved → active) and the dashboard records who applied which version with a timestamp.

Monitor

After applying, monitor your instance in the Dashboard for any unexpected behavior. Check:

Ingestion success rate
Memory extraction counts by category
Retrieval latency
Error rates

Configuration changes affect new requests only. Existing memories are not re-processed, re-extracted, or re-embedded when you change the config. If you need to re-process existing content (e.g., after adding a new category), you must re-ingest the source documents.
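Some of the checks from the write, dry-run, and review steps can run locally before you open the Dashboard. The sketch below assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). It covers only a few of the rules described in this guide and is not Synap's validator, so the Dry run remains the authoritative check. `validate` and `risky_changes` are hypothetical helpers. The second one flags the four kinds of change the review step calls out.

```python
from typing import Any

RANKING_KEYS = ("recency_weight", "relevance_weight", "confidence_weight")


def validate(config: dict[str, Any]) -> list[str]:
    """Return problems found in a parsed MACA config (an empty list means none were found)."""
    errors = []
    if "version" not in config:
        errors.append("version is required")
    scope = config.get("storage", {}).get("scoping", {}).get("primary_scope")
    if scope not in (None, "user", "customer", "instance"):
        errors.append(f"unknown primary_scope: {scope!r}")
    threshold = config.get("ingestion", {}).get("extraction", {}).get("confidence_threshold")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        errors.append("confidence_threshold must be between 0.0 and 1.0")
    ranking = config.get("retrieval", {}).get("ranking", {})
    for key in RANKING_KEYS:
        weight = ranking.get(key)
        if weight is not None and not 0.0 <= weight <= 1.0:
            errors.append(f"{key} must be between 0.0 and 1.0")
    modes = config.get("retrieval", {}).get("modes", {})
    if modes and not any(modes.values()):
        errors.append("at least one retrieval mode should be enabled")
    return errors


def flatten(d: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Turn nested dicts into {"a.b.c": value} pairs for easy comparison."""
    flat: dict[str, Any] = {}
    for key, value in d.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = value
    return flat


def risky_changes(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    """Flag disabled categories and scope, retention, or embedding dimension changes."""
    before, after = flatten(old), flatten(new)
    notes = []
    for path in sorted(before.keys() | after.keys()):
        a, b = before.get(path), after.get(path)
        if a == b:
            continue
        if path.startswith("ingestion.categories.") and a is True and b is not True:
            notes.append(f"{path}: extraction will stop for this category")
        elif path == "storage.scoping.primary_scope":
            notes.append(f"{path}: {a!r} -> {b!r} may change retrieval behavior")
        elif path == "storage.retention.max_memory_age_days":
            notes.append(f"{path}: {a!r} -> {b!r} may trigger cleanup of older memories")
        elif path == "storage.vector.embedding_dimension":
            notes.append(f"{path}: {a!r} -> {b!r} requires re-indexing")
    return notes
```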

Rolling back

Open Dashboard → Instance → Memory Configuration → Version History to see every config version with applied_at, status, and applied_by. From the same view you can diff any two versions side-by-side and click Rollback on a previous version to re-apply it as a new version, preserving the audit trail.
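The rollback model described above, where an earlier version is re-applied as a new version rather than overwriting history, can be sketched as an append-only list. This is an illustration under stated assumptions: versions are numbered sequentially, the approval workflow (pending, approved, active) is omitted, and `VersionHistory` is a hypothetical class, not a Synap API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ConfigVersion:
    number: int                             # sequential position in the history
    yaml_text: str                          # full config content for this version
    applied_by: str
    applied_at: datetime
    rolled_back_from: Optional[int] = None  # set when this version re-applies an older one


@dataclass
class VersionHistory:
    versions: list[ConfigVersion] = field(default_factory=list)

    def apply(
        self, yaml_text: str, applied_by: str, rolled_back_from: Optional[int] = None
    ) -> ConfigVersion:
        """Append a new version; the most recent version is treated as active."""
        version = ConfigVersion(
            number=len(self.versions) + 1,
            yaml_text=yaml_text,
            applied_by=applied_by,
            applied_at=datetime.now(timezone.utc),
            rolled_back_from=rolled_back_from,
        )
        self.versions.append(version)
        return version

    def rollback(self, target: int, applied_by: str) -> ConfigVersion:
        """Re-apply an earlier version's content as a new version; nothing is deleted."""
        for version in self.versions:
            if version.number == target:
                return self.apply(version.yaml_text, applied_by, rolled_back_from=target)
        raise ValueError(f"no version {target} in history")

    @property
    def active(self) -> ConfigVersion:
        return self.versions[-1]
```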

Document your rollback plan before applying any config change, especially in production. Know which version you will roll back to and what the impact will be.

Programmatic MACA management (validate, apply, list versions, rollback) is on the roadmap. Email support@maximem.ai if you need to script config changes across many instances today.

Next Steps

Multi-User Scoping

Learn how to configure memory isolation for multi-tenant applications.

Memory Architecture Concepts

Understand the underlying architecture that MACA configs control.

Entities and Resolution

Learn how entity resolution works with your graph store configuration.

Production Checklist

Ensure your configuration is production-ready.
