Automatic User Information Extraction
MUXI includes a powerful automatic user information extraction system that can identify and store important information about users from conversations. This guide explains how the extraction system works and how to configure it in your MUXI applications.
User information extraction is designed with privacy in mind. All extraction is opt-in, anonymous users are excluded by default, and safeguards are in place to protect sensitive information.
Overview
The automatic user information extraction system analyzes conversations between users and agents to identify key facts, preferences, and characteristics. It then stores this information in the user’s context memory, making it available for future interactions.
Key Features
- Zero-effort personalization: Remembers important user information without explicit programming
- Importance scoring: Prioritizes information based on relevance and significance
- Confidence assessment: Evaluates certainty before storing information
- Conflict resolution: Handles updates when new information contradicts existing data
- Privacy controls: Protects user privacy with opt-out mechanisms and anonymous user exclusion
Architecture
The extraction system follows a centralized architecture:
- MemoryExtractor: Core class that analyzes conversations and extracts information
- Orchestrator: Manages the extraction process and coordinates with agents
- Agent: Delegates extraction to its parent Orchestrator
When a user interacts with an agent, the conversation is processed asynchronously to extract important information without impacting response times.
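As a rough illustration of this pattern, the sketch below shows an agent responding first and extraction being scheduled in the background. The class and method names (ExtractionSketch, agent.respond, extractor.extract) are hypothetical, not MUXI internals; the sketch only demonstrates the non-blocking flow described above.
import asyncio

# Hypothetical sketch of the non-blocking extraction pattern described above.
# ExtractionSketch, agent.respond() and extractor.extract() are illustrative
# names, not MUXI internals.
class ExtractionSketch:
    def __init__(self, extractor):
        self.extractor = extractor

    async def handle_message(self, agent, user_id, message):
        # The agent answers first, so response latency is unaffected.
        response = await agent.respond(message)

        # Extraction is scheduled as a background task and runs asynchronously.
        asyncio.create_task(
            self.extractor.extract(user_id=user_id, conversation=[message, response])
        )
        return response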
Implementation Details
The extraction process works as follows:
- User sends a message to an agent
- Agent processes and responds to the message
- Agent delegates extraction to the Orchestrator
- Orchestrator processes the conversation asynchronously
- MemoryExtractor analyzes the content to identify key information
- Extracted information is scored for importance and confidence
- Valid information is stored in the user’s context memory
- Future conversations include this information in the agent’s context
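The access example later in this guide reads entries as user_context["key"]["value"]; the sketch below shows roughly what a stored entry might look like. The importance and confidence fields are assumptions based on the scoring step above, so the exact shape may differ in your MUXI version.
# Illustrative shape of an extracted entry in context memory. The "value" key
# matches the access example later in this guide; "importance" and "confidence"
# are assumed fields reflecting the scoring step above.
extracted_entry = {
    "preferred_language": {
        "value": "Python",
        "importance": 0.8,   # relevance/significance score (0.0-1.0)
        "confidence": 0.9,   # certainty score (0.0-1.0)
    }
}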
Configuration
Basic Setup
Automatic user information extraction is enabled by default in the Orchestrator. You can explicitly configure it as follows:
Declarative way
# app.py
from muxi import muxi
from dotenv import load_dotenv
import os
# Load environment variables
load_dotenv()
# Initialize MUXI with extraction enabled
app = muxi(
    buffer_memory=15,
    long_term_memory="postgresql://user:pass@localhost/db",
    auto_extract_user_info=True,  # Enable automatic extraction
    config_file="configs/muxi_config.yaml"
)
Programmatic way
import os
from muxi.core.orchestrator import Orchestrator
from muxi.models.providers.openai import OpenAIModel
from muxi.server.memory.buffer import BufferMemory
from muxi.server.memory.long_term import LongTermMemory
# Initialize components
model = OpenAIModel(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o"
)
buffer = BufferMemory(15)
long_term_memory = LongTermMemory(connection_string="postgresql://user:pass@localhost/db")
# Initialize orchestrator with extraction enabled
orchestrator = Orchestrator(
    buffer_memory=buffer,
    long_term_memory=long_term_memory,
    auto_extract_user_info=True,  # Enable automatic extraction
)
# Create an agent that will use the orchestrator's extraction
orchestrator.create_agent(
    agent_id="assistant",
    description="A helpful assistant that remembers important user information.",
    model=model
)
Advanced Configuration
For more control over the extraction process, you can configure additional parameters:
from muxi.models.providers.openai import OpenAIModel
# Create a specialized model for extraction
extraction_model = OpenAIModel(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o",    # Using a capable model for extraction
    temperature=0.0    # Lower temperature for more deterministic extraction
)
# Initialize orchestrator with advanced extraction options
orchestrator = Orchestrator(
    buffer_memory=buffer,
    long_term_memory=long_term_memory,
    auto_extract_user_info=True,
    extraction_model=extraction_model,  # Specify a dedicated model for extraction
    extraction_interval=3               # Process extraction every 3 messages (default is 1)
)
Extraction Parameters
The following parameters can be configured:
| Parameter | Description | Default |
|---|---|---|
| auto_extract_user_info | Enable/disable automatic extraction | True |
| extraction_model | Model used for extraction | Same as agent model |
| extraction_interval | Process extraction every N messages | 1 |
| confidence_threshold | Minimum confidence score (0.0-1.0) | 0.7 |
| importance_threshold | Minimum importance score (0.0-1.0) | 0.5 |
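As a sketch, these thresholds could be tuned alongside the options shown earlier. The parameter names follow the table above; confirm they match your installed MUXI version before relying on them.
# Sketch: tuning the thresholds from the table above (names assumed from the table).
orchestrator = Orchestrator(
    buffer_memory=buffer,
    long_term_memory=long_term_memory,
    auto_extract_user_info=True,
    extraction_model=extraction_model,
    extraction_interval=3,       # extract every 3 messages
    confidence_threshold=0.8,    # store only high-certainty facts
    importance_threshold=0.6,    # skip low-significance details
)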
Privacy Controls
Automatic Privacy Features
The extraction system includes several built-in privacy features:
- Anonymous User Exclusion: Users with user_id=0 are automatically excluded from extraction (see the sketch after this list)
- Configurable Thresholds: Only information with sufficient confidence is stored
- PII Detection: Basic detection of sensitive personal information
- Asynchronous Processing: Extraction runs separately from conversation flow
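The following is an illustrative gate, not MUXI's internal code; it simply mirrors the anonymous-user and confidence rules listed above.
# Illustrative only: mirrors the privacy rules above. Not MUXI's internal code.
def should_store(user_id, entry, confidence_threshold=0.7):
    if user_id == 0:
        return False  # anonymous users are excluded from extraction
    if entry["confidence"] < confidence_threshold:
        return False  # low-certainty information is not stored
    return True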
User Opt-Out
You can explicitly exclude users from extraction:
# Exclude a user from extraction
await orchestrator.opt_out_user(user_id="user_123", opt_out=True)
# Re-enable extraction for a user
await orchestrator.opt_out_user(user_id="user_123", opt_out=False)
Data Purging
You can purge all extracted information for a user:
# Remove all extracted information for a user
await orchestrator.purge_user_data(user_id="user_123")
Accessing Extracted Information
Information extracted by the system is automatically included in the agent’s context for future conversations. You can also access it directly:
# Get all context memory for a user
user_context = await orchestrator.get_user_context_memory(user_id="user_123")
# Check specific extracted information
if "preferred_language" in user_context:
preferred_language = user_context["preferred_language"]["value"]
print(f"User prefers programming in {preferred_language}")
Example: Extraction in Action
Here’s an example of automatic extraction in a conversation:
# User interacts with the agent
user_message = "I live in Berlin and I'm learning Python. I prefer visual explanations over text."
agent_response = await app.chat(
    message=user_message,
    agent_name="assistant",
    user_id="user_123"
)
# Behind the scenes, the system extracts:
# - Location: Berlin
# - Learning: Python
# - Preference: Visual explanations
# In a future conversation, the agent can use this information
later_response = await app.chat(
    message="Can you help me understand dictionaries?",
    agent_name="assistant",
    user_id="user_123"
)
# The agent might respond with a visual explanation of Python dictionaries,
# since it remembers the user's location, learning interests, and preferences
Best Practices
- Set an appropriate extraction interval - Balance comprehensive extraction against performance. For chatbots, every message (interval=1) works well; for applications with frequent messages, consider a higher interval.
- Use a specialized extraction model - For high-accuracy extraction, consider a more capable model than your main conversation model.
- Adjust confidence thresholds - Tune the confidence threshold to your needs. Higher thresholds (e.g., 0.8) reduce false positives but may miss some information.
- Respect privacy preferences - Always provide clear opt-out mechanisms and process opt-out requests promptly.
- Review extracted information periodically - Audit the quality and accuracy of extracted information to fine-tune your system.
- Implement retention policies - Define how long extracted information should be stored and implement automatic aging or purging (see the sketch after this list).
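A minimal retention sketch, assuming a periodic background job: it reuses the purge_user_data() call shown earlier, while the way inactive users are identified (stale_user_ids below) is application-specific and hypothetical.
import asyncio

# Minimal retention sketch (see the best practice above). purge_user_data() is
# the call documented in this guide; stale_user_ids() is a hypothetical,
# application-specific function returning users whose data has aged out.
async def enforce_retention(orchestrator, stale_user_ids, interval_hours=24):
    while True:
        for user_id in stale_user_ids():
            await orchestrator.purge_user_data(user_id=user_id)
        await asyncio.sleep(interval_hours * 3600)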
Limitations and Considerations
- Extraction is only as good as the underlying model. More capable models generally provide better extraction results.
- Complex or nuanced information may not be extracted correctly. Consider supplementing with explicit storage where needed.
- Extraction runs asynchronously and may not be immediate. Don’t rely on instant availability of newly extracted information.
- Balance extraction frequency against API costs, especially with third-party models that charge per token.
What’s Next
Now that you understand automatic user information extraction, you might want to:
- Learn about Context Memory Namespaces for organizing extracted information
- Explore Memory Optimization for improving performance
- Consider implementing Interface-Level User ID Generation for consistent user identification