Acontext supports multi-modal messages that include text, images, audio, and PDF documents. You can send and retrieve these messages in both OpenAI and Anthropic formats, with automatic format conversion between providers.

Prerequisites

Before working with multi-modal messages, ensure you have:
  • A running Acontext server (you can run one locally)
  • An Acontext API key
Multi-modal content is stored as assets in S3, while message metadata is stored in PostgreSQL. Acontext automatically handles file uploads and generates presigned URLs for retrieval.

Supported content types

Acontext supports the following multi-modal content types:

Images

PNG, JPEG, GIF, WebP formats for visual content

Audio

WAV, MP3 formats for voice and sound

Documents

PDF documents for analysis and summarization

Sending images

Images with OpenAI format

OpenAI supports images through the image_url content part type, which accepts both external URLs and base64-encoded data URLs. The following example sends an external image URL; a base64-encoded variant is sketched after the note below:
from acontext import AcontextClient

client = AcontextClient(
    api_key="sk-ac-your-root-api-bearer-token",
    base_url="http://localhost:8029/api/v1"
)
client.ping()
session = client.sessions.create()

# Send a message with an image URL
message = client.sessions.send_message(
    session_id=session.id,
    blob={
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "What's in this image?"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/image.png",
                    "detail": "high"  # Options: "low", "high", "auto"
                }
            }
        ]
    },
    format="openai"
)

print(f"Message with image sent: {message.id}")
The detail parameter controls image processing quality. Use "high" for detailed analysis, "low" for faster processing, or "auto" to let the system decide.
Base64-encoded images in OpenAI format use the data URL scheme: data:image/[type];base64,[base64-data]. The image data is stored within the message parts and returned as base64 when retrieved.
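For reference, here is a minimal sketch of the base64-encoded variant mentioned above. It follows the same client setup as the earlier example; the image.png filename is a placeholder:

import base64
from acontext import AcontextClient

client = AcontextClient(
    api_key="sk-ac-your-root-api-bearer-token",
    base_url="http://localhost:8029/api/v1"
)
session = client.sessions.create()

# Read the image and wrap it in a data URL: data:image/[type];base64,[base64-data]
with open("image.png", "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode("utf-8")

message = client.sessions.send_message(
    session_id=session.id,
    blob={
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{image_data}"
                }
            }
        ]
    },
    format="openai"
)

print(f"Message with base64 image sent: {message.id}")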

Images with Anthropic format

Anthropic requires images to be base64-encoded:
import base64
from acontext import AcontextClient

client = AcontextClient(
    api_key="sk-ac-your-root-api-bearer-token",
    base_url="http://localhost:8029/api/v1"
)
session = client.sessions.create()

# Read and encode image as base64
with open("image.png", "rb") as image_file:
    image_data = base64.b64encode(image_file.read()).decode("utf-8")

# Send message with base64 image
message = client.sessions.send_message(
    session_id=session.id,
    blob={
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Describe this image"
            },
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_data
                }
            }
        ]
    },
    format="anthropic"
)

print(f"Message with image sent: {message.id}")
Anthropic format requires images to be base64-encoded. The base64 data is stored within the message parts and returned as base64 when you retrieve the message.

Sending audio

Audio content can be included in messages for speech-to-text or audio analysis use cases:
import base64
from acontext import AcontextClient

client = AcontextClient(
    api_key="sk-ac-your-root-api-bearer-token",
    base_url="http://localhost:8029/api/v1"
)
session = client.sessions.create()

# Read and encode audio file
with open("audio.wav", "rb") as audio_file:
    audio_data = base64.b64encode(audio_file.read()).decode("utf-8")

# Send message with audio (OpenAI format)
message = client.sessions.send_message(
    session_id=session.id,
    blob={
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Transcribe this audio"
            },
            {
                "type": "input_audio",
                "input_audio": {
                    "data": audio_data,
                    "format": "wav"
                }
            }
        ]
    },
    format="openai"
)

print(f"Message with audio sent: {message.id}")

Sending files

You can send files for analysis and understanding using base64-encoded content, though different formats handle files differently. Anthropic supports sending files using the document content type with base64-encoded data (an OpenAI-format sketch follows the note after this example):
import base64
from acontext import AcontextClient

client = AcontextClient(
    api_key="sk-ac-your-root-api-bearer-token",
    base_url="http://localhost:8029/api/v1"
)
session = client.sessions.create()

# Read and encode PDF file as base64
with open("report.pdf", "rb") as pdf_file:
    pdf_data = base64.b64encode(pdf_file.read()).decode("utf-8")

# Send message with PDF document (Anthropic format)
message = client.sessions.send_message(
    session_id=session.id,
    blob={
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": pdf_data
                }
            },
            {
                "type": "text",
                "text": "Summarize the key findings in this report"
            }
        ]
    },
    format="anthropic"
)

print(f"Message with PDF sent: {message.id}")
When you send a PDF with base64 data, the base64 content is stored within the message parts JSON in S3. When you retrieve the message, the PDF is returned as base64 data again—not as a presigned URL. This keeps the PDF data inline with the message content.
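The OpenAI-format equivalent is sketched below. This is a minimal sketch assuming Acontext passes through OpenAI's file content part (a filename plus a base64 file_data data URL); check the API reference to confirm the exact shape:

import base64
from acontext import AcontextClient

client = AcontextClient(
    api_key="sk-ac-your-root-api-bearer-token",
    base_url="http://localhost:8029/api/v1"
)
session = client.sessions.create()

with open("report.pdf", "rb") as pdf_file:
    pdf_data = base64.b64encode(pdf_file.read()).decode("utf-8")

# Assumption: the OpenAI "file" content part is accepted as-is when format="openai"
message = client.sessions.send_message(
    session_id=session.id,
    blob={
        "role": "user",
        "content": [
            {
                "type": "file",
                "file": {
                    "filename": "report.pdf",
                    "file_data": f"data:application/pdf;base64,{pdf_data}"
                }
            },
            {"type": "text", "text": "Summarize the key findings in this report"}
        ]
    },
    format="openai"
)

print(f"Message with PDF sent: {message.id}")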

Supported document formats

Acontext does not restrict document formats; you can store any file type. However, not every file type is supported by your LLM provider, so check your provider's documentation to confirm that a given file type is supported.

Retrieving multi-modal messages

When retrieving messages, the content format depends on how the message was originally sent:
from acontext import AcontextClient

client = AcontextClient(
    api_key="sk-ac-your-root-api-bearer-token",
    base_url="http://localhost:8029/api/v1"
)

# Retrieve messages
result = client.sessions.get_messages(
    session_id="session_uuid",
    format="anthropic",  # or "openai"
    limit=50
)

print(f"Retrieved {len(result.items)} messages")

# Access messages
for msg in result.items:
    for block in msg.content:
        if block.get('type') == 'image':
            # Images sent as base64 are returned as base64
            print(f"Image source type: {block['source']['type']}")
How content is returned (a decoding sketch follows this list):
  • Images/PDFs sent as base64 data are returned as base64 data (stored within the message parts)
  • Images/PDFs sent as URLs in OpenAI format are stored as URLs in metadata
  • Files uploaded via multipart form-data are stored as separate S3 assets (not covered in this guide)
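Continuing from the retrieval example above, here is a minimal sketch that decodes a base64 image block back to disk (block fields follow the Anthropic shape shown earlier; output.png is a placeholder name):

import base64

for msg in result.items:
    for block in msg.content:
        if block.get('type') == 'image' and block['source']['type'] == 'base64':
            # Decode the inline base64 payload back into raw bytes
            raw = base64.b64decode(block['source']['data'])
            with open("output.png", "wb") as f:
                f.write(raw)
            print(f"Wrote {len(raw)} bytes to output.png")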

Format conversion

Acontext automatically converts between formats when retrieving messages. Format conversion for images works as follows:
  • Images sent as base64 are stored as base64 and returned as base64 in any format
  • Images sent as URLs (in OpenAI format) are stored as URLs and can be:
    • Retrieved as URLs in OpenAI format (URL is preserved)
    • Retrieved as base64 in Anthropic format (URL is downloaded and converted on-the-fly)
The following example stores a message in OpenAI format and retrieves it in Anthropic format; the reverse direction is sketched after this example:
from acontext import AcontextClient

client = AcontextClient(
    api_key="sk-ac-your-root-api-bearer-token",
    base_url="http://localhost:8029/api/v1"
)
session = client.sessions.create()

# Send in OpenAI format
client.sessions.send_message(
    session_id=session.id,
    blob={
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this image"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/photo.jpg"
                }
            }
        ]
    },
    format="openai"
)

# Retrieve in Anthropic format
result = client.sessions.get_messages(
    session_id=session.id,
    format="anthropic"  # Different format!
)

# Image is automatically converted to Anthropic format
print("Message retrieved in Anthropic format")
for msg in result.items:
    print(f"Role: {msg.role}")
    for block in msg.content:
        print(f"  Block type: {block.get('type')}")
Format conversion is bidirectional and lossless for common content types. Use the format that best matches your workflow when retrieving messages.
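The reverse direction works the same way. Here is a minimal sketch that stores a base64 image in Anthropic format and retrieves it in OpenAI format (image.png is a placeholder; per the rules above, base64 content comes back as base64):

import base64
from acontext import AcontextClient

client = AcontextClient(
    api_key="sk-ac-your-root-api-bearer-token",
    base_url="http://localhost:8029/api/v1"
)
session = client.sessions.create()

with open("image.png", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

# Send in Anthropic format
client.sessions.send_message(
    session_id=session.id,
    blob={
        "role": "user",
        "content": [
            {"type": "text", "text": "Analyze this image"},
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_data
                }
            }
        ]
    },
    format="anthropic"
)

# Retrieve in OpenAI format; the base64 image is returned as base64
result = client.sessions.get_messages(
    session_id=session.id,
    format="openai"
)
for msg in result.items:
    print(f"Role: {msg.role}")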

Best practices

  • Compress images and PDFs to reduce storage costs and improve performance
  • Use appropriate resolutions (e.g., 2048px max for most image analysis tasks)
  • Consider using "detail": "low" in OpenAI format for simple image understanding tasks
  • Base64-encoded content increases message size by ~33%, so optimization is important
  • Base64 encoding works well for files under 10MB
  • For very large files (>10MB), consider using multipart file uploads instead (see the size-check sketch after this list)
  • Monitor your storage usage and clean up old sessions regularly
  • Use OpenAI format for GPT-4 Vision and similar models
  • Use Anthropic format for Claude with vision and document analysis capabilities
  • Format conversion is automatic and lossless for common content types
  • Base64 data (images, PDFs, audio) is stored within the message parts JSON
  • Message parts are stored in S3, with metadata in PostgreSQL
  • When retrieving, base64 content is returned as-is (not converted to URLs)
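As a rough guard against oversized inline payloads, here is a minimal sketch of the ~10MB guideline above (the threshold constant and helper name are illustrative, not part of the SDK):

import os

MAX_INLINE_BYTES = 10 * 1024 * 1024  # ~10MB guideline from the list above

def should_inline_as_base64(path: str) -> bool:
    # Base64 adds ~33% overhead, so larger files are better sent as multipart uploads
    return os.path.getsize(path) <= MAX_INLINE_BYTES

print(should_inline_as_base64("report.pdf"))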

Complete workflow example

Here’s a complete example that demonstrates sending and retrieving multi-modal messages:
import base64
from acontext import AcontextClient

client = AcontextClient(
    api_key="sk-ac-your-root-api-bearer-token",
    base_url="http://localhost:8029/api/v1"
)

try:
    # Create session
    session = client.sessions.create()
    print(f"Session created: {session.id}")
    
    # Send text + image message
    with open("screenshot.png", "rb") as f:
        image_data = base64.b64encode(f.read()).decode("utf-8")
    
    message = client.sessions.send_message(
        session_id=session.id,
        blob={
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What UI improvements would you suggest for this design?"
                },
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                }
            ]
        },
        format="anthropic"
    )
    print(f"Message sent: {message.id}")
    
    # Retrieve messages
    result = client.sessions.get_messages(
        session_id=session.id,
        format="openai"  # Convert to OpenAI format
    )
    
    print(f"\nRetrieved {len(result.items)} messages:")
    for msg in result.items:
        print(f"  Role: {msg.role}")
        if isinstance(msg.content, list):
            for part in msg.content:
                if part.get('type') == 'text':
                    print(f"  Text: {part.get('text')[:50]}...")
                    
finally:
    client.close()
