Unified multi-modal

One API for text, image, audio, video, and file inputs — regardless of provider.

Every provider has its own multi-modal shape. OpenAI wants { type: 'image_url', image_url: {...} }, Anthropic wants { type: 'image', source: {...} }, and Gemini wants parts with inline data. @agentskit/core provides a provider-neutral ContentPart model: adapters that understand a modality read the parts, and the rest fall back to a text projection.

Install

Built into @agentskit/core.

Build a multi-modal message

import {
  textPart,
  imagePart,
  audioPart,
  filePart,
  partsToText,
} from '@agentskit/core'
import type { Message } from '@agentskit/core'

const parts = [
  textPart('What is in this screenshot?'),
  imagePart('https://cdn.example.com/screenshot.png', { detail: 'high', mimeType: 'image/png' }),
]

const message: Message = {
  id: crypto.randomUUID(),
  role: 'user',
  content: partsToText(parts),  // text projection: "What is...\n[image: ...]"
  parts,                        // adapters that support vision read this
  status: 'complete',
  createdAt: new Date(),
}

Part kinds

Builder	`type`	Notes
`textPart(text)`	`'text'`	Plain text segment
`imagePart(src, { mimeType?, detail? })`	`'image'`	Data URL, http(s), or provider-hosted id
`audioPart(src, { durationSec? })`	`'audio'`
`videoPart(src, { durationSec? })`	`'video'`
`filePart(src, { filename? })`	`'file'`	PDF, CSV, arbitrary binary
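
To make the text projection concrete, here is a minimal sketch of how parts like these could be flattened into a string such as "caption\n[image: pic.png]". It is not the implementation of partsToText: the SketchPart type, labelFor, and projectToText are hypothetical names defined locally, and the choice of the last path segment as the label is an assumption. The field names (type, text, source, and the builder options) are taken from the examples on this page.

```typescript
// Hypothetical local model of the part kinds listed above.
// The real ContentPart type in @agentskit/core may differ.
type SketchPart =
  | { type: 'text'; text: string }
  | { type: 'image'; source: string; mimeType?: string; detail?: string }
  | { type: 'audio'; source: string; durationSec?: number }
  | { type: 'video'; source: string; durationSec?: number }
  | { type: 'file'; source: string; filename?: string }

type NonTextPart = Exclude<SketchPart, { type: 'text' }>

// Assumption: label a non-text part by its filename when given,
// otherwise by the last path segment of its source.
function labelFor(part: NonTextPart): string {
  if (part.type === 'file' && part.filename) return part.filename
  if (part.source.startsWith('data:')) return 'inline data'
  const withoutQuery = part.source.split(/[?#]/)[0] ?? part.source
  const segments = withoutQuery.split('/').filter(s => s.length > 0)
  return segments[segments.length - 1] ?? part.source
}

// Text parts pass through; every other part becomes a bracketed placeholder.
function projectToText(parts: SketchPart[]): string {
  return parts
    .map(p => (p.type === 'text' ? p.text : `[${p.type}: ${labelFor(p)}]`))
    .join('\n')
}

const projected = projectToText([
  { type: 'text', text: 'caption' },
  { type: 'image', source: 'https://cdn.example.com/pic.png' },
])
```

Because the placeholder is plain text, a text-only adapter can pass the projection to a model unchanged and the model still learns that an attachment was present.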

In an adapter

A vision-aware adapter reads msg.parts and maps each entry to its provider's shape. A text-only adapter keeps reading msg.content and sees a safe projection like "caption\n[image: pic.png]".

import { normalizeContent, filterParts } from '@agentskit/core'
import type { Message } from '@agentskit/core'

function toOpenAIMessage(m: Message) {
  const { parts } = normalizeContent(m.content, m.parts)
  return {
    role: m.role,
    content: parts.map(p => {
      if (p.type === 'text') return { type: 'text', text: p.text }
      if (p.type === 'image') return { type: 'image_url', image_url: { url: p.source, detail: p.detail } }
      // Modalities this adapter does not handle fall back to a text placeholder
      return { type: 'text', text: `[${p.type}]` }
    }),
  }
}

// Quickly grab every attached image:
const images = filterParts(parts, 'image')
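
The same mapping idea can be sketched for the Anthropic shape mentioned at the top of the page. This is an illustrative, self-contained example and not an official adapter: NeutralPart, AnthropicBlock, and toAnthropicBlock are hypothetical names, the part fields mirror the ones used above, and the sketch assumes every image source is either a base64 data URL or an http(s) URL (a provider-hosted id would need its own branch).

```typescript
// Hypothetical local stand-in for the provider-neutral part model.
type NeutralPart =
  | { type: 'text'; text: string }
  | { type: 'image'; source: string; mimeType?: string }
  | { type: 'audio' | 'video' | 'file'; source: string }

// The subset of Anthropic content blocks this sketch produces.
type AnthropicBlock =
  | { type: 'text'; text: string }
  | { type: 'image'; source: { type: 'url'; url: string } }
  | { type: 'image'; source: { type: 'base64'; media_type: string; data: string } }

// Matches data URLs such as "data:image/png;base64,iVBORw0..."
const DATA_URL = /^data:([^;,]+);base64,(.+)$/

function toAnthropicBlock(part: NeutralPart): AnthropicBlock {
  if (part.type === 'text') return { type: 'text', text: part.text }

  if (part.type === 'image') {
    const match = DATA_URL.exec(part.source)
    if (match) {
      // Inline data: split the data URL into media type and payload.
      const mediaType = part.mimeType ?? match[1] ?? 'application/octet-stream'
      const data = match[2] ?? ''
      return { type: 'image', source: { type: 'base64', media_type: mediaType, data } }
    }
    // Assumption: anything that is not a data URL is a fetchable URL.
    return { type: 'image', source: { type: 'url', url: part.source } }
  }

  // Modalities this sketch does not map fall back to a text placeholder.
  return { type: 'text', text: `[${part.type}]` }
}

const blocks = [
  toAnthropicBlock({ type: 'text', text: 'What is in this screenshot?' }),
  toAnthropicBlock({ type: 'image', source: 'https://cdn.example.com/screenshot.png' }),
]
```

Keeping the fallback branch last means a new part kind degrades to a placeholder instead of throwing, which matches the text-projection behavior described above.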
