How It All Fits Together

Here's the architecture in plain English. No whiteboards required.

The layers

Your assistant is built in layers, and each one has a clear job:

┌─────────────────────────────────────────────┐
│              YOU (the human)                 │
├─────────────────────────────────────────────┤
│           CHANNELS                          │
│     Desktop App · Voice · (future)          │
├─────────────────────────────────────────────┤
│           ASSISTANT CORE                    │
│   Personality · Memory · Decision-making    │
├─────────────────────────────────────────────┤
│           SKILLS                            │
│   Email · Calendar · Weather · Browser ...  │
├─────────────────────────────────────────────┤
│           TOOLS                             │
│   file_read · bash · web_search · browser   │
│   navigate · memory_save · ui_show ...      │
├─────────────────────────────────────────────┤
│           WORKSPACE                         │
│   SOUL.md · USER.md · IDENTITY.md · skills/ │
└─────────────────────────────────────────────┘
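If you read code more easily than boxes, the same layering can be sketched in a few lines of Python. These class names are illustrative only, not the real internals:

```python
# Illustrative sketch of the layers; none of these names are the real API.

class Tool:
    """Bottom layer: one concrete capability (file_read, bash, ...)."""
    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def run(self, *args):
        return self.fn(*args)

class Skill:
    """Middle layer: knows which tools to use and in what order."""
    def __init__(self, name, tools):
        self.name = name
        self.tools = {t.name: t for t in tools}

class AssistantCore:
    """Top layer: personality, memory, decision-making."""
    def __init__(self, skills, workspace):
        self.skills = {s.name: s for s in skills}
        self.workspace = workspace  # SOUL.md, USER.md, IDENTITY.md, ...

# Wiring the layers together, bottom-up:
search = Tool("web_search", lambda q: f"results for {q}")
weather = Skill("Weather", [search])
core = AssistantCore([weather], workspace={"SOUL.md": "..."})
```

The point of the sketch is the direction of dependency: skills know about tools, the core knows about skills, and nothing lower in the stack reaches upward.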

How a message flows

When you type something, here's what happens:

  1. You send a message through a channel (the desktop app, voice, etc.)
  2. The assistant core receives it. It pulls in context: your workspace files (SOUL.md, USER.md), your memories, the current conversation history, and any relevant skill instructions.
  3. All of that goes to the AI model. Your message plus context is sent to a cloud AI provider (like Anthropic's Claude) which generates a response.
  4. The response may include actions. “Read this file,” “search the web,” “send this email.” Each action maps to a tool.
  5. Tools execute. Some run in a sandbox (safe, no permission needed). Some touch your machine (these ask for permission first).
  6. Skills orchestrate the tools. A skill like “Email” knows which tools to use and in what order to check your inbox, draft a reply, or send a message.
  7. Results come back to you through the same channel. Text, UI cards, interactive apps, whatever format fits.
  8. The assistant learns. If something worth remembering came up, it saves it to memory or updates your workspace files.
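The eight steps above amount to one loop: gather context, call the model, run the actions, remember what mattered. Here's that loop as a toy sketch. Every name in it (`call_model`, `TOOLS`, `ask_user`) is a hypothetical stand-in for illustration, not the actual implementation:

```python
# Stubbed collaborators so the sketch runs end to end.
TOOLS = {
    "web_search": {"needs_permission": False,           # sandboxed
                   "run": lambda args: f"results for {args}"},
    "bash":       {"needs_permission": True,            # touches your machine
                   "run": lambda args: f"ran {args}"},
}

def call_model(context):
    # Stand-in for the cloud AI provider call (step 3).
    return {"actions": [{"tool": "web_search", "args": "weather"}],
            "remember": "user asked about weather"}

def ask_user(action):
    # Stand-in for the permission prompt (step 5).
    return True

def handle_message(message, workspace, memory, history):
    # Steps 1-2: receive the message and pull in context.
    context = {"workspace": workspace, "memory": memory,
               "history": history, "message": message}
    # Step 3: message + context goes to the model.
    response = call_model(context)
    # Steps 4-6: each action maps to a tool; host-touching
    # tools ask for permission first.
    results = []
    for action in response["actions"]:
        tool = TOOLS[action["tool"]]
        if tool["needs_permission"] and not ask_user(action):
            continue
        results.append(tool["run"](action["args"]))
    # Step 8: save anything worth remembering.
    if response.get("remember"):
        memory.append(response["remember"])
    # Step 7: results flow back out through the same channel.
    return results
```

Notice that `memory` is passed in and mutated: that's the "assistant learns" step, and it's why the same question next week can get a better answer.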

The key insight: your assistant isn't one monolithic thing. It's a conversation layer, a thinking layer, an action layer, and a memory layer, all working together. That's what makes it feel like more than a chatbot.

What runs where

| Component | Where it runs | Why it matters |
|---|---|---|
| Desktop app | Your machine | What you see and interact with |
| Assistant core | Your machine (Docker container) | Processes messages, manages state, coordinates tools |
| AI model | Cloud (Anthropic, etc.) | The "thinking" part. Your prompts are sent here. |
| Workspace files | Your machine (`~/.vellum/`) | Your assistant's persistent brain. Local, readable, yours. |
| Tool execution | Your machine (sandbox or host) | Where actions actually happen |
| Connected services | Cloud (Gmail, Slack, etc.) | External services your assistant talks to on your behalf |
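The table boils down to one question per component: does your data leave your machine? A toy sketch of that mapping (the keys are just labels for this example):

```python
# Local/cloud split, straight from the table above.
RUNS_IN = {
    "desktop_app":        "local",
    "assistant_core":     "local",   # Docker container on your machine
    "ai_model":           "cloud",   # prompts and context are sent here
    "workspace_files":    "local",   # ~/.vellum/
    "tool_execution":     "local",   # sandbox or host
    "connected_services": "cloud",   # Gmail, Slack, etc.
}

def leaves_your_machine(component):
    """True if using this component sends data off your machine."""
    return RUNS_IN[component] == "cloud"
```

Two of the six components are cloud-side, and they're exactly the two named in the trade-off below: the model that does the thinking, and the services the assistant acts on for you.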
🫣 The trade-off, again: The workspace and tools are fully local. The thinking happens in the cloud. Your prompts, context, and conversation history are sent to the AI model provider. We keep coming back to this because transparency is one of our principles, not because we're trying to scare you.