November is a month for crisp fall weather, giving thanks, and another round of Vellum product updates! In October, we shipped a ton of new models, improvements to Evals, Prompts, Workflows, and more.
Hold the gravy, let’s dive in and see what’s new 🎃
Online Evaluations for Workflow and Prompt Deployments
Previously, you could only run “Offline Evaluations” or “Inline Evaluations.” You run Offline Evaluations manually when you want to check Prompt / Workflow performance, e.g. when you’re getting ready to cut a new Production Release. Inline Evaluations are useful when you want to check quality during a Workflow’s execution and conditionally do something within the Workflow (retry a prompt, throw an error, fire a Slack alert, escalate to a human, etc.)
But what if you want to monitor how your product performs live in production? Now you can!
Online Evaluations help you see your product’s performance in real time. They run on every production execution of your app, helping you catch and resolve edge cases faster and prevent regressions more thoroughly. The best part: you can use Vellum’s premade Metrics, or Custom Metrics that you’ve already configured!
You can read more about Online Evaluations here!
Configurable Prompt Node Timeouts
Previously, if you wanted to avoid having a single Prompt Node slow down your workflow, you’d need to set up a few extra nodes and cumbersome logic to time out early.
Now, you can easily set maximum timeouts for Prompt Nodes within Workflows, preventing bottlenecks and ensuring efficient resource management.
AutoLayout and AutoConnect for Workflows
As you experiment and your workflows become more complex, keeping them organized will make them easier to iterate on. Now, you can automatically organize and connect nodes in Workflow Sandboxes with just a click.
Datadog and Webhook Logging Beta Integrations
If you want deeper insight into key events happening in Vellum, seen in the context of the rest of your systems, you now have it with our Datadog & Webhook Logging integrations (in beta). For example, you can set up a Datadog alert to fire when a Workflow Deployment hits multiple consecutive execution failures.
If you’d like to participate in the beta period, or want help setting up your integration, please contact us!
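To give a sense of what consuming these events could look like, here’s a minimal sketch of a webhook receiver in Python. The event fields shown (type, deployment, error) are hypothetical placeholders for illustration, not Vellum’s documented payload schema.

```python
# Minimal sketch of a webhook receiver for Vellum logging events (Flask).
# NOTE: the payload fields below are hypothetical placeholders; check the
# integration docs for the actual event schema once enrolled in the beta.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/vellum-events", methods=["POST"])
def handle_vellum_event():
    event = request.get_json(force=True)
    # Hypothetical event type and fields, shown for illustration only.
    if event.get("type") == "workflow.execution.failed":
        alert_on_failure(event.get("deployment"), event.get("error"))
    return jsonify({"ok": True}), 200

def alert_on_failure(deployment, error):
    # Route to your own alerting (Slack, PagerDuty, etc.).
    print(f"Workflow deployment {deployment} failed: {error}")

if __name__ == "__main__":
    app.run(port=8000)
```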
New Models and Providers!
Model optionality gives builders more flexibility to optimize for accuracy, latency, and cost as their use cases require. Here’s a quick overview of the 25 (!!) new models we added in October:
- All Perplexity models — including Online models for searching the web!
- Cerebras — featuring 2,100 tokens/sec. That’s 3x faster than the current state of the art, or nearly 3 books per minute!
- 13 new OpenRouter models
- The newest Claude 3.5 Sonnet
- Gemini 1.5 Flash 8B
Other noteworthy mentions:
- Vertex AI embedding models: text-embedding-004 and text-multilingual-embedding-002
- OpenAI Prompt Caching for GPT-4o and o1 models
Click here to see more details about the new models we’re supporting.
Evaluations
Reorder Test Suite Variables
You can now reorder Input and Evaluation Variables within a Test Suite’s settings page, helping you stay organized & make changes faster by putting related values next to one another.
Reorder Entities in Evaluation Reports
When your Evaluation Reports use many Metrics, often you want to see related Metrics grouped nearby one another. You can now reorder entities in the Evaluation Report table, making it easier to triage your Metric scores and iterate on your Prompts & Workflows accordingly.
Filter and Sort on Metric Scores
You can now filter and sort on a Metric’s score within Evaluation Reports. This makes it easier to find all Test Cases that fall below a given Metric threshold, so you can iterate and improve your products’ robustness faster.
Prompts, Models, and Embeddings
Prompt Caching Support for OpenAI
OpenAI now automatically performs prompt caching to help optimize cost & latency of prompts. In Vellum, we capture the new Cache Tokens when using supported OpenAI models, to help you analyze cache hit rates and optimize LLM spend.
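For reference, supported OpenAI models surface these cached tokens in the API response itself, under usage.prompt_tokens_details. Here’s a minimal sketch using OpenAI’s Python SDK; the long shared prefix is a stand-in, since caching only kicks in once a prompt’s prefix passes roughly 1,024 tokens.

```python
# Minimal sketch: reading OpenAI's cached-token counts to estimate cache hit rate.
# Requires the openai package (v1+) and OPENAI_API_KEY in your environment.
from openai import OpenAI

client = OpenAI()

# Caching applies to long, stable prompt prefixes; this filler is a stand-in.
shared_prefix = "You are a support assistant. Knowledge base:\n" + ("lorem ipsum " * 500)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": shared_prefix},
        {"role": "user", "content": "Summarize today's open tickets."},
    ],
)

usage = resp.usage
cached = usage.prompt_tokens_details.cached_tokens  # tokens served from the cache
rate = cached / usage.prompt_tokens if usage.prompt_tokens else 0.0
print(f"{cached}/{usage.prompt_tokens} prompt tokens cached ({rate:.0%})")
```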
Vertex AI Embedding Model Support
We now support Vertex AI Embedding Models text-embedding-004 and text-multilingual-embedding-002, giving you more options to optimize your RAG pipelines.
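These models are configured directly within Vellum, but for reference, here’s roughly what calling them through Google’s Vertex AI SDK looks like. This is a minimal sketch, assuming a GCP project with Vertex AI enabled and application-default credentials; the project and location values are placeholders.

```python
# Minimal sketch: calling the new embedding models via Google's Vertex AI SDK.
# Assumes Vertex AI is enabled on your GCP project and credentials are configured.
import vertexai
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="your-gcp-project", location="us-central1")  # placeholders

model = TextEmbeddingModel.from_pretrained("text-embedding-004")
embeddings = model.get_embeddings(["How do I rotate my API key?"])

# Each result exposes the raw vector; text-embedding-004 returns 768 dimensions.
print(len(embeddings[0].values))
```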
New Models!
That’s right, 25 new models.
Deployments
New API for Listing Entities in a Folder
Now you can programmatically retrieve all entities in a folder via API. The response lists these entities along with high-level metadata about them.
This new API is available in our SDKs beginning with version 0.8.25. For additional details, check out our API Reference here.
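As a rough sketch, listing a folder’s contents from the Python SDK looks something like the following. The method and parameter names here mirror the API Reference’s naming but are assumptions for illustration; confirm them against the docs.

```python
# Rough sketch: listing all entities in a folder with the Vellum Python SDK (>= 0.8.25).
# The method and parameter names below are assumptions based on the API Reference's
# naming; confirm the exact signatures in the docs.
from vellum.client import Vellum

client = Vellum(api_key="YOUR_VELLUM_API_KEY")

response = client.folder_entities.list(parent_folder_id="YOUR_FOLDER_ID")

for entity in response.results:
    # Each entry includes high-level metadata (type, label, timestamps, ...).
    print(entity)
```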
Quality of Life Improvements
Workflow Edge Type Improvements
Edges between Nodes in Workflows could appear jagged or misaligned, making it difficult to visualize connections. With this new improvement, edges now snap into straight-line connectors when they are close to horizontal.
See you in December!
That’s all for now folks. We hope you have a wonderful November, filled with lots of food & fall activities. See ya in December!
PSA - sign up for our newsletter to get these updates right in your inbox!