
How the Best Product and Engineering Teams Ship AI Solutions

Four core practices that enable teams to move 100x faster, without sacrificing reliability.


Deploying AI systems comes with unique challenges that traditional software deployment practices don't fully address. For example, how do you ensure consistency when so many variables (from data to prompts to model parameters) can impact your system's performance? What happens when something goes wrong in production? How can you iterate confidently without introducing regressions?

I’ve seen teams either get stuck in slow, cautious release cycles or, on the flip side, push updates without enough guardrails. Neither is ideal.

We’ve helped hundreds of companies ship to production, and we’ve narrowed down four core practices every modern AI team needs to ship improvements faster without sacrificing reliability.

Teams that follow these practices often cut deployment times from weeks to days (sometimes minutes), moving up to 100x faster while keeping production versions stable.

In this article, I’ll walk you through each practice, show how your team can put them to use, and explain how Vellum can help along the way.

1. Rigorous Versioning

The unique challenge with AI products is that changes often need more granular release versioning than traditional software.

With AI development, you’ll want smaller, more detailed versions that you can fully control and revert to in case of a regression. A change might be as small as tweaking a prompt or adding a guardrail, or as big as adding a new tool or agent. Every one of these changes needs to be logged and saved as a separate version of your AI solution, so you can actually control the nuanced behavior of your models.

For example, to better manage these releases, Vellum’s integrated release management controls offer custom release tags.

These tags can be reassigned to point your production, staging, or custom environment at a new prompt or workflow with zero code changes. Vellum also offers release reviews: similar to GitHub pull request reviews, an admin can leave a review on a Prompt or Workflow Deployment Release after it has been deployed. Many companies treat this as a high priority.

Preview of the release review in Vellum for each deployed LLM feature
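To make the versioning and tagging idea concrete, here is a minimal Python sketch of the underlying pattern: every change is appended as an immutable version, and an environment tag such as "production" is just a movable pointer to one version, so a rollback means re-pointing the tag. This is not Vellum’s API; `ReleaseRegistry`, `publish`, `assign`, `resolve` and the model name `model-a` are hypothetical, and a real system would persist this state rather than keep it in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Version:
    """One immutable snapshot of an AI component's configuration."""
    number: int
    prompt: str
    model: str
    note: str  # what changed, e.g. "tightened refund guardrail"


@dataclass
class ReleaseRegistry:
    """Hypothetical in-memory registry (not Vellum's API): append-only versions plus movable tags."""
    versions: List[Version] = field(default_factory=list)
    tags: Dict[str, int] = field(default_factory=dict)  # tag -> version number

    def publish(self, prompt: str, model: str, note: str) -> Version:
        # Every change, however small, becomes a new version; old versions are never edited.
        version = Version(len(self.versions) + 1, prompt, model, note)
        self.versions.append(version)
        return version

    def assign(self, tag: str, number: int) -> None:
        # Re-pointing a tag is the "deploy"; application code does not change.
        if not 1 <= number <= len(self.versions):
            raise ValueError(f"unknown version {number}")
        self.tags[tag] = number

    def resolve(self, tag: str) -> Version:
        return self.versions[self.tags[tag] - 1]


registry = ReleaseRegistry()
v1 = registry.publish("You are a helpful support agent.", "model-a", "initial prompt")
v2 = registry.publish("You are a concise support agent.", "model-a", "shorter answers")
registry.assign("production", v2.number)
registry.assign("production", v1.number)  # regression found: roll back by re-pointing the tag
print(registry.resolve("production").note)  # initial prompt
```

Because versions are never mutated, any tag can be moved back to a known-good snapshot at any time.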

Now that we know we need more detailed version control, how do we actually release all these smaller updates if our app deploys are on a set schedule?

That takes us to core practice #2.

2. Decoupled AI Deployments

AI features should be updated independently of the main application, which means decoupling your AI deployments from the rest of the app.

This approach lets everyone on the team push updates and improvements as often as needed. Because AI development is a cross-functional activity, decoupling greatly unblocks your engineers: domain experts can tweak prompts and update production without asking for code changes. With decoupled deployments, your team can reduce time-to-production from days to hours.
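One way to picture a decoupled deployment: the application never hard-codes its prompt, but asks an external config store which prompt the "production" tag currently points to, caching the answer briefly. The sketch below assumes a hypothetical `ConfigClient` and simulates the store with a dict and the clock with a list, so it runs offline; the 30-second TTL is an arbitrary choice, not a recommended value.

```python
import time
from typing import Callable, Dict, Tuple


class ConfigClient:
    """Hypothetical client that fetches the prompt currently tagged for an environment.

    `fetch` stands in for a call to whatever external store holds your AI config;
    results are cached for `ttl_seconds` so each request doesn't hit the store.
    """

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}  # tag -> (fetched_at, prompt)

    def prompt_for(self, tag: str) -> str:
        now = self._clock()
        cached = self._cache.get(tag)
        if cached is None or now - cached[0] >= self._ttl:
            # Cache miss or stale entry: pick up whatever the tag points to now.
            self._cache[tag] = (now, self._fetch(tag))
        return self._cache[tag][1]


# Simulated external store and clock, so the example runs without a network.
store = {"production": "Answer in one sentence."}
fake_now = [0.0]
client = ConfigClient(store.__getitem__, ttl_seconds=30.0, clock=lambda: fake_now[0])

print(client.prompt_for("production"))  # Answer in one sentence.
store["production"] = "Answer in two sentences."  # a prompt update, no app redeploy
fake_now[0] = 10.0
print(client.prompt_for("production"))  # still cached: Answer in one sentence.
fake_now[0] = 31.0
print(client.prompt_for("production"))  # Answer in two sentences.
```

The TTL trades freshness for fewer calls to the store: a shorter TTL picks up prompt updates sooner.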

Vellum offers features like one-click deployment for promoting validated changes to production.

This is extremely useful. Woflow’s engineering team, Michael and Jordan, rolls out up to 20 AI updates a week, all without redeploying their core system or disrupting the rest of the app. As soon as there’s user feedback or a need for a fix, the product team can deploy updates immediately.

"Using Vellum, we can now update our AI up to 20 times a week without re-deploying our main application," - Jordan Nemrow, CTO at Woflow
Preview of a data extraction workflow that has a decoupled deployment in production

3. Automated Testing

In a typical CI/CD process, your code must pass a suite of tests before it’s merged into main, and each test expects the same outputs for a known set of inputs.

With AI solutions, it's not that simple. Outcomes vary, so tests need flexibility. Instead of exact answers, you're evaluating behaviors, reasoning, and decision-making (e.g., tool selection). This requires nuanced success criteria like scores, ratings, and user satisfaction, not just pass/fail tests. Notably, it’s common for an update to improve performance for one type of user query but degrade performance for another.

That’s why you need to test your whole workflow from start to finish, as well as each AI component on its own.
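Here is a minimal sketch of graded, behavior-based scoring in Python. The rubric (half credit for picking the expected tool, half for the fraction of required facts mentioned) and the `EvalCase`, `Output` and `score` names are hypothetical stand-ins for whatever metrics your team defines; the point is that scores are averaged per query category, so a regression on one kind of query is not hidden by gains on another.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class EvalCase:
    category: str              # type of user query, e.g. "billing" or "returns"
    expected_tool: str         # tool the agent should select
    required_facts: List[str]  # facts a good answer must mention


@dataclass
class Output:
    tool: str
    text: str


def score(case: EvalCase, out: Output) -> float:
    """Graded score in [0, 1] instead of exact-match pass/fail (hypothetical rubric)."""
    tool_score = 1.0 if out.tool == case.expected_tool else 0.0
    text = out.text.lower()
    hits = sum(fact.lower() in text for fact in case.required_facts)
    fact_score = hits / len(case.required_facts) if case.required_facts else 1.0
    return 0.5 * tool_score + 0.5 * fact_score


def scores_by_category(cases: List[EvalCase], outputs: List[Output]) -> Dict[str, float]:
    # Averaging per category keeps a regression in one query type from
    # being hidden by an improvement in another.
    buckets: Dict[str, List[float]] = defaultdict(list)
    for case, out in zip(cases, outputs):
        buckets[case.category].append(score(case, out))
    return {category: mean(values) for category, values in buckets.items()}


cases = [
    EvalCase("billing", "lookup_invoice", ["due date", "amount"]),
    EvalCase("returns", "create_return", ["30 days"]),
]
outputs = [
    Output("lookup_invoice", "The amount is $40, due date March 3."),
    Output("search_faq", "Returns are accepted within 30 days."),
]
print(scores_by_category(cases, outputs))  # {'billing': 1.0, 'returns': 0.5}
```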

Here's where Vellum really shines: our release mechanism is tightly integrated with our evaluation suite. Before promoting a new version to production, teams can automatically run predefined test cases against the release candidate and compare its performance metrics directly against the currently deployed version.

Preview of an aggregate eval view in Vellum, where you can easily notice Improvements/Regressions
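The release gate described above can be sketched as a comparison of per-category eval scores between the deployed version and a release candidate. This is an illustrative assumption about how such a check might work, not how Vellum implements it; `promotion_gate` and the default 0.02 tolerance are hypothetical.

```python
from typing import Dict, List, Tuple


def promotion_gate(current: Dict[str, float], candidate: Dict[str, float],
                   tolerance: float = 0.02) -> Tuple[bool, List[str]]:
    """Decide whether a release candidate may replace the deployed version.

    Both arguments map a test-case category to its mean eval score. The
    candidate is blocked if any category drops by more than `tolerance`,
    even when its overall average improves. A category missing from the
    candidate's results is treated as a score of 0.0.
    """
    regressions = [
        f"{category}: {current[category]:.2f} -> {candidate.get(category, 0.0):.2f}"
        for category in sorted(current)
        if candidate.get(category, 0.0) < current[category] - tolerance
    ]
    return (not regressions, regressions)


deployed = {"billing": 0.80, "returns": 0.90}
candidate = {"billing": 0.95, "returns": 0.85}  # better on billing, worse on returns
ok, problems = promotion_gate(deployed, candidate)
print(ok, problems)  # False ['returns: 0.90 -> 0.85']
```

In this example the candidate’s average score rises from 0.85 to 0.90, yet the gate still blocks it because the returns category regresses.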

Redfin saved hundreds of hours by rigorously evaluating their “Ask Redfin” chatbot before deploying it in 14 markets across the U.S.

Using Vellum to test prompts and model setups let us move way faster—no waiting on engineering or code changes. More people could experiment, and once things worked, engineers took it live. The Vellum team saved us hundreds of hours. - Sebi Lozano, Senior Product Manager at Redfin

This process will prepare you for production, but expect plenty of edge cases once you’re live. That brings us to the last practice: detailed tracing.

4. Detailed Tracing

With AI products, small changes like tweaking a prompt, model, or your RAG setup can have a big impact on the output. Traditional version control or logging doesn’t catch those nuances. You need to trace at the level of inputs, outputs, model versions, and even evaluation results to understand what actually changed and why the behavior shifted.

Check out the preview below to see how Vellum solves this. You can trace every step your AI takes in production for any workflow, which lets you catch errors in production and respond to user feedback much faster.

Preview of the tracing view in Vellum
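To show what tracing at the level of inputs, outputs, and model versions can look like, here is a minimal Python sketch that records one span per workflow step, including its inputs, output, the prompt or model version used, latency, and any error. The `Trace` and `Span` classes and the version labels are hypothetical, and the two steps are stand-ins for real model calls; production tracing tools such as OpenTelemetry follow a similar span-per-step model.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Span:
    step: str
    model_version: str
    inputs: Dict[str, Any]
    output: Any = None
    error: Optional[str] = None
    duration_ms: float = 0.0


@dataclass
class Trace:
    """Hypothetical trace of one workflow execution: one Span per step."""
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: List[Span] = field(default_factory=list)

    def run(self, step: str, model_version: str, fn: Callable[..., Any], **inputs: Any) -> Any:
        span = Span(step, model_version, dict(inputs))
        start = time.perf_counter()
        try:
            span.output = fn(**inputs)
            return span.output
        except Exception as exc:
            # Record the failure on the span, then let it propagate as usual.
            span.error = repr(exc)
            raise
        finally:
            # Runs on success and failure, so every step leaves a span behind.
            span.duration_ms = (time.perf_counter() - start) * 1000
            self.spans.append(span)


# Stand-ins for real model calls, so the sketch runs offline.
def classify(query: str) -> str:
    return "billing" if "invoice" in query else "other"


def draft_answer(query: str, intent: str) -> str:
    return f"[{intent}] Let me look up that invoice."


trace = Trace()
intent = trace.run("classify", "classifier-v3", classify, query="Where is my invoice?")
answer = trace.run("answer", "answer-prompt-v12", draft_answer,
                   query="Where is my invoice?", intent=intent)
for span in trace.spans:
    print(span.step, span.model_version, span.inputs, "->", span.output)
```

Because each span stores the version that produced it, a behavior change in production can be traced back to the specific prompt or model update that caused it.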

Rely Health, a health-tech company, pushes updates 100x faster using the tracing view, which lets them customize their healthcare solutions for every clinic or hospital they work with.

"We create dozens of AI workflows—easily 7-8 per client, covering patient phone calls, provider calls, and more. Vellum has been a huge help because we need to rapidly iterate and deploy. We can take a given execution, drop it into our evaluation table, and keep moving fast." - Prithvi, CTO at Rely Health.

Results: 100x faster AI updates

Let’s take a look at how applying these core practices is driving value for our customers today.

Woflow

Take Woflow, for example. They use Vellum to manage critical AI pipelines that ingest structured data from product catalogs (images and PDFs) during merchant onboarding. Using Vellum, they’ve reported:

| Value Metric | Outcome with Vellum | Improvement |
|---|---|---|
| AI Development Cycle | Faster Build and Iterate | ⚡️ 50% Faster |
| Error Resolution | Instant Fixes | 🛠️ Real-Time Updates |
| Inference Latency | 50% Lower | ⏱️ 50% Reduction |
| Production Reliability | 90,000+ Requests/Month | High Volume, No Issues |

Rely Health

Or take Rely Health as another example. They equip care navigators with the latest AI technology, delivering personalized voice AI agents that automate tens of thousands of routine, lower-acuity calls for navigation teams.

By leveraging Vellum’s complete suite, they’ve reported:

| Value Metric | Outcome | Improvement |
|---|---|---|
| AI Update Speed | 100x Faster AI Updates | ⚡️ Much Faster |
| Clinician Trust | Higher Trust from Clinicians | 👍 More Confidence |
| Issue Detection | Spot Issues Almost Instantly | 🔍 Real-Time Visibility |

These results aren’t outliers. They showcase why following these practices will enable the whole team to move faster, and ship more reliably.

Summary

I encourage technical teams and leaders to think through their current AI deployment strategy. Ask yourself: How long does it take to update an AI component in production? How confident are you in those updates? What happens today if you notice a regression in production?

If the answers to these questions reveal gaps in your process, it may be time to consider modernizing your approach. The organizations that master AI release management will be the ones that can iterate faster and deliver more reliable AI experiences.

ABOUT THE AUTHOR
Mathew Pregasen
Technical Contributor

Mathew Pregasen is a technical expert with experience in AI, infrastructure, security, and frontend frameworks. He contributes to multiple technical publications and is an alumnus of Columbia University and Y Combinator.

Last updated: May 28, 2025