
Fine-tuning open source models: why is it relevant now?


Five months ago, we wrote a blog on when fine-tuning may be a good idea for your LLM application: there were clear cost and latency benefits for specialized tasks. However, five months is a long time in the world of LLMs! Since then, retrieval augmented generation (RAG) has become far more popular, and fine-tuning isn't supported on the latest instruction-tuned models from OpenAI or Anthropic. More recently, though, fine-tuning has started to make a comeback, coinciding with the rise of open source models. New open source models are being released quickly, with the hotly anticipated Llama 2 coming out yesterday (other top models include Falcon-40B and MPT-30B). And these models are very well suited for fine-tuning.

Why You Should Fine-Tune

"Prompt and prosper" may seem like the ideal mantra for working with LLMs, but eventually you'll find that relying exclusively on prompts can paint you into a corner. The initial ease of using prompts often gives way to challenges that become more pronounced over time. High costs, sub-optimal handling of edge cases, limited personalization, high latency, a tendency towards hallucination, and the gradual erosion of your competitive advantage are all potential issues that can take the sheen off your LLM deployment.

Enter fine-tuning: a method that enables you to optimize your LLMs for specific tasks, resulting in lower costs, improved accuracy, and lower latency. In the following sections, we'll explore fine-tuning in more depth and show why it's likely to be an important technique moving forward.

What is Fine-Tuning?

In the realm of AI (not just LLMs), fine-tuning involves training a pre-existing model on a smaller, task-specific dataset to adapt it to a particular task or domain.

The foundation model, a pre-trained LLM, serves as the initial starting point. The weights of this network are then further optimized based on the data specific to the task at hand. This process allows the model to develop a nuanced understanding of the particular context and language patterns it's being fine-tuned for.

The result is a model that uses its pre-trained proficiency in general language to become an expert in your specific application, thanks to the additional layer of learning imparted through fine-tuning. In essence, fine-tuning is a process of specialization that enhances the general skills of a language model to perform better on task-specific applications.
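To make the mechanics concrete, here is a deliberately tiny, dependency-free sketch of the idea: start from "pretrained" weights and run a few further gradient steps on a small task-specific dataset. A real LLM has billions of weights and would be tuned with a training library, but the mechanism, continuing optimization from an existing starting point, is the same in spirit. The one-parameter model and all numbers below are purely illustrative.

```python
# Toy illustration of fine-tuning: continue gradient descent on a
# pre-existing ("pretrained") weight using a small task-specific dataset.

def fine_tune(w, task_data, lr=0.05, epochs=50):
    """Run further SGD steps on an already-initialized weight w."""
    for _ in range(epochs):
        for x, y in task_data:
            pred = w * x
            grad = 2 * (pred - y) * x   # d/dw of the squared error
            w -= lr * grad
    return w

# "Pretrained" weight, as if learned on broad general-purpose data.
pretrained_w = 1.0

# Small task-specific dataset where the true relationship is y = 3x.
task_data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

fine_tuned_w = fine_tune(pretrained_w, task_data)
print(round(fine_tuned_w, 2))  # converges toward 3.0
```

The pretrained weight isn't discarded; it is the starting point, and the task data nudges it toward the specialized behavior. Fine-tuning an LLM does the same thing across billions of weights at once.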

The Resurgence of Fine-Tuning with Open Source Models

The AI industry is moving fast, and new developments constantly make us rethink our strategies. Recently released high-quality open source models are doing just that.

The reason for this renewed interest lies in their performance. Open source models are showing potential that can be harnessed using fine-tuning, making them an attractive choice for LLM applications. By employing your own data, you can tune these models to align better with your specific needs. This move not only adds an extra layer of specialization to the model but also empowers you to maintain control of your AI strategy.

Advantages and Disadvantages of Fine-Tuning

Before we get too deep into fine-tuning, it's crucial to understand its benefits and potential drawbacks. Later, we'll share a step-by-step guide to fine-tuning.

Benefits of Fine-Tuning

  1. Improved performance on specific tasks: By tailoring the model to your specific requirements, fine-tuning can result in a significant performance boost.
  2. Lower cost / latency: Because the task instructions are baked into the model's weights, you no longer need to send the same lengthy prompt with every request. Fewer tokens per request means lower cost and lower latency.
  3. Enhanced privacy: Since fine-tuning uses your own data and the resulting model can be deployed in your own environment, it adds an extra layer of privacy to your operations.
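The cost benefit is easy to quantify. The sketch below uses entirely hypothetical numbers (the token counts, request volume, and per-token price are made up for illustration) to show how dropping a long, repeated instruction prompt compounds across request volume.

```python
# Back-of-the-envelope cost comparison. All numbers are hypothetical.
# Prompting a general model means resending long instructions and few-shot
# examples on every request; a fine-tuned model only needs the raw input.

PRICE_PER_1K_TOKENS = 0.002      # hypothetical price, USD
REQUESTS_PER_DAY = 100_000       # hypothetical volume

prompted_tokens_per_request = 1_200   # instructions + examples + input
finetuned_tokens_per_request = 200    # input only

def daily_cost(tokens_per_request):
    return tokens_per_request / 1_000 * PRICE_PER_1K_TOKENS * REQUESTS_PER_DAY

saving = daily_cost(prompted_tokens_per_request) - daily_cost(finetuned_tokens_per_request)
print(f"Daily input-token saving: ${saving:.2f}")
```

With these made-up numbers the saving is $200 per day on input tokens alone; plug in your own prompt lengths and pricing to estimate your case.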

However, there are also some challenges to keep in mind.

Challenges with Fine-Tuning

  1. Time consuming: Fine-tuning a model requires a significant time investment. This includes training and optimization time, in addition to determining the best practices and techniques for your approach.
  2. Specific expertise needed: Fine-tuning is a difficult task (often why users turn to prompting despite lower performance on specific tasks). Achieving optimal results typically requires considerable knowledge and expertise in data preparation, training, inference techniques, and so on.
  3. Infrastructure overhead: Fine-tuning an LLM on a large dataset can be a costly process, often requiring a complex setup and expensive GPU resources.
  4. Lack of contextual knowledge: Fine-tuned models are trained to perform very specific tasks and often lack the versatility demonstrated by closed source models like GPT-4.

A Step-by-Step Guide to Fine-Tuning Models

Embarking on the fine-tuning journey might seem daunting, but it doesn't have to be. Here's a straightforward guide to set you on the right path:

  1. Collect a substantial amount of quality data: Begin by collecting high-quality prompt and completion pairs. If you're already running prompts in production, store the inputs and outputs (in accordance with the provider's Terms of Service); this data is invaluable for fine-tuning later. The better your data quality, the better your fine-tuned model will be. The amount of data needed to build a well-performing model depends on the use case and the type of data.
  2. Clean your data: Strip out the repeated prompt instructions and keep only the task-specific inputs and their outputs. The goal here is to have clean, structured data.
  3. Split your dataset: Split your dataset into training and validation sets (we suggest considering how much data you actually need for validation here instead of an arbitrary 80/20 split) to evaluate the performance of your fine-tuned model.
  4. Experiment with hyper-parameters: Test different foundation models and play around with hyper-parameters like learning rate, number of epochs, etc. The goal is to find the best cost, quality, and latency tradeoff for your specific use case.
  5. Fine-tuning: Armed with your optimized parameters, it's time to fine-tune. Be prepared - each fine-tuning task can take some time to run.
  6. Use your fine-tuned model: Once fine-tuned, use your model by passing only inputs and not the original prompts.
  7. Regularly update your model: To guard against data drift and ensure your model improves over time, repeat this process as your dataset grows and as new foundation models are released.
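Steps 1 through 3 above can be sketched in plain Python. The field names and JSONL layout below are a common convention for prompt/completion training data, not a requirement of any particular training library, and the records are toy examples.

```python
import json
import random

# Steps 1-2: collected, cleaned prompt/completion pairs (toy examples).
records = [
    {"prompt": "Summarize: The meeting covered Q3 revenue...", "completion": "Q3 revenue recap."},
    {"prompt": "Summarize: The new policy takes effect in May...", "completion": "Policy effective May."},
    {"prompt": "Summarize: Support tickets rose 12% last week...", "completion": "Tickets up 12%."},
    {"prompt": "Summarize: The launch was delayed two weeks...", "completion": "Launch delayed 2 weeks."},
]

# Step 3: deterministic shuffle, then hold out a small validation slice,
# sized to your needs rather than an arbitrary 80/20 split.
random.Random(42).shuffle(records)
n_val = max(1, len(records) // 4)
val_set, train_set = records[:n_val], records[n_val:]

# Write JSONL files, one JSON object per line, a format most
# fine-tuning pipelines accept.
with open("train.jsonl", "w") as f:
    for r in train_set:
        f.write(json.dumps(r) + "\n")
with open("val.jsonl", "w") as f:
    for r in val_set:
        f.write(json.dumps(r) + "\n")

print(len(train_set), len(val_set))  # 3 1
```

From here, steps 4 and 5 hand these files to whatever training stack you've chosen, sweeping hyper-parameters against the validation set.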

Considerations to Keep in Mind

Fine-tuning is a potent tool, but like any tool, its effectiveness depends on how well you wield it. Here are some considerations to keep in mind:

  • Overfitting: Be wary of this common pitfall, where the model becomes too attuned to the training data and performs poorly on unseen data.
  • Quality of the dataset: The quality of your dataset plays a pivotal role in determining the efficacy of the fine-tuned model.
  • Hyper-parameters: Choosing the right hyper-parameters can make or break your fine-tuning process.
  • Privacy and Security Implications: Ensuring the privacy of your data during the fine-tuning process is crucial. Ensure that proper data handling and security protocols are in place.
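Guarding against overfitting usually comes down to watching validation loss and stopping when it stops improving. Here is a minimal, framework-agnostic early-stopping loop; the loss values are fabricated to simulate a run that starts overfitting partway through training.

```python
# Minimal early stopping: stop fine-tuning when validation loss has not
# improved for `patience` consecutive epochs.
def train_with_early_stopping(val_losses, patience=2):
    best, best_epoch, bad_epochs = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss is rising: likely overfitting
    return best_epoch, best

# Fabricated per-epoch validation losses: improvement, then overfitting.
simulated = [0.90, 0.70, 0.55, 0.50, 0.53, 0.58, 0.64]
epoch, loss = train_with_early_stopping(simulated)
print(epoch, loss)  # best epoch 3, loss 0.5
```

In practice you'd keep the checkpoint from the best epoch rather than the last one; most training frameworks offer this as a built-in callback.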

Conclusion and Next Steps

Fine-tuning models can provide significant benefits and solve many of the challenges associated with using large language models. Despite some potential pitfalls, with the right approach and considerations, fine-tuning can be a robust tool in your AI arsenal.

To delve even deeper into fine-tuning, consider exploring more resources on the topic, such as online courses, tutorials, and research papers. And remember, you're not alone on this journey. Need help getting started or fine-tuning your model? Feel free to reach out to me at akash@vellum.ai

ABOUT THE AUTHOR
Akash Sharma
Co-founder & CEO

Akash Sharma, CEO and co-founder at Vellum (YC W23), is enabling developers to easily start, develop, and evaluate LLM-powered apps. By talking to over 1,500 people at varying levels of maturity in using LLMs in production, he has acquired a unique understanding of the landscape, and is actively sharing his learnings with the broader LLM community. Before starting Vellum, Akash completed his undergrad at the University of California, Berkeley, then spent 5 years at McKinsey's Silicon Valley Office.

Last updated: Jul 20, 2023