Anita Kirkovska

Founding Growth Lead

75 posts

Articles by Anita Kirkovska

Understanding Logprobs: What They Are and How to Use Them

Learn what OpenAI's logprobs are and how you can use them in your LLM applications.

Guides · Dec 3, 2025 · 7 min
Document Data Extraction in 2026: LLMs vs OCRs

The right choice depends on your specific needs, document types, and business requirements.

Guides · Dec 3, 2025 · 10 min
GPT-5 Benchmarks

See how GPT-5 performs across benchmarks, with a strong focus on health.

Guides · Dec 3, 2025 · 5 min
Google's AP2: A new protocol for AI agent payments

How verifiable mandates are creating a secure foundation for AI-driven commerce.

Guides · Dec 3, 2025 · 7 min
A Guide to LLM Observability

Think your APM tool has your AI covered? Think again. LLMs need their own observability playbook.

Guides · Oct 17, 2025 · 20 min
OpenAI's Agent Builder Explained

A breakdown of OpenAI’s new Agent Builder and what it signals for the future of building and deploying AI agents.

All · Oct 6, 2025 · 8 min
Zero-Shot vs Few-Shot prompting: A Guide with Examples

Exploring zero-shot & few-shot prompting: usage, application methods, and limits.

Guides · Sep 23, 2025 · 7 min
Chain of Thought Prompting (CoT): Everything you need to know

We break down when Chain-of-Thought adds value, when it doesn’t, and how to use it in today’s LLMs.

Guides · Sep 22, 2025 · 13 min
Build AI Products Faster: Top Development Platforms Compared

Compare top AI platforms for fast, reliable development in 2025.

LLM basics · Sep 19, 2025 · 13 min
Understanding your agent’s behavior in production

You can’t improve what you can’t see, so start tracking every decision your agent makes.

Guides · Sep 15, 2025 · 16 min
How can agentic capabilities be deployed in production today?

A practical guide to deploying agentic capabilities: what works, what doesn’t, and how to keep it reliable in prod.

Guides · Sep 7, 2025 · 9 min
Partnering with Composio to Help You Build Better AI Agents

Building AI agents is 10x easier with 10,000+ tools and built-in LLM tooling support.

Guides · Aug 12, 2025 · 5 min
OpenAI o3 vs gpt-oss 120b

Another eval confirming a ~90% cost discount with top-tier performance from gpt-oss 120b.

Model Comparisons · Aug 6, 2025 · 8 min
How to craft effective prompts

A curated list of best practices, techniques and practical advice on how to get better at prompt engineering.

Guides · Aug 5, 2025 · 17 min
Subliminal Learning in LLMs

LLMs carry hidden traits in their data, and we have no idea how.

Guides · Jul 27, 2025 · 6 min
Introducing Vellum Agent Builder

Go from idea to AI workflow in seconds and continue to build in the UI or your IDE.

Product Updates · Jul 18, 2025 · 4 min
Introducing Custom Docker Images & Custom Nodes

Complete control over the business logic and runtime of your AI workflows in Vellum.

Product Updates · Jul 15, 2025 · 6 min
Big Ideas from the AI Engineer World’s Fair

What’s shaping AI products, agents, and infrastructure in 2025.

LLM basics · Jun 8, 2025 · 12 min
10 Humanloop Alternatives in 2025

A side-by-side look at Humanloop and 10 other LLM platforms.

Guides · Jun 3, 2025 · 16 min
Evaluation: Claude 4 Sonnet vs OpenAI o4-mini vs Gemini 2.5 Pro

Analyzing the differences in performance, cost, and speed between the world's best reasoning models.

Model Comparisons · May 23, 2025 · 9 min
How to connect a Vellum AI Workflow with your Lovable app

Build a functional chatbot using Vellum AI Workflows and Lovable with just a few prompts.

Guides · May 13, 2025 · 6 min
How to evaluate an LLM evaluation framework

A quick guide to picking the right framework for testing your AI workflows.

Guides · Apr 24, 2025 · 7 min
Evaluating models on adaptive reasoning, SAT questions & real-world classification tasks

Evaluating whether SOTA models can really reason.

Uncategorized · Apr 14, 2025 · 2 min
Four Reasons Enterprise AI Projects Get Stuck

A wake-up call not to underestimate the unique challenges of working with LLMs.

Guides · Apr 14, 2025 · 5 min
MCP: The Hype vs. Reality

LLMs are stepping outside the sandbox. Should you let them?

Guides · Apr 9, 2025 · 5 min
How Drata built an enterprise-grade AI solution with Vellum

See how Drata leveraged Vellum to build enterprise-grade AI workflows that enhance GRC automation.

Customer Stories · Mar 18, 2025 · 6 min
Native integration with IBM’s Granite models

Support for IBM Granite models in Vellum.

Product Updates · Mar 1, 2025 · 2 min
GPT-4.5 vs Claude 3.7 Sonnet

Comparing GPT-4.5 and Claude 3.7 Sonnet on cost, speed, SAT math equations, and adaptive reasoning skills.

Model Comparisons · Feb 28, 2025 · 9 min
GPT 4.5 is here: Better, but not the best

Feels more natural, hallucinates less, can be persuaded—and it’s not a game-changer.

Guides · Feb 27, 2025 · 7 min
Claude 3.7 Sonnet vs OpenAI o1 vs DeepSeek R1

Learn how Anthropic's latest model compares to similar top-tier reasoning models on the market.

Model Comparisons · Feb 25, 2025 · 9 min
How RelyHealth Deploys Healthcare AI Solutions 100x Faster

Learn how Vellum enables Rely Health to rapidly build, test, and deploy AI-powered patient care solutions.

Customer Stories · Feb 20, 2025 · 6 min
How Revamp Reliably Runs 15M+ LLM Executions in Production

Learn how to optimize prompt versioning, debug efficiently, and make real-time updates to boost AI performance.

Customer Stories · Feb 10, 2025 · 6 min
Claude 3.7 Sonnet: Can It Actually Reason?

Evaluating the 'thinking' of Claude 3.7 Sonnet and other reasoning models to understand how they really reason.

Guides · Jan 30, 2025 · 13 min
Analysis: OpenAI o1 vs DeepSeek R1

Explore how o1 and R1 perform on well-known reasoning puzzles—now tested in new contexts.

Model Comparisons · Jan 30, 2025 · 9 min
Breaking down the DeepSeek-R1 training process—no PhD required

Learn how DeepSeek achieved OpenAI o1-level reasoning with pure RL and solved issues through multi-stage training.

Guides · Jan 24, 2025 · 11 min
What to do when an LLM request fails

Rate limiting and downtime are common issues with LLMs — here’s how to manage them in production.

Guides · Dec 16, 2024 · 7 min
Llama 3.3 70b vs GPT-4o

Learn how Meta's latest model, Llama 3.3 70B, compares to GPT-4o on three tasks.

Model Comparisons · Dec 10, 2024 · 8 min
Native support for SambaNova inference in Vellum

Now you can run Llama 3.1 405B at 200 t/s via SambaNova on Vellum!

Product Updates · Dec 9, 2024 · 2 min
AI Development Survey: Help us build the ultimate AI changelog

Share your AI process in our 4-minute anonymous survey. Get early insights and a chance to win a MacBook M4 Pro.

LLM basics · Nov 25, 2024 · 3 min
Announcing Native Support for Cerebras Inference in Vellum

Starting today, you can unlock 2,100 t/s with Llama 3.1 70B in Vellum for real-time AI apps.

Product Updates · Oct 24, 2024 · 4 min
How Glowing Personalizes Hospitality Experiences with AI

Discover how Glowing leverages Vellum's Workflows to create innovative AI solutions for the hospitality industry.

Customer Stories · Oct 1, 2024 · 5 min
OpenAI o1: Prompting Tips, Limitations, and Capabilities

Learn how to prompt OpenAI o1 models, understand their limits and the opportunities ahead.

Guides · Sep 13, 2024 · 6 min
LLM Benchmarks: Overview, Limits and Model Comparison

Understand the latest benchmarks, their limitations, and how models compare.

Guides · Sep 11, 2024 · 12 min
How Woflow Decoupled AI Updates for 50% Faster Delivery — Without the Infra Stress

Learn how Woflow sped up AI development by 50% — making it easier to handle errors, improve models and ship updates.

Customer Stories · Sep 10, 2024 · 7 min
How this EdTech Company Made AI Development 10x Faster with Vellum

Explore how a leading EdTech company saves 50 eng hours per month and empowers everyone on the team to contribute.

Customer Stories · Aug 28, 2024 · 7 min
The 6 Stages for Successful AI Implementation

Learn critical strategies to build and launch AI systems quickly and reliably.

Guides · Aug 20, 2024 · 10 min
How Vellum Helped Odyseek Build Smarter AI Faster

Learn how Odyseek used Vellum to simplify AI development and improve team collaboration.

Customer Stories · Aug 16, 2024 · 4 min
Llama 3.1 405b vs Leading Closed-Source Models

Discover how Llama 3.1 405B stacks up against GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet on three tasks.

Model Comparisons · Jul 26, 2024 · 8 min
Evaluation: Llama 3.1 70B vs. Comparable Closed-Source Models

Explore Llama 3.1 70B's upgrades and see how it stacks up against same-tier closed-source models.

Model Comparisons · Jul 24, 2024 · 8 min
Claude 3.5 Sonnet vs GPT-4o

Learn how Claude 3.5 Sonnet compares to GPT-4o on data extraction, classification and verbal reasoning tasks.

Model Comparisons · Jun 25, 2024 · 9 min
Llama 3 70B vs GPT-4: Comparison Analysis

Find out how Llama 3 70B stacks up against GPT-4 in terms of cost, speed, and performance on specific tasks.

Model Comparisons · May 8, 2024 · 12 min
Rentgrata's Test Driven Journey to a Production-Ready Chatbot

Learn how Rentgrata used Vellum to evaluate their chatbot and cut development time in half.

Customer Stories · May 2, 2024 · 4 min
LlamaIndex vs LangChain Comparison

Discover the main differences between LangChain and LlamaIndex, and when to use each.

Guides · May 1, 2024 · 13 min
RAG vs Fine-Tuning: How to Choose the Right Technique?

Learn how RAG compares to fine-tuning, and how each technique impacts LLM performance.

Guides · Apr 30, 2024 · 10 min
Tutorial: Setting Up OpenAI Function Calling with Chat Models

Learn how to use OpenAI function calling in your AI apps to enable reliable, structured outputs.

Guides · Apr 23, 2024 · 6 min
How Autobound Achieved a 20x Faster End-to-End LLM Iteration Cycle

Iterating on prompts using OpenAI's playground & Azure AI Studio was challenging, until Autobound discovered Vellum.

Customer Stories · Apr 11, 2024 · 4 min
Redfin's Test Driven Development Approach to Building an AI Virtual Assistant

Discover how Redfin used Vellum to develop and evaluate a production-ready AI assistant, now live in 14 markets.

Customer Stories · Apr 9, 2024 · 7 min
How to Count Tokens Before you Send an OpenAI API Request

Learn how to use Tiktoken and Vellum to programmatically count tokens before running OpenAI API requests.

Guides · Mar 27, 2024 · 6 min
Getting Started with Prompt Chaining

Learn how to improve LLM outputs and make your setup more reliable using prompt chaining.

Guides · Mar 26, 2024 · 5 min
How to Evaluate Your RAG System?

Learn how to use retrieval and content generation metrics to consistently evaluate and improve your RAG system.

Guides · Mar 8, 2024 · 5 min
How can I get GPT-3.5 Turbo to follow instructions like GPT-4?

Learn prompt engineering tips to make GPT-3.5 perform as well as GPT-4.

Guides · Feb 15, 2024 · 10 min
How Lavender cut latency by half for 90K monthly requests in production

Learn how Lavender develops and manages more than 20 LLM features in production.

Customer Stories · Feb 13, 2024 · 5 min
Prompt Engineering Guide for Claude Models

Learn how to prompt Claude with these 11 prompt engineering tips.

Guides · Feb 2, 2024 · 10 min
How Codingscape improved time-to-market for their AI apps

Learn how Vellum helped Codingscape ship AI apps quicker and win more projects.

Customer Stories · Feb 1, 2024 · 5 min
How can I use LLMs to classify user intents for my chatbot?

Learn how to build and evaluate intent handler logic in your chatbot workflow.

Guides · Jan 11, 2024 · 6 min
3 Strategies to Reduce LLM Hallucinations

Methods and techniques to reduce hallucinations and maintain more reliable LLMs in production.

Guides · Jan 3, 2024 · 7 min
Four LLM hallucinations and ways to fix them

What LLM hallucination is, the four most common hallucination types, and their causes.

Guides · Jan 1, 2024 · 5 min
Classifying Customer Tickets using Gemini Pro

Comparing the performance of Gemini Pro with zero-shot and few-shot prompting when classifying customer support tickets.

Guides · Dec 20, 2023 · 4 min
Best Model for Text Classification: Gemini Pro, GPT-4 or Claude 2?

Comparing GPT-3.5 Turbo, GPT-4 Turbo, Claude, and Gemini Pro on classifying customer support tickets.

Model Comparisons · Dec 13, 2023 · 8 min
Tree of Thought Prompting: What It Is and How to Use It

Learn how to use Tree of Thought prompting to improve LLM results.

Guides · Nov 30, 2023 · 5 min
User Confidence in OpenAI vs. Alternative models/providers

Discover how recent OpenAI developments have influenced user confidence and interest in OpenAI alternatives.

Guides · Nov 28, 2023 · 8 min
First impressions with the Assistants API

Assistants API: easy assistant setup with memory management, but what's under the hood?

Guides · Nov 16, 2023 · 8 min
The ABC’s of Multimodal AI: Models, tasks and use-cases

How to use Multimodal AI models to build apps that solve new tasks and offer unique experiences for end users.

Guides · Nov 6, 2023 · 7 min
Automatic data labeling with LLMs

LLMs can label data at the same or better quality compared to human annotators, but ~20x faster and ~7x cheaper.

Guides · Nov 2, 2023 · 9 min
How Narya's team uses Vellum for auto data labeling & deployments

Learn how Vellum helped Narya.AI save time and make AI easy for everyone on their team.

Customer Stories · Oct 25, 2023 · 4 min