Creating an AI chatbot involves more than just using a model API and adding context with your data.
You should also consider the various intents users might have and how to manage them.
In this blog post, we'll outline the steps to set up and evaluate intent detection in your chatbot workflow.
Intent detection is a task that identifies a user's intention or desired outcome from their query. This task is essential for AI chatbots to provide accurate and relevant responses.
A very standard intent detection workflow uses a pre-trained LLM model, a prompt with instructions and context, and a handler logic for each of the intents.
Let’s look at a real example to illustrate the need for this step in a chatbot workflow.
Imagine that you’re building a customer service chatbot for an e-commerce business. Your visitors can ask the chatbot any question. Before the chatbot responds or takes any action, it needs to accurately understand the visitor's intent.
To keep this example simple, lets say that you have defined 5 intent categories: "Order Status", "Product Information", "Payments", "Returns" and "Feedback”.
For each category, there should be a distinct step where the LLM powered chatbot, figures out the user's intent. It does this by placing the user's question into the right category. After identifying the intent, the chatbot can then take the next appropriate actions for that particular category.
Having separate steps for the prompts and intent handlers is useful because each of your intents might need to do different actions. For example: “Returns” might need to be handled by an external service/API that a handler action should call, and the handler for “Product information” might just call an LLM and a context doc to answer with text response. Also, adding too many instructions in one prompt can also influence the performance.
Identifying these intents accurately allows the chatbot to respond better, call an external API or route the query to the correct personnel for further assistance.
In the next section we show you how to implement it.
To build a reliable intent detection for your chatbot you need to cover 4 critical steps:
- Defining intents
- Setting up the intent detection prompt
- Setting up handler logic prompts
- Testing and evaluating prompts/models
We give more details on these in the following sections.
Identify the main reasons users interact with your chatbot, like asking for help or making a purchase.
Group these reasons into categories, or 'intents', using insights from customer interactions and FAQs.
After you have your intents, you should start drafting the system prompt that will be used to classify the user’s query. Make sure to give clear directions, and follow best prompt engineering practices. Here’s a simple example for the system prompt:
* For more reliable outputs you can also consider using function calling with your models, so that you always get a structured response from the model, one that can be used to run specific functions in your code. Learn how to set it up here.
Don’t forget to add a fallback option!
Did you notice that we added a fallback intent “Other” in the system prompt?
Adding fallback prompts is essential for handling situations where the chatbot fails to understand or correctly classify a user's intent.
Fallback prompts act as a safety net to keep users engaged, even when their query isn't a clear match. They can involve clarifying questions, rephrasing the query, or offering human assistance.
For each intent, you need to develop a response mechanism, and decide if the chatbot should perform an action like calling an API to another tool/service, or to just provide a text response.
To implement the handler logic you’ll need to build a more complex LLM chain, and use other prompts to provide text responses or to call an API to perform a specific action with external tools and services.
Testing and evaluating prompts across test cases is crucial for building reliable chatbots.
Before you push it to production, you need to be sure that your intent classifiers and handlers are working properly. You should test every intent path with various user queries and evaluate the performance of different prompt and model combinations.
For example, we recently did an experiment where we used four models and few-shot prompts to classify if a customer support ticket has been resolved or not. We had around 200 test cases, and used Vellum to evaluate our configuration at scale. Below you can see how that looked like in the product.
Let us help!
Building an intent classifier is not just a one-time setup; it’s a continuous process that requires extensive evaluation and monitoring once in production.
If you’re planning to build your custom chatbot and need assistance with the setup and evaluation, we can help.
Vellum’s platform for building production LLM apps can help you build a reliable chatbot. We provide the tooling layer to experiment with prompts and models, evaluate at scale, monitor them in production, and make changes with confidence if needed.
If you’re interested, you can book a call here. You can also subscribe to our blog and stay tuned for updates from us.
Latest AI news, tips, and techniques
Specific tips for Your AI use cases
No spam
Each issue is packed with valuable resources, tools, and insights that help us stay ahead in AI development. We've discovered strategies and frameworks that boosted our efficiency by 30%, making it a must-read for anyone in the field.
This is just a great newsletter. The content is so helpful, even when I’m busy I read them.
Experiment, Evaluate, Deploy, Repeat.
AI development doesn’t end once you've defined your system. Learn how Vellum helps you manage the entire AI development lifecycle.