AI | Dialog Systems Part 6: How to Train User Intent Classifier

In Part 5 of this series on Dialog Systems, we showed you how simple operations with word vectors can help you find words with similar meanings, among other things. You also learned that if you are fine with the “default” word meanings built into pretrained Word2vec models, you can simply download and use them.

In this part of the series, we return to a topic from Part 3: building your own classifier to extract the meaning from a user’s natural language input.


A programmer in his workplace. Image credit: StockSnap via Pixabay, free license

In case you missed the first five articles, you may want to read them before starting on the sixth part:

How to Make Your Customer Happy by Using a Dialog System?

AI | Dialog Systems Part 2: How to Develop Dialog Systems That Make Sense

AI | Dialog Systems Part 3: How to Find Out What the User Needs?

AI | Dialog Systems Part 4: How to Teach a Machine to Understand the Meaning of Words?

AI | Dialog Systems Part 5: How to Use Pretrained Models in Your NLP Pipeline

More Than Just a Keyword Matcher

As we already discussed in Part 3 of this series, a dialog system needs a classifier to extract the meaning of a user’s intent from their natural language input. By discovering this intent, the dialog system has the knowledge required to select the best response to the user.

Let’s consider the following scenario. Suppose an expanding retail chain wants to lower the burden on its support staff. The firm is considering developing a self-service dialog system to handle the questions its customers ask most frequently. These questions usually concern the opening and closing hours of the different stores, as well as their locations. In addition, people often ask to schedule an appointment or ask how to submit a job application.

Now, if you recall what we learned in Part 3, a dialog system includes an NLU module to detect meaning in the user’s utterance. The NLU module consists of two parts: the Intent Classifier and the Entity Identifier. The former uses machine learning to detect the meaning of the utterance as a whole, mainly by sifting through its verbs. Meanwhile, the Entity Identifier examines noun phrases in search of parts of the utterance with a definite meaning (dates, locations, etc.). You can find entities using either machine learning or handcrafted rules, and sometimes it makes sense to use both.
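As a minimal sketch of the handcrafted-rules option, assuming Python and two hypothetical entity types (weekdays and clock times) defined with regular expressions, a rule-based Entity Identifier could look like this:

```python
import re

# Hypothetical handcrafted rules: each entity type maps to a regular expression.
ENTITY_PATTERNS = {
    "weekday": re.compile(
        r"\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b",
        re.IGNORECASE,
    ),
    "time": re.compile(r"\b\d{1,2}(?::\d{2})?\s?(?:am|pm)\b", re.IGNORECASE),
}


def find_entities(utterance):
    """Return (entity_type, matched_text) pairs found by the rules above."""
    found = []
    for entity_type, pattern in ENTITY_PATTERNS.items():
        for match in pattern.finditer(utterance):
            found.append((entity_type, match.group(0)))
    return found


print(find_entities("Are you open on Sunday after 6 pm?"))
```

Rules like these are easy to read and fix, but they only catch phrasings someone anticipated, which is one reason to combine them with machine learning.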

Our focus in Part 6, however, is on classifying intents only. Previously we learned that an intent is a normalized form of the meaning behind the user’s utterance (in other words, a summary of what the user wants to do). An intent is usually based on some crucial verb that is not necessarily stated explicitly in the utterance.

For example, the utterance “I’ve been locked out of my account” has the same meaning as “I’d like to reset my password”, even though the former does not mention the key word (the implied verb) of the #reset_password intent. So our classifier is expected to be smarter than a plain keyword matcher: it should use contextual clues in natural language to infer the user’s intent. To start using a classifier, you first need to train it by feeding it a set of example utterances and their corresponding intents. We explain how this is done in the sections below and in upcoming articles of this series.
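To make the contrast concrete, here is a deliberately naive keyword matcher in Python. The `KEYWORDS` table and the `keyword_match` function are hypothetical illustrations, not part of any real framework:

```python
from typing import Optional

# Hypothetical keyword table: an intent fires if any of its keywords appears.
KEYWORDS = {"#reset_password": ["reset", "password"]}


def keyword_match(utterance: str) -> Optional[str]:
    """Return the first intent whose keyword occurs in the utterance, else None."""
    text = utterance.lower()
    for intent, keywords in KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return None


print(keyword_match("I'd like to reset my password"))       # matched by keyword
print(keyword_match("I've been locked out of my account"))  # same meaning, no keyword
```

The second utterance means the same thing as the first, but because it contains neither keyword, the matcher returns None. A trained classifier is expected to close exactly this gap.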

Training Your Classifier to Recognize Intents

Training is a crucial part of developing a dialog system. Good training boosts the intelligence of the dialog system and makes it more helpful to your users. Training determines virtually every success metric, whatever kind of dialog system you are creating. So, let’s look at what is required to start training your dialog system.

Returning to our hypothetical example of the retail chain, how would we envision a typical scenario for constructing a training data set and using it to teach its dialog system?

Training data always consists of inputs paired with their expected outputs. In our case, a user utterance serves as the input, and a user intent serves as the output. For instance, valid training pairs for the retail chain’s dialog system might be (“Where are you based?”, #store_location) and (“How do I reset my password?”, #reset_password).
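In code, such a training set can be as simple as a list of (utterance, intent) tuples. In this Python sketch the first two pairs come from the text above and the rest are hypothetical; grouping the examples by intent is a quick way to check that every intent has enough of them:

```python
from collections import defaultdict

# (utterance, intent) pairs: the first two come from the text, the rest are hypothetical.
TRAINING_DATA = [
    ("Where are you based?", "#store_location"),
    ("How do I reset my password?", "#reset_password"),
    ("I can't log in to my account", "#reset_password"),
    ("What time do you open on Saturday?", "#store_hours"),
    ("Is there a store near the train station?", "#store_location"),
]

# Group the examples by intent to see how many examples each intent has.
examples_by_intent = defaultdict(list)
for utterance, intent in TRAINING_DATA:
    examples_by_intent[intent].append(utterance)

for intent, utterances in sorted(examples_by_intent.items()):
    print(intent, len(utterances))
```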

To train a dialog system, you need to define the intents and supply a few examples for each. Using machine learning, the classifier in your dialog system then searches the training data for patterns it can later use to recognize the intents of previously unseen user utterances.
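To illustrate what “searching the training data for patterns” can mean, here is a minimal sketch that swaps in one simple, well-known technique: a multinomial Naive Bayes classifier written with the Python standard library. It is not necessarily the algorithm your dialog platform uses, and the training examples are hypothetical:

```python
import math
import re
from collections import Counter, defaultdict

# Tiny hypothetical training set of (utterance, intent) pairs.
TRAINING_DATA = [
    ("How do I reset my password?", "#reset_password"),
    ("I forgot my password", "#reset_password"),
    ("My account is locked", "#reset_password"),
    ("What are your opening hours?", "#store_hours"),
    ("When do you close today?", "#store_hours"),
    ("Where is your store located?", "#store_location"),
    ("Where are you based?", "#store_location"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Collect multinomial Naive Bayes statistics: word counts per intent."""
    word_counts = defaultdict(Counter)  # intent -> Counter of words
    intent_counts = Counter()           # intent -> number of examples
    vocabulary = set()
    for utterance, intent in examples:
        tokens = tokenize(utterance)
        word_counts[intent].update(tokens)
        intent_counts[intent] += 1
        vocabulary.update(tokens)
    return word_counts, intent_counts, vocabulary


def predict(model, utterance):
    """Return the intent with the highest log-probability for the utterance."""
    word_counts, intent_counts, vocabulary = model
    total_examples = sum(intent_counts.values())
    best_intent, best_score = None, -math.inf
    for intent in intent_counts:
        # Log prior: how common the intent is in the training data.
        score = math.log(intent_counts[intent] / total_examples)
        total_words = sum(word_counts[intent].values())
        for token in tokenize(utterance):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids log(0) for unseen word/intent pairs.
            count = word_counts[intent][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


model = train(TRAINING_DATA)
print(predict(model, "I've been locked out of my account"))
print(predict(model, "What time do you open?"))
```

Neither test utterance appears in the training data; the classifier decides from per-intent word statistics learned from the examples. With only seven examples these statistics are fragile, which is why real training sets need many more.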

How Do You Get Training Data?

The training process begins with finding a good source from which to collect training data. Such a source helps you find out what users want and how they ask for it. Suppose our retail chain learns that its help desk team receives many requests from users to reset their passwords. The team might assume that users usually ask to reset their passwords, but upon inspecting the actual data they may discover that users more often ask to unlock their accounts. A dialog system will perform much better if its training is based on how users actually phrase things rather than on assumptions about how they might phrase them.

An AI model learns everything it knows from its training data. The data you provide for training should therefore be as close as possible to what your dialog system will face in real life, because the quality of the training data determines how accurately the dialog system will recognize the requests of actual users. To get the best possible training data, the team eventually needs to determine three things:

  • categories of questions (i.e., intents) you are likely to get from users
  • relative frequency of the intents
  • the way these intents are stated
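The second item, the relative frequency of intents, is straightforward to compute once a sample of historical requests has been labeled. A minimal Python sketch, assuming hypothetical labels:

```python
from collections import Counter

# Hypothetical intent labels assigned to a sample of historical support requests.
labeled_requests = [
    "#reset_password", "#store_hours", "#reset_password", "#store_location",
    "#reset_password", "#store_hours", "#schedule_appointment", "#reset_password",
]

counts = Counter(labeled_requests)
total = sum(counts.values())

# Relative frequency = count of an intent / total number of labeled requests.
for intent, count in counts.most_common():
    print(f"{intent:<22} {count / total:.1%}")
```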

Suppose the team has already investigated the first two items. The picture below shows the results of this investigation: the intents are plotted in 2D according to their relative frequency and how complex they are to automate.

The best source of actual user utterances would be another dialog system already in service for the same use case. In that ideal scenario, you would have access to real data that exactly matches the use case of the dialog system under development.

For example, if you want to replace human-to-human text chat with a chat service automated through a dialog system, you can reasonably expect users to approach it the same way they talk to humans. More often than not, however, your data sources won’t be perfectly aligned with the new system, so you should be cautious about using the data you have as training data for it.

In the case of our example retail chain, suppose it can take advantage of transcripts from its human-to-human call center. This is a typical situation involving a mismatch between two communication channels, because people speak to other humans differently than they communicate with machines.

Chats between humans tend to be verbose and often drift off the point. In his book, Andrew Freed provides a very illustrative example of how a typical conversation between humans differs from one between a human and a machine [Freed, 2021]:

Dialogue 1: Human-to-Human Call Transcript

Help Desk (Human): “Thanks for calling the help desk, this is Andrew speaking. How can I help you?”

User: “Hey Andrew, thanks for taking my call – I hope you’re having a better day than I am. I’ve been trying all morning to get to my account. I’m getting an error message about my account being locked. I don’t know how to unlock it, I don’t know if I need to do something, or reset my password, or what. Can you help me?”

Help Desk (Human): “Sure, I can help with that. Let me check a few things.”

Dialogue 2: Human-to-Dialog System Typical Chat

Dialog System: “Welcome to the help desk. How can I help you?”

User: “My account is locked.”

Dialog System: “I’m sorry you can’t log in. Let’s try a few things to fix that.”

In the example above, Dialogue 1 comes from production data, while Dialogue 2 imagines what the same user might say when connected to a dialog system. The same person who has just poured out 60+ words to another human might type only four words to a chatbot.

Nobody can tell for certain how a particular person would phrase their utterance to the dialog system. We expect some of the wording to be similar and, in general, to lead to the same intent. However, it is important to realize that training our dialog system on the original long-winded utterance would not be a good idea.

To train your dialog system, you’d probably want to keep only these three assertions derived from Dialogue 1:

  • “I’m getting an error message about my account being locked.”
  • “I don’t know how to unlock my account.”
  • “Reset my password.”

All three assertions were taken directly from the user’s transcript, except for a minor modification in the second one, where “it” was replaced with the explicit “my account”.
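Shortlisting candidate sentences from long transcripts can be partly automated. The Python sketch below splits the user’s turn from Dialogue 1 into sentences and keeps those containing hypothetical cue words; the splitting rule and the `CUE_WORDS` set are assumptions for illustration, and a human still has to review and rewrite the result, for example splitting the last shortlisted sentence and replacing “it” with “my account”:

```python
import re

# The user's turn from Dialogue 1 (human-to-human call transcript).
transcript = (
    "Hey Andrew, thanks for taking my call – I hope you're having a better day than I am. "
    "I've been trying all morning to get to my account. "
    "I'm getting an error message about my account being locked. "
    "I don't know how to unlock it, I don't know if I need to do something, "
    "or reset my password, or what. Can you help me?"
)

# Hypothetical cue words, used only to shortlist sentences for a human reviewer.
CUE_WORDS = {"locked", "unlock", "password"}


def split_sentences(text):
    """Naively split text into sentences at whitespace following ., ?, or !."""
    return [s.strip() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]


def shortlist(text, cue_words):
    """Keep only the sentences that contain at least one cue word."""
    selected = []
    for sentence in split_sentences(text):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if words & cue_words:
            selected.append(sentence)
    return selected


for sentence in shortlist(transcript, CUE_WORDS):
    print(sentence)
```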

The important thing is that each of the assertions above explicitly expresses the #reset_password intent.

Now, you might wonder why we should avoid training a dialog system on vague assertions, given that users make them in real settings anyway. An assertion is vague if it does not closely match a single intent, and dialog systems learn to identify intents from definite assertions. When a user makes a vague statement, it is best to ask a simple follow-up question rather than trying to teach the system confusing statements.
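A common way to put that advice into practice, assumed here rather than taken from the sources cited, is to act on the classifier’s top intent only when its confidence clears a threshold and to ask a follow-up question otherwise. The threshold value, the `PROMPTS` wording, and the `respond` function are hypothetical:

```python
# Hypothetical confidence threshold; a real system would tune it on held-out data.
CONFIDENCE_THRESHOLD = 0.6

# Hypothetical user-facing descriptions of each intent for follow-up questions.
PROMPTS = {
    "#reset_password": "resetting your password",
    "#store_hours": "our store hours",
    "#store_location": "finding a store",
}


def respond(intent_scores):
    """Return the top intent if confident enough, otherwise a follow-up question.

    intent_scores maps each intent to a confidence value in [0, 1].
    """
    best_intent = max(intent_scores, key=intent_scores.get)
    if intent_scores[best_intent] >= CONFIDENCE_THRESHOLD:
        return best_intent
    # Too uncertain: offer the two most likely intents and let the user choose.
    top_two = sorted(intent_scores, key=intent_scores.get, reverse=True)[:2]
    options = " or ".join(PROMPTS[intent] for intent in top_two)
    return f"Just to be sure: do you need help with {options}?"


print(respond({"#reset_password": 0.85, "#store_hours": 0.10, "#store_location": 0.05}))
print(respond({"#reset_password": 0.40, "#store_hours": 0.35, "#store_location": 0.25}))
```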

Preparing to Build a Simple Classifier

OK, so from the first picture above we know that our retail chain’s dialog system will need to handle over a dozen intents. It therefore requires an intent classifier capable of assigning utterances to more than a dozen classes.

To begin with, let’s first learn how to classify utterances into just two categories: #reset_password and “not #reset_password”. Then, we will extend the task to three classes: #reset_password, #store_hours, and #store_location. Finally, we will discuss how to scale the task up to as many classes as the company needs.
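Going from several intents to the two-class task is mostly a relabeling step. A minimal Python sketch with hypothetical examples and a hypothetical `to_binary` helper:

```python
# Multi-class training data (hypothetical examples).
TRAINING_DATA = [
    ("How do I reset my password?", "#reset_password"),
    ("My account is locked", "#reset_password"),
    ("What are your opening hours?", "#store_hours"),
    ("Where are you based?", "#store_location"),
]

TARGET = "#reset_password"


def to_binary(examples, target):
    """Collapse every intent other than `target` into a single negative class."""
    return [
        (utterance, intent if intent == target else f"not {target}")
        for utterance, intent in examples
    ]


for utterance, label in to_binary(TRAINING_DATA, TARGET):
    print(f"{label:<20} {utterance}")
```

For the three-class task you would simply keep the original labels; scaling up then means adding intents and examples rather than changing the data format.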

It’s time to look inside our classifier. Consider the request “Can you help me to reset my password?” How does the classifier know that this request has the #reset_password intent rather than some other one?

From what you learned in earlier parts of this series, you already know that machine learning does not process text directly. Instead, it crunches numbers. This should not be oversimplified to zeros and ones; the point is that all the algorithms involved have a numerical foundation. The figure below should help you visualize this numerical view: a function f takes a statement as its input and produces a prediction (a class) as its output. This function f is nothing other than our classifier.
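The sketch below makes function f concrete under simplifying assumptions: a tiny hypothetical vocabulary, bag-of-words counts as the numerical representation, and hand-picked weights standing in for the ones a trained classifier would learn. It shows text turning into numbers, and a number (the index of the highest score) turning back into an intent label:

```python
import re

# Hypothetical vocabulary and hand-picked weights; a trained classifier would learn these.
VOCABULARY = ["reset", "password", "locked", "open", "hours", "where", "store"]
INTENTS = ["#reset_password", "#store_hours", "#store_location"]
WEIGHTS = [
    # reset password locked open hours where store
    [2.0, 2.0, 1.5, 0.0, 0.0, 0.0, 0.0],  # #reset_password
    [0.0, 0.0, 0.0, 2.0, 2.0, 0.0, 0.5],  # #store_hours
    [0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0],  # #store_location
]


def vectorize(text):
    """Text -> numbers: count how often each vocabulary word occurs (bag of words)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tokens.count(word) for word in VOCABULARY]


def f(text):
    """The classifier: score each intent with a dot product, return the best label."""
    x = vectorize(text)
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in WEIGHTS]
    return INTENTS[scores.index(max(scores))]


print(vectorize("Can you help me to reset my password?"))
print(f("Can you help me to reset my password?"))
```

If no vocabulary word occurs in the input, all scores are zero and this sketch falls back to the first intent; a real classifier needs a better way to handle such inputs, such as the follow-up questions discussed earlier.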

It’s time for an in-depth demonstration of how a classifier works. We will implement a straightforward text classification algorithm and observe along the way how text is transformed into numbers and then back into text. To get the most out of this demonstration, however, you need some familiarity with the principles of machine learning. So we will return to this example in the next part of our series, after first covering some basic material on neural nets.

Wrapping Up

The sixth article in this series on Dialog Systems explained what it takes to train the user intent classifier in your dialog system well, and why this matters. You learned that appropriate training makes the dialog system smarter, so it can handle requests from your users expressed in various ways.

In the next part of the series, you will be introduced to the way a machine-learning model learns from examples of inputs and expected outputs. This should help you to understand the mechanism your intent classifier uses for mapping variously expressed user statements to a known category from the limited set of expected intents.

Author’s Bio

Darius Miniotas is a data scientist and technical writer with Neurotechnology in Vilnius, Lithuania. He is also Associate Professor at VILNIUSTECH where he has taught analog and digital signal processing. Darius holds a Ph.D. in Electrical Engineering, but his early research interests focused on multimodal human-machine interactions combining eye gaze, speech, and touch. Currently he is passionate about prosocial and conversational AI. At Neurotechnology, Darius is pursuing research and education projects that attempt to address the remaining challenges of dealing with multimodality in visual dialogues and multiparty interactions with social robots.


References

  1. Andrew R. Freed. Conversational AI. Manning Publications, 2021.
  2. Rashid Khan and Anik Das. Build Better Chatbots. Apress, 2018.
  3. Hobson Lane, Cole Howard, and Hannes Max Hapke. Natural Language Processing in Action. Manning Publications, 2019.
  4. Michael McTear. Conversational AI. Morgan & Claypool, 2021.
  5. Sumit Raj. Building Chatbots with Python. Apress, 2019.
  6. Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta, and Harshit Surana. Practical Natural Language Processing. O’Reilly Media, 2020.