How to build your own chatbot

With a touch of Hugging Face and Cloud Run

The use of chatbots has increased drastically in recent years, and almost every major company now uses some form of chatbot in its business. It is not hard to understand why. Since Google released BERT, the performance of NLP models has seen impressive progress. The latest state-of-the-art model, GPT-3, has a breathtaking 175 billion parameters, which makes interactions with it almost indistinguishable from talking to a human being.

So in this article I would like to demonstrate how to build your own chatbot and wrap it as an API service.

The project is divided into two parts: first, a trained chat model that can make predictions; second, a server that can handle requests to the model.

There are several models available for building a chatbot. We will use BlenderBot, created by Facebook's research team in 2020. It is an open-domain chatbot, which means it is not built to handle a given set of backend actions, the more common business use case. Instead, the model focuses solely on imitating human communication.

We will use a small variant of the model with no additional fine-tuning, although fine-tuning is highly desirable for a more sophisticated chatbot.

We start by creating a file that we will use to download the model. To help us, we use Hugging Face's transformers, a Python library that provides various high-quality NLP models.
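A minimal sketch of such a download script, assuming the 90M-parameter `facebook/blenderbot_small-90M` checkpoint from the Hugging Face hub (the file name `download_model.py` is my own choice):

```python
# download_model.py -- fetch and cache the chat model ahead of time.
from transformers import (
    BlenderbotSmallForConditionalGeneration,
    BlenderbotSmallTokenizer,
)

MODEL_NAME = "facebook/blenderbot_small-90M"


def download():
    # Both calls download the weights on first use and cache them locally
    # (under ~/.cache/huggingface by default), so later loads skip the network.
    BlenderbotSmallTokenizer.from_pretrained(MODEL_NAME)
    BlenderbotSmallForConditionalGeneration.from_pretrained(MODEL_NAME)


if __name__ == "__main__":
    download()
```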

Then we create a Python class that handles the logic of converting our English text into the word tokens we use as inputs to the model.
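A sketch of what that wrapper class could look like; the class and method names (`ChatModel`, `get_reply`) are my own, and the generation settings are left at their defaults:

```python
from transformers import (
    BlenderbotSmallForConditionalGeneration,
    BlenderbotSmallTokenizer,
)


class ChatModel:
    """Wraps tokenization and generation behind one simple call."""

    def __init__(self, model_name: str = "facebook/blenderbot_small-90M"):
        # Loads from the local cache if the model was pre-downloaded.
        self.tokenizer = BlenderbotSmallTokenizer.from_pretrained(model_name)
        self.model = BlenderbotSmallForConditionalGeneration.from_pretrained(model_name)

    def get_reply(self, last_conversations: list) -> str:
        # Join the previous turns into one context string, tokenize it,
        # and let the model generate a response.
        context = "  ".join(last_conversations)
        inputs = self.tokenizer([context], return_tensors="pt", truncation=True)
        reply_ids = self.model.generate(**inputs)
        return self.tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0]
```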

We then build a Flask API with two endpoints: one for checking that the service is up, and one for talking to our chatbot.
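A minimal sketch of those two endpoints, with the model reply stubbed out so the example runs standalone. The route names (`/health`, `/chat`) are assumptions; the JSON field `lastConversations` matches the curl request shown later in the article:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def get_reply(last_conversations):
    # Stubbed here so the sketch runs on its own; in the real service this
    # would call the blenderbot wrapper class from the previous step.
    return "stub reply"


@app.route("/health", methods=["GET"])
def health():
    # Lightweight endpoint for checking that the service is alive.
    return jsonify({"status": "ok"})


@app.route("/chat", methods=["POST"])
def chat():
    # The client sends the recent conversation turns as a JSON list.
    payload = request.get_json(force=True)
    reply = get_reply(payload["lastConversations"])
    return jsonify({"botResponse": reply})
```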

Finally, we write a Dockerfile that pre-downloads the chat model at build time, so the API can respond quickly instead of reloading the model on every request. This drastically improves the performance of our bot. To host the API we use gunicorn as our WSGI server, with no additional web server in front.
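A minimal sketch of such a Dockerfile, assuming the download script is named `download_model.py` and the Flask application object lives in `app.py` (both names are my assumptions):

```dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
# Pre-download the model at build time so the weights are baked into the
# image and containers do not fetch them on every cold start.
RUN python download_model.py

# Cloud Run sends traffic to port 8080 by default.
CMD exec gunicorn --bind :8080 --workers 1 --threads 1 app:app
```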

From our local machine to production

The steps from running your model on your local machine to having it running in production can seem daunting. However, several services have made this a lot easier in recent years.

We are going to use Google Cloud Run for this project, Google's "serverless" platform. I don't like the word serverless, since of course there has to be a server running the code somewhere, but the platform is serverless in the sense that no client data is kept from one session to the next, and we get whatever server happens to be available at any given time.

In order to run our code on Google Cloud Run, we have to provide a Docker image, which we create with the build command. Make sure to execute the command in the folder where the Dockerfile is located.

docker build -t <docker_name> .

Next we push our image to Google Container Registry. This can be done directly in the web GUI or, as we do here, via the gcloud SDK.

gcloud builds submit --tag gcr.io/<PROJECT-ID>/<docker_name>

Now we are almost ready to go. The last step is to create our Cloud Run service, which again can be done with either the GUI or the gcloud SDK. Here we specify two CPUs and 4 GiB of RAM for each container running our Docker image.

gcloud run deploy chatbot \
  --image gcr.io/$PROJECT_ID/chatbot:latest \
  --platform=managed \
  --concurrency=1 \
  --cpu=2 \
  --memory=4Gi

If you set everything up correctly, you will receive a URL with the address of your service, which we use in the request below.

Now, let’s try sending a CURL request to our chatbot.

Hi, how are you?

curl --location --request POST '' \
  -H 'Content-Type: application/json' \
  --data-raw '{"lastConversations": ["Hi, how are you?"]}'

{"botResponse": "i'm good. just got back from a long day at work. how are you?"}

There we go!

We managed to create a high-performance, general-purpose chatbot with an associated API.

Full code can be found here:
