Chatbot testing: an example
I was asked about my experience testing (verifying) a chatbot. Honestly, I didn't have that experience before, so I decided to fix this small 'gap'.
The concept is simple: we need a chatbot implementation and an approach to verify the chatbot's behavior. Note that this is just an experiment; I built it from scratch for learning purposes.
I found a good chatbot sample (the article 'A Chatbot in Python using nltk').
Then the code had to be prepared to run, which I did first on my local machine in JupyterLab with the IPython kernel. Once the 'raw' versions of the chatbots were working, I created a virtual environment, added the necessary files, and set up the repository structure. After that, I added a few simple tests (using the unittest framework) for each chatbot implementation and made a first run. Then I added 'boundary' tests and a few negative tests to see how those implementations would behave with the same test data. Finally, I created the files needed to publish this repository on GitHub and integrated it with Travis CI.
automaton-v17
|-- .gitignore
|-- .travis.yml
|-- LICENSE
|-- README.md
|-- requirements.txt
|-- data
|   `-- dialog_talk_agent.xlsx
|-- source
|   `-- bot.py
`-- tests
    |-- test_bot.py
    `-- screenshots
        `-- failed.png
As you can see, in this repository I put the data file into a separate folder. In a real application, this would be a database or an API that provides the data.
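For reference, here is a minimal sketch of how such a data file might be loaded, assuming a spreadsheet of phrase/response pairs read with pandas; the column handling below is an illustration, not the actual code from the repository.

# Hypothetical loader for the dialog spreadsheet; the real bot.py may
# read and normalise the data differently.
import pandas as pd

def load_dialog_data(path="data/dialog_talk_agent.xlsx"):
    """Read the phrase/response pairs the chatbot matches against."""
    df = pd.read_excel(path)
    # Normalise text so that lookups are trimmed and case-insensitive.
    return df.apply(lambda column: column.astype(str).str.strip().str.lower())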
The source folder contains the actual chatbot code. This code can certainly be improved and refactored; I explain some ideas in the Analysis and Enhancement sections.
The third folder contains the actual tests. Those autotests could be improved and refactored as well.
The last two files in the root of the repository are configurations: one tells Python which external libraries and dependencies to install, and the other tells Travis CI how to execute the tests. That is all about the structure.
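For illustration, a minimal .travis.yml for this kind of setup could look roughly like the sketch below; this is an assumption about the shape of the file, not a copy of the one in the repository, and the Python version and test path are placeholders.

# Hypothetical Travis CI configuration; versions and paths are examples only.
language: python
python:
  - "3.8"
install:
  - pip install -r requirements.txt
script:
  - python -m unittest discover -s tests -v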
The approach is pretty straightforward: our chatbot is a simple app (or function) that receives a sentence (a parameter) and returns a response as a sentence (an answer, a question, etc.). As mentioned, I used the unittest library to create simple request -> response tests, because the chatbots only take a sentence as input and produce a response as output. It is a kind of 'black box' testing, which is not strictly true since we know the implementation and how it works, but for this experiment it is fine.
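A minimal sketch of such a request -> response test might look like this; the entry point bot.get_response and the expected phrases are assumptions for illustration, since the actual interface lives in source/bot.py.

# Sketch of a request -> response test with unittest; bot.get_response
# is a hypothetical entry point, the real function name may differ.
import unittest

from source import bot


class TestGreeting(unittest.TestCase):
    def test_greeting_returns_a_greeting(self):
        # Send a greeting and expect some greeting-like reply back.
        response = bot.get_response("hello")
        self.assertIsInstance(response, str)
        self.assertIn(response.lower(), {"hi", "hello", "hey there"})


if __name__ == "__main__":
    unittest.main()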
We can verify that our chatbots are working and answering our requests, but there is a big question: are these answers correct?
The tests are simple. I made two test classes, each verifying a specific chatbot implementation. In each test class you will find tests driven by data (a dataset). For simplicity, I divided them by response type: greeting, question, bye, and negative.
The first three response types are the happy path. The most interesting part is the negative test 'suite'. It contains 7 negative cases: empty input, input as an integer value, a negative integer passed as a string value, special characters as input, and the last three as greetings in different languages (Russian, Spanish, Chinese).
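Sketched as a data-driven test, the negative suite could look roughly like this; again, bot.get_response and the expectation of a non-empty string fallback are assumptions for illustration.

# Data-driven sketch of the negative cases using unittest's subTest;
# bot.get_response is a hypothetical entry point.
import unittest

from source import bot


class TestNegativeInputs(unittest.TestCase):
    NEGATIVE_INPUTS = [
        "",           # empty input
        123,          # an integer instead of a string
        "-42",        # a negative integer passed as a string
        "@#$%^&*",    # special characters
        "привет",     # greeting in Russian
        "hola",       # greeting in Spanish
        "你好",        # greeting in Chinese
    ]

    def test_bot_survives_unexpected_input(self):
        for user_input in self.NEGATIVE_INPUTS:
            with self.subTest(user_input=user_input):
                # The bot should return some fallback text rather than raise.
                response = bot.get_response(user_input)
                self.assertIsInstance(response, str)
                self.assertTrue(response)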
After a few runs I noticed that the first chatbot, the BOW implementation, has two failures for some reason. One is an error: an exception in the negative test where I pass an integer into the chatbot (not a string value that represents an integer). The second is a failure: for some reason the test gets a different response from the chatbot than expected. Check the live example in Travis CI here.
The second chatbot, the tfidf implementation, passes all the tests, but that does not mean this implementation has no errors!
Chatbots: fix the BOW chatbot (all tests should pass!), refactor the code, and apply an OOP approach.
Tests: make them data-driven, use presets, and add more tests for better coverage.
At first glance, we might say that testing a chatbot is simple. But:
I am probably missing some questions, but these three are the major ones for me, IMHO.
P.S. If you spot an error or would like to ask a question, feel free to reach out to me ;)