Speechrecognition for naobot
This project is developed and maintained by two students of TU Berlin.
It was created in the context of the lecture RoboCup.
The main goal was to show some of the capabilities of NAO developed by Aldeberan Robotics.
In this small documentation you’ll get an overview about this project.
To understand the purpose and how-to use the code we recommend reading carefully each section of this documentation.
As mentioned this project was initially created in the context of the lecture “RoboCup” of TU Berlin in summerterm 2018.
Purpose was to program NAO - a humanoid programmable robot which is developed by the french company Aldeberan Robotics.
Therefore the code is completely written in Python and is using the API of NAO called NAOqi.
The main idea of our project is an easy-to-interact NAO with enabled speech recognition to control the robot and voice feedback.
NAO should also be able to recognize known and unknown persons and give some feedback if such an event happens.
We call this “The greeting Agent”.
To show our project goal we’ve implemented a special function called “rundemo”.
When running this function NAO will try to scan his closer environment to identify real persons around him.
Therefore he’s standing up and looking from right to left.
After that he’s turing 90 degrees to the left and start looking again from right to left.
This behaviour is repeated until he finished a full 360 degree circle.
Each time he’s recognizing a person in his closer environment he tries to identify the person with the data returned by a picture taken by NAOs camera.
Two different options are possible:
In this manner NAO is nearly able to identify all persons in his closer environmet
We used mainly following API features:
We also wanted to use ALSpeechRecognition for the speech recognition ability.
Sadly this dosen’t work on the image of our NAO so we had to implement a workaround.
Instead of ALSpeechRecognition we used the free online google service “speech-to-text”.
It works quite well. You send an audiostream to the service and you get in return a string of the analyzed stream.
For detailed information about the used functions of NAOqi please refer to the provided links:
Detailed information about the used google service “speech-to-text”:
Before you can run the code you need to install the prerequisite listed in the following section.
Firstly you need a running Python environment with configured NAOqi framework. A detailed installation guide is provided in the aldeberan documentation. We used a Linux “Ubuntu 16.04” running in a virtual machine.
As workaround for the speech recogognition we used the Python library SpeechRecognition in version 3.8.1
The installation guide is provided here
To use the google service you need to provide an active internet connection when running the code.
To start the program you’ve to run following command in a terminal:
Python project.py <IP-Adress of NAO>
When the project is running you can use following commands to control NAO: