Project author: jcsilva

Project description:
Benchmark of industrial Speech Recognition systems for Brazilian Portuguese
Language: Python
Project URL: git://github.com/jcsilva/asr-benchmark.git
Created: 2017-02-04T20:18:57Z




The goal here is to evaluate automatic speech recognition (ASR) systems for
Brazilian Portuguese, although the tools developed here can be used to evaluate
ASR systems in any language. The databases used are public and can be freely
downloaded, so anyone can reproduce the setup and confirm the results.

Download databases

  • LapsBenchMark1.4:
    wget http://www.laps.ufpa.br/falabrasil/files/LapsBM1.4.rar

  • Voxforge:
    wget -r -nH -nd -np -R index.html* http://www.repository.voxforge1.org/downloads/pt/Trunk/Audio/Original/48kHz_16bit/

After downloading, downsample each database to both 16000 Hz and 8000 Hz.
Any resampling tool will do; sox is a good choice.
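
As an illustration of the downsampling step, here is a minimal sketch that
calls sox from Python. It assumes sox is on the PATH and that the audio files
are .wav; the helper names and the directory layout are hypothetical and do not
come from the repository.

```python
import subprocess
from pathlib import Path


def sox_command(src, dst, rate):
    """Build the sox invocation that resamples src to `rate` Hz."""
    # In sox, `-r RATE` placed before the output file sets the output rate.
    return ["sox", str(src), "-r", str(rate), str(dst)]


def downsample_tree(in_dir, out_dir, rate):
    """Resample every .wav under in_dir into the same relative path in out_dir."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    for src in sorted(in_dir.rglob("*.wav")):
        dst = out_dir / src.relative_to(in_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if sox fails on a file.
        subprocess.run(sox_command(src, dst, rate), check=True)
```

For example, downsample_tree("LapsBM1.4", "laps-8k", 8000) would write an
8000 Hz copy of every file; both directory names are placeholders.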

LapsBenchMark1.4 has 700 files. Voxforge has many more, so for this benchmark
700 audio files were randomly sampled from it and used in the evaluation.
The chosen files are listed in data/voxforge-{8k,16k}.txt.
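
The README does not say how the 700 Voxforge files were drawn. If you want to
draw a comparable subset of your own, one repeatable way is sketched below. The
function name and the seed are assumptions made for illustration, and this
sketch will not reproduce the lists already stored in data/.

```python
import random


def sample_files(paths, k=700, seed=0):
    """Return k paths drawn without replacement, in sorted order."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    # Sorting first makes the result independent of the input order.
    return sorted(rng.sample(sorted(paths), k))
```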


You will need Python 3 to run the benchmark scripts. Optionally, you may also
use some Bash scripts I wrote to process the transcriptions generated by the
benchmark scripts.

I use Anaconda to manage Python dependencies, which in this case are
watson-developer-cloud, python-dotenv, and SpeechRecognition.

To create my environment, I ran:

    conda create -n asr python=3.5
    source activate asr
    pip install --upgrade watson-developer-cloud
    pip install python-dotenv
    pip install SpeechRecognition

This benchmark evaluates word error rate (WER) and sentence error rate (SER),
so you will need a tool to measure them.
sclite, included in the
NIST Speech Recognition Scoring Toolkit,
can be used for this purpose. An equivalent tool is
compute-wer from the Kaldi toolkit. I used
the latter simply because I already had Kaldi installed on my machine.
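
To make the two metrics concrete, here is a minimal sketch of how WER and SER
can be computed with a word-level edit distance. It is not the sclite or Kaldi
implementation; the function names are hypothetical, and when several
alignments share the minimum cost the insertion/deletion/substitution split may
differ from what those tools report.

```python
def edit_counts(ref, hyp):
    """Return (ins, dele, sub) for one minimum-cost alignment of two word lists."""
    # Each cell holds (total_errors, ins, dele, sub) for ref[:i] vs hyp[:j].
    prev = [(j, j, 0, 0) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        cur = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            t, a, b, c = prev[j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (t, a, b, c)          # match, no error
            else:
                best = (t + 1, a, b, c + 1)  # substitution
            t, a, b, c = prev[j]
            if t + 1 < best[0]:
                best = (t + 1, a, b + 1, c)  # deletion: ref word missing
            t, a, b, c = cur[j - 1]
            if t + 1 < best[0]:
                best = (t + 1, a + 1, b, c)  # insertion: extra hyp word
            cur.append(best)
        prev = cur
    return prev[-1][1:]


def score(refs, hyps):
    """Return (WER, SER) in percent over the utterances present in hyps.

    refs and hyps map an utterance id to its transcription string.
    """
    errors = words = bad = sents = 0
    for utt, hyp in hyps.items():
        ref = refs[utt].split()
        n = sum(edit_counts(ref, hyp.split()))
        errors += n
        words += len(ref)
        bad += n > 0   # a sentence counts as wrong if it has any error
        sents += 1
    return 100 * errors / words, 100 * bad / sents
```

WER divides the total number of word errors by the number of reference words,
while SER is the share of sentences with at least one error.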

You will also need credentials to access the IBM and Microsoft speech APIs.
Get your keys from IBM Bluemix and
Microsoft Bing.

After grabbing your keys, create a .env file in the scripts directory
with the following variables and their values:


BLUEMIX_USERNAME and BLUEMIX_PASSWORD are required to run the IBM
benchmark. The other three keys are needed only to run the Microsoft benchmark.
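
Before starting a long run, it can help to confirm that the credentials are
actually visible to Python. The sketch below is one way to do that. Only the two
IBM variable names come from this README; the helper names and the default path
are assumptions, and the Microsoft key names are left out because they are not
listed here.

```python
import os

# Only the two IBM names appear in this README; the Microsoft key names
# are not listed, so they are omitted rather than guessed.
IBM_KEYS = ("BLUEMIX_USERNAME", "BLUEMIX_PASSWORD")


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def load_env_file(path="scripts/.env"):
    """Load KEY=value pairs from `path` into os.environ via python-dotenv."""
    # Imported here so the check above works even without python-dotenv.
    from dotenv import load_dotenv
    load_dotenv(path)
```

Calling load_env_file() and then missing_keys(IBM_KEYS) should return an empty
list when the IBM credentials are set.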


Run the recognizers, writing one transcription file per system, database, and
sample rate:

    source activate asr
    python scripts/ibmASR.py 16000 data/laps-16k.txt > results/ibm-laps-16k.tra
    python scripts/ibmASR.py 8000 data/laps-8k.txt > results/ibm-laps-8k.tra
    python scripts/ibmASR.py 16000 data/voxforge-16k.txt > results/ibm-voxforge-16k.tra
    python scripts/ibmASR.py 8000 data/voxforge-8k.txt > results/ibm-voxforge-8k.tra
    python scripts/microsoftASR.py 16000 data/laps-16k.txt > results/microsoft-laps-16k.tra
    python scripts/microsoftASR.py 8000 data/laps-8k.txt > results/microsoft-laps-8k.tra
    python scripts/microsoftASR.py 16000 data/voxforge-16k.txt > results/microsoft-voxforge-16k.tra
    python scripts/microsoftASR.py 8000 data/voxforge-8k.txt > results/microsoft-voxforge-8k.tra
    python scripts/googleASR.py data/laps-16k.txt > results/google-laps-16k.tra
    python scripts/googleASR.py data/laps-8k.txt > results/google-laps-8k.tra
    python scripts/googleASR.py data/voxforge-16k.txt > results/google-voxforge-16k.tra
    python scripts/googleASR.py data/voxforge-8k.txt > results/google-voxforge-8k.tra

Convert the transcriptions into hypothesis files:

    ./scripts/buildLapsHyp.sh results/ibm-laps-16k.tra > hypotheses/ibm-laps-16k.hyp
    ./scripts/buildLapsHyp.sh results/ibm-laps-8k.tra > hypotheses/ibm-laps-8k.hyp
    ./scripts/buildVoxforgeHyp.sh results/ibm-voxforge-8k.tra > hypotheses/ibm-voxforge-8k.hyp
    ./scripts/buildVoxforgeHyp.sh results/ibm-voxforge-16k.tra > hypotheses/ibm-voxforge-16k.hyp
    ./scripts/buildLapsHyp.sh results/microsoft-laps-16k.tra > hypotheses/microsoft-laps-16k.hyp
    ./scripts/buildLapsHyp.sh results/microsoft-laps-8k.tra > hypotheses/microsoft-laps-8k.hyp
    ./scripts/buildVoxforgeHyp.sh results/microsoft-voxforge-8k.tra > hypotheses/microsoft-voxforge-8k.hyp
    ./scripts/buildVoxforgeHyp.sh results/microsoft-voxforge-16k.tra > hypotheses/microsoft-voxforge-16k.hyp

Score each hypothesis file against its reference:

    compute-wer --mode=present ark:references/laps.ref ark:hypotheses/ibm-laps-16k.hyp
    compute-wer --mode=present ark:references/laps.ref ark:hypotheses/ibm-laps-8k.hyp
    compute-wer --mode=present ark:references/voxforge.ref ark:hypotheses/ibm-voxforge-16k.hyp
    compute-wer --mode=present ark:references/voxforge.ref ark:hypotheses/ibm-voxforge-8k.hyp
    compute-wer --mode=present ark:references/laps.ref ark:hypotheses/microsoft-laps-16k.hyp
    compute-wer --mode=present ark:references/laps.ref ark:hypotheses/microsoft-laps-8k.hyp
    compute-wer --mode=present ark:references/voxforge.ref ark:hypotheses/microsoft-voxforge-16k.hyp
    compute-wer --mode=present ark:references/voxforge.ref ark:hypotheses/microsoft-voxforge-8k.hyp
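
The recognition commands above follow a regular pattern (system, database,
sample rate), so they can also be generated rather than typed. The sketch below
builds the twelve argument lists and their output paths. The script and file
names are taken from the commands above; the helper itself is hypothetical and
not part of the repository.

```python
from itertools import product

RATES = {"16k": 16000, "8k": 8000}
DATABASES = ("laps", "voxforge")
# Value: whether the script takes the sample rate as its first argument.
# googleASR.py is invoked without one in the commands above.
SYSTEMS = {"ibm": True, "microsoft": True, "google": False}


def recognition_commands():
    """Return a list of (argv, output_path) pairs, one per benchmark run."""
    runs = []
    for (system, takes_rate), db, (tag, hz) in product(
        SYSTEMS.items(), DATABASES, RATES.items()
    ):
        argv = ["python", f"scripts/{system}ASR.py"]
        if takes_rate:
            argv.append(str(hz))
        argv.append(f"data/{db}-{tag}.txt")
        runs.append((argv, f"results/{system}-{db}-{tag}.tra"))
    return runs
```

Each pair can then be executed with subprocess.run(argv, stdout=f, check=True),
where f is the output path opened for writing.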


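Results are reported as WER (word error rate) and SER (sentence error rate).
With --mode=present, compute-wer scores only the utterances that have a
hypothesis, which would explain why some Voxforge sentence totals are below 700.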

Database         System     %WER [ errors / words, ins, del, sub ]            %SER [ errors / sentences ]
Laps 16 kHz      IBM        13.59 [ 982 / 7228, 110 ins, 217 del, 655 sub ]   64.14 [ 449 / 700 ]
Laps 16 kHz      Microsoft  15.88 [ 1148 / 7228, 96 ins, 248 del, 804 sub ]   68.00 [ 476 / 700 ]
Laps 8 kHz       IBM        13.89 [ 1004 / 7228, 106 ins, 242 del, 656 sub ]  64.57 [ 452 / 700 ]
Laps 8 kHz       Microsoft  16.03 [ 1159 / 7228, 97 ins, 248 del, 814 sub ]   67.29 [ 471 / 700 ]
Voxforge 16 kHz  IBM        31.23 [ 1067 / 3417, 134 ins, 313 del, 620 sub ]  54.74 [ 375 / 685 ]
Voxforge 16 kHz  Microsoft  18.28 [ 616 / 3370, 46 ins, 186 del, 384 sub ]    39.73 [ 269 / 677 ]
Voxforge 8 kHz   IBM        28.62 [ 995 / 3477, 115 ins, 284 del, 596 sub ]   53.58 [ 374 / 698 ]
Voxforge 8 kHz   Microsoft  18.05 [ 611 / 3385, 46 ins, 197 del, 368 sub ]    39.21 [ 267 / 681 ]
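
compute-wer prints lines of the form
"%WER 13.59 [ 982 / 7228, 110 ins, 217 del, 655 sub ]". If you want to collect
these numbers or sanity-check them, a small parser such as the sketch below may
help. The regular expression is written against the lines shown in this README,
and the function names are hypothetical.

```python
import re

WER_LINE = re.compile(
    r"%WER\s+(?P<wer>[\d.]+)\s+\[\s*(?P<err>\d+)\s*/\s*(?P<words>\d+),"
    r"\s*(?P<ins>\d+) ins,\s*(?P<dele>\d+) del,\s*(?P<sub>\d+) sub\s*\]"
)


def parse_wer(line):
    """Extract the WER percentage and the error counts from one %WER line."""
    m = WER_LINE.search(line)
    if m is None:
        raise ValueError("not a %WER line: " + repr(line))
    out = {k: int(v) for k, v in m.groupdict().items() if k != "wer"}
    out["wer"] = float(m["wer"])
    return out


def is_consistent(r, tol=0.01):
    """Check that ins + del + sub equals the error total and matches the rate."""
    total_ok = r["ins"] + r["dele"] + r["sub"] == r["err"]
    rate_ok = abs(100 * r["err"] / r["words"] - r["wer"]) < tol
    return total_ok and rate_ok
```

The check relies on WER being the sum of insertions, deletions, and
substitutions divided by the number of reference words.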

These results were obtained on 5 February 2017. The systems may be upgraded
over time, so these rates may change.