Project author: Prem-kumar27

Project description:
Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler
Language: Python
Repository: git://github.com/Prem-kumar27/Fast-KTSpeechCrawler.git
Created: 2020-04-17T09:53:04Z
Project community: https://github.com/Prem-kumar27/Fast-KTSpeechCrawler

License: MIT License

KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos

Google Colab

https://colab.research.google.com/drive/1JVKzB9N2FIcxlib1kXuGlfeIuudkM9Vr

Installation

  1. git clone https://github.com/EgorLakomkin/KTSpeechCrawler
  2. pip install -r requirements.txt

Running crawler

  1. chmod a+x ./crawler/en_corpus.sh
  2. ./crawler/en_corpus.sh <dir_with_intermediate_results> <dir_for_resulting_samples>
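
The two steps above can also be driven from Python, which may be convenient when scripting several runs. The following is a minimal sketch, not part of the repository: the helper names `build_crawler_command` and `run_crawler` are hypothetical, and creating the two directories up front is an assumption (the script may or may not create them itself). It also assumes the script is already executable and that the working directory is the repository root.

```python
import subprocess
from pathlib import Path

# Script path taken from the steps above, relative to the repository root.
CRAWLER_SCRIPT = Path("crawler") / "en_corpus.sh"


def build_crawler_command(intermediate_dir, output_dir):
    """Return the argument list: script path, intermediate dir, output dir."""
    return [str(CRAWLER_SCRIPT), str(intermediate_dir), str(output_dir)]


def run_crawler(intermediate_dir, output_dir):
    """Hypothetical wrapper: ensure both directories exist, then run the script."""
    for directory in (intermediate_dir, output_dir):
        # Assumption: pre-creating the directories is harmless for the script.
        Path(directory).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(build_crawler_command(intermediate_dir, output_dir), check=True)
```

Calling `run_crawler("intermediate", "samples")` would then be equivalent to running the script by hand with those two directory arguments.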

Downloading a Playlist

./download_playlist.sh

Browsing samples

  1. python server.py --corpus <dir_for_resulting_samples>
  2. Go to: http://localhost:8888/
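
Before opening the browser, it can help to confirm that the sample server is actually listening. The sketch below uses only the Python standard library; the helper name `is_server_up` is hypothetical, and the only detail taken from the steps above is the address `http://localhost:8888/`.

```python
import urllib.error
import urllib.request


def is_server_up(url="http://localhost:8888/", timeout=2.0):
    """Return True if anything answers HTTP at `url`, False if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied, just with an error status, so it is running.
        return True
    except OSError:
        # Covers URLError, connection refused, and timeouts.
        return False
```

If `is_server_up()` returns `False`, the most likely cause is that `server.py` has not finished starting or is not running.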

Citation

@article{lakomkin2018kt,
title={KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos},
author={Lakomkin, Egor and Magg, Sven and Weber, Cornelius and Wermter, Stefan},
journal={EMNLP 2018},
pages={90},
year={2018}
}