Project author: xsthunder

Project description:
Download YouTube subtitles (closed captions, CC) as txt or json, with support for translation and proxies. Available on PyPI 🐍
Primary language: Jupyter Notebook
Project URL: git://github.com/xsthunder/download-youtube-subtitle.git
Created: 2020-03-16T08:58:28Z
Project community: https://github.com/xsthunder/download-youtube-subtitle

License: MIT License


Updates:

3.0.0: fixes the download error and finally supports downloading an entire playlist! See "Download the caption of entire playlist".

Try it now!

Try it online with Google's free Python runtime, with no installation on your machine. Pro tip: you can download the output file from the sidebar.

https://colab.research.google.com/drive/1oseD2yEsScx0YYOZ1x1F8GSG9iJ4x3qi?usp=sharing

download-youtube-subtitle

Due to changes in the YouTube API, you need to upgrade to 3.0.0; see Install and Run.

Download YouTube subtitles (closed captions, CC) or srt as txt or json.

Features

  1. Supports exporting a translation alongside the original, which is useful for language study.
  2. Full control: all available captions are listed; use --caption_num and --caption_num_second to choose which captions are used as the original and the translation transcript.
  3. Supports a proxy for YouTube; follow the steps in "Using Anaconda behind a company proxy" to set the environment variables.
  4. Fully tested on Travis CI to make sure things stay on track.
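
The proxy feature listed above is configured through environment variables. A minimal sketch of preparing them from Python, assuming the tool's HTTP requests honor the standard `HTTP_PROXY`/`HTTPS_PROXY` variables (the `requests` library does so by default); the helper `proxy_env` and the proxy address are hypothetical and not part of this package:

```python
import os


def proxy_env(proxy_url, base_env=None):
    """Return a copy of the environment with the proxy variables set."""
    env = dict(os.environ if base_env is None else base_env)
    # Set both spellings, since tools differ in which one they look up.
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        env[name] = proxy_url
    return env


# Hypothetical proxy address; replace it with your own.
env = proxy_env("http://127.0.0.1:8080", base_env={"PATH": "/usr/bin"})
print(env["HTTPS_PROXY"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when launching `dl-youtube-cc`, so the proxy applies only to that process.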

A Python version of algolia/youtube-captions-scraper, which fetches YouTube user-submitted captions or falls back to auto-generated ones.

Example

Save as txt

dl-youtube-cc https://www.youtube.com/watch?v=wgNiGj1nGYE --translation 'ru'
or
dl-youtube-cc wgNiGj1nGYE --translation 'ru'

The output is saved as Version1.5SpecialProgramGenshinImpact.txt:

    video_link https://www.youtube.com/watch?v=wgNiGj1nGYE
    original code="zh-Hans" name="Chinese (Simplified)"
    translation ru
    ---------00:00----------
    从前,有一对双胞胎结伴在宇宙中旅行
    Давным-давно, два близнеца вместе путешествовали по Вселенной.
    ---------00:05----------
    但有一天,他们前路遇阻
    Однажды путь им преградило неизвестное божество
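
If you want to process the txt output further, it can be read back into structured records. The sketch below assumes the layout is exactly as in the sample above (header lines, then a dashed timestamp line followed by the caption lines); `parse_transcript` is a hypothetical helper, not a function of this package:

```python
import re

# Separator lines look like "---------00:05----------" in the sample above.
TIMESTAMP = re.compile(r"^-+(\d+:\d{2}(?::\d{2})?)-+$")


def parse_transcript(text):
    """Group the lines that follow each timestamp separator."""
    cues = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            cues.append({"time": match.group(1), "lines": []})
        elif cues:  # header lines before the first separator are skipped
            cues[-1]["lines"].append(line)
    return cues


sample = """video_link https://www.youtube.com/watch?v=wgNiGj1nGYE
original code="zh-Hans" name="Chinese (Simplified)"
translation ru
---------00:00----------
从前,有一对双胞胎结伴在宇宙中旅行
Давным-давно, два близнеца вместе путешествовали по Вселенной.
---------00:05----------
但有一天,他们前路遇阻
Однажды путь им преградило неизвестное божество
"""

for cue in parse_transcript(sample):
    print(cue["time"], cue["lines"])
```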

Save as json

dl-youtube-cc wgNiGj1nGYE --translation ru --to_json=True

The output is saved as Version1.5SpecialProgramGenshinImpact.json:

    {
      "original": [
        {
          "start": "0",
          "dur": "5056",
          "text": "从前,有一对双胞胎结伴在宇宙中旅行"
        },
        // continue
      ],
      "translation": [
        {
          "start": "0",
          "dur": "5056",
          "text": "Давным-давно, два близнеца вместе путешествовали по Вселенной."
        },
        // continue
      ],
      "merged": [
        {
          "start": "0",
          "dur": "5056",
          "text": "从前,有一对双胞胎结伴在宇宙中旅行",
          "translate_text": "Давным-давно, два близнеца вместе путешествовали по Вселенной."
        },
        // continue
      ]
    }
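
The json output can be consumed with the standard `json` module. The sketch below uses a trimmed inline sample with the same keys as above and assumes that `start` and `dur` are millisecond values stored as strings (the sample's `dur` of 5056 lines up with the 00:05 timestamp in the txt output, but the unit is not stated in this README); `to_seconds` and `iter_cues` are hypothetical helpers:

```python
import json

# A trimmed sample with the same keys as the output shown above.
sample = """
{
  "merged": [
    {
      "start": "0",
      "dur": "5056",
      "text": "从前,有一对双胞胎结伴在宇宙中旅行",
      "translate_text": "Давным-давно, два близнеца вместе путешествовали по Вселенной."
    }
  ]
}
"""


def to_seconds(value):
    """Convert a string such as "5056" to seconds, assuming milliseconds."""
    return float(value) / 1000


def iter_cues(data):
    """Yield (start_s, end_s, original, translation) from the "merged" list."""
    for item in data["merged"]:
        start = to_seconds(item["start"])
        end = start + to_seconds(item["dur"])
        yield start, end, item["text"], item.get("translate_text")


for start, end, text, translated in iter_cues(json.loads(sample)):
    print(f"{start:.3f}-{end:.3f} {text} | {translated}")
```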

Use caption_num and caption_num_second for full control

All available captions are listed. Use --caption_num and --caption_num_second to choose which captions are used as the original and the translation transcript.

    >> dl-youtube-cc "wgNiGj1nGYE" --caption_num=0 --caption_num_second=3 --output_file="0,3-zh,fr.txt"
    INFO: available caption(s):
    INFO: #0 ✔ as original code="zh-Hans" name="Chinese (Simplified)"
    INFO: #1 ⭕ code="zh-Hant" name="Chinese (Traditional)"
    INFO: #2 ⭕ code="en-US" name="English (United States)"
    INFO: #3 ✔ as translation code="fr" name="French"
    INFO: #4 ⭕ code="de" name="German"
    INFO: #5 ⭕ code="id" name="Indonesian"
    INFO: #6 ⭕ code="pt" name="Portuguese"
    INFO: #7 ⭕ code="ru" name="Russian"
    INFO: #8 ⭕ code="es" name="Spanish"
    INFO: #9 ⭕ code="th" name="Thai"
    INFO: #10 ⭕ code="vi" name="Vietnamese"
    INFO: given by --caption_num default to 0 as original
    INFO: Save to 0,3-zh,fr.txt
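
To find the number to pass as --caption_num or --caption_num_second for a given language code, the listing above can be scanned with a regular expression. This is only a sketch: it assumes you have captured the log text yourself and that the lines keep the format shown; `caption_index` is a hypothetical helper:

```python
import re

# Matches lines such as: INFO: #3 ✔ as translation code="fr" name="French"
CAPTION = re.compile(r'#(\d+)\b.*?code="([^"]+)"\s+name="([^"]+)"')


def caption_index(log_text, lang_code):
    """Return the caption number listed for lang_code, or None if absent."""
    for line in log_text.splitlines():
        match = CAPTION.search(line)
        if match and match.group(2) == lang_code:
            return int(match.group(1))
    return None


log = '''INFO: available caption(s):
INFO: #0 ✔ as original code="zh-Hans" name="Chinese (Simplified)"
INFO: #1 ⭕ code="zh-Hant" name="Chinese (Traditional)"
INFO: #3 ✔ as translation code="fr" name="French"
INFO: #10 ⭕ code="vi" name="Vietnamese"'''

print(caption_index(log, "fr"))
```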

Install and Run

Install from PyPI (download-youtube-subtitle · PyPI):

    pip install download-youtube-subtitle
    # or, to install for the current user only:
    pip install download-youtube-subtitle --user
    dl-youtube-cc -h

To move to a new version, uninstall the old one first and then install again:

pip uninstall download-youtube-subtitle -y

Run in CLI

Download the caption of one video

dl-youtube-cc -h will show the following.

    NAME
        dl-youtube-cc - download youtube closed caption(subtitles) by videoID
    SYNOPSIS
        dl-youtube-cc VIDEOID <flags>
    DESCRIPTION
        Examples:
        dl-youtube-cc -h # to see this helpful infomation
        dl-youtube-cc wgNiGj1nGYE --translation 'ru' # use russian translation, see ./lang_code for full list
        dl-youtube-cc wgNiGj1nGYE --caption_num=1 --translation 'ru' # choose the caption num for original transcript and use russian translation,
        dl-youtube-cc wgNiGj1nGYE --caption_num=1 --caption_num_second=2 # manually choose the original and translation transcript from available caption list
        dl-youtube-cc wgNiGj1nGYE --translation False # without translation
        dl-youtube-cc wgNiGj1nGYE --save_to_file=False # print stuff in console
        dl-youtube-cc wgNiGj1nGYE --output_file='test.txt' # print stuff in named file
        dl-youtube-cc wgNiGj1nGYE --to_json=True # print stuff in json
    POSITIONAL ARGUMENTS
        VIDEOID
            Type: str
            the video link or the id of youtube video, the string after 'v=' in a youtube video link
    FLAGS
        --translation=TRANSLATION
            Type: typing.Union[str, bool]
            Default: 'zh-Hans'
            which will be displayed as original transcript, default to 'zh-Hans' for simplified Chinese, see ./lang_code.json for full list, or pass False to disable translation
        --caption_num=CAPTION_NUM
            Type: int
            Default: 0
            choose the caption which will be displayed as original transcript
        --caption_num_second=CAPTION_NUM_SECOND
            Type: Optional[int]
            Default: None
            will surpass translation option, choose the caption which will be displayed as translation transcript
        --output_file=OUTPUT_FILE
            Type: Optional[str]
            Default: None
            default to video title
        --save_to_file=SAVE_TO_FILE
            Type: bool
            Default: True
            pass False to print in console
        --to_json=TO_JSON
            Type: bool
            Default: False
            pass True to export caption to json
        --remove_font_tag=REMOVE_FONT_TAG
            Type: bool
            Default: True
            remove font tag
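
The help text says VIDEOID may be either a full link or "the string after 'v='". As an illustration of that relationship (not the package's own parsing code), the ID can be pulled out of a link with the standard library; `video_id` is a hypothetical helper:

```python
from urllib.parse import parse_qs, urlparse


def video_id(link_or_id):
    """Return the 'v' query parameter of a watch link, or the input itself."""
    query = parse_qs(urlparse(link_or_id).query)
    if "v" in query:
        return query["v"][0]
    return link_or_id  # no 'v=' found, so assume a bare ID was passed


print(video_id("https://www.youtube.com/watch?v=wgNiGj1nGYE"))
print(video_id("wgNiGj1nGYE"))
```

Both calls yield the same ID, which is why the two example commands near the top of this README are interchangeable.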

Download the caption of entire playlist

dl-youtube-cc-playlist -h will show the following.

    NAME
        dl-youtube-cc-playlist - download youtube closed caption(subtitles) by playlist. To figure most of params, please use dl-youtube-cc to download one video first before downloading the entire playlist.
    SYNOPSIS
        dl-youtube-cc-playlist PLAYLIST_URL <flags>
    DESCRIPTION
        Examples:
        dl-youtube-cc-playlist -h # to see this helpful infomation
        dl-youtube-cc-playlist PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n
        dl-youtube-cc-playlist PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n 0 3 # download the first 3 videos
        dl-youtube-cc-playlist https://www.youtube.com/playlist?list=PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n
    POSITIONAL ARGUMENTS
        PLAYLIST_URL
            Type: str
            the playlist link or the id of youtube playlist, the string after 'list=' in the url
    FLAGS
        --start=START
            Default: 0
            the index number in the playlist to start downloading, starting from 0
        -e, --end=END
            Type: Optional[]
            Default: None
            the index number in the playlist to end downloading, exclusively
        --translation=TRANSLATION
            Type: Optional[typing.Union[st...
            Default: None
            which will be displayed as original transcript, default to 'zh-Hans' for simplified Chinese, see ./lang_code.json for full list, or pass False to disable translation
        --caption_num=CAPTION_NUM
            Type: int
            Default: 0
            choose the caption which will be displayed as original transcript
        --caption_num_second=CAPTION_NUM_SECOND
            Type: Optional[int]
            Default: None
            will surpass translation option, choose the caption which will be displayed as translation transcript
        -o, --output_file=OUTPUT_FILE
            Type: Optional[str]
            Default: None
            default to video title
        --save_to_file=SAVE_TO_FILE
            Type: bool
            Default: True
            pass False to print in console
        --to_json=TO_JSON
            Type: bool
            Default: False
            pass True to export caption to json
        -r, --remove_font_tag=REMOVE_FONT_TAG
            Type: bool
            Default: True
            remove font tag
    NOTES
        You can also use flags syntax for POSITIONAL ARGUMENT
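
The --start and --end flags describe a zero-based range whose end is exclusive, which matches Python slice semantics. The sketch below only illustrates that reading of the help text with placeholder entries; `select_range` is hypothetical and not the package's implementation:

```python
def select_range(items, start=0, end=None):
    """Pick items[start:end]: start is inclusive, end is exclusive.

    end=None means "up to the last item", mirroring the documented defaults.
    """
    return items[start:end]


videos = ["video0", "video1", "video2", "video3", "video4"]  # placeholder entries

print(select_range(videos, 0, 3))  # the first three entries, as in the "0 3" example
print(select_range(videos, 2))     # from index 2 to the end
```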

Use in Code

    import download_youtube_subtitle.common as common
    import download_youtube_subtitle.main as download_youtube_subtitle
    # ...
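
This README does not show the Python API beyond the imports above, so the following sketch does not guess at it. Instead it assembles a command line from the flags documented in the help output, which can then be run from a script; `build_command` and the file name `captions.json` are hypothetical:

```python
import shlex


def build_command(video, translation=None, to_json=False, output_file=None):
    """Assemble a dl-youtube-cc argument list from the documented flags."""
    cmd = ["dl-youtube-cc", video]
    if translation is not None:
        cmd.append(f"--translation={translation}")
    if to_json:
        cmd.append("--to_json=True")
    if output_file is not None:
        cmd.append(f"--output_file={output_file}")
    return cmd


cmd = build_command("wgNiGj1nGYE", translation="ru", to_json=True,
                    output_file="captions.json")
print(shlex.join(cmd))  # the equivalent shell command
```

With the package installed, `subprocess.run(cmd, check=True)` would execute the command; passing the arguments as a list avoids shell quoting problems.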

Development

Environment Setup

For conda:

    pip install 'fire' 'requests' 'IPython' 'sure' 'pytube' 'progiter'
    pip install -e .

Usage

    python main.py -h
    python main.py VIDEOID

Tests

    cd tests
    ./run.sh
    ./test_cli.sh

Ref

deployment - How can I use setuptools to generate a console_scripts entry point which calls python -m mypackage? - Stack Overflow

Packaging Python Projects — Python Packaging User Guide

./nb/notebook2script.py from course-v3/nbs/dl2 at master · fastai/course-v3

Google Style Python Docstrings