Project author: dag7dev

Project description:
common-voice-tool is a tool that helps you review and manipulate strings quickly
Main language: Shell
Project URL: git://github.com/dag7dev/common-voice-tool.git
Created: 2018-11-11T17:59:45Z
Project community: https://github.com/dag7dev/common-voice-tool

License: GNU General Public License v3.0


common-voice-tool

Common Voice Tool - CLI Tool

GUI - Coming soon!

[Click here for the Italian Version]

Preliminary operations

LAST STABLE: v0.4.2

You need to clone this repo:

  git clone https://github.com/dag7dev/common-voice-tool

Beta branch:

  git clone --single-branch -b beta https://github.com/dag7dev/common-voice-tool

BASH VERSION

  cd common-voice-tool
  cd Bash
  sudo chmod 755 common-voice-tool.sh

You can finally run this script ;)

Python version

This version can come in handy if you’re on a system that can’t run Bash scripts (e.g. Windows without WSL or a similar compatibility layer). It was made by jotaro-sama to automate checking whether the sentences to be submitted to Mozilla’s Common Voice are properly formatted. Any version of Python 3.x should be fine to run it. Just pass the file with the sentences to the script like this:

  Most GNU/Linux distros, macOS:
    python3 common-voice-tool.py sentences.txt
  Windows Command Prompt:
    py -3 common-voice-tool.py sentences.txt

The exact commands may vary depending on how you configured your system.

At the moment, the Python version works a bit differently from the bash one, but it’s fully functional. It automatically formats the file (putting the output in out.txt) so that:

  • There are no empty lines.
  • There are no double spaces, spaces before the final dot or spaces at the end of the lines.
  • All lines are capitalized.
  • All lines end with a dot.

It also reports any sentences longer than 125 characters (dot included), which was the maximum length for sentences submitted to Common Voice.
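The formatting rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not the actual code of common-voice-tool.py: the function name `format_sentences` and the constant `MAX_LEN` are hypothetical, and the sketch returns its results instead of writing out.txt.

```python
import re

MAX_LEN = 125  # limit quoted above, final dot included (hypothetical constant name)


def format_sentences(lines):
    """Return (cleaned, too_long) for an iterable of raw text lines.

    Hypothetical helper: it mirrors the rules listed above, not the
    real implementation of the Python script.
    """
    cleaned, too_long = [], []
    for raw in lines:
        # Collapse runs of whitespace into one space and strip both ends.
        text = re.sub(r"\s+", " ", raw).strip()
        # Drop one final dot together with any space before it.
        body = text[:-1].rstrip() if text.endswith(".") else text
        if not body:
            continue  # skip empty lines (and lines holding only a dot)
        # Uppercase the first character only; the rest is left untouched.
        sentence = body[0].upper() + body[1:] + "."
        cleaned.append(sentence)
        if len(sentence) > MAX_LEN:
            too_long.append(sentence)
    return cleaned, too_long


if __name__ == "__main__":
    cleaned, too_long = format_sentences(["hello  world .", "", "  second line  "])
    print("\n".join(cleaned))
    print(f"{len(too_long)} sentence(s) exceed {MAX_LEN} characters")
```

Note that, following the rule "all lines end with a dot" literally, this sketch also appends a dot to lines ending in other punctuation; whether the real script does the same is not stated here.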

CLI USAGE

If you run the bash script without parameters you will get this (after language selection):

  ./common-voice-tool
  usage: ./common-voice-tool <language-code> <options>
      -h or -help
          Show this message
      -range or -chkLen
          Check if there are lines in the file which exceed a maximum length.
      -trim
          Trim whitespace at the end of the lines.
      -chkPoint
          Check if all rows in the file end with a dot (doesn't replace it, just checks).
      -ac
          Add a dot to the rows not ending with one.
      -noEmpty
          Remove all the empty lines.
      -capitalize
          Capitalize the first character of every sentence.
      -all
          It performs everything listed! Probably using just this option will be what are you looking for ;)

To run this script you need to include your language code and at least one parameter.

For example:

  ./common-voice-tool en -range

will check if there are lines in the file which exceed a maximum length.

You don’t need to pass the filename as a parameter, as the script will prompt you to choose a file when launched.

You can run this script with multiple parameters.

As an example:

  ./common-voice-tool en -range -noEmpty

will check if there are lines exceeding a maximum length and remove all the empty lines.

LIST OF PARAMETERS

Parameter     What does it do
-h            Show help (same as running without parameters).
-range        Check if there are lines exceeding a maximum length.
-chkLen       Same as -range.
-trim         Trim whitespace at the end of every row.
-chkPoint     Check if all rows in the file end with a dot (just check).
-ac           Same as -chkPoint, but adds the dot if it’s missing.
-noEmpty      Remove all the empty lines.
-capitalize   Capitalize the first character of every sentence.
-all          Perform every operation listed here!
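To make the difference between checking and modifying options concrete, here is a small Python sketch of how such flags could map to operations. Everything in it is an assumption for illustration: the helper names, the `OPERATIONS` table, and the fixed order in which operations run are hypothetical and not taken from the Bash script, and the length check is left out.

```python
def trim(lines):
    """-trim: remove whitespace at the end of every row."""
    return [line.rstrip() for line in lines]


def no_empty(lines):
    """-noEmpty: drop rows that are empty or whitespace-only."""
    return [line for line in lines if line.strip()]


def capitalize(lines):
    """-capitalize: uppercase the first character of every row."""
    return [line[:1].upper() + line[1:] for line in lines]


def add_dot(lines):
    """-ac: append a dot to non-empty rows that lack one."""
    return [line if line.endswith(".") or not line else line + "." for line in lines]


def missing_dot(lines):
    """-chkPoint: report 1-based numbers of rows without a final dot (no changes)."""
    return [n for n, line in enumerate(lines, 1) if line and not line.endswith(".")]


# Modifying options in the order this sketch applies them (assumed order):
# trimming first keeps "text " from becoming "text ." when the dot is added.
OPERATIONS = {
    "-trim": trim,
    "-noEmpty": no_empty,
    "-capitalize": capitalize,
    "-ac": add_dot,
}


def apply_options(lines, options):
    """Apply the selected modifying options; "-all" selects all of them."""
    selected = set(OPERATIONS) if "-all" in options else set(options)
    for name, operation in OPERATIONS.items():
        if name in selected:
            lines = operation(lines)
    return lines


if __name__ == "__main__":
    sample = ["hello world  ", "", "already done."]
    print(missing_dot(sample))
    print(apply_options(sample, ["-all"]))
```

Applying the operations in one fixed order, rather than in the order the flags were typed, is a design choice of this sketch so that combining several options gives the same result regardless of how they are written on the command line.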

NOTES

This script MUST be used ONLY with plain TEXT files.

You can select your language by passing the right language code when you run the script.
The ‘lang’ folder must exist and must contain at least one file.

What branch should I use?

The beta branch is “experimental”; it is the one I update continuously.

Until the beta becomes stable, I won’t push to master and there won’t be a release.

In the future, branch management may change.

WIP

Todo:

  • Split lines automatically
  • Localization (DONE)
  • Capitalize the first letter at the beginning of each row (DONE)
  • Remove empty lines (DONE)
  • Check each row’s length while adding a dot at the end of it (DONE)

Can I contact you?

Sure! You can find my email address inside the source code!

You can also contact me here, on GitHub!

Let me know if you love this software or if something in it needs fixing!

Why this tool?

This tool was meant to help people prepare strings to be used with Mozilla’s Common Voice project (check it out, it’s really cool!).

It can help you check sentence length, add a full stop where needed, and perform several other useful operations.

How can I help you?

Submit issues and send me ideas for new features! :)