muellmatto/nanotts

NanoTTS

A command-line speech synthesizer that improves on pico2wave, the front end included with SVOX PicoTTS.

Goal


Rewrite the pico2wave front end into something more user-friendly.

Ideally, add features that help parse large text files (50k+ words) automatically into small batches of automatically named .wav or .mp3 files. The main goal is to aid the structured digestion of papers, articles, and books, but the tool should also be versatile enough for many other speech-synthesis uses.
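The batching idea above can be sketched in C. This is only an illustration under stated assumptions, not code from the repository: `next_batch`, `batch_name`, the word-count limit, and the `<prefix>_<index>.<ext>` naming scheme are all hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: return the start of the next batch of at most
 * max_words whitespace-separated words and store its length in *len.
 * *cursor is advanced past the batch. Returns NULL when nothing but
 * whitespace remains (or when max_words is 0). */
static const char *next_batch(const char **cursor, size_t max_words, size_t *len)
{
    const char *p = *cursor;
    while (isspace((unsigned char)*p)) p++;          /* skip leading whitespace */
    if (*p == '\0' || max_words == 0) return NULL;

    const char *start = p;
    const char *end = p;
    for (size_t words = 0; words < max_words && *p != '\0'; words++) {
        while (*p != '\0' && !isspace((unsigned char)*p)) p++;  /* consume one word */
        end = p;                                     /* batch ends after this word */
        while (isspace((unsigned char)*p)) p++;      /* skip the gap that follows */
    }
    *cursor = p;
    *len = (size_t)(end - start);
    return start;
}

/* Hypothetical naming scheme: <prefix>_<4-digit index>.<ext> */
static int batch_name(char *buf, size_t size, const char *prefix,
                      unsigned index, const char *ext)
{
    return snprintf(buf, size, "%s_%04u.%s", prefix, index, ext);
}
```

A caller would loop on `next_batch` until it returns NULL, synthesizing each slice and writing it to the name produced by `batch_name`. Splitting on a fixed word count is the simplest choice; splitting at sentence or paragraph boundaries would probably sound more natural.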

Steps:

  • get a bare-bones working implementation of picotts, sans cruft
  • create cmdline file:
  • implement cmdline switches that do:
    • print detailed help (-h, --help)
    • reads WORDS from stdin (default, if no other input modes detected)
    • reads WORDS from cmdline (-w )
    • reads WORDS from file (-f )
    • writes WAVE to file (-o )
    • silence device pcm playback (--no-play|-m)
    • cleanup printed output
    • select voice (-v )
    • writes PCM-data to stdout (-c)
    • set voice files (lingware) path (-l )
    • set speed
    • set volume
    • set pitch
    • progress meter to stderr
    • playback keys: spacebar, left+right arrow, ESC, +/- (playback speed)
    • run through: gprof, valgrind
    • write man page ; and make install
    • autonaming func
    • -q flag to silence output to {stdout, stderr}
    • catch signals to cancel PCM playback/output cleanly
    • confirm working on both Mac and Linux
  • extra:
    • able to read multiple files at once (-files [file2][file3][..])
    • limit text input to N lines
    • bit-rate, frequency, channel, parms for .wav
    • mp3 output
    • store base settings in a $HOME/.config-style file, so you don't have to type language prefs every time
    • advanced feature to carve up a large text file into a set of auto-named .mp3 files, supporting -p
    • search & replace, useful for replacing problem characters such as '-' (pico says "hyphen"), which can ruin the flow of a book; replacing '-' with ',' makes pico read it as a pause instead.
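The search & replace item at the end of the list can be sketched as a single-character substitution. This is a minimal illustration, not the project's implementation; `replace_char` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical helper: replace every occurrence of `from` with `to`
 * in a NUL-terminated buffer, in place. Returns the number of
 * characters replaced. */
static size_t replace_char(char *text, char from, char to)
{
    size_t replaced = 0;
    if (text == NULL) return 0;
    for (char *p = text; *p != '\0'; p++) {
        if (*p == from) {
            *p = to;
            replaced++;
        }
    }
    return replaced;
}
```

Calling `replace_char(buf, '-', ',')` before synthesis turns every hyphen into a comma. This also affects hyphenated words such as "well-known", so a real implementation might only replace hyphens that are surrounded by whitespace.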

MP3 PIPE example

echo "eenie meany miny moh" | ./nanotts -c | lame -r -s 16 --bitwidth 16 --signed --little-endian -m m -b 32 -h - out.mp3

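I know what you're thinking: mp3 is a mess. And you would be right to think that. Because -c writes raw, headerless PCM, you have to tell lame exactly what format to expect: here a 16 kHz sample rate (-s 16), 16-bit samples (--bitwidth 16), signed, little-endian, mono (-m m). But hey, at least right now mp3 is automatable!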
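Raw PCM needs those flags because it has no header. A WAV file stores the same parameters in a 44-byte header ahead of the samples. The sketch below builds such a header for the standard PCM WAV layout. It is an illustration only, not code from the repository: `wav_header` and its helpers are hypothetical names, and the 16 kHz, 16-bit, mono values in the usage note are taken from the lame command above.

```c
#include <stdint.h>
#include <string.h>

enum { WAV_HEADER_SIZE = 44 };

/* Store integers in little-endian byte order, as the WAV format requires. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Hypothetical helper: fill `out` with a canonical 44-byte PCM WAV header
 * describing data_bytes bytes of sample data. */
static void wav_header(uint8_t out[WAV_HEADER_SIZE], uint32_t data_bytes,
                       uint32_t sample_rate, uint16_t channels, uint16_t bits)
{
    uint16_t block_align = (uint16_t)(channels * (bits / 8)); /* bytes per frame */

    memcpy(out, "RIFF", 4);
    put_le32(out + 4, 36 + data_bytes);             /* file size minus 8 */
    memcpy(out + 8, "WAVE", 4);
    memcpy(out + 12, "fmt ", 4);
    put_le32(out + 16, 16);                         /* fmt chunk size for PCM */
    put_le16(out + 20, 1);                          /* format tag 1 = PCM */
    put_le16(out + 22, channels);
    put_le32(out + 24, sample_rate);
    put_le32(out + 28, sample_rate * block_align);  /* bytes per second */
    put_le16(out + 32, block_align);
    put_le16(out + 34, bits);
    memcpy(out + 36, "data", 4);
    put_le32(out + 40, data_bytes);
}
```

For one second of audio in the format assumed above, the call would be `wav_header(buf, 32000, 16000, 1, 16)`. Writing those 44 bytes followed by the raw samples gives a file that players can open without extra format flags.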

email: _greg AT naughton DOT org

