Open Source Software For Transcribing Speech in Audio Files


Why can't it read a wav?

It tells you that the file has the wrong sampling rate: 8000 Hz instead of the requested 16000 Hz. The sampling rate is very important for speech recognition software, because the acoustic model is trained on audio at one specific rate.
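To see what a file actually contains before handing it to a recognizer, you can read its header. This is a minimal sketch using only Python's standard `wave` module; the function names `wav_format` and `matches_expected_rate` are made up for illustration, and the 16000 Hz default is simply the rate mentioned in the error message above.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channels, bits_per_sample) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels(), wav.getsampwidth() * 8


def matches_expected_rate(path, expected_hz=16000):
    """Return True if the file's sample rate equals the rate the recognizer expects."""
    rate, _channels, _bits = wav_format(path)
    return rate == expected_hz
```

If `matches_expected_rate("input.wav")` returns False, the file needs to be resampled, or the recognizer needs to be told the real rate.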

Why can't it read /dev/dsp?

Recent versions of Ubuntu use the PulseAudio framework instead of OSS. The version you are trying uses OSS, so you need to install the oss-compatibility package from your distribution to bring OSS support back.

Alternatively, you can try a newer version of Julius, which has PulseAudio support.
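If you are not sure whether the OSS device is present at all, a quick check like the following can help. It is only a sketch: the helper name `oss_device_available` is hypothetical, and it tests nothing more than whether the device node exists and is readable by the current user. It does not tell you whether audio is actually flowing through it.

```python
import os


def oss_device_available(path="/dev/dsp"):
    """Return True if the device node exists and the current user may read it."""
    return os.path.exists(path) and os.access(path, os.R_OK)


if __name__ == "__main__":
    if oss_device_available():
        print("/dev/dsp is present and readable")
    else:
        print("/dev/dsp is missing or unreadable; OSS support is probably not available")
```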

Why does it then appear to be able to read /dev/dsp, but not react in any way?

Audio input doesn't work properly.

Has anyone else had any success with open source speech recognizers, especially on Linux?

Sure, check this video as an example of what people do with CMUSphinx:

http://www.youtube.com/watch?v=vfaNLIowSyk

I suggest you revisit the CMUSphinx package, which is a leading open source speech recognition engine. There is plenty of documentation on the website; you just need to read it. Remember that speech recognition is a complex area where you can get great results, but you also need to invest time in understanding the technology, just like with any other domain.

In short, to transcribe a file with CMUSphinx you need to do the following 3 simple steps:

  1. Take a wav file and resample it to an 8 kHz, 16-bit mono file with sox:
    sox input.wav -r 8000 -c 1 -b 16 resampled.wav
  2. Install pocketsphinx 0.7:
    apt-get install pocketsphinx
  3. Decode the file:
    pocketsphinx_continuous -samprate 8000 -infile resampled.wav

The result will be printed to standard output. To suppress the log output, redirect stderr to /dev/null:

    pocketsphinx_continuous -samprate 8000 -infile resampled.wav 2> /dev/null
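If you would rather drive this from Python, the resample and decode steps can be wrapped with `subprocess`. This is a rough sketch, not an official CMUSphinx API: the function names are invented here, it assumes `sox` and `pocketsphinx_continuous` are already installed and on your PATH, and it uses only the command-line options shown above plus sox's `-b 16` to request 16-bit samples.

```python
import subprocess


def sox_command(src, dst, rate_hz=8000):
    """Build the sox call that resamples src to a mono, 16-bit file at rate_hz."""
    return ["sox", src, "-r", str(rate_hz), "-c", "1", "-b", "16", dst]


def decode_command(wav_path, rate_hz=8000):
    """Build the pocketsphinx_continuous call; -samprate must match the file's rate."""
    return ["pocketsphinx_continuous", "-samprate", str(rate_hz), "-infile", wav_path]


def transcribe(src, resampled="resampled.wav", rate_hz=8000):
    """Resample src, decode it, and return whatever the decoder wrote to stdout."""
    subprocess.run(sox_command(src, resampled, rate_hz), check=True)
    result = subprocess.run(
        decode_command(resampled, rate_hz),
        stdout=subprocess.PIPE,      # the recognition result goes to stdout
        stderr=subprocess.DEVNULL,   # same effect as "2> /dev/null"
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(transcribe("input.wav"))
```

Keeping the command builders separate from `transcribe` makes it easy to print the exact commands and run them by hand when something goes wrong.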