Topic: IDEA: Wav2txt Using Free Recognition Engine
vtatila
« on: January 14, 2008, 11:15:52 AM »

Hi,
I'm new here. How about an app that uses the freely downloadable Microsoft speech recognition engine to turn a PCM wave file into raw ASCII text? This might be useful for the audio tracks of university lecture videos that are clearly spoken. It could also be the first step in transcribing things like radio plays, audio books and interviews, analogous to how scanning and OCR (record and recognize) are the steps before proofreading.

Other benefits would include speed: since the data is already there, processing can run as fast as the computer can recognize the off-line audio data. Another is flexibility: as SAPI is a standard, you would get better results in Vista than you do in XP, and commercial engines such as Dragon NaturallySpeaking might raise the quality even further. Frankly speaking, I'm not sure whether the quality of any of the freebie engines out there would be good enough to be practical.

As far as the UI goes, a simple, stand-alone, no-install command-line app using the preferred engine would be fine, especially as a bit of GUI work would easily make a context menu entry for converting files in a file manager. Command-line usage would also support batch processing via for loops and possibly piping, depending on the language and the kind of implementation.

One would think that this should be a pretty easy thing to code, since the hardest parts, the recognition engine and the API, are already there. I know Perl and bits of OLE, and I wrote an app to turn text files into wav files using the speech synthesis components. However, so far I have yet to understand enough of the recognition side to implement this.

Speaking to a file just involved setting up a file stream as the audio output for the engine. So how about a file input stream for recognition, writing the recognized output to stdout so that it can be redirected as needed?
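As a rough, untested sketch of that idea: the SAPI 5 automation objects appear to allow the mirror image of speaking to a file, with an SpFileStream opened for reading and assigned as the AudioInputStream of an in-process recognizer. The type library name passed to Win32::OLE::Const, the choice of a plain dictation grammar, and the assumption that the engine accepts the wave file's format are all guesses here, and error handling is minimal.

```perl
use strict;
use warnings;
use Win32::OLE qw(EVENTS);   # EVENTS turns on OLE event sink support
use Win32::OLE::Const;

# Assumption: the SAPI 5 type library is registered under this name.
my $const = Win32::OLE::Const->Load('Microsoft Speech Object Library')
   or die "Cannot load the SAPI constants.\n";

my $wav = shift @ARGV or die "Usage: $0 input.wav\n";

# Open the wave file for reading rather than for writing.
my $stream = Win32::OLE->new('SAPI.SpFileStream')
   or die "Cannot create a file stream from which to read.\n";
$stream->Open($wav, $const->{SSFMOpenForRead}, 0);

# An in-process recognizer can take its audio from a stream
# instead of the microphone.
my $recognizer = Win32::OLE->new('SAPI.SpInprocRecognizer')
   or die "Cannot create a recognizer.\n";
$recognizer->{AudioInputStream} = $stream;

my $context = $recognizer->CreateRecoContext();

# Register the event handler before recognition is switched on.
Win32::OLE->WithEvents($context, sub {
   my ($obj, $event, @args) = @_;
   if ($event eq 'Recognition')
   { # Arguments: StreamNumber, StreamPosition, RecognitionType, Result.
      my $result = $args[3];
      print $result->{PhraseInfo}->GetText(), "\n";
   }
   elsif ($event eq 'EndStream')
   { # The whole file has been processed.
      Win32::OLE->QuitMessageLoop();
   }
}, '_ISpeechRecoContextEvents');

# Free-form dictation; no command grammar is loaded.
my $grammar = $context->CreateGrammar();
$grammar->DictationLoad();
$grammar->DictationSetState($const->{SGDSActive});

Win32::OLE->MessageLoop();   # Pump events until EndStream arrives.
$stream->Close();
```

Saved under a hypothetical name such as wav2txt.pl, this would be run as perl wav2txt.pl lecture.wav > lecture.txt, which also leaves room for batch use from a for loop.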

There are a couple of problems in the concept of wave to text. In addition to ugly formatting and the lack of punctuation and case in the resulting text, another problem might be engines that require speaker-specific training. If some particular input is needed, as in the XP recognition wizard, how does one generate that input? If the training text may be freely specified, then some initial sentences from the beginning of the wave, also given as parameters, could be used for this. Some of the state-of-the-art engines such as NaturallySpeaking can work decently without speaker-specific training, though.

I do know there are some shareware apps that pretty much claim to do wav to text conversion. But one wouldn't want to buy these without knowing how good the results are in practice. And if writing wav to text is indeed as easy as text to wav, asking for money for a GUI that is keyboard-inaccessible to me as a screen reader user is a bit unfair, I think, considering there are dozens of text-to-speech converters out there using SAPI, i.e. all the non-trivial bits are handled by MS.

Links:

SAPI 5 SDK including OLE automation docs:

http://tinyurl.com/5m6v2

MS Agent components including the recognition engine:

http://www.microsoft.com/msagent/downloads/user.aspx

And just to show how easy speaking to a file is, here's my function for speaking to a file:

Warning: this code doesn't die like it should, but that's because I was lazy in the main script error handling.

use Win32::OLE;
use Win32::OLE::Const;

# The full script sets up $const once at the top; loading it under this
# type library name is an assumption.
my $const = Win32::OLE::Const->Load('Microsoft Speech Object Library');

sub speakFile
{ # Speak the text file named by $source to the specified wav file using the voice and options passed.
   # Returns an error string on failure and undef on success.
   my($voice, $samplerate, $source, $destination) = @_;
   my $fileStream = Win32::OLE->new('SAPI.SpFileStream') or return "Cannot create a file stream to which to write.";
   if(defined $samplerate)
   { # Change the format.
      $samplerate = 'SAFT' . $samplerate . 'kHz16BitMono'; # The name matches the OLE constant.
      return "Unsupported sampling rate." if not exists $const->{$samplerate};
      my $format = Win32::OLE->new('SAPI.SpAudioFormat');
      $format->{Type} = $const->{$samplerate};
      $fileStream->{Format} = $format;
   } # if
   $fileStream->Open($destination, $const->{SSFMCreateForWrite}, 0);
   $voice->{AudioOutputStream} = $fileStream; # Redirect the voice from the speakers to the file.
   # SVSFIsFilename makes Speak read its text from the file named by $source.
   my $flags = $const->{SVSFDefault} | $const->{SVSFIsFilename} | $const->{SVSFIsNotXML};
   $voice->Speak($source, $flags) or return 'Speaking the text failed.';
   $fileStream->Close();
   return undef;
} # sub

--
With kind regards Veli-Pekka Tätilä
Accessibility, game music, synthesizers and programming:
http://www.student.oulu.fi/~vtatila
zacharyliu
« Reply #1 on: February 03, 2008, 10:22:23 AM »

It shouldn't be too hard to do that, but remember that you have to train the engine first, and also Microsoft's speech-to-text engine is terrible.   