From Mageia wiki

Speech recognition for dictation is a feature that has existed for a long time in Windows. On Linux, however, only experimental or embryonic systems were available. This has now changed, and I am going to show you a way to get a working system very quickly.

This article focuses on dictation. It does not concern voice control of desktop functions.

For the moment, the installation relies on components from outside the Mageia distribution. This could change, and I will be sure to report on any such improvements.

Installation

We will start by installing Vosk in user space, following the documentation of the Vosk project.

Installing the engine

If pip3 is not yet installed, run as root:

urpmi python3-pip

then, as a normal user:

pip3 install --user vosk

Installing the acoustic model

The program needs an acoustic model to work. The model is specific to each language. The Vosk models page lists the models that are available. I ran my tests with the French model provided by Linto. Note that this one weighs 1.5 GB; lighter models are also available.

wget https://alphacephei.com/vosk/models/vosk-model-fr-0.6-linto-2.2.0.zip

Extract the contents of this file into ~/.config/nerd-dictation

Rename the directory vosk-model-fr-0.6-linto to model
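The extract and rename steps above can be sketched as shell functions. This is only a minimal sketch, not part of the original procedure: the function names install_vosk_model and rename_model_dir are hypothetical, the unzip tool is assumed to be installed, and the archive is assumed to unpack into a single directory whose name starts with vosk-model-.

```shell
# Rename the single vosk-model-* directory inside <config-dir> to "model".
rename_model_dir() {
    config_dir=$1
    if [ -e "$config_dir/model" ]; then
        echo "$config_dir/model already exists" >&2
        return 1
    fi
    # Expand the glob into the positional parameters to count the matches.
    set -- "$config_dir"/vosk-model-*/
    if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
        echo "expected exactly one vosk-model-* directory in $config_dir" >&2
        return 1
    fi
    mv "${1%/}" "$config_dir/model"
}

# Unpack <archive.zip> into <config-dir> (default: ~/.config/nerd-dictation),
# then rename the extracted directory.
install_vosk_model() {
    archive=$1
    config_dir=${2:-"$HOME/.config/nerd-dictation"}
    mkdir -p "$config_dir" || return 1
    unzip -q "$archive" -d "$config_dir" || return 1
    rename_model_dir "$config_dir"
}
```

After the wget command above has finished, the call would be: install_vosk_model vosk-model-fr-0.6-linto-2.2.0.zip. The helper refuses to overwrite an existing model directory, so an earlier installation has to be moved away first.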

Dictation utility

We will complement the speech recognition engine with a tool that enables dictation, nerd-dictation. In practice, this tool simulates keyboard input, so it can be used in any application while it runs in the background. Please note that I have only tested this tool in an Xorg environment, and it is likely not compatible with Wayland.

If they are not already installed, you have to install git and xdotool.

cd ~/.local
git clone https://github.com/ideasman42/nerd-dictation.git
cd nerd-dictation
# -p: do not fail if the directory already exists
mkdir -p ~/.local/bin
cp nerd-dictation ~/.local/bin

Copying the executable to the ~/.local/bin directory makes the utility available from the command line, provided that this directory is in your PATH.
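Whether the copied executable can be called by name depends on ~/.local/bin being listed in PATH. A small sketch to check this follows; the function name path_contains is hypothetical, and the startup file to edit (for example ~/.bashrc) is an assumption that depends on your shell.

```shell
# Hypothetical helper: succeed if directory $1 is listed in PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if ! path_contains "$HOME/.local/bin"; then
    # Only prints a hint; nothing is modified.
    echo "Consider adding to your shell startup file:"
    echo "  export PATH=\"\$HOME/.local/bin:\$PATH\""
fi
```

Wrapping PATH in colons before matching lets the same pattern catch the directory at the start, in the middle or at the end of the list.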

If you use a keyboard layout other than the US one, you need to work around an xdotool problem. For example, for French keyboards:

setxkbmap fr

If you have another keyboard layout, adapt the previous command accordingly.
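To find the value to pass, the active layout can be read from the output of setxkbmap -query. This is a minimal sketch, assuming the usual "layout:" line in that output and that the reported layout is the one you want to apply; the function name parse_layout is hypothetical.

```shell
# Hypothetical helper: read `setxkbmap -query` output on stdin and print
# the value of the "layout:" line (for example "fr").
parse_layout() {
    awk '$1 == "layout:" { print $2; exit }'
}

# Re-apply the current layout explicitly (needs a running X session):
#   setxkbmap "$(setxkbmap -query | parse_layout)"
```

The setxkbmap call is left as a comment because it only makes sense inside a graphical session.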

Now you can start the dictation tool:

nerd-dictation begin --full-sentence --punctuate-from-previous-timeout=2 --idle-time=0.05 &

and stop it:

nerd-dictation end
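Since starting and stopping are two separate commands, it can be convenient to bind a single keyboard shortcut that toggles dictation. The following is a minimal sketch, assuming that a marker file is an acceptable way to remember the state. The function name dictation_toggle, the variables NERD_DICTATION and DICTATION_MARKER, and the marker path are hypothetical; only the begin and end commands shown above are used.

```shell
# Hypothetical toggle around the begin/end commands shown above.
# NERD_DICTATION and DICTATION_MARKER may be overridden, e.g. for testing.
: "${NERD_DICTATION:=nerd-dictation}"
: "${DICTATION_MARKER:=${XDG_RUNTIME_DIR:-/tmp}/nerd-dictation.on}"

dictation_toggle() {
    if [ -e "$DICTATION_MARKER" ]; then
        # Marker present: dictation was started by this function, stop it.
        "$NERD_DICTATION" end
        rm -f "$DICTATION_MARKER"
    else
        # No marker: start dictation in the background and remember it.
        "$NERD_DICTATION" begin --full-sentence \
            --punctuate-from-previous-timeout=2 --idle-time=0.05 &
        : > "$DICTATION_MARKER"
    fi
}
```

If dictation is stopped some other way, the marker file becomes stale and the next toggle will try to stop a tool that is no longer running; deleting the marker file restores the expected behaviour.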

Now you can go to the application where you would normally type text, place the cursor in the right place and start dictating. After launch, you have to wait a little while before the first words are recognized: this is the time needed to load the model. The --idle-time option prevents the tool from using a full processor core.

If you have placed the acoustic model in another directory, you must specify its location on the command line with the option:

--vosk-model-dir=<path-to-model>
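As an illustration, the option can be built from a directory after checking that the directory exists, which avoids a confusing failure when the tool starts. This is a minimal sketch; the function name vosk_model_option and the example path are hypothetical.

```shell
# Hypothetical helper: print the --vosk-model-dir option for directory $1,
# or fail with a message if the directory does not exist.
vosk_model_option() {
    if [ ! -d "$1" ]; then
        echo "model directory not found: $1" >&2
        return 1
    fi
    printf '%s\n' "--vosk-model-dir=$1"
}

# Example (the path is only an illustration):
#   opt=$(vosk_model_option "$HOME/models/vosk-fr") || exit 1
#   nerd-dictation begin "$opt" --full-sentence &
```

Keeping the option in a quoted variable means a model path containing spaces is still passed as a single argument.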