Speech recognition for dictation has long been a feature of Windows. Under Linux, however, we only had experimental or embryonic systems. This situation has just changed, and I am going to show you a method to get a working system very quickly.
This article focuses on dictation; it does not cover voice control of desktop functions.
For the moment the installation relies on components from outside the Mageia distribution. This could change, however, and I will be sure to report on any improvements.
We will start by installing Vosk in user space, following the documentation on this page:
Installing the engine
If pip3 is not yet installed, run as root:
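On Mageia this would typically be done with urpmi; the package name below is an assumption on my part:

```shell
# Assumption: Mageia package name; installs pip for Python 3 system-wide
urpmi python3-pip
```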
then, as a normal user:
pip3 install --user vosk
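As an optional sanity check, you can ask pip to confirm that the package was installed:

```shell
# Prints the installed vosk package metadata; fails if the install did not succeed
pip3 show vosk
```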
Installing the acoustic model
The program needs an acoustic model to work. This model is specific to each language. This page lists the models that are available. My tests were done with the French model provided by Linto. Be aware that it weighs 1.5 GB; there are also other, lighter models.
Extract the contents of this archive into ~/.config/nerd-dictation.
Rename the directory vosk-model-fr-0.6-linto to model.
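The two steps above might look like the following; the download location and archive name are assumptions, so adapt them to your actual file:

```shell
# Paths and archive name are assumptions; adjust to your download
mkdir -p ~/.config/nerd-dictation
cd ~/.config/nerd-dictation
unzip ~/Downloads/vosk-model-fr-0.6-linto.zip
mv vosk-model-fr-0.6-linto model
```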
We will complement the speech recognition engine with a dictation tool, nerd-dictation. In practice, this tool simulates keyboard input, so it can be used in any application while it runs in the background. Please note that I have only tested it in a Xorg environment; it is likely not compatible with Wayland.
If needed, install the git tool and xdotool.
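On Mageia these would typically be installed as root with urpmi; the package names below are assumptions:

```shell
# Assumption: Mageia package names for git and xdotool
urpmi git xdotool
```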
cd ~/.local
git clone https://github.com/ideasman42/nerd-dictation.git
cd nerd-dictation
mkdir -p ~/.local/bin
cp nerd-dictation ~/.local/bin
Copying the executable into the .local/bin directory makes the utility available from anywhere.
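This works because ~/.local/bin is normally on the user's PATH. If your shell does not already include it, you can add it yourself; this is a generic sketch, not Mageia-specific:

```shell
# Prepend ~/.local/bin to PATH for the current session;
# add this line to ~/.bashrc to make it permanent
export PATH="$HOME/.local/bin:$PATH"
```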
If you use a keyboard layout other than the US one, you need to work around a problem with xdotool; for example, for French keyboards:
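A workaround of this kind typically re-applies the layout with setxkbmap; for a French keyboard it might look like this (an assumption on my part, not necessarily the author's exact directive):

```shell
# Assumption: re-assert the French layout so xdotool emits the expected characters
setxkbmap fr
```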
If you use yet another layout, adapt the previous directive accordingly.
Now you can start the dictation tool:
nerd-dictation begin --full-sentence --punctuate-from-previous-timeout=2 --idle-time=0.05 &
and stop it:
nerd-dictation end
Now you can go to the application where you would normally type, place the cursor where the text should go, and start dictating. At launch, you have to wait a moment before the first utterances are recognized; this is the loading time of the model. The --idle-time option keeps the tool from monopolizing a full processor core.
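For convenience, starting and stopping can be wrapped in a small toggle script (a hypothetical helper, not part of nerd-dictation) that you could bind to a desktop keyboard shortcut:

```shell
#!/bin/sh
# Hypothetical toggle helper: ends dictation if it is running, starts it otherwise
if pgrep -f "nerd-dictation begin" >/dev/null; then
    nerd-dictation end
else
    nerd-dictation begin --full-sentence --punctuate-from-previous-timeout=2 --idle-time=0.05 &
fi
```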
If you have placed the acoustic model in another directory, you must specify its location on the command line with the directive:
nerd-dictation begin --vosk-model-dir=/path/to/model