Synopsis:
Speech recognition for dictation has existed for a long time on Windows. Under Linux, however, we only had experimental or embryonic systems. This situation has just changed, and I am going to show you a method to get a working system very quickly.

This article focuses on dictation. It does not cover voice control of desktop functions. The whole installation uses components from the Mageia distribution, except for a model that has to be downloaded.

Installation

With package repository

Starting from Mageia 9, all the needed applications can be installed from the Core repository. Simply install elograf. The installation will pull in everything needed for a working setup, including kaldi, vosk-api and nerd-dictation.
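
For example, as root (urpmi is Mageia's command-line package installer; the graphical software manager works just as well):

urpmi elograf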

Through external sources

We will start by installing Vosk in user space, following the documentation on the Vosk project's installation page.

Installing the engine

If pip3 is not yet installed, run as root:

urpmi python3-pip

then, as a normal user:

pip3 install --user vosk
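
To check that the module is available, an optional quick test as the same user:

python3 -c "import vosk; print('vosk imported successfully')"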

Installing the dictation tool

We will complete the speech recognition engine with a tool that handles dictation, nerd-dictation. In practice this tool simulates keyboard input, so it can be used in any application while it runs in the background. Please note that I have only tested this tool in an Xorg environment; xdotool, which it uses by default, is specific to X11 (see the remark about DOTOOL and Wayland in the Elograf section below).

If they are not already installed, install git and xdotool.

cd ~/.local
git clone https://github.com/ideasman42/nerd-dictation.git
cd nerd-dictation
mkdir -p ~/.local/bin
cp nerd-dictation ~/.local/bin

Copying the executable into ~/.local/bin makes the utility available from the command line, provided that directory is in your PATH.
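
If ~/.local/bin is not yet in your PATH, you can add it yourself; a minimal sketch assuming bash is your login shell:

echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc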

Installing the acoustic model

The program needs an acoustic model to work. The model is specific to each language. The Vosk project's models page lists the models that are available. The tests I have done used the French model provided by LinTO; be aware that it weighs 1.5 GB. There are also lighter models. Elograf allows you to download models from this list and saves them either in system space (the administrator password is needed) or in user space.
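
If you prefer to install a model by hand rather than through Elograf, nerd-dictation looks for it by default under ~/.config/nerd-dictation/model. A sketch with a hypothetical archive name (adapt it to the model you actually downloaded):

mkdir -p ~/.config/nerd-dictation
unzip vosk-model-fr-0.22.zip
mv vosk-model-fr-0.22 ~/.config/nerd-dictation/model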

Usage

With Elograf

Elograf adds a microphone icon to the system tray. The microphone is struck through initially and plain when dictation is running.

With a right click, a menu is displayed, allowing you to launch dictation through nerd-dictation or to access a Configuration box. The Configuration displays the list of available models, from which one can be chosen. It is also possible to download a model and install it in system space or in user space. By choosing "Active direct click on icon", dictation can be controlled with a left click on the icon.

In the advanced options, some parameters of the dictation tool can be set. For Wayland, you have to choose DOTOOL as the tool used to simulate the keyboard; XDOTOOL, the default, is specific to X11. Be sure to have one of them installed.

It is recommended to add elograf to the list of applications that are automatically started at login. This puts the icon in the system tray, giving quick access to the commands. The method for setting this up depends on the desktop environment and is not detailed here.
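
As an illustration only, on desktops that follow the XDG autostart convention a small desktop entry is usually enough; the Exec value assumes the command is simply called elograf:

mkdir -p ~/.config/autostart
cat > ~/.config/autostart/elograf.desktop << 'EOF'
[Desktop Entry]
Type=Application
Name=Elograf
Exec=elograf
EOF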

With the dictation utility

If you use a keyboard layout other than the US one, it is necessary to work around an xdotool problem; for example, for French keyboards:

setxkbmap fr

If you have a different keyboard layout, adapt the previous command accordingly.
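
You can check which layout is currently active with:

setxkbmap -query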

Now you can start the dictation tool:

 nerd-dictation begin --full-sentence --punctuate-from-previous-timeout=2 --idle-time=0.05 &

and stop it:

nerd-dictation end

Now you can go to the application where you would normally type text, place the cursor in the right place and start dictating. At launch, you have to wait a little while before the first words are recognized; this is the time needed to load the model. The --idle-time option prevents the tool from constantly occupying a full processor core.
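
Since dictation is started and stopped with two separate commands, a small toggle script can be convenient, for example to bind it to a keyboard shortcut. A minimal sketch (the pgrep test is my own choice, not part of nerd-dictation):

#!/bin/sh
# Toggle dictation: start nerd-dictation if it is not running, stop it otherwise.
# Assumes nerd-dictation is available in the PATH (e.g. in ~/.local/bin).
if pgrep -f "nerd-dictation begin" > /dev/null; then
    nerd-dictation end
else
    nerd-dictation begin --full-sentence --punctuate-from-previous-timeout=2 --idle-time=0.05 &
fi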

If you have placed the acoustic model in another directory, you must specify its location on the command line with the option:

--vosk-model-dir=<path-to-model>
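
For example, with a model unpacked under a purely illustrative $HOME/models/vosk-fr directory:

nerd-dictation begin --vosk-model-dir="$HOME/models/vosk-fr" --full-sentence --punctuate-from-previous-timeout=2 --idle-time=0.05 &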