Vocalizer API main functions

The following topics describe the Vocalizer native API for users who plan to integrate the Vocalizer client with their own third-party or proprietary speech server or web service.

The following call sequence shows how to use the native API. For extra details and for the utility location, see the sample program nvscmdline, described in Testing the Vocalizer installation. This example is a good starting point for writing applications that use the native Vocalizer API. It shows how to use all of the major API functions in the correct sequence, including advanced functionality such as loading user dictionaries and tuning data, changing the rate and volume, and speaking from a URI that is fetched by Vocalizer.

The "application" refers to any source code that uses the API.

The application calls the NVSClientInit function or the NVSClientInitFromFile function to initialize the Vocalizer client with appropriate parameters. The NVSClientInit function sends these parameters as arguments of the function, while NVSClientInitFromFile retrieves the parameters from a prepared file.
The application calls the TtsSystemInit function to initialize the Vocalizer library. TtsSystemInit is called once per process; if it is called more than once per process, it can lead to memory leaks or crashes.
The application calls the TtsOpen function to create a TTS engine instance. An application that serves multiple end users (such as an IVR platform) makes the call multiple times (once for each simultaneous session it supports). The application can specify a number of general parameters such as the default language and voice, the destination callback used to stream audio to the application, and the desired audio format.
Nuance recommends that you specify the TTS input when a TTS action is requested via the TtsProcessEx function. This approach is easier to use, is more efficient, and enables the specification of the input via a URI, a filename, or a text buffer.
When using user dictionaries, the application must:
1. Define a TTS_USRDICT_DATA structure for each dictionary instance; this structure describes where to find the dictionary data and how to use it. This approach enables references to URI addresses. Fetch properties can be defined in case of remote access to the dictionary data.
2. Optionally call TtsMapCreateNVS to create a map data structure to control fetch properties such as the base URI or to override the default download timeout specified in the Vocalizer configuration file. The application calls TtsMapSet* functions (TtsMapSetBoolNVS, TtsMapSetCharNVS, and TtsMapSetU32NVS) to define these properties in the created map data structure.
3. Call TtsLoadUsrDictEx for each dictionary; this function returns a handle to a dictionary instance. Note that this must be repeated for each TTS instance opened via TtsOpen; user dictionaries are loaded separately for each TTS instance.
When using rulesets or ActivePrompt databases, the application defines a TTS_FETCHINFO_T structure for each ruleset or ActivePrompt database. This structure describes where to find the data and how to use it.
Like user dictionaries, this approach enables references to URI addresses. In case of remote access to the data, you can define fetch properties using the same API calls described for user dictionaries above. The application then calls TtsLoadTuningData for each ruleset and ActivePrompt database. This function returns a handle to the loaded tuning data instance. Note that this must be repeated for each TTS instance opened via TtsOpen; tuning data is loaded separately for each TTS instance.
Optionally, call TtsSetParamsEx to adjust additional speak parameters, such as the speaking rate and volume.
The application sets up a TTS_SPEAK_DATA structure describing the input text; this structure supports specifying a URI, a filename or a memory block. If fetch properties need to be specified, the application calls TtsMapCreateNVS to create a fetch property map. The application then uses the TtsMapSet* functions to add properties to the map one by one.
The application calls TtsProcessEx to convert the input text to audio of the type defined in TTS_OPEN_PARAMS (previously specified by TtsOpen). The input can be specified via the TTS_SPEAK_DATA structure (URI or text buffer) or the source callback method. TtsProcessEx executes the TTS action synchronously; it only returns when all the speech samples have been generated.
Vocalizer streams the audio to the application via the TTS_DEST_CB Destination callback.
If required, the application can perform several TtsProcessEx calls, with different TTS_SPEAK_DATA and/or different TTS_USRDICT_DATA instances. When using dictionaries, TtsEnableUsrDictEx, TtsDisableUsrDictEx, and TtsDisableUsrDictsEx can be used to enable dictionaries, change the priorities in which dictionaries are called, and disable dictionaries. The speak parameters can be updated using TtsSetParamsEx.
Note that only a limited number of parameters can be updated while TtsProcessEx is busy (just the speaking rate and volume). When the input text contains markup controlling the speech generation, the parameters are updated for the duration of the current TtsProcessEx execution, but at the end of the speak request they reset to the original values used at the start of the TtsProcessEx call.
When using dictionaries, the application can optionally unload each dictionary by calling TtsUnloadUsrDictEx for each loaded dictionary. If this isn’t done, they are automatically unloaded when TtsClose is called to close the TTS instance.
When using rulesets and/or ActivePrompt databases, the application can optionally unload each ruleset and ActivePrompt database by calling TtsUnloadTuningData for each one loaded. If this isn’t done, they are automatically unloaded when TtsClose is called to close the TTS instance.
When maps of fetch properties are created for either TTS_SPEAK_DATA, TTS_USRDICT_DATA, or TTS_FETCHINFO_T, the application must call TtsMapDestroyNVS for each map that has been created; otherwise there will be a memory leak.
The application calls TtsClose to clean up the TTS engine instance.
The application calls TtsSystemTerminate when it is completely done with Vocalizer and ready to shut down (exit the process). Do not call this if you plan future Vocalizer operations: it is not safe to call TtsSystemInit and TtsSystemTerminate more than once within the same process.
The application calls NVSClientTerminate when it is completely done with the Vocalizer client.
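
The ordering above, and the reverse-order cleanup it implies, can be sketched in C. This is only an illustration under stated assumptions: the helper call() is a hypothetical stand-in that records a function name instead of invoking Vocalizer, so none of the real signatures, parameter structures, or error codes are reproduced here. Fetch-property maps and tuning data are left out to keep the sketch short, and the explicit TtsUnloadUsrDictEx step is optional in the real API because TtsClose unloads dictionaries automatically.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CALLS 16

static const char *g_calls[MAX_CALLS]; /* names in the order they were "called" */
static int g_ncalls = 0;
static int g_fail_at = -1;             /* index of the call that fails, or -1 */

/* Hypothetical stand-in for a Vocalizer call: records the name only.
   Returns 0 for success, -1 for a simulated failure. */
static int call(const char *name) {
    assert(g_ncalls < MAX_CALLS);
    int idx = g_ncalls;
    g_calls[g_ncalls++] = name;
    return (idx == g_fail_at) ? -1 : 0;
}

int recorded_call_count(void) { return g_ncalls; }

const char *recorded_call(int i) {
    assert(i >= 0 && i < g_ncalls);
    return g_calls[i];
}

/* Walks the documented sequence. On failure, only the steps that already
   succeeded are undone, in reverse order of setup. */
int run_call_sequence(int fail_at) {
    int rc = -1;
    g_ncalls = 0;
    g_fail_at = fail_at;

    if (call("NVSClientInit") != 0) return rc;
    if (call("TtsSystemInit") != 0) goto client_term;   /* once per process */
    if (call("TtsOpen") != 0) goto system_term;         /* once per session */
    if (call("TtsLoadUsrDictEx") != 0) goto close_inst; /* per TTS instance */
    if (call("TtsSetParamsEx") != 0) goto unload_dict;  /* rate, volume */
    if (call("TtsProcessEx") != 0) goto unload_dict;    /* synchronous speak */
    rc = 0;

unload_dict:
    call("TtsUnloadUsrDictEx"); /* optional: TtsClose would also unload */
close_inst:
    call("TtsClose");
system_term:
    call("TtsSystemTerminate"); /* only when the process is done with TTS */
client_term:
    call("NVSClientTerminate");
    return rc;
}

/* Prints the recorded order, one name per line. */
void print_recorded_calls(void) {
    for (int i = 0; i < g_ncalls; i++) {
        printf("%d: %s\n", i + 1, g_calls[i]);
    }
}
```

Calling run_call_sequence(-1) records the full ten-call sequence, from NVSClientInit through NVSClientTerminate. Calling run_call_sequence(2) simulates a failure in the TtsOpen step: the sketch then skips TtsClose and records only TtsSystemTerminate and NVSClientTerminate as cleanup.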

Compatibility with previous API releases

Vocalizer supports API compatibility with RealSpeak Telecom Host 4.x and some earlier releases. However, this documentation does not describe older API calls from those products that are harder to use and/or less functional than the current API calls. Nuance encourages shifting older applications over to the current API calls.

To ensure your existing application only relies on the current documented API calls, set the following C pre-processor definitions in your code or on the compiler command line. This also helps avoid conflicts with Microsoft Windows API headers, as some of the backward compatible types conflict with those headers.

TTSSO_NO_BACKWARD_COMPATIBLE_TYPES disables the outdated data types
TTSSO_NO_BACKWARD_COMPATIBLE_FUNCS disables the outdated API functions
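
As a minimal illustration, the two definitions can be placed at the top of a source file, ahead of the Vocalizer API header so that the header sees them. The header name is not given in this section, so the include line is left as a comment rather than guessed.

```c
/* Disable the outdated data types and API functions. Both definitions
   must come before the Vocalizer API header is included. */
#define TTSSO_NO_BACKWARD_COMPATIBLE_TYPES
#define TTSSO_NO_BACKWARD_COMPATIBLE_FUNCS

/* Include the Vocalizer API header here (its name is not stated in
   this section). On Windows, include it alongside the Windows headers
   as usual; the definitions above are what avoid the type conflicts. */
```

To set the same definitions on the compiler command line instead, GCC and Clang accept -DTTSSO_NO_BACKWARD_COMPATIBLE_TYPES -DTTSSO_NO_BACKWARD_COMPATIBLE_FUNCS, and Microsoft Visual C++ accepts the equivalent /D options.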

Functional organization of API calls

The functions can be organized in these groups:

Initialize and shut down the Vocalizer client:

NVSClientInit	One-time initialization of the Vocalizer client. If used, NVSClientInitFromFile is not used.
NVSClientInitFromFile	One-time initialization of the Vocalizer client, reading a file to supply the initial parameters. If used, NVSClientInit is not used.
NVSClientTerminate	One-time shutdown of the Vocalizer client.

Initialize and shut down Vocalizer:

TtsSystemInit	One-time initialization of the Vocalizer library.
TtsSystemTerminate	One-time shutdown of the Vocalizer library.

Manage the TTS process:

TtsClose	Close a TTS engine instance and free all its associated resources.
TtsGetVoiceList	Get the list of installed Vocalizer voices and their properties.
TtsOpen	Open a new TTS engine instance.
TtsProcessEx	Convert input data (text) into output data (speech).
TtsProcessWithParams	Convert input text data into speech, while optionally setting TTS engine instance parameters to specified values for just this speak request.
TtsSessionEnd	Disassociate a TTS engine instance from the application-defined session identifier string.
TtsSessionStart	Associate a TTS engine instance with an application-defined session identifier string.
TtsSessionStartEx	Associate a TTS engine instance with an application-defined session, including an optional session configuration and a session identifier string.
TtsStop	Stop the TTS conversion process initiated by a call to TtsProcessEx.

Manage TTS maps for specifying internet fetch properties:

TtsMapCreateNVS	Create an empty key/value map for specifying internet fetch properties.
TtsMapDestroyNVS	Destroy a key/value map that stored internet fetch properties.
TtsMapFreeCharNVS	Free the character string associated with an internet fetch properties map, returned by an earlier call to TtsMapGetCharNVS.
TtsMapGetBoolNVS	Get a named property of type boolean (LH_S32) from a map.
TtsMapGetCharNVS	Get a named property of type string (LH_CHAR *) from a map.
TtsMapGetU32NVS	Get a named property of type unsigned 32-bit integer (LH_U32) from a map.
TtsMapSetBoolNVS	Set a named property of type boolean (LH_S32) in a map.
TtsMapSetCharNVS	Set a named property of type string (LH_CHAR *) in a map.
TtsMapSetU32NVS	Set a named property of type unsigned 32-bit integer (LH_U32) in a map.
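
The create, set, get, free, and destroy pairing in this group can be illustrated with a toy string map. Everything in the sketch is hypothetical: the toy_map_* functions are not the Vocalizer map functions and do not reproduce their signatures, and the key name "base.uri" is invented for the example. The sketch assumes an ownership contract like the one TtsMapFreeCharNVS suggests, where a string returned by the getter is a copy that the caller releases through a dedicated free function, and where every created map is eventually destroyed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAP_CAPACITY 8

typedef struct {
    char *key;
    char *value;
} toy_entry;

typedef struct {
    toy_entry entries[TOY_MAP_CAPACITY];
    size_t count;
} toy_map;

/* Portable string duplicate; returns NULL if allocation fails. */
static char *toy_strdup(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) memcpy(copy, s, n);
    return copy;
}

/* Creates an empty map; pair every call with toy_map_destroy. */
toy_map *toy_map_create(void) {
    return calloc(1, sizeof(toy_map));
}

/* Adds or overwrites a string property. Returns 0 on success, -1 on error. */
int toy_map_set_char(toy_map *m, const char *key, const char *value) {
    assert(m != NULL && key != NULL && value != NULL);
    char *v = toy_strdup(value);
    if (v == NULL) return -1;
    for (size_t i = 0; i < m->count; i++) {
        if (strcmp(m->entries[i].key, key) == 0) {
            free(m->entries[i].value);
            m->entries[i].value = v;
            return 0;
        }
    }
    if (m->count == TOY_MAP_CAPACITY) { free(v); return -1; }
    char *k = toy_strdup(key);
    if (k == NULL) { free(v); return -1; }
    m->entries[m->count].key = k;
    m->entries[m->count].value = v;
    m->count++;
    return 0;
}

/* Returns a heap copy of the value, or NULL if the key is absent.
   The caller releases the copy with toy_map_free_char. */
char *toy_map_get_char(const toy_map *m, const char *key) {
    assert(m != NULL && key != NULL);
    for (size_t i = 0; i < m->count; i++) {
        if (strcmp(m->entries[i].key, key) == 0) {
            return toy_strdup(m->entries[i].value);
        }
    }
    return NULL;
}

/* Releases a string returned by toy_map_get_char. */
void toy_map_free_char(char *value) {
    free(value);
}

/* Frees every entry and the map itself; skipping this leaks memory. */
void toy_map_destroy(toy_map *m) {
    if (m == NULL) return;
    for (size_t i = 0; i < m->count; i++) {
        free(m->entries[i].key);
        free(m->entries[i].value);
    }
    free(m);
}
```

A typical use creates the map, sets properties one by one, hands the map to whatever consumes it, and destroys it afterwards. Any string read back with toy_map_get_char is released with toy_map_free_char before the map is destroyed.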

Manage user dictionaries and tuning data:

TtsDisableUsrDictEx	Disable a single user dictionary instance on a TTS engine instance.
TtsDisableUsrDictsEx	Disable all user dictionary instances on a TTS engine instance.
TtsEnableUsrDictEx	Enable a user dictionary instance and/or change its priority on a TTS engine instance.
TtsLoadTuningData	Load a ruleset or ActivePrompt database for use by a TTS instance.
TtsLoadUsrDictEx	Load a user dictionary instance into memory.
TtsUnloadTuningData	Unload a ruleset or ActivePrompt database that was previously loaded using TtsLoadTuningData.
TtsUnloadUsrDictEx	Unload a user dictionary instance, freeing the resources associated with it.

Manage parameters:

TtsGetParamEx	Retrieve the value of one parameter.
TtsGetSupportedParams	Return the list of all the supported parameter names for TtsSetParamsEx and TtsGetParamEx.
TtsGetVoiceList	Get the list of installed Vocalizer voices and their properties.
TtsProcessWithParams	Convert input text data into speech, while optionally setting TTS engine instance parameters to a specified value for just this speak request.
TtsSetParamsEx	Set the value of one or more TTS engine instance parameters.
TtsGetLipSyncInfo	Get the visual cue associated with the current TTS audio.
