Vocalizer API main functions
The following topics describe the Vocalizer native API for users who plan to integrate the Vocalizer client with their own third-party or proprietary speech server or web service.
The following call sequence shows how to use the native API.
The "application" refers to any source code that uses the API.
- The application calls either NVSClientInit or NVSClientInitFromFile to initialize the Vocalizer client with appropriate parameters. NVSClientInit takes these parameters as function arguments, while NVSClientInitFromFile reads them from a prepared file.
- The application calls the TtsSystemInit function to initialize the Vocalizer library. TtsSystemInit is called once per process; if it is called more than once per process, it can lead to memory leaks or crashes.
- The application calls the TtsOpen function to create a TTS engine instance. An application that serves multiple end users (such as an IVR platform) makes the call multiple times (once for each simultaneous session it supports). The application can specify a number of general parameters such as the default language and voice, the destination callback used to stream audio to the application, and the desired audio format.
Nuance recommends that you specify the TTS input when a TTS action is requested via the TtsProcessEx function. This approach is easier to use, is more efficient, and enables the specification of the input via a URI, a filename, or a text buffer.
- When using user dictionaries, the application must:
- Define a TTS_USRDICT_DATA structure for each dictionary instance; this structure describes where to find the dictionary data and how to use it. This approach enables references to URI addresses. Fetch properties can be defined in case of remote access to the dictionary data.
- Optionally call TtsMapCreateNVS to create a map data structure to control fetch properties such as the base URI or to override the default download timeout specified in the Vocalizer configuration file. The application calls TtsMapSet* functions (TtsMapSetBoolNVS, TtsMapSetCharNVS, and TtsMapSetU32NVS) to define these properties in the created map data structure.
- Call TtsLoadUsrDictEx for each dictionary; this function returns a handle to a dictionary instance. This must be repeated for each TTS instance opened via TtsOpen, because user dictionaries are loaded separately for each TTS instance.
- When using rulesets or ActivePrompt databases, the application defines a TTS_FETCHINFO_T structure for each ruleset or ActivePrompt database. This structure describes where to find the data and how to use it.
Like user dictionaries, this approach enables references to URI addresses. In case of remote access to the data, you can define fetch properties using the same API calls described for user dictionaries above. The application then calls TtsLoadTuningData for each ruleset and ActivePrompt database. This function returns a handle to the loaded tuning data instance. Note that this must be repeated for each TTS instance opened via TtsOpen; tuning data is loaded separately for each TTS instance.
- Optionally, the application calls TtsSetParamsEx to adjust additional speak parameters, such as the speaking rate and volume.
- The application sets up a TTS_SPEAK_DATA structure describing the input text; this structure supports specifying a URI, a filename or a memory block. If fetch properties need to be specified, the application calls TtsMapCreateNVS to create a fetch property map. The application then uses the TtsMapSet* functions to add properties to the map one by one.
- The application calls TtsProcessEx to convert the input text to audio of the type defined in TTS_OPEN_PARAMS (previously specified in the TtsOpen call). The input can be specified via the TTS_SPEAK_DATA structure (URI, filename, or text buffer) or the source callback method. TtsProcessEx executes the TTS action synchronously; it returns only when all the speech samples have been generated.
- Vocalizer streams the audio to the application via the TTS_DEST_CB destination callback.
- If required, the application can make several TtsProcessEx calls, with different TTS_SPEAK_DATA and/or different TTS_USRDICT_DATA instances. When using dictionaries, the application can call TtsEnableUsrDictEx to enable a dictionary or change the priority in which dictionaries are consulted, TtsDisableUsrDictEx to disable a single dictionary, and TtsDisableUsrDictsEx to disable all dictionaries. The speak parameters can be updated using TtsSetParamsEx.
Note that a limited number of parameters can be updated while TtsProcessEx is busy (just the speaking rate and volume). When the input text contains markup controlling the speech generation, the parameters are updated for the course of the current TtsProcessEx execution, but at the end of the speak request, the parameters reset to the original values used at the start of the TtsProcessEx call.
- When using dictionaries, the application can optionally unload each dictionary by calling TtsUnloadUsrDictEx for each loaded dictionary. If this isn’t done, they are automatically unloaded when TtsClose is called to close the TTS instance.
- When using rulesets and/or ActivePrompt databases, the application can optionally unload each ruleset and ActivePrompt database by calling TtsUnloadTuningData for each one loaded. If this isn’t done, they are automatically unloaded when TtsClose is called to close the TTS instance.
- When maps of fetch properties are created for either TTS_SPEAK_DATA, TTS_USRDICT_DATA, or TTS_FETCHINFO_T, the application must call TtsMapDestroyNVS for each map that has been created; otherwise there will be a memory leak.
- The application calls TtsClose to clean up the TTS engine instance.
- The application calls TtsSystemTerminate when it is completely done with Vocalizer and ready to shut down (exit the process). Do not call this if you plan future Vocalizer operations: it is not safe to call TtsSystemInit and TtsSystemTerminate more than once within the same process.
- The application calls NVSClientTerminate when it is completely done with the Vocalizer client.
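The ordering rules in the call sequence above (initialize the library once per process, open and close instances in between, and never initialize again after termination) can be sketched as a small guard that an application might wrap around its own Vocalizer calls. This is a minimal sketch, not part of the Vocalizer API: the Lifecycle type and the lifecycle_* functions are hypothetical names, and the requirement that all instances are closed before termination is an assumption drawn from the order of the steps above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping type; NOT part of the Vocalizer API. */
typedef enum { LIB_NOT_INITIALIZED, LIB_READY, LIB_TERMINATED } LibState;

typedef struct {
    LibState state;       /* where the process is in the library lifecycle */
    int open_instances;   /* instances opened and not yet closed */
} Lifecycle;

void lifecycle_reset(Lifecycle *lc) {
    assert(lc != NULL);
    lc->state = LIB_NOT_INITIALIZED;
    lc->open_instances = 0;
}

/* Guard for the TtsSystemInit step: allowed exactly once per process. */
bool lifecycle_system_init(Lifecycle *lc) {
    assert(lc != NULL);
    if (lc->state != LIB_NOT_INITIALIZED) return false;
    lc->state = LIB_READY;
    return true;
}

/* Guard for the TtsOpen step: only valid while the library is initialized. */
bool lifecycle_open(Lifecycle *lc) {
    assert(lc != NULL);
    if (lc->state != LIB_READY) return false;
    lc->open_instances++;
    return true;
}

/* Guard for the TtsClose step: there must be an open instance to close. */
bool lifecycle_close(Lifecycle *lc) {
    assert(lc != NULL);
    if (lc->state != LIB_READY || lc->open_instances == 0) return false;
    lc->open_instances--;
    return true;
}

/* Guard for the TtsSystemTerminate step. Assumption: every instance has
   been closed first. After this, initialization is refused for good. */
bool lifecycle_system_terminate(Lifecycle *lc) {
    assert(lc != NULL);
    if (lc->state != LIB_READY || lc->open_instances != 0) return false;
    lc->state = LIB_TERMINATED;
    return true;
}
```

An application would check the guard before making the corresponding Vocalizer call and skip the call when the guard returns false, which turns an accidental second initialization into a detectable error rather than a possible leak or crash.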
Compatibility with previous API releases
Vocalizer supports API compatibility with RealSpeak Telecom Host 4.x and some earlier releases. However, this documentation does not describe older API calls from those products that are harder to use and/or less functional than the current API calls. Nuance encourages shifting older applications over to the current API calls.
To ensure that your existing application relies only on the current documented API calls, set the following C preprocessor definitions in your code or on the compiler command line. This also helps avoid conflicts with Microsoft Windows API headers, as some of the backward-compatible types conflict with those headers.
- TTSSO_NO_BACKWARD_COMPATIBLE_TYPES disables the outdated data types
- TTSSO_NO_BACKWARD_COMPATIBLE_FUNCS disables the outdated API functions
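As an illustration, the two definitions can be placed at the top of a source file, before the Vocalizer API header is included. This is a sketch under assumptions: the header name is not given in this section, so the include is left as a comment, and legacy_api_disabled and check_build_configuration are hypothetical helpers that only confirm both macros are visible to the compiler.

```c
#include <assert.h>

/* Define both macros before including the Vocalizer API header. */
#define TTSSO_NO_BACKWARD_COMPATIBLE_TYPES
#define TTSSO_NO_BACKWARD_COMPATIBLE_FUNCS

/* The Vocalizer API header would be included here; its file name is not
   stated in this section, so no #include line is shown. */

/* Hypothetical helper: returns 1 only when both macros are defined. */
int legacy_api_disabled(void) {
#if defined(TTSSO_NO_BACKWARD_COMPATIBLE_TYPES) && \
    defined(TTSSO_NO_BACKWARD_COMPATIBLE_FUNCS)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical start-up check an application could run in debug builds. */
void check_build_configuration(void) {
    assert(legacy_api_disabled() == 1);
}
```

With compilers that accept the common -D option, the same effect can be achieved on the command line with -DTTSSO_NO_BACKWARD_COMPATIBLE_TYPES -DTTSSO_NO_BACKWARD_COMPATIBLE_FUNCS, in which case the two #define lines are dropped from the source.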
Functional organization of API calls
The functions can be organized in these groups:
Initialize and shut down the Vocalizer client:
One-time initialization of the Vocalizer client. If used, NVSClientInitFromFile is not used.
One-time initialization of the Vocalizer client, reading a file to supply the initial parameters. If used, NVSClientInit is not used.
One-time shutdown of the Vocalizer client.
Initialize and shut down Vocalizer:
One-time initialization of the Vocalizer library.
One-time shutdown of the Vocalizer library.
Manage the TTS process:
Close a TTS engine instance and free all its associated resources.
Get the list of installed Vocalizer voices and their properties.
Open a new TTS engine instance.
Convert input data (text) into output data (speech).
Convert input text data into speech, while optionally setting TTS engine instance parameters to specified values for just this speak request.
Disassociate a TTS engine instance from the application-defined session identifier string.
Associate a TTS engine instance with an application-defined session identifier string.
Associate a TTS engine instance with an application-defined session, including an optional session configuration and a session identifier string.
Stop the TTS conversion process initiated by a call to TtsProcessEx.
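TtsProcessEx is described as synchronous, with the audio streamed to the application through a destination callback while the call is in progress. The sketch below mimics that control flow with stand-ins only: DemoDestCallback, demo_process, DemoSink, and demo_collect are hypothetical names, and the real TTS_DEST_CB signature is not shown in this section and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type; NOT the real TTS_DEST_CB signature. */
typedef void (*DemoDestCallback)(void *app_data,
                                 const unsigned char *audio,
                                 size_t length);

/* Stand-in for a synchronous speak call: delivers `total` bytes of
   placeholder audio in chunks of at most `chunk` bytes and returns only
   after every chunk has been handed to the callback. */
size_t demo_process(size_t total, size_t chunk,
                    DemoDestCallback dest, void *app_data) {
    unsigned char buffer[256];
    size_t delivered = 0;

    assert(dest != NULL && chunk > 0 && chunk <= sizeof buffer);
    memset(buffer, 0, sizeof buffer);   /* silence as placeholder audio */

    while (delivered < total) {
        size_t remaining = total - delivered;
        size_t n = remaining < chunk ? remaining : chunk;
        dest(app_data, buffer, n);      /* stream one chunk to the app */
        delivered += n;
    }
    return delivered;
}

/* Application-side state that the callback updates. */
typedef struct {
    size_t calls;   /* number of callback invocations */
    size_t bytes;   /* total audio bytes received */
} DemoSink;

/* Example destination callback: counts what it receives. */
void demo_collect(void *app_data, const unsigned char *audio, size_t length) {
    DemoSink *sink = app_data;
    (void)audio;    /* a real application would play or store the audio */
    sink->calls++;
    sink->bytes += length;
}
```

Calling demo_process(1000, 256, demo_collect, &sink) invokes the callback four times (three full chunks and one partial chunk) before returning, which is the property that matters for the application: by the time the synchronous call returns, all audio has already passed through the callback.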
Manage TTS maps for specifying internet fetch properties:
Create an empty key/value map for specifying internet fetch properties.
Destroy a key/value map that stored internet fetch properties.
Free the character string associated with an internet fetch properties map, returned by an earlier call to TtsMapGetCharNVS.
Get a named property of type boolean (LH_S32) from a map.
Get a named property of type string (LH_CHAR *) from a map.
Get a named property of type unsigned 32-bit integer (LH_U32) from a map.
Set a named property of type boolean (LH_S32) in a map.
Set a named property of type string (LH_CHAR *) in a map.
Set a named property of type unsigned 32-bit integer (LH_U32) in a map.
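The map functions listed above follow a create, set, get, free, destroy pattern, in which a string returned by a get call must be released by the caller and every created map must eventually be destroyed. The sketch below shows that ownership pattern with a small stand-in map. It is not the Vocalizer implementation: DemoMap, the demo_map_* functions, and the property names used in the usage note are all hypothetical, and only the string and unsigned 32-bit cases are shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAP_CAPACITY 8
#define DEMO_KEY_MAX 64

typedef enum { DEMO_U32, DEMO_STRING } DemoType;

typedef struct {
    char key[DEMO_KEY_MAX];
    DemoType type;
    uint32_t number;   /* used when type == DEMO_U32 */
    char *text;        /* heap copy, used when type == DEMO_STRING */
} DemoEntry;

typedef struct {
    DemoEntry entries[DEMO_MAP_CAPACITY];
    size_t count;
} DemoMap;

/* Create an empty map; calloc leaves every entry zeroed. */
DemoMap *demo_map_create(void) { return calloc(1, sizeof(DemoMap)); }

/* Destroy the map and every string it owns. */
void demo_map_destroy(DemoMap *map) {
    if (map == NULL) return;
    for (size_t i = 0; i < map->count; i++) free(map->entries[i].text);
    free(map);
}

static DemoEntry *demo_find(DemoMap *map, const char *key) {
    assert(map != NULL && key != NULL);
    for (size_t i = 0; i < map->count; i++)
        if (strcmp(map->entries[i].key, key) == 0) return &map->entries[i];
    return NULL;
}

/* Return the entry for key, claiming a new slot if needed (NULL if full). */
static DemoEntry *demo_slot(DemoMap *map, const char *key) {
    DemoEntry *e = demo_find(map, key);
    if (e != NULL) return e;
    if (map->count == DEMO_MAP_CAPACITY || strlen(key) >= DEMO_KEY_MAX)
        return NULL;
    e = &map->entries[map->count++];
    strcpy(e->key, key);
    return e;
}

int demo_map_set_u32(DemoMap *map, const char *key, uint32_t value) {
    DemoEntry *e = demo_slot(map, key);
    if (e == NULL) return 0;
    free(e->text);               /* drop any previous string value */
    e->text = NULL;
    e->type = DEMO_U32;
    e->number = value;
    return 1;
}

int demo_map_set_string(DemoMap *map, const char *key, const char *value) {
    char *copy = malloc(strlen(value) + 1);
    if (copy == NULL) return 0;
    strcpy(copy, value);
    DemoEntry *e = demo_slot(map, key);
    if (e == NULL) { free(copy); return 0; }
    free(e->text);
    e->type = DEMO_STRING;
    e->text = copy;
    return 1;
}

int demo_map_get_u32(DemoMap *map, const char *key, uint32_t *out) {
    DemoEntry *e = demo_find(map, key);
    if (e == NULL || e->type != DEMO_U32) return 0;
    *out = e->number;
    return 1;
}

/* Returns a newly allocated copy that the caller must free, or NULL. */
char *demo_map_get_string(DemoMap *map, const char *key) {
    DemoEntry *e = demo_find(map, key);
    if (e == NULL || e->type != DEMO_STRING) return NULL;
    char *copy = malloc(strlen(e->text) + 1);
    if (copy != NULL) strcpy(copy, e->text);
    return copy;
}
```

A typical use would set a base URI and a timeout (for example, hypothetical keys "base.uri" and "timeout.ms"), read them back, free the returned string, and destroy the map once the speak, dictionary, or tuning data request that used it is finished.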
Manage user dictionaries and tuning data:
Disable a single user dictionary instance on a TTS engine instance.
Disable all user dictionary instances on a TTS engine instance.
Enable a user dictionary instance and/or change its priority on a TTS engine instance.
Load a ruleset or ActivePrompt database for use by a TTS instance.
Load a user dictionary instance into memory.
Unload a ruleset or ActivePrompt database that was previously loaded using TtsLoadTuningData.
Unload a user dictionary instance, freeing the resources associated with it.
Manage parameters:
Retrieve the value of one parameter.
Return the list of all the supported parameter names for TtsSetParamsEx and TtsGetParamEx.
Get the list of installed Vocalizer voices and their properties.
Convert input text data into speech, while optionally setting TTS engine instance parameters to a specified value for just this speak request.
Set the value of one or more TTS engine instance parameters.
TtsGetLipSyncInfo: Get the visual cue associated with the current TTS audio.