Defined data types
This topic describes the defined data types that are required to interact with the API. Unless otherwise indicated, each of these types is declared in the header file lh_ttsso.h.

Represents the handle to an open TTS engine instance. It is returned from a successful call to TtsOpen.

Represents the handle to an open TTS map. A TTS map contains the fetch properties for speech data, a dictionary instance, or a tuning data instance. It is returned from a successful call to TtsMapCreateNVS.

Represents the handle to a tuning data (ruleset or ActivePrompt database) instance. It is returned from a successful call to TtsLoadTuningData.

Represents the handle to a dictionary instance. It is returned from a successful call to TtsLoadUsrDictEx.

Represents the handle to an open TTS vector. It is a member of the TTS_SPEAK_DATA and TTS_USRDICT_DATA structures, and is used for storing Internet fetch cookie jars. It is updated when the API is called.

Represents a TTS error. This type is in the header ttsso_types.h, but the TTSRETVAL values are defined in lh_err.h.

Defines all the error severity levels that can be delivered to the TTS_LOG_ERROR_CB callback function.
typedef enum TTS_ERROR_SEVERITY {
TTS_SEVERITY_UNKNOWN = 0, /* Error severity is unknown */
TTS_SEVERITY_CRITICAL, /* All instances out of service */
TTS_SEVERITY_SEVERE, /* Service affecting failure */
TTS_SEVERITY_WARNING, /* Application or non-service affecting failure */
TTS_SEVERITY_INFO /* Informational message */
} TTS_ERROR_SEVERITY;
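As an illustration of how an application might consume these severity levels, the following minimal sketch maps each value to a log label and decides whether to raise an alert. The helper names `severity_label` and `severity_needs_alert` are hypothetical, and the enum is re-declared locally only so that the sketch compiles on its own; real code would include the Vocalizer header instead.

```c
/* Local copy of the enum shown above so the sketch is self-contained;
   real code would include lh_ttsso.h instead. */
typedef enum TTS_ERROR_SEVERITY {
    TTS_SEVERITY_UNKNOWN = 0,
    TTS_SEVERITY_CRITICAL,
    TTS_SEVERITY_SEVERE,
    TTS_SEVERITY_WARNING,
    TTS_SEVERITY_INFO
} TTS_ERROR_SEVERITY;

/* Hypothetical helper: short label for an application-specific error log. */
static const char *severity_label(TTS_ERROR_SEVERITY sev)
{
    switch (sev) {
    case TTS_SEVERITY_CRITICAL: return "CRITICAL";
    case TTS_SEVERITY_SEVERE:   return "SEVERE";
    case TTS_SEVERITY_WARNING:  return "WARNING";
    case TTS_SEVERITY_INFO:     return "INFO";
    default:                    return "UNKNOWN";
    }
}

/* Hypothetical policy: alert only on the service-affecting levels. */
static int severity_needs_alert(TTS_ERROR_SEVERITY sev)
{
    return sev == TTS_SEVERITY_CRITICAL || sev == TTS_SEVERITY_SEVERE;
}
```

An application's TTS_LOG_ERROR_CB implementation could call helpers like these before writing to its own log; the alert policy shown is an assumption, not something the API prescribes.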

Defines all event types that can be delivered to the TTS_EVENT_CB callback function.
typedef enum TTS_EVENT {
TTS_EVENT_SENTENCEMARK,
TTS_EVENT_BOOKMARK,
TTS_EVENT_WORDMARK,
TTS_EVENT_PHONEMEMARK,
TTS_EVENT_PARAGRAPHMARK,
} TTS_EVENT;
Events normally mark the beginning of a particular kind of data (sentence, word, and so on) in the audio output. But an event is also issued when the audio output reaches a bookmark inserted in the input text.
Event | Description |
---|---|
TTS_EVENT_SENTENCEMARK | Marks the beginning of a sentence. Use the type TTS_MARKER_SENTENCE to store the marker’s properties. |
TTS_EVENT_BOOKMARK | Marks the position of a user bookmark; bookmarks can be inserted in the input text via the SSML <mark> element or the Vocalizer <ESC>\mrk=x\ tag. Use the type TTS_MARKER_BOOK to store the marker’s properties. |
TTS_EVENT_WORDMARK | Marks the beginning of a word. Note that word marks indicate words as identified by Vocalizer’s lexical analyzer, an early phase of TTS processing that is done before text normalization. This is sufficient for most input texts and most applications, but in some cases, the word markers may not match what a human would consider a single word. For example, there may be word marks for isolated punctuation, the word mark may include trailing punctuation, and for special text normalization types, a single word mark may span a region that is actually spoken as multiple words (for example, a date written in yyyy-mm-dd form might only trigger one word mark). Use the type TTS_MARKER_WORD to store the marker’s properties. |
TTS_EVENT_PARAGRAPHMARK | Marks the beginning of a paragraph. Use the type TTS_MARKER_PARAGRAPH to store the marker’s properties. Note: Paragraph markers are only issued when paragraphs have been marked in the input text via the paragraph tag (native <ESC>\para\ tag or SSML <p> element). |
TTS_EVENT_PHONEMEMARK | Marks the beginning of a phoneme. Note that these phoneme marks are not sufficient for use as user dictionary entries or phonetic input; they are just designed for synchronizing the lips of avatars with the audio stream. This is because these phoneme marks don’t include syllable marks, stress, tone information (for Mandarin), and other suprasegmental information that is required for good phonetic transcriptions. For obtaining full L&H+ phonetic transcriptions for user dictionaries or phonetic input, use Nuance Vocalizer Studio instead. If your application requires online generation of full phonetic transcriptions, please contact Nuance Sales to discuss your requirements and product alternatives. Use the type TTS_MARKER_PHONEME to store the marker’s properties. |

Provides all the information to load (fetch) tuning data: a ruleset or ActivePrompt database.
typedef struct TTS_FETCHINFO_T {
const LH_CHAR * szUri;
const LH_CHAR * szContentType;
HTTSMAP hFetchProperties;
} TTS_FETCHINFO_T;
Name | Value |
---|---|
szUri | String (zero-terminated) specifying the location of the ruleset or ActivePrompt database. This can be an http address (http://) or a file name (regular or with file://). |
szContentType | MIME content type of the tuning data: |
hFetchProperties | Optional; specify NULL if not used. Used to set the properties of the fetch (note that some properties such as URL_BASE are also used for file fetching). The properties are stored in a map. The following functions maintain this map: TtsMapCreateNVS, TtsMapDestroyNVS, TtsMapSetCharNVS, and TtsMapGetCharNVS. See the lh_inettypes.h header file for available properties. For example, use SPIINET_URL_BASE to support relative URIs and filenames and SPIINET_TIMEOUT_DOWNLOAD to override the default fetch timeout configured in the Vocalizer configuration file. |

Provides bit masks for all marker types. Combine the types of interest with a bitwise OR to create an integer that specifies the TTS_MARKER_MODE_PARAM parameter via TtsSetParamsEx. The event callback function then issues only the corresponding event types, except that TTS_EVENT_SENTENCEMARK events are always generated.
typedef enum TTS_MARKER {
TTS_MRK_SENTENCE = 0x0001,
TTS_MRK_WORD = 0x0002,
TTS_MRK_PHONEME = 0x0004,
TTS_MRK_BOOK = 0x0008,
TTS_MRK_PARAGRAPH = 0x0200
} TTS_MARKER;
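The bit-mask combination described above can be sketched as follows. The enum is re-declared locally so that the example compiles on its own, and the helper names `marker_mode_words_and_bookmarks` and `marker_mode_includes` are hypothetical. The call that passes the resulting integer to TtsSetParamsEx is deliberately omitted, because its signature is not shown in this topic.

```c
/* Local copy of the enum shown above so the sketch is self-contained. */
typedef enum TTS_MARKER {
    TTS_MRK_SENTENCE  = 0x0001,
    TTS_MRK_WORD      = 0x0002,
    TTS_MRK_PHONEME   = 0x0004,
    TTS_MRK_BOOK      = 0x0008,
    TTS_MRK_PARAGRAPH = 0x0200
} TTS_MARKER;

/* Hypothetical helper: builds a marker mode that requests word marks
   and bookmarks by OR-ing the two bit masks together. */
static unsigned int marker_mode_words_and_bookmarks(void)
{
    return (unsigned int)TTS_MRK_WORD | (unsigned int)TTS_MRK_BOOK;
}

/* Hypothetical helper: tests whether a marker type is set in a mode value. */
static int marker_mode_includes(unsigned int mode, TTS_MARKER type)
{
    return (mode & (unsigned int)type) != 0;
}
```

With the values listed above, the combined mode is 0x000A. Sentence marks are not part of that mask, but as noted they are generated regardless.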

Describes the parameters for a bookmark marker. This structure is passed to TTS_EVENT_CB callback when the event is TTS_EVENT_BOOKMARK.
typedef struct TTS_MARKER_BOOK {
const LH_CHAR * szID;
TTS_MARKER_POS mrkPos;
const wchar_t * wszID;
} TTS_MARKER_BOOK;
Name | Value |
---|---|
szID | The bookmark string as a NULL terminated char string (ISO-8859-1 string). Both szID and wszID indicate the bookmark ID string. szID is only accurate for ISO-8859-1 characters. (Unicode characters within the original bookmark ID are changed to a question mark ("?") character.) |
mrkPos | Struct that describes the data values for one marker. Consists of nInputPos, nInputLen, nOutputPos, and nOutputLen. |
wszID | The bookmark string as a NULL-terminated wchar_t string (Unicode string). Both szID and wszID indicate the bookmark ID string. wszID is recommended because, as type wchar_t, it supports Unicode characters; it is accurate for all possible bookmark IDs within the input text. |

Describes the parameters for a paragraph marker. This structure is passed to TTS_EVENT_CB callback when the event is TTS_EVENT_PARAGRAPHMARK.
typedef struct TTS_MARKER_PARAGRAPH {
TTS_MARKER_POS mrkPos;
} TTS_MARKER_PARAGRAPH;
Name | Value |
---|---|
mrkPos | Struct that describes the data values for one marker. Consists of nInputPos, nInputLen, nOutputPos, and nOutputLen. |

Describes the parameters for a phoneme marker. This structure is passed to TTS_EVENT_CB callback when the event is TTS_EVENT_PHONEMEMARK.
typedef struct TTS_MARKER_PHONEME {
const LH_CHAR * szName;
TTS_MARKER_POS mrkPos;
} TTS_MARKER_PHONEME;
Name | Value |
---|---|
szName | A NULL-terminated L&H+ phoneme string. |
mrkPos | Struct that describes the data values for one marker. Consists of nInputPos, nInputLen, nOutputPos, and nOutputLen. |

Describes the common properties of a marker. This structure is part of a marker structure that describes a particular kind of marker (TTS_MARKER_BOOK, TTS_MARKER_PHONEME…).
typedef struct TTS_MARKER_POS {
LH_U32 nInputPos;
LH_U32 nInputLen;
LH_U32 nOutputPos;
LH_U32 nOutputLen;
} TTS_MARKER_POS;
Name | Value |
---|---|
nInputPos | Starting position for the marker within the input text in bytes, counted from the beginning of the input text. However, when SSML input is used or rulesets are active, these positions refer to the text positions after SSML processing (after expansion to proprietary markup) and after ruleset transformations, not the original input text positions. |
nInputLen | Length of the marker within the input text in bytes. |
nOutputPos | Starting position for the marker within the output audio stream in samples, counted from the beginning of the audio stream. |
nOutputLen | Length of the marker within the output audio stream in samples. |
Not every marker type supports all four attributes. The following table shows which attributes each TTS_EVENT event type supports:
Event type | nInputPos | nInputLen | nOutputPos | nOutputLen |
---|---|---|---|---|
BOOKMARK | Yes | No | Yes | No |
SENTENCEMARK | Yes | Yes | Yes | No |
WORDMARK | Yes | Yes | Yes | No |
PARAGRAPHMARK | Yes | No | Yes | No |
PHONEMEMARK | No | No | Yes | No |
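Because nOutputPos is expressed in samples, an application that synchronizes visuals or captions with the audio usually needs to convert it to a time offset. The following minimal sketch shows that arithmetic. The helper name `marker_offset_ms` is hypothetical, the sample rate in Hz is assumed to be supplied by the caller to match the audio format in use, and LH_U32 is assumed to be an unsigned 32-bit integer.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t LH_U32; /* assumption: LH_U32 is an unsigned 32-bit integer */

/* Local copy of the structure shown above so the sketch is self-contained. */
typedef struct TTS_MARKER_POS {
    LH_U32 nInputPos;
    LH_U32 nInputLen;
    LH_U32 nOutputPos;
    LH_U32 nOutputLen;
} TTS_MARKER_POS;

/* Hypothetical helper: converts the marker's audio position from samples
   to whole milliseconds. Returns 0 for a NULL marker or a zero rate.
   The multiplication is done in 64 bits to avoid overflow on long streams. */
static uint64_t marker_offset_ms(const TTS_MARKER_POS *pos, uint32_t sample_rate_hz)
{
    if (pos == NULL || sample_rate_hz == 0) {
        return 0;
    }
    return ((uint64_t)pos->nOutputPos * 1000u) / sample_rate_hz;
}
```

For example, a marker at sample 4000 in an 8000 Hz stream corresponds to 500 ms. Only nOutputPos is used here because, according to the table above, it is the one attribute that every event type supports.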

Describes the parameters for a sentence marker. This structure is passed to TTS_EVENT_CB callback when the event is TTS_EVENT_SENTENCEMARK.
typedef struct TTS_MARKER_SENTENCE {
TTS_MARKER_POS mrkPos;
} TTS_MARKER_SENTENCE;
Name | Value |
---|---|
mrkPos | Struct that describes the data values for one marker. Consists of nInputPos, nInputLen, nOutputPos, and nOutputLen. |

Describes the parameters for a word marker. This structure is passed to TTS_EVENT_CB callback when the event is TTS_EVENT_WORDMARK.
typedef struct TTS_MARKER_WORD {
TTS_MARKER_POS mrkPos;
} TTS_MARKER_WORD;
Name | Value |
---|---|
mrkPos | Struct that describes the data values for one marker. Consists of nInputPos, nInputLen, nOutputPos, and nOutputLen. |
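To show how the marker structures relate to the event types, here is a minimal sketch of a dispatch routine that formats a one-line description of a marker. The function `describe_marker` and its signature are hypothetical: the actual TTS_EVENT_CB prototype is not shown in this topic, so the sketch simply assumes that the handler receives the event type plus a pointer to the matching marker structure. The base types LH_CHAR and LH_U32 are stand-ins, and only the attributes listed as supported for each event type are read.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef char     LH_CHAR; /* assumption: narrow character type */
typedef uint32_t LH_U32;  /* assumption: unsigned 32-bit integer */

/* Local copies of the declarations shown above. */
typedef enum TTS_EVENT {
    TTS_EVENT_SENTENCEMARK,
    TTS_EVENT_BOOKMARK,
    TTS_EVENT_WORDMARK,
    TTS_EVENT_PHONEMEMARK,
    TTS_EVENT_PARAGRAPHMARK
} TTS_EVENT;

typedef struct TTS_MARKER_POS {
    LH_U32 nInputPos, nInputLen, nOutputPos, nOutputLen;
} TTS_MARKER_POS;

typedef struct TTS_MARKER_BOOK {
    const LH_CHAR *szID;
    TTS_MARKER_POS mrkPos;
    const wchar_t *wszID;
} TTS_MARKER_BOOK;

typedef struct TTS_MARKER_PHONEME {
    const LH_CHAR *szName;
    TTS_MARKER_POS mrkPos;
} TTS_MARKER_PHONEME;

typedef struct TTS_MARKER_WORD {
    TTS_MARKER_POS mrkPos;
} TTS_MARKER_WORD;

/* Hypothetical handler: casts the marker pointer according to the event
   type and writes a short description into out. Returns snprintf's result. */
static int describe_marker(TTS_EVENT ev, const void *marker,
                           char *out, size_t out_size)
{
    switch (ev) {
    case TTS_EVENT_BOOKMARK: {
        const TTS_MARKER_BOOK *m = (const TTS_MARKER_BOOK *)marker;
        /* szID is used for brevity; wszID is the accurate field for
           bookmark IDs that contain characters outside ISO-8859-1. */
        return snprintf(out, out_size, "bookmark '%s' at sample %lu",
                        m->szID, (unsigned long)m->mrkPos.nOutputPos);
    }
    case TTS_EVENT_WORDMARK: {
        const TTS_MARKER_WORD *m = (const TTS_MARKER_WORD *)marker;
        return snprintf(out, out_size, "word at byte %lu (len %lu), sample %lu",
                        (unsigned long)m->mrkPos.nInputPos,
                        (unsigned long)m->mrkPos.nInputLen,
                        (unsigned long)m->mrkPos.nOutputPos);
    }
    case TTS_EVENT_PHONEMEMARK: {
        const TTS_MARKER_PHONEME *m = (const TTS_MARKER_PHONEME *)marker;
        return snprintf(out, out_size, "phoneme '%s' at sample %lu",
                        m->szName, (unsigned long)m->mrkPos.nOutputPos);
    }
    default:
        /* Sentence and paragraph marks carry only a TTS_MARKER_POS. */
        return snprintf(out, out_size, "event %d", (int)ev);
    }
}
```

A real event callback would perform the same kind of switch on the event type, using the parameter layout declared in the Vocalizer header rather than the one assumed here.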

Specifies the (initial) parameters for a given TTS engine instance when calling TtsOpen.
typedef struct TTS_OPEN_PARAMS {
/* Format version of this struct */
TTS_VERSION fmtVersion;
/* Voice parameters */
LH_CHAR * szLanguage;
LH_CHAR * szVoice;
LH_U16 nFrequency;
LH_U16 nOutputType;
/* Synthesis callbacks */
TTS_SOURCE_CB * TtsSourceCb;
TTS_DEST_CB * TtsDestCb;
TTS_EVENT_CB * TtsEventCb;
TTS_ALLOW_VOICE_SWITCH_CB * TtsAllowVoiceSwitchCb;
/* Logging callbacks, any or all of these may be NULL */
TTS_LOG_ERROR_CB * TtsLogErrorCb;
TTS_LOG_EVENT_CB * TtsLogEventCb;
TTS_LOG_DIAGNOSTIC_CB * TtsDiagnosticsCb;
LH_CHAR * szVoiceModel;
} TTS_OPEN_PARAMS;
Name | Value |
---|---|
fmtVersion | Structure version to allow forward compatibility with future releases. Use TTS_CURRENT_VERSION. |
szLanguage | Language string, either a Vocalizer Language name (for example, "American English") or an IETF language code ("en-US"). |
szVoice | Voice name string (for example, "Samantha"). |
nFrequency | Voice sampling rate: TTS_FREQ_8KHZ, TTS_FREQ_11KHZ (currently not supported), or TTS_FREQ_22KHZ. |
nOutputType | Audio output format: |
TtsSourceCb | (Optional, can be NULL.) Application-defined callback for supplying the input text when the TTS_SPEAK_DATA structure that is passed to TtsProcessEx has NULL uri and data fields. |
TtsDestCb | Application-defined callback for supplying the audio output buffer and receiving audio output. |
TtsEventCb | (Optional, can be NULL.) Application-defined callback for TTS marker notifications, including bookmarks, word marks, phoneme marks, sentence marks, and paragraph marks. |
TtsLogErrorCb | (Optional, may be NULL.) Application-defined callback for error message notifications. By default Vocalizer logs errors to a Vocalizer log file and to the system log. This allows the application to also receive these notifications for reporting to an application-specific error log. This is only called when it is not NULL and <log_cb_enabled> is true in the Vocalizer configuration file. |
TtsLogEventCb | (Optional, may be NULL.) Application-defined callback for being notified of events that trace normal application behavior, useful for application monitoring, tuning, and capacity planning. Vocalizer can optionally log this information to a Vocalizer log file as well, but by default that is disabled. This is only called when it is not NULL and <log_cb_enabled> is true in the Vocalizer configuration file. |
TtsLogDiagnosticsCb | (Optional, may be NULL.) Application-defined callback for diagnostic messages. By default Vocalizer logs diagnostic messages to a Vocalizer log file. This allows the application to also receive these notifications for reporting to an application-specific log. This is only called when it is not NULL, <log_cb_enabled> is true in the Vocalizer configuration file, and <log_level> is set to enable diagnostic messages in the Vocalizer configuration file. |
szVoiceModel | (Optional, can be NULL.) Voice model or operating point. See Supporting existing applications. |
TtsResolveURICb | (Optional, may be NULL.) Application-defined callback invoked before the engine fetches an external resource specified through a URI. |
TtsAllowVoiceSwitchCb | (Optional, may be NULL.) Application-defined callback invoked whenever the application changes voices during synthesis. |

TTS_SPEAK_DATA is used when calling TtsProcessEx. It describes the location of the input data for a text-to-speech action and its properties. Note that it is still possible to use the source callback method (in which case, set the uri and data structure members to NULL).
typedef struct TTS_SPEAK_DATA {
LH_CHAR* uri;
LH_VOID* data;
LH_U32 lengthBytes;
LH_CHAR* contentType;
HTTSMAP fetchProperties;
HTTSVECTOR fetchCookieJar;
} TTS_SPEAK_DATA;
Name | Value |
---|---|
uri | String specifying the location of the input data. This can be an http address (http://) or a file name (regular or with file://). Set the uri member to NULL to indicate that the input data is provided via the data member or the source callback. |
data | Pointer to a buffer containing the input text. This structure member is used only when uri is NULL. Set both uri and data to NULL to use the source callback function. |
lengthBytes | The length of the data buffer in bytes. Set this to 0 if the data field is set to NULL. |
contentType | Specifies the MIME content type of the data. The string is case-sensitive. Supported values: |
fetchProperties | Sets the properties of the fetch. The properties are stored in a map. The following functions manipulate this map: TtsMapCreateNVS, TtsMapDestroyNVS, TtsMapSetCharNVS, and TtsMapGetCharNVS. See the lh_inettypes.h header file for available properties. |
fetchCookieJar | Reserved for future use; pass NULL. |
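The field rules above (uri set to NULL for in-memory input, lengthBytes set to 0 when data is NULL, fetchCookieJar set to NULL) can be sketched as a small initializer. The helper `speak_data_from_buffer` is hypothetical, the base types and handle types are opaque stand-ins for the definitions in the Vocalizer headers, and the content type is left to the caller because the supported values are not listed here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the Vocalizer base and handle types (assumptions). */
typedef char     LH_CHAR;
typedef void     LH_VOID;
typedef uint32_t LH_U32;
typedef void    *HTTSMAP;
typedef void    *HTTSVECTOR;

/* Local copy of the structure shown above. */
typedef struct TTS_SPEAK_DATA {
    LH_CHAR   *uri;
    LH_VOID   *data;
    LH_U32     lengthBytes;
    LH_CHAR   *contentType;
    HTTSMAP    fetchProperties;
    HTTSVECTOR fetchCookieJar;
} TTS_SPEAK_DATA;

/* Hypothetical helper: describes a zero-terminated, single-byte or UTF-8
   text buffer as the input. Passing text == NULL leaves both uri and data
   NULL, which selects the source callback method. */
static TTS_SPEAK_DATA speak_data_from_buffer(char *text, char *content_type)
{
    TTS_SPEAK_DATA sd;

    sd.uri             = NULL;          /* input is not fetched from a URI */
    sd.data            = text;
    sd.lengthBytes     = (text != NULL) ? (LH_U32)strlen(text) : 0;
    sd.contentType     = content_type;  /* caller supplies a supported value */
    sd.fetchProperties = NULL;          /* no fetch properties in this sketch */
    sd.fetchCookieJar  = NULL;          /* reserved for future use */
    return sd;
}
```

Note that strlen is only appropriate for text without embedded zero bytes; for UTF-16 input the caller would have to compute the byte length separately.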
Vocalizer does its internal processing using Unicode UTF-16. When the input text is in a different character set, Vocalizer transcodes it to UTF-16 at the start of its processing. The following table lists some common supported character sets.
Character set | Languages | Notes |
---|---|---|
UTF-8 | All languages | |
UTF-16 | All languages | This is the recommended character set, because Vocalizer uses UTF-16 for its internal processing. (If UTF-16 is not convenient, UTF-8 is the next best choice.) If the byte-order mark is missing, big-endian is assumed. |
ISO-8859-1 | Western languages | |
windows-1252 | Western languages | |
EUC-jp (synonym: EUC) | Japanese | |
Shift-JIS | Japanese | |

A third-party component called IBM ICU is used to transcode the input character set to the native UTF-16 character set. It supports a very broad range of character sets.
To learn about the character sets for the contentType parameter, visit the IANA website at www.iana.org/assignments/character-sets.
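For applications that prepare UTF-16 input themselves, the byte-order rule noted for UTF-16 (big-endian is assumed when the byte-order mark is missing) can be checked before submitting text. The following is a minimal sketch of that rule from the application's side; it is not Vocalizer's own detection code, and the names `utf16_order` and `utf16_detect_order` are hypothetical.

```c
#include <stddef.h>

typedef enum { UTF16_BIG_ENDIAN, UTF16_LITTLE_ENDIAN } utf16_order;

/* Hypothetical helper: inspects the first two bytes of a UTF-16 buffer.
   FE FF is the big-endian byte-order mark and FF FE the little-endian one.
   Without a mark, big-endian is reported, matching the documented default.
   If has_bom is not NULL, it receives 1 when a mark was found, else 0. */
static utf16_order utf16_detect_order(const unsigned char *buf, size_t len,
                                      int *has_bom)
{
    int bom = 0;
    utf16_order order = UTF16_BIG_ENDIAN;

    if (buf != NULL && len >= 2) {
        if (buf[0] == 0xFE && buf[1] == 0xFF) {
            bom = 1;
        } else if (buf[0] == 0xFF && buf[1] == 0xFE) {
            bom = 1;
            order = UTF16_LITTLE_ENDIAN;
        }
    }
    if (has_bom != NULL) {
        *has_bom = bom;
    }
    return order;
}
```

A check like this helps catch little-endian text that was written without a byte-order mark, which would otherwise be interpreted as big-endian.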

Describes the properties of a dictionary instance when calling TtsLoadUsrDictEx.
typedef struct TTS_USRDICT_DATA {
LH_U32 version;
LH_CHAR * uri;
LH_VOID * data;
LH_U32 lengthBytes;
LH_CHAR * contentType;
HTTSMAP fetchProperties;
HTTSVECTOR fetchCookieJar;
} TTS_USRDICT_DATA;
Field | Description |
---|---|
version | Structure version to allow forward compatibility with future releases. Use TTS_CURRENT_VERSION. |
uri | String specifying the location of the dictionary. This can be an http address (http://) or a filename (regular or with file://). Set the uri member to NULL to indicate that the input data is read from the data member. |
data | Pointer to a buffer containing the user dictionary data. This structure member is used when uri is NULL. |
lengthBytes | The length of the data buffer in bytes. Specify 0 when the data member is NULL. |
contentType | String that specifies the MIME content type of the user dictionary. Supported value: |
fetchProperties | Used to set the properties of the fetch. The properties are stored in a map. The following functions manipulate this map: TtsMapCreateNVS, TtsMapDestroyNVS, TtsMapSetCharNVS, and TtsMapGetCharNVS. See the lh_inettypes.h header file for available properties. |
fetchCookieJar | Reserved for future use; pass NULL. |

The TTS_VOICE_INFO structure is used by TtsGetVoiceList to return information about an installed TTS engine voice.
typedef struct TTS_VOICE_INFO {
LH_CHAR szVersion[TTS_MAX_STRING_LENGTH];
LH_CHAR szLanguage[TTS_MAX_STRING_LENGTH];
LH_CHAR szLanguageIETF[TTS_MAX_STRING_LENGTH];
LH_CHAR szLanguageTLW[4];
LH_CHAR szVoice[TTS_MAX_STRING_LENGTH];
LH_CHAR szAge[TTS_MAX_STRING_LENGTH];
LH_CHAR szGender[TTS_MAX_STRING_LENGTH];
LH_CHAR szVoiceModel[TTS_MAX_STRING_LENGTH];
LH_U16 nFrequency;
LH_BOOL bVop;
LH_CHAR szForeignLanguagesIETF[TTS_MAX_STRING_LENGTH];
} TTS_VOICE_INFO;
Field | Description |
---|---|
szVersion | Voice version number, such as 5.2.0.7151 |
szLanguage | Language name, such as American English |
szLanguageIETF | IETF language code, such as en-us |
szLanguageTLW | Three-letter language code as used in user dictionaries and rulesets, such as ENU |
szVoice | Voice name, such as Samantha |
szAge | Voice age, such as Adult |
szGender | Voice gender: Male, Female, or Neutral |
szVoiceModel | Vocalizer internal name for the synthesizer technology |
nFrequency | Voice sampling rate: |
bVop | A boolean flag that indicates the voice format: |
szForeignLanguagesIETF | A comma-separated list of foreign languages supported by a voice. Each language code has 5 chars. For example: sp_mx,fr_ca. If the voice is not multilingual, the string is empty. |
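Since szForeignLanguagesIETF is a comma-separated list, an application typically splits it before use. The following minimal sketch does that with standard C string functions. The helper name `split_language_codes` and the fixed 8-byte slot size are choices made for this illustration only; the slot size is based on the statement that each code has 5 characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies up to max_codes comma-separated language codes
   from list into codes and returns how many were stored. Empty items and
   items that do not fit in a slot (7 characters plus terminator) are skipped. */
static size_t split_language_codes(const char *list, char codes[][8],
                                   size_t max_codes)
{
    size_t count = 0;

    if (list == NULL) {
        return 0;
    }
    while (*list != '\0' && count < max_codes) {
        size_t len = strcspn(list, ",");   /* length up to next comma or end */
        if (len > 0 && len < 8) {
            memcpy(codes[count], list, len);
            codes[count][len] = '\0';
            count++;
        }
        list += len;
        if (*list == ',') {
            list++;                        /* step over the separator */
        }
    }
    return count;
}
```

For the example value sp_mx,fr_ca, the helper stores the two codes sp_mx and fr_ca; for a voice that is not multilingual, the empty string yields a count of 0.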