Defined data types

HTTSINSTANCE

Represents the handle to an open TTS engine instance. It is returned from a successful call to TtsOpen.

HTTSMAP

Represents the handle to an open TTS map. A TTS map contains the fetch properties for speech data, a dictionary instance, or a tuning data instance. It is returned from a successful call to TtsMapCreateNVS.

HTTSTUNINGDATA

Represents the handle to a tuning data (ruleset or ActivePrompt database) instance. It is returned from a successful call to TtsLoadTuningData.

HTTSUSRDICT

Represents the handle to a dictionary instance. It is returned from a successful call to TtsLoadUsrDictEx.

HTTSVECTOR

Represents the handle to an open TTS vector. It is a member of the TTS_SPEAK_DATA and TTS_USRDICT_DATA structures, and is used for storing Internet fetch cookie jars. It is updated when the API is called.

TTSRETVAL

Represents a TTS error. This type is declared in the header ttsso_types.h, but the TTSRETVAL values are defined in lh_err.h.

TTS_ERROR_SEVERITY

Defines all the error severity levels that can be delivered to the TTS_LOG_ERROR_CB callback function.

typedef enum TTS_ERROR_SEVERITY {
    TTS_SEVERITY_UNKNOWN = 0, /* Error severity is unknown */
    TTS_SEVERITY_CRITICAL,    /* All instances out of service */
    TTS_SEVERITY_SEVERE,      /* Service affecting failure */
    TTS_SEVERITY_WARNING,     /* Application or non-service affecting failure */
    TTS_SEVERITY_INFO         /* Informational message */
} TTS_ERROR_SEVERITY;
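An application that forwards TTS_LOG_ERROR_CB notifications to its own error log usually needs a printable label per severity. The sketch below mirrors the enum locally so that it compiles on its own (a real application would include the Vocalizer header instead); the function name severity_label is hypothetical.

```c
/* Local mirror of the documented enum; in a real build, include the
 * Vocalizer header that declares TTS_ERROR_SEVERITY instead. */
typedef enum TTS_ERROR_SEVERITY {
    TTS_SEVERITY_UNKNOWN = 0,
    TTS_SEVERITY_CRITICAL,
    TTS_SEVERITY_SEVERE,
    TTS_SEVERITY_WARNING,
    TTS_SEVERITY_INFO
} TTS_ERROR_SEVERITY;

/* Hypothetical helper: map a severity delivered to TTS_LOG_ERROR_CB to a
 * short label for an application-specific log. */
const char *severity_label(TTS_ERROR_SEVERITY sev)
{
    switch (sev) {
    case TTS_SEVERITY_CRITICAL: return "CRITICAL"; /* all instances out of service */
    case TTS_SEVERITY_SEVERE:   return "SEVERE";   /* service affecting failure */
    case TTS_SEVERITY_WARNING:  return "WARNING";  /* non-service affecting failure */
    case TTS_SEVERITY_INFO:     return "INFO";     /* informational message */
    case TTS_SEVERITY_UNKNOWN:
    default:                    return "UNKNOWN";  /* also covers unexpected values */
    }
}
```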

TTS_EVENT

Defines all event types that can be delivered to the TTS_EVENT_CB callback function.

typedef enum TTS_EVENT {
    TTS_EVENT_SENTENCEMARK,
    TTS_EVENT_BOOKMARK,
    TTS_EVENT_WORDMARK,
    TTS_EVENT_PHONEMEMARK,
    TTS_EVENT_PARAGRAPHMARK,
} TTS_EVENT;

Most events mark the beginning of a particular kind of data (sentence, word, and so on) in the audio output. A bookmark event, in contrast, is issued when the audio output reaches a bookmark inserted in the input text.

TTS_EVENT_SENTENCEMARK	Marks the beginning of a sentence. Use the type TTS_MARKER_SENTENCE to store the marker’s properties.
TTS_EVENT_BOOKMARK	Marks the position of a user bookmark; bookmarks can be inserted in the input text via the SSML <mark> element or the Vocalizer <ESC>\mrk=x\ tag. Use the type TTS_MARKER_BOOK to store the marker’s properties.
TTS_EVENT_WORDMARK	Marks the beginning of a word. Note that word marks indicate words as identified by Vocalizer’s lexical analyzer, an early phase of TTS processing that is done before text normalization. This is sufficient for most input texts and most applications, but in some cases, the word markers may not match what a human would consider a single word. For example, there may be word marks for isolated punctuation, the word mark may include trailing punctuation, and for special text normalization types, a single word mark may span a region that is actually spoken as multiple words (for example, a date written in yyyy-mm-dd form might only trigger one word mark). Use the type TTS_MARKER_WORD to store the marker’s properties.
TTS_EVENT_PARAGRAPHMARK	Marks the beginning of a paragraph. Use the type TTS_MARKER_PARAGRAPH to store the marker’s properties. Note: Paragraph markers are only issued when paragraphs have been marked in the input text via the paragraph tag (native <ESC>\para\ tag or SSML <p> element).
TTS_EVENT_PHONEMEMARK	Marks the beginning of a phoneme. Note that these phoneme marks are not sufficient for use as user dictionary entries or phonetic input; they are just designed for synchronizing the lips of avatars with the audio stream. This is because these phoneme marks don’t include syllable marks, stress, tone information (for Mandarin), and other suprasegmental information that is required for good phonetic transcriptions. For obtaining full L&H+ phonetic transcriptions for user dictionaries or phonetic input, use Nuance Vocalizer Studio instead. If your application requires online generation of full phonetic transcriptions, please contact Nuance Sales to discuss your requirements and product alternatives. Use the type TTS_MARKER_PHONEME to store the marker’s properties.
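Because each event type comes with its own marker structure, an event callback typically switches on the event type and casts the event data accordingly. The exact TTS_EVENT_CB prototype is declared in the Vocalizer headers and is not shown in this section, so the sketch below assumes the callback has the event type and an untyped pointer to the marker structure; the types are local mirrors of the definitions in this section and marker_output_pos is a hypothetical name. It returns nOutputPos because the support table under TTS_MARKER_POS lists that field for every event type.

```c
#include <stdint.h>
#include <wchar.h>

/* Local mirrors (assumptions: LH_U32 is a 32-bit unsigned integer,
 * LH_CHAR is char, enum values follow declaration order). */
typedef uint32_t LH_U32;
typedef char LH_CHAR;

typedef struct TTS_MARKER_POS {
    LH_U32 nInputPos, nInputLen, nOutputPos, nOutputLen;
} TTS_MARKER_POS;

typedef struct TTS_MARKER_SENTENCE  { TTS_MARKER_POS mrkPos; } TTS_MARKER_SENTENCE;
typedef struct TTS_MARKER_WORD      { TTS_MARKER_POS mrkPos; } TTS_MARKER_WORD;
typedef struct TTS_MARKER_PARAGRAPH { TTS_MARKER_POS mrkPos; } TTS_MARKER_PARAGRAPH;
typedef struct TTS_MARKER_PHONEME {
    const LH_CHAR *szName;
    TTS_MARKER_POS mrkPos;
} TTS_MARKER_PHONEME;
typedef struct TTS_MARKER_BOOK {
    const LH_CHAR *szID;
    TTS_MARKER_POS mrkPos;
    const wchar_t *wszID;
} TTS_MARKER_BOOK;

typedef enum TTS_EVENT {
    TTS_EVENT_SENTENCEMARK,
    TTS_EVENT_BOOKMARK,
    TTS_EVENT_WORDMARK,
    TTS_EVENT_PHONEMEMARK,
    TTS_EVENT_PARAGRAPHMARK
} TTS_EVENT;

/* Hypothetical dispatch step of an event callback: interpret the untyped
 * event data according to the event type and return the marker's position
 * in the audio output, in samples. */
LH_U32 marker_output_pos(TTS_EVENT evt, const void *pData)
{
    switch (evt) {
    case TTS_EVENT_SENTENCEMARK:
        return ((const TTS_MARKER_SENTENCE *)pData)->mrkPos.nOutputPos;
    case TTS_EVENT_BOOKMARK:
        return ((const TTS_MARKER_BOOK *)pData)->mrkPos.nOutputPos;
    case TTS_EVENT_WORDMARK:
        return ((const TTS_MARKER_WORD *)pData)->mrkPos.nOutputPos;
    case TTS_EVENT_PHONEMEMARK:
        return ((const TTS_MARKER_PHONEME *)pData)->mrkPos.nOutputPos;
    case TTS_EVENT_PARAGRAPHMARK:
        return ((const TTS_MARKER_PARAGRAPH *)pData)->mrkPos.nOutputPos;
    default:
        return 0; /* unknown event type */
    }
}
```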

TTS_FETCHINFO_T

Provides all the information to load (fetch) tuning data: a ruleset or ActivePrompt database.

typedef struct TTS_FETCHINFO_T {
    const LH_CHAR * szUri;
    const LH_CHAR * szContentType;
    HTTSMAP hFetchProperties;
} TTS_FETCHINFO_T;

Name	Value
szUri	String (zero-terminated) specifying the location of the ruleset or ActivePrompt database. This can be an http address (http://) or a file name (regular or with file://).
szContentType	MIME content type of the tuning data. For a ruleset: "application/x-vocalizer-rettt+text" (TTS_MIME_RULESET_TEXT in lh_ttsso.h). For an ActivePrompt database: "application/x-vocalizer-activeprompt-db" (TTS_MIME_ACTIVEPROMPT_DB in lh_ttsso.h); append ";mode=automatic" (TTS_MIME_ACTIVEPROMPT_DB_AUTOMATIC in lh_ttsso.h) to change the ActivePrompt matching mode from its default to automatic.
hFetchProperties	Optional; specify NULL if not used. Used to set the properties of the fetch (note that some properties such as URL_BASE are also used for file fetching). The properties are stored in a map. The following functions maintain this map: TtsMapCreateNVS, TtsMapDestroyNVS, TtsMapSetCharNVS, and TtsMapGetCharNVS. See the lh_inettypes.h header file for available properties. For example, use SPIINET_URL_BASE to support relative URIs and filenames and SPIINET_TIMEOUT_DOWNLOAD to override the default fetch timeout configured in the Vocalizer configuration file.
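A minimal sketch of filling TTS_FETCHINFO_T for the two kinds of tuning data, using the MIME strings listed in the table. It assumes LH_CHAR is char and HTTSMAP is an opaque pointer handle; the helper names and the k...Mime constants are hypothetical (lh_ttsso.h provides TTS_MIME_RULESET_TEXT and TTS_MIME_ACTIVEPROMPT_DB for the same strings). No fetch map is created, so hFetchProperties is NULL.

```c
#include <stddef.h>

/* Local stand-ins so the sketch compiles on its own (assumptions). */
typedef char LH_CHAR;
typedef void *HTTSMAP;

typedef struct TTS_FETCHINFO_T {
    const LH_CHAR *szUri;
    const LH_CHAR *szContentType;
    HTTSMAP hFetchProperties;
} TTS_FETCHINFO_T;

/* MIME content types as documented above. */
static const char kRulesetMime[] = "application/x-vocalizer-rettt+text";
static const char kActivePromptMime[] = "application/x-vocalizer-activeprompt-db";
static const char kActivePromptAutoMime[] =
    "application/x-vocalizer-activeprompt-db;mode=automatic";

/* Hypothetical helper: describe a ruleset with default fetch properties. */
TTS_FETCHINFO_T make_ruleset_fetchinfo(const char *uri)
{
    TTS_FETCHINFO_T fi;
    fi.szUri = uri;
    fi.szContentType = kRulesetMime;
    fi.hFetchProperties = NULL; /* optional: NULL when no fetch map is used */
    return fi;
}

/* Hypothetical helper: describe an ActivePrompt database, optionally
 * selecting automatic matching mode through the ";mode=automatic" suffix. */
TTS_FETCHINFO_T make_activeprompt_fetchinfo(const char *uri, int automatic)
{
    TTS_FETCHINFO_T fi;
    fi.szUri = uri;
    fi.szContentType = automatic ? kActivePromptAutoMime : kActivePromptMime;
    fi.hFetchProperties = NULL;
    return fi;
}
```

The filled structure would then be handed to TtsLoadTuningData; that call is omitted here because its prototype is not part of this section.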

TTS_MARKER

Provides bit masks for all marker types. OR together the masks of the types of interest to build an integer that can be used as the value of the TTS_MARKER_MODE_PARAM parameter via TtsSetParamsEx. Only the corresponding event types are then delivered to the event callback function. However, TTS_EVENT_SENTENCEMARK events are always generated.

typedef enum TTS_MARKER {
    TTS_MRK_SENTENCE  = 0x0001,
    TTS_MRK_WORD      = 0x0002,
    TTS_MRK_PHONEME   = 0x0004,
    TTS_MRK_BOOK      = 0x0008,
    TTS_MRK_PARAGRAPH = 0x0200
} TTS_MARKER;
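The mask is built by OR'ing the enum values of the wanted marker types. The sketch below only computes the integer; passing it as TTS_MARKER_MODE_PARAM through TtsSetParamsEx is left out because that function's prototype is not shown in this section. The enum is mirrored locally and marker_mode is a hypothetical helper name.

```c
/* Local mirror of the documented enum. */
typedef enum TTS_MARKER {
    TTS_MRK_SENTENCE  = 0x0001,
    TTS_MRK_WORD      = 0x0002,
    TTS_MRK_PHONEME   = 0x0004,
    TTS_MRK_BOOK      = 0x0008,
    TTS_MRK_PARAGRAPH = 0x0200
} TTS_MARKER;

/* Hypothetical helper: build a TTS_MARKER_MODE_PARAM value from flags.
 * TTS_MRK_SENTENCE is not added because sentence events are always
 * generated regardless of the mask. */
unsigned int marker_mode(int words, int phonemes, int bookmarks, int paragraphs)
{
    unsigned int mask = 0;
    if (words)      mask |= TTS_MRK_WORD;
    if (phonemes)   mask |= TTS_MRK_PHONEME;
    if (bookmarks)  mask |= TTS_MRK_BOOK;
    if (paragraphs) mask |= TTS_MRK_PARAGRAPH;
    return mask;
}

/* Test whether a given marker type is enabled in a mask. */
int marker_enabled(unsigned int mask, TTS_MARKER type)
{
    return (mask & (unsigned int)type) != 0;
}
```

For example, marker_mode(1, 0, 1, 0) requests word marks and bookmarks and yields 0x000A.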

TTS_MARKER_BOOK

Describes the parameters for a bookmark marker. This structure is passed to the TTS_EVENT_CB callback when the event is TTS_EVENT_BOOKMARK.

typedef struct TTS_MARKER_BOOK {
    const LH_CHAR * szID;
    TTS_MARKER_POS mrkPos;
    const wchar_t * wszID;
} TTS_MARKER_BOOK;

Name	Value
szID	The bookmark string as a NULL-terminated char string (ISO-8859-1 string). Both szID and wszID indicate the bookmark ID string. szID is only accurate for ISO-8859-1 characters; Unicode characters in the original bookmark ID are replaced by a question mark ("?").
mrkPos	Struct that describes the data values for one marker. Consists of nInputPos, nInputLen, nOutputPos, and nOutputLen.
wszID	The bookmark string as a NULL-terminated wchar_t string (Unicode string). Both szID and wszID indicate the bookmark ID string. wszID is recommended because, as type wchar_t, it supports Unicode characters; it is accurate for all possible bookmark IDs within the input text.
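Following the recommendation above, an application can read wszID and fall back to szID only when no wide string is available. The fallback relies on the fact that ISO-8859-1 bytes map one-to-one to the Unicode code points U+0000 to U+00FF. copy_bookmark_id is a hypothetical helper; it takes the two fields directly so that no Vocalizer header is needed.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copy a bookmark ID into dst (capacity n wide
 * characters, always terminated when n > 0). Prefers wszID; if it is NULL,
 * widens the ISO-8859-1 bytes of szID, which map directly to U+0000..U+00FF. */
void copy_bookmark_id(wchar_t *dst, size_t n,
                      const char *szID, const wchar_t *wszID)
{
    size_t i = 0;
    if (n == 0)
        return;
    if (wszID != NULL) {
        for (; i + 1 < n && wszID[i] != L'\0'; ++i)
            dst[i] = wszID[i];
    } else if (szID != NULL) {
        for (; i + 1 < n && szID[i] != '\0'; ++i)
            dst[i] = (wchar_t)(unsigned char)szID[i]; /* Latin-1 -> code point */
    }
    dst[i] = L'\0';
}
```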

TTS_MARKER_PARAGRAPH

Describes the parameters for a paragraph marker. This structure is passed to the TTS_EVENT_CB callback when the event is TTS_EVENT_PARAGRAPHMARK.

typedef struct TTS_MARKER_PARAGRAPH {
    TTS_MARKER_POS mrkPos;
} TTS_MARKER_PARAGRAPH;

Name	Value
mrkPos	Struct that describes the data values for one marker. Consists of nInputPos, nInputLen, nOutputPos, and nOutputLen.

TTS_MARKER_PHONEME

Describes the parameters for a phoneme marker. This structure is passed to the TTS_EVENT_CB callback when the event is TTS_EVENT_PHONEMEMARK.

typedef struct TTS_MARKER_PHONEME {
    const LH_CHAR * szName;
    TTS_MARKER_POS mrkPos;
} TTS_MARKER_PHONEME;

Name	Value
szName	A NULL-terminated L&H+ phoneme string.
mrkPos	Struct that describes the data values for one marker. Consists of nInputPos, nInputLen, nOutputPos, and nOutputLen.

TTS_MARKER_POS

Describes the common properties of a marker. This structure is part of a marker structure that describes a particular kind of marker (TTS_MARKER_BOOK, TTS_MARKER_PHONEME…).

typedef struct TTS_MARKER_POS {
    LH_U32 nInputPos;
    LH_U32 nInputLen;
    LH_U32 nOutputPos;
    LH_U32 nOutputLen;
} TTS_MARKER_POS;

Name	Value
nInputPos	Starting position for the marker within the input text in bytes, counted from the beginning of the input text. However, when SSML input is used or rulesets are active, these positions refer to the text positions after SSML processing (after expansion to proprietary markup) and after ruleset transformations, not the original input text positions.
nInputLen	Length of the marker within the input text in bytes.
nOutputPos	Starting position for the marker within the output audio stream in samples, counted from the beginning of the audio stream.
nOutputLen	Length of the marker within the output audio stream in samples.

Not every marker type supports all four fields. The following table shows which fields each TTS_EVENT type supplies:

Event type	nInputPos	nInputLen	nOutputPos	nOutputLen
BOOKMARK	Yes	No	Yes	No
SENTENCEMARK	Yes	Yes	Yes	No
WORDMARK	Yes	Yes	Yes	No
PARAGRAPHMARK	Yes	No	Yes	No
PHONEMEMARK	No	No	Yes	No
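Since nOutputPos is the one field supplied for every event type and is counted in samples, converting it to time (for example, to schedule avatar lip movements) only needs the voice's sampling rate. The sketch assumes the caller passes the actual output rate in hertz (for example 8000 for an 8 kHz voice); the TTS_FREQ_* constants themselves are not assumed to hold hertz values. samples_to_ms is a hypothetical helper.

```c
#include <stdint.h>

/* Hypothetical helper: convert a sample offset such as mrkPos.nOutputPos
 * into whole milliseconds. sampleRateHz is the real output rate of the
 * voice in Hz (assumption: known to the caller). */
uint64_t samples_to_ms(uint32_t samples, uint32_t sampleRateHz)
{
    if (sampleRateHz == 0)
        return 0; /* guard against an invalid rate */
    /* Widen before multiplying so large offsets cannot overflow. */
    return ((uint64_t)samples * 1000u) / sampleRateHz;
}
```

For example, a word mark with nOutputPos = 16000 from an 8 kHz voice starts 2000 ms into the audio stream.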

TTS_MARKER_SENTENCE

Describes the parameters for a sentence marker. This structure is passed to the TTS_EVENT_CB callback when the event is TTS_EVENT_SENTENCEMARK.

typedef struct TTS_MARKER_SENTENCE {
    TTS_MARKER_POS mrkPos;
} TTS_MARKER_SENTENCE;

Name	Value
mrkPos	Struct that describes the data values for one marker. Consists of nInputPos, nInputLen, nOutputPos, and nOutputLen.

TTS_MARKER_WORD

Describes the parameters for a word marker. This structure is passed to the TTS_EVENT_CB callback when the event is TTS_EVENT_WORDMARK.

typedef struct TTS_MARKER_WORD {
    TTS_MARKER_POS mrkPos;
} TTS_MARKER_WORD;

Name	Value
mrkPos	Struct that describes the data values for one marker. Consists of nInputPos, nInputLen, nOutputPos, and nOutputLen.

TTS_OPEN_PARAMS

Specifies the (initial) parameters for a given TTS engine instance when calling TtsOpen.

typedef struct TTS_OPEN_PARAMS {
    /* Format version of this struct */
    TTS_VERSION fmtVersion;

    /* Voice parameters */
    LH_CHAR * szLanguage;
    LH_CHAR * szVoice;
    LH_U16 nFrequency;
    LH_U16 nOutputType;

    /* Synthesis callbacks */
    TTS_SOURCE_CB * TtsSourceCb;
    TTS_DEST_CB * TtsDestCb;
    TTS_EVENT_CB * TtsEventCb;
    TTS_ALLOW_VOICE_SWITCH_CB * TtsAllowVoiceSwitchCb;

    /* Logging callbacks, any or all of these may be NULL */
    TTS_LOG_ERROR_CB * TtsLogErrorCb;
    TTS_LOG_EVENT_CB * TtsLogEventCb;
    TTS_LOG_DIAGNOSTIC_CB * TtsDiagnosticsCb;

    LH_CHAR * szVoiceModel;
} TTS_OPEN_PARAMS;

Name	Value
fmtVersion	Structure version to allow forward compatibility with future releases. Use TTS_CURRENT_VERSION.
szLanguage	Language string, either a Vocalizer Language name (for example, "American English") or an IETF language code ("en-US").
szVoice	Voice name string (for example, "Samantha").
nFrequency	Voice sampling rate: TTS_FREQ_8KHZ, TTS_FREQ_11KHZ (currently not supported), or TTS_FREQ_22KHZ.
nOutputType	Audio output format: TTS_LINEAR_16BIT for linear 16-bit PCM samples; TTS_MULAW_8BIT for 8-bit µ-law samples (8 kHz voices only); TTS_ALAW_8BIT for 8-bit A-law samples (8 kHz voices only).
TtsSourceCb	(Optional, can be NULL.) Application-defined callback for supplying the input text when the TTS_SPEAK_DATA structure that is passed to TtsProcessEx has NULL uri and data fields.
TtsDestCb	Application-defined callback for supplying the audio output buffer and receiving audio output.
TtsEventCb	(Optional, can be NULL.) Application-defined callback for TTS marker notifications, including bookmarks, word marks, phoneme marks, sentence marks, and paragraph marks.
TtsLogErrorCb	(Optional, may be NULL.) Application-defined callback for error message notifications. By default, Vocalizer logs errors to a Vocalizer log file and to the system log; this callback lets the application also receive these notifications for reporting to an application-specific error log. It is only invoked when it is not NULL and <log_cb_enabled> is true in the Vocalizer configuration file.
TtsLogEventCb	(Optional, may be NULL.) Application-defined callback for events that trace normal application behavior, useful for application monitoring, tuning, and capacity planning. Vocalizer can optionally log this information to a Vocalizer log file as well, but that is disabled by default. It is only invoked when it is not NULL and <log_cb_enabled> is true in the Vocalizer configuration file.
TtsLogDiagnosticsCb	(Optional, may be NULL.) Application-defined callback for diagnostic messages. By default, Vocalizer logs errors to a Vocalizer log file; this callback lets the application also receive these notifications for reporting to an application-specific log. It is only invoked when it is not NULL, <log_cb_enabled> is true in the Vocalizer configuration file, and <log_level> in the Vocalizer configuration file is set to enable diagnostic messages.
szVoiceModel	(Optional, can be NULL.) Voice model or operating point. See Supporting existing applications.
TtsResolveURICb	(Optional, may be NULL.) Application-defined callback invoked before the engine fetches an external resource specified through a URI.
TtsAllowVoiceSwitchCb	(Optional, may be NULL.) Application-defined callback invoked whenever the application changes voices during synthesis.
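The nFrequency and nOutputType fields constrain each other: µ-law and A-law output are only available for 8 kHz voices, and 11 kHz is currently not supported. A small pre-check, sketched below, can reject an invalid combination before TtsOpen is called. The FREQ_* and OUT_* enumerators are hypothetical local stand-ins for TTS_FREQ_* and TTS_LINEAR_16BIT/TTS_MULAW_8BIT/TTS_ALAW_8BIT; their values are not assumed to match the Vocalizer headers.

```c
/* Hypothetical local stand-ins for the constants named in the table;
 * the real values come from the Vocalizer headers. */
enum { FREQ_8KHZ, FREQ_11KHZ, FREQ_22KHZ };
enum { OUT_LINEAR_16BIT, OUT_MULAW_8BIT, OUT_ALAW_8BIT };

/* Hypothetical pre-check for the nFrequency/nOutputType pair:
 * 11 kHz is currently not supported, and the 8-bit companded
 * formats (mu-law, A-law) require an 8 kHz voice. */
int open_format_is_valid(int freq, int outputType)
{
    if (freq != FREQ_8KHZ && freq != FREQ_22KHZ)
        return 0;
    switch (outputType) {
    case OUT_LINEAR_16BIT:
        return 1;
    case OUT_MULAW_8BIT:
    case OUT_ALAW_8BIT:
        return freq == FREQ_8KHZ;
    default:
        return 0; /* unknown output type */
    }
}
```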

TTS_SPEAK_DATA

TTS_SPEAK_DATA is used when calling TtsProcessEx. It describes the location of the input data for a text-to-speech action and its properties. Note that it is still possible to use the source callback method (in which case, set the uri and data structure members to NULL).

typedef struct TTS_SPEAK_DATA {
    LH_CHAR* uri;
    LH_VOID* data;
    LH_U32 lengthBytes;
    LH_CHAR* contentType;
    HTTSMAP fetchProperties;
    HTTSVECTOR fetchCookieJar;
} TTS_SPEAK_DATA;

Name	Value
uri	String specifying the location of the input data. This can be an http address (http://) or a file name (regular or with file://). Set the uri member to NULL to indicate that the input data is provided via the data member or the source callback.
data	Pointer to a buffer containing the input text. This structure member is used only when uri is NULL. Set both uri and data to NULL to use the source callback function.
lengthBytes	The length of the data buffer in bytes. Set this to 0 if the data field is set to NULL.
contentType	Specifies the MIME content type of the data. The string is case-sensitive. It is required when the input is supplied through the data member. It is optional when the input is supplied through the uri member: NULL means that the MIME content type is detected automatically from the URI fetch, and a non-NULL value overrides the detected type. For http access, the web server returns the MIME content type; for file:// access, the XML configuration parameter inet_extension_rules maps the file’s extension to a MIME content type. When the input is supplied through the source callback method, the string is optional but strongly recommended; NULL is only supported for backward compatibility with older releases and assumes plain text in a language-specific character set. Supported values: "application/synthesis+ssml" (preferred) or "text/xml" for a W3C SSML document; "text/plain;charset=charset" for a plain text document, where charset is replaced by a character set name. The recommended character sets are Unicode UTF-16 (which Vocalizer uses internally) and UTF-8, but Vocalizer also supports a broad variety of other character sets, as described below.
fetchProperties	Sets the properties of the fetch. The properties are stored in a map. The following functions manipulate this map: TtsMapCreateNVS, TtsMapDestroyNVS, TtsMapSetCharNVS, and TtsMapGetCharNVS. See the lh_inettypes.h header file for available properties.
fetchCookieJar	Reserved for future use; pass NULL.
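A minimal sketch of describing an in-memory UTF-8 plain-text buffer with TTS_SPEAK_DATA: uri is NULL so the data member is used, lengthBytes counts bytes rather than characters, and contentType is set because it is required for data input. The typedefs are local assumptions (LH_CHAR as char, handles as opaque pointers), speak_data_from_utf8 is a hypothetical helper, and excluding the terminating NUL from lengthBytes is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-ins (assumptions); the real typedefs come from the Vocalizer headers. */
typedef char LH_CHAR;
typedef void LH_VOID;
typedef uint32_t LH_U32;
typedef void *HTTSMAP;
typedef void *HTTSVECTOR;

typedef struct TTS_SPEAK_DATA {
    LH_CHAR *uri;
    LH_VOID *data;
    LH_U32 lengthBytes;
    LH_CHAR *contentType;
    HTTSMAP fetchProperties;
    HTTSVECTOR fetchCookieJar;
} TTS_SPEAK_DATA;

/* contentType is a non-const LH_CHAR *, so keep the string in writable storage. */
static char kPlainTextUtf8[] = "text/plain;charset=UTF-8";

/* Hypothetical helper: describe a NUL-terminated UTF-8 buffer held in memory. */
TTS_SPEAK_DATA speak_data_from_utf8(char *text)
{
    TTS_SPEAK_DATA sd;
    sd.uri = NULL;                         /* NULL uri: input comes from data */
    sd.data = text;
    sd.lengthBytes = (LH_U32)strlen(text); /* bytes, terminator excluded (assumption) */
    sd.contentType = kPlainTextUtf8;       /* required when data is used */
    sd.fetchProperties = NULL;             /* nothing is fetched for in-memory data */
    sd.fetchCookieJar = NULL;              /* reserved for future use */
    return sd;
}
```

Note that a UTF-8 buffer such as "café" is 5 bytes long although it holds 4 characters, which is why lengthBytes is computed with strlen rather than by counting characters.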

Vocalizer does its internal processing using Unicode UTF-16. When the input text is in a different character set, Vocalizer transcodes it to UTF-16 at the start of its processing. The following table lists some common supported character sets.

Character set	Languages	Notes
UTF-8	All languages
UTF-16	All languages	This is the recommended character set, because Vocalizer uses UTF-16 for its internal processing. (If UTF-16 is not convenient, UTF-8 is the next best choice.) If the byte-order mark is missing, big-endian is assumed.
ISO-8859-1	Western languages
windows-1252	Western languages
EUC-jp (synonym: EUC)	Japanese
Shift-JIS	Japanese
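The UTF-16 row above implies a simple rule for how an input buffer's byte order is interpreted: FF FE marks little-endian, FE FF marks big-endian, and with no byte-order mark big-endian is assumed. The sketch below applies that rule, which can help an application confirm that the UTF-16 it produces will be read as intended; utf16_byte_order and its types are hypothetical names.

```c
#include <stddef.h>

typedef enum { UTF16_BE, UTF16_LE } utf16_order;

/* Hypothetical helper: determine the byte order of a UTF-16 buffer using
 * the documented rule. *bomLen receives the number of BOM bytes (0 or 2). */
utf16_order utf16_byte_order(const unsigned char *buf, size_t len, size_t *bomLen)
{
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bomLen = 2;
        return UTF16_LE; /* little-endian BOM */
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bomLen = 2;
        return UTF16_BE; /* big-endian BOM */
    }
    *bomLen = 0;
    return UTF16_BE; /* missing BOM: big-endian is assumed */
}
```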

A third-party component called IBM ICU is used to transcode the input character set to the native UTF-16 character set. It supports a very broad range of character sets.

To learn about the character sets for the contentType parameter, visit the IANA website at www.iana.org/assignments/character-sets.

TTS_USRDICT_DATA

Describes the properties of a dictionary instance when calling TtsLoadUsrDictEx.

typedef struct TTS_USRDICT_DATA {
    LH_U32 version;
    LH_CHAR * uri;
    LH_VOID * data;
    LH_U32 lengthBytes;
    LH_CHAR * contentType;
    HTTSMAP fetchProperties;
    HTTSVECTOR fetchCookieJar;
} TTS_USRDICT_DATA;

Field	Description
version	Structure version to allow forward compatibility with future releases. Use TTS_CURRENT_VERSION.
uri	String specifying the location of the dictionary. This can be an http address (http://) or a filename (regular or with file://). Set the uri member to NULL to indicate that the input data is read from the data member.
data	Pointer to a buffer containing the user dictionary data. This structure member is used when uri is NULL.
lengthBytes	The length of the data buffer in bytes. Specify 0 when the data member is NULL.
contentType	String that specifies the MIME content type of the user dictionary. This string is required when specifying data using the data member. This string is optional when specifying data using the uri member: NULL indicates automatic detection of the MIME content type from the URI fetch. Non-NULL overrides the MIME content type from the URI fetch. For http access the web server returns the MIME content type. For file:// access the XML configuration parameter inet_extension_rules maps the file’s extension to a MIME content type. Supported value: "application/edct-bin-dictionary" (TTS_MIME_USRDICT_BINARY) for a Vocalizer binary-format user dictionary.
fetchProperties	Used to set the properties of the fetch. The properties are stored in a map. The following functions manipulate this map: TtsMapCreateNVS, TtsMapDestroyNVS, TtsMapSetCharNVS, and TtsMapGetCharNVS. See the lh_inettypes.h header file for available properties.
fetchCookieJar	Reserved for future use; pass NULL.
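The table above implies a few consistency rules between the uri, data, lengthBytes, and contentType fields. The sketch below encodes them as an application-side pre-check before TtsLoadUsrDictEx is called; it does not call Vocalizer. usrdict_fields_consistent is a hypothetical helper, and requiring lengthBytes > 0 for an in-memory dictionary is an assumption rather than a documented rule.

```c
#include <stddef.h>

/* Hypothetical pre-check of TTS_USRDICT_DATA fields, following the table:
 * - lengthBytes must be 0 when data is NULL;
 * - with a uri, contentType is optional (NULL = detect from the fetch);
 * - without a uri, the dictionary is read from data, and contentType is
 *   required (lengthBytes > 0 is an extra assumption). */
int usrdict_fields_consistent(const char *uri, const void *data,
                              unsigned long lengthBytes, const char *contentType)
{
    if (data == NULL && lengthBytes != 0)
        return 0;
    if (uri != NULL)
        return 1;
    return data != NULL && lengthBytes > 0 && contentType != NULL;
}
```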

TTS_VOICE_INFO

The TTS_VOICE_INFO structure is used by TtsGetVoiceList to return information about an installed TTS engine voice.

typedef struct TTS_VOICE_INFO {
    LH_CHAR szVersion[TTS_MAX_STRING_LENGTH];
    LH_CHAR szLanguage[TTS_MAX_STRING_LENGTH];
    LH_CHAR szLanguageIETF[TTS_MAX_STRING_LENGTH];
    LH_CHAR szLanguageTLW[4];
    LH_CHAR szVoice[TTS_MAX_STRING_LENGTH];
    LH_CHAR szAge[TTS_MAX_STRING_LENGTH];
    LH_CHAR szGender[TTS_MAX_STRING_LENGTH];
    LH_CHAR szVoiceModel[TTS_MAX_STRING_LENGTH];
    LH_U16 nFrequency;
    LH_BOOL bVop;
    LH_CHAR szForeignLanguagesIETF[TTS_MAX_STRING_LENGTH];
} TTS_VOICE_INFO;

Field	Description
szVersion	Voice version number, such as 5.2.0.7151
szLanguage	Language name, such as American English
szLanguageIETF	IETF language code, such as en-us
szLanguageTLW	Three-letter language code as used in user dictionaries and rulesets, such as ENU
szVoice	Voice name, such as Samantha
szAge	Voice age, such as Adult
szGender	Voice gender: Male, Female, or Neutral
szVoiceModel	Vocalizer internal name for the synthesizer technology
nFrequency	Voice sampling rate: TTS_FREQ_8KHZ, TTS_FREQ_11KHZ (currently not supported), or TTS_FREQ_22KHZ.
bVop	Boolean flag that indicates the voice format: 0 for the legacy format (for example, full_vssq5f22); 1 for the newer VOP format (xpremium-high or xpremium).
szForeignLanguagesIETF	Comma-separated list of the foreign languages supported by the voice. Each language code is five characters long, for example: sp_mx,fr_ca. If the voice is not multilingual, the string is empty.
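An application that picks voices from the list returned by TtsGetVoiceList may need to check szForeignLanguagesIETF for a particular language. The sketch below splits the comma-separated value without modifying it. Both helper names are hypothetical, and the exact, case-sensitive comparison is an assumption based on the lowercase example in the table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the codes in a value such as "sp_mx,fr_ca".
 * An empty string (voice is not multilingual) yields 0. */
size_t count_foreign_languages(const char *list)
{
    size_t count = 1;
    const char *p = list;
    if (list == NULL || *list == '\0')
        return 0;
    while ((p = strchr(p, ',')) != NULL) {
        ++count; /* each comma separates one more code */
        ++p;
    }
    return count;
}

/* Hypothetical helper: return 1 if code (for example "fr_ca") appears as a
 * whole entry in the list; the comparison is exact and case-sensitive. */
int voice_supports_foreign(const char *list, const char *code)
{
    size_t n = strlen(code);
    const char *p = list;
    while (p != NULL && *p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == n && strncmp(p, code, n) == 0)
            return 1;
        p = end ? end + 1 : NULL; /* advance past the comma, or stop */
    }
    return 0;
}
```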
