Application call logs

At runtime, Nuance Vocalizer writes call-log information that reports TTS engine statistics for application tuning and capacity planning. When using the Vocalizer API, applications can register a callback to receive event notifications. The application can then write events to an application-specific location or use them for online monitoring and reporting. Vocalizer can also write its own call log files, configured via the XML configuration file, but this feature is disabled by default.

Application-provided logging callbacks

For the native API, Vocalizer can log event notifications to the application-provided logging callback TTS_LOG_EVENT_CB. It does so when that callback is not NULL and the log_cb_enabled configuration parameter is true. Callback logging works even if the event_log_file_enabled configuration parameter is false, because that parameter controls only the Vocalizer-written call log files. Applications can use this feature to integrate Vocalizer event notifications into an application-defined logging system. For details, see TTS_LOG_EVENT_CB.
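As a minimal sketch of the application side, the helper below formats one event as a single line of TOKEN=value pairs separated by "|", the kind of line an application might write from its own logging callback. The pipe-delimited layout and the app_format_record name are assumptions for illustration. The actual arguments Vocalizer passes to the callback are defined in the TTS_LOG_EVENT_CB reference and are not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes n key/value pairs as "K1=V1|K2=V2|..." into out.
   Returns the record length, or -1 if out is too small or a key or value
   contains the '|' separator (an assumed record layout, not a Vocalizer API). */
int app_format_record(char *out, size_t out_size,
                      const char *const *keys, const char *const *values,
                      size_t n)
{
    assert(out != NULL && out_size > 0);
    size_t len = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (strchr(keys[i], '|') != NULL || strchr(values[i], '|') != NULL)
            return -1;  /* a separator inside a field would corrupt the record */
        int w = snprintf(out + len, out_size - len, "%s%s=%s",
                         i == 0 ? "" : "|", keys[i], values[i]);
        if (w < 0 || (size_t)w >= out_size - len)
            return -1;  /* output would be truncated */
        len += (size_t)w;
    }
    return (int)len;
}
```

A callback built on a helper like this would pass TIME, CHAN, and EVNT first and UCPU and SCPU last, following the token order Vocalizer uses for every event, and would then append the line to the application's own log.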

Vocalizer-written call log files

Vocalizer-written call-log files are disabled by default. These files are similar to Vocalizer error and diagnostic logs: there are two call-log files at most, and there is no hierarchical directory tree of separate call logs.

Call log file format

Call logs are encoded as UTF-8. For storage and transport purposes (for example, storing in CVS and transporting via FTP), treat these files as binary.

Input text log files

Vocalizer supports logging the full input text for speak requests, which is important for analyzing the text spoken by Vocalizer and reviewing it for TTS tuning. For Vocalizer-written log files, it logs the input text to a separate XML input text log file, then logs an NVOCinpt event to the main call log to correlate that text with a speak request. See the description of NVOCinpt—input text.

You can use Vocalizer configuration parameters to limit the number of simultaneous speak requests where the input text is being logged or to completely disable input text capture. When the secure_context parameter is set, input text capture is automatically disabled for the corresponding speak requests.

To configure input text logging, set the following parameters:

  • event_log_input_text_max_capture to disable input text capture completely, or to throttle it to a limited number of simultaneous speak requests, which limits the performance impact and the volume of logged data.
  • event_log_input_text_file_base_name to specify the XML file name used for logging input texts.
  • event_log_input_text_file_max_size to specify the maximum size for the input text log file.

(For details of these configuration parameters, see Logging parameters.)
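To illustrate what throttling input text capture to a limited number of simultaneous speak requests means, here is a minimal counting guard using C11 atomics. It is not Vocalizer's implementation. The capture_limiter type, its functions, and the treatment of a limit of 0 as "capture disabled" are assumptions made for this example; skipping secure requests reflects the secure_context behavior described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical limiter: at most max_capture speak requests may have their
   input text captured at the same time. Assumption: max_capture == 0
   disables capture entirely. */
typedef struct {
    unsigned max_capture;
    atomic_uint active;
} capture_limiter;

void limiter_init(capture_limiter *l, unsigned max_capture)
{
    l->max_capture = max_capture;
    atomic_init(&l->active, 0u);
}

/* Called when a speak request starts; returns true if its text may be logged. */
bool limiter_try_begin(capture_limiter *l, bool secure_context)
{
    if (secure_context)            /* secure requests are never captured */
        return false;
    unsigned cur = atomic_load(&l->active);
    while (cur < l->max_capture) {
        if (atomic_compare_exchange_weak(&l->active, &cur, cur + 1))
            return true;
        /* a failed exchange reloads cur with the current count; retry */
    }
    return false;
}

/* Called when a captured speak request finishes. */
void limiter_end(capture_limiter *l)
{
    unsigned prev = atomic_fetch_sub(&l->active, 1u);
    assert(prev > 0);
    (void)prev;                    /* silence unused warning under NDEBUG */
}
```

The compare-and-swap loop keeps the check and the increment atomic, so concurrent speak requests cannot push the count past the limit.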

The input text log file is a UTF-8 encoded XML file in which each input text is written as an <entry> element. The content of the <entry> element is the input text, and its unique "id" attribute is generated from the current date, a timestamp to the millisecond, and the session ID, similar to how waveform capture file names are generated. An ID generated this way is unique at least within a single system; if the session IDs specified by the application are unique, the ID is also unique across the entire deployment.
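The ID scheme can be sketched as follows. The exact layout Vocalizer uses is not documented in this section, so the "YYYYMMDDhhmmssmmm-<session>" form and the make_entry_id name are hypothetical. The sketch only shows why combining a millisecond timestamp with the session ID yields an ID that is unique per system, and deployment-wide when session IDs are unique.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical ID layout: "YYYYMMDDhhmmssmmm-<session_id>".
   Returns 0 on success, -1 if the buffer is too small or session_id is empty. */
int make_entry_id(char *out, size_t out_size, const struct tm *t,
                  int millis, const char *session_id)
{
    assert(millis >= 0 && millis < 1000);
    if (session_id == NULL || strlen(session_id) == 0)
        return -1;  /* an empty session ID would weaken deployment-wide uniqueness */
    int n = snprintf(out, out_size, "%04d%02d%02d%02d%02d%02d%03d-%s",
                     t->tm_year + 1900, t->tm_mon + 1, t->tm_mday,
                     t->tm_hour, t->tm_min, t->tm_sec, millis, session_id);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

In practice, t and millis would come from the system clock, for example clock_gettime plus localtime_r on POSIX systems.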

The logged input text is the original plain text or SSML input, without any modifications except for transcoding to UTF-8 for logging purposes. For Microsoft SAPI 5 based applications, Microsoft SAPI 5 XML markup is parsed by SAPI 5 before it is passed to Vocalizer, so the captured input text will be plain text with Vocalizer control sequences, rather than the original XML markup.

Note: For SSML with an encoding specified in the XML declaration, that encoding is not updated to indicate the input log file’s encoding of UTF-8. Before playing back that SSML for analysis, make sure you update the encoding attribute to specify encoding="utf-8" (or simply remove the encoding attribute).
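Before replaying a logged SSML entry, a tool can rewrite the declared encoding. The sketch below (force_utf8_encoding is a hypothetical name) replaces the value of the encoding attribute in the XML declaration with utf-8 and copies input without such an attribute unchanged. It uses simple string matching rather than a full XML parser, which is an assumption that holds for typical declarations such as <?xml version="1.0" encoding="ISO-8859-1"?>.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies in to out, replacing the value of the encoding attribute in a
   leading "<?xml ... ?>" declaration with "utf-8". Input without such an
   attribute is copied unchanged. Returns 0 on success, or -1 if out is
   too small or the attribute is not a quoted value inside the declaration. */
int force_utf8_encoding(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL && out_size > 0);
    size_t in_len = strlen(in);
    const char *decl_end = NULL, *attr = NULL;

    if (strncmp(in, "<?xml", 5) == 0)
        decl_end = strstr(in, "?>");
    if (decl_end != NULL) {
        attr = strstr(in, "encoding=");
        if (attr != NULL && attr > decl_end)
            attr = NULL;                /* match lies outside the declaration */
    }
    if (attr == NULL) {                 /* nothing to rewrite */
        if (in_len >= out_size)
            return -1;
        memcpy(out, in, in_len + 1);
        return 0;
    }

    const char *open_q = attr + strlen("encoding=");
    char q = *open_q;                   /* the attribute's quote character */
    const char *close_q = (q == '"' || q == '\'') ? strchr(open_q + 1, q) : NULL;
    if (close_q == NULL || close_q > decl_end)
        return -1;

    /* prefix through the opening quote, the new value, then the rest of the
       document starting at the closing quote */
    int n = snprintf(out, out_size, "%.*sutf-8%s",
                     (int)(open_q - in + 1), in, close_q);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

Removing the attribute instead, the other option mentioned above, would be an equally valid design; rewriting it keeps the declaration explicit.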

The NVOCinpt events that cross-reference these entries report the MIME content type for the input text (MIME token), a reference to the input text (TXID token, empty if input text capture logging is disabled in the configuration file or for a secure context), and the text input size in bytes (TXSZ token). See the description of NVOCinpt—input text.
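To correlate an NVOCinpt event with its <entry> in the input text log, an analysis tool needs the TXID value from the call-log record. The sketch assumes records are written as TOKEN=value pairs separated by "|". That layout and the get_token helper are assumptions for illustration, not a documented Vocalizer format.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Looks up name in a record of the assumed form "K1=V1|K2=V2|...".
   Copies the value into out and returns true if the token is found and the
   value fits. An empty value (such as TXID when capture is disabled) yields "". */
bool get_token(const char *record, const char *name, char *out, size_t out_size)
{
    assert(record != NULL && name != NULL && out != NULL && out_size > 0);
    size_t name_len = strlen(name);
    const char *field = record;
    while (field != NULL && *field != '\0') {
        const char *end = strchr(field, '|');
        size_t field_len = end ? (size_t)(end - field) : strlen(field);
        if (field_len > name_len && strncmp(field, name, name_len) == 0
            && field[name_len] == '=') {
            size_t value_len = field_len - name_len - 1;
            if (value_len >= out_size)
                return false;           /* value does not fit in out */
            memcpy(out, field + name_len + 1, value_len);
            out[value_len] = '\0';
            return true;
        }
        field = end ? end + 1 : NULL;   /* advance past the separator */
    }
    return false;
}
```

An empty TXID tells the tool that no <entry> exists for that speak request, so it should not search the input text log.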

Call logs—merging distributed logs

A voice platform might have several different call-log streams depending on which products are being used. For example, each of the following components can write a call log:

  • Nuance Dialog Modules
  • Speech recognition service
  • Audio output service (text-to-speech) engine

To merge the logs, you can set the "SWI.appsessionid" and "SWI.appstepid" parameters via SSML <meta> elements; setting both of these parameters generates an NVOCapps event. Nuance packaged applications use these identifiers to merge component logs, including application logs, to enable analysis, tuning, and reporting. The parameters are typically set several times during a session to provide information about logical steps within the application. For example:

<meta name="SWI.appsessionid"
 content="431cc972eaa41c1a22e99ac59f5e4fa4"/> 
<meta name="SWI.appstepid" content="3"/>
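A simple way to interleave records from two components before grouping them by SWI.appsessionid is a two-way merge on the TIME token. Because TIME is a fixed-width YYYYMMDDhhmmssmmm string, comparing it lexicographically orders records chronologically. This sketch assumes that every component's clock is synchronized and that each record begins with "TIME="; the merge_by_time function and the record layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed record prefix: "TIME=YYYYMMDDhhmmssmmm|..." (17-digit timestamp). */
enum { TIME_PREFIX_LEN = 5, TIME_DIGITS = 17 };

static int compare_time(const char *a, const char *b)
{
    assert(strncmp(a, "TIME=", TIME_PREFIX_LEN) == 0);
    assert(strncmp(b, "TIME=", TIME_PREFIX_LEN) == 0);
    /* fixed-width digits: lexicographic order equals chronological order */
    return strncmp(a + TIME_PREFIX_LEN, b + TIME_PREFIX_LEN, TIME_DIGITS);
}

/* Merges two time-sorted record arrays into out (capacity na + nb).
   Ties keep records from a first, so the merge is stable. */
void merge_by_time(const char *const *a, size_t na,
                   const char *const *b, size_t nb,
                   const char **out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb)
        out[k++] = (compare_time(a[i], b[j]) <= 0) ? a[i++] : b[j++];
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
}
```

Merging more than two streams can repeat this pairwise or use a heap keyed on TIME; the pairwise version is shown for brevity.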

Suppressing confidential data

You can use the TTS_PARAM_SECURE_CONTEXT parameter in TtsSetParamsEx to either encrypt confidential data in the call log or suppress it from being logged.

For mask-sensitive (suppress) mode, all affected events report a SECURE=mask-sensitive token and substitute the string "_SUPPRESSED" where confidential data would otherwise appear. For encrypt-sensitive mode, all affected events report a SECURE=encrypt-sensitive token and encrypt all the confidential data.
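A log-analysis tool will typically want to recognize records written under a secure context so that it does not treat "_SUPPRESSED" or encrypted values as real data. The sketch below classifies a record by its SECURE token. It assumes records are written as TOKEN=value pairs separated by "|", which is an assumption rather than a documented format; the function and enum names are illustrative.

```c
#include <assert.h>
#include <string.h>

typedef enum { SECURE_NONE, SECURE_MASK, SECURE_ENCRYPT, SECURE_UNKNOWN } secure_mode;

/* Returns the mode reported by a record's SECURE token, assuming the
   layout "K1=V1|K2=V2|...". SECURE_NONE means no SECURE token is present. */
secure_mode record_secure_mode(const char *record)
{
    assert(record != NULL);
    const char *p = record;
    while (p != NULL && *p != '\0') {
        if (strncmp(p, "SECURE=", 7) == 0) {
            const char *v = p + 7;
            size_t len = strcspn(v, "|");   /* length of the token value */
            if (len == strlen("mask-sensitive") && strncmp(v, "mask-sensitive", len) == 0)
                return SECURE_MASK;
            if (len == strlen("encrypt-sensitive") && strncmp(v, "encrypt-sensitive", len) == 0)
                return SECURE_ENCRYPT;
            return SECURE_UNKNOWN;
        }
        p = strchr(p, '|');
        if (p != NULL)
            p++;                            /* skip the separator */
    }
    return SECURE_NONE;
}
```

A tool might, for example, exclude SECURE_MASK and SECURE_ENCRYPT records from text-tuning reports while still counting them for capacity planning.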

You can also set "secure_context" via an SSML <meta> element to affect the current speak request only. For example:

<meta name="secure_context" content="encrypt-sensitive"/>

Tokens used for every event

The first entries in each log record are TIME, CHAN, and EVNT; the last entries are UCPU and SCPU.

  • TIME: System time when the event occurred, in the format YYYYMMDDhhmmssmmm (accurate to within 0.01 second).
  • CHAN: Unique session identification name provided in calls to TtsSessionStart or TtsSessionStartEx.
  • EVNT: Token that introduces the event code. Event codes are limited to 8 characters; longer names are truncated. All Vocalizer event codes are prefixed with "NVOC".
  • UCPU: Current running value of "user" CPU time consumed since the start of synthesis, reported in milliseconds and accurate to within 0.01 second.
  • SCPU: Current running value of "system" CPU time consumed since the start of synthesis, reported in milliseconds and accurate to within 0.01 second.
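A log reader usually needs the TIME token as a structured timestamp. Here is a minimal sketch, assuming the 17-digit YYYYMMDDhhmmssmmm layout described above. The log_time struct and parse_log_time name are hypothetical, and only basic range checks are performed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    int year, month, day, hour, minute, second, millis;
} log_time;

/* Reads n decimal digits starting at s; the caller has validated them. */
static int read_digits(const char *s, int n)
{
    int v = 0;
    for (int i = 0; i < n; i++)
        v = v * 10 + (s[i] - '0');
    return v;
}

/* Parses a TIME value such as "20240305140709042". Returns false if the
   value is not exactly 17 digits or a field is out of range. */
bool parse_log_time(const char *value, log_time *t)
{
    assert(value != NULL && t != NULL);
    if (strlen(value) != 17)
        return false;
    for (int i = 0; i < 17; i++)
        if (!isdigit((unsigned char)value[i]))
            return false;
    t->year   = read_digits(value, 4);
    t->month  = read_digits(value + 4, 2);
    t->day    = read_digits(value + 6, 2);
    t->hour   = read_digits(value + 8, 2);
    t->minute = read_digits(value + 10, 2);
    t->second = read_digits(value + 12, 2);
    t->millis = read_digits(value + 14, 3);
    return t->month >= 1 && t->month <= 12 && t->day >= 1 && t->day <= 31
        && t->hour <= 23 && t->minute <= 59 && t->second <= 60;
}
```

Because UCPU and SCPU are running totals from the start of synthesis, subtracting the values reported by two events for the same speak request gives the CPU time consumed between them.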

Standard events and tokens

To get call logging information through the API, applications register with the Vocalizer API to receive notifications of all events. The application must write the tokens to its own logging stream: there is no Nuance logging server, Vocalizer creates no call log files unless Vocalizer-written call log files are enabled, and no events are received from other Nuance speech products that might be running at the same time.

The following list shows groups of standard Vocalizer event codes. Production sites might also encounter events that are defined and inserted by the application.