
Truebar API v2.8


Introduction#

This guide covers the basic functionality of the Truebar service. It provides an overview of the authentication process, followed by the steps required to perform speech recognition.

The Truebar service provides both an online (streaming) and an offline transcription API. The online API is based on bidirectional communication between a client and the server and is implemented with the WebSocket protocol. Clients stream audio to the server, while the server responds by sending back the associated transcripts. The online API is useful for real-time transcription of a live audio stream from sources such as a microphone. More information about online transcription can be found here.

In contrast to the online API, the offline API is based on conventional HTTP and provides an easy way to transcribe audio data when immediate results are not needed. The offline API is particularly useful for transcribing audio files. More information can be found in this section.

For an initial test of your user account and associated privileges, you can use Truebar, the dictation and transcript editor; for access, contact support at support@true-bar.si. Please make sure to first read the QuickStart Guide (Eng, Slo).

Authentication#

All requests sent to the Truebar service must contain a valid access token, otherwise they will be denied with an HTTP 401 UNAUTHORIZED status code. Access tokens can be acquired by sending the following HTTP request to the authentication server:

curl --location --request POST 'https://<auth-host>/auth/realms/truebar/protocol/openid-connect/token' \
    --header 'Content-Type: application/x-www-form-urlencoded' \
    --data-urlencode 'grant_type=password' \
    --data-urlencode 'username={username}' \
    --data-urlencode 'password={password}' \
    --data-urlencode 'client_id=truebar-client'

A response is received in JSON format with the following structure:

{    "access_token": "<string>",    "expires_in": "<number>",    "refresh_expires_in": "<number>",    "refresh_token": "<number>",    "token_type": "<string>",    "not-before-policy": "<number>",    "session_state": "<string>",    "scope": "<string>"}

Each request sent to the Truebar service must be authenticated with an access_token. The token can be specified as a Bearer token inside the HTTP Authorization header or as a URL parameter. Passing it as a Bearer token is recommended, to avoid it being captured in server logs or the browser's history. The following code demonstrates how to send a request to the backend and pass the token via an Authorization header:

curl --location --request GET 'https://<api-host>/...' \
    --header 'Authorization: Bearer {access_token}'

Despite the above reasons to pass tokens via Authorization headers, it sometimes becomes necessary to use URL parameters instead, for example in an application that does not allow request headers to be changed. In such cases, the access token can be sent to the backend as a URL parameter:

curl --location --request GET 'https://<api-host>/...?access_token={access_token}'

The access_token is valid for expires_in seconds. After that time it expires and, to be used further, must be refreshed using the refresh_token:

curl --location --request POST 'https://<auth-host>/auth/realms/truebar/protocol/openid-connect/token' \
    --header 'Content-Type: application/x-www-form-urlencoded' \
    --data-urlencode 'grant_type=refresh_token' \
    --data-urlencode 'refresh_token={refresh_token}' \
    --data-urlencode 'client_id=truebar-client'

The refresh_token is valid for refresh_expires_in seconds. When this period elapses, the refresh token becomes invalid and the only way to acquire a new token is by using user credentials (i.e. username and password).
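To illustrate the full token lifecycle, here is a minimal JavaScript sketch using fetch (Node 18+, run as an ES module; the <auth-host> placeholder and credential values must be replaced with your own):

// Minimal sketch of the token lifecycle described above.
const AUTH_URL = 'https://<auth-host>/auth/realms/truebar/protocol/openid-connect/token';

async function requestToken(params) {
  const response = await fetch(AUTH_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({ client_id: 'truebar-client', ...params })
  });
  if (!response.ok) throw new Error(`Token request failed: ${response.status}`);
  return response.json(); // { access_token, expires_in, refresh_token, ... }
}

// Initial login with user credentials.
let tokens = await requestToken({
  grant_type: 'password',
  username: '{username}',
  password: '{password}'
});

// Before `expires_in` seconds elapse, exchange the refresh token for a new pair.
tokens = await requestToken({
  grant_type: 'refresh_token',
  refresh_token: tokens.refresh_token
});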

User roles#

To use the services available through the Truebar API, different levels of permissions are required. These are bound to user roles which are encoded within a JWT token. If the supplied JWT token does not contain the user roles required by the service, the request will be denied with a 403 FORBIDDEN response code.

There are currently five user roles available to a regular user:

  • LIVE_STREAMER,
  • DICTIONARY_READER,
  • DICTIONARY_WRITER,
  • MODEL_UPDATER_READER,
  • MODEL_UPDATER_WRITER.

To check which roles are associated with your user account, a JWT parser is needed. Use any of the free libraries available for this purpose, or try the following example:

let payload = JSON.parse(atob(token.split('.')[1])); // token is the raw JWT access token string

/*
payload = {
  "realm_access": {
    "roles": ["LIVE_STREAMER", "DICTIONARY_READER", ...]
  }
  // ...other properties omitted for readability
}
*/

Configuration#

Each user has their own configuration that is stored on the server and persisted between sessions. The configuration contains all the information required to start a transcription session. The steps below are optional, i.e. they are only required if the user wants to read or modify the existing configuration.

Reading configuration#

The following example shows how to check the configuration settings that are currently active:

Request:

curl --location --request GET 'https://<api-host>/api/client/configuration' \
    --header 'Authorization: Bearer <access_token>'

Response:

{    "nlp": {        "enableDenormalization": {            "isAllowed": "<boolean>",             "value": "<boolean>"        },        "enableTruecasing": {            "isAllowed": "<boolean>",             "value": "<boolean>"        },        "punctuation": {            "enabled": {                "isAllowed": "<boolean>",                 "value": "<boolean>"            },            "model": {                "isAllowed": "<boolean>",                 "value": "string"            },            "enableOnInterimTranscripts": {                "isAllowed": "<boolean>",                "value": "<boolean>"            },            "enableRealFinals": {                "isAllowed": "<boolean>",                "value": "<boolean>"            },        },        "translation": {            "enabled": {                "isAllowed": "<boolean>",                 "value": "<boolean>"            },            "language": {                "isAllowed": "<boolean>"            }        }    },    "stt": {        "framework": {            "isAllowed": "<boolean>",             "value": "string"        },        "language": {            "isAllowed": "<boolean>",             "value": "string"        },        "domain": {            "isAllowed": "<boolean>",             "value": "string"        },        "model": {            "isAllowed": "<boolean>",             "value": "string"        },        "enableDiarization": {            "isAllowed": "<boolean>",             "value": "<boolean>"        },        "enableDictatedCommands": {            "isAllowed": "<boolean>",             "value": "<boolean>"        },        "enableDictatedPunctuations": {            "isAllowed": "<boolean>",             "value": "<boolean>"        },        "enableInterimTranscripts": {            "isAllowed": "<boolean>",             "value": "<boolean>"        },        "enableUnks": {            "isAllowed": "<boolean>",             "value": "<boolean>"        }    }}

The above response contains two sections: one that pertains to the speech-to-text options (stt) and another with various settings related to pre- and post-processing (nlp).

The model used when the service is invoked is defined by the following keys in the stt section:

  • framework: the framework the model belongs to (e.g. KALDI, NEMO, etc.),
  • language: the language the model supports (e.g. sl-SI, en-US, etc.),
  • domain: the domain the model is specialized for (e.g. MED, LAW, COL, etc.), and
  • model: the model identifier, usually in a format like 20220829-0914.

The other settings within the stt section include the following features:

  • enableDiarization: whether to detect speakers within the resulting transcript,
  • enableDictatedCommands: whether the dictation of commands is enabled,
  • enableDictatedPunctuations: whether the dictation of punctuation is enabled,
  • enableInterimTranscripts: whether the service should send back interim transcripts or just finals,
  • enableUnks: whether to mark unrecognised tokens in speech with the <unk> symbol.

The nlp section contains four additional features that support post-processing of transcripts. These are:

  • enableDenormalization: whether to apply inverse normalization to the resulting transcript,
  • enableTruecasing: whether to perform truecasing on the resulting transcript,
  • punctuation: whether to automatically punctuate the resulting transcript, and
  • translation: whether to perform translation on the resulting transcript.

The punctuation and translation features provide additional subkeys that allow the post-processing to be configured as the user requires.

Each of the features described above comes with two subkeys, isAllowed and value. The isAllowed key tells whether the feature is allowed for the user, while the value key determines the current setting of the feature.

The user can change the value if and only if the isAllowed key is set to true. The isAllowed keys are preset and cannot be changed by the user!

Updating configuration#

If clients want to update their configuration, they can do so by sending a PATCH request to the same endpoint as above. The request body must be JSON formatted and must contain properties matching those returned by the GET request (see the example above). Options that the user is not allowed to change will be ignored (e.g. if isAllowed for enableDictatedCommands is false, then setting its value will have no effect). Properties that are not included in the request body will not be modified.

The example below shows a request that modifies the configuration so that the punctuation feature is turned on.

curl --location --request PATCH 'https://<api-host>/api/client/configuration' \
    --header 'Authorization: Bearer {access_token}' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "nlp": {
            "punctuation": {
                "enabled": {
                    "value": true
                }
            }
        }
    }'

Reading available options and features#

To check which STT models are available with your user account, the following endpoint can be used:

curl -X 'GET' \
    'https://<api-host>/api/stt/status' \
    -H 'accept: */*'

The response lists all available models. It comes as a nested JSON object that includes, for each framework, all supported languages; for each language, all supported domains; and for each domain, all supported models.

{  "frameworks": [    {      "code": "string",      "isAllowed": true,      "languages": [        {          "code": "string",          "isAllowed": true,          "domains": [            {              "code": "string",              "isAllowed": true,              "models": [                {                  "code": "string",                  "isAllowed": true,                  "isAvailable": true,                  "isRealtime": true,                  "dictationCommandsSupport": true,                  "diarizationSupport": true,                  "metadata": {                    "workers": {                      "available": 0,                      "active": 0                    },                    "info": {                      "am": "string",                      "lx": "string",                      "lm": "string",                      "src": "string",                      "framework": "string"                    }                  }                }              ]            }          ]        }      ]    }  ]}

Online transcription#

Creating connection#

The Truebar service provides real-time transcription of audio data. This is achieved by sending small pieces of audio data (chunks) through a WebSocket connection. Since WebSocket is an asynchronous bidirectional protocol, transcripts can be returned as soon as they are available.

The following example shows how to establish a WebSocket connection from JavaScript:

let ws = new WebSocket("wss://<api-host>/ws?access_token={access_token}&session_id={session_id}")

Both URL parameters in the above example are optional. Nevertheless, the access_token must be included in the connection attempt in one way or another. If the WebSocket client supports modifying HTTP headers, then the access_token can be sent as a Bearer token via the Authorization header. Otherwise it must be sent using the access_token URL parameter.

If the session_id parameter is given, the server will search for a session with the given id and resume it if it exists. Otherwise, if the parameter is not given, the request will create a new session on the fly. Each created session can be resumed multiple times by passing the session_id parameter with the WebSocket connection request. Resuming an existing session creates a new recording entity on the server, but does not create a new session entity. All recordings created as part of an existing or new session can later be accessed individually or as part of the session to which they belong.

If the connection attempt is denied, or an error happens before the connection is actually upgraded to WebSocket, the response will be returned as an HTTP response with an appropriate error code and description. If an error happens after the WebSocket connection has been established, the connection will automatically be closed by the server with close status 1011 and an error description.

After the connection is successfully established, the server will send a message containing basic information about the session:

{    "sessionId" : "<number>",    "isNew" : "<boolean>",    "previousRecordings" : "<number>",    "totalRecordedSeconds" : "<number>"}

If the session_id parameter was given in the connection attempt, sessionId will be the same, with the isNew flag set to false. Otherwise, isNew will be set to true and sessionId will hold an automatically generated unique session identifier that can later be used to resume the session.

previousRecordings and totalRecordedSeconds tell how many times the session was resumed in the past and the total length of all recordings associated with the session, respectively.

Streaming audio data#

After a WebSocket connection has been successfully established, the user can begin sending audio chunks. Audio chunks must be sent as binary WebSocket messages with a maximum size of 64 KB. Each chunk must contain an array of 16-bit little-endian encoded integers, sampled at 16 kHz, also known as S16LE PCM encoding. Only single-channel audio is supported.
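For example, audio captured with the Web Audio API comes as 32-bit float samples; the sketch below (an illustration, not part of the API) converts a Float32 buffer to S16LE PCM and streams it over an open WebSocket in chunks of at most 64 KB:

// Convert float samples (range -1..1) to 16-bit little-endian PCM.
function floatTo16BitPcm(float32Samples) {
  const buffer = new ArrayBuffer(float32Samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return buffer;
}

// Send the PCM data as binary WebSocket messages of at most 64 KB each.
function sendPcmChunks(ws, float32Samples) {
  const bytes = new Uint8Array(floatTo16BitPcm(float32Samples));
  const MAX_CHUNK = 64 * 1024; // bytes
  for (let offset = 0; offset < bytes.length; offset += MAX_CHUNK) {
    ws.send(bytes.subarray(offset, offset + MAX_CHUNK));
  }
}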

To ensure realtime transcription, transcripts are asynchronously returned to the client as soon as they are available. They are formatted as JSON encoded text messages:

{    "decoded" : "<number>",    "isFinal" : "<boolean>",    "transcript" : {        "id" : "<number>",        "content" : "[            {                "text" : "<string>",                "startTime" : "<number>",                "endTime" : "<number>",                "confidence" : "<number>",                "speakerCode" : "<string>" // Present only if automatic speaker detection is enabled            },            {                "text" : "<string>",                "startTime" : "<number>",                "endTime" : "<number>",                "confidence" : "<number>",                "speakerCode" : "<string>" // Present only if automatic speaker detection is enabled            }            , ...        ]"    }}

Note that content is a JSON-encoded string, not an object. There are two types of transcripts, interims and finals, distinguished by the isFinal flag. While chunks of audio are being decoded on the backend, the transcript is updated with new content. Every time the transcript is updated, it is returned to the client as an interim transcript. When a certain transcript length is reached, or when no new words have been recognized for some time, the transcript is returned as final, which means its content will not be updated anymore. Instead, new content is returned as a new interim transcript, which is again incrementally updated until it becomes final. The following example illustrates this procedure; a sketch of a corresponding message handler follows the example:

  • Interim : Da
  • Interim : Danes
  • Interim : Danes je
  • Interim : Danes je lepo
  • Interim : Danes je lepo da
  • Interim : Danes je lep dan
  • Final : Danes je lep dan.
  • Interim : Jutri
  • Interim : Jutri pa
  • Interim : Jutri pa bo
  • Interim : Jutri pa bo še lepši
  • Final : Jutri pa bo še lepši.
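A minimal sketch of a message handler that maintains the display text from interim and final transcripts (field names follow the message format above; ws is the WebSocket from the connection example):

// Accumulate final transcripts and overlay the current interim one.
const finals = [];  // completed segments
let interim = '';   // segment still being updated

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  // `content` is a JSON-encoded string, not an object (see above).
  const words = JSON.parse(message.transcript.content);
  const text = words.map((w) => w.text).join(' ');
  if (message.isFinal) {
    finals.push(text);
    interim = '';
  } else {
    interim = text;
  }
  console.log([...finals, interim].join(' '));
};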

Punctuations and Commands#

The Truebar transcription service can handle punctuation and commands. Punctuation is supported in two ways: a) using the automatic punctuation service, in which case punctuation marks such as commas, periods and question marks are inserted into the transcript automatically; b) by dictation, in which case the punctuation symbols are expected to be dictated by the user (to check how to pronounce a specific punctuation mark, refer to the QuickStart Guide (Eng, Slo)).

Which option to use can be specified through the Configuration. The two options are not mutually exclusive, i.e. they can both be used at the same time. In the transcript returned by the Truebar transcription service, the punctuation symbols, whether set by the automatic service or dictated, are treated as stand-alone objects and returned as symbols enclosed in angle brackets, e.g. <.> for a full stop or <?> for a question mark. Apart from punctuation, the Truebar transcription service can handle specific commands that can be dictated by the user, for example a command for making a new line or a new paragraph within the transcript. Other commands include uppercase, lowercase, capital letter, deletion, substitution etc. For the full list, please check the QuickStart Guide (Eng, Slo). Similarly to the punctuation symbols, all the supported commands are returned as stand-alone objects enclosed in angle brackets, for example <nl> for a new line, <np> for a new paragraph, <uc> for the beginning of an all-uppercase span and </uc> for its end.
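As an illustration, a small sketch that renders recognized tokens into plain text, handling a few of the symbols mentioned above (the mapping is deliberately partial and the <,> symbol is an assumption; see the QuickStart Guide for the authoritative list):

// Partial mapping of stand-alone symbols to output text; extend as needed.
const SYMBOLS = {
  '<.>': '.',    // full stop
  '<?>': '?',    // question mark
  '<,>': ',',    // comma (assumed symbol)
  '<nl>': '\n',  // new line
  '<np>': '\n\n' // new paragraph
};

function renderTokens(words) {
  let out = '';
  for (const w of words) {
    if (w.text in SYMBOLS) {
      out += SYMBOLS[w.text];           // symbol: attach without leading space
    } else {
      out += (out ? ' ' : '') + w.text; // ordinary word
    }
  }
  return out;
}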

Closing connection#

Transcription can be completed in either of the following two ways: a) by closing the WebSocket from the client side, or b) by sending an empty binary message.

When closing the WebSocket connection from the client side, a short audio chunk at the end may not get transcribed. This happens when audio chunks the backend has already received were not yet processed at the time the client closed the connection. Nevertheless, these chunks will be stored in the database and will later be accessible through an API call.

If the client can wait until the transcription is complete, the second approach is more appropriate. With this approach, the client sends an empty audio chunk, indicating that there is no more data to send. From this point on, no more audio chunks are accepted by the server, but transcription of the already received audio continues until all chunks are processed. When all transcripts have been sent to the client, the server automatically closes the WebSocket session with status code 4500, indicating that the session completed without error.
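In JavaScript, the second approach looks roughly like this (a sketch; ws is the WebSocket from the connection example):

// Signal end of stream: an empty binary message tells the server that no
// more audio will follow. Keep the socket open to receive the remaining
// transcripts; the server closes it with code 4500 when done.
ws.send(new ArrayBuffer(0));

ws.onclose = (event) => {
  if (event.code === 4500) {
    console.log('Session completed without error');
  } else {
    console.warn(`Connection closed: ${event.code} ${event.reason}`);
  }
};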

Offline transcription#

To transcribe an audio file offline, the following HTTP API is available:

Request:

curl --location --request POST 'https://<api-host>/api/client/upload?async=<true|false>' \
    --header 'Authorization: Bearer <access_token>' \
    --form 'file=@"/path/to/audio.wav"'

The endpoint accepts WAVE-formatted audio files that must be encoded as S16LE PCM (16-bit, 16 kHz). There are two modes of operation, synchronous and asynchronous, which can be selected via the async request parameter.

Synchronous mode#

When using the synchronous mode, the request is blocked until the transcription result is ready. The response will contain the following data:

{  "sessionId": <number>,        // Session identifier  "transcripts": [              // List of transcripts    {      "id": <number>,           // Transcript identifier      "content": <string>       // JSON encoded transcript content    }  ]}

Synchronous calls require that the underlying TCP connection is kept open until the transcription result is ready. When uploading large audio files, transcription times can exceed network timeouts, causing the TCP connection to be dropped before the transcription is complete and the result is sent back to the client. The synchronous mode should therefore be used for short audio files only.
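For reference, a sketch of a synchronous upload from JavaScript (Node 18+; the file path and host are placeholders):

import { readFile } from 'node:fs/promises';

// Sketch: synchronously transcribe a short WAV file (S16LE PCM, 16 kHz, mono).
async function transcribeFile(apiHost, accessToken, path) {
  const form = new FormData();
  form.append('file', new Blob([await readFile(path)]), 'audio.wav');
  const response = await fetch(`https://${apiHost}/api/client/upload?async=false`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${accessToken}` },
    body: form
  });
  return response.json(); // { sessionId, transcripts: [{ id, content }] }
}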

Asynchronous mode#

The asynchronous mode can be selected by setting the async request parameter to true. In this mode, the response is returned immediately after the file has been successfully uploaded. The response contains the session identifier, which can then be used to poll the session status on the server. The following endpoint is available for this purpose:

Request:

curl --location --request GET '<api-host>/api/client/sessions/<session-id>' \
    --header 'Authorization: Bearer <access_token>'

Response:

{      "id": "<number>",                   // Session identifier      "name": "<string>",                 // Name of the session      "status" : "ERROR | INITIALIZING | IN_PROGRESS | FINISHING | FINISHED",      "numRecordings": "<number>",        // Number of recordings created under the same session      "recordedSeconds": "<number>",      // Total seconds recorded      "processedSeconds": "<number>",     // Total seconds recorded (transcribed)      "createdAt": "<string>",            // Creation date and time      "updatedAt": "<string>",            // Date and time of last update (eg. new recording)      "createdBy": {                      // Username under which the session was created        "id": "<number>",        "username": "<string>"      },      "isSaved": "<boolean>",                   // Not relevant for API users - always false      "isDiscarded": "<boolean>",               // Not relevant for normal users - always false      "allRecordingsDiscarded": "<boolean>",    // Not relevant for normal users - always false      "notes": "<string>",                      // Session notes      "labels": [                               // List of labels assigned to session        {          "isEnabled": "<boolean>",          "label": {            "id": "<number>",            "code": "<string>",            "color": "<string>",            "isDefault": "<boolean>",            "isAssigned": "<boolean>"          }        }      ]    }

The status property of the above object can be used to determine the transcription status. The statuses INITIALIZING, IN_PROGRESS and FINISHING are transient and can change while the uploaded audio file is still being processed. After the transcription has completed, the session gets one of the following statuses: FINISHED or ERROR. Transcription progress, in seconds of processed audio, can also be monitored with the processedSeconds field.
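A minimal polling sketch (the 5-second interval is an arbitrary choice; a production client would add error handling and a timeout):

// Poll the session endpoint until transcription finishes or fails.
async function waitForSession(apiHost, sessionId, accessToken) {
  for (;;) {
    const response = await fetch(`https://${apiHost}/api/client/sessions/${sessionId}`, {
      headers: { Authorization: `Bearer ${accessToken}` }
    });
    const session = await response.json();
    console.log(`Processed ${session.processedSeconds}s of ${session.recordedSeconds}s`);
    if (session.status === 'FINISHED' || session.status === 'ERROR') {
      return session;
    }
    await new Promise((resolve) => setTimeout(resolve, 5000)); // wait 5 s between polls
  }
}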

The transcript of the uploaded audio file can be accessed through the endpoint described here, and can be requested even while the transcription is still in progress. In the latter case, the service will return a partial transcript of the uploaded audio file.

History#

List of sessions#

All sessions created through the WebSocket or HTTP API can later be accessed through the HTTP API endpoints described in this section. All endpoints described here are protected with a JWT token. For details about obtaining a valid JWT token, please refer to the Authentication section. After a valid token has been acquired, it has to be included in all requests that follow.

The following example shows how to obtain a list of past sessions:

curl --location --request GET '<api-host>/api/client/sessions' \
    --header 'Authorization: Bearer <access_token>'

Response:

{  "totalPages": "<number>",       // Total number of pages available  "totalElements": "<number>",    // Total number of elements available on all pages  "first": "<boolean>",           // True if this is the first page  "last": "<boolean>",            // True if this is the last page  "number": "<number>",           // Current page number  "size": "<number>",             // Number of elements on current page  "empty": "<boolean>",           // True if this page is empty  // ...Some page properties omitted for readability  "content": [                    // List of sessions    {      "id": "<number>",                   // Session identifier      "name": "<string>",                 // Name of the session      "status" : "ERROR | INITIALIZING | IN_PROGRESS | FINISHING | FINISHED",      "numRecordings": "<number>",        // Number of recordings created under the same session      "recordedSeconds": "<number>",      // Total recorded seconds      "processedSeconds": "<number>",     // Total seconds recorded (transcribed)      "createdAt": "<string>",            // Creation date and time      "updatedAt": "<string>",            // Date and time of last update (eg. new recording)      "createdBy": {                      // Username under which the session was created        "id": "<number>",        "username": "<string>"      },      "isSaved": "<boolean>",                   // Not relevant for API users - always false      "isDiscarded": "<boolean>",               // Not relevant for normal users - always false      "allRecordingsDiscarded": "<boolean>",    // Not relevant for normal users - always false      "notes": "<string>",                      // Session notes      "labels": [                               // List of labels assigned to session        {          "isEnabled": "<boolean>",          "label": {            "id": "<number>",            "code": "<string>",            "color": "<string>",            "isDefault": "<boolean>",            "isAssigned": "<boolean>"          }        }      ]    }  ]}

The endpoint implements a pageable interface, which means that the response contains only a single page of results rather than the entire list. By default, the page size is set to 20 elements. A specific page size and page number can be requested by setting the page and size request parameters. Results can also be sorted by using the sort request parameter with one of the following properties:

  • id
  • name
  • createdAt
  • updatedAt
  • numRecordings
  • recordedSeconds

Sessions can also be filtered by providing any of the following optional request parameters:

  • name : Return only sessions with names containing the given value,
  • label : Return only sessions that are labeled with the given label,
  • created-after : Return only sessions that were created after the given date and time,
  • created-before : Return only sessions that were created before the given date and time.

The following example shows a request that returns the first 30 sessions containing the word "Test" in their names, sorted by creation time in ascending order:

curl --location --request GET '<api-host>/api/client/sessions?page=0&size=30&name=Test&sort=createdAt,asc' \
    --header 'Authorization: Bearer <access_token>'

Specific session details#

Request:

curl --location --request GET '<api-host>/api/client/sessions/<session-id>' \
    --header 'Authorization: Bearer <access_token>'
  • session-id: Unique session identifier

Response:

Same as a single entry from the list of sessions described above.

Audio file#

Returns the audio file created within the given session. The request supports HTTP resource region specification (range requests), which allows the file to be transferred in smaller pieces. This allows audio players to stream the audio file instead of transferring the whole file before playing it. Like any other request, this request also supports passing the access_token as a request parameter instead of as a Bearer token inside the Authorization header. This can be useful when requesting the audio file directly from an audio player that does not support modifying request headers.

Request:

curl --location --request GET '<api-host>/api/client/sessions/<session-id>/audio.wav' \
    --header 'Authorization: Bearer <access_token>'
  • session-id: Unique session identifier

Session transcripts#

Returns the list of transcripts generated within the given session.

Request:

curl --location --request GET '<api-host>/api/client/sessions/<session-id>/transcripts' \
    --header 'Authorization: Bearer <access_token>'
  • session-id: Unique session identifier

Response:

[    {        "id":"<number>",            // Transcript identifier        "content":"<JSON string>"   // JSON encoded transcript content with the same structure as transcripts received trough Websocket API.    },    ...]

Session recordings#

Returns the list of recordings created within the given session. The request supports a pageable interface; for the list of available request parameters, see the List of sessions section.

Request:

curl --location --request GET '<api-host>/api/client/sessions/<session-id>/recordings' \
    --header 'Authorization: Bearer <access_token>'
  • session-id: Unique session identifier

Response:

{  "totalPages": "<number>",       // Total number of pages available  "totalElements": "<number>",    // Total number of elements available on all pages  "first": "<boolean>",           // True if this is the first page  "last": "<boolean>",            // True if this is the last page  "number": "<number>",           // Current page number  "size": "<number>",             // Number of elements on current page  "empty": "<boolean>",           // True if this page is empty  "content": [        {            "id": "<number",                                    // Recording identifier            "duration": "<number>",                             // Length of the recording in seconds            "isDiscarded": "<boolean>",                         // Should always be true for normal users            "transcriptionLanguage": "<string",                 // Selected settings when recording was created...            "transcriptionDomain": "<string",            "transcriptionModelVersion": "<string",            "transcriptionEndpointingType": "<string",            "transcriptionDoInterim": "<boolean>",            "transcriptionDoPunctuation": "<boolean>",            "transcriptionDoInterimPunctuation": "<boolean>",            "transcriptionShowUnks": "<boolean>",            "transcriptionDoDictation": "<boolean>",            "transcriptionDoNormalisation": "<boolean>",            "translationEnabled": "<boolean>"        }    ]  }

Session sharing#

Sessions are by default visible only to the user who created them. The session sharing mechanism allows users to share their sessions with other users that belong to the same group. There are several endpoints that can be used to manage session shares.

List session shares#

Request:

curl --location --request GET '<api-host>/api/client/sessions/<session-id>/shares' \
    --header 'Authorization: Bearer <access_token>'

Response:

[    {        "id": "<number>",           // share id        "createdAt": "<string>",        "sharedWith": {            "id": "<number>",       // user id            "username": "<string>"        }    },    ...]

Adding session share#

The list of users with whom a session can be shared can be obtained with the following request.

Request:

curl --location --request GET '<api-host>/api/client/users' \
    --header 'Authorization: Bearer <access_token>'

Response:

[    {        "id": "<number>",        "username": "<string>"    },    {        "id": "<number>",        "username": "<string>"    },    ...]

A session can be shared with a selected user using the following request:

Request:

curl --location --request POST '<api-host>/api/client/sessions/<session-id>/shares' \
    --header 'Authorization: Bearer <access_token>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "userId" : <user-id>
    }'

Delete session share#

curl --location --request DELETE '<api-host>/api/client/sessions/shares/<share-id>' \
    --header 'Authorization: Bearer <access_token>'

Dictionaries#

Speech-to-text models used in the Truebar STT service are based on dictionaries. Dictionaries keep data on the terms used in speech and their pronunciations. Each term is further specified with a relative frequency telling how often the term is expected to occur in speech (this is only an estimate).

Dictionaries are domain-dependent, as the terminology used in different domains differs. Medical doctors, for instance, use very different terminology in speech than lawyers. For this reason, dictionaries are bound to their models, while the models are specialized for specific domains.

On the other hand, dictionaries are user-independent. This means that when somebody makes a change in the dictionary of a specific model, the change will be visible to all users with access to that model.

All models available in Truebar, e.g. MED, LAW, COL etc., come with prebuilt dictionaries that already include the majority of terms common to their domains. Nevertheless, there will always be terms that are missing. The Truebar API offers a set of operations that can help in this regard. With specific endpoints you can add new terms and their pronunciations, or update existing entries in the dictionary.

Different roles are required to use these features. The DICTIONARY_READER role allows you to read dictionary entries. To make changes, this will not suffice; you will also need the DICTIONARY_WRITER role. See the User roles section for more.

Below are the main endpoints the Truebar API offers for maintaining dictionary entries.

Inserting new terms to dictionary#

Required role: DICTIONARY_WRITER

This is the request syntax to add a new term with corresponding pronunciations and a frequency class to a dictionary. Make sure that at least one pronunciation is provided for each term, otherwise the request will be rejected.

Request:

curl --location --request POST '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/words' \
    --header 'Authorization: Bearer <access_token>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "text" : "<string>",
        "frequencyClassId" : "<number>",     // For a list of available frequencyClasses, see /frequencyClasses endpoint
        "pronunciations": [
            {
                "text" : "<string>"
            }
        ]
    }'
  • language-code: model language
  • domain-code: model domain
  • model-version: model version

Inserting and updating pronunciations#

Required role: DICTIONARY_WRITER

To add new or update existing pronunciations of dictionary terms, you can use the same endpoint as for inserting new terms. Existing pronunciations will be replaced with those you specify in the request.

Obtaining frequency classes#

Required role: DICTIONARY_READER

To specify a frequency class when inserting a new term into the dictionary, use this endpoint to obtain the list of available frequency classes:

Request:

curl --location --request GET '<api-host>/api/client/dictionary/frequencyClasses' \
    --header 'Authorization: Bearer <access_token>'

Response:

[    {        "id": "<number>",        "label": "<string>"    },]

Obtaining phonemes#

Required role: DICTIONARY_READER

Pronunciations of terms must be specified with symbols that describe phonemes. Phonemes are small units of sound that distinguish words from each other. They are usually written with letters. There are two groups of phonemes, silent and non-silent; the list of available phonemes of both groups is defined within the model. When adding new or updating existing pronunciations, only the allowed phonemes may be used. To obtain the list of phonemes, use the following endpoint:

Request:

curl --location --request GET '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/phone-set' \
    --header 'Authorization: Bearer <access_token>'
  • language-code: model language
  • domain-code: model domain
  • model-version: model version

Response:

{    "nonsilentPhones": "<string>",    "silentPhones": "<string>"}

Removing terms from dictionary#

Required role: DICTIONARY_WRITER

To delete a specific term from a dictionary, use the following endpoint:

Request:

curl --location --request DELETE '<api-host>/api/client/dictionary/words/<word-id>' \
    --header 'Authorization: Bearer <access_token>'
  • word-id: Unique identifier of the term

Searching in dictionary#

Required role: DICTIONARY_READER

The request below demonstrates how to search for a specific term in a dictionary:

Request:

curl --location --request GET '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/words?text=<text>' \
    --header 'Authorization: Bearer <access_token>'
  • language-code: model language
  • domain-code: model domain
  • model-version: model version
  • text: text of a term to search for

Response:

{    "id": "<number>",    "frequencyClassId": "<number>",    "status": "NEW | IN_PROGRESS | IN_DICTIONARY",    "text": "<string>",    "pronunciations": [        {            "saved": "<boolean>",            "text": "<test>"        }    ]}

This endpoint allows users to search the dictionary for a given term. The expected response status is always 200, even if the term is not found in the dictionary. The status property in the response tells you the status of the term, which can have one of the following values:

  • NEW - the term is new, i.e. it is not yet in the dictionary;
  • IN_PROGRESS - the term has been added to the dictionary and will be included in the next version of the model;
  • IN_DICTIONARY - the term is in the dictionary and is included in the current version of the model.

Other properties are included in the response based on the status:

  • id - Unique word identifier (if the word is found in the dictionary);
  • frequencyClassId - identifier of the frequency class assigned to the term (if the term is found in dictionary);
  • text - label of the term, i.e. the way the term is written;
  • pronunciations - the list of pronunciations for the term. Each entry comes as a nested object with a property saved that tells whether the pronunciation was assigned to the term manually or generated automatically. For new terms, all returned pronunciations will be automatically generated. In other cases, the response may contain both manually added and automatically generated pronunciations.
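Putting the search and insert endpoints together, here is a sketch that adds a term only if it is not already known (base stands for the common URL prefix '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>'):

// Look up a term; add it (with one pronunciation) only if it is new.
async function ensureTerm(base, accessToken, text, frequencyClassId, pronunciation) {
  const headers = { Authorization: `Bearer ${accessToken}` };
  const lookup = await fetch(`${base}/words?text=${encodeURIComponent(text)}`, { headers });
  const word = await lookup.json();
  if (word.status !== 'NEW') return word; // already in the dictionary or queued for the next rebuild
  await fetch(`${base}/words`, {
    method: 'POST',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text,
      frequencyClassId,
      pronunciations: [{ text: pronunciation }]
    })
  });
}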

Rebuilding models#

Changes in the dictionary will not have any effect on speech recognition until the model (the language part of the model) is rebuilt, i.e. updated to a new version. This is a time-consuming operation that can take an hour or more. During the update process, users can continue to work with the existing version of the model. Once a new version is created, it becomes available through a configuration update (see Configuration). To use the newly created version, close existing sessions, update the configuration to select the new model version, and start a new session.

Specific roles are required to rebuild models. To trigger the rebuild process, you will need the MODEL_UPDATER_WRITER role. To check the status of a rebuild in progress, you will also need the MODEL_UPDATER_READER role. See the User roles section for more.

The request below shows how to trigger the rebuild process:

Request:

curl --location --request POST '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/regeneration/regenerate' \
    --header 'Authorization: Bearer <access_token>'
  • language-code: model language
  • domain-code: model domain
  • model-version: model version

The operation is asynchronous. You can check the update status via /regeneration/status. See the following example:

Request:

curl --location --request GET '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/regeneration/status' \
    --header 'Authorization: Bearer <access_token>'
  • language-code: model language
  • domain-code: model domain
  • model-version: model version

Response:

{    "statusCode": "NOT_RUNNING" | "RUNNING" | "FAILED",    "enableModelUpdating": "<boolean>",         // True if the user is granted the right to update models}

Advanced features#

Apart from the basic features exposed through the Truebar API, there are also other, more advanced features that can be used to achieve the highest possible recognition accuracy and the best possible user experience. These features include services for working with substitutes, replacements and prefixed tokens.

Replacements#

You can use replacements to define user-specific labels for terms in the dictionary. When the Truebar STT service recognizes a term that has a user-defined replacement, it makes the appropriate changes in the response. Replacements take effect immediately, i.e. without the need to rebuild the model.

The best way to fine-tune the model is to make sure its dictionary comprises all terms that are used in speech. Specific attention should be paid to abbreviations, acronyms and measurement units. Make sure that all required terms are included in the dictionary and have appropriate labels and pronunciations.

Sometimes, however, we want to make changes that are user-dependent. For instance, one user wants the measurement unit for meters per second to be written as m/sec, while other users prefer m/s as the measurement label. Such user-dependent changes to dictionary entries are handled through replacements. These are kept separately from the dictionary and thus have no effect on other users.

To retrieve the list of active replacements, you can use the following endpoint:

Request

curl --location --request GET '<api-host>/api/client/replacements' \
    --header 'Authorization: Bearer <access_token>'

Response:

[    {        "id": "<number>",        "source": [                             // List of source words            {                "spaceBefore": "<boolean>",                "text": "<string>"            },            {                "spaceBefore": "<boolean>",                "text": "<string>"            },            ...        ],        "target": {                             // Target word that will replace all source words            "spaceBefore": "<boolean>",            "text": "<string>"        },    },    ...]

To get information on a specific replacement, use this endpoint:

Request:

curl --location --request GET '<api-host>/api/client/replacements/<replacement-id>' \
    --header 'Authorization: Bearer <access_token>'

  • replacement-id: replacement identification number as returned when asking for the list of replacements

Response:

{    "id": "<number>",    "source": [                             // List of source words        {            "spaceBefore": "<boolean>",            "text": "<string>"        },        {            "spaceBefore": "<boolean>",            "text": "<string>"        },        ...    ],    "target": {                             // Target word that will replace all source words        "spaceBefore": "<boolean>",        "text": "<string>"    },}

To add new replacements, you can call:

curl --location --request POST '<api-host>/api/client/replacements' \
    --header 'Authorization: Bearer <access_token>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "source": [
            {
                "spaceBefore": "<boolean>",
                "text": "<string>"
            },
            ...
        ],
        "target": {
            "spaceBefore": "<boolean>",
            "text": "<string>"
        }
    }'

Finally, to delete a specific replacement, call:

curl --location --request DELETE '<api-host>/api/client/replacements/<replacement-id>' \
    --header 'Authorization: Bearer <access_token>'

  • replacement-id: replacement identification number as returned when asking for the list of replacements

Substitutes#

The substituter is a service that returns a list of substitutes for a given term. Substitutes of a term are other dictionary terms that are written or pronounced similarly to the given term. You can use the substituter to provide the user with a list of substitutes.

Required role: SUBSTITUTOR_READER

Request:

curl --location --request GET '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/word_substitutes?text=<word_string>' \
    --header 'Authorization: Bearer <access_token>'
  • language-code: model language
  • domain-code: model domain
  • model-version: model version
  • word_string: the term to find substitutes for

Response:

{    "1": "<string>",    "2": "<string>",    "3": "<string>"    ...}

Prefixed tokens#

The Truebar API exposes a service that can return dictionary terms that have a specific prefix. This service can be useful when spelling terms letter by letter, to provide the user with a list of terms that correspond to the spelled letters.

Required role: SUBSTITUTOR_READER

curl --location --request GET '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/prefixed_words?prefix=<prefix_string>' \
    --header 'Authorization: Bearer <access_token>'
  • language-code: model language
  • domain-code: model domain
  • model-version: model version
  • prefix_string: prefix for which we want to retrieve terms

Response:

{    "content": [        "<string>",        "<string>",        "<string>",        ...    ]}

Automatic speaker detection (experimental)#

This function can be enabled by setting the transcriptionDoDiarization option inside the user's configuration. It can only be enabled when the selected model supports automatic speaker detection (see the diarizationSupport field inside the model information returned along with the user's configuration).

When the option is enabled, the words inside each received transcript will contain a field named speakerCode. The field holds a label (speaker0, speaker1, etc.) which is automatically assigned to the detected speaker.

Response statuses and errors#

HTTP API#

This section describes expected HTTP response statuses for endpoints described in this document.

Response statuses for various HTTP methods when the operation is successful:

  • GET : response status 200 OK along with the response body;
  • POST : response status 201 CREATED and an automatically generated unique resource identifier in the response body;
  • PATCH : response status 204 NO_CONTENT without response body;
  • DELETE: response status 204 NO_CONTENT without response body;

Response statuses for errors:

  • 400 BAD_REQUEST indicates that the request could not be processed due to something that is perceived to be a client error;
  • 401 UNAUTHORIZED indicates that the request has not been applied because it lacks valid authentication credentials (invalid JWT or not present);
  • 403 FORBIDDEN indicates that the server understood the request but refuses to authorize it (Insufficient permissions);
  • 404 NOT_FOUND indicates that the requested resource was not found on the server;
  • 405 METHOD NOT ALLOWED indicates that the resource does not support given request method (eg. POST request on URL that only supports GET);
  • 409 CONFLICT indicates that the resource that is being added or modified already exists on the server;
  • 415 UNSUPPORTED MEDIA TYPE indicates that the uploaded media file format is not supported;
  • 500 SERVER_ERROR indicates that the request could not be processed because of an internal server error.

Unless otherwise noted, any request described in this document should return one of the response statuses described above. If there was an error processing the request, the response body will be returned in the following format:

{    "id": "<string>",    "timestamp": "<string>",    "message": "<string>"}

The returned object contains a message with a short error description. In general, the message together with the HTTP response status is sufficient for the client to know what went wrong. However, that is not the case when the error triggers response status 500, because the response body then contains only a generic error message. If this happens, please contact our support with the returned error id.

Websocket API#

The WebSocket standard defines the codes that are returned to the client when a WebSocket connection is closed. The Truebar service implements the following additional close codes:

  • 1001 GOING_AWAY
  • 1011 SERVER_ERROR
  • 4001 LANGUAGE_NOT_AVAILABLE
  • 4002 DOMAIN_NOT_AVAILABLE
  • 4003 MODEL_NOT_AVAILABLE
  • 4004 MODEL_NOT_ALLOWED
  • 4005 WORKERS_NOT_AVAILABLE
  • 4500 NORMAL

Libraries and examples#

This chapter lists the available libraries that can be used to integrate with the Truebar service. Currently, we provide libraries for the Java and JavaScript programming languages.

Java library#

The Java library is written in Java 11. It uses the built-in HttpClient to minimize the number of external dependencies. It is packaged as a single .jar file that can be imported into the target project.