# Truebar API v2.8
## New in this version
## Introduction

This guide covers the basic functionality of the Truebar service. It provides an overview of the authentication process, followed by the steps required to perform speech recognition.
The Truebar service provides both an online (streaming) and an offline transcription API. The online API is based on bidirectional communication between a client and the server and is implemented over the WebSocket protocol. Clients stream audio to the server, while the server responds with the associated transcripts. The online API is useful for real-time transcription of live audio streams from sources such as a microphone. More information about online transcription can be found here.
In contrast to the online API, the offline API is based on conventional HTTP and provides an easy way to transcribe audio data when immediate results are not needed. The offline API is particularly useful for transcribing audio files. More information can be found in this section.
For an initial test of your user account and the associated privileges, you can use the Truebar dictation and transcript editor; for access, contact support at support@true-bar.si. Please make sure to read the QuickStart Guide first (Eng, Slo).
## Authentication

All requests sent to the Truebar service must contain a valid access token; otherwise they are denied with an HTTP 401 UNAUTHORIZED status code. Access tokens can be acquired by sending the following HTTP request to the authentication server:
```
curl --location --request POST 'https://<auth-host>/auth/realms/truebar/protocol/openid-connect/token' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'grant_type=password' \
  --data-urlencode 'username={username}' \
  --data-urlencode 'password={password}' \
  --data-urlencode 'client_id=truebar-client'
```
A response is received in JSON format with the following structure:
{ "access_token": "<string>", "expires_in": "<number>", "refresh_expires_in": "<number>", "refresh_token": "<number>", "token_type": "<string>", "not-before-policy": "<number>", "session_state": "<string>", "scope": "<string>"}
Each request sent to the Truebar service must be authenticated with an access_token. The token can be specified as a Bearer token inside the HTTP request header or as a URL parameter. Passing it as a Bearer token is recommended, to avoid the token being captured in server logs or the browser's history.
The following code demonstrates how to send a request to the backend and pass a token via an Authorization header:
```
curl --location --request GET 'https://<api-host>/...' \
  --header 'Authorization: Bearer {access_token}'
```
Despite the above reasons to pass tokens via the Authorization header, it is sometimes necessary to use URL parameters instead, for example in an application that does not allow request headers to be modified. In such cases, the access token can be sent to the backend as a URL parameter:
```
curl --location --request GET 'https://<api-host>/...?access_token={access_token}'
```
The access_token is valid for expires_in seconds. After that time it expires and, to be used further, must be refreshed using the refresh_token:
```
curl --location --request POST 'https://<auth-host>/auth/realms/truebar/protocol/openid-connect/token' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'grant_type=refresh_token' \
  --data-urlencode 'refresh_token={refresh_token}' \
  --data-urlencode 'client_id=truebar-client'
```
The refresh_token is valid for refresh_expires_in seconds. When this period elapses, the refresh token becomes invalid, and the only way to acquire a new token is with user credentials (i.e. username and password).
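For illustration, here is a minimal JavaScript sketch of this token lifecycle using fetch. The helper names are ours; only the endpoint, form fields, and client_id come from the examples above:

```js
// Minimal sketch of the token lifecycle (any fetch-capable JS environment).
const AUTH_URL =
  'https://<auth-host>/auth/realms/truebar/protocol/openid-connect/token';

async function requestToken(params) {
  const response = await fetch(AUTH_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams(params),
  });
  if (!response.ok) throw new Error(`Token request failed: ${response.status}`);
  return response.json(); // { access_token, expires_in, refresh_token, ... }
}

// Initial login with user credentials.
const login = (username, password) =>
  requestToken({ grant_type: 'password', username, password, client_id: 'truebar-client' });

// Call before `expires_in` seconds elapse; the refresh token itself
// becomes invalid after `refresh_expires_in` seconds.
const refresh = (refreshToken) =>
  requestToken({ grant_type: 'refresh_token', refresh_token: refreshToken, client_id: 'truebar-client' });
```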
## User roles

To use the services available through the Truebar API, different levels of permission are required. These are bound to user roles, which are encoded within the JWT token. If the supplied JWT does not contain the user roles required by the service, the request is denied with a 403 FORBIDDEN response code.
There are currently five user roles available to a regular user:
- LIVE_STREAMER
- DICTIONARY_READER
- DICTIONARY_WRITER
- MODEL_UPDATER_READER
- MODEL_UPDATER_WRITER
To check which roles are associated with your user account, a JWT parser is needed. Use any of the free libraries available for this purpose, or try the following example:

```js
// Decode the payload (second segment) of the JWT to inspect the roles.
let payload = JSON.parse(atob(accessToken.split('.')[1]));
/*
payload = {
  "realm_access": {
    "roles": ["LIVE_STREAMER", "DICTIONARY_READER", ...]
  }
  // ...other properties omitted for readability
}
*/
```
## Configuration

Each user has their own configuration, which is stored on the server and persists between sessions. The configuration contains all the information required to start a transcription session. The steps below are optional, i.e. they are only needed if the user wants to read or modify the existing configuration.
### Reading configuration

The following example shows how to read the currently active configuration settings:
Request:
```
curl --location --request GET 'https://<api-host>/api/client/configuration' \
  --header 'Authorization: Bearer <access_token>'
```
Response:
{ "nlp": { "enableDenormalization": { "isAllowed": "<boolean>", "value": "<boolean>" }, "enableTruecasing": { "isAllowed": "<boolean>", "value": "<boolean>" }, "punctuation": { "enabled": { "isAllowed": "<boolean>", "value": "<boolean>" }, "model": { "isAllowed": "<boolean>", "value": "string" }, "enableOnInterimTranscripts": { "isAllowed": "<boolean>", "value": "<boolean>" }, "enableRealFinals": { "isAllowed": "<boolean>", "value": "<boolean>" }, }, "translation": { "enabled": { "isAllowed": "<boolean>", "value": "<boolean>" }, "language": { "isAllowed": "<boolean>" } } }, "stt": { "framework": { "isAllowed": "<boolean>", "value": "string" }, "language": { "isAllowed": "<boolean>", "value": "string" }, "domain": { "isAllowed": "<boolean>", "value": "string" }, "model": { "isAllowed": "<boolean>", "value": "string" }, "enableDiarization": { "isAllowed": "<boolean>", "value": "<boolean>" }, "enableDictatedCommands": { "isAllowed": "<boolean>", "value": "<boolean>" }, "enableDictatedPunctuations": { "isAllowed": "<boolean>", "value": "<boolean>" }, "enableInterimTranscripts": { "isAllowed": "<boolean>", "value": "<boolean>" }, "enableUnks": { "isAllowed": "<boolean>", "value": "<boolean>" } }}
The above response contains two sections: one pertains to the speech-to-text options (stt), while the other holds various settings related to pre/post-processing (nlp).

The model used when the service is invoked is defined by the following keys in the stt section:

- framework: the framework the model belongs to (e.g. KALDI, NEMO, etc.),
- language: the language the model supports (e.g. sl-SI, en-US, etc.),
- domain: the domain the model is specialized for (e.g. MED, LAW, COL, etc.), and
- model: the model identifier, usually in a format like 20220829-0914.
The other settings found within the stt section include the following features:

- enableDiarization: whether to detect speakers within the resulting transcript,
- enableDictatedCommands: whether dictation of commands is enabled,
- enableDictatedPunctuations: whether dictation of punctuation is enabled,
- enableInterimTranscripts: whether the service should send back interim transcripts or only finals,
- enableUnks: whether to mark unrecognized tokens in speech with the <unk> symbol.
The nlp section contains four additional features that support post-processing of transcripts. These are:

- enableDenormalization: whether to apply inverse normalization to the resulting transcript,
- enableTruecasing: whether to perform truecasing on the resulting transcript,
- punctuation: whether to automatically punctuate the resulting transcript, and
- translation: whether to translate the resulting transcript.
The punctuation and translation features provide additional subkeys that allow the post-processing to be configured the way the user requires.
Each of the features described above comes with two subkeys, isAllowed and value. The isAllowed key tells whether the feature is allowed for the user, while the value key holds the feature's current setting. The user can change the value if and only if the isAllowed key is set to true. The isAllowed keys are preset and cannot be changed by the user!
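As a sketch of how these subkeys are meant to be used (the helper name and placeholders are ours; the endpoints and payload shape follow the examples in this section):

```js
// Read the configuration, then enable automatic punctuation only if
// the isAllowed flag permits changing it.
async function enablePunctuation(apiHost, accessToken) {
  const headers = { Authorization: `Bearer ${accessToken}` };

  const response = await fetch(`https://${apiHost}/api/client/configuration`, { headers });
  const config = await response.json();

  if (!config.nlp.punctuation.enabled.isAllowed) {
    throw new Error('Punctuation is not allowed for this user; its value cannot be changed.');
  }

  await fetch(`https://${apiHost}/api/client/configuration`, {
    method: 'PATCH',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify({ nlp: { punctuation: { enabled: { value: true } } } }),
  });
}
```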
### Updating configuration

Clients can update their configuration by sending a PATCH request to the same endpoint as above. The request body must be JSON-formatted and must contain properties matching those returned by the GET request (see the example above). Options that the user is not allowed to change are ignored (e.g. if enableDictatedCommands.isAllowed is false, setting its value has no effect). Properties that are not included in the request body are left unmodified.

The example below shows a request that modifies the configuration so that the punctuation feature is turned on.
```
curl --location --request PATCH 'https://<api-host>/api/client/configuration' \
  --header 'Authorization: Bearer {access_token}' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "nlp": {
      "punctuation": {
        "enabled": { "value": true }
      }
    }
  }'
```
### Reading available options and features

If you want to check which STT models are available to your user account, the following endpoint can be used:
```
curl --location --request GET 'https://<api-host>/api/stt/status' \
  --header 'accept: */*' \
  --header 'Authorization: Bearer <access_token>'
```
All available models are listed in the response, which comes as a nested JSON object: for each framework, all supported languages; for each language, all supported domains; and for each domain, all supported models.
{ "frameworks": [ { "code": "string", "isAllowed": true, "languages": [ { "code": "string", "isAllowed": true, "domains": [ { "code": "string", "isAllowed": true, "models": [ { "code": "string", "isAllowed": true, "isAvailable": true, "isRealtime": true, "dictationCommandsSupport": true, "diarizationSupport": true, "metadata": { "workers": { "available": 0, "active": 0 }, "info": { "am": "string", "lx": "string", "lm": "string", "src": "string", "framework": "string" } } } ] } ] } ] } ]}
## Online transcription

### Creating connection

The Truebar service provides real-time transcription of audio data. This is achieved by sending small pieces of audio data (chunks) through a WebSocket connection. Since WebSocket is an asynchronous, bidirectional protocol, transcripts can be returned as soon as they are available.
The following example shows how to establish a WebSocket connection from JavaScript:

```js
let ws = new WebSocket("wss://<api-host>/ws?access_token={access_token}&session_id={session_id}");
```
Both URL parameters in the above example are optional; nevertheless, the access_token must be included in the connection attempt in one way or another. If the WebSocket client supports modifying HTTP headers, the access_token can be sent as a Bearer token via the Authorization header. Otherwise it must be sent using the access_token URL parameter.
If the session_id parameter is given, the server searches for a session with that id and resumes it if it exists. Otherwise, if the parameter is not given, the request creates a new session on the fly. Each session can be resumed multiple times by passing the session_id parameter with the WebSocket connection request. Resuming an existing session creates a new recording entity on the server, but it does not create a new session entity. All recordings created as part of an existing or a new session can later be accessed individually or as part of the session to which they belong.
If the connection attempt is denied, or an error occurs before the connection is upgraded to WebSocket, an HTTP response with an appropriate error code and description is returned. If an error occurs after the WebSocket connection has been established, the server closes the connection with close status 1011 and an error description.
After the connection is successfully established, the server sends a message containing basic information about the session:

```
{
  "sessionId": "<number>",
  "isNew": "<boolean>",
  "previousRecordings": "<number>",
  "totalRecordedSeconds": "<number>"
}
```
If the session_id parameter was given in the connection attempt, sessionId will be the same, with the isNew flag set to false. Otherwise, isNew will be set to true, with sessionId holding an automatically generated unique session identifier that can later be used to resume the session.
previousRecordings and totalRecordedSeconds tell, respectively, how many times the session has been resumed in the past and the total length of all recordings associated with the session.
### Streaming audio data

After the WebSocket connection has been successfully established, the client can begin sending audio chunks. Audio chunks must be sent as binary WebSocket messages with a maximum size of 64 KB. Each chunk must contain an array of 16-bit little-endian encoded integers sampled at 16 kHz, also known as S16LE PCM encoding. Only single-channel audio is supported.
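A minimal JavaScript sketch of this encoding, assuming the input is already mono Float32 audio at 16 kHz (e.g. from the Web Audio API; resampling is not shown):

```js
// Convert Float32 samples to S16LE PCM.
function floatTo16BitPCM(float32Samples) {
  const buffer = new ArrayBuffer(float32Samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i]));      // clamp to [-1, 1]
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return buffer;
}

// Send the PCM data as binary WebSocket messages of at most 64 KB each.
function sendAudio(ws, float32Samples) {
  const pcm = floatTo16BitPCM(float32Samples);
  const MAX_CHUNK_BYTES = 64 * 1024;
  for (let offset = 0; offset < pcm.byteLength; offset += MAX_CHUNK_BYTES) {
    ws.send(pcm.slice(offset, offset + MAX_CHUNK_BYTES));
  }
}
```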
To ensure real-time transcription, transcripts are returned to the client asynchronously, as soon as they are available. They are formatted as JSON-encoded text messages:

```
{
  "decoded": "<number>",
  "isFinal": "<boolean>",
  "transcript": {
    "id": "<number>",
    "content": "[
      {
        "text": "<string>",
        "startTime": "<number>",
        "endTime": "<number>",
        "confidence": "<number>",
        "speakerCode": "<string>" // Present only if automatic speaker detection is enabled
      },
      ...
    ]"
  }
}
```
Note that content is a JSON-encoded string, not an object.

There are essentially two types of transcripts, interim and final, distinguished by the isFinal flag. While chunks of audio are being decoded on the backend, the transcript is updated with new content. Every time the transcript is updated, it is returned to the client as an interim transcript. When a certain transcript length is reached, or when no new words have been recognized for some time, the transcript is returned as final. A final transcript's content will not be updated anymore; instead, new content is returned as a new interim transcript, which is again incrementally updated until it becomes final.
The following example illustrates the above procedure:
- Interim : Da
- Interim : Danes
- Interim : Danes je
- Interim : Danes je lepo
- Interim : Danes je lepo da
- Interim : Danes je lep dan
- Final : Danes je lep dan.
- Interim : Jutri
- Interim : Jutri pa
- Interim : Jutri pa bo
- Interim : Jutri pa bo še lepši
- Final : Jutri pa bo še lepši.
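A sketch of a message handler that follows this procedure; render is a hypothetical display function, and joining words with single spaces is a simplification:

```js
// Assemble text from interim/final transcript messages.
let finalText = '';

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  if (!message.transcript) return; // e.g. the initial session-info message

  // `content` is a JSON-encoded string, so it needs its own parsing step.
  const words = JSON.parse(message.transcript.content);
  const text = words.map((w) => w.text).join(' ');

  if (message.isFinal) {
    finalText += text + ' ';  // final content never changes again
    render(finalText);
  } else {
    render(finalText + text); // each interim replaces the previous one
  }
};
```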
### Punctuations and commands

The Truebar transcription service can handle punctuation and commands. Punctuation is supported in two ways: a) using the automatic punctuation service, in which case punctuation marks such as comma, period, and question mark are placed into the transcript automatically; or b) by dictation, in which case punctuation symbols are expected to be dictated by the user (to check how to pronounce a specific punctuation mark, refer to the QuickStart Guide (Eng, Slo)).

Which option the user wants to use can be specified through the Configuration. The two options are not mutually exclusive, i.e. they can both be used at the same time.
In the transcript returned by the Truebar transcription service, punctuation symbols, whether set by the automatic service or dictated, are treated as stand-alone objects and returned as symbols enclosed in angle brackets, e.g. <.> for a full stop or <?> for a question mark.
Apart from punctuation, the Truebar transcription service can handle specific commands that can be dictated by the user, for example a command for making a new line or a new paragraph within the transcript. Other commands include uppercase, lowercase, capital letter, deletion, substitution, etc. For the full list, please check the QuickStart Guide (Eng, Slo).

Similarly to the punctuation symbols, all supported commands are returned as stand-alone objects enclosed in angle brackets, for example <nl> for a new line, <np> for a new paragraph, <uc> for the beginning of an all-uppercase span, and </uc> for the end of that span.
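A small sketch of how a client might render these stand-alone symbols into plain text; the mapping covers only the symbols mentioned above, and stateful commands such as <uc>…</uc> would additionally require tracking state across tokens:

```js
// Map stand-alone punctuation/command symbols to output text.
const SYMBOLS = {
  '<.>': '.',
  '<?>': '?',
  '<nl>': '\n',
  '<np>': '\n\n',
};

function renderToken(token) {
  // Fall back to the token text itself for ordinary words.
  return SYMBOLS[token.text] ?? token.text;
}
```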
### Closing connection

Transcription can be completed in either of two ways: a) by closing the WebSocket from the client side, or b) by sending an empty binary message.

When the WebSocket connection is closed from the client side, a short audio chunk at the end may not get transcribed. This happens when some audio chunks the backend has already received have not yet been processed at the time the client closes the connection. Nevertheless, these chunks are stored in the database and can later be accessed with an API call.

If the client can wait until the transcription is complete, the second approach is more appropriate. With this approach, the client sends an empty audio chunk to indicate that there is no more data. From this point on, the server accepts no more audio chunks, but transcription of the already received chunks continues until all of them are processed. When all transcripts have been sent to the client, the server automatically closes the WebSocket session with status code 4500, indicating that the session completed without error.
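A sketch of the second, graceful approach (the helper name is ours):

```js
// Signal end of audio with an empty binary message, then wait for the
// server to close the socket with code 4500 (completed without error).
function finishSession(ws) {
  return new Promise((resolve, reject) => {
    ws.onclose = (event) =>
      event.code === 4500
        ? resolve()
        : reject(new Error(`Session closed with code ${event.code}`));
    ws.send(new ArrayBuffer(0)); // empty chunk: no more audio will follow
  });
}
```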
## Offline transcription

To transcribe an audio file offline, the following HTTP API is available:
Request:
```
curl --location --request POST 'https://<api-host>/api/client/upload?async=<true|false>' \
  --header 'Authorization: Bearer <access_token>' \
  --form 'file=@"/path/to/audio.wav"'
```
The endpoint accepts WAVE-formatted audio files, which must be encoded as S16LE PCM (16-bit, 16 kHz). There are two modes of operation, synchronous and asynchronous, selected via the async request parameter.
### Synchronous mode

In synchronous mode, the request blocks until the transcription result is ready. The response contains the following data:
{ "sessionId": <number>, // Session identifier "transcripts": [ // List of transcripts { "id": <number>, // Transcript identifier "content": <string> // JSON encoded transcript content } ]}
Synchronous calls require the underlying TCP connection to be kept open until the transcription result is ready. When large audio files are uploaded, transcription times can exceed network timeouts, causing the TCP connection to be dropped before the transcription is complete and the result is sent back to the client. The synchronous mode should therefore be used for short audio files only.
### Asynchronous mode

The asynchronous mode is selected by setting the async request parameter to true. In this mode, the response is returned immediately after the file has been successfully uploaded. The response contains the session identifier, which can then be used to poll the session status from the server. The following endpoint is available for this purpose:
Request:
```
curl --location --request GET '<api-host>/api/client/sessions/<session-id>' \
  --header 'Authorization: Bearer <access_token>'
```
Response:
{ "id": "<number>", // Session identifier "name": "<string>", // Name of the session "status" : "ERROR | INITIALIZING | IN_PROGRESS | FINISHING | FINISHED", "numRecordings": "<number>", // Number of recordings created under the same session "recordedSeconds": "<number>", // Total seconds recorded "processedSeconds": "<number>", // Total seconds recorded (transcribed) "createdAt": "<string>", // Creation date and time "updatedAt": "<string>", // Date and time of last update (eg. new recording) "createdBy": { // Username under which the session was created "id": "<number>", "username": "<string>" }, "isSaved": "<boolean>", // Not relevant for API users - always false "isDiscarded": "<boolean>", // Not relevant for normal users - always false "allRecordingsDiscarded": "<boolean>", // Not relevant for normal users - always false "notes": "<string>", // Session notes "labels": [ // List of labels assigned to session { "isEnabled": "<boolean>", "label": { "id": "<number>", "code": "<string>", "color": "<string>", "isDefault": "<boolean>", "isAssigned": "<boolean>" } } ] }
The status property of the above object can be used to determine the transcription status. The statuses INITIALIZING, IN_PROGRESS, and FINISHING are transient and can change while the uploaded audio file is still being processed. After the transcription has completed, the session gets one of the following statuses: FINISHED or ERROR. Transcription progress, in seconds of processed audio, can also be monitored through the processedSeconds field.

The transcript of the uploaded audio file can be accessed through the endpoint described here, which can be invoked even while the transcription is still in progress. In the latter case, the service returns a partial transcript of the uploaded audio file.
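Putting the asynchronous flow together, a sketch along these lines could upload a file and poll until the session settles (we assume the upload response carries a sessionId field, as in the synchronous example; the polling interval is arbitrary):

```js
// Upload a file asynchronously and poll the session status until it
// reaches FINISHED or ERROR.
async function transcribeFile(apiHost, accessToken, file) {
  const headers = { Authorization: `Bearer ${accessToken}` };

  const form = new FormData();
  form.append('file', file);

  const { sessionId } = await (
    await fetch(`https://${apiHost}/api/client/upload?async=true`, {
      method: 'POST',
      headers,
      body: form,
    })
  ).json();

  // Poll until the transient statuses (INITIALIZING, IN_PROGRESS, FINISHING) pass.
  for (;;) {
    const session = await (
      await fetch(`https://${apiHost}/api/client/sessions/${sessionId}`, { headers })
    ).json();
    if (session.status === 'FINISHED') return sessionId;
    if (session.status === 'ERROR') throw new Error('Transcription failed');
    await new Promise((r) => setTimeout(r, 2000)); // wait 2 s between polls
  }
}
```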
## History

### List of sessions

All sessions created through the WebSocket or HTTP API can later be accessed through the HTTP API endpoints described in this section. All endpoints described here are protected with a JWT token; for details on obtaining a valid token, please refer to the Authentication section. Once a valid token has been acquired, it has to be included in all subsequent requests.
The following example shows how to obtain a list of past sessions:
```
curl --location --request GET '<api-host>/api/client/sessions' \
  --header 'Authorization: Bearer <access_token>'
```
Response:
{ "totalPages": "<number>", // Total number of pages available "totalElements": "<number>", // Total number of elements available on all pages "first": "<boolean>", // True if this is the first page "last": "<boolean>", // True if this is the last page "number": "<number>", // Current page number "size": "<number>", // Number of elements on current page "empty": "<boolean>", // True if this page is empty // ...Some page properties omitted for readability "content": [ // List of sessions { "id": "<number>", // Session identifier "name": "<string>", // Name of the session "status" : "ERROR | INITIALIZING | IN_PROGRESS | FINISHING | FINISHED", "numRecordings": "<number>", // Number of recordings created under the same session "recordedSeconds": "<number>", // Total recorded seconds "processedSeconds": "<number>", // Total seconds recorded (transcribed) "createdAt": "<string>", // Creation date and time "updatedAt": "<string>", // Date and time of last update (eg. new recording) "createdBy": { // Username under which the session was created "id": "<number>", "username": "<string>" }, "isSaved": "<boolean>", // Not relevant for API users - always false "isDiscarded": "<boolean>", // Not relevant for normal users - always false "allRecordingsDiscarded": "<boolean>", // Not relevant for normal users - always false "notes": "<string>", // Session notes "labels": [ // List of labels assigned to session { "isEnabled": "<boolean>", "label": { "id": "<number>", "code": "<string>", "color": "<string>", "isDefault": "<boolean>", "isAssigned": "<boolean>" } } ] } ]}
The endpoint implements a pageable interface, which means that the response contains only a single page of results rather than the entire list. By default, the page size is set to 20 elements. A specific page size and page number can be requested by setting the page and size request parameters. Results can also be sorted using the sort request parameter with one of the following properties:
- id
- name
- createdAt
- updatedAt
- numRecordings
- recordedSeconds
Sessions can also be filtered by providing any of the following optional request parameters:
- name: Return only sessions with names containing the given value,
- label: Return only sessions labeled with the given label,
- created-after: Return only sessions created after the given date and time,
- created-before: Return only sessions created before the given date and time.
The following example shows a request that returns the first 30 sessions containing the word "Test" in their names, sorted by creation time in ascending order:
```
curl --location --request GET '<api-host>/api/client/sessions?page=0&size=30&name=Test&sort=createdAt,asc' \
  --header 'Authorization: Bearer <access_token>'
```
### Specific session details

Request:
```
curl --location --request GET '<api-host>/api/client/sessions/<session-id>' \
  --header 'Authorization: Bearer <access_token>'
```
- session-id: Unique session identifier
Response:
Same as a single entry from the list of sessions described above.
### Audio file

Returns the audio file created within the given session. The request supports HTTP resource regions (range requests), which allows the file to be transferred in smaller pieces. Audio players can therefore stream the file instead of transferring it whole before playing it. Like any other request, this one also supports passing the access_token as a request parameter instead of as a Bearer token inside the Authorization header. This can be useful when requesting the audio file directly from an audio player that does not support modifying request headers.
Request:
```
curl --location --request GET '<api-host>/api/client/sessions/<session-id>/audio.wav' \
  --header 'Authorization: Bearer <access_token>'
```
- session-id: Unique session identifier
### Session transcripts

Returns a list of transcripts generated within the given session.
Request:
```
curl --location --request GET '<api-host>/api/client/sessions/<session-id>/transcripts' \
  --header 'Authorization: Bearer <access_token>'
```
- session-id: Unique session identifier
Response:
```
[
  {
    "id": "<number>",          // Transcript identifier
    "content": "<JSON string>" // JSON-encoded transcript content with the same structure
                               // as transcripts received through the WebSocket API
  },
  ...
]
```
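For example, a sketch that joins a session's transcripts into plain text (remember that each content field needs its own JSON.parse; the helper name is ours):

```js
// Fetch all transcripts of a session and join them into plain text.
async function sessionText(apiHost, accessToken, sessionId) {
  const response = await fetch(
    `https://${apiHost}/api/client/sessions/${sessionId}/transcripts`,
    { headers: { Authorization: `Bearer ${accessToken}` } }
  );
  const transcripts = await response.json();

  return transcripts
    .map((t) => JSON.parse(t.content).map((w) => w.text).join(' '))
    .join(' ');
}
```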
### Session recordings

Returns a list of recordings created within the given session. The request supports a pageable interface; for the list of available request parameters, see the List of sessions section.
Request:
```
curl --location --request GET '<api-host>/api/client/sessions/<session-id>/recordings' \
  --header 'Authorization: Bearer <access_token>'
```
- session-id: Unique session identifier
Response:
{ "totalPages": "<number>", // Total number of pages available "totalElements": "<number>", // Total number of elements available on all pages "first": "<boolean>", // True if this is the first page "last": "<boolean>", // True if this is the last page "number": "<number>", // Current page number "size": "<number>", // Number of elements on current page "empty": "<boolean>", // True if this page is empty "content": [ { "id": "<number", // Recording identifier "duration": "<number>", // Length of the recording in seconds "isDiscarded": "<boolean>", // Should always be true for normal users "transcriptionLanguage": "<string", // Selected settings when recording was created... "transcriptionDomain": "<string", "transcriptionModelVersion": "<string", "transcriptionEndpointingType": "<string", "transcriptionDoInterim": "<boolean>", "transcriptionDoPunctuation": "<boolean>", "transcriptionDoInterimPunctuation": "<boolean>", "transcriptionShowUnks": "<boolean>", "transcriptionDoDictation": "<boolean>", "transcriptionDoNormalisation": "<boolean>", "translationEnabled": "<boolean>" } ] }
### Session sharing

Sessions are by default visible only to the user who created them. The session sharing mechanism allows users to share their sessions with other users belonging to the same group. Several endpoints are available for managing session shares.
#### List session shares

Request:
```
curl --location --request GET '<api-host>/api/client/sessions/<session-id>/shares' \
  --header 'Authorization: Bearer <access_token>'
```
Response:
```
[
  {
    "id": "<number>",        // Share id
    "createdAt": "<string>",
    "sharedWith": {
      "id": "<number>",      // User id
      "username": "<string>"
    }
  },
  ...
]
```
#### Adding session share

The list of users with whom a session can be shared can be obtained with the following request.
Request:
```
curl --location --request GET '<api-host>/api/client/users' \
  --header 'Authorization: Bearer <access_token>'
```
Response:
```
[
  { "id": "<number>", "username": "<string>" },
  { "id": "<number>", "username": "<string>" },
  ...
]
```
A session can then be shared with a selected user using the following request:
Request:
```
curl --location --request POST '<api-host>/api/client/sessions/<session-id>/shares' \
  --header 'Authorization: Bearer <access_token>' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "userId": <user-id> }'
```
#### Delete session share

```
curl --location --request DELETE '<api-host>/api/client/sessions/shares/<share-id>' \
  --header 'Authorization: Bearer <access_token>'
```
## Dictionaries

The speech-to-text models used in the Truebar STT services are based on dictionaries. Dictionaries keep data on the terms used in speech and their pronunciations. Each term is further specified with a relative frequency telling how often the term is expected to occur in speech (this is only an estimate).

Dictionaries are domain-dependent, as the terminology used differs between domains; medical doctors, for instance, use very different terminology in speech than lawyers. For this reason, dictionaries are bound to their models, while the models fit specific domains.

On the other hand, dictionaries are user-independent. This means that when somebody makes a change in the dictionary of a specific model, the change is visible to all users with access to that model.

All models available in Truebar, e.g. MED, LAW, COL, etc., come with prebuilt dictionaries that already include the majority of terms common to their domains. Nevertheless, there will always be terms that are missing. The Truebar API offers a set of operations that can help in this regard: with specific endpoints you can add new terms and their pronunciations, or update existing entries in the dictionary.
Different roles are required to use these features. The DICTIONARY_READER role allows you to read dictionary entries; to make changes, you also need the DICTIONARY_WRITER role. See the User roles section for more.
Below are the main endpoints the Truebar API offers for maintaining dictionary entries.
### Inserting new terms to dictionary

Required role: DICTIONARY_WRITER

This is the request syntax for adding a new term with its pronunciations and frequency class to a dictionary. Make sure that at least one pronunciation is provided for each term; otherwise the request is rejected.
Request:
```
curl --location --request POST '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/words' \
  --header 'Authorization: Bearer <access_token>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "text": "<string>",
    "frequencyClassId": "<number>", // For a list of available frequency classes, see the /frequencyClasses endpoint
    "pronunciations": [
      { "text": "<string>" }
    ]
  }'
```
- language-code: model language
- domain-code: model domain
- model-version: model version
### Inserting and updating pronunciations

Required role: DICTIONARY_WRITER

To add new pronunciations or update existing ones for dictionary terms, use the same endpoint as for inserting new terms. Existing pronunciations are replaced with those specified in the request.
### Obtaining frequency classes

Required role: DICTIONARY_READER

To specify a frequency class when inserting a new term into the dictionary, first obtain the list of available frequency classes from this endpoint:
Request:
```
curl --location --request GET '<api-host>/api/client/dictionary/frequencyClasses' \
  --header 'Authorization: Bearer <access_token>'
```
Response:
```
[
  {
    "id": "<number>",
    "label": "<string>"
  },
  ...
]
```
### Obtaining phonemes

Required role: DICTIONARY_READER

Pronunciations of terms must be specified with symbols that describe phonemes. Phonemes are small units of sound that distinguish words from one another; they are usually written with letters. There are two groups of phonemes, silent and non-silent, and the list of available phonemes is defined within the model. When providing new pronunciations or updating existing ones, only the allowed phonemes may be used. To obtain the list of phonemes, use the following endpoint:
Request:
```
curl --location --request GET '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/phone-set' \
  --header 'Authorization: Bearer <access_token>'
```
- language-code: model language
- domain-code: model domain
- model-version: model version
Response:
{ "nonsilentPhones": "<string>", "silentPhones": "<string>"}
### Removing terms from dictionary

Required role: DICTIONARY_WRITER

To delete a specific term from a dictionary, use the following endpoint:
Request:
```
curl --location --request DELETE '<api-host>/api/client/dictionary/words/<word-id>' \
  --header 'Authorization: Bearer <access_token>'
```
- word-id: Unique identifier of the term
### Searching in dictionary

Required role: DICTIONARY_READER

The request below demonstrates how to search for a specific term in a dictionary:
Request:
```
curl --location --request GET '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/words?text=<text>' \
  --header 'Authorization: Bearer <access_token>'
```
- language-code: model language
- domain-code: model domain
- model-version: model version
- text: text of the term to search for
Response:
{ "id": "<number>", "frequencyClassId": "<number>", "status": "NEW | IN_PROGRESS | IN_DICTIONARY", "text": "<string>", "pronunciations": [ { "saved": "<boolean>", "text": "<test>" } ]}
This endpoint allows users to search the dictionary for a given term. The expected response status is always 200, even if the term is not found in the dictionary. The status property in the response tells you the status of the term, which can be one of the following values:

- NEW: the term is new, i.e. it is not yet in the dictionary;
- IN_PROGRESS: the term has been added to the dictionary and will be included in the next version of the model;
- IN_DICTIONARY: the term is in the dictionary and is included in the current version of the model.
The other properties are included in the response based on the status:

- id: unique word identifier (if the word is found in the dictionary);
- frequencyClassId: identifier of the frequency class assigned to the term (if the term is found in the dictionary);
- text: label of the term, i.e. the way the term is written;
- pronunciations: the list of pronunciations for the term. Each entry comes as a nested object with a saved property that tells whether the pronunciation was assigned to the term manually or generated automatically. For new terms, all returned pronunciations are automatically generated; in other cases, the response can contain both manually added and automatically generated pronunciations.
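A sketch combining the search and insert endpoints described above: look a term up and, only if its status is NEW, add it with a manually provided pronunciation (the helper name and parameter handling are ours):

```js
// Look a term up; if it is not yet in the dictionary, insert it with
// one pronunciation and the given frequency class.
async function ensureTerm(apiHost, accessToken, lang, domain, version, term, pron, freqClassId) {
  const base = `https://${apiHost}/api/client/dictionary/languages/${lang}/domains/${domain}/dmodels/${version}`;
  const headers = { Authorization: `Bearer ${accessToken}` };

  const found = await (
    await fetch(`${base}/words?text=${encodeURIComponent(term)}`, { headers })
  ).json();
  if (found.status !== 'NEW') return found; // already added or in the dictionary

  await fetch(`${base}/words`, {
    method: 'POST',
    headers: { ...headers, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: term,
      frequencyClassId: freqClassId,
      pronunciations: [{ text: pron }],
    }),
  });
}
```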
## Rebuilding models

Changes in the dictionary have no effect on speech recognition until the model (the language part of the model) is rebuilt, i.e. updated to a new version. This is a time-consuming operation that can take an hour or more. During the update process, users can continue to work with the existing version of the model. Once a new version is created, it becomes available through a configuration update (see Configuration). To use the newly created version, close existing sessions, update the configuration to select the new model version, and start a new session.

Specific roles are required for rebuilding models. To trigger the rebuild process, you need the MODEL_UPDATER_WRITER role. To check the status of a rebuild in progress, you also need the MODEL_UPDATER_READER role. See the User roles section for more.
The request below shows how to trigger the rebuild process:
Request:
```
curl --location --request POST '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/regeneration/regenerate' \
  --header 'Authorization: Bearer <access_token>'
```
- language-code: model language
- domain-code: model domain
- model-version: model version
The operation is asynchronous. You can check the update status via the /regeneration/status endpoint, as in the following example:
Request:
```
curl --location --request GET '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/regeneration/status' \
  --header 'Authorization: Bearer <access_token>'
```
- language-code: model language
- domain-code: model domain
- model-version: model version
Response:
{ "statusCode": "NOT_RUNNING" | "RUNNING" | "FAILED", "enableModelUpdating": "<boolean>", // True if the user is granted the right to update models}
## Advanced features

Apart from the basic features exposed through the Truebar API, there are also other, more advanced features that can be used to achieve the highest possible recognition accuracy and the best possible user experience. These features include services for working with substitutes, replacements, and prefixed tokens.
### Replacements

Replacements let you define user-specific labels for dictionary terms. When the Truebar STT service recognizes a term that has a user-defined replacement, it makes the corresponding change in the response. Replacements take effect immediately, i.e. without the need to rebuild the model.

The best way to fine-tune the model is to make sure its dictionary comprises all the terms used in speech. Specific attention should be paid to abbreviations, acronyms, and measurement units. Make sure that all required terms are included in the dictionary and have appropriate labels and pronunciations.
Sometimes, however, we want to make changes that are user-specific. For instance, one user wants the measurement unit for meters per second to be written as m/sec, while other users prefer m/s as the label. Such user-dependent changes to dictionary entries are handled through replacements, which are kept separately from the dictionary and thus have no effect on other users.
To retrieve the list of active replacements, you can use the following endpoint:
Request:

```
curl --location --request GET '<api-host>/api/client/replacements' \
  --header 'Authorization: Bearer <access_token>'
```
Response:
```
[
  {
    "id": "<number>",
    "source": [ // List of source words
      { "spaceBefore": "<boolean>", "text": "<string>" },
      { "spaceBefore": "<boolean>", "text": "<string>" },
      ...
    ],
    "target": { // Target word that will replace the source words
      "spaceBefore": "<boolean>",
      "text": "<string>"
    }
  },
  ...
]
```
To get information on a specific replacement, use this endpoint:
Request:
```
curl --location --request GET '<api-host>/api/client/replacements/<replacement-id>' \
  --header 'Authorization: Bearer <access_token>'
```
- replacement-id: replacement identifier, as returned when listing replacements
Response:
{ "id": "<number>", "source": [ // List of source words { "spaceBefore": "<boolean>", "text": "<string>" }, { "spaceBefore": "<boolean>", "text": "<string>" }, ... ], "target": { // Target word that will replace all source words "spaceBefore": "<boolean>", "text": "<string>" },}
To add a new replacement, you can call:

```
curl --location --request POST '<api-host>/api/client/replacements' \
  --header 'Authorization: Bearer <access_token>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "source": [
      { "spaceBefore": "<boolean>", "text": "<string>" },
      { "spaceBefore": "<boolean>", "text": "<string>" },
      ...
    ],
    "target": {
      "spaceBefore": "<boolean>",
      "text": "<string>"
    }
  }'
```
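As a concrete illustration of the m/s example from above, written with fetch (the exact source tokenization is an assumption; apiHost and accessToken are placeholders):

```js
// Create a replacement so that the recognized "m/s" is written as "m/sec".
await fetch(`https://${apiHost}/api/client/replacements`, {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${accessToken}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    source: [{ spaceBefore: true, text: 'm/s' }],
    target: { spaceBefore: true, text: 'm/sec' },
  }),
});
```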
Finally, to delete a specific replacement, call:
```
curl --location --request DELETE '<api-host>/api/client/replacements/<replacement-id>' \
  --header 'Authorization: Bearer <access_token>'
```
- replacement-id: replacement identifier, as returned when listing replacements
### Substitutes

The substituter is a service that returns a list of substitutes for a given term. Substitutes of a term are other dictionary terms that are written or pronounced similarly to the given term. You can use the substituter to provide the user with a list of substitutes.
Required role: SUBSTITUTOR_READER
Request:
```
curl --location --request GET '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/word_substitutes?text=<word_string>' \
  --header 'Authorization: Bearer <access_token>'
```
- language-code: model language
- domain-code: model domain
- model-version: model version
- word_string: the term to find substitutes for
Response:
{ "1": "<string>", "2": "<string>", "3": "<string>" ...}
### Prefixed tokens

The Truebar API exposes a service that returns dictionary terms with a specific prefix. This can be useful when spelling terms letter by letter, to provide the user with a list of terms that match the spelled letters.
Required role: SUBSTITUTOR_READER
```
curl --location --request GET '<api-host>/api/client/dictionary/languages/<language-code>/domains/<domain-code>/dmodels/<model-version>/prefixed_words?prefix=<prefix_string>' \
  --header 'Authorization: Bearer <access_token>'
```
- language-code: model language
- domain-code: model domain
- model-version: model version
- prefix_string: prefix for which to retrieve terms
Response:
{ "content": [ "<string>", "<string>", "<string>", ... ]}
## Automatic speaker detection (experimental)

This function can be enabled by setting the transcriptionDoDiarization option in the user's configuration. It can only be enabled when the selected model supports automatic speaker detection (the diarizationSupport field in the model information returned along with the user's configuration). When the option is enabled, the words inside each received transcript contain a field named speakerCode. The field holds a label (speaker0, speaker1, etc.) which is automatically assigned to the detected speaker.
## Response statuses and errors

### HTTP API

This section describes the expected HTTP response statuses for the endpoints described in this document.
Response statuses for the various HTTP methods when the operation is successful:

- GET: response status 200 OK, along with a response body;
- POST: response status 201 CREATED, with the automatically generated unique resource identifier in the response body;
- PATCH: response status 204 NO_CONTENT, without a response body;
- DELETE: response status 204 NO_CONTENT, without a response body.
Response statuses for errors:

- 400 BAD_REQUEST: the request could not be processed due to something that is perceived to be a client error;
- 401 UNAUTHORIZED: the request has not been applied because it lacks valid authentication credentials (JWT invalid or not present);
- 403 FORBIDDEN: the server understood the request but refuses to authorize it (insufficient permissions);
- 404 NOT_FOUND: the requested resource was not found on the server;
- 405 METHOD_NOT_ALLOWED: the resource does not support the given request method (e.g. a POST request on a URL that only supports GET);
- 409 CONFLICT: the resource being added or modified already exists on the server;
- 415 UNSUPPORTED_MEDIA_TYPE: the uploaded media file format is not supported;
- 500 SERVER_ERROR: the request could not be processed because of an internal server error.
Unless otherwise noted, any request described in this document returns one of the response statuses described above. If there was an error processing the request, the response body has the following format:

```
{
  "id": "<string>",
  "timestamp": "<string>",
  "message": "<string>"
}
```
The returned object contains a message with a short error description. In general, the message together with the HTTP response status is sufficient for the client to know what went wrong. That is not the case, however, when the error triggers response status 500, because the response body then contains only a generic error message. If this happens, please contact our support with the returned error id.
### Websocket API

The WebSocket standard already defines codes that are returned to the client when a WebSocket connection is closed. The Truebar service implements the following additional close codes:
- 1001 GOING_AWAY
- 1011 SERVER_ERROR
- 4001 LANGUAGE_NOT_AVAILABLE
- 4002 DOMAIN_NOT_AVAILABLE
- 4003 MODEL_NOT_AVAILABLE
- 4004 MODEL_NOT_ALLOWED
- 4005 WORKERS_NOT_AVAILABLE
- 4500 NORMAL
## Libraries and examples

This chapter lists the available libraries that can be used to integrate with the Truebar service. Currently we provide libraries for the Java and JavaScript programming languages.
### Java library

The Java library is written in Java 11. It uses the built-in HttpClient to minimize the number of external dependencies. It is packaged as a single .jar file that can be imported into the target project.