
Truebar API v3.0


Introduction#

Welcome to the Truebar service guide! This document covers the fundamental features of the Truebar service.

The Truebar service exposes an HTTP API, offering a straightforward method for accessing a range of features. Primarily focused on speech processing, it facilitates various operations including transcription, punctuation, normalization, synthesis, and more.

For an initial test of your user account and associated privileges, you can use the Truebar dictation and transcript editor. Please make sure to read the QuickStart Guide (Eng, Slo).

The subsequent chapters of this guide serve as a step-by-step tutorial for leveraging basic API functionalities. For those seeking more advanced insights, our Swagger documentation provides more detailed information.

Accessing the service#

All examples in this guide presume that the service is accessible at 'true-bar.si'. This implies the following URLs for the various components:

  • Truebar editor: true-bar.si
  • Authentication service: auth.true-bar.si
  • Truebar API: api.true-bar.si

The Truebar editor is always accessible at the base domain, while the Truebar API and the authentication service are available at their respective subdomains: 'api' and 'auth'.

The Truebar API provides an endpoint to check its status:

curl --location --request GET 'https://true-bar.si/api/info'

The response contains basic API information such as version, build date, and the path to OpenAPI specifications.

For users interested in testing WebSocket connections, there is also a WebSocket endpoint available at 'https://true-bar.si/api/ws-info'. This endpoint operates by accepting incoming WebSocket connections, sending a single text message with content identical to the response above, and then closing the connection.

Authentication#

All requests sent to the Truebar service must contain a valid access token, otherwise they are denied with an HTTP 401 UNAUTHORIZED status code. Access tokens can be acquired by sending the following HTTP request to the authentication service:

curl --location --request POST 'https://auth.true-bar.si/auth/realms/truebar/protocol/openid-connect/token' \
    --header 'Content-Type: application/x-www-form-urlencoded' \
    --data-urlencode 'grant_type=password' \
    --data-urlencode 'username={username}' \
    --data-urlencode 'password={password}' \
    --data-urlencode 'client_id=truebar-client'

A response is received in JSON format with the following structure:

{    "access_token": "<string>",    "expires_in": "<number>",    "refresh_expires_in": "<number>",    "refresh_token": "<number>",    "token_type": "<string>",    "not-before-policy": "<number>",    "session_state": "<string>",    "scope": "<string>"}

Each request sent to the Truebar service must be authenticated with an access_token. The token can be given as a Bearer token inside the HTTP Authorization request header or as a URL parameter. The first option is recommended, to prevent the access_token from ending up in server logs or browser history. The following code demonstrates how to send a request to the backend and pass the token via an Authorization header:

curl --location --request GET 'https://api.true-bar.si/...' \
    --header 'Authorization: Bearer {access_token}'

Despite the above-mentioned reasons to pass the access_token via the Authorization header, it sometimes becomes necessary to use the URL parameter option instead. One such example is an application that does not allow modification of request headers (e.g. an audio player). The following code demonstrates how to send a request to the backend and pass the token via a URL parameter:

curl --location --request GET 'https://api.true-bar.si/...?access_token={access_token}'

The access_token is valid for expires_in seconds. After that time it expires and must be refreshed using the provided refresh_token:

curl --location --request POST 'https://auth.true-bar.si/auth/realms/truebar/protocol/openid-connect/token' \
    --header 'Content-Type: application/x-www-form-urlencoded' \
    --data-urlencode 'grant_type=refresh_token' \
    --data-urlencode 'refresh_token={refresh_token}' \
    --data-urlencode 'client_id=truebar-client'

The refresh_token is valid for refresh_expires_in seconds. When this period elapses, the refresh token becomes invalid and the only way to acquire a new token is by using user credentials (i.e. username and password).
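For illustration, here is a minimal JavaScript sketch of this token lifecycle, using the endpoint and response fields documented above (error handling omitted; the 30-second refresh margin is an arbitrary choice):

// Minimal sketch of acquiring and refreshing an access token.
const AUTH_URL = 'https://auth.true-bar.si/auth/realms/truebar/protocol/openid-connect/token';

async function requestToken(params) {
  const response = await fetch(AUTH_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({ client_id: 'truebar-client', ...params })
  });
  if (!response.ok) throw new Error(`Token request failed: ${response.status}`);
  return response.json();
}

// Initial acquisition with user credentials.
let tokens = await requestToken({ grant_type: 'password', username: 'user', password: 'pass' });

// Refresh shortly before the access token expires.
setTimeout(async () => {
  tokens = await requestToken({ grant_type: 'refresh_token', refresh_token: tokens.refresh_token });
}, (tokens.expires_in - 30) * 1000);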

User roles#

To use the various services provided by the Truebar API, multiple different permissions are available. They are represented by roles (authorities) encoded within the JWT token. If the supplied JWT token does not contain the permissions required by the operation being executed, the request is denied with a 403 FORBIDDEN response code.

The following permissions are currently available:

  • API - Basic authority that is required to use any operation provided by the Truebar API.

  • PIPELINE_ONLINE_API - Permission to use the WebSocket pipeline API.

  • PIPELINE_OFFLINE_API - Permission to use the file upload & JSON HTTP pipeline API.

  • STAGE_ASR - Speech-to-text stage.

  • STAGE_NLP_PC - Punctuation stage.

  • STAGE_NLP_TC - Truecasing stage.

  • STAGE_NLP_TN - Text normalization stage.

  • STAGE_NLP_ITN - Inverse text normalization (denormalization) stage.

  • STAGE_NLP_G2A - Accentuation stage.

  • STAGE_NLP_NMT - Machine translation stage.

  • STAGE_NLP_AC - Autocorrect stage.

  • STAGE_TTS - Automatic speech synthesis.

  • DICTIONARY_READ - Dictionary read operations.

  • DICTIONARY_WRITE - Dictionary write operations.

  • MODEL_UPDATER_READ - ModelUpdater read operations.

  • MODEL_UPDATER_WRITE - ModelUpdater write operations.

  • SESSION_READ_DISCARDED - Read access to discarded sessions.

  • SESSION_WRITE_DISCARDED - Write access to discarded sessions.

  • SESSION_HARD_DELETE - Permission to hard delete sessions and connected recordings/session-contents.

  • GROUP_READ - Read access to resources belonging to the user's group.

  • GROUP_WRITE - Write access to resources belonging to the user's group.

  • ALL_READ - Read access to all resources.

  • ALL_WRITE - Write access to all resources.

To check which roles are associated with your user account, a JWT parser is needed. Use any of the free libraries available for this purpose, or try the following example:

let payload = JSON.parse(atob(token.split('.')[1]));
/*
payload = {
  "realm_access": {
    "roles": ["LIVE_STREAMER", "DICTIONARY_READER", ...]
  }
  // ...other properties omitted for readability
}
*/

Pipelines#

Introduction#

Pipelines are the core concept of the Truebar API. A pipeline consists of a series of stages that are executed one after another. A single stage represents a single NLP operation. It accepts input and provides output data that may be in the form of text or audio, depending on the operation it performs. We can therefore divide stages into three distinct groups:

  • speech to text - accepts audio data as input and returns text as output. An example of such a stage is ASR - automatic speech recognition;
  • text to text - accepts text as input and returns text as output. Most operations fall into this category, for example punctuation, text normalization, etc.;
  • text to speech - accepts text as input and returns audio data as output. An example of such a stage is speech synthesis.

A pipeline is built simply by taking one or more (compatible) stages and chaining them together. Two stages can be chained if their outputs and inputs are compatible; the output of a stage must be of the same type as the input of the next stage. Data always flows from the first stage to the last, which means that a pipeline cannot contain loops, but it does not mean that a single pipeline cannot contain multiple stages performing the same task.

Stages#

A stage is defined by the task it performs. The following is a list of stages implemented by the Truebar service:

  • STT
    • asr - Automatic Speech Recognition
  • TTT (NLP)
    • NLP_pc - Automatic Punctuation
    • NLP_tc - Automatic Truecasing
    • NLP_tn - Text Normalization
    • NLP_itn - Inverse Text Normalization
    • NLP_ac - Auto Correct
    • NLP_g2a - Accentuator
    • NLP_nmt - Neural Machine Translation
  • TTS
    • tts - Text To Speech

Not all of the above stages may actually be available to all user accounts. In addition, each stage can also contain a set of configuration options that depend on the selected model, user permissions, etc. Before trying to build a pipeline with any of the above stages, it is recommended to request stage information from the Truebar service with the following API call:

curl --location --request POST 'https://api.true-bar.si/api/pipelines/stages' \
    --header 'Authorization: Bearer <access_token>'

The response is returned as a JSON-encoded list of available stages:

[
    {
        "task": "<string>",
        "type": "<string>",
        "config": [
            {
                "tag": "<string>",
                "features": [
                    "<string>",
                    ...
                ],
                "parameters": {
                    "parameterName": {
                        "type": "<string>",
                        "values": ["<object>", ...],
                        "defaultValue": "<object>"
                    },
                    ...
                }
            }
        ]
    }
]

Each stage has the following properties:

  • task defines the task performed by the stage. It can be any value listed at the beginning of this section;
  • type defines the type of operation the stage performs, which in turn defines the input and output data types. Possible values are: STT, TTT, TTS;
  • config contains a list of available configuration options.
    • tag is used as a "label" representing a specific model. It consists of four separate fields: service-provider:language-code:domain-code:version. Each part can either be specified by an alphanumeric code or the * character representing a generic part. For example, the tag KALDI:sl-SI:*:* represents a model for the Slovenian language with unspecified (generic) domain and version.
    • features are model-specific features that cannot be modified.
    • parameters is a dictionary of parameters supported by the model. Each parameter has the following properties:
      • name is the parameter name used when specifying the parameter,
      • type represents the parameter data type,
      • values is a list of valid values. It is present for parameters with a predefined set of available options.
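For example, a short JavaScript sketch (assuming an accessToken variable obtained as described in the Authentication section) that queries this endpoint and picks an ASR model advertising the onlineAsr feature might look like this:

// Sketch: list available stages and pick an ASR model tag that
// advertises the onlineAsr feature.
const response = await fetch('https://api.true-bar.si/api/pipelines/stages', {
  method: 'POST', // method as shown in the curl example above
  headers: { Authorization: `Bearer ${accessToken}` }
});
const stages = await response.json();

const asrStage = stages.find(stage => stage.task === 'asr');
const onlineConfig = asrStage.config.find(cfg => cfg.features.includes('onlineAsr'));
console.log('Selected ASR model tag:', onlineConfig.tag);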

ASR (Automatic Speech Recognition)#

  • Input: Audio data
  • Output: Text segments (transcripts)
  • Features:
    • onlineAsr - the model supports online transcription and is therefore more appropriate for use with the streaming API,
    • offlineAsr - the model supports offline transcription and is therefore more appropriate for use with the offline (REST, file upload) API,
    • dictatedCommands - the model recognizes dictated commands and returns special tokens representing the spoken command,
    • dictatedPunctuations - the model recognizes dictated punctuation and returns special tokens representing the spoken punctuation mark.
  • Parameters:
    • enableUnks - if enabled, the model returns the <unk> symbol for tokens that could not be recognized,
    • enableInterims - if enabled, the model returns interim (partial) responses,
    • enableDiarization - if enabled, the model returns information about the detected speaker.

PC (Automatic Punctuation)#

  • Input: Text segments
  • Output: Text segments
  • Features: /
  • Parameters:
    • enableSplitToSentences - if enabled, the resulting text segments will be split into sentences. If a text segment does not end with one of the final punctuation marks (.?!), it is not possible to make sentences from the whole text segment. In this case the last part of the segment (from the last final punctuation mark to the end) is returned as an interim text segment. If the stage is used inside a streaming session, the last interim part is also cached and prepended to the next received text segment.

TC (Automatic Truecasing)#

  • Input: Text segments
  • Output: Text segments
  • Features: /
  • Parameters: /

TN (Text Normalization)#

  • Input: Text segments
  • Output: Text segments
  • Features: /
  • Parameters: /

ITN (Inverse Text Normalization)#

  • Input: Text segments
  • Output: Text segments
  • Features: /
  • Parameters: /

AC (Auto Correct)#

  • Input: Text segments
  • Output: Text segments
  • Features: /
  • Parameters: /

G2A (Accentuator)#

  • Input: Text segments
  • Output: Text segments
  • Features: /
  • Parameters: /

NMT (Neural Machine Translation)#

  • Input: Text segments
  • Output: Text segments
  • Features: /
  • Parameters: /

Pipeline examples#

A simple pipeline for basic audio transcription, using only an asr stage configured with model KALDI:sl-SI:COL:20221208-0800 and the enableInterims option enabled:

[
    {
        "task": "asr",
        "config": {
            "tag": "KALDI:sl-SI:COL:20221208-0800",
            "parameters": {
                "enableInterims": true
            }
        }
    }
]

A more complex pipeline that first uses an asr stage to transcribe the input audio data, followed by pc and itn stages to automatically insert punctuation and denormalize the result:

[
    {
        "task": "asr",
        "config": {
            "tag": "KALDI:sl-SI:COL:20221208-0800",
            "parameters": {
                "enableInterims": true
            }
        }
    },
    {
        "task": "pc",
        "config": {
            "tag": "NEMO_PUNCTUATOR:sl-SI:*:*",
            "parameters": {
                "enableSplitToSentences": true
            }
        }
    },
    {
        "task": "itn",
        "config": {
            "tag": "DEFAULT_DENORMALIZER:sl-SI:*:*"
        }
    }
]

An example of an invalid pipeline, due to a mismatch between the first stage's output type and the second stage's input type:

[
    {
        "task": "pc",
        "config": {
            "tag": "NEMO_PUNCTUATOR:sl-SI:*:*"
        }
    },
    {
        "task": "asr",
        "config": {
            "tag": "KALDI:sl-SI:COL:20221208-0800"
        }
    }
]

Pipeline execution#

Pipelines can be executed with either the offline (HTTP) or the online (WebSocket) API.

Offline API#

The Offline API is implemented as a REST service, providing straightforward methods for pipeline invocation. As previously mentioned, each pipeline accepts a specific type of input data, defined by its initial stage, and produces an output data type defined by its final stage. Consequently, the API endpoint is capable of accepting and generating arbitrary data types, as defined by the associated pipeline. Because pipelines are not predefined, the endpoint also requires the pipeline definition to be included along with the input data.

The endpoint for pipeline invocation is accessible at https://api.true-bar.si/api/pipelines/process. It accepts a multipart-formatted body, allowing the inclusion of both JSON (application/json) and binary (application/octet-stream) formatted body parts. JSON-formatted parts are used for passing the pipeline definition, and text-segments in case the pipeline requires text-based input, while binary body parts are used for passing audio data when the pipeline requires audio-based input. The response can be either in JSON or binary format, depending on the pipeline output.

Data types and formats#

Text data is always given as a JSON-encoded list of text-segments. This holds true for both data sent to and received from the server. Each text segment contains a list of tokens. Below is an example of such a structure with both required and optional properties.

[
    {
        "isFinal": "true | false",
        "startTimeMs": "<number>",
        "endTimeMs": "<number>",
        "tokens": [
            {
                "startOffsetMs": "<number>",
                "endOffsetMs": "<number>",
                "isLeftHanded": "true | false",
                "isRightHanded": "true | false",
                "text": "<string>"
            }
        ]
    },
    ...
]
  • isFinal - If this flag is set to true, then the text-segment represents a final hypothesis. For now, this is only relevant for live STT and is therefore further described in the next section.
  • startTimeMs - The beginning timestamp of the text-segment in milliseconds, relative to the start of the session. For now, this is only relevant for STT, and enables alignment of text and audio.
  • endTimeMs - The ending timestamp of the text-segment in milliseconds, relative to the start of the session. For now, this is only relevant for STT, and enables alignment of text and audio.
  • tokens - A list of tokens contained in the text-segment.
    • startOffsetMs - An offset in milliseconds from the text-segment's startTimeMs property, representing the beginning timestamp of the token.
    • endOffsetMs - An offset in milliseconds from the text-segment's startTimeMs property, representing the ending timestamp of the token.
    • isLeftHanded - True if the token is left handed, false otherwise.
    • isRightHanded - True if the token is right handed, false otherwise.
    • text - The actual text content of the token.

The only required properties from the above list are the 'tokens' property of the 'text-segment' and the 'text' property of each token. All other properties are optional and depend on the actual operation being executed.
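For example, a minimal valid input for a text-to-text pipeline is a single segment whose tokens carry only text (this is exactly the data part used in the curl example further below):

[
    {
        "tokens": [
            { "text": "Danes" },
            { "text": "je" },
            { "text": "lep" },
            { "text": "dan" }
        ]
    }
]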

Audio data, in contrast, can come in various formats and be encoded using any codec, provided it can be decoded by the service. We support a wide array of popular audio and video formats; for a comprehensive list, please reach out to our support team. Note that containers with multiple audio tracks may pose challenges, as the service cannot determine which audio track to decode. As a rule of thumb, to optimize compatibility and ensure high-quality results, we suggest using lossless audio formats such as WAV or FLAC.

Examples#

The following is an example of a processing pipeline with a single punctuation stage, which means that the endpoint expects and produces text-based data.

Curl command:

curl --location 'http://true-bar.si/api/pipelines/process' \
    --header 'Content-Type: multipart/form-data' \
    --header 'Authorization: Bearer <access_token>' \
    --form 'pipeline="[{\"task\":\"NLP_pc\",\"config\":{\"tag\":\"NEMO_PC:sl-SI:*:*\", \"parameters\":{\"enableSplitToSentences\" : \"false\"}}}]";type=application/json' \
    --form 'data="[{\"tokens\":[{\"text\":\"Danes\"},{\"text\":\"je\"},{\"text\":\"lep\"},{\"text\":\"dan\"}]}]";type=application/json'

The above command produces the following HTTP request:

POST /api/pipelines/process HTTP/1.1
Host: true-bar.si
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW
Authorization: Bearer <access_token>
Content-Length: 462

------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="pipeline"
Content-Type: application/json

[{"task":"NLP_pc","config":{"tag":"NEMO_PC:sl-SI:*:*", "parameters":{"enableSplitToSentences" : "false"}}}]
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="data"
Content-Type: application/json

[{"tokens":[{"text":"Danes"},{"text":"je"},{"text":"lep"},{"text":"dan"}]}]
------WebKitFormBoundary7MA4YWxkTrZu0gW--

The response for the above request is returned as a JSON-formatted list of text segments:

[
    {
        "isFinal": true,
        "tokens": [
            {
                "isLeftHanded": false,
                "isRightHanded": false,
                "text": "Danes"
            },
            {
                "isLeftHanded": false,
                "isRightHanded": false,
                "text": "je"
            },
            {
                "isLeftHanded": false,
                "isRightHanded": false,
                "text": "lep"
            },
            {
                "isLeftHanded": false,
                "isRightHanded": false,
                "text": "dan"
            },
            {
                "isLeftHanded": true,
                "isRightHanded": false,
                "text": "."
            }
        ]
    }
]

Execution of speech-to-text pipelines works the same way, only the input data format differs. Let's look at the following example of speech recognition.

curl --location 'http://true-bar.si/api/pipelines/process' \
    --header 'Content-Type: multipart/form-data' \
    --header 'Authorization: Bearer <access_token>' \
    --form 'pipeline="[{\"task\":\"ASR\",\"config\":{\"tag\":\"KALDI:sl-SI:COL:20211214-1431\", \"parameters\":{\"enableUnks\":false, \"enableDiarization\" : false, \"enableInterims\": true}}}]";type=application/json' \
    --form 'data=@"/path/to/audio.wav"'

The above command results in the following HTTP request:

POST /api/pipelines/process HTTP/1.1
Host: true-bar.si
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW
Authorization: Bearer <access_token>
Content-Length: 457

------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="pipeline"
Content-Type: application/json

[{"task":"ASR","config":{"tag":"KALDI:sl-SI:COL:20211214-1431", "parameters":{"enableUnks":false, "enableDiarization" : false, "enableInterims": true}}}]
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="data"; filename="1second.wav"
Content-Type: application/octet-stream

<audio_data>
------WebKitFormBoundary7MA4YWxkTrZu0gW--

The above examples cover all possible types of input, while the output was in both cases a list of text-segments. There are two more cases that produce audio data at the output. We won't give full examples here, as the request looks the same as in the above examples. The only difference is that the result the endpoint returns is not a list of text segments but binary data representing WAV-encoded audio.

Online API#

The online API is currently only available for STT pipelines (i.e. pipelines that accept audio data as input and return text segments).

Protocol description#

The Truebar service provides real-time transcription of audio data. This is achieved by sending small pieces of audio data (chunks) through a WebSocket connection. Given that WebSocket is an asynchronous bidirectional protocol, transcripts can be returned as soon as they are available.

The following example shows how to establish a WebSocket connection from JavaScript:

let ws = new WebSocket("wss://api.true-bar.si/ws?access_token={access_token}")

The 'access_token' URL parameter in the above example is optional, which means it can be omitted if the 'access_token' is provided by some other means. For example, if the WebSocket client supports modifying HTTP headers when performing the upgrade request, the access_token can be sent as a bearer token via the Authorization header.
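For example, in Node.js the third-party 'ws' package (an assumption; the built-in browser WebSocket API does not allow custom headers) accepts extra headers when opening a connection:

// Sketch: pass the token in the Authorization header instead of the URL.
import WebSocket from 'ws';

const ws = new WebSocket('wss://api.true-bar.si/ws', {
  headers: { Authorization: `Bearer ${accessToken}` } // accessToken obtained as described in the Authentication section
});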

If the connection attempt is denied or an error happens before the connection is actually upgraded to WebSocket, the response will be returned as an HTTP response with an appropriate error code and description. In case an error happens after establishing the WebSocket connection, it will be reported as a WebSocket message.

There are two basic types of WebSocket messages: binary messages, which are used for transferring audio data from the client to the server, and text messages, which are used for transferring JSON-formatted objects in both directions. A JSON message always has the following structure:

{    "messageType": "CONFIG | EOS | STATUS | TEXT_SEGMENT"}

Different message types have different sub-structures:

CLIENT -> SERVER messages

  • CONFIG - The message contains the pipeline definition used to configure the session. If the optional sessionId parameter is given, the server will search for a session with the given id and resume it, if it exists. Otherwise, if the parameter is not given, a new session is created on the fly. Each created session can be resumed multiple times. Resuming an existing session creates a new recording entity on the server, but it does not create a new session entity. All recordings created as part of an existing or new session can later be accessed individually or as part of the session to which they belong.

    {
        "messageType": "CONFIG",
        "sessionId": "<number>",
        "pipeline": "<List of stages representing pipeline definition as described in previous chapter.>"
    }
  • EOS - The End-Of-Stream message is used to request closing a session. It must be the last message sent to the server. The optional 'lockSession' attribute requests that the server lock the session after it is finished. The key to unlock it is returned as part of the FINISHED status update from the server.

    {
        "messageType": "EOS",
        "lockSession": true | false
    }

SERVER -> CLIENT messages

  • STATUS - The message is sent by the server on every session state transition. There are several variants of this message, depending on the specific state transition.
    • INITIALIZED - message indicates that the session has been successfully initialized. The client must wait until this message arrives before attempting to send any message through the opened WebSocket connection.
      {    "messageType": "STATUS",    "status": "INITIALIZED"}
    • CONFIGURED - message indicates that the session has started. It is received after the server successfully processes the CONFIG message and reserves all necessary resources.
         {    "messageType": "STATUS",    "status": "CONFIGURED",    "sessionId" : "<number>",      // Id of created session    "recordingId" : "<number>",    // Id of created recording   }
    • FINISHED - message indicates that the session has been successfully finished. At this point all results have been returned to the client, so it is safe to close the WebSocket connection.
          {        "messageType": "STATUS",        "status": "FINISHED"        "lengthMs" "<number>"           // Length of created recording in milliseconds        "sessionLockKey" : "<string>"   // The key to unlock a session if it was requested to be locked in EOS message.      }
  • TEXT_SEGMENT - The message represents a result in the form of a text segment.
    {        "messageType": "TEXT_SEGMENT",        "textSegment": {        // ... same structure as described in previous chapter        }    }

Typical connection flow can be represented by the following diagram:

[Diagram: typical connection flow]
Establishing and configuring the session#

When the WebSocket connection is established, the client must first await a status message with status INITIALIZED, which indicates that the server is ready to process additional messages. After receiving this message, the client must configure the session with a CONFIG message and await the CONFIGURED status response. Awaiting the CONFIGURED status response can also be omitted, but it should be noted that there is no guarantee that the configuration will succeed (e.g. invalid pipeline, busy workers, etc.). It is therefore advised that the client awaits the CONFIGURED message from the server before trying to send any audio data.
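A minimal JavaScript sketch of this handshake, reusing the pipeline definition format from the Pipelines chapter, could look like this:

// Sketch: wait for INITIALIZED, send CONFIG, and only start streaming
// audio after CONFIGURED is received.
ws.onmessage = (event) => {
  if (typeof event.data !== 'string') return; // only JSON text messages expected here
  const message = JSON.parse(event.data);
  if (message.messageType === 'STATUS' && message.status === 'INITIALIZED') {
    ws.send(JSON.stringify({
      messageType: 'CONFIG',
      pipeline: [{
        task: 'asr',
        config: { tag: 'KALDI:sl-SI:COL:20221208-0800', parameters: { enableInterims: true } }
      }]
    }));
  } else if (message.messageType === 'STATUS' && message.status === 'CONFIGURED') {
    startStreamingAudio(); // hypothetical function that begins sending audio chunks
  }
};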

Streaming audio data#

After a Websocket connection has been successfully established, the user can begin sending audio chunks. Audio chunks must be sent as binary websocket messages with a maximum size of 64KB. Each chunk must contain an array of 16 bit little-endian encoded integers, sampled at 16 kHz, which is also known as S16LE PCM encoding. Only single channel audio is supported.
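As an illustration, Float32 samples produced by the WebAudio API can be converted to S16LE PCM before sending (a sketch; typed arrays use the platform's byte order, which is little-endian on all common platforms):

// Convert Float32 samples in [-1, 1] to 16-bit PCM and send them
// as a binary WebSocket message (must stay below the 64 KB limit).
function sendAudioChunk(ws, float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
  }
  ws.send(pcm.buffer);
}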

To ensure real-time transcription, transcripts are asynchronously returned to the client as soon as they are available. Transcripts are returned in the following format:

{    "messageType": "TEXT_SEGMENT",    "textSegment": {        "isFinal": "<boolean>",         // Determines if transcript is final or interim        "startTimeMs": "<number>",      // Segment start time in milliseconds relative to the beginning of the session        "endTimeMs": "<number>",        // Segment end time in milliseconds relative to the beginning of the session        "tokens": [            {                "isLeftHanded" : "<boolean>",   // Determines if the token is left handed (implies a space before a token)                "isRightHanded" : "<boolean>",  // Determines if the token is right handed (implies a space after a token)                "startOffsetMs": "<number>",    // Token start time relative to the beginning of the segment                "endOffsetMs": "<number>",      // Token end time relative to the beginning of the segment                "text": "<string>"              // Token content            }        ]    }}

There are basically two types of transcripts, interims and finals, marked with the isFinal flag. While chunks of audio are being decoded on the backend, the transcript is updated with new content. Every time the transcript is updated, it is returned to the client as an interim transcript. When a certain transcript length is reached, or when no new words are recognized for some time, the transcript is returned as final. When a transcript is returned as final, its content will not be updated anymore. Instead, new content will be returned as an interim transcript, which again will be incrementally updated until it becomes final. The following example illustrates this procedure; a minimal handling sketch follows the list:

  • Interim : Da
  • Interim : Danes
  • Interim : Danes je
  • Interim : Danes je lepo
  • Interim : Danes je lepo da
  • Interim : Danes je lep dan
  • Final : Danes je lep dan.
  • Interim : Jutri
  • Interim : Jutri pa
  • Interim : Jutri pa bo
  • Interim : Jutri pa bo še lepši
  • Final : Jutri pa bo še lepši.
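A minimal handling sketch for the above behavior (the token join is naive and ignores the handedness flags described earlier; render is a hypothetical UI update function):

// Sketch: keep all final text and overwrite the current interim.
let finalText = '';
let interimText = '';

function handleTextSegment(textSegment) {
  const text = textSegment.tokens.map(token => token.text).join(' ');
  if (textSegment.isFinal) {
    finalText += text + ' ';
    interimText = '';
  } else {
    interimText = text;
  }
  render(finalText + interimText); // hypothetical UI update
}
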
Punctuations and Commands#

The Truebar transcription service can handle punctuation and commands. Punctuation is supported in two ways: a) using an automatic punctuation service – in this case, punctuation marks like comma, period, question mark, etc. are inserted into the transcript automatically by the punctuation service; b) by dictation – when this option is selected, the punctuation symbols are expected to be dictated by the user (to check how to pronounce a specific punctuation mark, refer to the QuickStart Guide (Eng, Slo)).

Which option the user wants to use can be specified through the configuration. The two options are not mutually exclusive, i.e. they can both be used at the same time. In the transcript returned by the Truebar transcription service, the punctuation symbols, whether set by the automatic service or dictated, are treated as stand-alone objects and returned as symbols enclosed in angle brackets, e.g. <.> for a full stop or <?> for a question mark. Apart from punctuation, the Truebar transcription service can handle specific commands that can be user-dictated, for example a command for making a new line or a new paragraph within the transcript. Other commands include uppercase, lowercase, capital letter, deletion, substitution, etc. For the full list please check the QuickStart Guide (Eng, Slo). Similarly to the punctuation symbols, all the supported commands are returned as stand-alone objects enclosed in angle brackets, for example <nl> for a new line, <np> for a new paragraph, <uc> for the beginning of an all-uppercase passage and </uc> for its end.

Closing session#

The transcription can be completed in either of the following two ways: a) by closing the WebSocket from the client side, or b) by sending an EOS message.

When closing the WebSocket connection from the client side, it may happen that a short audio chunk at the end does not get transcribed. This happens when some audio chunks that the backend has already received were not yet processed at the time the client closed the connection. Nevertheless, these chunks will be stored in the database and will later be accessible by an API call.

If the client can wait until the transcription is complete, the second approach is more appropriate. With this approach, the client sends an EOS message indicating that there is no more data to be sent. From this point on, no more audio chunks are accepted by the server, but transcription of already received audio chunks continues until all of them are processed. When all transcripts have been sent to the client, the server responds with the FINISHED status. After that, the WebSocket connection can safely be closed.
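A sketch of this graceful shutdown in JavaScript:

// Sketch: send EOS, then close the socket only after the server
// confirms with the FINISHED status.
ws.send(JSON.stringify({ messageType: 'EOS', lockSession: false }));

ws.addEventListener('message', (event) => {
  const message = JSON.parse(event.data);
  if (message.messageType === 'STATUS' && message.status === 'FINISHED') {
    ws.close();
  }
});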

History#

List of sessions#

All sessions created through the WebSocket or HTTP API can later be accessed through the HTTP API endpoints described in this section. All endpoints described here are protected using a JWT token. For details about obtaining a valid JWT token, please refer to the Authentication section. After a valid token has been acquired, it has to be included in all requests that follow.

The following example shows how to obtain a list of past sessions:

curl --location --request GET 'api.true-bar.si/api/client/sessions' \
    --header 'Authorization: Bearer <access_token>'

Response:

{  "cursor" : "<corsor_value>"             // Cursor pointing to a location of session entry  "content": [                            // List of sessions    {      "id": "<number>",                   // Session identifier      "name": "<string>",                 // Name of the session      "status" : "INITIALIZING | IN_QUEUE | IN_PROGRESS | FINISHED | CANCELED | ERROR",      "numRecordings": "<number>",        // Number of recordings created under the same session      "recordedMs": "<number>",           // Total recorded milliseconds      "processedMs": "<number>",          // Total processed milliseconds      "createdAt": "<string>",            // Creation date and time      "updatedAt": "<string>",            // Date and time of last update (eg. new recording)      "createdByUser": "<string>",        // Username under which the session was created      "createdByGroup" : "<string>",      // Group under which the session was created      "isLocked" : "<boolean>",           // True if session is locked.      "isDiscarded" : "<boolean>",        // True if session is discarded.      "notes": "<string>",                // Session notes      "labels": [                         // List of labels assigned to session        {          "isEnabled": "<boolean>",          "label": {            "id": "<number>",            "code": "<string>",            "color": "<string>",            "isDefault": "<boolean>",            "isAssigned": "<boolean>"          }        }        // Some fields not relevant for normal users are omitted.      ]    }  ]}

Because a collection of sessions is typically much larger than what can be returned in a single response, the endpoint implements a method of retrieving sessions in smaller chunks. Normally an endpoint would implement a pageable interface by providing offset and limit parameters. This approach works well on static collections whose elements don't change often. A collection of sessions, on the other hand, is quite dynamic, as sessions get created, deleted, updated, etc. This makes standard pagination hard to implement, as sessions may be skipped when transitioning between pages. Implementing an infinite scroll mechanism instead of pagination is even harder.

Instead of using standard offset and limit parameters, our endpoint uses cursors. A cursor determines a unique position in a collection, taking into account user-provided sorting and filtering. It can be used instead of an offset to specify the start of the slice we want. The benefit of using cursors over offsets is that they always point to the same entry in the collection, regardless of whether the collection is modified. There are two URL parameters used for specifying a slice:

  • slice-cursor - Cursor to the beginning of a slice.
  • slice-length - Number of sessions to be included in a slice. If a positive value is given, the slice will contain n sessions starting at the given slice-cursor. If a negative value is given, the slice will contain n sessions preceding the given slice-cursor.

The endpoint also supports sorting the collection by using the `sort` request parameter with the following values:

  • id
  • name
  • createdAt
  • updatedAt
  • numRecordings
  • recordedSeconds
Sessions can also be filtered by providing any of the following optional request parameters:

  • name - Return only sessions with names containing the given value,
  • label - Return only sessions that are labeled with the given label,
  • created-after - Return only sessions that were created after the given date and time,
  • created-before - Return only sessions that were created before the given date and time.

The following example shows a request that returns the first 30 sessions containing the word "Test" in their names, sorted by creation time in ascending order:

curl --location --request GET 'api.true-bar.si/api/client/sessions?slice-length=30&name=Test&sort=createdAt,asc' \
    --header 'Authorization: Bearer <access_token>'
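Putting the slice parameters together, the following JavaScript sketch iterates over the whole collection slice by slice (an assumption: the cursor field returned with each response is treated as the position where the next slice should start):

// Sketch: fetch sessions in slices of 30 using cursor-based slicing.
async function* listSessions(accessToken) {
  let cursor = null;
  while (true) {
    const params = new URLSearchParams({ 'slice-length': '30' });
    if (cursor !== null) params.set('slice-cursor', cursor);
    const response = await fetch(`https://api.true-bar.si/api/client/sessions?${params}`, {
      headers: { Authorization: `Bearer ${accessToken}` }
    });
    const slice = await response.json();
    if (slice.content.length === 0) return; // no more sessions
    yield* slice.content;
    cursor = slice.cursor;
  }
}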

Specific session details#

Request:

curl --location --request GET 'api.true-bar.si/api/client/sessions/<session-id>' \
    --header 'Authorization: Bearer <access_token>'
  • session-id: Unique session identifier

Response:

Same as single entry from the list of returned sessions described above.

Audio file#

Returns the audio file created within the given session. The request supports HTTP range requests, which allows the file to be transferred in smaller pieces. This allows audio players to stream the audio file instead of transferring the whole file before playing it. Like any other request, this request also supports passing the access_token as a request parameter instead of passing it as a Bearer token inside the Authorization header. This can be useful when requesting the audio file directly from an audio player that does not support modification of request headers.

Request:

curl --location --request GET 'api.true-bar.si/api/client/sessions/<session-id>/audio.wav' \
    --header 'Authorization: Bearer <access_token>'
  • session-id: Unique session identifier

Session transcripts#

Returns a list of transcripts generated within the given session.

Request:

curl --location --request GET 'api.true-bar.si/api/client/sessions/<session-id>/transcripts' \
    --header 'Authorization: Bearer <access_token>'
  • session-id: Unique session identifier

Response:

[
    {
        "id": "<number>",           // Transcript identifier
        "content": "<JSON string>"  // JSON-encoded transcript content with the same structure as transcripts received through the WebSocket API.
    },
    ...
]

Session recordings#

Returns a list of recordings created within the given session. The request supports a pageable interface. For the list of available request parameters regarding the pageable interface, check the List of sessions section.

Request:

curl --location --request GET 'api.true-bar.si/api/client/sessions/<session-id>/recordings' \
    --header 'Authorization: Bearer <access_token>'
  • session-id: Unique session identifier

Response:

{  "totalPages": "<number>",       // Total number of pages available  "totalElements": "<number>",    // Total number of elements available on all pages  "first": "<boolean>",           // True if this is the first page  "last": "<boolean>",            // True if this is the last page  "number": "<number>",           // Current page number  "size": "<number>",             // Number of elements on current page  "empty": "<boolean>",           // True if this page is empty  "content": [        {            "id": "<number",                                    // Recording identifier            "duration": "<number>",                             // Length of the recording in seconds            "isDiscarded": "<boolean>",                         // Should always be true for normal users            "transcriptionLanguage": "<string",                 // Selected settings when recording was created...            "transcriptionDomain": "<string",            "transcriptionModelVersion": "<string",            "transcriptionEndpointingType": "<string",            "transcriptionDoInterim": "<boolean>",            "transcriptionDoPunctuation": "<boolean>",            "transcriptionDoInterimPunctuation": "<boolean>",            "transcriptionShowUnks": "<boolean>",            "transcriptionDoDictation": "<boolean>",            "transcriptionDoNormalisation": "<boolean>",            "translationEnabled": "<boolean>"        }    ]  }

Session sharing#

Sessions are by default only visible to the user who created them. The session sharing mechanism allows a user to share their sessions with other users belonging to the same group. There are several endpoints that can be used to manage session shares.

List session shares#

Request:

curl --location --request GET 'api.true-bar.si/api/client/sessions/<session-id>/shares' \
    --header 'Authorization: Bearer <access_token>'

Response:

[
    {
        "id": "<number>",           // share id
        "createdAt": "<string>",
        "sharedWith": {
            "id": "<number>",       // user id
            "username": "<string>"
        }
    },
    ...
]

Adding session share#

A list of users with whom the session can be shared can be obtained with the following request.

Request:

curl --location --request GET 'api.true-bar.si/api/client/users' \
    --header 'Authorization: Bearer <access_token>'

Response:

[
    {
        "id": "<number>",
        "username": "<string>"
    },
    {
        "id": "<number>",
        "username": "<string>"
    },
    ...
]

A session can be shared with a selected user using the following request:

Request:

curl --location --request POST 'api.true-bar.si/api/client/sessions/<session-id>/shares' \
    --header 'Authorization: Bearer <access_token>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "userId": <user-id>
    }'

Delete session share#

curl --location --request DELETE 'api.true-bar.si/api/client/sessions/shares/<share-id>' \
    --header 'Authorization: Bearer <access_token>'

Advanced features#

Apart from the basic features exposed through the Truebar API, there are also other, more advanced features that can be used to achieve the highest possible recognition accuracy and the best possible user experience. These features include services for working with substitutes, replacements and prefixed tokens.

Replacements#

You can use replacements to define user-specific labels for terms in the dictionary. When the Truebar STT service recognizes a term that has a user-defined replacement, it makes the appropriate changes in the response. Replacements take effect immediately, i.e. without the need to rebuild the model.

The best way to fine-tune the model is to make sure its dictionary comprises all terms used in speech. Specific attention should be paid to abbreviations, acronyms and measurement units. Make sure that all required terms are included in the dictionary and have appropriate labels and pronunciations.

Sometimes, however, we want to make changes that are user-dependent. For instance, one user wants the measurement unit for meters per second to be written as m/sec, while other users prefer m/s as the measurement label. Such user-dependent changes of dictionary entries are handled through replacements. These are kept separately from the dictionary and thus have no effect on other users.

To retrieve the list of active replacements, you can use the following endpoint:

Request:

curl --location --request GET 'api.true-bar.si/api/client/replacements' \
    --header 'Authorization: Bearer <access_token>'

Response:

[
    {
        "id": "<number>",
        "source": [                             // List of source words
            {
                "spaceBefore": "<boolean>",
                "text": "<string>"
            },
            {
                "spaceBefore": "<boolean>",
                "text": "<string>"
            },
            ...
        ],
        "target": {                             // Target word that will replace all source words
            "spaceBefore": "<boolean>",
            "text": "<string>"
        }
    },
    ...
]

To get information on a specific replacement, use this endpoint:

Request:

curl --location --request GET 'api.true-bar.si/api/client/replacements/<replacement-id>' \
    --header 'Authorization: Bearer <access_token>'

replacement-id: replacement identification number as returned when asking for a list of replacements

Response:

{    "id": "<number>",    "source": [                             // List of source words        {            "spaceBefore": "<boolean>",            "text": "<string>"        },        {            "spaceBefore": "<boolean>",            "text": "<string>"        },        ...    ],    "target": {                             // Target word that will replace all source words        "spaceBefore": "<boolean>",        "text": "<string>"    },}

To add new replacements, you can call:

curl --location --request POST 'api.true-bar.si/api/client/replacements' \
    --header 'Authorization: Bearer <access_token>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "source": [
            {
                "spaceBefore": "<boolean>",
                "text": "<string>"
            },
            {
                "spaceBefore": "<boolean>",
                "text": "<string>"
            },
            ...
        ],
        "target": {
            "spaceBefore": "<boolean>",
            "text": "<string>"
        }
    }'
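For instance, the m/sec preference from the beginning of this section could be expressed with a body like the following (an illustrative example; the exact source tokenization and spaceBefore values depend on how the model emits the unit):

{
    "source": [
        { "spaceBefore": true, "text": "m/s" }
    ],
    "target": { "spaceBefore": true, "text": "m/sec" }
}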

Finally, to delete a specific replacement, call:

curl --location --request DELETE 'api.true-bar.si/api/client/replacements/<replacement-id>' \
    --header 'Authorization: Bearer <access_token>'

replacement-id: replacement identification number as returned when asking for a list of replacements

Automatic speaker detection (experimental)#

This function can be enabled by setting the enableSd option for the ASR stage. It can only be enabled when the selected model supports automatic speaker detection.

When the option is enabled, the words inside each received transcript will contain a field named speakerCode. The field holds a label (speaker0, speaker1, etc.) which is automatically assigned to the detected speaker.

Response statuses and errors#

HTTP API#

This section describes expected HTTP response statuses for endpoints described in this document.

Response statuses for various HTTP methods when the operation is successful:

  • GET : response status 200 OK along with the response body;
  • POST : response status 201 CREATED and automatically generated unique resource identifier in the response body;
  • PATCH : response status 204 NO_CONTENT without response body;
  • DELETE: response status 204 NO_CONTENT without response body;

Response statuses for errors:

  • 400 BAD_REQUEST Indicates that the request could not be processed due to something that is perceived to be a client error;
  • 401 UNAUTHORIZED indicates that the request has not been applied because it lacks valid authentication credentials (invalid JWT or not present);
  • 403 FORBIDDEN indicates that the server understood the request but refuses to authorize it (Insufficient permissions);
  • 404 NOT_FOUND indicates that the requested resource was not found on the server;
  • 405 METHOD NOT ALLOWED indicates that the resource does not support the given request method (e.g. a POST request on a URL that only supports GET);
  • 409 CONFLICT indicates that the resource that is being added or modified already exists on the server;
  • 415 UNSUPPORTED MEDIA TYPE indicates that the uploaded media file format is not supported;
  • 500 SERVER_ERROR indicates that the request could not be processed because of internal server error.

Unless otherwise noted, any request described in this document should return one of the response statuses described above. If there was an error processing the request, the response body is returned in the following format:

{    "id": "<string>",    "timestamp": "<string>",    "message": "<string>"}

The returned object contains a message with a short error description. In general, the message together with the HTTP response status is sufficient for the client to know what went wrong. However, that is not the case when an error triggers response status 500, because the response body then only contains a generic error message. If this happens, please contact our support with the returned error id.
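A small JavaScript helper illustrating this convention (a sketch; it assumes error responses carry the JSON body shown above):

// Sketch: surface the error id so it can be reported to support
// when a 500-class error occurs.
async function checkResponse(response) {
  if (response.ok) return response;
  const error = await response.json(); // { id, timestamp, message }
  throw new Error(`Truebar API error ${response.status}: ${error.message} (error id: ${error.id})`);
}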

Migrating from 2.x.x to 3.x.x#

True-Bar service version 3.x.x comes with a new feature called pipelines. Pipelines are a powerful concept that enables full customization of the data flow through NLP operations. Unfortunately, they also break compatibility with previous API versions. This section provides an overview of the steps required to migrate existing integrations to the new API version.

Although not visible to the end user, the previous API version also used a primitive concept of a pipeline internally. This pipeline was static, i.e. the user could not add, remove or change the order in which its stages were executed. The only configuration exposed to the end user were the options available at the api/client/configuration endpoint, which were then internally mapped to pipeline parameters. This configuration offered only a subset of the available options. Because it was stored on the server, it also caused some issues with concurrent sessions created by the same user.

True-Bar service version 3.x.x no longer stores user configuration. Instead, the configuration must be passed each time a new session is created, or along with every NLP request, as a pipeline definition, i.e. a sequence of stages and their configuration. For more information on how to create a pipeline definition, please see the chapter on pipelines.

The following subsections provide a quick overview of how to migrate different clients.

Offline (REST) file upload clients#

The endpoint for offline file processing has been moved and is now available at /api/pipelines/process instead of /api/client/upload. Because user configuration is no longer stored on the server, the endpoint now accepts an additional form field named pipeline that contains the pipeline definition. The pipeline definition must be supplied as a serialized JSON string representing a list of stages and their configuration options.

Example of requests to API V2.x.x that first patch the user configuration and then upload a file:

curl --location --request PATCH 'https://api.true-bar.si/api/client/configuration' \
    --header 'Authorization: Bearer <access_token>' \
    --header 'Content-Type: application/json' \
    --data-raw '{
        "stt": {
            "framework": "NEMO",
            "language": "sl-SI",
            "domain": "COL",
            "model": "20221208-0800",
            "enableInterims": true
        },
        "nlp": {
            "punctuation": {
                "enabled": {
                    "value": true,
                    "enableRealFinals": true
                }
            }
        }
    }'

curl --location --request POST 'https://api.true-bar.si/api/client/upload?async=<true|false>' \
    --header 'Authorization: Bearer <access_token>' \
    --form 'file=@"/path/to/audio.wav"'

Example of a request to API V3.x.x that uploads a file along with the pipeline configuration:

curl --location --request POST 'https://api.true-bar.si/api/pipelines/process?async=<true|false>' \
    --header 'Authorization: Bearer <access_token>' \
    --form 'pipeline=[
        {
            "task": "asr",
            "config": {
                "tag": "NEMO_ASR:sl-SI:COL:20221208-0800",
                "parameters": {
                    "enableInterims": false
                }
            }
        },
        {
            "task": "pc",
            "config": {
                "tag": "NEMO_PUNCTUATOR:sl-SI:*:*",
                "parameters": {
                    "enableSplitToSentences": true
                }
            }
        }
    ]' \
    --form 'file=@"/path/to/file"'

Streaming (Websocket) clients#

The following changes were made to the WebSocket API:

  • Clients are allowed to send text and binary WebSocket messages instead of only binary messages.
  • The first message after the connection is established is expected to be a text message containing the pipeline configuration.
  • Instead of the empty packet previously used to indicate the end of the stream, a special EOS text message must now be sent. Empty binary messages are now considered invalid.
  • After the session is finished, the WebSocket is not automatically closed by the server. The client is notified about success or failure with an appropriate status/error message. The logical session is therefore decoupled from the underlying transport protocol session.
  • Non-critical errors and warnings are reported as text messages and no longer cause the session to be terminated, as they did before.
  • Messages returned by the server have a different structure.

A detailed description of the protocol used for establishing a streaming session can be found in this chapter. Examples of client implementations can also be found in the Libraries and examples section.

Libraries and examples#

This chapter lists available libraries that can be used to integrate with the Truebar service. Currently we provide libraries for the Java and JavaScript programming languages.

Java library#

The Java library is written in Java 11. It uses the built-in HttpClient to minimize the number of external dependencies. It is packaged as a single .jar file that can be imported into the target project.

Javascript library#

The JavaScript library is written as an ES6 module that can be imported into the target project. It is compatible with all major web browsers that support WebAudio and WebSockets.