# Truebar API

*New in version 3.0*

## Introduction

Welcome to the Truebar API documentation.
The Truebar service provides HTTP and WebSocket APIs for speech processing tasks, including:
- Transcription
- Punctuation
- Text normalization
- Speech synthesis
- And more
This guide introduces the core features of the Truebar API and helps developers understand the basic concepts required to integrate with the service. If you're looking for advanced usage or full endpoint details, refer to our Swagger documentation; for instructions on how to access it, see the next section. For a more hands-on approach, check out the use-case-specific guides available in the menu.
If you are new to the Truebar service, we strongly recommend reading through this document as it covers the fundamental concepts you'll need to get started.
To verify your user account and permissions, you can start by using the Truebar Dictation and Transcript Editor. Please make sure to read the QuickStart Guide (Eng, Slo).
## Accessing the service

All examples in this guide assume that the service is accessible at `true-bar.si`. This implies the following URLs for the various components:

- Truebar editor: `true-bar.si`
- Authentication service: `auth.true-bar.si`
- Truebar API: `api.true-bar.si`

The Truebar editor is always accessible at the base domain, while the Truebar API and authentication service are available at their respective subdomains, `api` and `auth`.
We offer our services on different hosting environments. A specific environment can be accessed by prepending an `{env}` prefix:

- Truebar editor: `{env}.true-bar.si`
- Authentication service: `{env}-auth.true-bar.si`
- Truebar API: `{env}-api.true-bar.si`
The Truebar API provides an endpoint to check its status:
```bash
curl --location --request GET 'https://api.true-bar.si/api/info'
```
The response includes basic API metadata such as version, build timestamp, and a path to the OpenAPI (Swagger) specification.
To test WebSocket connectivity, you can connect to `wss://api.true-bar.si/api/ws-info`. This endpoint accepts a WebSocket connection, sends a single text message containing the same metadata as the `/api/info` HTTP endpoint, and then closes the connection.
⚠️ These two endpoints (`/api/info` and `/api/ws-info`) are the only ones that are publicly accessible without authentication.
## Authentication

All requests sent to the Truebar service must contain a valid access token; otherwise they are rejected with an HTTP 401 UNAUTHORIZED status code.
### Obtaining the access token

Access tokens are issued by the authentication service. To obtain a token, send the following POST request:

```bash
curl --location --request POST 'https://auth.true-bar.si/realms/truebar/protocol/openid-connect/token' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'grant_type=password' \
  --data-urlencode 'username={username}' \
  --data-urlencode 'password={password}' \
  --data-urlencode 'client_id=truebar-client'
```
The response is received in JSON format with the following structure:

```json
{
  "access_token": "<string>",
  "expires_in": "<number>",
  "refresh_expires_in": "<number>",
  "refresh_token": "<string>",
  "token_type": "<string>",
  "not-before-policy": "<number>",
  "session_state": "<string>",
  "scope": "<string>"
}
```
The `access_token` is valid for `expires_in` seconds. After that time it expires and must be refreshed using the provided `refresh_token`:

```bash
curl --location --request POST 'https://auth.true-bar.si/realms/truebar/protocol/openid-connect/token' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'grant_type=refresh_token' \
  --data-urlencode 'refresh_token={refresh_token}' \
  --data-urlencode 'client_id=truebar-client'
```
The `refresh_token` is valid for `refresh_expires_in` seconds. When this period elapses, the refresh token becomes invalid and the only way to acquire a new token is with user credentials (i.e. username and password).
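The same flow in JavaScript, as a minimal sketch (assumes a runtime with the Fetch API, e.g. Node 18+ or a browser; endpoint and field names exactly as above):

```js
const AUTH_URL = "https://auth.true-bar.si/realms/truebar/protocol/openid-connect/token";

// POSTs the given form parameters and returns the parsed token response.
async function requestToken(params) {
  const response = await fetch(AUTH_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ client_id: "truebar-client", ...params }),
  });
  if (!response.ok) throw new Error(`Token request failed: ${response.status}`);
  return response.json(); // { access_token, expires_in, refresh_token, ... }
}

// Initial login with user credentials.
const tokens = await requestToken({
  grant_type: "password",
  username: "{username}",
  password: "{password}",
});

// Refresh before `expires_in` seconds elapse.
const refreshed = await requestToken({
  grant_type: "refresh_token",
  refresh_token: tokens.refresh_token,
});
```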
### Using the access token

The access token must be included in all requests to the Truebar API. There are two supported ways to pass it:

#### 1. Authorization header (recommended)

Include the token in the `Authorization` header as a Bearer token:
```bash
curl --location --request GET 'https://api.true-bar.si/...' \
  --header 'Authorization: Bearer {access_token}'
```
✅ This is the recommended approach, as it keeps tokens out of URLs and logs.
#### 2. URL parameter (not recommended)

If setting headers is not possible (e.g., when using certain web clients or embedded media players), you can provide the token as a query parameter:

```bash
curl --location --request GET 'https://api.true-bar.si/...?access_token={access_token}'
```
⚠️ Use this method only when necessary, as tokens in URLs may be logged or cached by browsers and servers.
### User roles

To use the various services provided by the Truebar API, multiple permissions are available. They are represented by roles (authorities) encoded within the JWT token. If the supplied JWT token does not contain the permissions required by the operation being executed, the request is denied with a 403 FORBIDDEN response code.

The following permissions are currently available:
- `API` - Basic authority required to use any operation provided by the Truebar API.
- `PIPELINE_ONLINE_API` - Permission to use the WebSocket pipeline API.
- `PIPELINE_OFFLINE_API` - Permission to use the file upload & JSON HTTP pipeline API.
- `STAGE_ASR` - Speech-to-text stage.
- `STAGE_NLP_PC` - Punctuation stage.
- `STAGE_NLP_TC` - Truecasing stage.
- `STAGE_NLP_TN` - Text normalization stage.
- `STAGE_NLP_ITN` - Inverse text normalization (denormalization) stage.
- `STAGE_NLP_G2A` - Accentuation stage.
- `STAGE_NLP_NMT` - Machine translation stage.
- `STAGE_NLP_AC` - Autocorrect stage.
- `STAGE_TTS` - Automatic speech synthesis stage.
- `DICTIONARY_READ` - Dictionary read operations.
- `DICTIONARY_WRITE` - Dictionary write operations.
- `MODEL_UPDATER_READ` - ModelUpdater read operations.
- `MODEL_UPDATER_WRITE` - ModelUpdater write operations.
- `SESSION_READ_DISCARDED` - Read access to discarded sessions.
- `SESSION_WRITE_DISCARDED` - Write access to discarded sessions.
- `SESSION_HARD_DELETE` - Permission to hard-delete sessions and connected recordings/session contents.
- `GROUP_READ` - Read access to resources belonging to the user's group.
- `GROUP_WRITE` - Write access to resources belonging to the user's group.
- `ALL_READ` - Read access to all resources.
- `ALL_WRITE` - Write access to all resources.
To check which roles are associated with your user account, decode the JWT. You can use any of the freely available JWT libraries, or try the following example:

```js
// Decode the JWT payload (the second, Base64-encoded part of the token).
let payload = JSON.parse(atob(accessToken.split('.')[1]));
/*
payload = {
  "realm_access": {
    "roles": ["LIVE_STREAMER", "DICTIONARY_READER", ...]
  }
  // ...other properties omitted for readability
}
*/
```
## Pipelines

### Introduction

Pipelines are a core concept of the Truebar API. A pipeline is a sequence of processing stages, where each stage performs a specific natural language or speech processing task. Data flows through the pipeline from stage to stage, with the output of one stage becoming the input to the next.
Each stage performs a well-defined operation and works with either text or audio data. Stages can be grouped into the following categories:
**Speech-to-Text (STT)**
- Input: audio
- Output: text
- Example: ASR (Automatic Speech Recognition)

**Text-to-Text (NLP)**
- Input: text
- Output: text
- Examples: punctuation, text normalization, truecasing, translation, autocorrection

**Text-to-Speech (TTS)**
- Input: text
- Output: audio
- Example: TTS (Speech Synthesis)
Pipelines are built by chaining one or more compatible stages. Two stages are compatible if the output type of the first matches the input type of the next.
- ✅ Valid: ASR (audio → text) → Punctuation (text → text) → TTS (text → audio)
- ❌ Invalid: ASR (audio → text) → TTS (text → audio) → Truecasing (text → text). This chain is invalid because TTS outputs audio, but Truecasing expects text.
Pipelines must be linear: data flows from the first stage to the last without loops or branches. However, a pipeline can include multiple stages that perform the same task. The following example demonstrates a pipeline with limited practical use, but it shows that the ASR stage can appear multiple times:
- ✅ Valid: ASR (audio → text) → Punctuation (text → text) → TTS (text → audio) → ASR (audio → text)
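The compatibility rule is easy to check programmatically. Below is a minimal sketch that validates a stage sequence using the input/output types of the three stage categories (STT: audio to text, TTT: text to text, TTS: text to audio):

```js
// Input/output data types for each stage category.
const IO = {
  STT: { input: "audio", output: "text" },
  TTT: { input: "text", output: "text" },
  TTS: { input: "text", output: "audio" },
};

// A pipeline is valid if each stage's output type matches the next stage's input type.
function isLinearPipelineValid(stageTypes) {
  for (let i = 1; i < stageTypes.length; i++) {
    if (IO[stageTypes[i - 1]].output !== IO[stageTypes[i]].input) return false;
  }
  return true;
}

console.log(isLinearPipelineValid(["STT", "TTT", "TTS"])); // true
console.log(isLinearPipelineValid(["STT", "TTS", "TTT"])); // false: TTS outputs audio
```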
### Stages

A stage is defined by the specific task it performs within a pipeline. The following stages are offered by the Truebar service:
- STT (Speech-to-Text)
  - `ASR` - Automatic Speech Recognition
- TTT (Text-to-Text / NLP)
  - `NLP_pc` - Automatic Punctuation
  - `NLP_tc` - Automatic Truecasing
  - `NLP_tn` - Text Normalization
  - `NLP_itn` - Inverse Text Normalization
  - `NLP_ac` - Auto Correct
  - `NLP_g2a` - Accentuation
  - `NLP_nmt` - Neural Machine Translation
  - `NLP_ts` - Text Summarization
- TTS (Text-to-Speech)
  - `TTS` - Text To Speech
⚠️ Note that not all stages are available to every user account. Additionally, each stage may support a set of configuration options that vary depending on the selected model, user permissions, and system capabilities.
Before building a pipeline, it is recommended to query the Truebar service for the currently available stages and their configuration options using the following API call:

```bash
curl --location --request GET 'https://api.true-bar.si/api/pipelines/stages' \
  --header 'Authorization: Bearer <access_token>'
```

The response is returned as a JSON-encoded list of available stages:

```json
[
  {
    "task": "<string>",
    "type": "<string>",
    "config": [
      {
        "tag": "<string>",
        "features": ["<string>", ...],
        "parameters": {
          "parameterName": {
            "type": "<string>",
            "values": ["<object>", ...],
            "defaultValue": "<object>"
          },
          ...
        }
      }
    ]
  }
]
```
Each stage has the following properties:
- `task` - Specifies the task performed by the stage. This corresponds to one of the values listed in the Stages section above (e.g., `ASR`, `NLP_pc`, `TTS`).
- `type` - Specifies the type of operation the stage performs, which in turn defines its input and output data types. Possible values are: STT, TTT, TTS.
- `config` - Describes the configuration options available for the stage. It includes:
  - `tag` - A unique identifier for a model configuration. It consists of four parts in the format `framework:language-code:domain-code:version`. Each part is either an alphanumeric code or the `*` character representing a generic part. For example, the tag `KALDI:sl-SI:*:*` refers to a Slovenian model on the KALDI framework, with unspecified domain and version.
  - `features` - A list of model-specific features that cannot be changed by the user.
  - `parameters` - A dictionary of configurable parameters supported by the model. Each parameter has the following properties:
    - `name` - Name of the parameter.
    - `type` - Data type of the parameter (e.g., string, boolean, integer, dictionary).
    - `values` - (optional) A list of allowed values for parameters with a predefined set of options.
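As an example, the following sketch (assuming a valid access token and a runtime with the Fetch API) queries this endpoint and prints each available task together with its model tags:

```js
const API_BASE = "https://api.true-bar.si";

// Lists every available stage task with its configured model tags.
async function listStages(accessToken) {
  const response = await fetch(`${API_BASE}/api/pipelines/stages`, {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  const stages = await response.json();
  for (const stage of stages) {
    const tags = stage.config.map((c) => c.tag).join(", ");
    console.log(`${stage.task} (${stage.type}): ${tags}`);
  }
}
```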
The following is a detailed description of each stage, including the features and parameters available as part of its configuration options.
#### ASR (Automatic Speech Recognition)

The ASR stage performs automatic transcription of spoken language into text and can be configured for advanced use cases such as speaker diarization and recognition of dictated commands or punctuation.
- Features:
  - `onlineAsr` - The model supports online transcription and is therefore better suited to the streaming API.
  - `offlineAsr` - The model supports offline transcription and is therefore better suited to the HTTP API.
  - `dictatedCommands` - The model recognizes dictated commands and returns special tokens representing the spoken command.
  - `dictatedPunctuations` - The model recognizes dictated punctuation and returns special tokens representing the spoken punctuation.
- Parameters:
  - `enableUnks` - If enabled, the model returns an `<unk>` symbol for tokens that could not be recognized.
  - `enableInterims` - If enabled, the model returns interim (partial) transcriptions before the final result is available. Useful for real-time applications.
  - `enableSd` - Enables speaker diarization, allowing the model to differentiate between multiple speakers in the audio.
  - `sdSpeakerSet` - A predefined set of speaker identifiers. If provided, the model attempts to map detected speakers to known names from this set.
  - `sdMinSpeakers` - Minimum number of speakers to detect during diarization.
  - `sdMaxSpeakers` - Maximum number of speakers to detect during diarization.
#### PC (Automatic Punctuation)

The PC stage restores missing punctuation in a text, making it easier to read and suitable for further natural language processing tasks. It can also optionally split text into individual sentences.
- Features: None
- Parameters:
  - `enableSplitToSentences` - When enabled, the output text is split into sentences based on predicted punctuation marks. A sentence is considered complete only if the segment ends with a final punctuation mark (., ?, !).
    - If the segment does not end with a final punctuation mark, the remaining part after the last such mark is treated as an interim segment.
    - During a session, the interim part is cached and automatically prepended to the next incoming text segment to ensure continuity.
#### TC (Automatic Truecasing)

The TC stage restores proper capitalization (truecasing) in a text. It adjusts the case of words, typically capitalizing the beginning of sentences and proper nouns, improving readability and preparing the text for further processing or display.
- Features: None
- Parameters: None
#### TN (Text Normalization)

The TN stage transforms text from its spoken form into a more formal, standardized written representation. This may include expanding abbreviations, numbers, dates, or currency expressions into their full written forms to improve readability and downstream processing.
- Features: None
- Parameters:
  - `customConfig` - A dictionary representing a custom normalizer configuration. Note: for more information on the available options, please contact support.
#### ITN (Inverse Text Normalization)

The ITN stage transforms standardized written text into a form closer to how it would naturally be spoken. This typically includes converting numerals, dates, and other formal expressions into their spoken equivalents (e.g., "2025" → "two thousand twenty-five").
- Features: None
- Parameters: None
#### AC (Auto Correct)

The AC stage automatically corrects text by applying a set of predefined or user-defined replacements. It can be used to fix common typing or transcription errors, or to standardize terminology.

- Features: None
- Parameters:
  - `replacementSets` - A list of replacement sets to apply. These may include:
    - System-defined replacements (common corrections),
    - Shared sets (shared by other users and available to the group),
    - User-defined sets (custom corrections created by the user).
#### G2A (Accentuator)

The G2A stage applies accentuation to the text. This is particularly useful for preparing text for speech synthesis.

- Features: None
- Parameters:
  - `fastEnabled` - Enables faster accentuation at the cost of reduced accuracy.
  - `customTransformations` - A list of user-defined transformations that can be applied to adjust accentuation as needed.
#### NMT (Neural Machine Translation)

The NMT stage performs automatic translation of text from one language to another. It leverages advanced neural machine translation models to ensure accurate and fluent translations.
- Features: None
- Parameters: None
#### TS (Text Summarization)

The TS stage condenses long pieces of text into shorter summaries while retaining the essential information. It can generate both short and long summaries and optionally highlight key points and keywords.

- Features: None
- Parameters:
  - `summaryType` - Determines the length of the summary. Possible values:
    - `LONG`: Generates a more detailed summary.
    - `SHORT`: Generates a concise summary.
  - `keywordsEnabled` - If enabled, the stage returns a list of key terms or keywords extracted from the text.
  - `keypointsEnabled` - If enabled, the stage highlights key points from the text, providing important insights or conclusions.
  - `diarizationEnabled` - If enabled, the summary is organized by speaker, showing who said what, which is useful for summarizing dialogues or meetings.
#### TTS (Text To Speech)

The TTS stage converts written text into spoken audio. This is useful for generating synthesized speech from text-based content.

- Features: None
- Parameters: None
### Building a pipeline

As mentioned earlier, pipelines are built by connecting multiple stages together. A pipeline definition is simply a JSON structure that lists the stages contained within the pipeline. Each stage is configured by setting the following attributes:

- `task` - Defines the task that will be performed within the stage.
- `exceptionHandlingPolicy` - Defines the policy to apply if an error occurs during data processing within the stage. Possible values are: `SKIP` (ignore the error and continue) or `THROW` (stop the pipeline and propagate the error).
- `config` - Defines the stage's configuration, including:
  - `tag` - A tag representing the specific model to be used within the stage. The tag consists of the framework, language code, domain code, and version, separated by colons.
  - `parameters` - Additional parameters to configure the stage. These depend on the selected tag. For each stage, you can set the parameters returned in the stage metadata, as described earlier.
#### Examples

A simple pipeline for basic audio transcription that uses only the `ASR` stage, configured to use the model `KALDI:sl-SI:COL:20221208-0800` with the `enableInterims` option enabled:

```json
[
  {
    "task": "ASR",
    "exceptionHandlingPolicy": "THROW",
    "config": {
      "tag": "KALDI:sl-SI:COL:20221208-0800",
      "parameters": { "enableInterims": true }
    }
  }
]
```
A more complex pipeline that first uses the `ASR` stage to transcribe the input audio and then the `NLP_pc` and `NLP_itn` stages to automatically insert punctuation and denormalize the result:

```json
[
  {
    "task": "ASR",
    "exceptionHandlingPolicy": "THROW",
    "config": {
      "tag": "KALDI:sl-SI:COL:20221208-0800",
      "parameters": { "enableInterims": true }
    }
  },
  {
    "task": "NLP_pc",
    "exceptionHandlingPolicy": "SKIP",
    "config": {
      "tag": "NEMO_PUNCTUATOR:sl-SI:*:*",
      "parameters": { "enableSplitToSentences": true }
    }
  },
  {
    "task": "NLP_itn",
    "exceptionHandlingPolicy": "SKIP",
    "config": {
      "tag": "DEFAULT_DENORMALIZER:sl-SI:*:*"
    }
  }
]
```
An example of an invalid pipeline, caused by a mismatch between the first stage's output type and the second stage's input type:

```json
[
  {
    "task": "NLP_pc",
    "exceptionHandlingPolicy": "THROW",
    "config": { "tag": "NEMO_PUNCTUATOR:sl-SI:*:*" }
  },
  {
    "task": "ASR",
    "exceptionHandlingPolicy": "THROW",
    "config": { "tag": "KALDI:sl-SI:COL:20221208-0800" }
  }
]
```
### Pipeline execution

Pipelines can be executed using either the offline HTTP API or the streaming (WebSocket) API.
#### Offline (HTTP) API

The Offline API is exposed via an HTTP endpoint and provides a simple synchronous or asynchronous interface for pipeline invocation. As previously mentioned, each pipeline accepts a specific type of input (e.g., audio or text) defined by its initial stage and produces a specific output type defined by its final stage. The endpoint is therefore designed to support multiple input/output data formats, as determined by the pipeline definition provided at runtime.

Since pipelines are defined dynamically by the client, the endpoint requires the pipeline definition to be submitted together with the input data. The endpoint for pipeline execution is:

```
https://api.true-bar.si/api/pipelines/process
```
This endpoint accepts a multipart HTTP request body, which allows combining:
- JSON (application/json) parts for the pipeline definition and text data (if applicable),
- Binary (application/octet-stream) parts for audio input (if applicable).
The response format, JSON or binary, depends on the output type of the final stage in the pipeline.
##### Data types and formats

Text data is always given as a JSON-encoded list of text segments. This holds true both for data sent to and received from the server. Each text segment contains a list of tokens. Below is an example of this structure with both required and optional properties.

```json
[
  {
    "isFinal": "true | false",
    "startTimeMs": "<number>",
    "endTimeMs": "<number>",
    "tokens": [
      {
        "startOffsetMs": "<number>",
        "endOffsetMs": "<number>",
        "isLeftHanded": "true | false",
        "isRightHanded": "true | false",
        "text": "<string>",
        "speakerCode": "<string>",
        "confidence": "<number>"
      }
    ]
  },
  ...
]
```
- `isFinal` - If set to true, the text segment represents a final hypothesis.
- `startTimeMs` - Beginning timestamp of the text segment in milliseconds, relative to the start of the session.
- `endTimeMs` - Ending timestamp of the text segment in milliseconds, relative to the start of the session.
- `tokens` - A list of tokens contained in the text segment.
  - `startOffsetMs` - An offset in milliseconds from the text segment's `startTimeMs` property, representing the token's beginning timestamp.
  - `endOffsetMs` - An offset in milliseconds from the text segment's `startTimeMs` property, representing the token's ending timestamp.
  - `isLeftHanded` - True if the token is left-handed, false otherwise.
  - `isRightHanded` - True if the token is right-handed, false otherwise.
  - `text` - Actual text content of the token.
  - `speakerCode` - Identifier or name of the speaker who uttered the token.
  - `confidence` - The confidence score for the token, indicating how likely it is that the text is correct.
The only required properties from the above list are the `tokens` property of each text segment and the `text` property of each token. All other properties are optional and depend on the actual operation being executed.

The minimal example to pass a sentence to the API would be:

```json
[
  {
    "tokens": [
      { "text": "Today" },
      { "text": "is" },
      { "text": "a" },
      { "text": "beautiful" },
      { "text": "day" },
      { "text": "." }
    ]
  }
]
```
Audio data can be passed to API in various formats, as long as the format is supported by the decoding infrastructure on the server. We support a broad range of popular audio and video container formats and codecs. However, if you're unsure whether your format is supported, or you experience issues, we recommend reaching out to our support team for a definitive list of compatible formats.
- For best results, use uncompressed audio formats such as WAV or FLAC.
- Avoid using media containers with multiple audio tracks, as the service currently cannot automatically select the correct one for decoding.
- Make sure the audio is clearly recorded and has minimal background noise to ensure accurate recognition and processing.
##### Examples

The following is an example of a processing pipeline with a single punctuation stage, which means that the endpoint expects and produces text-based data.

Curl command:

```bash
curl --location 'https://api.true-bar.si/api/pipelines/process' \
  --header 'Content-Type: multipart/form-data' \
  --header 'Authorization: Bearer <access_token>' \
  --form 'pipeline="[{\"task\":\"NLP_pc\",\"exceptionHandlingPolicy\":\"THROW\",\"config\":{\"tag\":\"NEMO_PC:sl-SI:*:*\", \"parameters\":{\"enableSplitToSentences\" : \"false\"}}}]";type=application/json' \
  --form 'data="[{\"tokens\":[{\"text\":\"Danes\"},{\"text\":\"je\"},{\"text\":\"lep\"},{\"text\":\"dan\"}]}]";type=application/json'
```
The above command produces the following HTTP request:

```
POST /api/pipelines/process HTTP/1.1
Host: api.true-bar.si
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW
Authorization: Bearer <access_token>
Content-Length: 462

------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="pipeline"
Content-Type: application/json

[{"task":"NLP_pc","exceptionHandlingPolicy": "THROW","config":{"tag":"NEMO_PC:sl-SI:*:*", "parameters":{"enableSplitToSentences" : "false"}}}]
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="data"
Content-Type: application/json

[{"tokens":[{"text":"Danes"},{"text":"je"},{"text":"lep"},{"text":"dan"}]}]
------WebKitFormBoundary7MA4YWxkTrZu0gW--
```
The response for the above request is returned as a JSON-formatted list of text segments:

```json
[
  {
    "isFinal": true,
    "tokens": [
      { "isLeftHanded": false, "isRightHanded": false, "text": "Danes" },
      { "isLeftHanded": false, "isRightHanded": false, "text": "je" },
      { "isLeftHanded": false, "isRightHanded": false, "text": "lep" },
      { "isLeftHanded": false, "isRightHanded": false, "text": "dan" },
      { "isLeftHanded": true, "isRightHanded": false, "text": "." }
    ]
  }
]
```
Executing a speech-to-text pipeline works the same way, but with a different input data format. Consider the following speech recognition example:

```bash
curl --location 'https://api.true-bar.si/api/pipelines/process' \
  --header 'Content-Type: multipart/form-data' \
  --header 'Authorization: Bearer <access_token>' \
  --form 'pipeline="[{\"task\":\"ASR\",\"exceptionHandlingPolicy\":\"THROW\",\"config\":{\"tag\":\"KALDI:sl-SI:COL:20211214-1431\", \"parameters\":{\"enableUnks\":false, \"enableSd\" : false, \"enableInterims\": true}}}]";type=application/json' \
  --form 'data=@"/path/to/audio.wav"'
```
The above command results in the following HTTP request:

```
POST /api/pipelines/process HTTP/1.1
Host: api.true-bar.si
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW
Authorization: Bearer <access_token>
Content-Length: 457

------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="pipeline"
Content-Type: application/json

[{"task":"ASR","exceptionHandlingPolicy": "THROW","config":{"tag":"KALDI:sl-SI:COL:20211214-1431", "parameters":{"enableUnks":false, "enableSd" : false, "enableInterims": true}}}]
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="data"; filename="1second.wav"
Content-Type: application/octet-stream

<audio_data>
------WebKitFormBoundary7MA4YWxkTrZu0gW--
```
The above examples cover all possible input types, while the output in both cases was a list of text segments. There are two more cases that produce audio output. We won't give full examples here, as the request looks the same as in the examples above; the only difference is that the returned result is not a list of text segments but binary data representing WAV-encoded audio.
#### Streaming API

The Streaming API is primarily designed for real-time transcription of audio data. It can also be used with TTS (text-to-speech) or STS (speech-to-speech) pipelines, but users should be aware of certain limitations: while data is streamed in both directions, synthesis does not occur in true real time as speech recognition does.
##### Protocol description

The Streaming API is implemented using the WebSocket protocol. It operates by sending small data chunks over an open WebSocket connection. Due to WebSocket's bidirectional and asynchronous nature, responses can also be streamed back incrementally as soon as they become available.
The following example demonstrates how to establish a WebSocket connection using JavaScript:
```js
let ws = new WebSocket("wss://api.true-bar.si/api/pipelines/stream?access_token={access_token}");
```
In this example, the `access_token` URL parameter is optional. It may be omitted if the token is provided by other means. For instance, if the WebSocket client supports modifying HTTP headers during the WebSocket upgrade request, the access token can be sent as a Bearer token in the `Authorization` header.
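For example, with the `ws` package in Node (an assumption; any client that allows custom upgrade headers works the same way):

```js
import WebSocket from "ws"; // npm install ws

// `accessToken` is assumed to hold a valid token obtained as described earlier.
const ws = new WebSocket("wss://api.true-bar.si/api/pipelines/stream", {
  headers: { Authorization: `Bearer ${accessToken}` },
});
```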
If the connection is denied or an error occurs before the WebSocket upgrade completes, an HTTP error response will be returned with the appropriate status code and description. If an error occurs after the WebSocket connection is established, it will be reported as a WebSocket message.
There are two basic types of WebSocket messages: binary messages, which are used for transferring audio data from the client to the server, and text messages, which are used for transferring JSON-formatted objects in both directions. All text messages are valid JSON objects and share a common structure:

```json
{
  "type": "CONFIG | EOS | STATUS | WARNING | ERROR | TEXT_SEGMENT"
}
```

The rest of the structure depends on the `type` property.
**CLIENT → SERVER messages**

`CONFIG` - Contains the pipeline definition used to configure the session. If the optional `sessionId` parameter is given, the server searches for a session with the given id and resumes it if it exists. Otherwise, a new session is created on the fly. Each created session can be resumed multiple times. Resuming an existing session creates a new recording entity on the server, but it does not create a new session entity. All recordings created as part of an existing or new session can later be accessed individually or as part of the session to which they belong.

```json
{
  "type": "CONFIG",
  "sessionId": "<number>",
  "pipeline": "<Pipeline definition as described in the previous chapter.>"
}
```

`EOS` - The End-Of-Stream message requests closing a session. This must be the last message sent to the server. The optional `lockSession` attribute requests that the server lock the session after it is finished; the key to unlock it is returned as part of the FINISHED status update from the server.

```json
{
  "type": "EOS",
  "lockSession": "true | false"
}
```
**SERVER → CLIENT messages**

`STATUS` - Sent by the server on every session state transition. There are several variants of this message, depending on the specific state transition.

`INITIALIZED` - Indicates that the session has been successfully initialized. The client must wait until this message arrives before attempting to send any message through the opened WebSocket connection.

```json
{
  "type": "STATUS",
  "status": "INITIALIZED"
}
```

`CONFIGURED` - Indicates that the session has started. It is received after the server successfully processes the `CONFIG` message and reserves all necessary resources for the session.

```json
{
  "type": "STATUS",
  "status": "CONFIGURED",
  "sessionId": "<number>",  // Id of the created session
  "recordingId": "<number>" // Id of the created recording
}
```

`FINISHED` - Indicates that the session has finished successfully. At this point all results have been returned to the client, so it is safe to close the WebSocket connection.

```json
{
  "type": "STATUS",
  "status": "FINISHED",
  "lengthMs": "<number>",      // Length of the created recording in milliseconds
  "sessionLockKey": "<string>" // The key to unlock the session if locking was requested in the EOS message
}
```

`WARNING` - Reports non-critical issues such as soft quota limits or recoverable errors. The session remains usable.

```json
{
  "type": "WARNING",
  "message": "<string>"
}
```

`ERROR` - Indicates a critical error while processing the stream. The session is no longer usable, and the client should disconnect. Errors may be triggered based on the pipeline configuration (e.g., `exceptionHandlingPolicy` settings).

```json
{
  "type": "ERROR",
  "error": {
    "id": "<string>",
    "timestamp": "<string>",
    "message": "<string>"
  }
}
```
**SERVER ↔ CLIENT messages**

`TEXT_SEGMENT` - Represents a result in the form of a text segment.

```json
{
  "type": "TEXT_SEGMENT",
  "textSegment": {
    // ... same structure as described in the previous chapter
  }
}
```
The typical connection flow can be represented by the following diagram:

For a more in-depth description of specific use cases, please see the following tutorials:
## History

### List of sessions

All STT and TTS sessions created through the WebSocket or HTTP API can be accessed through the HTTP API endpoints described in this section.
The following example shows how to obtain a list of all sessions the user can read:
```bash
curl --location --request GET 'https://api.true-bar.si/api/client/sessions' \
  --header 'Authorization: Bearer <access_token>'
```
Response:

```json
{
  "cursor": "<cursor_value>",        // Cursor pointing to the location of a session entry
  "content": [                       // List of sessions
    {
      "id": "<number>",              // Session identifier
      "name": "<string>",            // Name of the session
      "status": "INITIALIZING | IN_QUEUE | IN_PROGRESS | FINISHED | CANCELED | ERROR",
      "numRecordings": "<number>",   // Number of recordings created under the same session
      "recordedMs": "<number>",      // Total recorded milliseconds
      "processedMs": "<number>",     // Total processed milliseconds
      "createdAt": "<string>",       // Creation date and time
      "updatedAt": "<string>",       // Date and time of the last update (e.g. new recording)
      "createdByUser": "<string>",   // Username under which the session was created
      "createdByGroup": "<string>",  // Group under which the session was created
      "isLocked": "<boolean>",       // True if the session is locked
      "isDiscarded": "<boolean>",    // True if the session is discarded
      "notes": "<string>",           // Session notes
      "labels": [                    // List of labels assigned to the session
        {
          "isEnabled": "<boolean>",
          "label": {
            "id": "<number>",
            "code": "<string>",
            "color": "<string>",
            "isDefault": "<boolean>",
            "isAssigned": "<boolean>"
          }
        }
        // Some fields not relevant for normal users are omitted.
      ]
    }
  ]
}
```
#### Cursor-based slicing

Instead of traditional pagination (`offset`, `limit`), this endpoint supports cursor-based slicing, which is better suited to dynamic collections like the session history that change frequently (sessions being added, deleted, or updated).

The following parameters are provided for working with slices:

- `slice-cursor` - Cursor to the beginning of a slice. Points to a location in the collection.
- `slice-length` - The number of sessions to return:
  - Positive value: returns n sessions after `slice-cursor`.
  - Negative value: returns n sessions before `slice-cursor`.
If `slice-cursor` is not provided, the API returns the first 20 sessions by default, along with cursor values for each session.

To fetch the next slice, use the `slice-cursor` of the last session received in the previous response. This approach ensures no sessions are skipped or duplicated, even if the collection changes between requests.
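A sketch of walking the whole history slice by slice (parameter names as above; exactly where the next cursor is read from follows the response structure shown earlier):

```js
// Fetches all sessions by following slice cursors until an empty slice is returned.
async function fetchAllSessions(accessToken) {
  const sessions = [];
  let cursor = null;
  while (true) {
    const params = new URLSearchParams({ "slice-length": "20" });
    if (cursor !== null) params.set("slice-cursor", cursor);
    const response = await fetch(`https://api.true-bar.si/api/client/sessions?${params}`, {
      headers: { Authorization: `Bearer ${accessToken}` },
    });
    const slice = await response.json();
    if (slice.content.length === 0) break;
    sessions.push(...slice.content);
    cursor = slice.cursor; // Cursor of the last session received in this slice.
  }
  return sessions;
}
```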
#### Sorting

The endpoint also supports sorting the collection using the `sort` request parameter with the following values:

- `id`
- `name`
- `createdAt`
- `updatedAt`
- `numRecordings`
- `recordedSeconds`
#### Filtering

Optional filters can be provided to narrow down the results:

- `name` - Return only sessions whose names contain the given value,
- `label` - Return only sessions labeled with the given label,
- `created-after` - Return only sessions created after the given date and time,
- `created-before` - Return only sessions created before the given date and time.
#### Examples

The following example shows a request that returns the first 30 sessions containing the word "Test" in their names, sorted by creation time in ascending order:

```bash
curl --location --request GET 'https://api.true-bar.si/api/client/sessions?page=0&size=30&name=Test&sort=createdAt,asc' \
  --header 'Authorization: Bearer <access_token>'
```
### Specific session details

This endpoint provides detailed information about a single session. If the session identifier is known in advance, this method is more efficient than querying the entire session collection, as it directly retrieves data for that session.
Request:
```bash
curl --location --request GET 'https://api.true-bar.si/api/client/sessions/<session-id>' \
  --header 'Authorization: Bearer <access_token>'
```

- `session-id`: Unique session identifier
Response:
The response contains the same details as a single entry from the session history list described above.
### Audio file

This endpoint provides access to the audio file created within a given session. It supports HTTP range requests, enabling the audio file to be transferred in smaller pieces. This is useful for streaming audio files instead of downloading the entire file before playback.
Request:
```bash
curl --location --request GET 'https://api.true-bar.si/api/client/sessions/<session-id>/audio.wav' \
  --header 'Authorization: Bearer <access_token>'
```

- `session-id`: Unique session identifier
- `Range` header (optional): Requests a byte range of the audio file, allowing it to be retrieved in smaller pieces for streaming playback.
If you're working with an audio player that doesn't support modifying request headers (like adding an Authorization header), you can pass the `access_token` directly as a query parameter instead of using the Bearer token in the header.
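For example, in a browser (where `sessionId` and `accessToken` are assumed to be available):

```js
// Stream the session audio in a player that cannot set custom headers,
// passing the token as a query parameter as described above.
const audio = new Audio(
  `https://api.true-bar.si/api/client/sessions/${sessionId}/audio.wav?access_token=${accessToken}`
);
audio.play(); // The browser issues HTTP Range requests as needed during playback.
```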
### Session text-segments

Returns the list of text segments associated with the session.
Request:
```bash
curl --location --request GET 'https://api.true-bar.si/api/client/sessions/<session-id>/transcripts' \
  --header 'Authorization: Bearer <access_token>'
```

- `session-id`: Unique session identifier
Response:
[ { "id":"<number>", // Text-segment identifier "content":"<JSON string>" // JSON encoded transcript content with the same structure as transcripts received trough Websocket API. }, ...]
### Session recordings

Returns the list of recordings created within the session. The request supports a pageable interface.
Request:
```bash
curl --location --request GET 'https://api.true-bar.si/api/client/sessions/<session-id>/recordings' \
  --header 'Authorization: Bearer <access_token>'
```

- `session-id`: Unique session identifier
- `page` (optional): Page number
- `size` (optional): Page size
Response:
{ "totalPages": "<number>", // Total number of pages available "totalElements": "<number>", // Total number of elements available on all pages "first": "<boolean>", // True if this is the first page "last": "<boolean>", // True if this is the last page "number": "<number>", // Current page number "size": "<number>", // Number of elements on current page "empty": "<boolean>", // True if this page is empty "content": [ { "id": "<number", // Recording identifier "duration": "<number>", // Length of the recording in seconds "isDiscarded": "<boolean>", // Should always be true for normal users "pipeline" <json> // A pipelines definition that was used with recording. } ] }
### Session sharing

By default, sessions are visible only to the user who created them and to a group admin. The session sharing mechanism allows a user to share their sessions with other group members. Several endpoints are available for managing session shares.
#### List session shares

Request:

```bash
curl --location --request GET 'https://api.true-bar.si/api/client/sessions/<session-id>/shares' \
  --header 'Authorization: Bearer <access_token>'
```
Response:
[ { "id": "<number>", // share id "createdAt": "<string>", "sharedWith": { "id": "<number>", // user id "username": "<string>" } }, ...]
#### Adding session share

The list of users with whom a session can be shared can be obtained with the following request.

Request:

```bash
curl --location --request GET 'https://api.true-bar.si/api/client/users' \
  --header 'Authorization: Bearer <access_token>'
```
Response:
[ { "id": "<number>", "username": "<string>" }, { "id": "<number>", "username": "<string>" }, ...]
A session can be shared with a selected user using the following request:

```bash
curl --location --request POST 'https://api.true-bar.si/api/client/sessions/<session-id>/shares' \
  --header 'Authorization: Bearer <access_token>' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "userId": <user-id> }'
```
#### Delete session share

```bash
curl --location --request DELETE 'https://api.true-bar.si/api/client/sessions/shares/<share-id>' \
  --header 'Authorization: Bearer <access_token>'
```
## Advanced features

Apart from the basic features exposed through the Truebar API, there are also more advanced features that can be used to achieve the highest possible recognition accuracy and the best possible user experience.
### Replacements

## Response statuses and errors

### HTTP API

This section describes the expected HTTP response statuses for the endpoints described in this document.
Response statuses for the various HTTP methods when the operation is successful:

- `GET`: 200 OK, along with a response body;
- `POST`: 201 CREATED and an automatically generated unique resource identifier in the response body, or 202 ACCEPTED for async operations;
- `PATCH`: 204 NO_CONTENT without a response body;
- `DELETE`: 204 NO_CONTENT without a response body.
Response statuses for errors:

- `400 BAD_REQUEST` - The request could not be processed due to something that is perceived to be a client error;
- `401 UNAUTHORIZED` - The request has not been applied because it lacks valid authentication credentials (JWT invalid or not present);
- `403 FORBIDDEN` - The server understood the request but refuses to authorize it (e.g. insufficient permissions);
- `404 NOT_FOUND` - The requested resource was not found on the server;
- `405 METHOD_NOT_ALLOWED` - The resource does not support the given request method (e.g. a POST request on a URL that only supports GET);
- `409 CONFLICT` - The resource being added or modified already exists on the server;
- `415 UNSUPPORTED_MEDIA_TYPE` - The uploaded media file format is not supported;
- `500 SERVER_ERROR` - The request could not be processed because of an internal server error.
Unless otherwise noted, any request described in this document should return one of the response statuses above. If there was an error processing the request, the response body is returned in the following format:

```json
{
  "id": "<string>",
  "timestamp": "<string>",
  "message": "<string>"
}
```

The returned object contains a message with a short error description. In general, the message together with the HTTP response status is sufficient for the client to know what went wrong. This is not the case when the error triggers response status 500, because the response body then contains only a generic error message. If this happens, please contact our support with the returned error id.
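In client code this means every non-2xx response can be handled uniformly. A sketch:

```js
// Wraps fetch and turns API error bodies ({ id, timestamp, message }) into exceptions.
async function apiFetch(url, options = {}) {
  const response = await fetch(url, options);
  if (!response.ok) {
    const error = await response.json().catch(() => ({}));
    throw new Error(
      `HTTP ${response.status}: ${error.message ?? "unknown error"} (id: ${error.id ?? "n/a"})`
    );
  }
  return response;
}
```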
## Migrating from 2.x.x to 3.x.x

True-Bar service version 3.x.x comes with a new feature called *pipelines*. Pipelines are a powerful concept that enables full customization of the data processing flow. Unfortunately, they also break compatibility with previous API versions. This section provides an overview of the steps required to migrate existing integrations to the new API version.
Although not visible to the end user, the previous API version also used a primitive concept of a pipeline internally. This pipeline was static, i.e. the user could not add, remove, or change the order in which its stages were executed. The only configuration exposed to the end user was the one available at the `api/client/configuration` endpoint, which was internally mapped to pipeline parameters. This configuration offered only a subset of the available options, and because it was stored on the server, it also caused issues with concurrent sessions created by the same user.

True-Bar service version 3.x.x no longer stores user configuration. Instead, the configuration must be passed each time a new session is created, or along with every NLP request, as a pipeline definition, i.e. a sequence of stages and their configuration. For more information on how to create a pipeline definition, see the chapter on pipelines.

The following subsections provide a quick overview of how to migrate different clients.
### Offline (REST) file upload clients

The endpoint for offline file processing has moved and is now available at `/api/pipelines/process` instead of `/api/client/upload`. Because user configuration is no longer stored on the server, the endpoint now accepts an additional form field named `pipeline` that contains the pipeline definition. The pipeline definition must be supplied as a serialized JSON string representing the list of stages and their configuration options.
Example of the file upload procedure on API v2.x.x. The first request patches the user configuration:

```bash
curl --location --request PATCH 'https://api.true-bar.si/api/client/configuration' \
  --header 'Authorization: Bearer <access_token>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "stt": {
      "framework": "NEMO",
      "language": "sl-SI",
      "domain": "COL",
      "model": "20221208-0800",
      "enableInterims": true
    },
    "nlp": {
      "punctuation": {
        "enabled": {
          "value": true,
          "enableRealFinals": true
        }
      }
    }
  }'
```

And then the second request uploads a file:

```bash
curl --location --request POST 'https://api.true-bar.si/api/client/upload?async=<true|false>' \
  --header 'Authorization: Bearer <access_token>' \
  --form 'file=@"/path/to/audio.wav"'
```
And here is how processing the same file looks when using API v3.x.x:

```bash
curl --location --request POST 'https://api.true-bar.si/api/pipelines/process?async=<true|false>' \
  --header 'Authorization: Bearer <access_token>' \
  --form 'pipeline=[
    {
      "task": "ASR",
      "exceptionHandlingPolicy": "THROW",
      "config": {
        "tag": "NEMO_ASR:sl-SI:COL:20221208-0800",
        "parameters": { "enableInterims": false }
      }
    },
    {
      "task": "NLP_pc",
      "exceptionHandlingPolicy": "SKIP",
      "config": {
        "tag": "NEMO_PUNCTUATOR:sl-SI:*:*",
        "parameters": { "enableSplitToSentences": true }
      }
    }
  ]' \
  --form 'file=@"/path/to/file"'
```
### Streaming (Websocket) clients

The following changes were made to the WebSocket API:

- Clients may now send both text and binary WebSocket messages instead of only binary messages.
- The first message after the connection is established is expected to be a text message containing the pipeline configuration.
- Instead of the empty packet previously used to indicate the end of the stream, a special EOS text message must now be sent. Empty binary messages are now considered invalid.
- After the session finishes, the WebSocket connection is no longer closed automatically by the server. The client is notified of success or failure with an appropriate status/error message. The logical session is therefore decoupled from the underlying transport-protocol session.
- Non-critical errors and warnings are reported as text messages and no longer cause the session to terminate, as they did before.
- The structure of the messages returned by the server has changed.
A detailed description of the protocol used for establishing a streaming session can be found in the Protocol description section above.