Truebar API v3.0
# New in this version
# Introduction
Welcome to the Truebar service guide! This document covers the fundamental features of the Truebar service.
The Truebar service exposes an HTTP API, offering a straightforward way to access a range of features. Primarily focused on speech processing, it supports various operations including transcription, punctuation, normalization, synthesis, and more.
For an initial test of your user account and associated privileges, you can use the Truebar dictation and transcript editor. Please make sure to read the QuickStart Guide (Eng, Slo).
The subsequent chapters of this guide serve as a step-by-step tutorial for leveraging basic API functionalities. For those seeking more advanced insights, our Swagger documentation provides more detailed information.
# Accessing the service
All examples in this guide presume that the service is accessible at 'true-bar.si'. This implies the following URLs for the various components:
- Truebar editor: true-bar.si
- Authentication service: auth.true-bar.si
- Truebar API: api.true-bar.si
Truebar editor is always accessible at the base domain, while the Truebar API and authentication service are available at their respective subdomains: 'api' and 'auth'.
The Truebar API provides an endpoint to check its status:
```bash
curl --location --request GET 'https://true-bar.si/api/info'
```
The response contains basic API information such as version, build date, and the path to OpenAPI specifications.
For users interested in testing WebSocket connections, there is also a WebSocket endpoint available at 'https://true-bar.si/api/ws-info'. This endpoint operates by accepting an incoming WebSocket connection, sending a single text message with content identical to the response above, and then closing the connection.
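A minimal sketch for trying this endpoint from a browser console. Note the `wss://` scheme is an assumption here (the WebSocket constructor requires ws/wss, while the URL above is written with https):

```javascript
// Connect, log the single info message, and observe the server-side close.
const infoSocket = new WebSocket("wss://true-bar.si/api/ws-info");
infoSocket.onmessage = (event) => console.log("API info:", event.data); // same content as GET /api/info
infoSocket.onclose = () => console.log("Connection closed by the server, as expected");
```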
# Authentication
All requests sent to the Truebar service must contain a valid access token, otherwise they will be denied with an HTTP 401 UNAUTHORIZED status code. Access tokens can be acquired by sending the following HTTP request to the authentication service:
```bash
curl --location --request POST 'https://auth.true-bar.si/auth/realms/truebar/protocol/openid-connect/token' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'grant_type=password' \
  --data-urlencode 'username={username}' \
  --data-urlencode 'password={password}' \
  --data-urlencode 'client_id=truebar-client'
```
A response is received in JSON format with the following structure:

```json
{
  "access_token": "<string>",
  "expires_in": "<number>",
  "refresh_expires_in": "<number>",
  "refresh_token": "<string>",
  "token_type": "<string>",
  "not-before-policy": "<number>",
  "session_state": "<string>",
  "scope": "<string>"
}
```
Each request sent to the Truebar service must be authenticated with an `access_token`. The token can be passed either as a Bearer token inside the HTTP request header or as a URL parameter. The first option is recommended, as it prevents the `access_token` from ending up in server logs or browser history.
The following example demonstrates how to send a request to the backend and pass the token via the Authorization header:

```bash
curl --location --request GET 'https://api.true-bar.si/...' \
  --header 'Authorization: Bearer {access_token}'
```
Despite the above-mentioned reasons to pass the `access_token` via the Authorization header, it is sometimes necessary to use the URL parameter instead. An example would be an application that does not allow modification of request headers (e.g. an audio player). The following example demonstrates how to send a request to the backend and pass the token via a URL parameter:

```bash
curl --location --request GET 'https://api.true-bar.si/...?access_token={access_token}'
```
The `access_token` is valid for `expires_in` seconds. After that time it expires and must be refreshed using the provided `refresh_token`:

```bash
curl --location --request POST 'https://auth.true-bar.si/auth/realms/truebar/protocol/openid-connect/token' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --data-urlencode 'grant_type=refresh_token' \
  --data-urlencode 'refresh_token={refresh_token}' \
  --data-urlencode 'client_id=truebar-client'
```

The `refresh_token` is valid for `refresh_expires_in` seconds. When this period elapses, the refresh token becomes invalid and the only way to acquire a new token is with user credentials (i.e. username and password).
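The same token flow can be implemented from JavaScript. This is an illustrative sketch, not an official client; it wraps both grant types shown in the curl examples above:

```javascript
const TOKEN_URL =
  "https://auth.true-bar.si/auth/realms/truebar/protocol/openid-connect/token";

// Request a token with either password or refresh_token grant parameters.
async function requestToken(params) {
  const response = await fetch(TOKEN_URL, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({ client_id: "truebar-client", ...params }),
  });
  if (!response.ok) throw new Error(`Token request failed: ${response.status}`);
  return response.json(); // { access_token, refresh_token, expires_in, ... }
}

async function example() {
  // Initial login with user credentials:
  let tokens = await requestToken({
    grant_type: "password",
    username: "{username}",
    password: "{password}",
  });
  // Refresh before `expires_in` seconds elapse:
  tokens = await requestToken({
    grant_type: "refresh_token",
    refresh_token: tokens.refresh_token,
  });
}
```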
# User roles
To use the various services provided by the Truebar API, multiple permissions are available. They are represented by roles (authorities) encoded within the JWT token. If the supplied JWT token does not contain the permissions required by the operation being executed, the request will be denied with a 403 FORBIDDEN response code.
The following permissions are currently available:
- API - basic authority required to use any operation provided by the Truebar API
- PIPELINE_ONLINE_API - permission to use the WebSocket pipeline API
- PIPELINE_OFFLINE_API - permission to use the file upload & JSON HTTP pipeline API
- STAGE_ASR - speech to text
- STAGE_NLP_PC - punctuation stage
- STAGE_NLP_TC - truecasing stage
- STAGE_NLP_TN - text normalization stage
- STAGE_NLP_ITN - inverse text normalization (denormalization) stage
- STAGE_NLP_G2A - accentuation stage
- STAGE_NLP_NMT - machine translation stage
- STAGE_NLP_AC - autocorrect stage
- STAGE_TTS - automatic speech synthesis
- DICTIONARY_READ - dictionary read operations
- DICTIONARY_WRITE - dictionary write operations
- MODEL_UPDATER_READ - ModelUpdater read operations
- MODEL_UPDATER_WRITE - ModelUpdater write operations
- SESSION_READ_DISCARDED - read access to discarded sessions
- SESSION_WRITE_DISCARDED - write access to discarded sessions
- SESSION_HARD_DELETE - permission to hard delete sessions and connected recordings/session-contents
- GROUP_READ - read access to resources belonging to the user's group
- GROUP_WRITE - write access to resources belonging to the user's group
- ALL_READ - read access to all resources
- ALL_WRITE - write access to all resources
To check which roles are associated with your user account, a JWT parser is needed. Use any of the free libraries available for this purpose, or try the following example:

```javascript
// Decode the JWT payload (the middle, base64url-encoded part of the token).
// For tokens containing '-' or '_' characters, convert base64url to base64 first.
let payload = JSON.parse(atob(token.split('.')[1]));
/* payload = {
     "realm_access": {
       "roles": ["API", "PIPELINE_ONLINE_API", ...]
     },
     ...other properties omitted for readability
   } */
```
# Pipelines
## Introduction
Pipelines are the core concept of the Truebar API. They consist of a series of stages that are executed one after another. A single stage represents a single NLP operation: it accepts input and produces output that may be either text or audio, depending on the operation it performs. Stages can therefore be divided into three distinct groups:
- speech to text - accepts audio data as input and returns text as output. An example of such a stage is ASR - automatic speech recognition;
- text to text - accepts text as input and returns text as output. Most operations fall into this category, for example punctuation, text normalization, etc.;
- text to speech - accepts text as input and returns audio data as output. An example of such a stage is speech synthesis.
A pipeline is built by taking one or more (compatible) stages and chaining them together. Two stages can be chained if their outputs and inputs are compatible: a stage's output must be of the same type as the input of the next stage. Data always flows from the first stage to the last, which means that a pipeline cannot contain loops. It does not mean, however, that a single pipeline cannot contain multiple stages performing the same task.
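To make the compatibility rule concrete, here is a small illustrative checker. The type table simply restates the three groups above; none of this is an official API:

```javascript
// Input/output data types per stage type, as described above.
const STAGE_IO = {
  STT: { input: "audio", output: "text" },
  TTT: { input: "text", output: "text" },
  TTS: { input: "text", output: "audio" },
};

// A pipeline is valid if every stage's output type matches the next stage's input type.
function isValidPipeline(stageTypes) {
  for (let i = 1; i < stageTypes.length; i++) {
    const prev = STAGE_IO[stageTypes[i - 1]];
    const next = STAGE_IO[stageTypes[i]];
    if (prev.output !== next.input) return false; // incompatible chain
  }
  return stageTypes.length > 0;
}

console.log(isValidPipeline(["STT", "TTT", "TTT"])); // true  (e.g. asr -> pc -> itn)
console.log(isValidPipeline(["TTT", "STT"]));        // false (text output cannot feed an audio input)
```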
## Stages
A stage is defined by the task it performs. The following is a list of stages implemented by the Truebar service:
- STT
  - asr - Automatic Speech Recognition
- TTT (NLP)
  - NLP_pc - Automatic Punctuation
  - NLP_tc - Automatic Truecasing
  - NLP_tn - Text Normalization
  - NLP_itn - Inverse Text Normalization
  - NLP_ac - Auto Correct
  - NLP_g2a - Accentuator
  - NLP_nmt - Neural Machine Translation
- TTS
  - tts - Text To Speech
Not all of the above stages may actually be available to all user accounts. In addition, each stage can also expose a set of configuration options that depend on the selected model, user permissions, etc. Before trying to build a pipeline with any of the above stages, it is recommended to request stage information from the Truebar service with the following API call:
```bash
curl --location --request POST 'https://api.true-bar.si/api/pipelines/stages' \
  --header 'Authorization: Bearer <access_token>'
```
The response is returned as a JSON encoded list of available stages:

```json
[
  {
    "task": "<string>",
    "type": "<string>",
    "config": [
      {
        "tag": "<string>",
        "features": ["<string>", ...],
        "parameters": {
          "parameterName": {
            "type": "<string>",
            "values": ["<object>", ...],
            "defaultValue": "<object>"
          },
          ...
        }
      }
    ]
  }
]
```
Each stage has the following properties:

- task defines the task performed by the stage. It can be any value listed at the beginning of this section;
- type defines the type of operation the stage performs, which in turn defines the input and output data types. Possible values are: STT, TTT, TTS;
- config contains a list of available config options:
  - tag is used as a "label" representing a specific model. It consists of four separate fields: `service-provider:language-code:domain-code:version`. Each part can either be specified by an alphanumeric code or by the `*` character representing a generic part. For example, the tag `KALDI:sl-SI:*:*` represents a model for the Slovenian language with an unspecified (generic) domain and version;
  - features are model-specific features that cannot be modified;
  - parameters is a dictionary of parameters supported by the model. Each parameter has the following properties:
    - name is the parameter name used when specifying the parameter,
    - type represents the parameter data type,
    - values is a list of valid values, present for parameters with a predefined set of available options,
    - defaultValue is the value used when the parameter is not explicitly set.
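As a practical illustration, the sketch below fetches the stage list and extracts the available ASR model tags. It assumes the POST endpoint shown above and the `asr` task name used in the pipeline examples later in this chapter:

```javascript
// List model tags available for the asr task.
async function listAsrTags(accessToken) {
  const response = await fetch("https://api.true-bar.si/api/pipelines/stages", {
    method: "POST",
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  const stages = await response.json();
  const asrStage = stages.find((stage) => stage.task === "asr");
  return asrStage ? asrStage.config.map((cfg) => cfg.tag) : [];
}
```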
### ASR (Automatic Speech Recognition)
- Input: Audio data
- Output: Text segments (transcripts)
- Features:
  - onlineAsr - the model supports online transcription and is therefore more appropriate for the streaming API,
  - offlineAsr - the model supports offline transcription and is therefore more appropriate for the offline (REST, file upload) API,
  - dictatedCommands - the model recognizes dictated commands and returns special tokens representing the spoken command,
  - dictatedPunctuations - the model recognizes dictated punctuation and returns special tokens representing the spoken punctuation.
- Parameters:
  - enableUnks - if enabled, the model returns the `<unk>` symbol for tokens that could not be recognized,
  - enableInterims - if enabled, the model returns interim (partial) responses,
  - enableDiarization - if enabled, the model returns information about the detected speaker.
### PC (Automatic Punctuation)
- Input: Text segments
- Output: Text segments
- Features: /
- Parameters:
  - enableSplitToSentences - if enabled, the resulting text segments will be split into sentences. If a text segment does not end with one of the final punctuation marks (.?!), it is not possible to form sentences from the whole segment; in this case the last part of the segment (from the last final punctuation mark to the end) is returned as an interim text segment. If the stage is used inside a streaming session, the last interim part is also cached and prepended to the next received text segment.
### TC (Automatic Truecasing)
- Input: Text segments
- Output: Text segments
- Features: /
- Parameters: /
### TN (Text Normalization)
- Input: Text segments
- Output: Text segments
- Features: /
- Parameters: /
### ITN (Inverse Text Normalization)
- Input: Text segments
- Output: Text segments
- Features: /
- Parameters: /
### AC (Auto Correct)
- Input: Text segments
- Output: Text segments
- Features: /
- Parameters: /
### G2A (Accentuator)
- Input: Text segments
- Output: Text segments
- Features: /
- Parameters: /
### NMT (Neural Machine Translation)
- Input: Text segments
- Output: Text segments
- Features: /
- Parameters: /
## Pipeline examples
A simple pipeline for basic audio transcription uses only the `asr` stage, configured with the model KALDI:sl-SI:COL:20221208-0800 and the `enableInterims` option enabled:

```json
[
  {
    "task": "asr",
    "config": {
      "tag": "KALDI:sl-SI:COL:20221208-0800",
      "parameters": { "enableInterims": true }
    }
  }
]
```
A more complex pipeline first uses the `asr` stage to transcribe the input audio and then the `pc` and `itn` stages to automatically insert punctuation and denormalize the result:

```json
[
  {
    "task": "asr",
    "config": {
      "tag": "KALDI:sl-SI:COL:20221208-0800",
      "parameters": { "enableInterims": true }
    }
  },
  {
    "task": "pc",
    "config": {
      "tag": "NEMO_PUNCTUATOR:sl-SI:*:*",
      "parameters": { "enableSplitToSentences": true }
    }
  },
  {
    "task": "itn",
    "config": {
      "tag": "DEFAULT_DENORMALIZER:sl-SI:*:*"
    }
  }
]
```
An example of an invalid pipeline, where the output type of the first stage does not match the input type of the second:

```json
[
  {
    "task": "pc",
    "config": { "tag": "NEMO_PUNCTUATOR:sl-SI:*:*" }
  },
  {
    "task": "asr",
    "config": { "tag": "KALDI:sl-SI:COL:20221208-0800" }
  }
]
```
# Pipeline execution
Pipelines can be executed with either the offline (HTTP) API or the online (WebSocket) API.
## Offline API
The offline API is implemented as a REST service, providing a straightforward method for pipeline invocation. As previously mentioned, each pipeline accepts a specific type of data defined by its initial stage and produces an output data type defined by its final stage. Consequently, the API endpoint is capable of accepting and generating arbitrary data types, as defined by the associated pipeline. Because pipelines are not predefined, the endpoint also requires the pipeline definition to be included along with the input data.
The endpoint for pipeline invocation is accessible at `https://api.true-bar.si/api/pipelines/process`. It accepts a multipart-formatted body, allowing the inclusion of both JSON (application/json) and binary (application/octet-stream) body parts. JSON parts are used for passing the pipeline definition, and for passing text segments when the pipeline requires text-based input; binary parts are used for passing audio data when the pipeline requires audio-based input. The response is either JSON or binary, depending on the pipeline output.
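For browser-based clients, the same call can be made with FormData. This is an illustrative sketch; the part names ("pipeline", "data") follow the curl examples in the next section:

```javascript
// Invoke the offline API with a text-input pipeline.
async function processTextPipeline(accessToken, pipeline, textSegments) {
  const form = new FormData();
  form.append("pipeline", new Blob([JSON.stringify(pipeline)], { type: "application/json" }));
  form.append("data", new Blob([JSON.stringify(textSegments)], { type: "application/json" }));
  const response = await fetch("https://api.true-bar.si/api/pipelines/process", {
    method: "POST",
    // Do not set Content-Type yourself; the browser adds the multipart boundary.
    headers: { Authorization: `Bearer ${accessToken}` },
    body: form,
  });
  return response.json(); // a list of text segments, for pipelines with text output
}
```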
## Data types and formats
Text data is always given as a JSON encoded list of text segments. This holds true for data both sent to and received from the server. Each text segment contains a list of tokens. Below is an example of the structure with both required and optional properties.
[ { "isFinal": "true | false", "startTimeMs": "<number>", "endTimeMs": "<number>", "tokens": [ { "startOffsetMs": "<number>", "endOffsetMs": "<number>", "isLeftHanded": "true | false", "isRightHanded": "true | false", "text": "<string>" } ] }, ...]
- isFinal - If this flag is set to true, the text segment represents a final hypothesis. For now this is only relevant for live STT and is further described in the next section.
- startTimeMs - The beginning timestamp of the text segment in milliseconds, relative to the start of the session. For now this is only relevant for STT, and enables alignment of text and audio.
- endTimeMs - The ending timestamp of the text segment in milliseconds, relative to the start of the session. For now this is only relevant for STT, and enables alignment of text and audio.
- tokens - A list of tokens contained in the text segment.
  - startOffsetMs - An offset in milliseconds from the text segment's startTimeMs property, representing the beginning timestamp of the token.
  - endOffsetMs - An offset in milliseconds from the text segment's startTimeMs property, representing the ending timestamp of the token.
  - isLeftHanded - True if the token is left-handed, false otherwise.
  - isRightHanded - True if the token is right-handed, false otherwise.
  - text - The actual text content of the token.

The only required properties are the 'tokens' property of a text segment and the 'text' property of each token. All other properties are optional and depend on the actual operation being executed.
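A small sketch for joining a segment's tokens into a display string. Based on the punctuation example later in this chapter (the final "." token is left-handed), we assume a left-handed token attaches to its left neighbour (no space before it) and a right-handed token attaches to its right neighbour (no space after it); treat this as an assumption, not a specification:

```javascript
// Join tokens into readable text, honouring token handedness.
function segmentToString(segment) {
  let out = "";
  let suppressSpace = true; // no space before the first token
  for (const token of segment.tokens) {
    if (!suppressSpace && !token.isLeftHanded) out += " ";
    out += token.text;
    suppressSpace = token.isRightHanded === true; // right-handed: no space after
  }
  return out;
}

// segmentToString({ tokens: [
//   { text: "Danes" }, { text: "je" }, { text: "lep" },
//   { text: "dan" }, { isLeftHanded: true, text: "." }
// ] }) === "Danes je lep dan."
```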
Audio data, in contrast, can come in various formats and be encoded with any codec, provided it can be decoded by the service. We support a wide array of popular audio and video formats; for a comprehensive list, please reach out to our support team. Note that containers with multiple audio tracks may pose challenges, as the service cannot determine which audio track to decode. As a rule of thumb, to optimize compatibility and ensure high-quality results, we suggest using lossless audio formats such as WAV or FLAC.
## Examples
The following is an example of processing a pipeline with a single punctuation stage, which means the endpoint expects and produces text-based data.
Curl command:
```bash
curl --location 'http://true-bar.si/api/pipelines/process' \
  --header 'Content-Type: multipart/form-data' \
  --header 'Authorization: Bearer <access_token>' \
  --form 'pipeline="[{\"task\":\"NLP_pc\",\"config\":{\"tag\":\"NEMO_PC:sl-SI:*:*\", \"parameters\":{\"enableSplitToSentences\" : \"false\"}}}]";type=application/json' \
  --form 'data="[{\"tokens\":[{\"text\":\"Danes\"},{\"text\":\"je\"},{\"text\":\"lep\"},{\"text\":\"dan\"}]}]";type=application/json'
```
The above command produces the following HTTP request:
```http
POST /api/pipelines/process HTTP/1.1
Host: true-bar.si
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW
Authorization: Bearer <access_token>
Content-Length: 462

------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="pipeline"
Content-Type: application/json

[{"task":"NLP_pc","config":{"tag":"NEMO_PC:sl-SI:*:*", "parameters":{"enableSplitToSentences" : "false"}}}]
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="data"
Content-Type: application/json

[{"tokens":[{"text":"Danes"},{"text":"je"},{"text":"lep"},{"text":"dan"}]}]
------WebKitFormBoundary7MA4YWxkTrZu0gW--
```
The response for the above request is returned as a JSON formatted list of text segments:

```json
[
  {
    "isFinal": true,
    "tokens": [
      { "isLeftHanded": false, "isRightHanded": false, "text": "Danes" },
      { "isLeftHanded": false, "isRightHanded": false, "text": "je" },
      { "isLeftHanded": false, "isRightHanded": false, "text": "lep" },
      { "isLeftHanded": false, "isRightHanded": false, "text": "dan" },
      { "isLeftHanded": true, "isRightHanded": false, "text": "." }
    ]
  }
]
```
Execution of speech-to-text pipelines is the same, but with a different input data format. Let's look at the following speech recognition example:

```bash
curl --location 'http://true-bar.si/api/pipelines/process' \
  --header 'Content-Type: multipart/form-data' \
  --header 'Authorization: Bearer <access_token>' \
  --form 'pipeline="[{\"task\":\"ASR\",\"config\":{\"tag\":\"KALDI:sl-SI:COL:20211214-1431\", \"parameters\":{\"enableUnks\":false, \"enableDiarization\" : false, \"enableInterims\": true}}}]";type=application/json' \
  --form 'data=@"/path/to/audio.wav"'
```
The above command results in the following HTTP request:

```http
POST /api/pipelines/process HTTP/1.1
Host: true-bar.si
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW
Authorization: Bearer <access_token>
Content-Length: 457

------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="pipeline"
Content-Type: application/json

[{"task":"ASR","config":{"tag":"KALDI:sl-SI:COL:20211214-1431", "parameters":{"enableUnks":false, "enableDiarization" : false, "enableInterims": true}}}]

------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="data"; filename="1second.wav"
Content-Type: application/octet-stream

<audio_data>
------WebKitFormBoundary7MA4YWxkTrZu0gW--
```
The above examples cover all possible types of input, while the output was in both cases a list of text segments. There are two more cases, which produce audio data at the output. We won't give full examples here, as the request looks the same as in the above examples. The only difference is that the result the endpoint returns is not a list of text segments, but binary data representing WAV-encoded audio.
## Online API
The online API is currently only available for STT pipelines (i.e. pipelines that accept audio data as input and return text segments).
## Protocol description
The Truebar service provides real-time transcription of audio data. This is achieved by sending small pieces of audio data (chunks) through a WebSocket connection. Because WebSocket is an asynchronous, bidirectional protocol, transcripts can be returned as soon as they are available.

The following example shows how to establish a WebSocket connection from JavaScript:

```javascript
let ws = new WebSocket("wss://api.true-bar.si/ws?access_token={access_token}");
```
The 'access_token' URL parameter in the above example is optional: it can be omitted if the access_token is provided by other means. For example, if the WebSocket client supports modifying HTTP headers when performing the UPGRADE request, the access_token can be sent as a Bearer token via the Authorization header. If the connection attempt is denied, or an error happens before the connection is upgraded to WebSocket, the error is returned as an HTTP response with an appropriate status code and description. If an error happens after the WebSocket connection is established, it is reported as a WebSocket message.
There are two basic types of WebSocket messages: binary messages, which are used for transferring audio data from the client to the server, and text messages, which are used for transferring JSON formatted objects in both directions. A JSON message always has the following structure:

```json
{
  "messageType": "CONFIG | EOS | STATUS | TEXT_SEGMENT"
}
```
Different message types have different sub-structures:

CLIENT -> SERVER messages

CONFIG
- Contains the pipeline definition used to configure the session. If the optional `sessionId` parameter is given, the server will search for a session with the given id and resume it, if it exists. Otherwise, if the parameter is not given, a new session is created on the fly. Each created session can be resumed multiple times. Resuming an existing session creates a new recording entity on the server, but does not create a new session entity. All recordings created as part of an existing or new session can later be accessed individually or as part of the session to which they belong.

```json
{
  "messageType": "CONFIG",
  "sessionId": "<number>",
  "pipeline": "<List of stages representing pipeline definition as described in previous chapter.>"
}
```

EOS
- The End-Of-Stream message is used to request closing a session. It must be the last message sent to the server. The optional 'lockSession' attribute requests that the server lock the session after it is finished. The key to unlock it is returned as part of the FINISHED status update from the server.

```json
{
  "messageType": "EOS",
  "lockSession": "true | false"
}
```
SERVER -> CLIENT messages

STATUS
- Sent by the server on every session state transition. There are several variants of this message, depending on the specific transition.

INITIALIZED
- Indicates that the session has been successfully initialized. The client must wait until this message arrives before attempting to send any message through the opened WebSocket connection.

```json
{
  "messageType": "STATUS",
  "status": "INITIALIZED"
}
```

CONFIGURED
- Indicates that the session has started. It is received after the server successfully processes the CONFIG message and reserves all necessary resources.

```json
{
  "messageType": "STATUS",
  "status": "CONFIGURED",
  "sessionId": "<number>",    // Id of created session
  "recordingId": "<number>"   // Id of created recording
}
```

FINISHED
- Indicates that the session has been successfully finished. At this point all results have been returned to the client, so it is safe to close the WebSocket connection.

```json
{
  "messageType": "STATUS",
  "status": "FINISHED",
  "lengthMs": "<number>",        // Length of created recording in milliseconds
  "sessionLockKey": "<string>"   // The key to unlock the session, if locking was requested in the EOS message
}
```

TEXT_SEGMENT
- Represents a result in the form of a text segment.

```json
{
  "messageType": "TEXT_SEGMENT",
  "textSegment": {
    // ... same structure as described in the previous chapter
  }
}
```
A typical connection flow looks like this: the client connects and waits for the INITIALIZED status, sends a CONFIG message and waits for the CONFIGURED status, streams audio chunks while receiving TEXT_SEGMENT messages, and finally sends an EOS message and waits for the FINISHED status before closing the connection.
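Below is a minimal sketch of this flow in JavaScript, assuming the message types described above. `pipeline` is a pipeline definition as described in the Pipelines chapter; `streamAudio` is a hypothetical callback you provide for sending the binary audio chunks:

```javascript
// End-to-end session flow: INITIALIZED -> CONFIG -> CONFIGURED -> audio -> EOS -> FINISHED.
function runSession(accessToken, pipeline, streamAudio) {
  const ws = new WebSocket(`wss://api.true-bar.si/ws?access_token=${accessToken}`);
  ws.onmessage = async (event) => {
    if (typeof event.data !== "string") return; // results arrive as text messages
    const msg = JSON.parse(event.data);
    if (msg.messageType === "STATUS" && msg.status === "INITIALIZED") {
      ws.send(JSON.stringify({ messageType: "CONFIG", pipeline }));
    } else if (msg.messageType === "STATUS" && msg.status === "CONFIGURED") {
      await streamAudio(ws); // send binary audio chunks, then signal end of stream:
      ws.send(JSON.stringify({ messageType: "EOS", lockSession: false }));
    } else if (msg.messageType === "STATUS" && msg.status === "FINISHED") {
      ws.close(); // all results delivered; safe to close
    } else if (msg.messageType === "TEXT_SEGMENT") {
      console.log("transcript:", msg.textSegment);
    }
  };
}
```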
## Establishing and configuring the session
When the WebSocket connection is established, the client must first await a status message with status INITIALIZED, which indicates that the server is ready to process further messages. After receiving it, the client must configure the session with a CONFIG message and await the CONFIGURED status response. Awaiting the CONFIGURED status response can be omitted, but it should be noted that there is no guarantee that configuration will succeed (e.g. invalid pipeline, busy workers, etc.). It is therefore advised that the client awaits the CONFIGURED message from the server before trying to send any audio data.
## Streaming audio data
After the WebSocket connection has been successfully established and the session configured, the user can begin sending audio chunks. Audio chunks must be sent as binary WebSocket messages with a maximum size of 64 KB. Each chunk must contain an array of 16-bit little-endian encoded integers sampled at 16 kHz, also known as S16LE PCM encoding. Only single-channel audio is supported.
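A sketch of preparing such chunks from WebAudio Float32 samples (already captured at 16 kHz, mono). The 64 KB limit mirrors the constraint above; note that typed arrays use the platform's byte order, which is little-endian on all mainstream platforms (use a DataView if you must guarantee it):

```javascript
const MAX_CHUNK_BYTES = 64 * 1024;

// Convert [-1, 1] float samples to S16LE PCM and split into <= 64 KB chunks.
function floatTo16BitChunks(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff; // scale to the 16-bit range
  }
  const chunks = [];
  for (let off = 0; off < pcm.byteLength; off += MAX_CHUNK_BYTES) {
    chunks.push(pcm.buffer.slice(off, off + MAX_CHUNK_BYTES));
  }
  return chunks; // send each with ws.send(chunk)
}
```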
To ensure real-time transcription, transcripts are asynchronously returned to the client as soon as they are available. Transcripts are returned in the following format:
{ "messageType": "TEXT_SEGMENT", "textSegment": { "isFinal": "<boolean>", // Determines if transcript is final or interim "startTimeMs": "<number>", // Segment start time in milliseconds relative to the beginning of the session "endTimeMs": "<number>", // Segment end time in milliseconds relative to the beginning of the session "tokens": [ { "isLeftHanded" : "<boolean>", // Determines if the token is left handed (implies a space before a token) "isRightHanded" : "<boolean>", // Determines if the token is right handed (implies a space after a token) "startOffsetMs": "<number>", // Token start time relative to the beginning of the segment "endOffsetMs": "<number>", // Token end time relative to the beginning of the segment "text": "<string>" // Token content } ] }}
There are essentially two types of transcripts, interims and finals, distinguished by the `isFinal` flag. While chunks of audio are being decoded on the backend, the transcript is updated with new content. Every time the transcript is updated, it is returned to the client as an interim transcript. When a certain transcript length is reached, or when no new words have been recognized for some time, the transcript is returned as final. Once a transcript is returned as final, its content will not be updated anymore; new content is instead returned as a new interim transcript, which is again incrementally updated until it becomes final.
The following example illustrates the above procedure:
- Interim : Da
- Interim : Danes
- Interim : Danes je
- Interim : Danes je lepo
- Interim : Danes je lepo da
- Interim : Danes je lep dan
- Final : Danes je lep dan.
- Interim : Jutri
- Interim : Jutri pa
- Interim : Jutri pa bo
- Interim : Jutri pa bo še lepši
- Final : Jutri pa bo še lepši.
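A sketch of accumulating a readable transcript from this stream, assuming each interim replaces the previous interim until a final arrives. Token joining is naive here; see segmentToString() earlier for handedness-aware joining:

```javascript
const finals = [];
let interim = "";

// Call for every received TEXT_SEGMENT message's textSegment.
function onTextSegment(segment) {
  const text = segment.tokens.map((t) => t.text).join(" ");
  if (segment.isFinal) {
    finals.push(text); // final content never changes again
    interim = "";
  } else {
    interim = text; // replaces the previous interim hypothesis
  }
  console.log([...finals, interim].join(" ").trim());
}
```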
## Punctuations and Commands
The Truebar transcription service can handle punctuation and commands. Punctuation is supported in two ways: a) using the automatic punctuation service - in this case, punctuation marks like comma, period, question mark, etc. are inserted into the transcript automatically by the punctuation service; b) by dictation - when this option is selected, punctuation symbols are expected to be dictated by the user (to check how to pronounce a specific punctuation mark, refer to the QuickStart Guide (Eng, Slo)).
Which option the user wants to use can be specified through the configuration. The two options are not mutually exclusive, i.e. they can both be used at the same time. In the transcript returned by the Truebar transcription service, the punctuation symbols, whether set by the automatic service or dictated, are treated as stand-alone objects and returned as symbols enclosed in angle brackets, e.g. `<.>` for a full stop or `<?>` for a question mark.
Apart from punctuation, the Truebar transcription service can handle specific commands that can be dictated by the user, for example a command for making a new line or a new paragraph within the transcript. Other commands include uppercase, lowercase, capital letter, deletion, substitution, etc. For the full list please check the QuickStart Guide (Eng, Slo). Similarly to the punctuation symbols, all supported commands are returned as stand-alone objects enclosed in angle brackets, for example `<nl>` for a new line, `<np>` for a new paragraph, `<uc>` for the beginning of an all-uppercase passage of the transcript, and `</uc>` for its end.
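A small sketch of rendering these stand-alone tokens. The map covers only the symbols mentioned in this section; the full list is in the QuickStart Guide, and this rendering is an illustrative assumption:

```javascript
// Map punctuation/command tokens to output text.
const TOKEN_MAP = {
  "<.>": ".",
  "<?>": "?",
  "<nl>": "\n",
  "<np>": "\n\n",
};

function renderToken(tokenText) {
  if (tokenText in TOKEN_MAP) return TOKEN_MAP[tokenText];
  // Paired commands such as <uc>...</uc> need stateful handling in a real client.
  return tokenText;
}
```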
## Closing session
Transcription can be completed in either of two ways: a) by closing the WebSocket from the client side, or b) by sending an EOS message.
When closing the WebSocket connection from the client side, a short audio chunk at the end may not get transcribed. This happens when audio chunks the backend has already received were not yet processed at the time the client closed the connection. Nevertheless, these chunks are stored in the database and remain accessible later via an API call.

If the client can wait until the transcription is complete, the second approach is more appropriate. With this approach, the client sends an EOS message indicating that there is no more data to be sent. From that point on, no more audio chunks are accepted by the server, but transcription of the already received audio continues until all chunks are processed. When all transcripts have been sent to the client, the server responds with the FINISHED status. After that, the WebSocket connection can safely be closed.
# History
## List of sessions
All sessions created through the WebSocket or HTTP API can later be accessed through the HTTP API endpoints described in this section. All endpoints described here are protected with a JWT token; for details about obtaining a valid JWT token, please refer to the Authentication section. After a valid token has been acquired, it has to be included in all requests that follow.
The following example shows how to obtain a list of past sessions:
```bash
curl --location --request GET 'api.true-bar.si/api/client/sessions' \
  --header 'Authorization: Bearer <access_token>'
```
Response:
{ "cursor" : "<corsor_value>" // Cursor pointing to a location of session entry "content": [ // List of sessions { "id": "<number>", // Session identifier "name": "<string>", // Name of the session "status" : "INITIALIZING | IN_QUEUE | IN_PROGRESS | FINISHED | CANCELED | ERROR", "numRecordings": "<number>", // Number of recordings created under the same session "recordedMs": "<number>", // Total recorded milliseconds "processedMs": "<number>", // Total processed milliseconds "createdAt": "<string>", // Creation date and time "updatedAt": "<string>", // Date and time of last update (eg. new recording) "createdByUser": "<string>", // Username under which the session was created "createdByGroup" : "<string>", // Group under which the session was created "isLocked" : "<boolean>", // True if session is locked. "isDiscarded" : "<boolean>", // True if session is discarded. "notes": "<string>", // Session notes "labels": [ // List of labels assigned to session { "isEnabled": "<boolean>", "label": { "id": "<number>", "code": "<string>", "color": "<string>", "isDefault": "<boolean>", "isAssigned": "<boolean>" } } // Some fields not relevant for normal users are omitted. ] } ]}
Because the collection of sessions is typically much larger than what can be returned in a single response, the endpoint implements a method of retrieving sessions in smaller chunks. Normally an endpoint would implement a pageable interface by providing offset and limit parameters. This approach works well on static collections whose elements don't change often. A collection of sessions, on the other hand, is quite dynamic, as sessions get created, deleted, and updated. This makes standard pagination hard to implement correctly, as sessions may be skipped when transitioning between pages; implementing an infinite scroll mechanism instead of pagination is even harder.

Instead of standard offset and limit parameters, our endpoint uses cursors. A cursor determines a unique position in the collection, taking into account user-provided sorting and filtering. It can be used instead of an offset to specify the start of the slice we want. The benefit of cursors over offsets is that they always point to the same entry in the collection, regardless of whether the collection is modified. There are two URL parameters for specifying a slice (see the sketch after the list below):
- slice-cursor - cursor pointing to the beginning of the slice;
- slice-length - number of sessions to include in the slice. If a positive value is given, the slice will contain n sessions starting at the given slice-cursor. If a negative value is given, the slice will contain n sessions preceding the given slice-cursor.
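A minimal sketch of slice-based iteration. It assumes the `cursor` value returned in the response can be supplied as the next `slice-cursor`; verify this against your deployment:

```javascript
// Iterate over all sessions, 30 at a time.
async function* allSessions(accessToken) {
  let cursor = null;
  while (true) {
    const params = new URLSearchParams({ "slice-length": "30" });
    if (cursor) params.set("slice-cursor", cursor);
    const response = await fetch(`https://api.true-bar.si/api/client/sessions?${params}`, {
      headers: { Authorization: `Bearer ${accessToken}` },
    });
    const page = await response.json();
    yield* page.content;
    if (page.content.length < 30) break; // no more sessions
    cursor = page.cursor;
  }
}
```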
The endpoint also supports sorting the collection using the `sort` request parameter with the following values:
- id
- name
- createdAt
- updatedAt
- numRecordings
- recordedSeconds
Sessions can also be filtered by providing any of the following optional request parameters:
- name: return only sessions whose names contain the given value,
- label: return only sessions that are labeled with the given label,
- created-after: return only sessions that were created after the given date and time,
- created-before: return only sessions that were created before the given date and time.
The following example shows a request that returns the first 30 sessions containing the word "Test" in their names, sorted by creation time ascending:

```bash
curl --location --request GET 'api.true-bar.si/api/client/sessions?slice-length=30&name=Test&sort=createdAt,asc' \
  --header 'Authorization: Bearer <access_token>'
```
## Specific session details
Request:

```bash
curl --location --request GET 'api.true-bar.si/api/client/sessions/<session-id>' \
  --header 'Authorization: Bearer <access_token>'
```

- session-id: unique session identifier

Response:

Same as a single entry from the list of sessions described above.
## Audio file
Returns the audio file created within the given session. The request supports HTTP resource region specification (range requests), which allows the file to be transferred in smaller pieces. This allows audio players to stream the audio file instead of transferring the whole file before playing it. Like any other request, this request also supports passing the access_token as a request parameter instead of as a Bearer token inside the Authorization header. This can be useful when requesting the audio file directly from an audio player that does not support modification of request headers.

Request:

```bash
curl --location --request GET 'api.true-bar.si/api/client/sessions/<session-id>/audio.wav' \
  --header 'Authorization: Bearer <access_token>'
```

- session-id: unique session identifier
## Session transcripts
Returns a list of transcripts generated within the given session.

Request:

```bash
curl --location --request GET 'api.true-bar.si/api/client/sessions/<session-id>/transcripts' \
  --header 'Authorization: Bearer <access_token>'
```

- session-id: unique session identifier

Response:

```json
[
  {
    "id": "<number>",          // Transcript identifier
    "content": "<JSON string>" // JSON encoded transcript content with the same structure as transcripts received through the WebSocket API
  },
  ...
]
```
## Session recordings
Returns a list of recordings created within the given session. The request supports a pageable interface; for the list of available request parameters, check the List of sessions section.

Request:

```bash
curl --location --request GET 'api.true-bar.si/api/client/sessions/<session-id>/recordings' \
  --header 'Authorization: Bearer <access_token>'
```

- session-id: unique session identifier
Response:
{ "totalPages": "<number>", // Total number of pages available "totalElements": "<number>", // Total number of elements available on all pages "first": "<boolean>", // True if this is the first page "last": "<boolean>", // True if this is the last page "number": "<number>", // Current page number "size": "<number>", // Number of elements on current page "empty": "<boolean>", // True if this page is empty "content": [ { "id": "<number", // Recording identifier "duration": "<number>", // Length of the recording in seconds "isDiscarded": "<boolean>", // Should always be true for normal users "transcriptionLanguage": "<string", // Selected settings when recording was created... "transcriptionDomain": "<string", "transcriptionModelVersion": "<string", "transcriptionEndpointingType": "<string", "transcriptionDoInterim": "<boolean>", "transcriptionDoPunctuation": "<boolean>", "transcriptionDoInterimPunctuation": "<boolean>", "transcriptionShowUnks": "<boolean>", "transcriptionDoDictation": "<boolean>", "transcriptionDoNormalisation": "<boolean>", "translationEnabled": "<boolean>" } ] }
## Session sharing
Sessions are by default only visible to the user that created them. The session sharing mechanism allows a user to share their sessions with other users that belong to the same group. There are several endpoints that can be used to manage session shares.
### List session shares
Request:

```bash
curl --location --request GET 'api.true-bar.si/api/client/sessions/<session-id>/shares' \
  --header 'Authorization: Bearer <access_token>'
```

Response:

```json
[
  {
    "id": "<number>",        // Share id
    "createdAt": "<string>",
    "sharedWith": {
      "id": "<number>",      // User id
      "username": "<string>"
    }
  },
  ...
]
```
### Adding session share
The list of users with whom a session can be shared can be obtained with the following request.

Request:

```bash
curl --location --request GET 'api.true-bar.si/api/client/users' \
  --header 'Authorization: Bearer <access_token>'
```

Response:

```json
[
  { "id": "<number>", "username": "<string>" },
  { "id": "<number>", "username": "<string>" },
  ...
]
```
A session can be shared with a selected user using the following request:

```bash
curl --location --request POST 'api.true-bar.si/api/client/sessions/<session-id>/shares' \
  --header 'Authorization: Bearer <access_token>' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "userId": <user-id> }'
```
### Delete session share

```bash
curl --location --request DELETE 'api.true-bar.si/api/client/sessions/shares/<share-id>' \
  --header 'Authorization: Bearer <access_token>'
```
# Advanced features
Apart from the basic features exposed through the Truebar API, there are also more advanced features that can be used to achieve the highest possible recognition accuracy and the best possible user experience. These features include services for working with substitutes, replacements, and prefixed tokens.
## Replacements
You can use replacements to define user-specific labels for terms in the dictionary. When the Truebar STT service recognizes a term that has a user-defined replacement, it makes the appropriate changes in the response. Replacements take effect immediately, i.e. without the need to rebuild the model.

The best way to fine-tune the model is to ensure its dictionary contains all terms that are used in speech. Specific attention should be paid to abbreviations, acronyms, and measurement units. Make sure that all required terms are included in the dictionary and have appropriate labels and pronunciations.

Sometimes, however, we want to make specific changes that are user dependent. For instance, one user may want the measurement unit for meters per second to be written as `m/sec`, while other users prefer `m/s` as the measurement label. Such user-dependent changes to dictionary entries are handled through replacements. These are kept separately from the dictionary and thus have no effect on other users.
To retrieve the list of active replacements, you can use the following endpoint:

Request:

```bash
curl --location --request GET 'api.true-bar.si/api/client/replacements' \
  --header 'Authorization: Bearer <access_token>'
```

Response:

```json
[
  {
    "id": "<number>",
    "source": [                 // List of source words
      { "spaceBefore": "<boolean>", "text": "<string>" },
      { "spaceBefore": "<boolean>", "text": "<string>" },
      ...
    ],
    "target": {                 // Target word that will replace all source words
      "spaceBefore": "<boolean>",
      "text": "<string>"
    }
  },
  ...
]
```
To get information about a specific replacement, use this endpoint:

Request:

```bash
curl --location --request GET 'api.true-bar.si/api/client/replacements/<replacement-id>' \
  --header 'Authorization: Bearer <access_token>'
```

- replacement-id: replacement identification number, as returned when asking for the list of replacements

Response:

```json
{
  "id": "<number>",
  "source": [                 // List of source words
    { "spaceBefore": "<boolean>", "text": "<string>" },
    { "spaceBefore": "<boolean>", "text": "<string>" },
    ...
  ],
  "target": {                 // Target word that will replace all source words
    "spaceBefore": "<boolean>",
    "text": "<string>"
  }
}
```
To add a new replacement, you can call:

```bash
curl --location --request POST 'api.true-bar.si/api/client/replacements' \
  --header 'Authorization: Bearer <access_token>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "source": [
      { "spaceBefore": "<boolean>", "text": "<string>" },
      { "spaceBefore": "<boolean>", "text": "<string>" },
      ...
    ],
    "target": {
      "spaceBefore": "<boolean>",
      "text": "<string>"
    }
  }'
```
Finally, to delete a specific replacement, call:

```bash
curl --location --request DELETE 'api.true-bar.si/api/client/replacements/<replacement-id>' \
  --header 'Authorization: Bearer <access_token>'
```

- replacement-id: replacement identification number, as returned when asking for the list of replacements
## Automatic speaker detection (experimental)
This function can be enabled by setting the `enableSd` option on the ASR stage. It can only be enabled when the selected model supports automatic speaker detection. When the option is enabled, the words inside each received transcript will contain a field named `speakerCode`. The field holds a label (`speaker0`, `speaker1`, etc.) which is automatically assigned to the detected speaker.
# Response statuses and errors
## HTTP API
This section describes the expected HTTP response statuses for the endpoints described in this document.
Response statuses for the various HTTP methods, when the operation is successful:
- GET: response status 200 OK, along with a response body;
- POST: response status 201 CREATED, with the automatically generated unique resource identifier in the response body;
- PATCH: response status 204 NO_CONTENT, without a response body;
- DELETE: response status 204 NO_CONTENT, without a response body.
Response statuses for errors:
- 400 BAD_REQUEST indicates that the request could not be processed due to something that is perceived to be a client error;
- 401 UNAUTHORIZED indicates that the request has not been applied because it lacks valid authentication credentials (JWT invalid or not present);
- 403 FORBIDDEN indicates that the server understood the request but refuses to authorize it (insufficient permissions);
- 404 NOT_FOUND indicates that the requested resource was not found on the server;
- 405 METHOD_NOT_ALLOWED indicates that the resource does not support the given request method (e.g. a POST request on a URL that only supports GET);
- 409 CONFLICT indicates that the resource being added or modified already exists on the server;
- 415 UNSUPPORTED_MEDIA_TYPE indicates that the uploaded media file format is not supported;
- 500 SERVER_ERROR indicates that the request could not be processed because of an internal server error.
Unless otherwise noted, any request described in this document should return one of the response statuses described above. If there was an error processing the request, the response body is returned in the following format:

```json
{
  "id": "<string>",
  "timestamp": "<string>",
  "message": "<string>"
}
```
The returned object contains a message with a short error description. In general, the message together with the HTTP response status is sufficient for the client to know what went wrong. That is not the case, however, when an error triggers response status 500, because the response body then only contains a generic error message. If this happens, please contact our support with the returned error id.
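A small sketch of surfacing this error format client-side, including the id to report to support on a 500 response:

```javascript
// Wrap fetch() so failed requests throw with the server's error details.
async function checkedFetch(url, options) {
  const response = await fetch(url, options);
  if (!response.ok) {
    const err = await response.json(); // { id, timestamp, message }
    throw new Error(`HTTP ${response.status}: ${err.message} (error id: ${err.id})`);
  }
  return response;
}
```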
# Migrating from 2.x.x to 3.x.x
Truebar service version 3.x.x comes with a new feature called pipelines. Pipelines are a powerful concept that enables full customization of the data flow through NLP operations. Unfortunately, they also break compatibility with previous API versions. This section provides an overview of the steps required to migrate existing integrations to the new API version.
Although not visible to the end user, the previous API version also used a primitive concept of a pipeline internally. This pipeline was static, i.e. the user could not add, remove, or change the order in which its stages were executed. The only configuration exposed to the end user was the one available at the `api/client/configuration` endpoint, which was then internally mapped to pipeline parameters. This configuration offered only a subset of the available options. Because it was stored on the server, it also caused some issues with concurrent sessions created by the same user.

Truebar service version 3.x.x does not store user configuration anymore. Instead, the configuration must be passed each time a new session is created, or along with every NLP request, as a pipeline definition, i.e. a sequence of stages and their configuration. For more information on how to create a pipeline definition, please see the chapter on pipelines.
The following subsections provide a quick overview of how to migrate different clients.
## Offline (REST) file upload clients
The endpoint for offline file processing has been moved and is now available at `/api/pipelines/process` instead of `/api/client/upload`. Because user configuration is no longer stored on the server, the endpoint now accepts an additional form field named `pipeline` that contains the pipeline definition. The pipeline definition must be supplied as a serialized JSON string representing the list of stages and their configuration options.
Example of a request to API v2.x.x that first patches the user configuration and then uploads a file:

```bash
curl --location --request PATCH 'https://api.true-bar.si/api/client/configuration' \
  --header 'Authorization: Bearer <access_token>' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "stt": {
      "framework": "NEMO",
      "language": "sl-SI",
      "domain": "COL",
      "model": "20221208-0800",
      "enableInterims": true
    },
    "nlp": {
      "punctuation": {
        "enabled": { "value": true, "enableRealFinals": true }
      }
    }
  }'
```

```bash
curl --location --request POST 'https://api.true-bar.si/api/client/upload?async=<true|false>' \
  --header 'Authorization: Bearer <access_token>' \
  --form 'file=@"/path/to/audio.wav"'
```
Example of a request to API v3.x.x that uploads a file along with the pipeline configuration:

```bash
curl --location --request POST 'https://api.true-bar.si/api/pipelines/process?async=<true|false>' \
  --header 'Authorization: Bearer <access_token>' \
  --form 'pipeline=[
    {
      "task": "asr",
      "config": {
        "tag": "NEMO_ASR:sl-SI:COL:20221208-0800",
        "parameters": { "enableInterims": false }
      }
    },
    {
      "task": "pc",
      "config": {
        "tag": "NEMO_PUNCTUATOR:sl-SI:*:*",
        "parameters": { "enableSplitToSentences": true }
      }
    }
  ]' \
  --form 'file=@"/path/to/file"'
```
## Streaming (Websocket) clients
The following changes were made to the WebSocket API:
- Clients are allowed to send text and binary WebSocket messages, instead of only binary messages.
- The first message after the connection is established is expected to be a text message containing the pipeline configuration.
- Instead of the empty packet that was previously used to indicate end of stream, a special EOS text message must now be sent. Empty binary messages are now considered invalid.
- After the session is finished, the WebSocket is not automatically closed by the server. The client is notified about success or failure with an appropriate status/error message. The logical session is therefore decoupled from the underlying transport protocol session.
- Non-critical errors and warnings are reported as text messages and no longer cause the session to be terminated, as they did before.
- The structure of messages returned by the server has changed.
A detailed description of the protocol used for establishing a streaming session can be found in the Protocol description chapter. Examples of client implementations can also be found in the Libraries and examples section.
# Libraries and examples
This chapter lists available libraries that can be used to integrate with the Truebar service. Currently we provide libraries for the Java and JavaScript programming languages.
## Java library
The Java library is written in Java 11. It uses the built-in HttpClient to minimize the number of external dependencies. It is packaged as a single `.jar` file that can be imported into the target project.
## Javascript library
The JavaScript library is written as an ES6 module that can be imported into the target project. It is compatible with all major web browsers that support WebAudio and WebSockets.