Pipeline Stages Reference
This manual describes the pipeline stages available through the API and how to read the metadata returned by /api/pipelines/stages. Use it as the reference when composing a pipeline for /api/pipelines/process or configuring a streaming session.
All requests must include a bearer access token:
```
Authorization: Bearer <access-token>
```

All endpoints live under the API context path /api (e.g. https://<host>/api/pipelines/stages).
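A request can be assembled as in the following minimal Python sketch. The host name and token are placeholders, and stages_url / auth_headers are illustrative helpers, not part of the API:

```python
from urllib.parse import urljoin

def stages_url(host: str, context_path: str = "/api") -> str:
    """Build the absolute URL of the stages endpoint under the API context path."""
    return urljoin(f"https://{host}", f"{context_path}/pipelines/stages")

def auth_headers(access_token: str) -> dict:
    """Every request must carry the bearer access token."""
    return {"Authorization": f"Bearer {access_token}"}

url = stages_url("example.com")
headers = auth_headers("my-token")
```

Any HTTP client can then issue `GET url` with these headers.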
1. Concepts
A pipeline is an ordered list of stages that transform data end‑to‑end. Each stage performs one task (e.g. speech recognition, translation, summarization) and passes its output to the next stage.
Every stage has:
| Field | Meaning |
|---|---|
task | What the stage does (e.g. ASR, NLP_nmt, TTS). |
type | The data kind the stage handles: STT, TTT, TTS. |
configOptions | The list of concrete model configurations you are allowed to use, each with its own features and tunable parameters. |
1.1 Tags
Each configuration option is identified by a tag of the form:
```
<provider>:<language>:<domain>:<version>
```

You pick one tag per stage when you compose a pipeline. Only tags returned by /api/pipelines/stages for your token are accepted; all others are rejected.
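Splitting a tag into its four parts can be sketched as follows (parse_tag and the Tag tuple are illustrative helpers, not part of the API):

```python
from typing import NamedTuple

class Tag(NamedTuple):
    provider: str
    language: str
    domain: str
    version: str

def parse_tag(tag: str) -> Tag:
    """Split a <provider>:<language>:<domain>:<version> tag into its parts."""
    parts = tag.split(":")
    if len(parts) != 4:
        raise ValueError(
            f"expected <provider>:<language>:<domain>:<version>, got {tag!r}")
    return Tag(*parts)
```

Note that the language field itself may contain separators (e.g. sl-SI, or sl-SI|en-US for a translation pair), so only the colons are structural.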
1.2 Features vs. parameters
Features are informational flags. They tell you what a particular model is capable of or what it does implicitly — you cannot turn them on or off.
Parameters are values you may set when you configure a pipeline. For each parameter the API returns a descriptor:
```
{ "type": "Boolean", "values": [...], "value": <default> }
```

- type — one of Boolean, Integer, String, Set, Map, or Enum.
- values (optional) — the allowed set, when the parameter is constrained.
- value (optional) — the default used when you do not set the parameter.
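Client-side checking against such a descriptor might look like this (validate_param and effective_value are hypothetical helpers; the API performs its own validation server-side):

```python
def validate_param(name, value, descriptor):
    """Check a user-supplied value against a descriptor of the shape
    {"type": ..., "values": [...], "value": <default>}."""
    expected = {
        "Boolean": bool, "Integer": int, "String": str,
        "Set": (list, set), "Map": dict, "Enum": str,
    }[descriptor["type"]]
    if not isinstance(value, expected):
        raise TypeError(f"{name}: expected {descriptor['type']}")
    allowed = descriptor.get("values")
    if allowed is not None and value not in allowed:
        raise ValueError(f"{name}: {value!r} not in {allowed}")
    return value

def effective_value(value, descriptor):
    """Fall back to the advertised default when the caller sets nothing."""
    return descriptor.get("value") if value is None else value
```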
1.3 Data flow between stage types
| Stage type | Consumes | Produces |
|---|---|---|
STT | audio | text segments |
TTT | text segments | text segments |
TTS | text | audio |
You can chain any stages whose input/output types match. Typical shapes:
- Transcribe audio: ASR → (optional text post-processing)
- Transcribe + translate: ASR → NLP_nmt
- Speak text: NLP_st → TTS (tokenize first, then synthesize)
- Voice translation: ASR → NLP_nmt → TTS
- Meeting summary: ASR → NLP_ts
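The chaining rule above can be sketched as a small check (the IO table and check_chain are illustrative, derived from the consumes/produces table; "text" stands in for text segments):

```python
# stage type -> (consumes, produces), per the data-flow table
IO = {
    "STT": ("audio", "text"),
    "TTT": ("text", "text"),
    "TTS": ("text", "audio"),
}

def check_chain(stage_types):
    """Verify that each stage's output type feeds the next stage's input type."""
    for left, right in zip(stage_types, stage_types[1:]):
        if IO[left][1] != IO[right][0]:
            raise ValueError(f"{left} produces {IO[left][1]}, "
                             f"but {right} consumes {IO[right][0]}")
    return True
```

For example, a voice-translation pipeline maps to ["STT", "TTT", "TTS"] and passes, while a TTT stage directly after a TTS stage fails because audio is not text.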
2. Endpoints
2.1 List available stages

```
GET /api/pipelines/stages
    ?type=STT|TTT|TTS    (optional — filter by stage type)
```

Response: 200 OK — an array of stages, each with its allowed configOptions.
2.2 Get a single stage

```
GET /api/pipelines/stages/{task}
```

Returns the same shape as one element of the list. 404 Not Found is returned if the stage is not available to you.
2.3 Response shape

```
{
  "task": "ASR",
  "type": "STT",
  "configOptions": [
    {
      "tag": "RIVA:sl-SI:COL:20250929-1800",
      "features": ["onlineAsr"],
      "parameters": {
        "enableUnks": { "type": "Boolean" },
        "enableInterims": { "type": "Boolean" },
        "enableSd": { "type": "Boolean" },
        "boostedPhraseSets": { "type": "Set" }
      }
    }
  ]
}
```

Only stages, tags, features and parameters permitted by your access token appear.
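Reading the allowed tags out of such a response might look like this (payload is a trimmed sample in the shape above; tags_for is an illustrative helper):

```python
import json

# Trimmed sample payload in the shape returned by GET /api/pipelines/stages.
payload = json.loads("""
[{"task": "ASR", "type": "STT",
  "configOptions": [{"tag": "RIVA:sl-SI:COL:20250929-1800",
                     "features": ["onlineAsr"],
                     "parameters": {"enableSd": {"type": "Boolean"}}}]}]
""")

def tags_for(stages, task):
    """Collect the tags you may use for a given task."""
    return [opt["tag"] for s in stages if s["task"] == task
            for opt in s["configOptions"]]
```

An empty list for a task means that stage is not available to your token.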
3. Stage reference
The tables below list the features and parameters that can appear for each stage. Whether a specific option actually shows up in configOptions depends on the model behind a given tag — always read the response before composing a pipeline.
3.1 ASR — Automatic speech recognition
Converts spoken audio into text segments (words with timing, optional speaker labels, optional interim results).
- Input: raw audio.
- Output: transcribed text segments.
Features
| Feature | Meaning |
|---|---|
onlineAsr | Supports real‑time streaming. Interim (partial) results become available. |
offlineAsr | Non‑streaming; full audio is processed at once. |
dictatedCommands | Recognizes spoken commands (e.g. "new line"). |
dictatedPunctuations | Recognizes spoken punctuation (e.g. "comma", "period"). |
autoPc | Automatic punctuation is applied implicitly. |
autoTc | Automatic true‑casing is applied implicitly. |
autoTn | Automatic text normalization is applied implicitly. |
autoItn | Automatic inverse text normalization is applied implicitly. |
onlineAsr and offlineAsr are mutually exclusive for a given tag.
Parameters
| Parameter | Type | Description |
|---|---|---|
enableUnks | Boolean | Emit <unk> tokens for unrecognized words instead of dropping them. |
enableInterims | Boolean | Emit partial results while the user is still speaking. |
enablePc | Boolean | Apply automatic punctuation. Available only when the model exposes it as toggleable. |
enableItn | Boolean | Apply automatic inverse text normalization. Available only when toggleable. |
enableSd | Boolean | Enable speaker diarization (label each token with a speaker). |
sdSpeakerIdEnabled | Boolean | Enable speaker identification. |
sdMinSpeakers | Integer | Lower bound on the expected number of speakers. |
sdMaxSpeakers | Integer | Upper bound on the expected number of speakers. |
boostedPhraseSets | Set of BOOST phrase‑set IDs | Phrase sets whose phrases should be boosted. See the phrase sets manual. |
additionalBoostedPhrases | Set of strings | Ad‑hoc phrases to boost for this call only, without creating a phrase set. |
enableEndpointing | Boolean | Cut segments on detected endpoints automatically. |
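Assembling an ASR parameters object with the diarization bounds sanity-checked might look like this (asr_parameters is a hypothetical client-side helper; that sdMinSpeakers must not exceed sdMaxSpeakers is an assumption based on the parameter descriptions):

```python
def asr_parameters(enable_sd=False, sd_min=None, sd_max=None, **extra):
    """Assemble the parameters object for an ASR stage. Speaker bounds
    only apply when diarization is on, and the lower bound must not
    exceed the upper bound."""
    params = dict(extra)
    if enable_sd:
        params["enableSd"] = True
        if sd_min is not None and sd_max is not None and sd_min > sd_max:
            raise ValueError("sdMinSpeakers must not exceed sdMaxSpeakers")
        if sd_min is not None:
            params["sdMinSpeakers"] = sd_min
        if sd_max is not None:
            params["sdMaxSpeakers"] = sd_max
    return params
```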
3.2 NLP_pc — Automatic punctuation
Inserts or fixes punctuation in plain text.
- Input: unpunctuated / lightly punctuated text.
- Output: same text with punctuation.
Parameters
| Parameter | Type | Description |
|---|---|---|
enableSplitToSentences | Boolean | Also split the result into sentence segments. |
enableTc | Boolean | Apply true‑casing together with punctuation. |
3.3 NLP_tc — True‑casing
Restores proper capitalization (lowercased or all‑caps input → mixed case).
- Input: text.
- Output: text with corrected case.
No configurable parameters.
3.4 NLP_tn — Text normalization
Expands written forms into their spoken equivalents — e.g. numbers, dates and abbreviations are rewritten in full (12 → twelve). Typical use: preparing text for TTS.
- Input: written‑form text.
- Output: spoken‑form text.
Parameters
| Parameter | Type | Description |
|---|---|---|
customConfig | Map | Free‑form, model‑specific configuration overrides. Consult the model's reference for available keys. |
3.5 NLP_itn — Inverse text normalization
The opposite of text normalization — turns spoken‑form text into written form (twelve → 12). Typical use: cleaning up ASR output.
- Input: spoken‑form text.
- Output: written‑form text.
Parameters
| Parameter | Type | Description |
|---|---|---|
convertSmallNumbers | Boolean | Also convert small numbers (e.g. five → 5). |
3.6 NLP_st — Sentence tokenization
Splits a text stream into sentences.
- Input: text.
- Output: the same text, segmented into sentences.
Parameters
| Parameter | Type | Description |
|---|---|---|
processSsml | Boolean | Respect SSML markup in the input when detecting sentence boundaries. |
3.7 NLP_g2a — Grapheme‑to‑accent
Adds diacritics / accents to unaccented text (language‑dependent).
- Input: unaccented text.
- Output: accented text.
Parameters
| Parameter | Type | Description |
|---|---|---|
fastEnabled | Boolean | Use the fast (lower‑accuracy) inference path. |
customTransformationSets | Set of G2A phrase‑set IDs | Phrase sets that override the default mapping. See the phrase sets manual. |
additionalCustomTransformations | Map<string,string> | Ad‑hoc overrides provided inline (source → target), without creating a phrase set. |
3.8 NLP_nmt — Neural machine translation
Translates text between the source and target languages encoded in the tag.
- Input: text in the source language.
- Output: text in the target language.
No configurable parameters. Pick the language pair by choosing the right tag.
3.9 NLP_ac — Auto correct (replacements)
Applies a user‑defined list of replacement rules to text (e.g. terminology normalization, brand‑name fixes).
- Input: text.
- Output: text with replacements applied.
Parameters
| Parameter | Type | Description |
|---|---|---|
replacementSets | Set of replacement‑set IDs | Which of your replacement sets to apply. |
3.10 NLP_lid — Language identification
Identifies the language of the input text.
- Input: text.
- Output: text (annotated with detected language).
No configurable parameters.
3.11 NLP_ts — Text summarization
Summarizes a longer text. Depending on the model it can also extract keywords, key points, agenda items, etc.
- Input: text (typically a full transcript).
- Output: summary text plus the optional extracted artefacts you asked for.
Parameters
| Parameter | Type | Description |
|---|---|---|
summaryType | Enum — SHORT or LONG | Overall length of the generated summary. |
keywordsEnabled | Boolean | Also extract keywords. |
keypointsEnabled | Boolean | Also extract key points. |
diarizationEnabled | Boolean | Treat the input as diarized and preserve speaker attribution in the summary. |
agendaItemsEnabled | Boolean | Also extract agenda items. |
3.12 NLP_tf — Transcript finalization
Processes interim transcripts and finalizes them, guaranteeing that at least one final segment is emitted roughly every 2 seconds even when the upstream ASR stage has not yet produced one. Useful for keeping downstream stages (translation, summarization, etc.) fed on a steady cadence in real‑time pipelines.
- Input: text segments (mix of interim and final).
- Output: text segments, with finals emitted on a regular cadence.
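The cadence guarantee can be modeled with a toy finalizer (finalize is purely illustrative; the real stage operates on a live stream, not a list, and its exact promotion rule is not specified here):

```python
def finalize(segments, max_gap=2.0):
    """Toy model: walk (timestamp, text, is_final) segments and promote the
    latest interim to final whenever more than max_gap seconds pass without
    an upstream final, so downstream stages are fed on a steady cadence."""
    out, last_final, pending = [], None, None
    for ts, text, is_final in segments:
        if is_final:
            out.append((ts, text))
            last_final, pending = ts, None
        else:
            pending = (ts, text)
            start = last_final if last_final is not None else 0.0
            if ts - start >= max_gap:
                out.append(pending)   # force a final on cadence
                last_final, pending = ts, None
    return out
```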
3.13 TTS — Text‑to‑speech
Synthesizes audio from text.
- Input: text (plain or SSML, see features).
- Output: audio.
Features
| Feature | Meaning |
|---|---|
ssmlA, ssmlB, ssmlC | The model accepts one of the supported SSML dialects. Use the dialect advertised on the tag you chose. |
voiceTags | The model understands inline voice‑selection tags. |
Parameters
| Parameter | Type | Description |
|---|---|---|
sampleRate | Integer (allowed set in values) | Sample rate of the produced audio, in Hz. |
4. Composing a pipeline
When you send a pipeline to /api/pipelines/process (or configure a streaming session), you submit a list of stages, each referencing one of the configOptions you received. For every parameter you want to set, send the value inline:
```
[
  {
    "task": "ASR",
    "config": {
      "tag": "RIVA:sl-SI:COL:20250929-1800",
      "parameters": {
        "enableInterims": true,
        "enableSd": true,
        "boostedPhraseSets": [42, 108]
      }
    }
  },
  {
    "task": "NLP_nmt",
    "config": {
      "tag": "LPT:sl-SI|en-US:COL:v1",
      "parameters": {}
    }
  }
]
```

Rules:
- The tag must come from the configOptions returned for the stage.
- You may only set parameters that the chosen option advertises. Unknown or unauthorized parameters are rejected.
- Features are not sent — they are read‑only metadata.
- Omitted parameters fall back to the default advertised in configOptions, if any.
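Checking a stage against the advertised configOptions before submitting might look like this (make_stage is a hypothetical client-side helper; the server enforces the same rules and remains authoritative):

```python
def make_stage(task, tag, parameters, config_options):
    """Build one pipeline stage, checking the tag and parameter names
    against the configOptions returned for that task."""
    opt = next((o for o in config_options if o["tag"] == tag), None)
    if opt is None:
        raise ValueError(f"tag {tag!r} is not offered for {task}")
    unknown = set(parameters) - set(opt.get("parameters", {}))
    if unknown:
        raise ValueError(f"parameters not advertised for {tag}: {sorted(unknown)}")
    return {"task": task, "config": {"tag": tag, "parameters": parameters}}
```

The resulting list of stages is what you POST to /api/pipelines/process.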
5. Errors
| Status | When |
|---|---|
400 Bad Request | Malformed pipeline, unknown parameter, wrong parameter type, incompatible stage ordering. |
401 Unauthorized | Missing or invalid bearer token. |
403 Forbidden | The token does not carry the authority required to run pipelines. |
404 Not Found | Requested stage or tag is not available to your token. |
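A client might map these statuses to typed errors (the exception names are illustrative, not part of the API):

```python
class PipelineError(Exception): pass
class BadPipeline(PipelineError): pass       # 400 Bad Request
class NotAuthenticated(PipelineError): pass  # 401 Unauthorized
class NotAuthorized(PipelineError): pass     # 403 Forbidden
class NotAvailable(PipelineError): pass      # 404 Not Found

ERRORS = {400: BadPipeline, 401: NotAuthenticated,
          403: NotAuthorized, 404: NotAvailable}

def raise_for_status(status, body=""):
    """Translate an error status into a typed exception; pass 2xx through."""
    if status in ERRORS:
        raise ERRORS[status](body)
    if not 200 <= status < 300:
        raise PipelineError(f"unexpected status {status}")
    return status
```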