Pipeline Stages Reference
This manual describes the pipeline stages available through the API and how to read the metadata returned by /api/pipelines/stages. Use it as the reference when composing a pipeline for /api/pipelines/process or configuring a streaming session.
All requests must include a bearer access token:
```
Authorization: Bearer <access-token>
```

All endpoints live under the API context path /api (e.g. https://<host>/api/pipelines/stages).
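A request can be assembled as in the following minimal Python sketch. The host name and token are placeholders, and stages_url / auth_headers are illustrative helpers, not part of the API:

```python
from urllib.parse import urljoin

def stages_url(host: str, context_path: str = "/api") -> str:
    """Build the absolute URL of the stages endpoint under the API context path."""
    return urljoin(f"https://{host}", f"{context_path}/pipelines/stages")

def auth_headers(access_token: str) -> dict:
    """Every request must carry the bearer access token."""
    return {"Authorization": f"Bearer {access_token}"}

url = stages_url("example.com")
headers = auth_headers("my-token")
```

Any HTTP client can then issue `GET url` with these headers.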
1. Concepts
A pipeline is an ordered list of stages that transform data end‑to‑end. Each stage performs one task (e.g. speech recognition, translation, summarization) and passes its output to the next stage.
Every stage has:
| Field | Meaning |
|---|---|
task | What the stage does (e.g. ASR, NLP_nmt, TTS). |
type | The data kind the stage handles: STT, TTT, TTS. |
configOptions | The list of concrete model configurations you are allowed to use, each with its own features and tunable parameters. |
1.1 Tags
Each configuration option is identified by a tag of the form:
```
<provider>:<language>:<domain>:<version>
```

You pick one tag per stage when you compose a pipeline. Only tags returned by /api/pipelines/stages for your token are accepted; all others are rejected.
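Splitting a tag into its four parts can be sketched as follows (parse_tag and the Tag tuple are illustrative helpers, not part of the API):

```python
from typing import NamedTuple

class Tag(NamedTuple):
    provider: str
    language: str
    domain: str
    version: str

def parse_tag(tag: str) -> Tag:
    """Split a <provider>:<language>:<domain>:<version> tag into its parts."""
    parts = tag.split(":")
    if len(parts) != 4:
        raise ValueError(
            f"expected <provider>:<language>:<domain>:<version>, got {tag!r}")
    return Tag(*parts)
```

Note that the language field itself may contain separators (e.g. sl-SI, or sl-SI|en-US for a translation pair), so only the colons are structural.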
1.2 Features vs. parameters
Features are informational flags. They tell you what a particular model is capable of or what it does implicitly — you cannot turn them on or off.
Parameters are values you may set when you configure a pipeline. For each parameter the API returns a descriptor:
```
{ "type": "Boolean", "values": [...], "value": <default> }
```

- type — one of Boolean, Integer, String, Set, Map, or Enum.
- values (optional) — the allowed set, when the parameter is constrained.
- value (optional) — the default used when you do not set the parameter.
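Client-side checking against such a descriptor might look like this (validate_param and effective_value are hypothetical helpers; the API performs its own validation server-side):

```python
def validate_param(name, value, descriptor):
    """Check a user-supplied value against a descriptor of the shape
    {"type": ..., "values": [...], "value": <default>}."""
    expected = {
        "Boolean": bool, "Integer": int, "String": str,
        "Set": (list, set), "Map": dict, "Enum": str,
    }[descriptor["type"]]
    if not isinstance(value, expected):
        raise TypeError(f"{name}: expected {descriptor['type']}")
    allowed = descriptor.get("values")
    if allowed is not None and value not in allowed:
        raise ValueError(f"{name}: {value!r} not in {allowed}")
    return value

def effective_value(value, descriptor):
    """Fall back to the advertised default when the caller sets nothing."""
    return descriptor.get("value") if value is None else value
```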
1.3 Data flow between stage types
| Stage type | Consumes | Produces |
|---|---|---|
STT | audio | text segments |
TTT | text segments | text segments |
TTS | text | audio |
You can chain any stages whose input/output types match. Typical shapes:
- Transcribe audio: ASR → (optional text post-processing)
- Transcribe + translate: ASR → NLP_nmt
- Speak text: NLP_st → TTS (tokenize first, then synthesize)
- Voice translation: ASR → NLP_nmt → TTS
- Meeting summary: ASR → NLP_ts
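The chaining rule above can be sketched as a small check (the IO table and check_chain are illustrative, derived from the consumes/produces table; "text" stands in for text segments):

```python
# stage type -> (consumes, produces), per the data-flow table
IO = {
    "STT": ("audio", "text"),
    "TTT": ("text", "text"),
    "TTS": ("text", "audio"),
}

def check_chain(stage_types):
    """Verify that each stage's output type feeds the next stage's input type."""
    for left, right in zip(stage_types, stage_types[1:]):
        if IO[left][1] != IO[right][0]:
            raise ValueError(f"{left} produces {IO[left][1]}, "
                             f"but {right} consumes {IO[right][0]}")
    return True
```

For example, a voice-translation pipeline maps to ["STT", "TTT", "TTS"] and passes, while a TTT stage directly after a TTS stage fails because audio is not text.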
2. Endpoints
2.1 List available stages

```
GET /api/pipelines/stages
    ?type=STT|TTT|TTS    (optional — filter by stage type)
```

Response: 200 OK — an array of stages, each with its allowed configOptions.
2.2 Get a single stage

```
GET /api/pipelines/stages/{task}
```

Returns the same shape as one element of the list. 404 Not Found is returned if the stage is not available to you.
2.3 Response shape

```
{
  "task": "ASR",
  "type": "STT",
  "configOptions": [
    {
      "tag": "RIVA:sl-SI:COL:20250929-1800",
      "features": ["onlineAsr"],
      "parameters": {
        "enableUnks": { "type": "Boolean" },
        "enableInterims": { "type": "Boolean" },
        "enableSd": { "type": "Boolean" },
        "boostedPhraseSets": { "type": "Set" }
      }
    }
  ]
}
```

Only stages, tags, features and parameters permitted by your access token appear.
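Reading the allowed tags out of such a response might look like this (payload is a trimmed sample in the shape above; tags_for is an illustrative helper):

```python
import json

# Trimmed sample payload in the shape returned by GET /api/pipelines/stages.
payload = json.loads("""
[{"task": "ASR", "type": "STT",
  "configOptions": [{"tag": "RIVA:sl-SI:COL:20250929-1800",
                     "features": ["onlineAsr"],
                     "parameters": {"enableSd": {"type": "Boolean"}}}]}]
""")

def tags_for(stages, task):
    """Collect the tags you may use for a given task."""
    return [opt["tag"] for s in stages if s["task"] == task
            for opt in s["configOptions"]]
```

An empty list for a task means that stage is not available to your token.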
3. Stage reference
The tables below list the features and parameters that can appear for each stage. Whether a specific option actually shows up in configOptions depends on the model behind a given tag — always read the response before composing a pipeline.
3.1 ASR — Automatic speech recognition
Converts spoken audio into text segments (words with timing, optional speaker labels, optional interim results).
- Input: raw audio.
- Output: transcribed text segments.
Features
| Feature | Meaning |
|---|---|
onlineAsr | Supports real‑time streaming. Interim (partial) results become available. |
offlineAsr | Non‑streaming; full audio is processed at once. |
dictatedCommands | Recognizes spoken commands (e.g. "new line"). |
dictatedPunctuations | Recognizes spoken punctuation (e.g. "comma", "period"). |
autoPc | Automatic punctuation is applied implicitly. |
autoTc | Automatic true‑casing is applied implicitly. |
autoTn | Automatic text normalization is applied implicitly. |
autoItn | Automatic inverse text normalization is applied implicitly. |
onlineAsr and offlineAsr are mutually exclusive for a given tag.
Parameters
| Parameter | Type | Description |
|---|---|---|
enableUnks | Boolean | Emit <unk> tokens for unrecognized words instead of dropping them. |
enableInterims | Boolean | Emit partial results while the user is still speaking. |
enablePc | Boolean | Apply automatic punctuation. Available only when the model exposes it as toggleable. |
enableItn | Boolean | Apply automatic inverse text normalization. Available only when toggleable. |
enableSd | Boolean | Enable speaker diarization (label each token with a speaker). |
sdSpeakerIdEnabled | Boolean | Enable speaker identification. |
sdMinSpeakers | Integer | Lower bound on the expected number of speakers. |
sdMaxSpeakers | Integer | Upper bound on the expected number of speakers. |
boostedPhraseSets | Set of BOOST phrase‑set IDs | Phrase sets whose phrases should be boosted. See the phrase sets manual. |
additionalBoostedPhrases | Set of strings | Ad‑hoc phrases to boost for this call only, without creating a phrase set. |
enableEndpointing | Boolean | Cut segments on detected endpoints automatically. |
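Assembling an ASR parameters object with the diarization bounds sanity-checked might look like this (asr_parameters is a hypothetical client-side helper; that sdMinSpeakers must not exceed sdMaxSpeakers is an assumption based on the parameter descriptions):

```python
def asr_parameters(enable_sd=False, sd_min=None, sd_max=None, **extra):
    """Assemble the parameters object for an ASR stage. Speaker bounds
    only apply when diarization is on, and the lower bound must not
    exceed the upper bound."""
    params = dict(extra)
    if enable_sd:
        params["enableSd"] = True
        if sd_min is not None and sd_max is not None and sd_min > sd_max:
            raise ValueError("sdMinSpeakers must not exceed sdMaxSpeakers")
        if sd_min is not None:
            params["sdMinSpeakers"] = sd_min
        if sd_max is not None:
            params["sdMaxSpeakers"] = sd_max
    return params
```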
3.2 NLP_pc — Automatic punctuation
Inserts or fixes punctuation in plain text.
- Input: unpunctuated / lightly punctuated text.
- Output: same text with punctuation.
Parameters
| Parameter | Type | Description |
|---|---|---|
enableSplitToSentences | Boolean | Also split the result into sentence segments. |
enableTc | Boolean | Apply true‑casing together with punctuation. |
3.3 NLP_tc — True‑casing
Restores proper capitalization (lowercased or all‑caps input → mixed case).
- Input: text.
- Output: text with corrected case.
No configurable parameters.
3.4 NLP_tn — Text normalization
Expands written forms into their spoken equivalents — e.g. numbers, dates and abbreviations are rewritten in full (12 → twelve). Typical use: preparing text for TTS.
- Input: written‑form text.
- Output: spoken‑form text.
Parameters
| Parameter | Type | Description |
|---|---|---|
customConfig | Map | Free‑form, model‑specific configuration overrides. Consult the model's reference for available keys. |
3.5 NLP_itn — Inverse text normalization
The opposite of text normalization — turns spoken‑form text into written form (twelve → 12). Typical use: cleaning up ASR output.
- Input: spoken‑form text.
- Output: written‑form text.
Parameters
| Parameter | Type | Description |
|---|---|---|
convertSmallNumbers | Boolean | Also convert small numbers (e.g. five → 5). |
3.6 NLP_st — Sentence tokenization
Splits a text stream into sentences.
- Input: text.
- Output: the same text, segmented into sentences.
Parameters
| Parameter | Type | Description |
|---|---|---|
processSsml | Boolean | Respect SSML markup in the input when detecting sentence boundaries. |
3.7 NLP_g2a — Grapheme‑to‑accent
Adds diacritics / accents to unaccented text (language‑dependent).
- Input: unaccented text.
- Output: accented text.
Parameters
| Parameter | Type | Description |
|---|---|---|
fastEnabled | Boolean | Use the fast (lower‑accuracy) inference path. |
customTransformationSets | Set of G2A phrase‑set IDs | Phrase sets that override the default mapping. See the phrase sets manual. |
additionalCustomTransformations | Map<string,string> | Ad‑hoc overrides provided inline (source → target), without creating a phrase set. |
3.8 NLP_nmt — Neural machine translation
Translates text between the source and target languages encoded in the tag.
- Input: text in the source language.
- Output: text in the target language.
No configurable parameters. Pick the language pair by choosing the right tag.
3.9 NLP_ac — Auto correct (replacements)
Applies a user‑defined list of replacement rules to text (e.g. terminology normalization, brand‑name fixes).
- Input: text.
- Output: text with replacements applied.
Parameters
| Parameter | Type | Description |
|---|---|---|
replacementSets | Set of replacement‑set IDs | Which of your replacement sets to apply. |
3.10 NLP_lid — Language identification
Identifies the language of the input text.
- Input: text.
- Output: text (annotated with detected language).
No configurable parameters.
3.11 NLP_ts — Text summarization
Summarizes a longer text. Depending on the model it can also extract keywords, key points, agenda items, etc.
- Input: text (typically a full transcript).
- Output: summary text plus the optional extracted artefacts you asked for.
Parameters
| Parameter | Type | Description |
|---|---|---|
summaryType | Enum — SHORT or LONG | Overall length of the generated summary. |
keywordsEnabled | Boolean | Also extract keywords. |
keypointsEnabled | Boolean | Also extract key points. |
diarizationEnabled | Boolean | Treat the input as diarized and preserve speaker attribution in the summary. |
agendaItemsEnabled | Boolean | Also extract agenda items. |
3.12 NLP_tf — Transcript finalization
Processes interim transcripts and finalizes them, guaranteeing that at least one final segment is emitted roughly every 2 seconds even when the upstream ASR stage has not yet produced one. Useful for keeping downstream stages (translation, summarization, etc.) fed on a steady cadence in real‑time pipelines.
- Input: text segments (mix of interim and final).
- Output: text segments, with finals emitted on a regular cadence.
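The cadence guarantee can be modeled with a toy finalizer (finalize is purely illustrative; the real stage operates on a live stream, not a list, and its exact promotion rule is not specified here):

```python
def finalize(segments, max_gap=2.0):
    """Toy model: walk (timestamp, text, is_final) segments and promote the
    latest interim to final whenever more than max_gap seconds pass without
    an upstream final, so downstream stages are fed on a steady cadence."""
    out, last_final, pending = [], None, None
    for ts, text, is_final in segments:
        if is_final:
            out.append((ts, text))
            last_final, pending = ts, None
        else:
            pending = (ts, text)
            start = last_final if last_final is not None else 0.0
            if ts - start >= max_gap:
                out.append(pending)   # force a final on cadence
                last_final, pending = ts, None
    return out
```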
3.13 TTS — Text‑to‑speech
Synthesizes audio from text.
- Input: text (plain or SSML, see features).
- Output: audio.
Features
| Feature | Meaning |
|---|---|
ssmlA, ssmlB, ssmlC | The model accepts one of the supported SSML dialects. Use the dialect advertised on the tag you chose. |
voiceTags | The model understands inline voice‑selection tags. |
Parameters
| Parameter | Type | Description |
|---|---|---|
sampleRate | Integer (allowed set in values) | Sample rate of the produced audio, in Hz. |
4. Composing a pipeline
When you send a pipeline to /api/pipelines/process (or configure a streaming session), you submit a list of stages, each referencing one of the configOptions you received. For every parameter you want to set, send the value inline:
```
[
  {
    "task": "ASR",
    "config": {
      "tag": "RIVA:sl-SI:COL:20250929-1800",
      "parameters": {
        "enableInterims": true,
        "enableSd": true,
        "boostedPhraseSets": [42, 108]
      }
    }
  },
  {
    "task": "NLP_nmt",
    "config": {
      "tag": "LPT:sl-SI|en-US:COL:v1",
      "parameters": {}
    }
  }
]
```

Rules:
- The tag must come from the configOptions returned for the stage.
- You may only set parameters that the chosen option advertises. Unknown or unauthorized parameters are rejected.
- Features are not sent — they are read‑only metadata.
- Omitted parameters fall back to the default advertised in configOptions, if any.
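Checking a stage against the advertised configOptions before submitting might look like this (make_stage is a hypothetical client-side helper; the server enforces the same rules and remains authoritative):

```python
def make_stage(task, tag, parameters, config_options):
    """Build one pipeline stage, checking the tag and parameter names
    against the configOptions returned for that task."""
    opt = next((o for o in config_options if o["tag"] == tag), None)
    if opt is None:
        raise ValueError(f"tag {tag!r} is not offered for {task}")
    unknown = set(parameters) - set(opt.get("parameters", {}))
    if unknown:
        raise ValueError(f"parameters not advertised for {tag}: {sorted(unknown)}")
    return {"task": task, "config": {"tag": tag, "parameters": parameters}}
```

The resulting list of stages is what you POST to /api/pipelines/process.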
5. Errors
| Status | When |
|---|---|
400 Bad Request | Malformed pipeline, unknown parameter, wrong parameter type, incompatible stage ordering. |
401 Unauthorized | Missing or invalid bearer token. |
403 Forbidden | The token does not carry the authority required to run pipelines. |
404 Not Found | Requested stage or tag is not available to your token. |
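A client might map these statuses to typed errors (the exception names are illustrative, not part of the API):

```python
class PipelineError(Exception): pass
class BadPipeline(PipelineError): pass       # 400 Bad Request
class NotAuthenticated(PipelineError): pass  # 401 Unauthorized
class NotAuthorized(PipelineError): pass     # 403 Forbidden
class NotAvailable(PipelineError): pass      # 404 Not Found

ERRORS = {400: BadPipeline, 401: NotAuthenticated,
          403: NotAuthorized, 404: NotAvailable}

def raise_for_status(status, body=""):
    """Translate an error status into a typed exception; pass 2xx through."""
    if status in ERRORS:
        raise ERRORS[status](body)
    if not 200 <= status < 300:
        raise PipelineError(f"unexpected status {status}")
    return status
```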