Pipeline Stages Reference

This manual describes the pipeline stages available over the API and how to read the metadata returned by /api/pipelines/stages. Use it as the reference when composing a pipeline for /api/pipelines/process or for a streaming session.

All requests must include a bearer access token:

Authorization: Bearer <access-token>

All endpoints live under the API context path /api (e.g. https://<host>/api/pipelines/stages).
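A minimal sketch of an authenticated request, assuming Python's standard-library urllib. The host api.example.com stands in for <host>, and build_request is a hypothetical helper. The request is built here but not sent.

```python
import urllib.request
from typing import Optional

# Placeholder for https://<host>/api; substitute your own host.
API_ROOT = "https://api.example.com/api"


def build_request(path: str, token: str, method: str = "GET",
                  body: Optional[bytes] = None) -> urllib.request.Request:
    """Build (but do not send) a request for an endpoint under the /api context path."""
    if not path.startswith("/"):
        path = "/" + path
    headers = {"Authorization": f"Bearer {token}"}
    if body is not None:
        # Assumption: request bodies (e.g. a pipeline) are sent as JSON.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(API_ROOT + path, data=body,
                                  headers=headers, method=method)


req = build_request("/pipelines/stages", "my-token")
print(req.full_url)                     # https://api.example.com/api/pipelines/stages
print(req.get_header("Authorization"))  # Bearer my-token
```

To send it, pass the Request to urllib.request.urlopen and read the JSON body from the response.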


1. Concepts

A pipeline is an ordered list of stages that transform data end‑to‑end. Each stage performs one task (e.g. speech recognition, translation, summarization) and passes its output to the next stage.

Every stage has:

| Field | Meaning |
|---|---|
| task | What the stage does (e.g. ASR, NLP_nmt, TTS). |
| type | The kind of data the stage handles: STT (speech to text), TTT (text to text), or TTS (text to speech). |
| configOptions | The concrete model configurations you are allowed to use, each with its own features and tunable parameters. |

1.1 Tags

Each configuration option is identified by a tag of the form:

<provider>:<language>:<domain>:<version>

You pick one tag per stage when you compose a pipeline. Only tags returned by /api/pipelines/stages for your token are accepted; others are rejected.
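Tags should be sent back verbatim, but splitting one can help when filtering options for display. A minimal sketch in Python, assuming no field contains a colon and that a language field of the form source|target (as in the NMT tag LPT:sl-SI|en-US:COL:v1 in section 4) denotes a language pair. Tag and parse_tag are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple


class Tag(NamedTuple):
    provider: str
    language: str
    domain: str
    version: str

    def language_pair(self) -> Optional[Tuple[str, str]]:
        """Split a 'source|target' language field; None for single-language tags."""
        if "|" not in self.language:
            return None
        source, target = self.language.split("|", 1)
        return source, target


def parse_tag(tag: str) -> Tag:
    """Split <provider>:<language>:<domain>:<version> into its four fields."""
    parts = tag.split(":")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected <provider>:<language>:<domain>:<version>, got {tag!r}")
    return Tag(*parts)


asr = parse_tag("RIVA:sl-SI:COL:20250929-1800")
nmt = parse_tag("LPT:sl-SI|en-US:COL:v1")
print(asr.provider, asr.version)  # RIVA 20250929-1800
print(nmt.language_pair())        # ('sl-SI', 'en-US')
```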

1.2 Features vs. parameters

  • Features are informational flags. They tell you what a particular model is capable of or what it does implicitly — you cannot turn them on or off.

  • Parameters are values you may set when you configure a pipeline. For each parameter the API returns a descriptor:

    { "type": "Boolean", "values": [...], "value": <default> }
    • type: Boolean, Integer, String, Set, Map, or Enum.
    • values (optional): the allowed set, when the parameter is constrained.
    • value (optional): the default used when you do not set the parameter.
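A descriptor can be checked client-side before a request is sent, which catches typos early; the server remains the authority. This sketch assumes JSON-decoded values map to Python types as listed in _CHECKS (e.g. Set as a list, Map as a dict) and that a values constraint on a Set applies to each member. check_parameter and effective_value are hypothetical helpers.

```python
from typing import Any, Callable, Dict, Mapping

# Assumed correspondence between descriptor type names and JSON-decoded Python values.
_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "Boolean": lambda v: isinstance(v, bool),
    "Integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "String": lambda v: isinstance(v, str),
    "Enum": lambda v: isinstance(v, str),
    "Set": lambda v: isinstance(v, (list, tuple, set, frozenset)),
    "Map": lambda v: isinstance(v, dict),
}


def check_parameter(name: str, descriptor: Mapping[str, Any], value: Any) -> None:
    """Raise ValueError if value does not match the descriptor's type or allowed values."""
    kind = descriptor["type"]
    check = _CHECKS.get(kind)
    if check is None:
        raise ValueError(f"{name}: unsupported descriptor type {kind!r}")
    if not check(value):
        raise ValueError(f"{name}: expected {kind}, got {type(value).__name__}")
    allowed = descriptor.get("values")
    if allowed is not None:
        # For a Set every member must be allowed; otherwise the value itself must be.
        members = list(value) if kind == "Set" else [value]
        rejected = [m for m in members if m not in allowed]
        if rejected:
            raise ValueError(f"{name}: {rejected} not in {list(allowed)}")


def effective_value(descriptor: Mapping[str, Any], supplied: Any = None) -> Any:
    """The value that applies: yours if set, else the advertised default (may be None)."""
    return supplied if supplied is not None else descriptor.get("value")
```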

1.3 Data flow between stage types

| Stage type | Consumes | Produces |
|---|---|---|
| STT | audio | text segments |
| TTT | text segments | text segments |
| TTS | text | audio |

You can chain any stages whose input/output types match. Typical shapes:

  • Transcribe audio: ASR → (optional text post-processing)
  • Transcribe + translate: ASR → NLP_nmt
  • Speak text: NLP_st → TTS (tokenize first, then synthesize)
  • Voice translation: ASR → NLP_nmt → TTS
  • Meeting summary: ASR → NLP_ts
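Whether adjacent stages match can be checked mechanically from the table above. A sketch under the simplifying assumption that "text" and "text segments" count as the same kind of data; chain_io and IO are illustrative names, not part of the API.

```python
from typing import Dict, Sequence, Tuple

# (consumes, produces) per stage type, following the data flow table.
# Treating "text" and "text segments" as one kind is a simplification.
IO: Dict[str, Tuple[str, str]] = {
    "STT": ("audio", "text"),
    "TTT": ("text", "text"),
    "TTS": ("text", "audio"),
}


def chain_io(stage_types: Sequence[str]) -> Tuple[str, str]:
    """Return (pipeline input, pipeline output), or raise if adjacent stages do not match."""
    if not stage_types:
        raise ValueError("a pipeline needs at least one stage")
    for i in range(1, len(stage_types)):
        up, down = stage_types[i - 1], stage_types[i]
        if IO[up][1] != IO[down][0]:
            raise ValueError(f"stage {i} ({up}) produces {IO[up][1]}, "
                             f"but stage {i + 1} ({down}) consumes {IO[down][0]}")
    return IO[stage_types[0]][0], IO[stage_types[-1]][1]


# Voice translation: ASR (STT) -> NLP_nmt (TTT) -> TTS
print(chain_io(["STT", "TTT", "TTS"]))  # ('audio', 'audio')
```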

2. Endpoints

2.1 List available stages

GET /api/pipelines/stages?type=STT|TTT|TTS

The type query parameter is optional; use it to filter the list by stage type.

Response: 200 OK — an array of stages, each with its allowed configOptions.
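Building the list URL with and without the optional filter, as a small Python sketch. list_stages_url is a hypothetical helper and api.example.com is a placeholder host.

```python
from typing import Optional
from urllib.parse import urlencode

STAGE_TYPES = ("STT", "TTT", "TTS")


def list_stages_url(api_root: str, stage_type: Optional[str] = None) -> str:
    """URL for GET /api/pipelines/stages, optionally filtered by stage type."""
    url = api_root.rstrip("/") + "/pipelines/stages"
    if stage_type is None:
        return url
    if stage_type not in STAGE_TYPES:
        raise ValueError(f"type must be one of {STAGE_TYPES}, got {stage_type!r}")
    return url + "?" + urlencode({"type": stage_type})


print(list_stages_url("https://api.example.com/api", "TTT"))
# https://api.example.com/api/pipelines/stages?type=TTT
```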

2.2 Get a single stage

GET /api/pipelines/stages/{task}

Returns the same shape as one element of the list. 404 Not Found is returned if the stage is not available to you.
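A sketch of fetching one stage and mapping 404 Not Found to None, using urllib. fetch_stage is a hypothetical helper; its opener argument exists only so the function can be exercised without a network, and it defaults to urllib.request.urlopen.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Optional


def fetch_stage(api_root: str, token: str, task: str,
                opener: Callable[[urllib.request.Request], Any] = urllib.request.urlopen,
                ) -> Optional[Dict[str, Any]]:
    """GET /api/pipelines/stages/{task}; return None if the stage is not available (404)."""
    url = f"{api_root.rstrip('/')}/pipelines/stages/{urllib.parse.quote(task)}"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with opener(request) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # 401, 403 and other statuses are real errors, not "stage missing"


# stage = fetch_stage("https://api.example.com/api", "my-token", "ASR")
```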

2.3 Response shape

{
  "task": "ASR",
  "type": "STT",
  "configOptions": [
    {
      "tag": "RIVA:sl-SI:COL:20250929-1800",
      "features": ["onlineAsr"],
      "parameters": {
        "enableUnks":        { "type": "Boolean" },
        "enableInterims":    { "type": "Boolean" },
        "enableSd":          { "type": "Boolean" },
        "boostedPhraseSets": { "type": "Set" }
      }
    }
  ]
}

Only stages, tags, features and parameters permitted by your access token appear.
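Because the allowed options differ per token, it is convenient to index the list response by task and tag. This sketch assumes the list endpoint returns an array of objects shaped like the example above; index_options is a hypothetical name.

```python
import json
from typing import Any, Dict, List, Mapping


def index_options(stages: List[Mapping[str, Any]]) -> Dict[str, Dict[str, Mapping[str, Any]]]:
    """Map task -> tag -> configOption from a /api/pipelines/stages response."""
    return {
        stage["task"]: {option["tag"]: option for option in stage.get("configOptions", [])}
        for stage in stages
    }


# The example response above, wrapped in a list as returned by GET /api/pipelines/stages.
response = json.loads("""
[{"task": "ASR", "type": "STT", "configOptions": [
  {"tag": "RIVA:sl-SI:COL:20250929-1800", "features": ["onlineAsr"],
   "parameters": {"enableUnks": {"type": "Boolean"}, "enableInterims": {"type": "Boolean"},
                  "enableSd": {"type": "Boolean"}, "boostedPhraseSets": {"type": "Set"}}}]}]
""")

asr = index_options(response)["ASR"]["RIVA:sl-SI:COL:20250929-1800"]
print(sorted(asr["parameters"]))  # ['boostedPhraseSets', 'enableInterims', 'enableSd', 'enableUnks']
print("onlineAsr" in asr["features"])  # True
```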


3. Stage reference

The tables below list the features and parameters that can appear for each stage. Whether a specific option actually shows up in configOptions depends on the model behind a given tag — always read the response before composing a pipeline.

3.1 ASR — Automatic speech recognition

Converts spoken audio into text segments (words with timing, optional speaker labels, optional interim results).

  • Input: raw audio.
  • Output: transcribed text segments.

Features

| Feature | Meaning |
|---|---|
| onlineAsr | Supports real-time streaming. Interim (partial) results become available. |
| offlineAsr | Non-streaming; full audio is processed at once. |
| dictatedCommands | Recognizes spoken commands (e.g. "new line"). |
| dictatedPunctuations | Recognizes spoken punctuation (e.g. "comma", "period"). |
| autoPc | Automatic punctuation is applied implicitly. |
| autoTc | Automatic true-casing is applied implicitly. |
| autoTn | Automatic text normalization is applied implicitly. |
| autoItn | Automatic inverse text normalization is applied implicitly. |

onlineAsr and offlineAsr are mutually exclusive for a given tag.

Parameters

| Parameter | Type | Description |
|---|---|---|
| enableUnks | Boolean | Emit <unk> tokens for unrecognized words instead of dropping them. |
| enableInterims | Boolean | Emit partial results while the user is still speaking. |
| enablePc | Boolean | Apply automatic punctuation. Only when the model lets you toggle it. |
| enableItn | Boolean | Apply automatic inverse text normalization. Only when toggleable. |
| enableSd | Boolean | Enable speaker diarization (label each token with a speaker). |
| sdSpeakerIdEnabled | Boolean | Enable speaker identification. |
| sdMinSpeakers | Integer | Lower bound on the expected number of speakers. |
| sdMaxSpeakers | Integer | Upper bound on the expected number of speakers. |
| boostedPhraseSets | Set of BOOST phrase-set IDs | Phrase sets whose phrases should be boosted. See the phrase sets manual. |
| additionalBoostedPhrases | Set of strings | Ad-hoc phrases to boost for this call only, without creating a phrase set. |
| enableEndpointing | Boolean | Cut segments on detected endpoints automatically. |
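A sketch of preparing ASR parameter values client-side: it rejects names the chosen option does not advertise, checks that sdMinSpeakers does not exceed sdMaxSpeakers, and turns Python sets into sorted lists, matching the list form boostedPhraseSets takes in the section 4 example. The helper name asr_parameters and the sorting are assumptions.

```python
from typing import Any, Dict, Mapping


def asr_parameters(advertised: Mapping[str, Any], **requested: Any) -> Dict[str, Any]:
    """Prepare ASR parameter values for one tag before they go into a pipeline.

    advertised is the "parameters" object of the chosen configOption.
    """
    unknown = sorted(set(requested) - set(advertised))
    if unknown:
        raise ValueError(f"not advertised for this tag: {unknown}")
    low, high = requested.get("sdMinSpeakers"), requested.get("sdMaxSpeakers")
    if low is not None and high is not None and low > high:
        raise ValueError("sdMinSpeakers must not exceed sdMaxSpeakers")
    params: Dict[str, Any] = {}
    for name, value in requested.items():
        # JSON has no set type, so Set parameters are sent as (sorted) lists.
        params[name] = sorted(value) if isinstance(value, (set, frozenset)) else value
    return params
```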

3.2 NLP_pc — Automatic punctuation

Inserts or fixes punctuation in plain text.

  • Input: unpunctuated / lightly punctuated text.
  • Output: same text with punctuation.

Parameters

| Parameter | Type | Description |
|---|---|---|
| enableSplitToSentences | Boolean | Also split the result into sentence segments. |
| enableTc | Boolean | Apply true-casing together with punctuation. |

3.3 NLP_tc — True-casing

Restores proper capitalization (lowercased or all‑caps input → mixed case).

  • Input: text.
  • Output: text with corrected case.

No configurable parameters.

3.4 NLP_tn — Text normalization

Expands written forms into their spoken equivalents: numbers, dates and abbreviations are rewritten in full (e.g. 12 → twelve). Typical use: preparing text for TTS.

  • Input: written‑form text.
  • Output: spoken‑form text.

Parameters

| Parameter | Type | Description |
|---|---|---|
| customConfig | Map | Free-form, model-specific configuration overrides. Consult the model's reference for available keys. |

3.5 NLP_itn — Inverse text normalization

The opposite of text normalization: turns spoken-form text into written form (e.g. twelve → 12). Typical use: cleaning up ASR output.

  • Input: spoken‑form text.
  • Output: written‑form text.

Parameters

| Parameter | Type | Description |
|---|---|---|
| convertSmallNumbers | Boolean | Also convert small numbers (e.g. five → 5). |

3.6 NLP_st — Sentence tokenization

Splits a text stream into sentences.

  • Input: text.
  • Output: the same text, segmented into sentences.

Parameters

| Parameter | Type | Description |
|---|---|---|
| processSsml | Boolean | Respect SSML markup in the input when detecting sentence boundaries. |

3.7 NLP_g2a — Grapheme-to-accent

Adds diacritics / accents to unaccented text (language‑dependent).

  • Input: unaccented text.
  • Output: accented text.

Parameters

| Parameter | Type | Description |
|---|---|---|
| fastEnabled | Boolean | Use the fast (lower-accuracy) inference path. |
| customTransformationSets | Set of G2A phrase-set IDs | Phrase sets that override the default mapping. See the phrase sets manual. |
| additionalCustomTransformations | Map<string,string> | Ad-hoc overrides provided inline (source → target), without creating a phrase set. |

3.8 NLP_nmt — Neural machine translation

Translates text between the source and target languages encoded in the tag.

  • Input: text in the source language.
  • Output: text in the target language.

No configurable parameters. Pick the language pair by choosing the right tag.

3.9 NLP_ac — Auto correct (replacements)

Applies a user‑defined list of replacement rules to text (e.g. terminology normalization, brand‑name fixes).

  • Input: text.
  • Output: text with replacements applied.

Parameters

| Parameter | Type | Description |
|---|---|---|
| replacementSets | Set of replacement-set IDs | Which of your replacement sets to apply. |

3.10 NLP_lid — Language identification

Identifies the language of the input text.

  • Input: text.
  • Output: text (annotated with detected language).

No configurable parameters.

3.11 NLP_ts — Text summarization

Summarizes a longer text. Depending on the model it can also extract keywords, key points, agenda items, etc.

  • Input: text (typically a full transcript).
  • Output: summary text plus the optional extracted artefacts you asked for.

Parameters

| Parameter | Type | Description |
|---|---|---|
| summaryType | Enum (SHORT or LONG) | Overall length of the generated summary. |
| keywordsEnabled | Boolean | Also extract keywords. |
| keypointsEnabled | Boolean | Also extract key points. |
| diarizationEnabled | Boolean | Treat the input as diarized and preserve speaker attribution in the summary. |
| agendaItemsEnabled | Boolean | Also extract agenda items. |
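A sketch of building an NLP_ts stage entry in the request format shown in section 4. The tag ACME:en-US:GEN:v1 and the helper summary_stage are hypothetical; whether each flag is actually accepted still depends on the parameters advertised for the tag you pick.

```python
from typing import Any, Dict, Optional

SUMMARY_TYPES = ("SHORT", "LONG")
EXTRACTION_FLAGS = ("keywordsEnabled", "keypointsEnabled",
                    "diarizationEnabled", "agendaItemsEnabled")


def summary_stage(tag: str, summary_type: Optional[str] = None,
                  **flags: bool) -> Dict[str, Any]:
    """Build an NLP_ts pipeline entry; omitted settings fall back to the tag's defaults."""
    params: Dict[str, Any] = {}
    if summary_type is not None:
        if summary_type not in SUMMARY_TYPES:
            raise ValueError(f"summaryType must be one of {SUMMARY_TYPES}")
        params["summaryType"] = summary_type
    for name, enabled in flags.items():
        if name not in EXTRACTION_FLAGS:
            raise ValueError(f"unknown NLP_ts parameter {name!r}")
        params[name] = bool(enabled)
    return {"task": "NLP_ts", "config": {"tag": tag, "parameters": params}}


entry = summary_stage("ACME:en-US:GEN:v1", "LONG", keywordsEnabled=True)  # hypothetical tag
print(entry["config"]["parameters"])  # {'summaryType': 'LONG', 'keywordsEnabled': True}
```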

3.12 NLP_tf — Transcript finalization

Processes interim transcripts and finalizes them, guaranteeing that at least one final segment is emitted roughly every 2 seconds even when the upstream ASR stage has not yet produced one. Useful for keeping downstream stages (translation, summarization, etc.) fed on a steady cadence in real‑time pipelines.

  • Input: text segments (mix of interim and final).
  • Output: text segments, with finals emitted on a regular cadence.
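This manual does not describe how NLP_tf decides what to finalize. The following toy model only illustrates the cadence guarantee: upstream finals pass through, and if no final has gone out for max_gap seconds, the not-yet-committed words of the latest interim are emitted as a final. It assumes interims are cumulative hypotheses for the current utterance and does not reconcile revised words; Segment, finalize and max_gap are hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Segment(NamedTuple):
    t: float     # arrival time in seconds
    text: str    # cumulative hypothesis for the current utterance
    final: bool  # True for an upstream (ASR) final


def finalize(segments: Iterable[Segment], max_gap: float = 2.0) -> List[Tuple[float, str]]:
    """Toy model: pass finals through; force one when none was emitted for max_gap s."""
    out: List[Tuple[float, str]] = []
    last_emit: Optional[float] = None  # time of the last emitted final
    committed = ""  # prefix of the current utterance already force-finalized
    for seg in segments:
        if last_emit is None:
            last_emit = seg.t  # start the clock at the first segment
        # Only words after the committed prefix are new; a revised hypothesis that no
        # longer starts with that prefix is emitted whole (no reconciliation).
        new = seg.text[len(committed):] if seg.text.startswith(committed) else seg.text
        new = new.strip()
        if seg.final:
            if new:
                out.append((seg.t, new))
                last_emit = seg.t
            committed = ""  # the utterance is closed either way
        elif new and seg.t - last_emit >= max_gap:
            out.append((seg.t, new))
            last_emit, committed = seg.t, seg.text
    return out


stream = [
    Segment(0.0, "hello", False),
    Segment(1.0, "hello there", False),
    Segment(2.1, "hello there how", False),
    Segment(3.0, "hello there how are you", True),
    Segment(3.5, "fine", False),
    Segment(5.6, "fine thanks", False),
]
print(finalize(stream))  # [(2.1, 'hello there how'), (3.0, 'are you'), (5.6, 'fine thanks')]
```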

3.13 TTS — Text-to-speech

Synthesizes audio from text.

  • Input: text (plain or SSML, see features).
  • Output: audio.

Features

| Feature | Meaning |
|---|---|
| ssmlA, ssmlB, ssmlC | The model accepts one of the supported SSML dialects. Use the dialect advertised on the tag you chose. |
| voiceTags | The model understands inline voice-selection tags. |

Parameters

| Parameter | Type | Description |
|---|---|---|
| sampleRate | Integer (allowed set in values) | Sample rate of the produced audio, in Hz. |
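Since sampleRate is constrained to the values advertised for the tag, a client may want to pick the closest allowed rate. A sketch, assuming you prefer the nearest rate (ties going to the higher one) and otherwise fall back to the advertised default; choose_sample_rate is a hypothetical helper and the descriptor in the example is made up.

```python
from typing import Any, Mapping, Optional


def choose_sample_rate(descriptor: Mapping[str, Any],
                       wanted: Optional[int] = None) -> Optional[int]:
    """Pick the allowed sampleRate closest to wanted (ties go to the higher rate).

    With no preference, return the advertised default, or None to omit the parameter.
    """
    if wanted is None:
        return descriptor.get("value")
    allowed = descriptor.get("values")
    if not allowed:
        return wanted  # unconstrained: send what was asked for
    return min(allowed, key=lambda rate: (abs(rate - wanted), -rate))


# Hypothetical descriptor; real values come from the tag's configOptions.
rate = {"type": "Integer", "values": [8000, 16000, 22050, 44100], "value": 22050}
print(choose_sample_rate(rate, 24000))  # 22050
print(choose_sample_rate(rate, 12000))  # 16000
```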

4. Composing a pipeline

When you send a pipeline to /api/pipelines/process (or configure a streaming session), you submit a list of stages, each referencing one of the configOptions you received. For every parameter you want to set, send the value inline:

[
  {
    "task": "ASR",
    "config": {
      "tag": "RIVA:sl-SI:COL:20250929-1800",
      "parameters": {
        "enableInterims":    true,
        "enableSd":          true,
        "boostedPhraseSets": [42, 108]
      }
    }
  },
  {
    "task": "NLP_nmt",
    "config": {
      "tag": "LPT:sl-SI|en-US:COL:v1",
      "parameters": {}
    }
  }
]

Rules:

  • The tag must come from the configOptions returned for the stage.
  • You may only set parameters that the chosen option advertises. Unknown or unauthorized parameters are rejected.
  • Features are not sent — they are read‑only metadata.
  • Omitted parameters fall back to the default advertised in configOptions, if any.
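These rules can be enforced client-side before the pipeline is POSTed. A sketch in Python, where compose and the (task, tag, parameters) tuple shape are hypothetical, and the stages list abbreviates a response like the one in section 2.3 (the NLP_nmt entry is assumed).

```python
import json
from typing import Any, Dict, List, Mapping, Sequence, Tuple

# (task, tag, parameter values) for one stage; a hypothetical input shape.
Choice = Tuple[str, str, Mapping[str, Any]]


def compose(stages: Sequence[Mapping[str, Any]], choices: Sequence[Choice]) -> List[Dict[str, Any]]:
    """Turn stage choices into a pipeline body, enforcing the rules above client-side."""
    options = {s["task"]: {o["tag"]: o for o in s.get("configOptions", [])} for s in stages}
    pipeline: List[Dict[str, Any]] = []
    for task, tag, params in choices:
        option = options.get(task, {}).get(tag)
        if option is None:
            raise ValueError(f"{task}: {tag!r} is not one of the returned configOptions")
        extra = sorted(set(params) - set(option.get("parameters", {})))
        if extra:
            raise ValueError(f"{task}: parameters not advertised for {tag}: {extra}")
        # Features are read-only metadata and are never copied into the request;
        # omitted parameters are simply left out so the advertised defaults apply.
        pipeline.append({"task": task, "config": {"tag": tag, "parameters": dict(params)}})
    return pipeline


stages = [
    {"task": "ASR", "type": "STT", "configOptions": [{
        "tag": "RIVA:sl-SI:COL:20250929-1800", "features": ["onlineAsr"],
        "parameters": {"enableInterims": {"type": "Boolean"}, "enableSd": {"type": "Boolean"},
                       "boostedPhraseSets": {"type": "Set"}}}]},
    {"task": "NLP_nmt", "type": "TTT", "configOptions": [{
        "tag": "LPT:sl-SI|en-US:COL:v1", "features": [], "parameters": {}}]},
]
body = compose(stages, [
    ("ASR", "RIVA:sl-SI:COL:20250929-1800",
     {"enableInterims": True, "enableSd": True, "boostedPhraseSets": [42, 108]}),
    ("NLP_nmt", "LPT:sl-SI|en-US:COL:v1", {}),
])
payload = json.dumps(body).encode("utf-8")  # request body for POST /api/pipelines/process
```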

5. Errors

| Status | When |
|---|---|
| 400 Bad Request | Malformed pipeline, unknown parameter, wrong parameter type, incompatible stage ordering. |
| 401 Unauthorized | Missing or invalid bearer token. |
| 403 Forbidden | The token does not carry the authority required to run pipelines. |
| 404 Not Found | Requested stage or tag is not available to your token. |
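A sketch of turning these statuses into a descriptive client-side error. PipelineAPIError and raise_for_status are hypothetical names, and the hint texts paraphrase the table above; they are not messages returned by the API.

```python
from typing import Dict

# Hints paraphrase the error table; the wording is ours, not the API's.
HINTS: Dict[int, str] = {
    400: "malformed pipeline, unknown or mistyped parameter, or incompatible stage ordering",
    401: "missing or invalid bearer token",
    403: "token lacks the authority to run pipelines",
    404: "stage or tag not available to this token",
}


class PipelineAPIError(Exception):
    """Hypothetical client-side exception carrying the HTTP status."""

    def __init__(self, status: int, detail: str = "") -> None:
        self.status = status
        message = f"HTTP {status}: {HINTS.get(status, 'unexpected status')}"
        if detail:
            message += f" ({detail})"
        super().__init__(message)


def raise_for_status(status: int, detail: str = "") -> None:
    """Do nothing for 2xx responses; raise PipelineAPIError otherwise."""
    if not 200 <= status < 300:
        raise PipelineAPIError(status, detail)
```

With urllib, one option is to call raise_for_status(err.code, err.reason) inside an except urllib.error.HTTPError block.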