
Streaming TTS Guide

Streaming TTS lets you turn text into speech with sub-second latency. This page follows the same structure as the main API quickstart: authenticate, open a WebSocket pipeline, stream text segments, and play back audio chunks as they arrive.

Prerequisites#

  • OAuth credentials (username/password or a refresh token) with PIPELINE_ONLINE_API and STAGE_TTS roles.
  • TRUEBAR_* environment variables from the Getting Started guide.
  • A 16 kHz mono PCM playback path (the API emits PCM16LE frames). For browsers this typically means using the Web Audio API; on the backend you can forward frames to FFmpeg or GStreamer.
  • Optional: one of the client libraries if you prefer SDK helpers over raw WebSocket code.
Stage tags

Set TRUEBAR_TTS_TAG to the voice you intend to use. When the tag belongs to the Riva family (RIVA:*, RIVA-STREAM:*, VIT:*), also export the TRUEBAR_TTS_NLP_* tags so the helper stages (sentence tokenization, text normalisation, grapheme-to-accentuation) are available.
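
If you script your setup, a quick guard can fail fast when a Riva-family tag is selected but the helper-stage tags are missing. A minimal sketch, assuming the TRUEBAR_* variable names used throughout this guide:

const tag = process.env.TRUEBAR_TTS_TAG ?? "";
const required = ["TRUEBAR_TTS_TAG"];
// Riva-family tags need the NLP helper stages as well.
if (/^(RIVA|RIVA-STREAM|VIT):/i.test(tag)) {
  required.push(
    "TRUEBAR_TTS_NLP_ST_TAG",
    "TRUEBAR_TTS_NLP_TN_TAG",
    "TRUEBAR_TTS_NLP_G2A_TAG",
  );
}
const missing = required.filter((name) => !process.env[name]);
if (missing.length > 0) {
  throw new Error(`Missing environment variables: ${missing.join(", ")}`);
}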

Step 1 – Authenticate#

Use the OAuth password grant (or refresh grant) to obtain an access token. The helpers below match the snippets used throughout the quickstarts.

auth.mjs
import axios from "axios";

export async function fetchTruebarAccessToken() {
  const form = new URLSearchParams({
    grant_type: "password",
    username: process.env.TRUEBAR_USERNAME,
    password: process.env.TRUEBAR_PASSWORD,
    client_id: process.env.TRUEBAR_CLIENT_ID ?? "truebar-client",
  });

  const { data } = await axios.post(
    process.env.TRUEBAR_AUTH_URL,
    form,
    { headers: { "Content-Type": "application/x-www-form-urlencoded" } },
  );

  if (!data?.access_token) {
    throw new Error("Missing access_token in response");
  }

  return data.access_token;
}

Tokens typically last 15 minutes. Store the refresh_token from the same response and repeat the call with grant_type=refresh_token before expiry.
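
A refresh helper can mirror the password-grant call above. A minimal sketch, assuming the same endpoint and response shape:

refresh.mjs
import axios from "axios";

export async function refreshTruebarAccessToken(refreshToken) {
  const form = new URLSearchParams({
    grant_type: "refresh_token",
    refresh_token: refreshToken,
    client_id: process.env.TRUEBAR_CLIENT_ID ?? "truebar-client",
  });

  const { data } = await axios.post(process.env.TRUEBAR_AUTH_URL, form, {
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
  });

  if (!data?.access_token) {
    throw new Error("Missing access_token in refresh response");
  }

  return data.access_token;
}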

Server-side quickstart (Node.js, Python, Java)#

The samples below authenticate, open the streaming pipeline, stream two text segments, and save the returned audio to disk.

tts-stream.mjs
import axios from "axios";
import WebSocket from "ws";
import { createWriteStream } from "node:fs";

const outputPath = process.env.TRUEBAR_TTS_OUTPUT ?? "speech.pcm";
const text =
  process.env.TRUEBAR_TTS_TEXT ??
  "Hello from the Truebar streaming TTS API.\n\nWe generate speech with sub-second latency.";

const parseJsonEnv = (name) => {
  const raw = process.env[name];
  if (!raw) return {};
  try {
    const parsed = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null ? parsed : {};
  } catch (error) {
    throw new Error(`Invalid JSON in ${name}: ${raw}`);
  }
};

const isRivaTag = (tag) => /^(RIVA|RIVA-STREAM|VIT):/i.test(tag);

const buildPipeline = () => {
  const tag = process.env.TRUEBAR_TTS_TAG ?? "RIVA:en-US:*:*";
  const stages = [];
  if (isRivaTag(tag) && process.env.TRUEBAR_TTS_NLP_ST_TAG) {
    stages.push({
      task: "NLP_st",
      exceptionHandlingPolicy: "THROW",
      config: {
        tag: process.env.TRUEBAR_TTS_NLP_ST_TAG,
        parameters: parseJsonEnv("TRUEBAR_TTS_NLP_ST_PARAMETERS"),
      },
    });
  }
  if (isRivaTag(tag) && process.env.TRUEBAR_TTS_NLP_TN_TAG) {
    stages.push({
      task: "NLP_tn",
      exceptionHandlingPolicy: "THROW",
      config: {
        tag: process.env.TRUEBAR_TTS_NLP_TN_TAG,
        parameters: parseJsonEnv("TRUEBAR_TTS_NLP_TN_PARAMETERS"),
      },
    });
  }
  if (isRivaTag(tag) && process.env.TRUEBAR_TTS_NLP_G2A_TAG) {
    stages.push({
      task: "NLP_g2a",
      exceptionHandlingPolicy: "THROW",
      config: {
        tag: process.env.TRUEBAR_TTS_NLP_G2A_TAG,
        parameters: parseJsonEnv("TRUEBAR_TTS_NLP_G2A_PARAMETERS"),
      },
    });
  }
  stages.push({
    task: "TTS",
    exceptionHandlingPolicy: "THROW",
    config: {
      tag,
      parameters: parseJsonEnv("TRUEBAR_TTS_PARAMETERS"),
    },
  });
  return stages;
};

async function fetchToken() {
  const form = new URLSearchParams({
    grant_type: "password",
    username: process.env.TRUEBAR_USERNAME,
    password: process.env.TRUEBAR_PASSWORD,
    client_id: process.env.TRUEBAR_CLIENT_ID ?? "truebar-client",
  });

  const { data } = await axios.post(process.env.TRUEBAR_AUTH_URL, form, {
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
  });

  return data.access_token;
}

const token = await fetchToken();
const ws = new WebSocket(process.env.TRUEBAR_TTS_WS_URL, {
  headers: { Authorization: `Bearer ${token}` },
});
const out = createWriteStream(outputPath);
let sentText = false;
let configSent = false;

const useSsml = () => {
  const flag = (process.env.TRUEBAR_TTS_ENABLE_SSML ?? "").trim().toLowerCase();
  const rate = process.env.TRUEBAR_TTS_SSML_RATE?.trim();
  return Boolean(rate) || ["1", "true", "yes", "on"].includes(flag);
};

const wrapText = (segment) => {
  const rate = process.env.TRUEBAR_TTS_SSML_RATE?.trim();
  const rateAttr = rate ? ` rate="${rate}"` : "";
  const escaped = segment
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return `<speak><prosody${rateAttr}>${escaped}</prosody></speak>`;
};

const sendSegments = () => {
  const segments = text.split(/\n{2,}/).filter(Boolean);
  const payloads = segments.length > 0 ? segments : ["Hello from Truebar."];
  payloads.forEach((segment) => {
    const payload = useSsml() ? wrapText(segment) : segment;
    ws.send(
      JSON.stringify({
        type: "TEXT_SEGMENT",
        textSegment: { text: payload },
      }),
    );
  });
  ws.send(JSON.stringify({ type: "EOS", lockSession: false }));
};

ws.on("open", () => console.log("TTS stream connected"));
ws.on("close", () => {
  out.end();
  console.log(`TTS stream closed – saved audio to ${outputPath}`);
});
ws.on("error", (err) => console.error("TTS stream error", err));

ws.on("message", (payload, isBinary) => {
  if (isBinary) {
    const buffer = Buffer.isBuffer(payload) ? payload : Buffer.from(payload);
    out.write(buffer);
    return;
  }

  const msg = JSON.parse(payload.toString());
  if (msg.type === "STATUS") {
    console.log("STATUS:", msg.status);
    if (msg.status === "INITIALIZED" && !configSent) {
      configSent = true;
      ws.send(JSON.stringify({ type: "CONFIG", pipeline: buildPipeline() }));
    }
    if (msg.status === "CONFIGURED" && !sentText) {
      sentText = true;
      sendSegments();
    }
    if (msg.status === "FINISHED") {
      ws.close();
    }
    return;
  }

  if (msg.type === "WARNING") {
    console.warn("TTS warning", msg);
  }

  if (msg.type === "ERROR") {
    console.error("Streaming error", msg);
    ws.close();
  }
});
Step 2 – Open a WebSocket session#
  • Connect to the URL stored in TRUEBAR_TTS_WS_URL.
  • Send the bearer token in the Authorization header so that credentials do not leak through logs or caches.
  • The first message you receive is a STATUS update (INITIALIZED). Reply with your CONFIG payload at that point, as in the sketch below.
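
Condensed from the server-side quickstart, the handshake looks like this (buildPipeline is the helper defined above):

import WebSocket from "ws";

const ws = new WebSocket(process.env.TRUEBAR_TTS_WS_URL, {
  headers: { Authorization: `Bearer ${token}` }, // token from Step 1
});

ws.on("message", (payload, isBinary) => {
  if (isBinary) return; // audio chunks arrive only after CONFIGURED
  const msg = JSON.parse(payload.toString());
  if (msg.type === "STATUS" && msg.status === "INITIALIZED") {
    ws.send(JSON.stringify({ type: "CONFIG", pipeline: buildPipeline() }));
  }
});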

Step 3 – Configure the pipeline#

Send a CONFIG message containing the pipeline definition. Riva-family voices normally run the sentence tokenization (NLP_st), text normalisation (NLP_tn), and grapheme-to-accentuation (NLP_g2a) stages before the final TTS task. Single-stage voices only require the TTS entry. The server-side quickstart above builds the pipeline dynamically; the Example pipeline later in this guide shows the JSON structure in full.
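
The CONFIG message itself wraps the pipeline array, matching the ws.send call in the quickstart. A single-stage example (the tag is the sample voice from the Example pipeline below):

{
  "type": "CONFIG",
  "pipeline": [
    {
      "task": "TTS",
      "exceptionHandlingPolicy": "THROW",
      "config": { "tag": "RIVA:en-US:Alloy", "parameters": {} }
    }
  ]
}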

Step 4 – Stream text segments#

  • Send one TEXT_SEGMENT message per segment and keep segments short (1–2 sentences) for lower latency.
  • Wrap the text in SSML (<speak>…</speak>) when you set TRUEBAR_TTS_ENABLE_SSML=true or when the stage lists processSsml support.
  • Finish the session with {"type":"EOS","lockSession":false} so the platform can flush the pipeline.
TEXT_SEGMENT schema
{  "type": "TEXT_SEGMENT",  "textSegment": {    "text": "Hello from Truebar!",  }}

Note that there are other ways to finish the session besides sending EOS:

  • Closing the WebSocket directly: This will immediately terminate the session on the API side. Any unprocessed data at the time of closure will be discarded. The session will be marked as CANCELED and cannot be resumed.
  • Canceling the session without closing the WebSocket: This is useful when the client does not need to wait for all results but wants to keep the connection open. Instead of an EOS message, send a CANCEL message. After canceling, the client receives a message with status = CANCELED and the WebSocket returns to the same state as when it was first INITIALIZED. To start a new session over the same connection, send a CONFIG message and begin streaming data messages again, as shown below. With this method the session is likewise marked as CANCELED and cannot be resumed.
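
For example, a client can cancel and immediately reconfigure over the same socket. A sketch, assuming the CANCEL payload carries only its type:

// Cancel the current session but keep the WebSocket open.
ws.send(JSON.stringify({ type: "CANCEL" }));

ws.on("message", (payload, isBinary) => {
  if (isBinary) return;
  const msg = JSON.parse(payload.toString());
  if (msg.type === "STATUS" && msg.status === "CANCELED") {
    // Back in the INITIALIZED-equivalent state: start a new session.
    ws.send(JSON.stringify({ type: "CONFIG", pipeline: buildPipeline() }));
  }
});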

Discover voices and preprocessing stages#

Use /api/pipelines/stages to list the TTS voices available to your account. When the selected tag begins with RIVA, RIVA-STREAM, or VIT, the response also includes the NLP helper stages (NLP_st, NLP_tn, NLP_g2a) that should precede the TTS stage. Export those tags (and their optional parameter JSON) via environment variables such as TRUEBAR_TTS_NLP_ST_TAG and TRUEBAR_TTS_NLP_ST_PARAMETERS. For single-stage voices, you only need the TTS tag.
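
A minimal discovery call might look like the following sketch; the TRUEBAR_API_URL base-URL variable is an assumption, so adapt it to your deployment:

import axios from "axios";

// List the pipeline stages available to the account. Inspect the payload to
// find TTS voice tags and, for Riva-family voices, the NLP_* helper tags.
const { data } = await axios.get(
  `${process.env.TRUEBAR_API_URL}/api/pipelines/stages`,
  { headers: { Authorization: `Bearer ${token}` } },
);
console.log(JSON.stringify(data, null, 2));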

Step 5 – Play back audio#

Binary messages from the socket contain PCM16LE audio chunks (mono, 16 kHz). Forward them directly to your playback pipeline.

ttsSocket.on("message", (event, isBinary) => {  if (isBinary) {    const pcmChunk = Buffer.from(event as ArrayBuffer);    audioPlayback.enqueue(pcmChunk); // Write to speaker, file, or RTP stream.    return;  }
  const payload = JSON.parse(event.toString());  if (payload.type === "STATUS" && payload.status === "FINISHED") {    console.log("TTS session complete");  }  if (payload.type === "ERROR") {    console.error("TTS error", payload);  }});

Record local timestamps when you send CONFIG, when you receive STATUS: CONFIGURED, and when STATUS: FINISHED arrives to track end-to-end latency. The payloads include server-side timestamps that you can compare against your own logs if you need deeper diagnostics.
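
One lightweight approach is to record marks keyed by event; a sketch:

const marks = {};
const mark = (label) => {
  marks[label] = performance.now();
};

// Call mark("config") just before sending CONFIG, mark("configured") on
// STATUS: CONFIGURED, and mark("finished") on STATUS: FINISHED, then:
const report = () =>
  console.log(
    `configure: ${(marks.configured - marks.config).toFixed(0)} ms, ` +
      `total: ${(marks.finished - marks.config).toFixed(0)} ms`,
  );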

When the service sends an ERROR message, log it (the payload contains a descriptive message field), close the socket, and reconnect once the underlying issue is resolved or after a short backoff if you suspect a transient failure.
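
A simple exponential backoff wrapper could look like this sketch, where openSocket stands for whatever function runs your connect-and-CONFIG flow:

let attempt = 0;

function reconnectWithBackoff() {
  const delayMs = Math.min(30_000, 1_000 * 2 ** attempt);
  attempt += 1;
  setTimeout(() => {
    openSocket(); // re-runs the connect + CONFIG flow from the quickstart
  }, delayMs);
}

// Reset the counter after a session finishes normally, e.g. on
// STATUS: FINISHED set attempt = 0.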

Browser playback example#

Modern browsers support AudioWorklet for low-latency playback. Load a small worklet that accepts PCM16 frames and posts them to the audio graph:

const audioContext = new AudioContext({ sampleRate: 16_000 });
await audioContext.audioWorklet.addModule('/audio/pcm-player.worklet.js');

const player = new AudioWorkletNode(audioContext, 'pcm-player');
player.connect(audioContext.destination);

ttsSocket.addEventListener('message', async (event) => {
  if (typeof event.data === 'string') return;

  const pcm = await event.data.arrayBuffer();
  player.port.postMessage(pcm);
});

Worklet module (/audio/pcm-player.worklet.js):

class PCMPlayerProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.queue = [];
    this.offset = 0;
    this.port.onmessage = (event) => {
      const int16 = new Int16Array(event.data);
      const float32 = new Float32Array(int16.length);
      for (let i = 0; i < int16.length; i++) {
        float32[i] = int16[i] / 0x8000;
      }
      this.queue.push(float32);
    };
  }

  process(_, outputs) {
    const output = outputs[0][0];
    output.fill(0);

    let written = 0;
    while (written < output.length && this.queue.length) {
      const chunk = this.queue[0];
      const remaining = chunk.length - this.offset;
      const copyCount = Math.min(remaining, output.length - written);
      output.set(chunk.subarray(this.offset, this.offset + copyCount), written);
      written += copyCount;
      this.offset += copyCount;
      if (this.offset >= chunk.length) {
        this.queue.shift();
        this.offset = 0;
      }
    }

    return true;
  }
}

registerProcessor('pcm-player', PCMPlayerProcessor);

Remember to resume or unlock the AudioContext in response to a user gesture in browsers that require it.
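
For example:

// Resume the AudioContext on the first user gesture, as required by
// autoplay policies in some browsers.
document.addEventListener(
  'click',
  () => {
    if (audioContext.state === 'suspended') {
      audioContext.resume();
    }
  },
  { once: true },
);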

Designing your pipeline#

Truebar lets you compose multiple stages. Consider the following when building pipelines:

  • Riva pipelines (RIVA:* or RIVA-STREAM:* tags) often expect normalised, phonemised text. Prepend NLP_tn, NLP_g2a, and optionally NLP_st stages before the final TTS task.
  • Automatic SSML: Many voices enable SSML automatically. Send structured tokens or wrap content in <speak> tags when you want to use SSML.
  • Sentence segmentation: Keep segments concise (1–2 sentences). This reduces latency.
  • Sample rate: Output is always 16 kHz mono PCM. Resample on the client side if you require 8 kHz telephony output or 48 kHz media playback; a naive upsampling sketch follows this list.
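
A naive 16 kHz → 48 kHz upsampler as a sketch (linear interpolation; prefer a proper resampler for production audio):

// Upsample mono Float32 PCM by 3x (16 kHz -> 48 kHz) via linear interpolation.
function upsample3x(input) {
  const out = new Float32Array(input.length * 3);
  for (let i = 0; i < input.length; i++) {
    const a = input[i];
    const b = i + 1 < input.length ? input[i + 1] : a;
    out[3 * i] = a;
    out[3 * i + 1] = a + (b - a) / 3;
    out[3 * i + 2] = a + ((b - a) * 2) / 3;
  }
  return out;
}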

Example pipeline with Riva preprocessing:

[
  {
    "task": "NLP_st",
    "exceptionHandlingPolicy": "THROW",
    "config": { "tag": "RIVA:en-US", "parameters": {} }
  },
  {
    "task": "NLP_tn",
    "exceptionHandlingPolicy": "THROW",
    "config": { "tag": "RIVA:en-US", "parameters": {} }
  },
  {
    "task": "NLP_g2a",
    "exceptionHandlingPolicy": "THROW",
    "config": { "tag": "RIVA:en-US", "parameters": {} }
  },
  {
    "task": "TTS",
    "exceptionHandlingPolicy": "THROW",
    "config": {
      "tag": "RIVA:en-US:Alloy",
      "parameters": {}
    }
  }
]

Troubleshooting#

  • 401/403 responses – Confirm your token includes PIPELINE_ONLINE_API and STAGE_TTS.
  • Session never reaches CONFIGURED – Double-check the stage tag or use discovery to list available tags for your account.
  • Audio stutters – Buffer chunks before playback or stream them into an audio worklet to smooth timing.
  • Need automatic token refresh – Use the Java SDK or reuse the token helper from the streaming quickstarts.

Next steps#