Streaming TTS Quickstart

This tutorial streams text to the Truebar WebSocket API and plays back the synthesised audio. It mirrors the STT quickstart but focuses on text-to-speech (TTS) pipelines.

Prerequisites#

  • A Truebar tenant with Streaming TTS enabled for your project. Ask your workspace admin to grant the Streaming pipelines privilege to a dedicated service user instead of your personal login.
  • Credentials for the OAuth password grant flow (username, password, and client ID). Store them in a .env file or secrets manager—avoid checking them into version control.
  • Runtime requirements:
    • Node.js 18+ with ES modules enabled ("type": "module" in package.json or an .mjs extension).
    • Python 3.10+ with pip and a working C toolchain (required by soundfile/libsndfile).
    • FFmpeg (for the ffplay utility) or an alternative audio player.

If you are using a non-production cluster, replace hostnames and WebSocket URLs with the ones provided by your environment administrator.

Prepare environment variables#

Load the following variables in your shell (or write them to .env and use a loader such as direnv or dotenv-cli):

export TRUEBAR_USERNAME="alice@example.com"
export TRUEBAR_PASSWORD="super-secret"
export TRUEBAR_CLIENT_ID="truebar-client"
export TRUEBAR_AUTH_URL="https://auth.true-bar.si/realms/truebar/protocol/openid-connect/token"
export TRUEBAR_TTS_WS_URL="wss://api.true-bar.si/api/pipelines/stream"

Rotate credentials regularly and prefer short-lived passwords or tokens where possible.

What you will build#

  1. Authenticate with Truebar and present the resulting bearer token on the TTS WebSocket handshake (the samples below send it as an Authorization header).
  2. Stream text segments to the pipeline and receive raw 16 kHz PCM audio chunks in response.
  3. Persist the stream locally (speech.pcm or speech.wav) and play it back for verification.

Expect initial configuration round trips to add ~150–300 ms latency depending on the selected voice and preprocessing stages. See the troubleshooting notes below for guidance on rate limits and payload sizing.
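
Under the hood, the exchange is a simple request/response sequence. A sketch of the message flow, with message types and status names taken from the sample code below:

client → server   {"type": "CONFIG", "pipeline": [...]}
server → client   {"type": "STATUS", "status": "CONFIGURED"}
client → server   {"type": "TEXT_SEGMENT", "textSegment": {"text": "..."}}
client → server   {"type": "EOS", "lockSession": false}
server → client   binary frames of PCM16LE audio (16 kHz mono)
server → client   {"type": "STATUS", "status": "FINISHED"}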

Voice tag

Set TRUEBAR_TTS_TAG to the voice you intend to use (discoverable via GET /api/pipelines/stages or your .env.truebar). Riva voices also require the accompanying TRUEBAR_TTS_NLP_* tags shown later in this guide.


1. Run the sample#

Install dependencies and run the script below:

npm init -y
npm pkg set type=module
npm install axios ws
node tts.js
Node environment

If your credentials live in .env, launch the script with npx dotenv -e .env -- node tts.js (or configure your shell to export the variables before running node). The sample requires Node.js 18+.

tts.js
import axios from "axios";
import WebSocket from "ws";
import { createWriteStream } from "node:fs";

const isRivaTag = (tag) => !!tag && /^(RIVA|RIVA-STREAM|VIT):/i.test(tag);

const parseJsonEnv = (name) => {
  const raw = process.env[name];
  if (!raw) return {};
  try {
    const parsed = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null ? parsed : {};
  } catch (error) {
    throw new Error(`Invalid JSON in ${name}: ${raw}`);
  }
};

async function fetchToken() {
  const form = new URLSearchParams({
    grant_type: "password",
    username: process.env.TRUEBAR_USERNAME,
    password: process.env.TRUEBAR_PASSWORD,
    client_id: process.env.TRUEBAR_CLIENT_ID ?? "truebar-client",
  });

  const { data } = await axios.post(process.env.TRUEBAR_AUTH_URL, form, {
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    // Retry guidance: a 401 means the password or client ID is wrong or expired.
    validateStatus: (status) => status >= 200 && status < 500,
  });

  if (!data.access_token) {
    throw new Error(`Failed to obtain token: ${JSON.stringify(data)}`);
  }

  return data.access_token;
}

function buildPipelineConfig() {
  const tag = process.env.TRUEBAR_TTS_TAG ?? "RIVA:en-US:*:*";
  const pipeline = [];

  if (isRivaTag(tag) && process.env.TRUEBAR_TTS_NLP_ST_TAG) {
    pipeline.push({
      task: "NLP_st",
      exceptionHandlingPolicy: "THROW",
      config: {
        tag: process.env.TRUEBAR_TTS_NLP_ST_TAG,
        parameters: parseJsonEnv("TRUEBAR_TTS_NLP_ST_PARAMETERS"),
      },
    });
  }

  if (isRivaTag(tag) && process.env.TRUEBAR_TTS_NLP_TN_TAG) {
    pipeline.push({
      task: "NLP_tn",
      exceptionHandlingPolicy: "THROW",
      config: {
        tag: process.env.TRUEBAR_TTS_NLP_TN_TAG,
        parameters: parseJsonEnv("TRUEBAR_TTS_NLP_TN_PARAMETERS"),
      },
    });
  }

  if (isRivaTag(tag) && process.env.TRUEBAR_TTS_NLP_G2A_TAG) {
    pipeline.push({
      task: "NLP_g2a",
      exceptionHandlingPolicy: "THROW",
      config: {
        tag: process.env.TRUEBAR_TTS_NLP_G2A_TAG,
        parameters: parseJsonEnv("TRUEBAR_TTS_NLP_G2A_PARAMETERS"),
      },
    });
  }

  return {
    type: "CONFIG",
    pipeline: [
      ...pipeline,
      {
        task: "TTS",
        exceptionHandlingPolicy: "THROW",
        config: {
          tag,
          parameters: parseJsonEnv("TRUEBAR_TTS_PARAMETERS"),
        },
      },
    ],
  };
}

const useSsml = () => {
  const flag = process.env.TRUEBAR_TTS_ENABLE_SSML;
  return !!flag && ["1", "true", "yes", "on"].includes(flag.toLowerCase());
};

const wrapText = (text) => {
  const rate = process.env.TRUEBAR_TTS_SSML_RATE?.trim();
  const rateAttr = rate ? ` rate="${rate}"` : "";
  const escaped = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return `<speak><prosody${rateAttr}>${escaped}</prosody></speak>`;
};

async function main() {
  const token = await fetchToken();
  const ws = new WebSocket(process.env.TRUEBAR_TTS_WS_URL, {
    headers: { Authorization: `Bearer ${token}` },
  });
  const out = createWriteStream("speech.pcm");
  let streamed = false;
  const text =
    "Hello from the Truebar streaming API. This sample generates audio on demand.";

  // The first server message confirms the session; reply with the pipeline CONFIG.
  ws.once("message", () => {
    ws.send(JSON.stringify(buildPipelineConfig()));
  });

  ws.on("message", (payload, isBinary) => {
    if (isBinary) {
      out.write(payload);
      return;
    }

    const msg = JSON.parse(payload.toString());
    if (msg.type === "STATUS") {
      console.log("STATUS:", msg.status);
      if (msg.status === "CONFIGURED" && !streamed) {
        streamed = true;
        const body = useSsml() ? wrapText(text) : text;
        ws.send(
          JSON.stringify({
            type: "TEXT_SEGMENT",
            textSegment: { text: body },
          }),
        );
        ws.send(JSON.stringify({ type: "EOS", lockSession: false }));
      }
      if (msg.status === "FINISHED") {
        out.end();
        ws.close();
      }
      return;
    }

    if (msg.type === "ERROR") {
      console.error("Pipeline error", msg);
    }
  });

  ws.on("open", () => console.log("TTS stream connected"));
  ws.on("close", () => console.log("TTS stream closed"));
  ws.on("error", (err) => console.error("TTS error", err));
}

main().catch((err) => {
  console.error("Fatal error", err);
  process.exit(1);
});

Listen to the output (PCM16LE, 16 kHz mono):

ffplay -f s16le -ar 16000 -ac 1 speech.pcm
No ffplay?

Install FFmpeg (brew install ffmpeg, sudo apt-get install ffmpeg, or choco install ffmpeg). As alternatives, convert the raw PCM to WAV with ffmpeg -f s16le -ar 16000 -ac 1 -i speech.pcm speech.wav and play it via afplay (macOS) or any desktop audio editor (Windows/Audacity).

Run the Python sample in Jupyter Notebook#

%pip install aiohttp numpy soundfile nest_asyncio

import nest_asyncio, asyncio, os
from tts import main  # assumes the tutorial code lives in tts.py next to the notebook

os.environ["TRUEBAR_USERNAME"] = "alice@example.com"
os.environ["TRUEBAR_PASSWORD"] = "super-secret-passphrase"
os.environ["TRUEBAR_CLIENT_ID"] = "truebar-client"
os.environ["TRUEBAR_AUTH_URL"] = "https://playground-auth.true-bar.si/realms/truebar/protocol/openid-connect/token"
os.environ["TRUEBAR_TTS_WS_URL"] = "wss://playground-api.true-bar.si/api/pipelines/stream"

nest_asyncio.apply()  # allows asyncio.run inside Jupyter
await main()

Play the generated audio directly in the notebook:

from IPython.display import Audio

Audio("speech.wav")
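
The notebook imports main from a tts.py module that this page does not reproduce. A minimal sketch of such a module, assuming the same protocol as the Node.js sample above (Riva NLP stages and SSML wrapping omitted for brevity):

# tts.py — minimal sketch mirroring the Node.js sample
import asyncio
import json
import os

import aiohttp
import numpy as np
import soundfile as sf


async def fetch_token(session):
    form = {
        "grant_type": "password",
        "username": os.environ["TRUEBAR_USERNAME"],
        "password": os.environ["TRUEBAR_PASSWORD"],
        "client_id": os.environ.get("TRUEBAR_CLIENT_ID", "truebar-client"),
    }
    async with session.post(os.environ["TRUEBAR_AUTH_URL"], data=form) as resp:
        body = await resp.json()
        if "access_token" not in body:
            raise RuntimeError(f"Failed to obtain token: {body}")
        return body["access_token"]


async def main():
    text = "Hello from the Truebar streaming API. This sample generates audio on demand."
    config = {
        "type": "CONFIG",
        "pipeline": [
            {
                "task": "TTS",
                "exceptionHandlingPolicy": "THROW",
                "config": {
                    "tag": os.environ.get("TRUEBAR_TTS_TAG", "RIVA:en-US:*:*"),
                    "parameters": json.loads(os.environ.get("TRUEBAR_TTS_PARAMETERS", "{}")),
                },
            }
        ],
    }
    chunks = []
    async with aiohttp.ClientSession() as session:
        token = await fetch_token(session)
        async with session.ws_connect(
            os.environ["TRUEBAR_TTS_WS_URL"],
            headers={"Authorization": f"Bearer {token}"},
        ) as ws:
            configured = False
            async for msg in ws:
                if msg.type == aiohttp.WSMsgType.BINARY:
                    chunks.append(msg.data)  # raw PCM16LE audio
                    continue
                if msg.type != aiohttp.WSMsgType.TEXT:
                    break
                event = json.loads(msg.data)
                if event.get("type") == "ERROR":
                    raise RuntimeError(f"Pipeline error: {event}")
                if not configured:
                    configured = True  # first server message: reply with CONFIG
                    await ws.send_json(config)
                elif event.get("status") == "CONFIGURED":
                    await ws.send_json({"type": "TEXT_SEGMENT", "textSegment": {"text": text}})
                    await ws.send_json({"type": "EOS", "lockSession": False})
                elif event.get("status") == "FINISHED":
                    break
    pcm = np.frombuffer(b"".join(chunks), dtype=np.int16)
    sf.write("speech.wav", pcm, 16000)  # 16 kHz mono, PCM_16


if __name__ == "__main__":
    asyncio.run(main())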

2. Configure voices and pipeline stages#

  • Voice tags follow the pattern PROVIDER:locale:gender:voice. Run GET /api/pipelines?task=TTS or check the Truebar Console > Pipelines to list everything your tenant can access. Set TRUEBAR_TTS_TAG to lock the example to a specific voice.
  • Pipeline parameters let you tune the generated speech without switching voices. Common fields include speakingRate (0.5–2.0), pitch (−20 to +20 semitones), and volumeGainDb (−96 to +16). Pass them in the parameters object shown in the snippets (see the export example after this list).
  • Pre-processing stages (punctuation, text normalisation, accentuation) run before synthesis. Add additional objects ahead of the TTS task in the pipeline array to clean or enrich text before it reaches the voice.
  • Segmenting text: Use multiple TEXT_SEGMENT messages for long content. Keep individual segments under a few sentences (<4 KB payload) to avoid hitting per-message limits and to reduce latency; see the segmentation sketch after this list.
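
For example, to feed the tuning fields above into the samples' parameters object (a sketch; which fields a voice honours depends on the selected voice):

export TRUEBAR_TTS_PARAMETERS='{"speakingRate": 1.2, "pitch": -2, "volumeGainDb": 0}'

And a minimal segmentation helper for long content (an illustrative sketch using a naive sentence split; production code may want a proper sentence tokenizer):

import re

def segment_text(text, max_chars=2000):
    """Group sentences into segments that stay under max_chars each."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            segments.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        segments.append(current)
    return segments

After the CONFIGURED status, send each segment as its own TEXT_SEGMENT message and finish with a single EOS.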

Enable the Riva multi-stage pipeline#

Truebar’s Riva voices expect a four-stage pipeline (semantic tagging → text normalisation → grapheme-to-accentuation → TTS). Configure it by exporting the NLP stage tags and turning on SSML support:

export TRUEBAR_TTS_TAG="RIVA:sl-SI:ZIGA:20250408-2103"
export TRUEBAR_TTS_ENABLE_SSML=true                # wrap payload in <speak> automatically
export TRUEBAR_TTS_NLP_ST_TAG="VIT:sl-SI:*:1.0.0"
export TRUEBAR_TTS_NLP_TN_TAG="VIT:sl-SI:*:*"
export TRUEBAR_TTS_NLP_G2A_TAG="VIT:sl-SI:*:*"
export TRUEBAR_TTS_NLP_ST_PARAMETERS='{"processSsml": true}'
export TRUEBAR_TTS_NLP_G2A_PARAMETERS='{"fastEnabled": true}'

With those variables set, the tutorial’s Python and Node.js samples emit a CONFIG payload similar to:

{  "type": "CONFIG",  "pipeline": [    {"task": "NLP_st", "exceptionHandlingPolicy": "THROW", "config": {"tag": "VIT:sl-SI:*:1.0.0", "parameters": {"processSsml": true}}},    {"task": "NLP_tn", "exceptionHandlingPolicy": "THROW", "config": {"tag": "VIT:sl-SI:*:*", "parameters": {}}},    {"task": "NLP_g2a", "exceptionHandlingPolicy": "THROW", "config": {"tag": "VIT:sl-SI:*:*", "parameters": {"fastEnabled": true}}},    {"task": "TTS", "exceptionHandlingPolicy": "THROW", "config": {"tag": "RIVA:sl-SI:ZIGA:20250408-2103", "parameters": {}}}  ]}

You can discover valid tags and parameters via GET /api/pipelines/stages (use the token from the earlier step) or by copying them from your working .env.truebar. Toggle TRUEBAR_TTS_ENABLE_SSML off if you want to send plain text through the same pipeline. Both the Node.js and Python samples automatically add these NLP stages when the active tag belongs to the Riva family (or when you override with TRUEBAR_TTS_USE_RIVA_PIPELINE=1).
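
For example (a sketch: $TOKEN is the access token from step 1, and the hostname should match your cluster's API host):

curl -s -H "Authorization: Bearer $TOKEN" "https://api.true-bar.si/api/pipelines/stages"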

3. Troubleshooting and production readiness#

  • 401 or 403 responses typically mean the service user lacks the Streaming pipelines permission or the password expired. Reset credentials or switch to a client-credential flow if available.
  • Watch for 429 rate-limit responses. Batch non-urgent synthesis jobs and retry with exponential backoff.
  • Treat network hiccups as transient: reconnect the WebSocket and resume from the last unsent segment. Both samples can be extended with retry wrappers such as p-retry (Node) or tenacity (Python).
  • Monitor latency by timestamping when you send CONFIG, the first audio chunk, and FINISHED. This highlights voice selection or preprocessing steps that slow down synthesis.
  • Log and persist pipeline errors from the ERROR message type. They often contain actionable details (e.g. unsupported SSML tag, missing pipeline stage).
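
As a sketch of the retry advice above, tenacity can wrap the Python sample's main() in an exponential backoff loop. Note this re-runs the whole session rather than resuming mid-stream; resuming from the last unsent segment needs application-level bookkeeping:

import asyncio

from tenacity import retry, stop_after_attempt, wait_exponential

from tts import main  # the tutorial module from the notebook section


@retry(wait=wait_exponential(multiplier=1, min=1, max=30), stop=stop_after_attempt(5))
async def synthesize_with_retry():
    # Each attempt opens a fresh WebSocket session and re-sends the text.
    await main()


if __name__ == "__main__":
    asyncio.run(synthesize_with_retry())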

4. Next steps#

  • Prototype advanced SSML features (say-as, emphasis, phonemes) and combine them with preprocessing stages to keep payloads readable.
  • Review the Streaming TTS guide for protocol details, SSML authoring tips, and pipeline discovery queries.
  • Instrument your application with observability (request IDs, audio durations, jitter) before deploying to production.
  • If you also need transcription, try the Streaming STT quickstart and experiment with duplex pipelines.