Streaming TTS Guide
Streaming TTS lets you turn text into speech with sub-second latency. This page follows the same structure as the main API quickstart: authenticate, open a WebSocket pipeline, stream text segments, and play back audio chunks as they arrive.
Prerequisites
- OAuth credentials (username/password or a refresh token) with the PIPELINE_ONLINE_API and STAGE_TTS roles.
- TRUEBAR_* environment variables from the Getting Started guide.
- A 16 kHz mono PCM playback path (the API emits PCM16LE frames). For browsers this typically means using the Web Audio API; on the backend you can forward frames to FFmpeg or GStreamer.
- Optional: one of the client libraries if you prefer SDK helpers over raw WebSocket code.
Stage tags
Set TRUEBAR_TTS_TAG to the voice you intend to use. When the tag belongs to the Riva family (RIVA:*, RIVA-STREAM:*, VIT:*), also export the TRUEBAR_TTS_NLP_* tags so the helper stages (sentence tokenization, text normalisation, grapheme-to-accentuation) are available.
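The rule above can be turned into a small pre-flight check. The following Python sketch reports which stage-tag variables are unset for the selected voice. The variable names are the ones used by the quickstart samples in this guide; the function name `missing_tts_env` is illustrative, and because the samples treat each helper stage as optional, the result is best read as a warning rather than a hard failure.

```python
import os
from typing import Mapping

# Tag prefixes that this guide groups into the Riva family.
RIVA_PREFIXES = ("RIVA:", "RIVA-STREAM:", "VIT:")

# Helper-stage tag variables used by the quickstart samples in this guide.
NLP_TAG_VARS = (
    "TRUEBAR_TTS_NLP_ST_TAG",
    "TRUEBAR_TTS_NLP_TN_TAG",
    "TRUEBAR_TTS_NLP_G2A_TAG",
)


def missing_tts_env(env: Mapping[str, str] = os.environ) -> list:
    """Return the names of stage-tag variables that are unset or blank."""
    missing = []
    tag = env.get("TRUEBAR_TTS_TAG", "").strip()
    if not tag:
        missing.append("TRUEBAR_TTS_TAG")
    # Only Riva-family voices need the NLP helper tags.
    if tag.upper().startswith(RIVA_PREFIXES):
        missing += [name for name in NLP_TAG_VARS if not env.get(name, "").strip()]
    return missing
```

Calling `missing_tts_env()` before opening the socket gives an early hint when a helper tag was forgotten.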
Step 1 – Authenticate
Use the OAuth password grant (or refresh grant) to obtain an access token. The helpers below match the snippets used throughout the quickstarts.
Node.js:

import axios from "axios";

export async function fetchTruebarAccessToken() {
  const form = new URLSearchParams({
    grant_type: "password",
    username: process.env.TRUEBAR_USERNAME,
    password: process.env.TRUEBAR_PASSWORD,
    client_id: process.env.TRUEBAR_CLIENT_ID ?? "truebar-client",
  });

  const { data } = await axios.post(
    process.env.TRUEBAR_AUTH_URL,
    form,
    { headers: { "Content-Type": "application/x-www-form-urlencoded" } },
  );

  if (!data?.access_token) {
    throw new Error("Missing access_token in response");
  }

  return data.access_token;
}

Python:

import os
import requests

def fetch_truebar_access_token() -> str:
    payload = {
        "grant_type": "password",
        "username": os.environ["TRUEBAR_USERNAME"],
        "password": os.environ["TRUEBAR_PASSWORD"],
        "client_id": os.getenv("TRUEBAR_CLIENT_ID", "truebar-client"),
    }
    response = requests.post(
        os.environ["TRUEBAR_AUTH_URL"],
        data=payload,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        timeout=10,
    )
    response.raise_for_status()
    body = response.json()
    if "access_token" not in body:
        raise RuntimeError(f"Missing access_token in response: {body}")
    return body["access_token"]

Java:

import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AuthClient {
  private static final Pattern ACCESS_TOKEN = Pattern.compile("\"access_token\"\\s*:\\s*\"([^\"]+)\"");

  public static String fetchToken() throws Exception {
    String form = String.join("&",
        "grant_type=password",
        "username=" + URLEncoder.encode(System.getenv("TRUEBAR_USERNAME"), StandardCharsets.UTF_8),
        "password=" + URLEncoder.encode(System.getenv("TRUEBAR_PASSWORD"), StandardCharsets.UTF_8),
        "client_id=" + URLEncoder.encode(
            System.getenv().getOrDefault("TRUEBAR_CLIENT_ID", "truebar-client"),
            StandardCharsets.UTF_8)
    );

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(System.getenv("TRUEBAR_AUTH_URL")))
        .header("Content-Type", "application/x-www-form-urlencoded")
        .POST(HttpRequest.BodyPublishers.ofString(form))
        .build();

    HttpClient client = HttpClient.newHttpClient();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

    if (response.statusCode() >= 400) {
      throw new IllegalStateException("Token request failed: " + response.body());
    }

    Matcher matcher = ACCESS_TOKEN.matcher(response.body());
    if (!matcher.find()) {
      throw new IllegalStateException("Missing access_token in response: " + response.body());
    }
    return matcher.group(1);
  }
}

Tokens typically last 15 minutes. Store the refresh_token from the same response and repeat the call with grant_type=refresh_token before expiry.
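The refresh step can be sketched in Python as two small helpers: one builds the form fields for the refresh grant, the other decides when a refresh is due. The field names follow the standard OAuth 2.0 refresh grant; the 15-minute lifetime is the typical value quoted above, and the 60-second safety margin and both function names are assumptions made for illustration.

```python
import time
from typing import Optional

# Typical lifetime quoted in this guide; prefer the expires_in value
# from the token response if your deployment returns one.
TOKEN_LIFETIME_S = 15 * 60


def build_refresh_form(refresh_token: str, client_id: str = "truebar-client") -> dict:
    """Form fields for an OAuth 2.0 refresh grant."""
    return {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
    }


def should_refresh(
    issued_at: float,
    now: Optional[float] = None,
    lifetime_s: float = TOKEN_LIFETIME_S,
    margin_s: float = 60.0,  # assumed safety margin, tune as needed
) -> bool:
    """True once the token is within margin_s seconds of its assumed expiry."""
    current = time.monotonic() if now is None else now
    return current - issued_at >= lifetime_s - margin_s
```

Record `time.monotonic()` when a token is issued, check `should_refresh(...)` before opening a new stream, and POST the refresh form to TRUEBAR_AUTH_URL in the same way as the password grant.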
Server-side quickstart (Node.js, Python, Java)
The samples below authenticate, open the streaming pipeline, stream two text segments, and save the returned audio to disk.
Node.js:

import axios from "axios";
import WebSocket from "ws";
import { createWriteStream } from "node:fs";

const outputPath = process.env.TRUEBAR_TTS_OUTPUT ?? "speech.pcm";
const text =
  process.env.TRUEBAR_TTS_TEXT ??
  "Hello from the Truebar streaming TTS API.\n\nWe generate speech with sub-second latency.";

// Parse an optional JSON object from an environment variable.
const parseJsonEnv = (name) => {
  const raw = process.env[name];
  if (!raw) return {};
  try {
    const parsed = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null ? parsed : {};
  } catch (error) {
    throw new Error(`Invalid JSON in ${name}: ${raw}`);
  }
};

const isRivaTag = (tag) => /^(RIVA|RIVA-STREAM|VIT):/i.test(tag);

// Build one pipeline stage entry.
const stage = (task, tag, parametersEnv) => ({
  task,
  exceptionHandlingPolicy: "THROW",
  config: { tag, parameters: parseJsonEnv(parametersEnv) },
});

const buildPipeline = () => {
  const tag = process.env.TRUEBAR_TTS_TAG ?? "RIVA:en-US:*:*";
  const stages = [];
  if (isRivaTag(tag)) {
    // Add each NLP helper stage only when its tag is exported.
    const helpers = [
      ["NLP_st", "TRUEBAR_TTS_NLP_ST"],
      ["NLP_tn", "TRUEBAR_TTS_NLP_TN"],
      ["NLP_g2a", "TRUEBAR_TTS_NLP_G2A"],
    ];
    for (const [task, prefix] of helpers) {
      const helperTag = process.env[`${prefix}_TAG`];
      if (helperTag) stages.push(stage(task, helperTag, `${prefix}_PARAMETERS`));
    }
  }
  stages.push(stage("TTS", tag, "TRUEBAR_TTS_PARAMETERS"));
  return stages;
};

async function fetchToken() {
  const form = new URLSearchParams({
    grant_type: "password",
    username: process.env.TRUEBAR_USERNAME,
    password: process.env.TRUEBAR_PASSWORD,
    client_id: process.env.TRUEBAR_CLIENT_ID ?? "truebar-client",
  });

  const { data } = await axios.post(process.env.TRUEBAR_AUTH_URL, form, {
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
  });

  return data.access_token;
}

const token = await fetchToken();
const ws = new WebSocket(process.env.TRUEBAR_TTS_WS_URL, {
  headers: { Authorization: `Bearer ${token}` },
});
const out = createWriteStream(outputPath);
let sentText = false;
let configSent = false;

const useSsml = () => {
  const flag = (process.env.TRUEBAR_TTS_ENABLE_SSML ?? "").trim().toLowerCase();
  const rate = process.env.TRUEBAR_TTS_SSML_RATE?.trim();
  return Boolean(rate) || ["1", "true", "yes", "on"].includes(flag);
};

const wrapText = (segment) => {
  const rate = process.env.TRUEBAR_TTS_SSML_RATE?.trim();
  const rateAttr = rate ? ` rate="${rate}"` : "";
  // Escape XML special characters so plain text cannot break the SSML markup.
  const escaped = segment
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return `<speak><prosody${rateAttr}>${escaped}</prosody></speak>`;
};

const sendSegments = () => {
  const segments = text.split(/\n{2,}/).filter(Boolean);
  const payloads = segments.length > 0 ? segments : ["Hello from Truebar."];
  payloads.forEach((segment) => {
    const payload = useSsml() ? wrapText(segment) : segment;
    ws.send(
      JSON.stringify({
        type: "TEXT_SEGMENT",
        textSegment: { text: payload },
      }),
    );
  });
  ws.send(JSON.stringify({ type: "EOS", lockSession: false }));
};

ws.on("open", () => console.log("TTS stream connected"));
ws.on("close", () => {
  out.end();
  console.log(`TTS stream closed – saved audio to ${outputPath}`);
});
ws.on("error", (err) => console.error("TTS stream error", err));

ws.on("message", (payload, isBinary) => {
  if (isBinary) {
    // Binary frames carry raw PCM16LE audio.
    const buffer = Buffer.isBuffer(payload) ? payload : Buffer.from(payload);
    out.write(buffer);
    return;
  }

  const msg = JSON.parse(payload.toString());
  if (msg.type === "STATUS") {
    console.log("STATUS:", msg.status);
    if (msg.status === "INITIALIZED" && !configSent) {
      configSent = true;
      ws.send(JSON.stringify({ type: "CONFIG", pipeline: buildPipeline() }));
    }
    if (msg.status === "CONFIGURED" && !sentText) {
      sentText = true;
      sendSegments();
    }
    if (msg.status === "FINISHED") {
      ws.close();
    }
    return;
  }

  if (msg.type === "WARNING") {
    console.warn("TTS warning", msg);
  }

  if (msg.type === "ERROR") {
    console.error("Streaming error", msg);
    ws.close();
  }
});

Python:

import asyncio
import json
import os
from pathlib import Path
import wave
import aiohttp

TEXT = os.getenv(
    "TRUEBAR_TTS_TEXT",
    "Hello from the Truebar streaming TTS API.\n\nWe generate speech with sub-second latency.",
)
OUTPUT_PATH = Path(os.getenv("TRUEBAR_TTS_OUTPUT", "speech.wav"))


def load_stage_parameters(env_var: str) -> dict[str, object]:
    raw = os.getenv(env_var)
    if not raw:
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"Invalid JSON in {env_var}: {raw}") from exc
    if not isinstance(data, dict):
        raise RuntimeError(f"{env_var} must contain a JSON object")
    return data


def stage(task: str, tag: str, parameters_env: str) -> dict[str, object]:
    # Build one pipeline stage entry.
    return {
        "task": task,
        "exceptionHandlingPolicy": "THROW",
        "config": {"tag": tag, "parameters": load_stage_parameters(parameters_env)},
    }


def build_pipeline() -> list[dict[str, object]]:
    tag = os.getenv("TRUEBAR_TTS_TAG", "RIVA:en-US:*:*")
    # Match the full prefix including the colon, as the Node.js and Java samples do.
    is_riva = tag.upper().startswith(("RIVA:", "RIVA-STREAM:", "VIT:"))
    pipeline: list[dict[str, object]] = []

    if is_riva:
        # Add each NLP helper stage only when its tag is exported.
        helpers = (
            ("NLP_st", "TRUEBAR_TTS_NLP_ST"),
            ("NLP_tn", "TRUEBAR_TTS_NLP_TN"),
            ("NLP_g2a", "TRUEBAR_TTS_NLP_G2A"),
        )
        for task, prefix in helpers:
            helper_tag = os.getenv(f"{prefix}_TAG")
            if helper_tag:
                pipeline.append(stage(task, helper_tag, f"{prefix}_PARAMETERS"))

    pipeline.append(stage("TTS", tag, "TRUEBAR_TTS_PARAMETERS"))
    return pipeline


async def fetch_token(session: aiohttp.ClientSession) -> str:
    payload = {
        "grant_type": "password",
        "username": os.environ["TRUEBAR_USERNAME"],
        "password": os.environ["TRUEBAR_PASSWORD"],
        "client_id": os.getenv("TRUEBAR_CLIENT_ID", "truebar-client"),
    }
    async with session.post(
        os.environ["TRUEBAR_AUTH_URL"],
        data=payload,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    ) as resp:
        if resp.status == 401:
            raise RuntimeError("Authentication failed – check username/password/client ID.")
        resp.raise_for_status()
        data = await resp.json()
        return data["access_token"]


def use_ssml() -> bool:
    flag = os.getenv("TRUEBAR_TTS_ENABLE_SSML", "")
    return flag.strip().lower() in {"1", "true", "yes", "on"} or bool(os.getenv("TRUEBAR_TTS_SSML_RATE"))


def wrap_text(text: str) -> str:
    rate = os.getenv("TRUEBAR_TTS_SSML_RATE")
    rate_attr = f' rate="{rate.strip()}"' if rate else ""
    # Escape XML special characters so plain text cannot break the SSML markup.
    escaped = (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
    )
    return f"<speak><prosody{rate_attr}>{escaped}</prosody></speak>"


async def main() -> None:
    async with aiohttp.ClientSession() as session:
        token = await fetch_token(session)

        headers = {"Authorization": f"Bearer {token}"}
        async with session.ws_connect(os.environ["TRUEBAR_TTS_WS_URL"], headers=headers) as ws:
            configured = asyncio.Event()
            audio_chunks: list[bytes] = []

            async def sender() -> None:
                # Wait for the pipeline to be configured before streaming text.
                try:
                    await asyncio.wait_for(configured.wait(), timeout=5.0)
                except asyncio.TimeoutError:
                    print("CONFIGURED status not received in time; sending payload anyway.")

                segments = [chunk.strip() for chunk in TEXT.split("\n\n") if chunk.strip()]
                if not segments:
                    segments = [TEXT.strip() or "Hello from Truebar."]
                for segment in segments:
                    payload = wrap_text(segment) if use_ssml() else segment
                    await ws.send_json({
                        "type": "TEXT_SEGMENT",
                        "textSegment": {"text": payload},
                    })
                await ws.send_json({"type": "EOS", "lockSession": False})

            async def receiver() -> None:
                config_sent = False
                async for msg in ws:
                    if msg.type == aiohttp.WSMsgType.BINARY:
                        audio_chunks.append(bytes(msg.data))
                        continue
                    if msg.type != aiohttp.WSMsgType.TEXT:
                        continue

                    data = msg.json()
                    if data["type"] == "STATUS":
                        print("STATUS:", data["status"])
                        # Reply to the first status update with the pipeline configuration.
                        if data["status"] in {"INITIALIZED", "CONFIG_REQUIRED"} and not config_sent:
                            config_sent = True
                            await ws.send_json({"type": "CONFIG", "pipeline": build_pipeline()})
                        if data["status"] == "CONFIGURED":
                            configured.set()
                        if data["status"] == "FINISHED":
                            break
                    elif data["type"] == "WARNING":
                        print("WARNING:", data)
                    elif data["type"] == "ERROR":
                        raise RuntimeError(f"Pipeline error: {data}")

            await asyncio.gather(sender(), receiver())

    if not audio_chunks:
        raise RuntimeError("Pipeline returned no audio; check stage tags and permissions.")

    # Wrap the raw PCM16LE frames in a WAV container (mono, 16-bit, 16 kHz).
    frames = b"".join(audio_chunks)
    with wave.open(str(OUTPUT_PATH), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16_000)
        wav.writeframes(frames)

    print(f"Saved synthesis to {OUTPUT_PATH}")


if __name__ == "__main__":
    asyncio.run(main())

Java:

import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.WebSocket;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamingTts {
  private static final Pattern TYPE_PATTERN = Pattern.compile("\"type\"\\s*:\\s*\"([^\"]+)\"");
  private static final Pattern STATUS_PATTERN = Pattern.compile("\"status\"\\s*:\\s*\"([^\"]+)\"");
  private static final Pattern ACCESS_TOKEN = Pattern.compile("\"access_token\"\\s*:\\s*\"([^\"]+)\"");

  public static void main(String[] args) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    String token = fetchToken(client);

    Path outputPath = Path.of(System.getenv().getOrDefault("TRUEBAR_TTS_OUTPUT", "speech.pcm"));
    ensureOutputDirectory(outputPath);

    List<String> segments = buildSegments();
    String configMessage = buildConfigMessage();
    CompletableFuture<Void> done = new CompletableFuture<>();

    try (OutputStream out = Files.newOutputStream(outputPath)) {
      WebSocket.Listener listener = new WebSocket.Listener() {
        private boolean configSent = false;
        private boolean textSent = false;

        @Override
        public void onOpen(WebSocket webSocket) {
          webSocket.request(1);
        }

        @Override
        public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
          String message = data.toString();
          Matcher typeMatcher = TYPE_PATTERN.matcher(message);
          String type = typeMatcher.find() ? typeMatcher.group(1) : "";

          if ("STATUS".equals(type)) {
            Matcher statusMatcher = STATUS_PATTERN.matcher(message);
            if (statusMatcher.find()) {
              String status = statusMatcher.group(1);
              System.out.println("STATUS: " + status);
              if (!configSent && ("INITIALIZED".equals(status) || "CONFIG_REQUIRED".equals(status))) {
                configSent = true;
                webSocket.sendText(configMessage, true).join();
              }
              if ("CONFIGURED".equals(status) && !textSent) {
                textSent = true;
                sendSegments(webSocket, segments);
              }
              if ("FINISHED".equals(status)) {
                webSocket.sendClose(WebSocket.NORMAL_CLOSURE, "done");
              }
            }
          } else if ("ERROR".equals(type)) {
            System.err.println("Streaming error: " + message);
            webSocket.sendClose(WebSocket.NORMAL_CLOSURE, "error");
          } else if ("WARNING".equals(type)) {
            System.out.println("WARNING: " + message);
          }

          webSocket.request(1);
          return null;
        }

        @Override
        public CompletionStage<?> onBinary(WebSocket webSocket, ByteBuffer data, boolean last) {
          // Binary frames carry raw PCM16LE audio.
          byte[] buffer = new byte[data.remaining()];
          data.get(buffer);
          try {
            out.write(buffer);
          } catch (IOException exception) {
            throw new RuntimeException("Failed to write audio", exception);
          }
          webSocket.request(1);
          return null;
        }

        @Override
        public void onError(WebSocket webSocket, Throwable error) {
          done.completeExceptionally(error);
        }

        @Override
        public CompletionStage<?> onClose(WebSocket webSocket, int statusCode, String reason) {
          done.complete(null);
          return null;
        }

        private void sendSegments(WebSocket webSocket, List<String> payloads) {
          for (String segment : payloads) {
            String message = String.format(
                "{\"type\":\"TEXT_SEGMENT\",\"textSegment\":{\"text\":\"%s\"}}",
                jsonEscape(segment)
            );
            webSocket.sendText(message, true).join();
          }
          webSocket.sendText("{\"type\":\"EOS\",\"lockSession\":false}", true).join();
        }
      };

      client.newWebSocketBuilder()
          .header("Authorization", "Bearer " + token)
          .buildAsync(URI.create(System.getenv("TRUEBAR_TTS_WS_URL")), listener)
          .join();

      done.join();
    }

    System.out.println("Saved audio to " + outputPath);
  }

  private static String fetchToken(HttpClient client) throws Exception {
    // URL-encode each value so special characters in credentials survive the form encoding.
    String form = "grant_type=password"
        + "&username=" + formEncode(System.getenv("TRUEBAR_USERNAME"))
        + "&password=" + formEncode(System.getenv("TRUEBAR_PASSWORD"))
        + "&client_id=" + formEncode(System.getenv().getOrDefault("TRUEBAR_CLIENT_ID", "truebar-client"));

    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create(System.getenv("TRUEBAR_AUTH_URL")))
        .header("Content-Type", "application/x-www-form-urlencoded")
        .POST(HttpRequest.BodyPublishers.ofString(form))
        .build();

    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    if (response.statusCode() >= 400) {
      throw new IllegalStateException("Token request failed: " + response.body());
    }
    Matcher matcher = ACCESS_TOKEN.matcher(response.body());
    if (!matcher.find()) {
      throw new IllegalStateException("Missing access_token");
    }
    return matcher.group(1);
  }

  private static String formEncode(String value) {
    return java.net.URLEncoder.encode(value, java.nio.charset.StandardCharsets.UTF_8);
  }

  private static String buildConfigMessage() {
    return "{\"type\":\"CONFIG\",\"pipeline\":" + buildPipelineJson() + "}";
  }

  private static String buildPipelineJson() {
    String tag = System.getenv().getOrDefault("TRUEBAR_TTS_TAG", "RIVA:en-US:*:*");
    boolean isRiva = isRivaTag(tag);
    List<String> stages = new ArrayList<>();

    // Add each NLP helper stage only when its tag is exported.
    if (isRiva && hasEnv("TRUEBAR_TTS_NLP_ST_TAG")) {
      stages.add(stageEntry("NLP_st", System.getenv("TRUEBAR_TTS_NLP_ST_TAG"), loadParameters("TRUEBAR_TTS_NLP_ST_PARAMETERS")));
    }
    if (isRiva && hasEnv("TRUEBAR_TTS_NLP_TN_TAG")) {
      stages.add(stageEntry("NLP_tn", System.getenv("TRUEBAR_TTS_NLP_TN_TAG"), loadParameters("TRUEBAR_TTS_NLP_TN_PARAMETERS")));
    }
    if (isRiva && hasEnv("TRUEBAR_TTS_NLP_G2A_TAG")) {
      stages.add(stageEntry("NLP_g2a", System.getenv("TRUEBAR_TTS_NLP_G2A_TAG"), loadParameters("TRUEBAR_TTS_NLP_G2A_PARAMETERS")));
    }

    stages.add(stageEntry("TTS", tag, loadParameters("TRUEBAR_TTS_PARAMETERS")));
    return "[" + String.join(",", stages) + "]";
  }

  private static String stageEntry(String task, String tag, String parametersJson) {
    return String.format(
        "{\"task\":\"%s\",\"exceptionHandlingPolicy\":\"THROW\",\"config\":{\"tag\":\"%s\",\"parameters\":%s}}",
        task, tag, parametersJson
    );
  }

  private static String loadParameters(String envVar) {
    String raw = System.getenv(envVar);
    if (raw == null || raw.isBlank()) {
      return "{}";
    }
    String trimmed = raw.strip();
    if (!(trimmed.startsWith("{") && trimmed.endsWith("}"))) {
      throw new IllegalArgumentException(envVar + " must contain a JSON object");
    }
    return trimmed;
  }

  private static List<String> buildSegments() {
    String raw = System.getenv().getOrDefault(
        "TRUEBAR_TTS_TEXT",
        "Hello from the Truebar streaming TTS API.\n\nWe generate speech with sub-second latency."
    );
    String[] parts = raw.split("\\n{2,}");
    List<String> output = new ArrayList<>();
    boolean useSsml = ssmlEnabled();

    for (String part : parts) {
      String trimmed = part.strip();
      if (trimmed.isEmpty()) {
        continue;
      }
      output.add(useSsml ? wrapText(trimmed) : trimmed);
    }

    if (output.isEmpty()) {
      String fallback = "Hello from Truebar.";
      output.add(useSsml ? wrapText(fallback) : fallback);
    }

    return output;
  }

  private static boolean ssmlEnabled() {
    String rate = System.getenv("TRUEBAR_TTS_SSML_RATE");
    if (rate != null && !rate.isBlank()) {
      return true;
    }
    String flag = System.getenv("TRUEBAR_TTS_ENABLE_SSML");
    if (flag == null) {
      return false;
    }
    String normalised = flag.trim().toLowerCase(Locale.ROOT);
    return normalised.equals("1") || normalised.equals("true") || normalised.equals("yes") || normalised.equals("on");
  }

  private static String wrapText(String text) {
    String rate = System.getenv("TRUEBAR_TTS_SSML_RATE");
    String rateAttr = (rate != null && !rate.isBlank()) ? " rate=\"" + rate.strip() + "\"" : "";
    return "<speak><prosody" + rateAttr + ">" + escapeXml(text) + "</prosody></speak>";
  }

  // Escape XML special characters so plain text cannot break the SSML markup.
  private static String escapeXml(String text) {
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  private static String jsonEscape(String value) {
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < value.length(); i++) {
      char ch = value.charAt(i);
      switch (ch) {
        case '\\': builder.append("\\\\"); break;
        case '"': builder.append("\\\""); break;
        case '\n': builder.append("\\n"); break;
        case '\r': builder.append("\\r"); break;
        case '\t': builder.append("\\t"); break;
        default:
          if (ch < 0x20) {
            builder.append(String.format("\\u%04x", (int) ch));
          } else {
            builder.append(ch);
          }
      }
    }
    return builder.toString();
  }

  private static boolean isRivaTag(String tag) {
    String upper = tag.toUpperCase(Locale.ROOT);
    return upper.startsWith("RIVA:") || upper.startsWith("RIVA-STREAM:") || upper.startsWith("VIT:");
  }

  private static boolean hasEnv(String name) {
    String value = System.getenv(name);
    return value != null && !value.isBlank();
  }

  private static void ensureOutputDirectory(Path path) throws IOException {
    Path parent = path.toAbsolutePath().getParent();
    if (parent != null) {
      Files.createDirectories(parent);
    }
  }
}

Step 2 – Open the WebSocket
- Connect to the URL stored in TRUEBAR_TTS_WS_URL.
- Send the bearer token in the Authorization header so that credentials do not leak through logs or caches.
- The first message you receive is a STATUS update (INITIALIZED). Reply with your CONFIG payload at that point.
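The status-driven handshake can be isolated from the socket code as a small state holder, which makes it easy to test without a server. The Python sketch below maps each STATUS value to the text frames the client should send next. The message shapes are the ones used elsewhere in this guide; the class name `TtsSession` and its fields are illustrative.

```python
import json
from dataclasses import dataclass


@dataclass
class TtsSession:
    """Tracks what has been sent so each status triggers its reply only once."""

    pipeline: list
    segments: list
    config_sent: bool = False
    text_sent: bool = False
    finished: bool = False

    def on_status(self, status: str) -> list:
        """Return the JSON text frames to send in response to a STATUS update."""
        if status == "INITIALIZED" and not self.config_sent:
            self.config_sent = True
            return [json.dumps({"type": "CONFIG", "pipeline": self.pipeline})]
        if status == "CONFIGURED" and not self.text_sent:
            self.text_sent = True
            frames = [
                json.dumps({"type": "TEXT_SEGMENT", "textSegment": {"text": segment}})
                for segment in self.segments
            ]
            # EOS lets the platform flush the pipeline.
            frames.append(json.dumps({"type": "EOS", "lockSession": False}))
            return frames
        if status == "FINISHED":
            self.finished = True
        return []
```

A receive loop would call `on_status(...)` for every STATUS message, send each returned frame in order, and close the socket once `finished` is set.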
Step 3 – Configure the pipeline
Send a CONFIG message containing the pipeline definition. Riva-family voices normally run the sentence tokenization (NLP_st), text normalisation (NLP_tn), and grapheme-to-accentuation (NLP_g2a) stages before the final TTS task. Single-stage voices only require the TTS entry. The server-side quickstart above builds the pipeline dynamically; the Example pipeline later in this guide shows the JSON structure in full.
Step 4 – Stream text segments
- Send only one TEXT_SEGMENT message per sentence. Keep segments short (1–2 sentences) for lower latency.
- Wrap the text in SSML (<speak>…</speak>) when you set TRUEBAR_TTS_ENABLE_SSML=true or when the stage lists processSsml support.
- Finish the session with {"type":"EOS","lockSession":false} so the platform can flush the pipeline.

{
  "type": "TEXT_SEGMENT",
  "textSegment": {
    "text": "Hello from Truebar!"
  }
}

Besides sending EOS, there are two other ways to finish a session:
- Closing the WebSocket directly: This will immediately terminate the session on the API side. Any unprocessed data at the time of closure will be discarded. The session will be marked as CANCELED and cannot be resumed.
- Canceling the session without closing the WebSocket: This method is useful when the client does not need to wait for all results but wants to keep the WebSocket connection open. Instead of sending an EOS message, the client sends a CANCEL message and then receives a message with status = CANCELED. The WebSocket connection then returns to the same state as when it was first INITIALIZED. To start a new session over the same WebSocket, the client sends a CONFIG message and begins streaming data messages again. As with closing the socket, the session is marked as CANCELED and cannot be resumed.
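To keep each TEXT_SEGMENT to one or two sentences, as recommended at the start of this step, the input text can be split before it is sent. The Python sketch below uses a deliberately naive rule (split after `.`, `!` or `?` followed by whitespace), so it will mis-split abbreviations; the function name and the `max_sentences` parameter are illustrative, and a language-aware tokenizer is likely a better fit for production text.

```python
import re

# Naive sentence boundary: whitespace that follows ., ! or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_segments(text: str, max_sentences: int = 2) -> list:
    """Group sentences into segments of at most max_sentences each.

    Blank lines are treated as hard boundaries, matching how the
    quickstart samples split TRUEBAR_TTS_TEXT.
    """
    if max_sentences < 1:
        raise ValueError("max_sentences must be at least 1")
    segments = []
    for paragraph in re.split(r"\n{2,}", text):
        sentences = [s.strip() for s in _SENTENCE_END.split(paragraph) if s.strip()]
        for start in range(0, len(sentences), max_sentences):
            segments.append(" ".join(sentences[start:start + max_sentences]))
    return segments
```

Each returned string becomes the `text` field of one TEXT_SEGMENT message, followed by a single EOS message after the last segment.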
Discover voices and preprocessing stages
Use /api/pipelines/stages to list the TTS voices available to your account. When the selected tag begins with RIVA, RIVA-STREAM, or VIT, the response also includes the NLP helper stages (NLP_st, NLP_tn, NLP_g2a) that should precede the TTS stage. Export those tags (and their optional parameter JSON) via environment variables such as TRUEBAR_TTS_NLP_ST_TAG and TRUEBAR_TTS_NLP_ST_PARAMETERS. For single-stage voices, you only need the TTS tag.
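As a starting point for discovery, the Python sketch below prepares the request for the endpoint named above without sending it. Two things are assumptions rather than documented behaviour: that the REST endpoint accepts the same bearer token as the WebSocket, and that you know your deployment's API base URL (passed in as `api_base_url`). The response schema is not described in this guide, so the sketch stops at building the request.

```python
import urllib.request


def build_stage_discovery_request(api_base_url: str, access_token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the stage discovery endpoint."""
    url = api_base_url.rstrip("/") + "/api/pipelines/stages"
    return urllib.request.Request(
        url,
        headers={
            # Assumption: same bearer token scheme as the streaming socket.
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )
```

Passing the result to `urllib.request.urlopen(request, timeout=10)` and decoding the body with `json.load` would then let you inspect which tags your account can use.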
Step 5 – Play back audio
Binary messages from the socket contain PCM16LE audio chunks (mono, 16 kHz). Forward them directly to your playback pipeline.
ttsSocket.on("message", (event, isBinary) => {
  if (isBinary) {
    const pcmChunk = Buffer.from(event as ArrayBuffer);
    audioPlayback.enqueue(pcmChunk); // Write to speaker, file, or RTP stream.
    return;
  }

  const payload = JSON.parse(event.toString());
  if (payload.type === "STATUS" && payload.status === "FINISHED") {
    console.log("TTS session complete");
  }
  if (payload.type === "ERROR") {
    console.error("TTS error", payload);
  }
});

Record local timestamps when you send CONFIG, when you receive STATUS: CONFIGURED, and when STATUS: FINISHED arrives to track end-to-end latency. The payloads include server-side timestamps that you can compare against your own logs if you need deeper diagnostics.
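One way to collect those local timestamps is a tiny tracker keyed by event name. The Python sketch below is illustrative: the class and the event labels are not part of any Truebar SDK, and it uses a monotonic clock so that wall-clock adjustments do not distort the measured intervals.

```python
import time
from typing import Callable, Optional


class LatencyTracker:
    """Records a local monotonic timestamp per named event."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._marks = {}

    def mark(self, event: str) -> None:
        # Keep the first occurrence so repeated statuses do not skew results.
        if event not in self._marks:
            self._marks[event] = self._clock()

    def elapsed(self, start: str, end: str) -> Optional[float]:
        """Seconds between two marks, or None if either is missing."""
        if start not in self._marks or end not in self._marks:
            return None
        return self._marks[end] - self._marks[start]
```

For example, call `mark("config_sent")` right after sending CONFIG, `mark("configured")` and `mark("finished")` in the status handler, then log `elapsed("config_sent", "finished")` for the end-to-end figure.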
When the service sends an ERROR message, log it (the payload contains a descriptive message field), close the socket, and reconnect once the underlying issue is resolved or after a short backoff if you suspect a transient failure.
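The "short backoff" mentioned above is commonly implemented as capped exponential backoff. The Python sketch below generates the wait times only; the base delay, cap, and jitter fraction are illustrative defaults rather than values recommended by the service.

```python
import random
from typing import Iterator


def backoff_delays(
    attempts: int,
    base_s: float = 0.5,
    cap_s: float = 30.0,
    jitter: float = 0.0,
) -> Iterator[float]:
    """Yield one delay per reconnect attempt: base_s * 2**n, capped at cap_s.

    jitter is the fraction of each delay that may be added at random,
    which helps avoid many clients reconnecting at the same instant.
    """
    for attempt in range(attempts):
        delay = min(cap_s, base_s * (2 ** attempt))
        yield delay + random.uniform(0.0, jitter * delay)
```

A reconnect loop would sleep for each yielded delay before retrying and stop as soon as a connection reaches STATUS INITIALIZED again.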
Browser playback example
Modern browsers support AudioWorklet for low-latency playback. Load a small worklet that accepts PCM16 frames and posts them to the audio graph:
const audioContext = new AudioContext({ sampleRate: 16_000 });
await audioContext.audioWorklet.addModule('/audio/pcm-player.worklet.js');

const player = new AudioWorkletNode(audioContext, 'pcm-player');
player.connect(audioContext.destination);

ttsSocket.addEventListener('message', async (event) => {
  if (typeof event.data === 'string') return;

  const pcm = await event.data.arrayBuffer();
  player.port.postMessage(pcm);
});

Worklet module (/audio/pcm-player.worklet.js):

class PCMPlayerProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.queue = [];
    this.offset = 0;
    this.port.onmessage = (event) => {
      // Convert signed 16-bit samples to floats in [-1, 1).
      const int16 = new Int16Array(event.data);
      const float32 = new Float32Array(int16.length);
      for (let i = 0; i < int16.length; i++) {
        float32[i] = int16[i] / 0x8000;
      }
      this.queue.push(float32);
    };
  }

  process(_, outputs) {
    const output = outputs[0][0];
    output.fill(0);

    // Copy queued samples into the output block; any shortfall stays silent.
    let written = 0;
    while (written < output.length && this.queue.length) {
      const chunk = this.queue[0];
      const remaining = chunk.length - this.offset;
      const copyCount = Math.min(remaining, output.length - written);
      output.set(chunk.subarray(this.offset, this.offset + copyCount), written);
      written += copyCount;
      this.offset += copyCount;
      if (this.offset >= chunk.length) {
        this.queue.shift();
        this.offset = 0;
      }
    }

    return true;
  }
}

registerProcessor('pcm-player', PCMPlayerProcessor);

Remember to resume or unlock the AudioContext in response to a user gesture in browsers that require it.
Designing your pipeline
Truebar lets you compose multiple stages. Consider the following when building pipelines:
- Riva pipelines (RIVA:* or RIVA-STREAM:* tags) often expect normalised, phonemised text. Prepend NLP_tn, NLP_g2a, and optionally NLP_st stages before the final TTS task.
- Automatic SSML: Many voices enable SSML automatically. Send structured tokens or wrap content in <speak> tags when you want to use SSML.
- Sentence segmentation: Keep segments concise (1–2 sentences). This reduces latency.
- Sample rate: Output is always 16 kHz mono PCM. Resample on the client side if you require 8 kHz telephony output or 48 kHz media playback.
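For the 8 kHz telephony case, client-side resampling can be as simple as halving the rate. The Python sketch below averages each pair of adjacent samples, which acts as a very crude low-pass filter before decimation; it is meant to illustrate the idea, and a proper resampler (for example in FFmpeg or GStreamer) will sound better. The function name is illustrative.

```python
import array
import sys


def downsample_16k_to_8k(pcm16le: bytes) -> bytes:
    """Halve the sample rate of mono PCM16LE audio by averaging sample pairs."""
    samples = array.array("h")
    # Ignore a dangling odd byte so the buffer holds whole 16-bit samples.
    samples.frombytes(pcm16le[: len(pcm16le) // 2 * 2])
    if sys.byteorder == "big":
        samples.byteswap()  # input is little-endian

    out = array.array(
        "h",
        ((samples[i] + samples[i + 1]) // 2 for i in range(0, len(samples) - 1, 2)),
    )
    if sys.byteorder == "big":
        out.byteswap()  # emit little-endian again
    return out.tobytes()
```

When a chunk holds an odd number of samples, the last one is dropped here; a streaming implementation would carry it over and prepend it to the next chunk.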
Example pipeline with Riva preprocessing:
[
  {
    "task": "NLP_st",
    "exceptionHandlingPolicy": "THROW",
    "config": { "tag": "RIVA:en-US", "parameters": {} }
  },
  {
    "task": "NLP_tn",
    "exceptionHandlingPolicy": "THROW",
    "config": { "tag": "RIVA:en-US", "parameters": {} }
  },
  {
    "task": "NLP_g2a",
    "exceptionHandlingPolicy": "THROW",
    "config": { "tag": "RIVA:en-US", "parameters": {} }
  },
  {
    "task": "TTS",
    "exceptionHandlingPolicy": "THROW",
    "config": { "tag": "RIVA:en-US:Alloy", "parameters": {} }
  }
]

Troubleshooting
- 401/403 responses – Confirm your token includes PIPELINE_ONLINE_API and STAGE_TTS.
- Session never reaches CONFIGURED – Double-check the stage tag or use discovery to list available tags for your account.
- Audio stutters – Buffer chunks before playback or stream them into an audio worklet to smooth timing.
- Need automatic token refresh – Use the Java SDK or reuse the token helper from the streaming quickstarts.
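For the stuttering case, a small pre-roll buffer is often enough: hold the first chunks until a set amount of audio has accumulated, then pass everything through. The Python sketch below assumes the 16 kHz, 16-bit mono format described in this guide; the class name and the 200 ms default are illustrative and worth tuning against your network conditions.

```python
from collections import deque

# 16 kHz * 2 bytes per sample, mono => 32 bytes of audio per millisecond.
BYTES_PER_MS = 16_000 * 2 // 1000


class PrerollBuffer:
    """Holds audio until preroll_ms is queued, then releases chunks in order."""

    def __init__(self, preroll_ms: int = 200) -> None:
        self._threshold = preroll_ms * BYTES_PER_MS
        self._chunks = deque()
        self._buffered = 0
        self._started = False

    def push(self, chunk: bytes) -> list:
        """Queue a chunk and return whatever is ready for playback."""
        self._chunks.append(chunk)
        self._buffered += len(chunk)
        if not self._started and self._buffered < self._threshold:
            return []
        self._started = True
        return self._drain()

    def flush(self) -> list:
        """Release everything, for example when STATUS FINISHED arrives."""
        self._started = True
        return self._drain()

    def _drain(self) -> list:
        ready = list(self._chunks)
        self._chunks.clear()
        self._buffered = 0
        return ready
```

Feed every binary message through `push(...)` and hand the returned chunks to the player; call `flush()` at the end so short utterances that never reach the threshold are still played.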
Next steps
- Combine streaming TTS with streaming STT to build a full duplex voice assistant.
- Try the Streaming TTS quickstart for copy-paste JavaScript and Python samples.
- Use the Truebar history API to archive synthesized audio or diagnose past sessions.