API Documentation

Complete reference for the TTSFM Text-to-Speech API. Free, simple, and powerful.

Overview

The TTSFM API provides an OpenAI-compatible interface for text-to-speech generation. It supports multiple voices and audio formats, validates text length, and can automatically split long text and combine the resulting audio into a single file.

Base URL: http://nxdev-org-ttsfm.hf.space/api/

Key Features

  • 🎤 11 different voice options - Choose from alloy, echo, nova, and more
  • 🎵 Multiple audio formats - MP3, WAV, OPUS, AAC, FLAC, PCM support
  • 🤖 OpenAI compatibility - Drop-in replacement for OpenAI's TTS API
  • ✨ Auto-combine feature - Automatically handles long text (>4096 chars) by splitting and combining audio
  • 📊 Text length validation - Smart validation with configurable limits
  • 📈 Real-time monitoring - Status endpoints and health checks

New in v3.2.3: Enhanced `/v1/audio/speech` endpoint with intelligent auto-combine feature. Streamlined web interface with clean, user-friendly design and automatic long-text handling!

Authentication

API key authentication is optional. If the server is configured with an API key, include it in the Authorization request header:

Authorization: Bearer YOUR_API_KEY
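
A minimal sketch of attaching the optional key from Python. The helper name `build_headers` and the `TTSFM_API_KEY` environment variable are assumptions made for illustration; only the `Authorization: Bearer` header format comes from this page.

```python
import os
from typing import Dict, Optional


def build_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers, adding the bearer token only when a key is given."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


# TTSFM_API_KEY is a hypothetical variable name; use whatever your deployment defines.
headers = build_headers(os.environ.get("TTSFM_API_KEY"))
```

Because the header is only added when a key is present, the same helper should work against servers with and without authentication enabled.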

Text Length Validation

TTSFM includes built-in text length validation to ensure compatibility with TTS models. The default maximum length is 4096 characters, but this can be customized.

Important: Text exceeding the maximum length will be rejected unless validation is disabled or the text is split into chunks.

Validation Options

  • max_length: Maximum allowed characters (default: 4096)
  • validate_length: Enable/disable validation (default: true)
  • preserve_words: Avoid splitting words when chunking (default: true)
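
To make the interaction between max_length and preserve_words concrete, here is a rough local sketch of word-preserving chunking. The function `split_text` is hypothetical and written only for illustration; the splitter that TTSFM actually uses may choose different break points.

```python
from typing import List


def split_text(text: str, max_length: int = 4096, preserve_words: bool = True) -> List[str]:
    """Split text into chunks of at most max_length characters.

    Illustrative only: not the TTSFM implementation.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    chunks = []
    remaining = text.strip()
    while len(remaining) > max_length:
        cut = max_length
        if preserve_words:
            # Back up to the last space inside the window, if there is one.
            space = remaining.rfind(" ", 0, max_length + 1)
            if space > 0:
                cut = space
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks


print(split_text("one two three four", max_length=8))
# ['one two', 'three', 'four']
```

When a single word is longer than max_length, this sketch falls back to a hard cut, since no word boundary is available inside the window.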

API Endpoints

GET /api/voices

Get list of available voices.

Response Example:
{
  "voices": [
    {
      "id": "alloy",
      "name": "Alloy",
      "description": "Alloy voice"
    },
    {
      "id": "echo",
      "name": "Echo", 
      "description": "Echo voice"
    }
  ],
  "count": 6
}
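
A small sketch of reading this response from Python, assuming the body has the shape shown in the example. The helper names `voice_ids` and `fetch_voices` are illustrative and not part of TTSFM; `fetch_voices` performs a real network call and is not invoked here.

```python
import json
import urllib.request
from typing import List

BASE_URL = "http://nxdev-org-ttsfm.hf.space/api/"


def voice_ids(payload: dict) -> List[str]:
    """Extract the voice IDs from a /api/voices response body."""
    return [voice["id"] for voice in payload.get("voices", [])]


def fetch_voices(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """GET /api/voices and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(base_url + "voices", timeout=timeout) as resp:
        return json.load(resp)


# Offline check against the shape shown in the response example.
sample = {"voices": [{"id": "alloy", "name": "Alloy"}, {"id": "echo", "name": "Echo"}]}
print(voice_ids(sample))
# ['alloy', 'echo']
```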

GET /api/formats

Get list of supported audio formats.

Response Example:
{
  "formats": [
    {
      "id": "mp3",
      "name": "MP3",
      "mime_type": "audio/mp3",
      "description": "MP3 audio format"
    }
  ],
  "count": 6
}
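
One possible use of this response is a lookup from format ID to MIME type, for example to choose a file extension or check a Content-Type header. The helper below is a hypothetical sketch that assumes the field names shown in the example.

```python
from typing import Dict


def mime_types_by_format(payload: dict) -> Dict[str, str]:
    """Map each format ID in a /api/formats response to its MIME type."""
    return {fmt["id"]: fmt["mime_type"] for fmt in payload.get("formats", [])}


# Sample body copied from the response example (single entry shown).
sample_formats = {
    "formats": [
        {
            "id": "mp3",
            "name": "MP3",
            "mime_type": "audio/mp3",
            "description": "MP3 audio format",
        }
    ]
}
lookup = mime_types_by_format(sample_formats)
print(lookup["mp3"])
# audio/mp3
```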

POST /api/validate-text

Validate text length and get splitting suggestions.

Request Body:
{
  "text": "Your text to validate",
  "max_length": 4096
}
Response Example:
{
  "text_length": 5000,
  "max_length": 4096,
  "is_valid": false,
  "needs_splitting": true,
  "suggested_chunks": 2,
  "chunk_preview": [
    "First chunk preview...",
    "Second chunk preview..."
  ]
}
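
The request and response above could be handled from Python roughly as follows. This is a sketch using only the standard library; `build_validate_request` and `describe_validation` are illustrative names, and the request is built but not sent.

```python
import json
import urllib.request

BASE_URL = "http://nxdev-org-ttsfm.hf.space/api/"


def build_validate_request(text: str, max_length: int = 4096) -> urllib.request.Request:
    """Build (but do not send) a POST /api/validate-text request."""
    body = json.dumps({"text": text, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "validate-text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_validation(result: dict) -> str:
    """Turn a validation response into a one-line summary."""
    if result.get("is_valid"):
        return f"OK: {result['text_length']} of {result['max_length']} characters"
    return (
        f"Too long: {result['text_length']} > {result['max_length']}, "
        f"suggested chunks: {result.get('suggested_chunks')}"
    )
```

To send the request, pass it to `urllib.request.urlopen(req, timeout=10)` and feed the decoded JSON body to `describe_validation`.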

POST /api/generate

Generate speech from text.

Request Body:
{
  "text": "Hello, world!",
  "voice": "alloy",
  "format": "mp3",
  "instructions": "Speak cheerfully",
  "max_length": 4096,
  "validate_length": true
}
Parameters:
  • text (required): Text to convert to speech
  • voice (optional): Voice ID (default: "alloy")
  • format (optional): Audio format (default: "mp3")
  • instructions (optional): Voice modulation instructions
  • max_length (optional): Maximum text length (default: 4096)
  • validate_length (optional): Enable validation (default: true)
Response:

Returns the generated audio file, with a Content-Type header that matches the requested format.
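
A hedged sketch of calling this endpoint with the standard library. The helper names are illustrative; `save_speech` performs the network call and is only defined here, not executed.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://nxdev-org-ttsfm.hf.space/api/"


def build_generate_request(
    text: str,
    voice: str = "alloy",
    audio_format: str = "mp3",
    instructions: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/generate request."""
    payload = {"text": text, "voice": voice, "format": audio_format}
    if instructions:
        payload["instructions"] = instructions
    return urllib.request.Request(
        BASE_URL + "generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(req: urllib.request.Request, path: str, timeout: float = 60.0) -> str:
    """Send the request, write the audio bytes to path, and return the Content-Type."""
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(path, "wb") as out:
        out.write(resp.read())
        return resp.headers.get("Content-Type", "")
```

A call such as `save_speech(build_generate_request("Hello, world!"), "hello.mp3")` would then write the audio to disk; `urlopen` raises `urllib.error.HTTPError` on an error status, so no file is created in that case.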

Python Package

Long Text Support

The TTSFM Python package includes built-in long text splitting functionality for developers who need fine-grained control:

from ttsfm import TTSClient, Voice, AudioFormat

# Create client
client = TTSClient()

# Generate speech from long text (automatically splits into separate files)
responses = client.generate_speech_long_text(
    text="Very long text that exceeds 4096 characters...",
    voice=Voice.ALLOY,
    response_format=AudioFormat.MP3,
    max_length=2000,
    preserve_words=True
)

# Save each chunk as separate files
for i, response in enumerate(responses, 1):
    response.save_to_file(f"part_{i:03d}.mp3")
Developer Features:
  • Manual Splitting: Full control over text chunking for advanced use cases
  • Word Preservation: Maintains word boundaries for natural speech
  • Separate Files: Each chunk saved as individual audio file
  • CLI Support: Use `--split-long-text` flag for command-line usage
Note: For web users, the auto-combine feature in `/v1/audio/speech` is recommended as it automatically handles long text and returns a single seamless audio file.

POST /api/generate-combined

Generate a single combined audio file from long text. Automatically splits text into chunks, generates speech for each chunk, and combines them into one seamless audio file.

Request Body:
{
  "text": "Very long text that exceeds the limit...",
  "voice": "alloy",
  "format": "mp3",
  "instructions": "Optional voice instructions",
  "max_length": 4096,
  "preserve_words": true
}
Response:

Returns a single audio file containing all chunks combined seamlessly.

Response Headers:
  • X-Chunks-Combined: Number of chunks that were combined
  • X-Original-Text-Length: Original text length in characters
  • X-Audio-Size: Final audio file size in bytes

POST /v1/audio/speech

Enhanced OpenAI-compatible endpoint with auto-combine feature. Automatically handles long text by splitting and combining audio chunks when needed.

Request Body:
{
  "model": "gpt-4o-mini-tts",
  "input": "Text of any length...",
  "voice": "alloy",
  "response_format": "mp3",
  "instructions": "Optional voice instructions",
  "speed": 1.0,
  "auto_combine": true,
  "max_length": 4096
}
Enhanced Parameters:
  • auto_combine (boolean, default: true):
    • true: Automatically split long text and combine audio chunks into a single file
    • false: Return error if text exceeds max_length (standard OpenAI behavior)
  • max_length (integer, default: 4096): Maximum characters per chunk when splitting
Response Headers:
  • X-Auto-Combine: Whether auto-combine was enabled (true/false)
  • X-Chunks-Combined: Number of audio chunks combined (1 for short text)
  • X-Original-Text-Length: Original text length (for long text processing)
  • X-Audio-Format: Audio format of the response
  • X-Audio-Size: Audio file size in bytes
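
The auto_combine rules can be checked on the client before sending anything. The sketch below builds the request body and predicts, from the documented defaults only, whether the server should reject it; both helper names are illustrative, and the server's own decision is authoritative.

```python
def build_speech_payload(
    text: str,
    voice: str = "alloy",
    response_format: str = "mp3",
    auto_combine: bool = True,
    max_length: int = 4096,
) -> dict:
    """Build the JSON body for POST /v1/audio/speech."""
    return {
        "model": "gpt-4o-mini-tts",
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "auto_combine": auto_combine,
        "max_length": max_length,
    }


def expect_rejection(payload: dict) -> bool:
    """Predict an error: text longer than max_length with auto_combine disabled."""
    too_long = len(payload["input"]) > payload.get("max_length", 4096)
    return too_long and not payload.get("auto_combine", True)


long_text = "x" * 5000
print(expect_rejection(build_speech_payload(long_text, auto_combine=False)))
# True
print(expect_rejection(build_speech_payload(long_text)))
# False
```
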
Examples:
# Short text (works normally); --output saves the audio instead of printing binary data
curl -X POST http://nxdev-org-ttsfm.hf.space/v1/audio/speech \
  -H "Content-Type: application/json" \
  --output short.mp3 \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "Hello world!",
    "voice": "alloy"
  }'

# Long text with auto-combine (default); returns one combined audio file
curl -X POST http://nxdev-org-ttsfm.hf.space/v1/audio/speech \
  -H "Content-Type: application/json" \
  --output long.mp3 \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "Very long text...",
    "voice": "alloy",
    "auto_combine": true
  }'

# Long text without auto-combine (returns an error if the input exceeds max_length)
curl -X POST http://nxdev-org-ttsfm.hf.space/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-tts",
    "input": "Very long text...",
    "voice": "alloy",
    "auto_combine": false
  }'
Audio Combination: Audio chunks are merged with PyDub when it is available; other environments use a fallback method. All supported audio formats can be combined.
Use Cases:
  • Long Articles: Convert blog posts or articles to single audio files
  • Audiobooks: Generate chapters as single audio files
  • Podcasts: Create podcast episodes from scripts
  • Educational Content: Convert learning materials to audio
Example Usage:
# Python example
import requests

response = requests.post(
    "http://nxdev-org-ttsfm.hf.space/api/generate-combined",
    json={
        "text": "Your very long text content here...",
        "voice": "nova",
        "format": "mp3",
        "max_length": 2000
    }
)

if response.status_code == 200:
    with open("combined_audio.mp3", "wb") as f:
        f.write(response.content)

    chunks = response.headers.get('X-Chunks-Combined')
    print(f"Combined {chunks} chunks into single file")