Salad Transcription API · Capability
Salad Speech-to-Text
Workflow capability for speech-to-text transcription using Salad's distributed GPU inference network. Supports multi-language transcription, speaker diarization, word-level timestamps, and SRT caption generation for audio and video content.
Audio Transcription, Captions, Diarization, Media Processing, Salad, Speech Recognition, Subtitles, Video Transcription
What You Can Do
POST /v1/transcriptions
Transcribe media: submit a media URL for transcription with language and output options.
GET /v1/transcriptions/{jobId}
Get transcript: retrieve the full transcript, segments, and optional SRT output.
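As a rough illustration of the two operations listed above, the following Python sketch builds (but does not send) the corresponding HTTP requests. Only the methods and paths come from this listing. The host `https://api.example.com`, the JSON field names (`url`, `language`, `diarization`, `srt`), and the way credentials are passed via `headers` are assumptions made for illustration and may not match the real request schema.

```python
import json
import urllib.request
from urllib.parse import quote

# Hypothetical placeholder host; substitute the real base URL.
BASE_URL = "https://api.example.com"


def build_transcribe_request(media_url, language="en", diarization=False,
                             srt=False, headers=None):
    """Build a POST /v1/transcriptions request.

    The body field names are assumed, not taken from the API schema.
    """
    body = {
        "url": media_url,
        "language": language,
        "diarization": diarization,
        "srt": srt,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/transcriptions",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", **(headers or {})},
    )


def build_get_transcript_request(job_id, headers=None):
    """Build a GET /v1/transcriptions/{jobId} request."""
    # Percent-encode the job ID so it is safe as a single path segment.
    path = "/v1/transcriptions/" + quote(job_id, safe="")
    return urllib.request.Request(
        BASE_URL + path,
        method="GET",
        headers=dict(headers or {}),
    )
```

With a real host and credentials in place, either request could be sent with `urllib.request.urlopen(request)`. Building the request separately from sending it keeps the sketch easy to inspect and test offline.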
MCP Tools
transcribe-audio-video
Submit an audio or video file URL to Salad for speech-to-text transcription. Supports 97 languages, speaker diarization, word-level timestamps, and SRT output. Returns a job ID to retrieve results.
get-transcription-result
Retrieve the completed transcription for a job by ID. Returns segments, word timestamps, speaker labels, and optional SRT caption content.
read-only
idempotent
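For orientation, here is a minimal sketch of how an MCP client might invoke the two tools above. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`; the tool names come from this listing, while the argument names (`url`, `language`, `diarization`, `jobId`) and the placeholder values are hypothetical, since the tools' input schemas are not shown here.

```python
import json


def mcp_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Step 1: submit media for transcription (argument names are assumed).
submit = mcp_tool_call(
    1,
    "transcribe-audio-video",
    {
        "url": "https://example.com/meeting.mp4",
        "language": "en",
        "diarization": True,
    },
)

# Step 2: fetch the result using the job ID returned by step 1.
# "JOB_ID_FROM_SUBMIT" is a placeholder, not a real identifier.
fetch = mcp_tool_call(
    2,
    "get-transcription-result",
    {"jobId": "JOB_ID_FROM_SUBMIT"},
)

print(json.dumps(submit, indent=2))
print(json.dumps(fetch, indent=2))
```

Because `get-transcription-result` is marked read-only and idempotent, a client can reasonably repeat the second call until the job has completed, without side effects.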
APIs Used
salad-transcription