We built the full stack for a communication platform that translates voice and video in real time. Voice cloning, AI agents, video file processing. Six clients (web, desktop, iOS, Android, Slack, Teams) all running against one real-time backend.
Group audio and video calls where each person speaks their own language and hears everyone else in theirs. Speech recognition, translation, and synthesis all happen in the same pipeline. We got end-to-end latency under 500ms on production calls with speaker tracking across the full conversation.
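To make the shape of that pipeline concrete, here is a minimal Go sketch of the stage wiring: three channel-connected stages that overlap work, with latency measured from capture onward. The stage functions, segment fields, and the "[fr]" placeholder are illustrative stand-ins, not the production code.

```go
package main

import (
	"fmt"
	"time"
)

// A hypothetical transcript segment flowing through the pipeline.
// Speaker carries the diarization label so identity survives all stages.
type Segment struct {
	Speaker  string
	Text     string
	Captured time.Time // when the source audio was captured
}

// Each stage is a channel-to-channel transform, so stages overlap:
// while TTS synthesizes segment N, ASR is already decoding segment N+1.
func stage(in <-chan Segment, fn func(Segment) Segment) <-chan Segment {
	out := make(chan Segment)
	go func() {
		defer close(out)
		for seg := range in {
			out <- fn(seg)
		}
	}()
	return out
}

func main() {
	source := make(chan Segment)

	// asr -> translate -> synthesize, all hypothetical stand-ins.
	asr := stage(source, func(s Segment) Segment { return s })
	mt := stage(asr, func(s Segment) Segment {
		s.Text = "[fr] " + s.Text // placeholder for real machine translation
		return s
	})
	tts := stage(mt, func(s Segment) Segment { return s })

	go func() {
		source <- Segment{Speaker: "alice", Text: "hello", Captured: time.Now()}
		close(source)
	}()

	// Latency is measured from audio capture to the end of the pipeline.
	for seg := range tts {
		fmt.Printf("%s: %q after %v\n", seg.Speaker, seg.Text, time.Since(seg.Captured))
	}
}
```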
Users record a short voice sample and the system clones it. When someone speaks English and another participant hears it in French, it still sounds like the original person. We built per-user and per-project voice profiles with async processing and automatic cleanup of stale samples.
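A minimal sketch of what a per-user profile store with automatic stale-sample cleanup can look like, assuming a TTL and a background reaper; the types, field names, and cleanup interval here are hypothetical, not the production schema.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// A cloned-voice profile; SampleTakenAt drives stale-sample cleanup.
type VoiceProfile struct {
	UserID        string
	ModelRef      string // hypothetical handle into the TTS provider
	SampleTakenAt time.Time
}

type ProfileStore struct {
	mu       sync.Mutex
	profiles map[string]VoiceProfile
	maxAge   time.Duration
}

func NewProfileStore(maxAge time.Duration) *ProfileStore {
	s := &ProfileStore{profiles: map[string]VoiceProfile{}, maxAge: maxAge}
	go s.reap() // async cleanup runs alongside normal traffic
	return s
}

func (s *ProfileStore) Put(p VoiceProfile) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.profiles[p.UserID] = p
}

func (s *ProfileStore) Count() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.profiles)
}

// reap periodically drops profiles whose source sample is older than
// maxAge, mirroring the automatic cleanup described above.
func (s *ProfileStore) reap() {
	for range time.Tick(time.Minute) {
		s.mu.Lock()
		for id, p := range s.profiles {
			if time.Since(p.SampleTakenAt) > s.maxAge {
				delete(s.profiles, id)
			}
		}
		s.mu.Unlock()
	}
}

func main() {
	store := NewProfileStore(30 * 24 * time.Hour)
	store.Put(VoiceProfile{UserID: "u1", ModelRef: "voice-abc", SampleTakenAt: time.Now()})
	fmt.Println("profiles registered:", store.Count())
}
```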
AI agents that join calls and conversations as regular participants. They listen, respond, and take actions based on configurable personalities and knowledge bases. For complex scenarios we built multi-agent handoff where one agent handles intake and another handles resolution. The system also tracks sentiment and mood across all participants in real time.
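The handoff mechanic is simpler than it sounds: the current agent can return a successor, and the successor owns the conversation from then on. A sketch with hypothetical agent types and a toy routing rule, not the actual agent framework:

```go
package main

import "fmt"

// Agent is the minimal contract for a bot participant.
// Respond returns a reply plus, optionally, the agent to hand off to.
type Agent interface {
	Name() string
	Respond(utterance string) (reply string, handoff Agent)
}

type intakeAgent struct{ resolver Agent }

func (a intakeAgent) Name() string { return "intake" }
func (a intakeAgent) Respond(u string) (string, Agent) {
	// Hypothetical routing rule: once intake has what it needs,
	// it hands the conversation to the resolution agent.
	return "Thanks, routing you now.", a.resolver
}

type resolverAgent struct{}

func (resolverAgent) Name() string { return "resolution" }
func (resolverAgent) Respond(u string) (string, Agent) {
	return "Here is your fix.", nil
}

func main() {
	var current Agent = intakeAgent{resolver: resolverAgent{}}
	for _, utterance := range []string{"my call drops", "still broken"} {
		reply, next := current.Respond(utterance)
		fmt.Printf("[%s] %s\n", current.Name(), reply)
		if next != nil {
			current = next // handoff: the new agent owns the thread
		}
	}
}
```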
Upload a video in any language and get it back translated with the original speakers' cloned voices. The pipeline extracts audio, separates speakers via diarization, translates each track independently, synthesizes speech using the cloned voice, and reassembles the final video. Subtitle generation included. All of it runs on distributed Celery task queues.
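In production this runs as distributed Celery tasks; the sketch below shows the same stage order in Go (the language used for the other examples on this page), with stub functions standing in for the real workers. Every name here is a placeholder.

```go
package main

import "fmt"

// One diarized track: a speaker's isolated audio plus its text.
type Track struct {
	Speaker string
	Text    string
}

// Hypothetical stand-ins for the real stages; each of these would be
// a distributed task in production rather than an in-process call.
func extractAudio(video string) string     { return video + ".wav" }
func diarize(audio string) []Track         { return []Track{{Speaker: "spk0", Text: "hola"}} }
func translate(t Track, lang string) Track { t.Text = "[" + lang + "] " + t.Text; return t }
func synthesize(t Track) string            { return t.Speaker + "-cloned.wav" }

// reassemble stands in for the final mux of video plus new audio tracks.
func reassemble(video string, tracks []string) string { return "translated-" + video }

func main() {
	video := "talk.mp4"
	audio := extractAudio(video)

	// Translate each speaker's track independently, then resynthesize
	// it with that speaker's cloned voice before the final mux.
	var synthesized []string
	for _, track := range diarize(audio) {
		synthesized = append(synthesized, synthesize(translate(track, "en")))
	}
	fmt.Println(reassemble(video, synthesized))
}
```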
Real-time chat with on-demand message translation, reactions, threading, and file sharing. The same feature set runs on web, Electron desktop, iOS, and Android. WebSocket channels handle instant delivery with read receipts and presence indicators. We also built quick chat rooms for throwaway conversations that don't require registration.
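At its core the delivery layer is a hub: join, leave, fan out, report presence. A stripped-down in-memory sketch, with buffered channels standing in for WebSocket connections and a made-up read-receipt format; the real system runs over AnyCable.

```go
package main

import "fmt"

// One connected client; Inbox stands in for its WebSocket send queue.
type Client struct {
	ID    string
	Inbox chan string
}

// Hub fans a message out to every connected client and tracks presence.
type Hub struct {
	clients map[string]*Client
}

func (h *Hub) Join(c *Client)  { h.clients[c.ID] = c }
func (h *Hub) Leave(id string) { delete(h.clients, id) }

// Presence is just the set of currently connected client IDs.
func (h *Hub) Presence() []string {
	ids := make([]string, 0, len(h.clients))
	for id := range h.clients {
		ids = append(ids, id)
	}
	return ids
}

func (h *Hub) Broadcast(msg string) {
	for _, c := range h.clients {
		c.Inbox <- msg
	}
}

func main() {
	hub := &Hub{clients: map[string]*Client{}}
	a := &Client{ID: "a", Inbox: make(chan string, 8)}
	b := &Client{ID: "b", Inbox: make(chan string, 8)}
	hub.Join(a)
	hub.Join(b)

	hub.Broadcast("hello room")
	fmt.Println("online:", hub.Presence())
	fmt.Println("a got:", <-a.Inbox)

	// A read receipt is just another broadcast, tagged with the reader.
	hub.Broadcast("read:a:msg-1")
	<-b.Inbox // b drains the original message first
	fmt.Println("b got:", <-b.Inbox)
}
```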
Scheduled meetings with auto-generated summaries and full transcript export. For large events there is a conference mode with speaker roles, registration with approval workflows, live polling, and recording. Calendar integration pulls from Microsoft and Google so scheduling stays in sync.
The API backend. 107 models, 40+ async job types, authentication, payments, and the WebSocket broadcast layer via AnyCable.
A separate Go service we wrote specifically for the translation pipeline. Handles dictation, conversation translation, and voice resolution at latencies Rails could not reach.
Agent framework, video translation pipeline, speaker diarization, and voice activity detection. Each service is containerized and deploys independently.
Web and desktop apps sharing one component library. The same codebase runs in the browser and as a native desktop application.
Native iOS (Swift) and Android apps with full feature parity. Calls, chat, translation, voice cloning, and agent interactions all work on mobile.
Video and audio infrastructure for multi-participant calls with real-time processing and recording.
In real-time voice translation, anything over half a second breaks the conversation. Rails could not hit the target, so we built a dedicated Go service for the translation pipeline. It was the right call. Pick the language that fits the constraint.
Web, desktop, iOS, Android, Slack, and Teams all needed the same features. We learned fast that the API contract is the actual product. Get it right and the clients are straightforward. Get it wrong and you end up maintaining six applications that slowly drift apart.
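A versioned envelope is one way to keep six clients honest: everything decodes the same outer shape before dispatching on a type tag. The sketch below uses hypothetical field names; the point is the unknown-type branch, because old clients must tolerate new message kinds or the apps drift apart.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// A versioned envelope: every client, web to Slack, decodes this shape
// first and dispatches on Type. Field names here are hypothetical.
type Envelope struct {
	Version int             `json:"v"`
	Type    string          `json:"type"`
	Payload json.RawMessage `json:"payload"`
}

type ChatMessage struct {
	RoomID string `json:"room_id"`
	Body   string `json:"body"`
	Lang   string `json:"lang"` // source language; clients translate on demand
}

func main() {
	raw := []byte(`{"v":1,"type":"chat.message","payload":{"room_id":"r1","body":"hej","lang":"sv"}}`)

	var env Envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		panic(err)
	}

	// Unknown types are ignored, not errors: an old client must survive
	// a new message kind without crashing.
	switch env.Type {
	case "chat.message":
		var msg ChatMessage
		if err := json.Unmarshal(env.Payload, &msg); err != nil {
			panic(err)
		}
		fmt.Printf("room %s: %s (%s)\n", msg.RoomID, msg.Body, msg.Lang)
	default:
		fmt.Println("ignoring unknown type", env.Type)
	}
}
```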
Wiring up a speech model takes a day. Making it work at scale with speaker diarization, voice cloning, failure recovery, and cost monitoring across 70+ languages is where the real engineering lives. The models are interchangeable. The system around them is what matters.
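A concrete example of "the system around them": put every speech model behind one interface and wrap the providers in fallback, so the pipeline only fails when all of them do. The interface and provider types below are illustrative, not the actual abstraction; cost and error metrics would hang off the same seam.

```go
package main

import (
	"errors"
	"fmt"
)

// Recognizer is the seam that makes speech models interchangeable:
// the rest of the system codes against this, not a vendor SDK.
type Recognizer interface {
	Transcribe(audio []byte, lang string) (string, error)
}

// withFallback tries each provider in order and returns the first
// success; the request fails only when every provider does.
type withFallback struct{ providers []Recognizer }

func (w withFallback) Transcribe(audio []byte, lang string) (string, error) {
	var lastErr error
	for _, p := range w.providers {
		text, err := p.Transcribe(audio, lang)
		if err == nil {
			return text, nil
		}
		lastErr = err
	}
	return "", fmt.Errorf("all providers failed: %w", lastErr)
}

// Hypothetical providers standing in for real vendor clients.
type flaky struct{}

func (flaky) Transcribe([]byte, string) (string, error) { return "", errors.New("timeout") }

type stable struct{}

func (stable) Transcribe([]byte, string) (string, error) { return "hello world", nil }

func main() {
	asr := withFallback{providers: []Recognizer{flaky{}, stable{}}}
	text, err := asr.Transcribe([]byte{0x01}, "en")
	fmt.Println(text, err)
}
```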