Best Practices for Integrating Async TTS APIs Like Genny in Mobile Apps?

Hi everyone,

I’ve been experimenting with the Genny (LOVO AI) API to integrate AI-generated voice into mobile and web applications, and I’d love to learn how others are handling this in production.

The async workflow for TTS generation (submit a job, then fetch the audio once it completes) seems like a good fit for use cases such as:

AI-driven learning platforms
Automated voiceovers for content
Voice-enabled user interactions

One challenge I’m currently facing is balancing latency against audio quality, particularly when trying to deliver near-real-time responses.

For mobile apps (especially iOS), I’m exploring a few approaches:

Polling vs webhook-based job completion handling
Streaming audio vs pre-generated file playback
Managing background tasks efficiently
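
To make the polling option concrete, here is a minimal sketch of polling with exponential backoff. It is not tied to the real Genny API: `fetch_status`, the `"status"` field, and the `"done"` / `"failed"` values are placeholders I made up for illustration, so swap in whatever the actual job-status call returns.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a (hypothetical) job-status function until it reports a terminal state.

    `fetch_status` is an assumed callable that returns a dict with a
    "status" key; it stands in for the real HTTP request.
    """
    delay = initial_delay
    waited = 0.0  # total time spent sleeping, not wall-clock time
    while True:
        job = fetch_status()
        if job.get("status") in ("done", "failed"):
            return job
        if waited >= timeout:
            raise TimeoutError("TTS job did not finish within the timeout")
        sleep(delay)
        waited += delay
        # Double the wait after each attempt, capped at max_delay.
        delay = min(delay * 2, max_delay)
```

The backoff keeps short jobs responsive while avoiding a tight request loop for long ones. A webhook would remove the polling entirely, but it needs a server-side endpoint that can push the result down to the app.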

Questions:

How are you handling async TTS workflows in production apps?
Do you cache generated audio or generate it on demand?
Any tips for improving response time without sacrificing quality?
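
For context on the caching question, this is roughly what I have in mind: key the cache on a hash of the text plus the voice settings, so identical requests reuse the stored file. It is only a sketch under assumptions: `generate` is a hypothetical stand-in for the actual TTS call, and `voice_id` and `speed` are example parameters rather than names taken from the Genny API.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(text: str, voice_id: str, speed: float = 1.0) -> str:
    """Build a stable key from every input that affects the audio."""
    payload = json.dumps(
        {"text": text, "voice": voice_id, "speed": speed}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def get_or_generate(
    text: str,
    voice_id: str,
    generate: Callable[[str, str, float], bytes],
    cache_dir: Path,
    speed: float = 1.0,
) -> bytes:
    """Return cached audio if present; otherwise generate and store it.

    `generate` is an assumed callable wrapping the real TTS request.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(text, voice_id, speed)}.audio"
    if path.exists():
        return path.read_bytes()
    audio = generate(text, voice_id, speed)
    path.write_bytes(audio)
    return audio
```

This only helps when the same text and voice come up repeatedly (fixed lesson content, standard prompts). Fully dynamic text would still have to be generated on demand.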

Would really appreciate any insights or real-world experiences.

Thanks!