Navigating the AI Jungle – Audio

A simple guide to AI tools for audio transcription, synthesis, and generation

Feb 07, 2025

AI tools and services are everywhere today, and audio is no exception. There are AI-powered tools for transcribing speech into text, synthesizing speech from text, and even generating sound effects from descriptions. Just as you can ask an AI to create an image from a text prompt, you can now describe a sound and have an AI generate it.

It’s even possible to describe a style of music and provide lyrics to create complete songs. You can generate a synthetic voice based on your own or create a unique voice using just a textual description.

Google Cloud

Google Cloud offers advanced speech synthesis capabilities through the Google Cloud Console, including support for Speech Synthesis Markup Language (SSML). SSML lets you control pronunciation, tone, and emphasis by wrapping parts of the text in markup tags.
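To give a feel for what SSML looks like, here is a minimal sketch in Python. The sample sentences are made up for illustration. The tags themselves (`speak`, `break`, `emphasis`, `prosody`, `say-as`) come from the SSML standard, though which tags and attribute values a given service honors varies, so treat this as an assumption to check against the service you use. The code only builds the markup and checks that it is well-formed XML. Sending it to a speech service is left out, since that requires an account and credentials.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text; the tags are standard SSML elements.
ssml = (
    "<speak>"
    "Welcome to the show."
    '<break time="500ms"/>'  # insert a half-second pause
    'Today is a <emphasis level="strong">big</emphasis> day. '
    '<prosody rate="slow" pitch="-2st">This part is slower and lower.</prosody> '
    'The file format is <say-as interpret-as="characters">WAV</say-as>.'
    "</speak>"
)

# Parsing raises ParseError on malformed markup, so mistakes such as an
# unclosed tag are caught before the text is sent to any speech service.
root = ET.fromstring(ssml)

# List the SSML elements in document order.
tags = [element.tag for element in root.iter()]
print(tags)
```

Plain text outside the tags is read normally, so you only need markup where you want to change how something is spoken.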

However, in my experience, Google’s interface is frustrating for end users. While it may be well-suited for developers, the platform is too clunky for casual users. Exploring different voices is cumbersome, and the sheer scale of Google’s AI tools can be overwhelming. This reminds me of my experience with Microsoft years ago: companies with too many products can end up being difficult to navigate.

By contrast, OpenAI, which focuses solely on AI, offers a much more streamlined experience. You don’t get lost in unrelated services. If you are already using various Google Cloud services, you may find Google’s speech tools convenient, but for most users, I’d recommend avoiding them. ElevenLabs, on the other hand, is far more user-friendly and focused exclusively on audio.

ElevenLabs

ElevenLabs provides a wide range of AI-powered audio tools, including:

• Text-to-speech synthesis – Choose from a large library of voices or create your own.

• Custom voice creation – Generate a unique voice by either uploading a sample of an existing voice or describing one for the AI to create.

• Audio transcription – Convert spoken words into text.

• Sound effect generation – Describe a sound, and AI will generate it.

• Voice changer – Modify an existing recording to sound like a different voice.
