Erik Explores

Navigating the AI Jungle – Audio

A simple guide to AI tools for audio transcription, synthesis, and generation

Erik Engheim
Feb 07, 2025

We are surrounded by AI tools and services today, and the audio space is no exception. There are various AI-powered tools available for transcribing speech into text, synthesizing audio from text, and even generating sound effects based on descriptions. Just as you can ask an AI to create an image from a text prompt, you can now describe a sound, and AI will generate it.

It’s even possible to describe a style of music and provide lyrics to create complete songs. You can generate a synthetic voice based on your own or create a unique voice using just a textual description.

Google Cloud Console

Google Cloud

Google Cloud offers advanced speech synthesis through the Google Cloud Console, including support for Speech Synthesis Markup Language (SSML). SSML is an XML-based markup language that lets you control pronunciation, pauses, emphasis, and tone.
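To give a feel for what SSML looks like, here is a minimal Python sketch. It builds a tiny SSML snippet with the standard library and, optionally, sends it to Google's Text-to-Speech service. The helper names `build_ssml` and `synthesize` are my own for illustration. The second function assumes you have installed the `google-cloud-texttospeech` package and set up Google Cloud credentials, which is a different route than clicking through the Console.

```python
import xml.etree.ElementTree as ET


def build_ssml(text_before: str, emphasized: str, pause_ms: int = 500) -> str:
    """Build a small SSML document: plain text, a pause, then an emphasized phrase.

    Hypothetical helper for illustration. ElementTree escapes characters
    such as & and < so the resulting markup stays well-formed.
    """
    speak = ET.Element("speak")
    speak.text = text_before
    # <break> inserts a pause of the given length
    ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    # <emphasis> asks the voice to stress the enclosed words
    emphasis = ET.SubElement(speak, "emphasis", {"level": "strong"})
    emphasis.text = emphasized
    return ET.tostring(speak, encoding="unicode")


def synthesize(ssml: str, out_path: str = "output.mp3") -> None:
    """Send SSML to Google Cloud Text-to-Speech and save the audio as MP3.

    Assumption: `pip install google-cloud-texttospeech` has been run and
    application credentials are configured. Not executed in this sketch.
    """
    from google.cloud import texttospeech  # imported lazily on purpose

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as f:
        f.write(response.audio_content)


if __name__ == "__main__":
    # Only prints the markup; call synthesize(...) yourself once credentials are set up.
    print(build_ssml("Welcome to the AI jungle.", "Watch your step!"))
```

Building the markup with an XML library rather than string concatenation is a small design choice: it keeps the tags balanced and handles escaping, so a stray ampersand in your text does not break the request.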

However, in my experience, Google’s interface is frustrating for end users. While it may be well-suited for developers, the platform is too clunky for casual users. Exploring different voices is cumbersome, and the sheer scale of Google’s AI tools can be overwhelming. This reminds me of my experience with Microsoft years ago: companies with too many products can end up being difficult to navigate.

By contrast, OpenAI, which focuses solely on AI, offers a much more streamlined experience. You don’t get lost in unrelated services. If you are already using various Google Cloud services, you may find Google’s speech tools convenient, but for most users, I’d recommend avoiding them. ElevenLabs, on the other hand, is far more user-friendly and focused exclusively on audio.

ElevenLabs used for synthesizing voices and sound effects

ElevenLabs

ElevenLabs provides a wide range of AI-powered audio tools, including:

• Text-to-speech synthesis – Choose from a large library of voices or create your own.

• Custom voice creation – Generate a unique voice by either uploading a sample of an existing voice or describing one for the AI to create.

• Audio transcription – Convert spoken words into text.

• Sound effect generation – Describe a sound, and AI will generate it.

• Voice changer – Modify an existing recording to sound like a different voice.

© 2025 Erik Engheim