Minimal Audio has never been shy about bending sound design into new dimensions, and with Evoke, the company’s latest release, they’ve turned their gaze toward the human voice itself.
Launched last week, Evoke is a hypermodern vocal processor that fuses vocal modeling with spectral resynthesis, offering an entirely new way to transform, harmonize, and animate vocals in real time. It’s the kind of tool that’s equally intuitive and forward-thinking, capable of generating brand-new voices from existing recordings while retaining the organic nuances of a performance.
From instant retuning and multi-voice harmonization to an eight-slot modular effects rack and Minimal’s signature modulation system, Evoke serves as a studio powerhouse and a creative playground. Whether used to polish a pop topline or warp a vocal into an alien texture, it promises a new kind of workflow: the fun kind that invites you to play rather than tweak.
We sat down with Ben Wyss, Minimal Audio’s Head of Product Development, to talk about how this tool came together. Read on to learn about the creation of one of the coolest vocal transformation plugins on the market today.
Can you break down in simple terms how Evoke’s vocal resynthesis engine actually works under the hood?
Yeah, so the basic idea for the resynth engine is that it analyzes the input and extracts key sonic elements that we can manipulate before they are used to reconstruct the sound. There’s a lot of complexity under the hood, but I’ll try to break it down simply!
The first step is detecting the input’s pitch, which is used for pitch tracking, retuning, and harmonization. Then Evoke captures the dynamics and frequency spectrum using a special vocal-modeling process, which is somewhat similar to a traditional frequency analyzer but without an FFT. This allows you to shift and widen formants, for example. You can inject different timbres into the engine via the character settings, which can totally change the sound while still keeping the captured aspects of the input intact. Everything is then recombined in a way that’s akin to a regular synthesizer.
Of course, there are tons of interesting ways these elements can be manipulated, which were mainly what we explored after developing the core engine.
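For the curious, the stages Ben outlines (detect the pitch, capture the dynamics, then recombine through a synthesizer) can be sketched in a few lines of Python. This is an illustrative toy, not Evoke’s actual code: the function names and the autocorrelation-based detector are assumptions, and the real engine also captures the frequency spectrum for formant control, which this sketch omits.

```python
import math

def detect_pitch(signal, sample_rate):
    """Estimate the fundamental frequency via autocorrelation
    (a standard technique; Evoke's actual detector is unspecified)."""
    best_lag, best_score = 0, 0.0
    for lag in range(20, len(signal) // 2):
        score = sum(signal[i] * signal[i - lag] for i in range(lag, len(signal)))
        if score > best_score:
            best_score, best_lag = score, lag
    return sample_rate / best_lag if best_lag else 0.0

def amplitude_envelope(signal, hop=64):
    """Capture the input's dynamics as a coarse RMS envelope."""
    return [math.sqrt(sum(x * x for x in signal[i:i + hop]) / hop)
            for i in range(0, len(signal) - hop + 1, hop)]

def resynthesize(pitch, envelope, sample_rate, hop=64):
    """Drive a fresh oscillator with the captured pitch and dynamics,
    producing a new signal rather than processing the old one."""
    out, phase = [], 0.0
    for level in envelope:
        for _ in range(hop):
            out.append(level * math.sin(phase))
            phase += 2 * math.pi * pitch / sample_rate
    return out
```

Because the output comes entirely from the oscillator, swapping in a different core sound (as Evoke’s character settings do) changes the timbre while the tracked pitch and dynamics of the original survive.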
Evoke generates a completely new output signal rather than just processing the input – did that pose any major challenges during development?
There were two quite challenging aspects to developing the resynthesis engine. One was figuring out how to make the controls highly usable and impressive-sounding without losing the original sound. There was no real point of reference for this, either, since we wanted to build something that felt new while still being intuitive.
The other main challenge was developing and tuning all the different algorithms to produce a high-quality output, from stable pitch detection to preventing artifacts during resynthesis. Overall, I think it ended up sounding really good — way more “high-definition” than vocal effects like a traditional vocoder.
What are some unexpected textures or vocal transformations you’ve heard Evoke produce that surprised even the development team?
There is actually a whole submenu of texture Character Modes that really surprised us during development! My favorite one of these is the Liquid mode, which makes vocals (or anything else) sound like it’s made out of… liquid. It’s a bit hard to describe with words, but it’s quite convincing and fun.
Hearing all the presets has been really surprising as well. There is an entire folder of factory presets called Transformations that gets much more experimental and can completely transform a vocal into a DnB bass, for example. Our preset designers always find ways to use our plugins that I never would have thought of.
How does the engine handle extreme transformations — for example, taking a whisper and making it sound like a choir?
The first step is detecting the pitch, which may or may not be possible, since a whisper, for example, is mostly noise. Then the resynth engine generates the core sound using either a specially tuned synthesizer or a noise generator. Once that basic stuff is out of the way, there are many processes and options for developing the sound further. For a choir-like effect, I’d probably use the Choral mode, which has a very natural timbre with internal unison detuning. After that, I’d try adding additional harmonization voices to get a polyphonic effect.
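The fallback Ben describes, switching the core sound to a noise generator when no stable pitch exists (as with a whisper), can be sketched as a simple voicing decision. This is a hypothetical illustration, not Evoke’s API; the autocorrelation confidence heuristic and its threshold are assumptions:

```python
import math

def source_for(signal, sample_rate, confidence_threshold=0.5):
    """Pick the resynthesis core: a tuned oscillator when a stable pitch
    is found, otherwise a noise generator for unpitched input."""
    energy = sum(x * x for x in signal)
    best_lag, best_score = 0, 0.0
    for lag in range(20, len(signal) // 2):
        score = sum(signal[i] * signal[i - lag] for i in range(lag, len(signal)))
        if score > best_score:
            best_score, best_lag = score, lag
    # Normalized autocorrelation peak as a crude voicing confidence:
    # near 1.0 for periodic input, near 0.0 for noise like a whisper.
    confidence = best_score / energy if energy else 0.0
    if confidence >= confidence_threshold and best_lag:
        return ("oscillator", sample_rate / best_lag)
    return ("noise", None)
```

From there, the choir-like result would come from the later stages Ben mentions: a Choral-style timbre with unison detuning, plus extra harmonization voices stacked on top.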
What was the biggest obstacle in making a processor this advanced still feel approachable to beginners?
It felt like we had to walk a fine line between complexity and simplicity, as well as making Evoke both new and familiar. There were a number of more advanced controls that got removed or combined with other controls. A lot of this came down to deciding which controls to focus on and the UI’s layout. We tried to focus on what made sense conceptually — for instance, I think the Resolution control makes a ton of sense when you hear it, but it’s actually controlling many things at once under the hood.
I’d like to make Evoke even more accessible in the future with a dedicated Play View similar to Current and Rift, as well as adding more advanced features.
How do you see the resynthesis engine evolving in future versions of the plugin?
I think where Evoke goes next is mostly up to our users! I’m sure there are a lot of features and use cases that we didn’t think of, so we’ll be listening to feedback and hopefully incorporating them into future updates. There’s a dedicated feedback page on our website for this.
Something that I’d like to do is further flesh out the harmonizer voices so that you can control each individual voice in more ways. This would allow you to create more complex harmony voicings and make each voice sound different in new ways. It’s something we toyed with during development, but we decided it would add a lot of complexity (and CPU usage) to the plugin, so it made sense to stick with something straightforward and potentially add an option for it later.
Another pretty obvious one to me is adding more Character Modes in order to open up more timbral characters and textures. I think there’s a lot of opportunity here, but they are essentially full synth presets, so creating them takes some time. I know some users have wanted the ability to create their own modes, which would also be cool.
We just released Evoke 1.0, but we really like improving our plugins through major updates, so hopefully there will be a 2.0 update at some point!
Grab Evoke here.
The post The Creation of a Next-Generation Vocal Transformer: A Conversation with Minimal Audio appeared first on Magnetic Magazine.