
Speech AI and voice cloning — consent failures and biometric governance gaps

Voice recognition, text-to-speech, and voice cloning — where biometric data collection and fraud enablement risks are most acute.


AGC
AI Generated, Human Reviewed

Speech AI encompasses voice recognition (speech-to-text), text-to-speech synthesis, voice cloning tools, and voice-enabled assistants. The category spans widely deployed consumer tools — Siri, Alexa, Google Assistant — and specialised platforms for voice cloning, dubbing, and synthetic voice generation.

The governance concerns centre on two distinct risk areas. The first is biometric data collection: voice can serve as a biometric identifier, and its collection, storage, and use are subject to heightened data protection requirements. The second is voice cloning without consent, which enables a range of documented harms, from financial fraud to reputational damage and political manipulation.


Voice cloning for fraud

Cloned voices used in impersonation scams. Vishing (voice phishing) attacks using AI-cloned voices of executives, family members, and public figures are a documented and growing category of financial crime.

Biometric data handling

Voice data collected without adequate consent. Always-on voice assistants collect ambient audio. Data retention, sharing with third parties, and use for training are inconsistently disclosed.

Consent-free voice replication

Voices cloned from minimal samples without consent. Professional voice actors and public figures have had their voices replicated for commercial use without consent or compensation.

Accessibility data risk

Voice data disproportionately collected from assistive technology users. Voice recognition tools used as assistive technology collect detailed speech patterns from users who may have limited ability to meaningfully consent or opt out.


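EU AI Act classification: Under GDPR, voice data counts as biometric data when it is processed to uniquely identify a person, and processing it then generally requires explicit consent or another Article 9 exception. Under the EU AI Act, the use of real-time remote biometric identification in publicly accessible spaces for law enforcement purposes is prohibited, with limited exceptions (Article 5). AI-generated or manipulated voice content that imitates real people, including in political contexts, is subject to deepfake disclosure requirements under Article 50. Emotion recognition from voice in workplace and educational settings is prohibited under Article 5, except for medical or safety reasons.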


QUESTIONS

What is speech AI?

Speech AI refers to systems that recognise, process, or synthesise human speech. This includes voice assistants, transcription tools, text-to-speech engines, and voice cloning platforms. Under GDPR, voice data is treated as biometric data when it is processed to uniquely identify a person.

What is voice cloning and why is it a risk?

Voice cloning is the AI replication of a person’s voice from audio samples — sometimes just a few seconds of recording. Documented risks include fraud (impersonating executives or family members), reputational damage, and the replication of professional voices for commercial use without consent.

Is voice data protected under GDPR?

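Yes. A voice recording of an identifiable person is personal data under GDPR. When voice is processed to uniquely identify someone, it is biometric data (defined in Article 4(14)) and falls under the special categories in Article 9, where processing is prohibited unless an exception such as explicit consent applies. Ambient audio collection by always-on assistants raises specific compliance questions around the validity of consent and the scope of data retained.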
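
To make the consent and retention questions concrete, a service could check a consent record before it processes any voice sample. What follows is a minimal sketch under stated assumptions, not a compliance implementation: `VoiceConsent`, `may_process`, the field names, and the purpose strings are all hypothetical, and the check treats the retention window as the point after which no further processing is allowed, which is a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class VoiceConsent:
    """Hypothetical consent record; all field names are illustrative."""
    speaker_id: str
    purposes: frozenset[str]   # purposes the speaker explicitly agreed to
    granted_at: datetime       # when consent was given (timezone-aware)
    retention: timedelta       # how long recordings may be kept and used
    withdrawn: bool = False    # set to True once the speaker withdraws consent


def may_process(consent: Optional[VoiceConsent], purpose: str, now: datetime) -> bool:
    """Return True only if unwithdrawn, unexpired consent covers `purpose`."""
    if consent is None or consent.withdrawn:
        return False           # no record or withdrawn consent: do not process
    if purpose not in consent.purposes:
        return False           # consent for one purpose does not cover another
    return now < consent.granted_at + consent.retention


# Example: consent for transcription only, valid for 30 days.
consent = VoiceConsent(
    speaker_id="speaker-001",
    purposes=frozenset({"transcription"}),
    granted_at=datetime(2025, 1, 1, tzinfo=timezone.utc),
    retention=timedelta(days=30),
)
check_time = datetime(2025, 1, 10, tzinfo=timezone.utc)
transcription_allowed = may_process(consent, "transcription", check_time)
training_allowed = may_process(consent, "model_training", check_time)
```

In this sketch the default is to refuse: a missing record, a withdrawn consent, an uncovered purpose, or an expired retention window all return `False`. Reusing audio for model training would therefore need its own explicit entry in `purposes`.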