
Your Voice Notes Are Not Private

Many AI transcription apps send your audio to cloud servers, where biometric voice data can be extracted, and some share the resulting data with dozens of advertising partners.

Voice notes feel intimate. You record something quickly — a thought while driving, a memo to yourself, a reminder you don’t want to forget — and the recording captures more than the words. It captures your voice as a biometric signature, the ambient sounds of your environment, and the emotional inflection of a private moment.

Most voice recording and transcription apps send that audio to cloud servers to process it. Some share the resulting data with advertising partners. And unlike a password you can change after a breach, your voice cannot be revoked once it’s been captured and modelled.


What AI Transcription Does With Your Audio

Modern voice-to-text transcription is accurate largely because its models have been trained on enormous volumes of real audio from real people. The apps you use for dictation, voice memos, and transcription may contribute to that training, often without your knowledge.

The mechanics vary by service, but the typical flow is:

  • You record audio on your device
  • The audio file is uploaded to the app’s cloud servers
  • The app routes the audio to a transcription provider — Amazon Transcribe, Google Speech-to-Text, or similar
  • The transcription provider processes the audio and returns text
  • The text is returned to the app and displayed to you
  • The audio and/or text may be retained at each step, subject to the retention policies of each party involved

The app you’re using has a privacy policy. The transcription provider it routes your audio to has a separate privacy policy. If that provider further delegates to subprocessors, there are additional policies on top. Most users have no visibility into this chain.
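The chain described above can be sketched as a small data model. This is an illustrative sketch only: the party labels, the `Hop` class, and the retention values are hypothetical and do not describe any real vendor's behaviour. It shows how quickly the number of policies and possible copies grows once audio leaves the device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Hop:
    """One off-device party that handles a recording (hypothetical model)."""
    party: str
    sees_audio: bool
    sees_text: bool
    # True/False if the policy is explicit, None if the policy does not say.
    may_retain: Optional[bool]


def policies_in_play(chain):
    """Each off-device party applies its own privacy policy."""
    return [hop.party for hop in chain]


def possible_audio_copies(chain):
    """Parties that receive raw audio and have not ruled out keeping it."""
    return [
        hop.party
        for hop in chain
        if hop.sees_audio and hop.may_retain is not False
    ]


# Assumed example values, chosen only to illustrate the flow in the text.
CHAIN = [
    Hop("app backend", sees_audio=True, sees_text=True, may_retain=True),
    Hop("transcription provider", sees_audio=True, sees_text=True, may_retain=False),
    Hop("subprocessor", sees_audio=True, sees_text=False, may_retain=None),
]

if __name__ == "__main__":
    print("Policies to read:", policies_in_play(CHAIN))
    print("Possible audio copies:", possible_audio_copies(CHAIN))
```

An unstated retention policy is treated as possible retention here, on the assumption that a user cannot rule out what a policy does not address.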


Voice Is Biometric Data

The risk profile of voice data is different from most personal information, and the difference is significant.

Your voice contains biometric characteristics. Modern speaker recognition systems can identify individuals with high accuracy from short recordings — short enough to appear in a casual voice note. This biometric signature can be extracted by machine learning models when they process audio, often as a byproduct of transcription rather than an intentional collection decision.

Unlike a password, a credit card number, or a physical address, your voiceprint cannot be changed if it’s compromised. Once your voice has been captured and potentially shared, it’s part of a dataset that could be used to identify you in any future context where your voice is heard — surveillance systems, phone authentication, other platforms, or law enforcement applications.

In May 2025, Google agreed to pay Texas $1.375 billion to settle claims that it had unlawfully collected user data across its services, including biometric data such as voiceprints. The settlement, among the largest privacy settlements any single US state has obtained, concerned data collected over an extended period. It shows what can happen when biometric collection is caught and pursued. For every enforcement action of this kind, there are likely many more instances of voice data collection that proceed without scrutiny.


The 41 Advertising Partners Problem

A 2022 study led by researchers at the University of Washington and partner universities found that Alexa voice interaction data is shared with as many as 41 advertising partners. More than 70% of the privacy policies examined in that research did not mention this data sharing in accessible language.

This research examined smart home devices, not standard voice note apps — but the infrastructure is related. Voice interaction data flowing to Alexa is processed by Amazon. Voice data flowing through apps that use Amazon Transcribe goes to the same company. The downstream uses of that data, and who Amazon shares it with under its own terms, are governed by Amazon’s privacy policies rather than the app’s.

For users who dictate personal notes, business memos, or anything that could be used for targeting, the question of where voice data ends up after the app processes it is not a hypothetical concern. It’s a description of how the voice processing ecosystem currently works.


What “Offline” Processing Actually Means

Some apps advertise offline or on-device transcription as a privacy feature. This is worth understanding precisely.

Apple’s Voice Memos app can transcribe audio on-device on supported iPhone models. Google’s Pixel Recorder similarly offers on-device transcription on Pixel phones. If transcription genuinely happens on the device, the audio doesn’t need to leave it — which is a real privacy improvement.

But “on-device transcription” is a narrower claim than it appears:

Sync still uploads. An app can transcribe locally but then sync both the audio and the transcript to cloud servers for access across devices. Local transcription doesn’t mean local storage.

Only some features are on-device. An app might offer on-device transcription but use cloud services for search, AI insights, organisation features, or backup. The transcription step is on-device; everything around it isn’t.

On-device models are limited. On-device transcription models are typically smaller and often less accurate than cloud models. Many apps offer on-device as an option but default to cloud for better results.

The privacy-relevant question is not just whether transcription happens on-device, but what happens to the audio and transcript afterwards. An app that transcribes locally but syncs everything to the cloud has moved the privacy problem downstream rather than solving it.
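A minimal sketch of that reasoning follows. The `AppBehaviour` fields and the `leaves_device` function are hypothetical names introduced for illustration, and the rule that cloud features operate on the transcript is an assumption. The point it demonstrates is that on-device transcription is only one of several inputs to what ends up off the device.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AppBehaviour:
    """Hypothetical description of how a voice note app is configured."""
    transcribes_on_device: bool
    syncs_audio: bool
    syncs_transcript: bool
    cloud_features: Tuple[str, ...] = ()  # e.g. ("search", "backup")


def leaves_device(app):
    """Return a sorted list of what ends up off the device."""
    exposed = set()
    if not app.transcribes_on_device:
        # Cloud transcription needs the audio and produces the text remotely.
        exposed.update({"audio", "transcript"})
    if app.syncs_audio:
        exposed.add("audio")
    if app.syncs_transcript:
        exposed.add("transcript")
    if app.cloud_features:
        # Assumption: cloud-side search or insights operate on the transcript.
        exposed.add("transcript")
    return sorted(exposed)


if __name__ == "__main__":
    local_only = AppBehaviour(True, False, False)
    local_with_sync = AppBehaviour(True, True, True)
    print(leaves_device(local_only))
    print(leaves_device(local_with_sync))
```

Under these assumptions, an app that transcribes locally but syncs everything exposes the same data as one that transcribes in the cloud.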

Genuinely private voice note storage means audio stays on your device or in storage you control, on-device transcription is used where possible, and no copy of audio or transcript goes to a third party without your explicit understanding.


The Emotional Content Problem

Text notes can be written carefully. Voice notes are usually not.

When you record a voice note, you’re typically speaking in real time — before you’ve edited your thoughts, while you’re in the middle of an emotion, without the deliberateness that written communication imposes. Voice recordings capture hesitation, emotional inflection, things you say and then try to correct. They’re more revealing than almost any other form of personal record.

This matters because the sensitivity of the underlying content isn’t just about the words being said. It’s about what the voice itself reveals: your emotional state when you recorded it, whether you sound stressed or calm, the background audio that places you at a location.

A voice note you recorded during a difficult period of your life is a different category of personal data than a document or a photo. It contains information that the speaker didn’t consciously choose to include — and that information goes wherever the audio goes.


The Regulatory Gap

Biometric data is increasingly regulated, but the regulation hasn’t fully caught up with how voice data is collected and used in consumer apps.

In the US, the Illinois Biometric Information Privacy Act (BIPA) regulates the collection of biometric identifiers, including voiceprints, with requirements for written consent, data retention policies, and restrictions on sale. The law gives individuals a private right of action, and Illinois courts have repeatedly held that violations can carry statutory damages. Over 100 new BIPA class-action lawsuits were filed in 2025 alone.

But BIPA applies only in Illinois. Most US states don’t have equivalent protections. At the federal level, there’s no biometric privacy law comparable to what Illinois provides. Many voice recording apps operate with no specific legal obligation to tell you what they’re doing with your voiceprint.

Internationally, GDPR treats biometric data as a special category requiring explicit consent — but enforcement is inconsistent, and apps available globally are often designed to the lowest applicable standard.


What Genuine Voice Privacy Looks Like

If voice privacy is a priority, these are the properties to look for in any voice recording or transcription service:

On-device transcription with no audio upload. The audio should never leave the device. Some apps and operating system features make this possible; it requires actively choosing them over defaults.

Explicit disclosure of transcription providers. If audio leaves the device at any point, the privacy policy should name every provider it goes to and what those providers do with it. “We may use third-party service providers” is insufficient.

No indefinite retention of raw audio. Audio used for transcription should be deleted after processing, not retained. A policy should state specifically how long audio is kept and under what circumstances it can be deleted.

No biometric extraction or speaker identification. The service should not extract voiceprint data as a byproduct of transcription. This is harder to verify but should be stated in the privacy policy.

Encryption of stored recordings. If voice notes are backed up or synced, they should receive the same encryption treatment as other sensitive files.
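The five properties above can be turned into a simple review checklist. This is a sketch, not a standard: the criterion keys and the `review` function are hypothetical, and the answers must come from your own reading of a service's privacy policy. Anything you cannot confirm is counted as unmet.

```python
# Criterion keys are illustrative labels for the five properties listed above.
CRITERIA = {
    "on_device_transcription": "Audio is transcribed on-device and never uploaded",
    "providers_named": "Every transcription provider is named in the policy",
    "audio_deleted_after_processing": "Raw audio is not retained indefinitely",
    "no_biometric_extraction": "No voiceprint or speaker identification is extracted",
    "encrypted_storage": "Backed-up or synced recordings are encrypted",
}


def review(answers):
    """Return the criteria that were not explicitly confirmed.

    `answers` maps criterion keys to True, False, or None (unclear).
    Missing and unclear answers are treated the same as a "no".
    """
    return [key for key in CRITERIA if answers.get(key) is not True]


if __name__ == "__main__":
    # Example answers for an imaginary app; replace with your own findings.
    my_findings = {
        "on_device_transcription": True,
        "providers_named": None,  # the policy only says "third-party providers"
        "encrypted_storage": True,
    }
    for key in review(my_findings):
        print("Unmet or unclear:", CRITERIA[key])
```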


How daftei Handles Voice Notes

Voice notes in daftei are treated as personal memory files — the same category as photos, documents, and other private records. They’re encrypted in transit with TLS 1.3 and at rest with AES-256. daftei doesn’t extract biometric data from voice recordings, doesn’t share audio with third-party AI training pipelines, and doesn’t run advertising.

The business model doesn’t create incentives to derive additional value from your audio. A voice note you store in daftei is a file: one you can retrieve, play back, and permanently delete. When you delete content in daftei, a 30-day grace window applies; after that, the file is permanently and irreversibly erased.

This is server-side encryption, not zero-knowledge encryption: daftei holds the keys, so it could technically access your audio. But it’s a meaningfully different starting point from services whose revenue depends on knowing more about their users.


The Defaults Are Set Against You

The voice note privacy problem is, at its core, a defaults problem. Most voice recording apps are designed for convenience — cloud sync on by default, AI features on by default, minimal friction for sharing. Privacy requires opting out of multiple defaults that most users don’t know exist.

The result is that billions of hours of personal audio — recorded privately, in moments of reflection, in professional contexts, during difficult personal periods — are flowing through cloud infrastructure that most users have no clear picture of.

In 2025, Amazon eliminated the option to process Echo recordings locally. The choice was made by the company, not the user. Voice data that could have stayed on-device now travels to Amazon’s cloud by design.

Until voice data is treated with the same regulatory seriousness as fingerprints and facial recognition — which are increasingly regulated in the US and EU — the defaults will continue to favour extraction over protection.

For now, the relevant questions are: does your voice note app transcribe on-device or in the cloud? Does it name the transcription providers it uses? Does it tell you what happens to the audio after transcription is complete? If the answers are unclear, that’s a deliberate choice by the service, not an oversight.

Your memories deserve better than an ad platform.
