It’s a habit that’s crept into daily life without much scrutiny: you have a PDF, a scanned form, or a photo of a document, and instead of reading the whole thing, you drop it into ChatGPT or another AI chatbot and ask for a summary, a translation, or help filling it out. It’s fast, and for a grocery receipt or a recipe, it’s genuinely harmless. For a passport scan, a tax form, or a medical record, it’s worth pausing on what actually happens to that file once you hit upload.
Security researchers have been tracking this behavior at scale, and the numbers are large enough to take seriously. Enterprise security firm Zscaler reported that data loss prevention systems flagged over 410 million policy violations tied specifically to ChatGPT usage in 2025 — a 99% year-over-year increase — covering financial records, personally identifiable information, health data, and other regulated content being sent to the tool. That’s the corporate version of the same habit playing out at home, just with a system in place to notice it.
What Happens to a File You Upload
When you upload a document or photo to a general-purpose AI chatbot, a few things are true regardless of which provider you’re using:
The file is processed on the company’s servers, not your device. Unlike a local app that scans a document offline, a cloud-based chatbot sends your file to its infrastructure to extract text, interpret images, and generate a response. That upload is the mechanism that makes the feature work, so there’s no way to use the feature without it.
It may be retained beyond the conversation. Most major AI providers retain conversation data, including uploaded files, for some period by default. The retained data is used for abuse monitoring and service improvement and, depending on your account settings, for model training. Consumer-tier accounts on most platforms default to broader data use than paid or enterprise tiers, which often include training opt-outs as a selling point.
It’s only as protected as your account. A document uploaded to a chatbot lives inside your account history there. If that account is compromised — a reused password, a phishing attempt, a stolen session token — whatever you’ve uploaded is exposed along with it, the same as any other stored data.
It may be read by human reviewers. Most providers reserve the right to have a portion of conversations reviewed by humans or contractors for safety and quality purposes. Reviewed material is typically anonymized, but not always perfectly, especially when a document carries directly identifying information such as a name, address, or ID number embedded in the image itself.
None of this makes AI chatbots uniquely dangerous — it’s a description of how cloud services in general work. The issue is that the convenience of “just paste it in” tends to bypass the moment of hesitation people apply to other sensitive uploads, like emailing a scan to a stranger.
The Documents Worth Treating Differently
Not every upload carries equal risk. The two ends of the range are easy to tell apart:
Low stakes: a recipe, a public article, a non-personal spreadsheet, a meme. Nothing in these reveals anything about you specifically, and there’s little practical downside if a copy persists somewhere.
High stakes: a passport, driver’s license, or national ID scan; a Social Security number or tax form; a medical record or prescription; a financial statement with account numbers; anything containing a minor’s information. These documents combine two properties that make them genuinely risky to hand to a third-party service casually — they’re hard or impossible to “reissue” if exposed, and they’re frequently the exact data set used in identity theft and fraud.
The test worth applying before any upload: if this document were leaked tomorrow, would it just be mildly embarrassing, or would it actually let someone open a credit line, file a fraudulent tax return, or impersonate me? If it’s the second, it doesn’t belong in a general-purpose chatbot’s upload field, regardless of how good the summary would be.
It’s Not Just ChatGPT
It’s worth being precise here: this isn’t a problem specific to one product. The same description applies to Google’s Gemini, Microsoft Copilot, Anthropic’s Claude, and any other general-purpose AI chatbot that accepts file or image uploads. ChatGPT shows up in the security research disproportionately because it has the largest consumer user base, not because its data practices are uniquely worse than the alternatives — Zscaler’s report also flagged Grammarly as an even larger destination for sensitive enterprise text by volume, simply because grammar-checking tools see almost everything someone types.
The provider-specific details (exact retention windows, whether training is opt-out or opt-in by default, whether a “memory” feature persists facts about you across separate conversations) vary and are worth checking in each tool’s settings. But the underlying shape of the risk — cloud processing, default retention, account-level exposure, possible human review — is common to the entire category, not a single-vendor problem. If you’ve already read about how AI chatbots retain cross-session memory of what you tell them, this is the document-specific version of the same broader pattern.
The Screenshot Version of the Same Problem
A specific habit worth flagging separately: screenshotting a sensitive document instead of uploading the original file doesn’t meaningfully change the risk. People often feel more comfortable pasting a screenshot of a bank statement or ID card into a chat than attaching the original PDF, as if a screenshot were somehow less “real” data. It isn’t: an AI chatbot’s image-reading capability extracts the same text and visual information from a screenshot that it would from the source document, and the image itself is uploaded and retained the same way a file attachment would be.
The same applies to photographing a physical document with your phone and uploading that photo directly into a chat, which is an increasingly common way people interact with paperwork — a landlord’s lease, a doctor’s handwritten note, a government form. The physical-to-digital step doesn’t add any privacy protection; it’s still a personal document landing on a third party’s servers.
What to Do Instead
Strip identifying details before you upload, when you can. If you want ChatGPT to explain a confusing clause in a tax form, you usually don’t need to upload the whole document; copy just the text of the clause in question. Most of the value of these tools comes from understanding language, so the document image itself is rarely needed.
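For text you do paste, a first pass at stripping identifiers can be automated. The sketch below is a minimal illustration, not a complete redaction tool: the pattern names and the `redact` helper are hypothetical, the SSN pattern assumes the US format with hyphens, and names, addresses, and dates of birth are not caught at all, so the result still needs a read-through before it goes anywhere.

```python
import re

# Hypothetical patterns for illustration only; real documents need broader coverage.
# Order matters: the more specific patterns run before the generic digit run.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                # US format, hyphenated
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "LONG_NUMBER": re.compile(r"\b\d{8,17}\b"),                 # account-style digit runs
}


def redact(text: str) -> str:
    """Replace each pattern match with a labeled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


if __name__ == "__main__":
    clause = "Taxpayer 123-45-6789 must report interest from account 000123456789."
    print(redact(clause))
```

Short numbers such as clause or line references are left alone, which keeps the question you actually want answered intact.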
Use a provider’s enterprise or privacy-focused tier if you genuinely need document-level analysis regularly. Several AI providers offer tiers with explicit no-training guarantees and tighter retention policies — worth checking if you do this often enough that it matters.
Keep the actual sensitive documents somewhere narrower in scope. A passport scan, tax form, or medical record doesn’t need to live in a tool whose core purpose is processing language at scale across millions of unrelated conversations. It needs a place built to store and retrieve it — nothing more.
This is the same logic covered in scanning your tax forms and ID with a document app — the safest place for a sensitive scan is the narrowest one, not the most convenient one.
A Quick Test for Borderline Cases
Most documents aren’t as obviously high-stakes as a passport, which makes the decision genuinely harder in practice. A lease agreement, a pay stub, a school enrollment form, an insurance policy — these sit in a middle zone where the document contains real personal details but isn’t quite as catastrophic if exposed as a government ID or SSN.
For that middle zone, three questions help: Does this document contain a number that could be used to open an account or file a claim in my name (an SSN, a full account number, a policy number combined with personal details)? Does it identify a minor? Would it be a problem if this exact file were forwarded to a stranger by mistake? A “yes” to any of these is a reasonable signal to redact, summarize manually, or skip the upload. A “no” across the board generally means the convenience is worth the minor exposure.
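Written out as logic, the test is a plain "any of three" check. The sketch below is only an illustration of that rule; the `BorderlineCheck` and `recommend` names are hypothetical, and each field is phrased so that `True` always means higher risk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BorderlineCheck:
    """Answers to the three questions; True always means more risk."""
    has_account_opening_number: bool   # SSN, full account number, policy number plus personal details
    identifies_minor: bool
    problem_if_forwarded: bool         # would a mistaken forward to a stranger be a problem?


def recommend(check: BorderlineCheck) -> str:
    """A single risky answer is enough to route the document away from a whole-file upload."""
    if (
        check.has_account_opening_number
        or check.identifies_minor
        or check.problem_if_forwarded
    ):
        return "redact, summarize manually, or skip the upload"
    return "upload is reasonable"


if __name__ == "__main__":
    pay_stub = BorderlineCheck(
        has_account_opening_number=True,
        identifies_minor=False,
        problem_if_forwarded=True,
    )
    print(recommend(pay_stub))
```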
Where daftei Fits
daftei isn’t a chatbot, and it doesn’t try to read or interpret what you upload beyond letting you search and retrieve it. Files are encrypted with TLS 1.3 in transit and AES-256 at rest, and daftei never trains third-party AI models on anything you store, never sells your data, and never shows ads. If you need an AI tool to actually summarize or explain a document, that’s a legitimate use case; just make that decision deliberately, document by document, rather than defaulting to “paste it in” for everything that crosses your screen.
The habit worth building isn’t “never use AI on personal documents.” It’s “know which documents are cheap to expose and which aren’t, and treat them differently” — the same instinct most people already apply to deciding what to email versus what to hand someone in person.
A Realistic Routine, Not a Rule You’ll Break in a Week
Outright bans on a genuinely useful tool rarely survive contact with real life — if “never upload documents to AI” is the rule, the more likely outcome is following it carefully for a week and then ignoring it the first time you’re in a hurry. A routine that survives is simpler: keep a mental (or literal) short list of document types that always get redacted or summarized manually rather than uploaded whole — ID scans, tax forms, medical records, anything with a full account or government ID number — and treat everything else as fine for normal use.
That short list is small enough to actually remember, which is the entire point. The goal isn’t maximal caution on every interaction with an AI tool; it’s making sure the handful of documents that would actually hurt you if exposed get a different, more deliberate path than the recipe you wanted reformatted or the email you wanted shortened.