Why can't I copy text from a PDF?
The most common reasons: (1) The PDF is image-based: it was created by scanning a physical document, so the 'text' is pixel data in an image rather than machine-readable characters. Image-based PDFs require OCR to make their text selectable and copyable. (2) The PDF has permission restrictions set by the creator that disable text selection or copying in standard PDF readers. (3) The PDF uses custom font encoding, so the text appears selectable but pastes as garbled characters. To test which situation you're in, press Ctrl+F and search for a word you can see on the page. If search finds it, the text is machine-readable, and a copy failure points to permission restrictions. If search can't find it, try selecting the text: if nothing can be selected, the PDF is most likely image-based; if the text selects but pastes as garbage, the cause is font encoding.
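The same triage can be roughly automated once some PDF library has pulled whatever text it can from each page. The sketch below is a heuristic only, and it makes several assumptions: `page_texts` is a hypothetical list of per-page strings you have already extracted, the 0.3 threshold is arbitrary, and garbling that happens to produce ordinary printable characters will not be caught. It also cannot detect permission restrictions.

```python
import unicodedata

def suspicious_ratio(text: str) -> float:
    """Fraction of non-space characters that look like encoding debris."""
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0
    # Control (Cc), private-use (Co), and unassigned (Cn) code points, plus the
    # U+FFFD replacement character, rarely appear in correctly decoded text.
    bad = sum(
        1 for ch in visible
        if ch == "\ufffd" or unicodedata.category(ch) in {"Cc", "Co", "Cn"}
    )
    return bad / len(visible)

def diagnose(page_texts: list[str], threshold: float = 0.3) -> str:
    """Classify extracted text as 'image-based', 'encoding-problem', or 'text-ok'."""
    joined = "".join(page_texts)
    if not joined.strip():
        return "image-based"        # nothing extractable: probably a scan
    if suspicious_ratio(joined) > threshold:
        return "encoding-problem"   # text exists but decodes to debris
    return "text-ok"
```

For example, `diagnose(["", "  "])` reports an image-based file, while a page of ordinary words reports `text-ok`.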
What is OCR and do I need it to extract text from a scanned PDF?
OCR (Optical Character Recognition) converts images of text, including scanned PDF pages, into machine-readable text by analyzing pixel patterns and identifying characters. You need OCR to extract text from a scanned PDF because it contains images of text, not actual text data. Unless the scanning software already added a hidden text layer, there is nothing to extract without OCR. With OCR, the recognized text can be extracted, searched, and copied. OCR accuracy depends on scan quality: 200–300 DPI resolution, clean ink, and straight page orientation produce the best results.
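To see why scan resolution matters, it helps to work out how many pixels the OCR engine actually gets per character. The sketch below is plain arithmetic based on the definition of a point as 1/72 inch; the function names are illustrative, and the point size is only a nominal height (real glyphs are somewhat smaller than the font's point size).

```python
def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)

def nominal_text_height_px(point_size: float, dpi: int) -> float:
    """Nominal height in pixels of text set at point_size (1 pt = 1/72 inch)."""
    return point_size * dpi / 72

# A US Letter page (8.5 x 11 in) with 12 pt body text at three resolutions.
for dpi in (72, 200, 300):
    width, height = page_pixels(8.5, 11, dpi)
    print(dpi, "DPI:", width, "x", height, "px, 12 pt text is about",
          nominal_text_height_px(12, dpi), "px tall")
```

At 72 DPI a 12 pt line is only about 12 pixels tall, which leaves very little detail per character; at 300 DPI the same line spans about 50 pixels.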
How do I extract text from a PDF without uploading it?
Browser-based PDF text extraction tools that run client-side process the file locally within your browser's JavaScript environment using the PDF's embedded text data. The PDF is read by the browser's File API into local memory, the text content is extracted from the PDF structure, and the result is offered as a plain text download, without transmitting the document to a server. Not every online tool works this way, so verify it: open DevTools (F12) → Network tab before loading your PDF. A local tool shows no outbound requests carrying your document content during processing.
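To make "extracted from the PDF structure" concrete, here is a deliberately naive, standard-library-only sketch of local extraction. It is written in Python rather than browser JavaScript purely for illustration, and it is not how any particular tool works. It assumes the simplest possible file: content streams that are either uncompressed or Flate-compressed, and text shown with literal strings and the `Tj` operator. Real PDFs also use `TJ` arrays, hex strings, custom font encodings, and other filters, which a proper PDF library handles.

```python
import re
import zlib

# Bytes between the "stream" and "endstream" keywords of each PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A literal string "(...)" followed by the Tj (show text) operator.
SHOW_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")

def extract_simple_text(pdf_bytes: bytes) -> list[str]:
    """Return the literal strings shown with Tj, in file order (toy extractor)."""
    found = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # FlateDecode streams are zlib data; decompressobj tolerates the
            # trailing newline that precedes "endstream".
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # assumption: the stream was stored uncompressed
        for literal in SHOW_RE.findall(data):
            # Undo only the \( \) \\ escapes; other escapes are left as-is.
            text = re.sub(rb"\\([()\\])", rb"\1", literal)
            found.append(text.decode("latin-1"))
    return found
```

Everything here operates on bytes already in memory, which is the point: no step needs a network connection.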
Why does copied PDF text come out garbled or with symbols?
Garbled text usually indicates a font encoding problem: the PDF uses a custom or subset font encoding where the stored character codes don't map to standard Unicode characters, and the file lacks a correct code-to-Unicode (ToUnicode) mapping. The characters render correctly in the PDF viewer because the viewer draws glyphs from the embedded font, but when the text is copied, the character codes don't translate correctly into the pasted text. This is common in older PDFs and PDFs from certain authoring workflows. A dedicated PDF extraction tool that handles encoding more robustly than clipboard copy often produces cleaner output for these documents.
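A toy simulation can show why the page looks right but pastes wrong. Everything below is hypothetical: a made-up four-glyph subset font, made-up character codes, and one plausible fallback (treating each code as a raw code point) when no Unicode mapping is available. Real viewers use more elaborate fallbacks, but the failure has the same shape.

```python
from typing import Optional

# Hypothetical subset font: character code -> glyph shape the viewer draws.
GLYPHS = {1: "H", 2: "e", 3: "l", 4: "o"}

# Character codes stored in the page content for the word "Hello".
PAGE_CODES = [1, 2, 3, 3, 4]

def render(codes: list[int]) -> str:
    """What you see: the viewer draws each code's glyph from the embedded font."""
    return "".join(GLYPHS[code] for code in codes)

def copy_text(codes: list[int], to_unicode: Optional[dict[int, str]] = None) -> str:
    """What you paste: codes translated through a code-to-Unicode map, if any."""
    if to_unicode is None:
        # Assumed fallback: interpret the raw code as a Unicode code point.
        return "".join(chr(code) for code in codes)
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)

print(render(PAGE_CODES))                   # looks fine on screen
print(repr(copy_text(PAGE_CODES)))          # control characters, not letters
print(copy_text(PAGE_CODES, {1: "H", 2: "e", 3: "l", 4: "o"}))
```

With a correct mapping supplied, the copied text matches what is rendered; without it, the same codes come out as unprintable characters.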
Can I extract text from a password-protected PDF?
It depends on which type of password protection is applied. A document open password (also called a user password) encrypts the file content: you need the password to open the file at all, before any extraction is possible. Without it, the content is inaccessible. A permissions password (also called an owner password) restricts operations such as copying and printing through flags stored in the PDF's encryption dictionary. The file is still encrypted, but when no open password is set it opens without a prompt, and the restrictions work only because standard PDF readers choose to honor the flags; tools that read the PDF structure directly can still reach the content. If you own the document or are authorized to extract its content, the straightforward approach is to open it with the password you have and then extract.
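Permission restrictions in the standard PDF security handler are stored as bit flags in an integer `/P` entry of the encryption dictionary. As an illustration, the sketch below decodes four of those flags using the bit positions given in the PDF specification (bits are numbered from 1, lowest first). The function and dictionary names are made up for this example, and reading the `/P` value out of a real file is left to a PDF library.

```python
# Bit positions (1-based) from the PDF standard security handler.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,       # copy or extract text and graphics
    "annotate": 6,
}

def decode_permissions(p_value: int) -> dict[str, bool]:
    """Decode the signed 32-bit /P integer into named allow/deny flags."""
    flags = p_value & 0xFFFFFFFF  # view the signed value as unsigned 32-bit
    return {
        name: bool(flags & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }

print(decode_permissions(-3904))  # every listed operation denied
print(decode_permissions(-4))     # every listed operation allowed
```

A cleared `copy` bit is what makes a compliant reader grey out text selection or copying, even though the page content itself is unchanged.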
What's the best way to extract tables from a PDF?
For tables in native-text PDFs, a specialized table extraction tool — one that detects row-column structure rather than just extracting text position by position — produces usable output. These tools extract table content to CSV, Excel, or structured text that preserves the row-column relationships. For tables in scanned PDFs, OCR must run first; after OCR, the recognized text can be processed by a table extractor. Plain text copy-paste from a table usually produces poorly ordered text that requires significant cleanup to use as structured data.
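The idea of detecting row-column structure from positions can be sketched in a few lines. This is a simplified illustration, not the algorithm of any particular tool. It assumes you already have one positioned text fragment per cell (the hypothetical `Fragment` type below), that cells are left-aligned, and that PDF coordinates are used, where y grows upward. The tolerance values are arbitrary.

```python
import csv
import io
from typing import NamedTuple

class Fragment(NamedTuple):
    x: float   # left edge of the fragment
    y: float   # baseline; larger y is higher on the page
    text: str

def rows_from_fragments(fragments, y_tol=2.0, x_tol=5.0):
    """Group fragments into rows by y and into columns by left edge."""
    # Column starts: cluster the left edges, opening a new column on a big gap.
    col_starts = []
    for x in sorted(f.x for f in fragments):
        if not col_starts or x - col_starts[-1] > x_tol:
            col_starts.append(x)

    def col_index(x):
        return min(range(len(col_starts)), key=lambda i: abs(col_starts[i] - x))

    rows = []  # list of (row_y, cells), top of page first
    for frag in sorted(fragments, key=lambda f: (-f.y, f.x)):
        if not rows or abs(rows[-1][0] - frag.y) > y_tol:
            rows.append((frag.y, [""] * len(col_starts)))
        cells = rows[-1][1]
        i = col_index(frag.x)
        cells[i] = (cells[i] + " " + frag.text).strip()
    return [cells for _, cells in rows]

def to_csv(rows) -> str:
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

Because each fragment is placed by column position rather than reading order, an empty cell stays empty instead of shifting later values to the left, which is the usual failure of plain copy-paste.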