PDF OCR FAQ
- Why another OCR option? Why like this?
- Why do I need an API key? How do I get one?
- How much will it cost? (cost calculator)
- Is this safe? Can I trust you/the provider?
- How big can the input be?
- Why are some characters wrong, words misplaced, etc.? Why are lines returned as-is, e.g., with hyphenation? Will it work with complex page formats?
- Side-by-side provider comparison
Why another OCR option? Why like this?
This interface provides access to cloud OCR services without requiring coding or cloud-storage setup. Both supported providers accept multi-page PDFs and deliver high accuracy on Indic scripts. Unlike Google Drive OCR, neither is thrown off by a low-quality text layer that may already be embedded in the file. More background here.
Two providers are currently supported:
- Google Cloud Vision — mature, widely used, strong general OCR. Requires a Google Cloud account with billing enabled. Setup is involved but only needs to be done once.
- Sarvam Vision — new Indian AI focused on Indic languages, with impressive performance on Sanskrit. Simpler to set up: just create an account at sarvam.ai and generate a key.
Why do I need an API key? How do I get one?
Both providers are paid services that bill based on usage. You supply your own API key (a password-like string linked to your own account), so you pay the provider directly and no one else's usage is charged to you. You can create or delete a key at any time.
1a. Google Cloud Vision — video walkthrough
1b. Google Cloud Vision — written walkthrough
- Go to the Google Cloud Console: While logged into your Google account, visit the Google Cloud Console. If it's your first time, you may need to click through some initial setup prompts.
- Enable billing: Walk through the steps at console.cloud.google.com/billing to enable billing. You'll need a credit card.
- Select a project: Either create a new project (click New Project in the top navigation dropdown), or, if this is your first time, just use the default ("My First Project").
- Enable the Cloud Vision API: In the left sidebar, go to APIs & Services → Library. Search for "Cloud Vision API" (it has a blue diamond logo). Click it, then click the blue Enable button.
- Create an API key: Go to APIs & Services → Credentials. Click + Create Credentials and select API key. A long string will appear. This is your API key. Copy it, store it securely, and treat it like a financial password. With this key, anyone can charge OCR processing (or other services, if you don't restrict the key) to your account.
- (Optional) Restrict the key: In the API key management screen, click the three dots next to your key, and choose Edit. Click Restrict key, and from the dropdown, select Cloud Vision API. Don't forget to click Save.
- Use the key: Return to the OCR page, select Google Cloud Vision, and paste your key into the Google Cloud API key field.
2. Sarvam Vision — written walkthrough
- Create an account: Go to sarvam.ai and sign up.
- Add a payment method: Navigate to your account billing settings and add a credit card.
- Generate an API key: In your dashboard, find the API keys section and create a new key. Copy it, store it securely, and treat it like a financial password.
- Use the key: Return to the OCR page, select Sarvam Vision, and paste your key into the Sarvam API key field.
How much will it cost?
Google Cloud Vision: billed per page (the 3-page comparison at the end of this FAQ cost $0.0045, which works out to $0.0015 per page), and the first 1,000 pages per month are free. New accounts come with a $300 credit valid for 90 days. For light personal use you likely won't exceed the free allowance. Tracking your usage in real time is possible, if a bit tricky (this Reddit post has tips).
Note: Google Cloud Vision uses standard Google Cloud postpay billing, where you're charged at the end of the month. This is separate from the Gemini API, which as of March 2026 requires prepay credits for lower-tier developer accounts. Don't confuse the two: Cloud Vision billing is set up through Google Cloud Console, not AI Studio.
Sarvam Vision: billed per page (the 3-page comparison at the end of this FAQ cost ₹1.50, about $0.018). New accounts receive some free credits. In addition to the API (e.g. via Skrutable), you can also use the Sarvam Playground directly; this does not require an API key but does limit each job to 5 pages.
Note: Sarvam billing works the other way around, through prepay. That is, you must add credits ahead of time via the Sarvam billing page, either as one-off top-ups or as threshold-based auto-recharge. Be prepared to pay in INR (₹); if needed, pre-authorize the foreign-currency charge with your card company.
Cost calculator
After each OCR job, the page count and estimated cost are also shown below the result.
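For a rough estimate before running a job, the arithmetic can be sketched as below. This is not the calculator used on this page. The per-page rates are assumptions derived from the 3-page comparison at the end of this FAQ, and the function and constant names are hypothetical, so check each provider's current price list before relying on the numbers.

```python
# ASSUMPTION: per-page rates derived from the 3-page comparison in this FAQ
# ($0.0045 for Google Cloud Vision, $0.018 for Sarvam Vision).
RATE_USD_PER_PAGE = {
    "google": 0.0045 / 3,
    "sarvam": 0.018 / 3,
}

# Free monthly allowance mentioned in this FAQ for Google Cloud Vision.
GOOGLE_FREE_PAGES_PER_MONTH = 1000


def estimate_cost_usd(provider: str, pages: int, pages_used_this_month: int = 0) -> float:
    """Estimate the cost in USD of OCR-ing `pages` pages with `provider`."""
    if pages < 0 or pages_used_this_month < 0:
        raise ValueError("page counts must be non-negative")
    billable = pages
    if provider == "google":
        # Only pages beyond the remaining free allowance are billed.
        free_left = max(0, GOOGLE_FREE_PAGES_PER_MONTH - pages_used_this_month)
        billable = max(0, pages - free_left)
    return round(billable * RATE_USD_PER_PAGE[provider], 4)


if __name__ == "__main__":
    print(estimate_cost_usd("google", 1200))  # 200 billable pages
    print(estimate_cost_usd("sarvam", 3))
```

The sketch ignores sign-up credits and currency conversion, both of which affect what you actually pay.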
Is this safe? Can I trust you/the provider?
Your API key is never stored by this application. The open-source code reads the key from the HTML form, sends it with the OCR request to the provider, and discards it. Store your key with a password manager (e.g., 1Password) to keep it secure. If you have concerns about PDF contents, avoid uploading sensitive material — it is sent to a third-party cloud service.
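To make the key flow concrete, here is a minimal sketch of how a key can travel with a single request and then go out of scope. It is not this application's code: it targets the public Cloud Vision REST endpoint for a single page image (PDF input uses a different method), the function name is hypothetical, and nothing is actually sent.

```python
import base64
import json
from urllib.parse import urlencode

# Public Cloud Vision REST endpoint for image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_vision_request(api_key: str, image_bytes: bytes) -> tuple[str, str]:
    """Return (url, json_body) for one OCR request; the key is not stored anywhere."""
    # The key is passed as the `key` query parameter of this one request.
    url = f"{VISION_ENDPOINT}?{urlencode({'key': api_key})}"
    body = {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }
    return url, json.dumps(body)
```

Because the key is only a function argument, it disappears once the request has been built and sent; keeping it in a file, cookie, or log would be a separate, deliberate step.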
How big can the input be?
Files up to ~128 MB are supported. Split larger files into parts. Google Cloud Vision handles up to 2,000 pages natively, so you'll almost always hit the file size cap before the page cap. Sarvam Vision's API accepts only 10 pages at a time, but this interface handles splitting and reassembly automatically, so larger files are fine.
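The splitting and reassembly step for a 10-page limit could look roughly like the sketch below. It only illustrates the batching logic; the function names are hypothetical, and how this interface actually splits PDFs and joins results is not documented here.

```python
def page_batches(total_pages: int, batch_size: int = 10) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (first_page, last_page) ranges covering the file."""
    if total_pages < 0 or batch_size < 1:
        raise ValueError("total_pages must be >= 0 and batch_size >= 1")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


def reassemble(batch_texts: list[str]) -> str:
    """Join per-batch OCR text in submission order (assumes one string per batch)."""
    return "\n".join(batch_texts)


if __name__ == "__main__":
    print(page_batches(23))
```

A 23-page file yields three batches, the last one shorter than the others, and the results are joined in the same order the batches were submitted.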
Why are some characters wrong, words misplaced, etc.? Why are lines returned as-is, e.g., with hyphenation? Will it work with complex page formats?
All OCR results contain some errors — please review and clean up as needed, e.g., using regular expressions. Results with these services are also quite literal: line breaks and hyphens come through as-is. For complex layouts (e.g. columns) and unwanted material (e.g. footnotes), manually cropping pages is often best. Sarvam Vision is designed with Indic script layouts in mind and appears to do better with multi-column Sanskrit pages.
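As an example of regex-based cleanup, the sketch below rejoins words split by a line-end hyphen and collapses runs of spaces. The patterns are assumptions about typical output, not rules used by either provider. In particular, the first pattern also joins genuine hyphenated compounds that happen to break at a line end, so review the result.

```python
import re


def join_hyphenated_lines(text: str) -> str:
    """Remove a hyphen that ends a line and join the word with the next line."""
    # Matches "-" directly after a non-space character, then the line break
    # and any indentation, provided the next line starts with a non-space.
    return re.sub(r"(?<=\S)-[ \t]*\n[ \t]*(?=\S)", "", text)


def collapse_spaces(text: str) -> str:
    """Replace runs of spaces or tabs with a single space."""
    return re.sub(r"[ \t]{2,}", " ", text)


if __name__ == "__main__":
    print(join_hyphenated_lines("कादम्-\nबरी"))
```

A hyphen surrounded by spaces is left alone, because the pattern requires a non-space character immediately before it.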
Side-by-side provider comparison
The following pages are from Bāṇa's Kādambarī (Peterson 1885). The left and right panels always show Google Cloud Vision and Sarvam Vision output, respectively. The middle panel toggles between the original scan and diff views (vs. ground truth) for each provider, with errors highlighted in yellow by Meld. CER is computed on body text only (headers, margin numbers, and page numbers excluded).
| Google Cloud Vision | Ground Truth | Sarvam Vision |
|---|---|---|
| preserved (3/3) | Page header (कादम्बरी ।) | not preserved (0/3) |
| not preserved (0/3) | Page numbers (५ ६ ७) | not preserved (0/3) |
| preserved in-place (12/12) | Margin line numbers (5, 10, 15, 20) | some preserved in-place (5.5/12), some duplicated at page end (4) |
| all preserved | Line-end hyphens | all preserved |
| most read as pipe \| (10/14), some preserved (4/14) | Daṇḍas (। ॥) | all preserved (14/14) |
| 45 spaces, 1 newline | Spurious whitespace | none |
| 2.85% | Character error rate (body text only, daṇḍas normalized) | 0.90% |
| $0.0045 | Cost (3 pages) | $0.018 (₹1.50) |
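For readers who want to run a similar comparison on their own pages, here is a minimal sketch of daṇḍa normalization and a character error rate. This FAQ does not say exactly how its figures were computed, so the sketch makes two assumptions: "daṇḍas normalized" means mapping pipe characters to daṇḍas before scoring, and CER is the standard Levenshtein edit distance divided by the reference length. It counts Unicode code points rather than visible Devanagari syllables.

```python
def normalize_dandas(text: str) -> str:
    """Map ASCII pipes to daṇḍas (ASSUMED normalization; double pipe first)."""
    return text.replace("||", "\u0965").replace("|", "\u0964")


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ocr_text: str, ground_truth: str) -> float:
    """Character error rate of `ocr_text` against `ground_truth`."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr_text, ground_truth) / len(ground_truth)


if __name__ == "__main__":
    print(cer(normalize_dandas("रामः |"), "रामः ।"))
```

Normalizing before scoring keeps a systematic pipe-for-daṇḍa substitution from dominating the error rate, so the number reflects how well the text itself was read.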