Hacker News with Generative AI: OCR

Show HN: Qwen-2.5-32B is now the best open source OCR model (github.com/getomni-ai)
A benchmarking tool that compares the OCR and data extraction capabilities of large multimodal models such as gpt-4o, evaluating both text and JSON extraction accuracy. The goal is to publish a comprehensive benchmark of OCR accuracy across traditional OCR providers and multimodal language models. The evaluation dataset and methodologies are all open source, and we encourage expanding this benchmark to cover additional providers.
How do open source VLMs perform at OCR (getomni.ai)
For several months, we’ve been evaluating how well vision models handle OCR. Our initial benchmark focused on the closed-source models (GPT, Gemini, and Claude) and their comparisons to traditional OCR providers (AWS, Azure, GCP, etc.).
Show HN: We OCR'ed 60k pages of the JFK files with AI (doctly.ai)
ScanSearch Integrates Workflow and Full OCR Scan (scansearch.com)
Scanning documents with full-text OCR improves efficiency by making them searchable, allowing quick access to information. It enhances collaboration, ensures compliance, and strengthens security by safeguarding sensitive data, providing a significant edge for businesses.
Auntie PDF – an open source app built using Mistral OCR (auntiepdf.com)
Your all-knowing guide that unpacks every PDF into clear, actionable insights. Just like your favorite aunt, but for documents!
Evaluating Mistral OCR Against Gemini 2.0 Flash (reducto.ai)
Today, Mistral AI released a new OCR model, claiming to be state-of-the-art (SOTA) on unreleased benchmarks. We decided to put the model to the test.
Mistral OCR (mistral.ai)
Introducing the world’s best document understanding API.
I built an extension that lets you extract text from anywhere to your clipboard (chromewebstore.google.com)
Extract text from any part of your browser screen using Esticra OCR.
Putting Andrew Ng's OCR models to the test (runpulse.com)
Today, Andrew Ng, one of the legends of the AI world, released a new document extraction service that went viral on X.
OlmOCR: Open-source tool to extract plain text from PDFs (allenai.org)
Ask HN: What is the best method for turning a scanned book as a PDF into text? (ycombinator.com)
I like reading philosophy, particularly from the authors rather than a secondhand account.
Benchmarking vision-language models on OCR in dynamic video environments (arxiv.org)
This paper introduces an open-source benchmark for evaluating Vision-Language Models (VLMs) on Optical Character Recognition (OCR) tasks in dynamic video environments.
OCR Crypto Stealers in Google Play and App Store (securelist.com)
In March 2023, researchers at ESET discovered malware implants embedded into various messaging app mods. Some of these scanned users’ image galleries in search of crypto wallet access recovery phrases. The search employed an OCR model which selected images on the victim’s device to exfiltrate and send to the C2 server. The campaign, which targeted Android and Windows users, saw the malware spread through unofficial sources.
Liberate tabular data from scanned documents (blog.wzb.eu)
Over the past few months, I have often had to deal with the problem of extracting tabular data from scanned documents.
Show HN: Adventures in OCR (medusis.com)
For the past few weeks I've been working on OCRing an ancient book: a late 19th-century edition of 18th-century memoirs in French, Les Mémoires de Saint-Simon.
Show HN: High-accuracy OCR API for receipts/invoice with easy customisation (visionparser.com)
Welcome to the next level of document automation! Our innovative OCR API, powered by state-of-the-art Generative AI, gives you a flexible solution that fits your unique workflow and business requirements. Experience exceptional accuracy, speed, affordability and customisation.
Llama-OCR: Document to Markdown (llamaocr.com)
Upload an image to turn it into structured markdown
Show HN: I launched a super cheap and simple to use OCR tool for macOS (textcapture.app)
Ever tried to quickly copy and paste some text, only to realize it's unselectable or embedded in a video or image? That happens to me all the time!
Show HN: PDF to MD by LLMs – Extract Text/Tables/Image Descriptives by GPT4o (github.com/yigitkonur)
Swift OCR: LLM Powered Fast OCR ⚡
General OCR Theory: Towards OCR-2.0 via a Unified End-to-End Model (huggingface.co)
Traditional OCR systems (OCR-1.0) are increasingly unable to meet users' needs, given the growing demand for intelligent processing of man-made optical characters.
Show HN: LLM-aided OCR – Correcting Tesseract OCR errors with LLMs (github.com/Dicklesworthstone)
Show HN: Zerox – Document OCR with GPT-mini (github.com/getomni-ai)
Ask HN: How to OCR a PDF and preserve whitespace? (ycombinator.com)
OCR Tools for Mac, iOS and Windows (rorybowcott.com)
Extracting Words from Scanned Books: A Step-by-Step Tutorial with Python, OpenCV (github.com/feitgemel)