Hacker News with Generative AI: OCR

Image to Text Converter (vheer.com)
Turn images into readable text with ease using Vheer’s AI-powered tool. Extract text from images in seconds and download it instantly. Perfect for business documents, handwritten notes, and more.

Image Processing, AI, OCR, Text Extraction, Business

14 points by vertex_steven 54 days ago | 4 comments

Show HN: GenAI-powered OCR API for PDF receipts/invoices with smart extraction (visionparser.com)
Welcome to the next level of document automation! Our innovative Receipt and Invoice Parsing API, powered by state-of-the-art Generative AI, gives you a flexible solution that extracts structured data from any receipt format. Experience exceptional accuracy, speed, affordability and customisation for receipt parsing.

Generative AI, OCR, API, Document Automation, Finance

10 points by daleef_rahman 56 days ago | 5 comments

Show HN: A highly extensible framework for building OCR systems (github.com/robbyzhaox)
MyOCR is a highly extensible and customizable framework for building OCR systems. Engineers can easily train deep learning models and integrate them into custom OCR pipelines for real-world applications.

Open Source, Computer Vision, OCR, Deep Learning, Software

16 points by robbyzhao 61 days ago | 0 comments

ClawPDF – Open-Source Virtual/Network PDF Printer with OCR and Image Support (github.com/clawsoftware)
ClawPDF may seem like yet another Virtual PDF/OCR/Image Printer, but it actually comes packed with features that are typically found in enterprise solutions.

Open Source, PDF, OCR, Image Processing, Software

192 points by miles 61 days ago | 28 comments

Show HN: SnipFast – Extract Highlighted Text from Physical Books (snipfa.st)

Software, Tools, OCR, Text Extraction, Books

10 points by tomsaju 85 days ago | 0 comments

Show HN: OCR pipeline for ML training (tables, diagrams, math, multilingual) (github.com/ses4255)
This OCR system is specifically designed to extract structured data from complex educational materials—such as exam papers—in a format optimized for machine learning (ML) training.

Machine Learning, OCR, Education, Data Extraction, Open Source

170 points by ses425500000 105 days ago | 38 comments

Show HN: Docsumo's OCR Benchmark Report – Surpassing Mistral and Landing AI (docsumo.com)
In the past month, the AI community witnessed the launch of two much-anticipated OCR solutions: Mistral OCR by the Mistral team (known for their LLMs) and Agentic Document Extraction by Landing AI, Andrew Ng’s company. At Docsumo, we live and breathe Document AI. So when these releases hit the market, we couldn’t resist putting them to the test.

OCR, AI, Benchmarking, Open Source, Document AI

3 points by snehanairdoc 107 days ago | 0 comments

Show HN: Qwen-2.5-32B is now the best open source OCR model (github.com/getomni-ai)
A benchmarking tool that compares the OCR and data extraction capabilities of large multimodal models such as gpt-4o, evaluating both text and JSON extraction accuracy. The goal is to publish a comprehensive benchmark of OCR accuracy across traditional OCR providers and multimodal language models. The evaluation dataset and methodologies are all open source, and we encourage expanding this benchmark to encompass additional providers.

Open Source, OCR, AI, Benchmarking, Language Models

211 points by themanmaran 108 days ago | 47 comments

How do open source VLMs perform at OCR (getomni.ai)
For several months, we’ve been evaluating how well vision models handle OCR. Our initial benchmark focused on the closed-source models (GPT, Gemini, and Claude) and their comparisons to traditional OCR providers (AWS, Azure, GCP, etc.).

Open Source, Vision Models, OCR

4 points by tosh 112 days ago | 0 comments

Show HN: We OCR'ed 60k pages of the JFK files with AI (doctly.ai)

AI, OCR, History, Government Documents

11 points by kapitalx 122 days ago | 3 comments

ScanSearch Integrates Workflow and Full OCR Scan (scansearch.com)
Scanning documents with full-text OCR improves efficiency by making them searchable, allowing quick access to information. It enhances collaboration, ensures compliance, and strengthens security by safeguarding sensitive data, providing a significant edge for businesses.

OCR, Document Management, Workflow, Business Efficiency, Security

3 points by CDScanSearch 128 days ago | 0 comments

Auntie PDF – an open source app built using Mistral OCR (auntiepdf.com)
Your all-knowing guide that unpacks every PDF into clear, actionable insights. Just like your favorite aunt, but for documents!

Open Source, OCR, Document Processing, PDF, Artificial Intelligence

99 points by bilater 133 days ago | 40 comments

Evaluating Mistral OCR Against Gemini 2.0 Flash (reducto.ai)
Today, Mistral AI released a new OCR model, claiming to be state-of-the-art (SOTA) on unreleased benchmarks. We decided to put the model to the test.

OCR, Generative AI, AI Models, Benchmarking, Technology

15 points by raunakchowdhuri 134 days ago | 0 comments

Mistral OCR (mistral.ai)
Introducing the world’s best document understanding API.

OCR, API, Document Understanding

1756 points by littlemerman 134 days ago | 417 comments

Mistral OCR (mistral.ai)
Introducing the world’s best document understanding API.

OCR, API, Document Understanding

48 points by meetpateltech 134 days ago | 3 comments

I built an extension that lets you extract text from anywhere to your clipboard (chromewebstore.google.com)
Extract text from any part of your browser screen using Esticra OCR.

Chrome Extensions, OCR, Productivity, Text Extraction, Web Development

6 points by mosmn 136 days ago | 4 comments

Putting Andrew Ng's OCR models to the test (runpulse.com)
Today, Andrew Ng, one of the legends of the AI world, released a new document extraction service that went viral on X.

Artificial Intelligence, Computer Vision, OCR, Andrew Ng

124 points by ritvikpandey21 141 days ago | 61 comments

OlmOCR: Open-source tool to extract plain text from PDFs (allenai.org)

Open Source, OCR, PDF, Text Extraction, Software

313 points by eamag 143 days ago | 42 comments

Ask HN: What is the best method for turning a scanned book as a PDF into text? (ycombinator.com)
I like reading philosophy, particularly from the authors rather than a secondhand account.

Text Extraction, OCR, Philosophy, Digital Books

206 points by resource_waste 155 days ago | 108 comments

Benchmarking vision-language models on OCR in dynamic video environments (arxiv.org)
This paper introduces an open-source benchmark for evaluating Vision-Language Models (VLMs) on Optical Character Recognition (OCR) tasks in dynamic video environments.

Computer Vision, OCR, Video Processing, Benchmarking, Open Source

142 points by ashu_trv 155 days ago | 58 comments

OCR Crypto Stealers in Google Play and App Store (securelist.com)
In March 2023, researchers at ESET discovered malware implants embedded into various messaging app mods. Some of these scanned users’ image galleries in search of crypto wallet access recovery phrases. The search employed an OCR model which selected images on the victim’s device to exfiltrate and send to the C2 server. The campaign, which targeted Android and Windows users, saw the malware spread through unofficial sources.

Mobile Security, Malware, Cryptocurrency, OCR, Android

35 points by shifty1 163 days ago | 5 comments

Liberate tabular data from scanned documents (blog.wzb.eu)
Over the last few months, I often had to deal with the problem of extracting tabular data from scanned documents.

Data Extraction, OCR, Document Processing

6 points by leonry 212 days ago | 0 comments

Show HN: Adventures in OCR (medusis.com)
These past few weeks I've been working on OCRing an ancient book: a late 19th-century edition of 18th-century memoirs, in French: Les Mémoires de Saint-Simon.

OCR, Historical Research, Projects, Software

126 points by bambax 213 days ago | 45 comments

Show HN: High-accuracy OCR API for receipts/invoice with easy customisation (visionparser.com)
Welcome to the next level of document automation! Our innovative OCR API, powered by state-of-the-art Generative AI, gives you a flexible solution that fits your unique workflow and business requirements. Experience exceptional accuracy, speed, affordability and customisation.

OCR, Generative AI, Business, Automation, Software

21 points by salihkoodathil 217 days ago | 6 comments

Llama-OCR: Document to Markdown (llamaocr.com)
Upload an image to turn it into structured Markdown.

OCR, Markdown, AI, Tools

293 points by lapnect 245 days ago | 96 comments

Show HN: I launched a super cheap and simple to use OCR tool for macOS (textcapture.app)
Ever tried to quickly copy and paste some text, only to realize it's unselectable or embedded in a video or image? That happens to me all the time!

macOS, OCR, Software, Productivity Tools

24 points by auden_pierce 257 days ago | 41 comments

Show HN: PDF to MD by LLMs – Extract Text/Tables/Image Descriptives by GPT4o (github.com/yigitkonur)
Swift OCR: LLM Powered Fast OCR ⚡

Open Source, Language Models, OCR, Text Extraction

191 points by yigitkonur35 300 days ago | 91 comments

General OCR Theory: Towards OCR-2.0 via a Unified End-to-End Model (huggingface.co)
Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's needs due to the growing demand for intelligent processing of man-made optical characters.

OCR, Computer Vision, Artificial Intelligence, Machine Learning

31 points by ac1spkrbox 310 days ago | 2 comments

Show HN: LLM-aided OCR – Correcting Tesseract OCR errors with LLMs (github.com/Dicklesworthstone)