Hacker News with Generative AI: OCR

Ask HN: What is the best method for turning a scanned book as a PDF into text? (ycombinator.com)
I like reading philosophy, particularly from the authors rather than a secondhand account.
Benchmarking vision-language models on OCR in dynamic video environments (arxiv.org)
This paper introduces an open-source benchmark for evaluating Vision-Language Models (VLMs) on Optical Character Recognition (OCR) tasks in dynamic video environments.
OCR Crypto Stealers in Google Play and App Store (securelist.com)
In March 2023, researchers at ESET discovered malware implants embedded into various messaging app mods. Some of these scanned users’ image galleries in search of crypto wallet access recovery phrases. The search employed an OCR model which selected images on the victim’s device to exfiltrate and send to the C2 server. The campaign, which targeted Android and Windows users, saw the malware spread through unofficial sources.
Liberate tabular data from scanned documents (blog.wzb.eu)
Over the last few months, I have often had to deal with the problem of extracting tabular data from scanned documents.
Show HN: Adventures in OCR (medusis.com)
These past few weeks I've been working on OCRing an ancient book: a late-19th-century edition of 18th-century memoirs, in French: Les Mémoires de Saint-Simon.
Show HN: High-accuracy OCR API for receipts/invoice with easy customisation (visionparser.com)
Welcome to the next level of document automation! Our innovative OCR API, powered by state-of-the-art Generative AI, gives you a flexible solution that fits your unique workflow and business requirements. Experience exceptional accuracy, speed, affordability and customisation.
Llama-OCR: Document to Markdown (llamaocr.com)
Upload an image to turn it into structured markdown
Show HN: I launched a super cheap and simple to use OCR tool for macOS (textcapture.app)
Ever tried to quickly copy and paste some text, only to realize it's unselectable or embedded in a video or image? That happens to me all the time!
Show HN: PDF to MD by LLMs – Extract Text/Tables/Image Descriptives by GPT4o (github.com/yigitkonur)
Swift OCR: LLM Powered Fast OCR ⚡
General OCR Theory: Towards OCR-2.0 via a Unified End-to-End Model (huggingface.co)
Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's needs due to the growing demand for intelligent processing of man-made optical characters.
Show HN: LLM-aided OCR – Correcting Tesseract OCR errors with LLMs (github.com/Dicklesworthstone)
Show HN: Zerox – Document OCR with GPT-mini (github.com/getomni-ai)
Ask HN: How to OCR a PDF and preserve whitespace? (ycombinator.com)
OCR Tools for Mac, iOS and Windows (rorybowcott.com)
Extracting Words from Scanned Books: A Step-by-Step Tutorial with Python, OpenCV (github.com/feitgemel)