Hacker News with Generative AI: Document Processing

Auntie PDF – an open source app built using Mistral OCR (auntiepdf.com)
Your all-knowing guide that unpacks every PDF into clear, actionable insights. Just like your favorite aunt, but for documents!
Show HN: Open-Source DocumentAI with Ollama (rlama.dev)
A powerful document question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems for all your document needs.
LyX is 30 years old (wikipedia.org)
LyX (styled as LYX; pronounced [ˈlɪks]) is an open source, graphical user interface document processor based on the LaTeX typesetting system.
The Modern Document Processing Stack (github.com/marcelmarais)
This is a production-ready document conversion and processing engine (and primarily a wrapper of other tools). It uses open-source libraries to convert common file formats (PDF, DOCX, etc.) and web content to Markdown—a format that is friendly for LLMs and embedding models.
Build Your Own AI-Powered Document Chatbot in Minutes with Simple RAG (ycombinator.com)
Liberate tabular data from scanned documents (blog.wzb.eu)
Over the last few months, I often had to deal with the problem of extracting tabular data from scanned documents.
From PDFs to AI-ready structured data: a deep dive (explosion.ai)
PDFs are ubiquitous in industry and daily life. Paper is scanned, documents are sent and received as PDF, and they’re often kept as the archival copy. Unfortunately, processing PDFs is hard. In this blog post, I’ll present a new modular workflow for converting PDFs and similar documents to structured data and show how to build end-to-end document understanding and information extraction pipelines for industry use cases.
Show HN: Documind – Open-source AI tool to turn documents into structured data (github.com/DocumindHQ)
Launch HN: Midship (YC S24) – Turn PDFs, docs, and images into usable data (ycombinator.com)
Hey HN, we are Max, Kieran, and Aahel from Midship (https://midship.ai). Midship makes it easy to extract data from unstructured documents like PDFs and images.
Docling parses documents and exports them to desired format with ease and speed (ds4sd.github.io)