Hacker News with Generative AI: Image Processing

Depth Anything V2 (depth-anything-v2.github.io)
Depth Anything V2 is trained on 595K synthetic labeled images and 62M+ real unlabeled images, providing the most capable monocular depth estimation (MDE) model, with the following features: finer-grained details than Depth Anything V1; greater robustness than Depth Anything V1 and SD-based models (e.g., Marigold, Geowizard); higher efficiency (10x faster) and a lighter footprint than SD-based models; and impressive fine-tuned performance with our pre-trained models. We also release six metric depth models at three scales, for indoor and outdoor scenes respectively.
What If Every Picture You've Ever Seen Already Exists? (ycombinator.com)
I was thinking recently about how images work at the data level, and it kind of broke my brain.
ClawPDF – Open-Source Virtual/Network PDF Printer with OCR and Image Support (github.com/clawsoftware)
ClawPDF may seem like yet another Virtual PDF/OCR/Image Printer, but it actually comes packed with features that are typically found in enterprise solutions.
Turning into Turing (2022) (jk-keller.com)
I stumbled into this one while working on a different project where I scripted the rotation of an image in preparation for an animation. When I looked at the last frame, though, I noticed the image looked washed out and had odd patterning.
How to reverse engineer AI models: a study on Google Photos (skyld.io)
Google Photos is one of the most widely used photo management applications globally, pre-installed on almost every Android device running Google Mobile Services (GMS). Users appreciate it for powerful features like “Magic Eraser” and advanced AI-powered photo editing tools. Of course, Google doesn’t open-source its AI models, in order to keep its competitive edge.
Generative Modelling in Latent Space (sander.ai)
Most contemporary generative models of images, sound and video do not operate directly on pixels or waveforms. They consist of two stages: first, a compact, higher-level latent representation is extracted, and then an iterative generative process operates on this representation instead. How does this work, and why is this approach so popular?
Watermark segmentation (github.com/Diffusion-Dynamics)
This repository by Diffusion Dynamics showcases the core technology behind the watermark segmentation capabilities of our first product, clear.photo. This work leverages insights from research on diffusion models for image restoration tasks.
Doing the Prospero-Challenge in RPython (pypy.org)
Recently I had a lot of fun playing with the Prospero Challenge by Matt Keeter. The challenge is to render a 1024x1024 image of a quote from The Tempest by Shakespeare. The input is a mathematical formula with 7866 operations, which is evaluated once per pixel.
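The "formula evaluated once per pixel" idea can be illustrated with a minimal sketch. The opcode names, the tuple layout, the tiny circle program, and the rule that a negative result means a filled pixel are all assumptions made for illustration; they are not taken from the challenge's actual input file, which is far larger.

```python
import math

# Hypothetical mini-program: one tuple per operation. Each operation appends
# one value, and later operations refer to earlier results by index.
PROGRAM = [
    ("var-x",),        # 0: x coordinate
    ("var-y",),        # 1: y coordinate
    ("square", 0),     # 2: x * x
    ("square", 1),     # 3: y * y
    ("add", 2, 3),     # 4: x*x + y*y
    ("sqrt", 4),       # 5: distance from the origin
    ("const", 0.5),    # 6: circle radius
    ("sub", 5, 6),     # 7: distance - radius (negative inside the circle)
]


def evaluate(program, x, y):
    """Run every operation in order for one (x, y) and return the last value."""
    vals = []
    for op, *args in program:
        if op == "var-x":
            vals.append(x)
        elif op == "var-y":
            vals.append(y)
        elif op == "const":
            vals.append(args[0])
        elif op == "square":
            vals.append(vals[args[0]] ** 2)
        elif op == "sqrt":
            vals.append(math.sqrt(vals[args[0]]))
        elif op == "add":
            vals.append(vals[args[0]] + vals[args[1]])
        elif op == "sub":
            vals.append(vals[args[0]] - vals[args[1]])
        else:
            raise ValueError(f"unknown opcode: {op}")
    return vals[-1]


def render(program, size):
    """Evaluate the program at each pixel centre of a size x size grid over [-1, 1]^2."""
    rows = []
    for j in range(size):
        y = 1.0 - 2.0 * (j + 0.5) / size  # top row has the largest y
        row = ""
        for i in range(size):
            x = -1.0 + 2.0 * (i + 0.5) / size
            row += "#" if evaluate(program, x, y) < 0 else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    print("\n".join(render(PROGRAM, 16)))
```

Interpreting the whole program again for every pixel is the naive baseline; at 1024x1024 that is over a million evaluations of the full operation list, which is why faster evaluation strategies are interesting here.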
Estimating Camera Motion from a Single Motion-Blurred Image (jerredchen.github.io)
Given a single motion-blurred image, we exploit the motion blur cues to predict the camera velocity at that instant without performing any deblurring.
Show HN: I built a tool to add noise texture to your images (vercel.app)
High-Performance PNG Decoding (blend2d.com)
It's been some time since I wrote about a High-Performance QOI Codec, which joined the other codecs offered by the Blend2D library in 2024. The development of image codecs has continued, and now I would like to announce a new high-performance PNG codec, which is much faster than other available codecs written in C, C++, and other programming languages.
StarVector: Generating Scalable Vector Graphics Code from Images and Text (starvector.github.io)
StarVector represents a breakthrough in Scalable Vector Graphics (SVG) generation, seamlessly integrating visual and textual inputs into a unified foundation SVG model.
Image Dithering: Eleven Algorithms and Source Code (tannerhelland.com)
Today’s graphics programming topic - dithering - is one I receive a lot of emails about, which some may find surprising.
Compression of Spectral Images Using Spectral JPEG XL (jcgt.org)
Paint.net 5.1.5 Is Now Available with JPEG XL Support (getpaint.net)
This update adds JPEG XL (*.jxl) support, improves quantization color quality, updates AVIF loading to better handle mapping HDR images to SDR, and fixes some bugs.
Arbitrary-Scale Super-Resolution with Neural Heat Fields (therasr.github.io)
Thera is the first arbitrary-scale super-resolution method with a built-in physical observation model.
Image Processing in C (2000) [pdf] (ed.ac.uk)
Fast-PNG: PNG image decoder and encoder (github.com/image-js)
PNG image decoder and encoder written entirely in JavaScript.
Dithering in Colour (obrhubr.org)
After reading a post on the HN frontpage from amanvir.com about dithering, I decided to join in on the fun. Here’s my attempt at implementing Atkinson dithering with support for colour palettes and correct linearisation.
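For readers unfamiliar with the technique, here is a minimal sketch of Atkinson dithering with linearisation. It is not the linked post's code: it assumes a grayscale image given as a 2D list of 8-bit sRGB values, a two-level black/white output instead of a colour palette, and a fixed 0.5 threshold in linear light. The function names are hypothetical.

```python
def srgb_to_linear(value):
    """Convert an 8-bit sRGB value (0-255) to linear light in [0, 1]."""
    c = value / 255.0
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


# Atkinson diffuses 1/8 of the quantisation error to each of six neighbours,
# so only 6/8 of the error is passed on in total.
ATKINSON_OFFSETS = [(1, 0), (2, 0), (-1, 1), (0, 1), (1, 1), (0, 2)]


def atkinson_dither(gray):
    """Dither a 2D list of 8-bit grayscale values to a 2D list of 0/1 pixels."""
    height = len(gray)
    width = len(gray[0]) if height else 0
    # Work in linear light so that error diffusion averages physical intensity.
    work = [[srgb_to_linear(v) for v in row] for row in gray]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = 1 if old >= 0.5 else 0
            out[y][x] = new
            share = (old - new) / 8.0
            for dx, dy in ATKINSON_OFFSETS:
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and 0 <= ny < height:
                    work[ny][nx] += share
    return out


if __name__ == "__main__":
    ramp = [[int(255 * x / 15) for x in range(16)] for _ in range(8)]
    for row in atkinson_dither(ramp):
        print("".join("#" if p else "." for p in row))
```

Extending this to a colour palette would mean replacing the threshold with a nearest-palette-colour lookup and diffusing a per-channel error, which is left out here to keep the sketch short.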
Creating static map images with OpenStreetMap, Web Mercator, and Pillow (alexwlchan.net)
I’ve been working on a project where I need to plot points on a map. I don’t need an interactive or dynamic visualisation – just a static map with coloured dots for each coordinate.
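The core of plotting a coordinate on such a map is the Web Mercator projection. The following is a minimal sketch using the standard formula, not the linked post's code; the 256-pixel tile size and the function name `lonlat_to_pixel` are assumptions.

```python
import math

TILE_SIZE = 256  # assumed tile edge length in pixels


def lonlat_to_pixel(lon, lat, zoom):
    """Project longitude/latitude in degrees to Web Mercator pixel coordinates.

    The origin is the top-left corner of the world map, and the whole world
    spans TILE_SIZE * 2**zoom pixels along each axis.
    """
    scale = TILE_SIZE * (2 ** zoom)
    x = (lon + 180.0) / 360.0 * scale
    lat_rad = math.radians(lat)
    # asinh(tan(lat)) equals ln(tan(lat) + sec(lat)), the Mercator y term.
    y = (1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * scale
    return x, y


if __name__ == "__main__":
    # Roughly central London at zoom level 10.
    print(lonlat_to_pixel(-0.1276, 51.5072, 10))
```

To place a dot on a cropped map image, subtract the pixel coordinates of the crop's top-left corner from the result before drawing.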
Commodore 64 PETSCII Image (2022) (medium.com)
It is said that life begins at the age of 40 — I hope it is true … After 23 years of working in IT and spending the last 12 years in a company I co-founded, my contract was terminated, I lost my job and I suddenly found myself almost overnight in a situation where I have all day ahead with no schedule, no deadlines, no important phone calls, no one is waiting for my analysis, consultation or technical specification.
MS Paint IDE (ms-paint-i.de)
MS Paint IDE is a program that can read a normal image file saved with MS Paint, and can then translate it to text with the ability to highlight the text in the image, parse the code, compile and execute it.
Dramatically improve microscope resolution with Fourier Ptychography [video] (youtube.com)
Encrypt Images Without a Key Using Visual Cryptography (github.com/coduri)
VisualCrypto is an open-source Python-based toolkit with a web interface designed for Visual Secret Sharing (VSS), a cryptographic technique that splits a secret image into multiple shares.
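As an illustration of how splitting an image into shares can work, here is a minimal sketch of the classic two-share scheme in which every secret pixel becomes a pair of subpixels. This is not necessarily what VisualCrypto implements; the function names and the 0/1 list representation (1 = black) are assumptions.

```python
import secrets


def make_shares(secret):
    """Split a binary image (2D list, 1 = black, 0 = white) into two shares.

    Each secret pixel becomes two subpixels per share. A share on its own is
    a uniformly random pattern and reveals nothing about the secret.
    """
    share_a, share_b = [], []
    for row in secret:
        row_a, row_b = [], []
        for pixel in row:
            pattern = [1, 0] if secrets.randbelow(2) else [0, 1]
            complement = [1 - p for p in pattern]
            row_a.extend(pattern)
            # Black pixel: complementary patterns, so the stack is fully black.
            # White pixel: identical patterns, so the stack is half black.
            row_b.extend(complement if pixel else pattern)
        share_a.append(row_a)
        share_b.append(row_b)
    return share_a, share_b


def stack(share_a, share_b):
    """Simulate overlaying two transparencies: black wins (logical OR)."""
    return [[a | b for a, b in zip(ra, rb)] for ra, rb in zip(share_a, share_b)]


if __name__ == "__main__":
    secret = [[1, 0, 1], [0, 1, 0]]
    a, b = make_shares(secret)
    for row in stack(a, b):
        print("".join("#" if p else "." for p in row))
```

Stacking the shares recovers the secret with reduced contrast: black pixels appear fully black, while white pixels appear half black.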
Show HN: Automated Sorting of group photos by user defined N people in each pic (github.com/Karvy-Singh)
Sort photos based on the criteria of "Me with my favorite people (x, y, z...)" out of a bunch of group photos/random photos.
Bilinear down/upsampling, aligning pixel grids, and that infamous GPU half pixel (2021) (bartwronski.com)
See this ugly pixel shift when upsampling a downsampled image? My post describes where it can come from and how to avoid it!
Detecting edges of images at the speed of light (phys.org)
Physicists from the group of Jorik van de Groep at the UvA-Institute of Physics have devised a new method that can be used to detect edges of images in an extremely energy efficient and ultrafast way.
Subpixel Zoo: A Catalog of Subpixel Geometry (geometrian.com)
An image pixel might be a little square with a flat color, but to actually present the image on a physical display (or related technology), we illuminate discrete 'subpixels'. The perceived color in an area is then the addition of the subpixels' emissions in that area.
How hard would it be to display the contents of an image file on the screen? (nereid.pl)
How hard would it be to display the contents of an image file on the screen? You just load the image pixels somehow, perhaps using a readily available library, and then display those pixels on the screen. Easy, right? Well, not quite, as it turns out.
Deformable Image Registration KU Repository (github.com/ThomasAlscher1991)
The Deformable Image Registration KU repository contains software for flow-based image registration, developed at the Department of Computer Science at the University of Copenhagen.