Hacker News with Generative AI: Video Processing

Generative Modelling in Latent Space (sander.ai)
Most contemporary generative models of images, sound and video do not operate directly on pixels or waveforms. They consist of two stages: first, a compact, higher-level latent representation is extracted, and then an iterative generative process operates on this representation instead. How does this work, and why is this approach so popular?

Generative AI, Machine Learning, Image Processing, Sound Processing, Video Processing

15 points by xavriley 95 days ago | 0 comments

Applying Pandas' Group_by on Videos (mixpeek.com)

Data Analysis, Pandas, Video Processing

8 points by Beefin 103 days ago | 0 comments

Bring Silent Videos to Life with Sound (Open Source) (github.com/open-mmlab)
FoleyCrafter is a video-to-audio generation framework that produces realistic sound effects that are semantically relevant to, and synchronized with, the input video.

Open Source, Audio Generation, Video Processing, Machine Learning

14 points by BruceWok 141 days ago | 4 comments

Benchmarking vision-language models on OCR in dynamic video environments (arxiv.org)
This paper introduces an open-source benchmark for evaluating Vision-Language Models (VLMs) on Optical Character Recognition (OCR) tasks in dynamic video environments.

Computer Vision, OCR, Video Processing, Benchmarking, Open Source

142 points by ashu_trv 155 days ago | 58 comments

Airflow – Stream media files directly from macOS to AirPlay devices (airflow.app)
Airflow is different. We're not cutting any corners. This is not yet another FFmpeg wrapper like you might have seen elsewhere. Don't get us wrong, we love FFmpeg and use many of its parts under the hood, but our custom-built video processing pipeline goes way beyond wrapping FFmpeg and calling it a day. We've been working on it for years, and it lets us do things that other similar software simply can't.

macOS, Streaming, Video Processing, AirPlay, Media

222 points by tiagod 170 days ago | 79 comments

FastVideo: a lightweight framework for accelerating large video diffusion models (github.com/hao-ai-lab)
FastVideo is a lightweight framework for accelerating large video diffusion models.

Deep Learning, Video Processing, Computer Vision, Generative AI, Software

110 points by zhisbug 213 days ago | 24 comments

Meta's new Video Understanding Multimodal Model used a Qwen model for training (arxiv.org)
Despite the rapid integration of video perception capabilities into Large Multimodal Models (LMMs), the underlying mechanisms driving their video understanding remain poorly understood.

Meta, Artificial Intelligence, Video Processing, Computer Vision

7 points by BUFU 214 days ago | 1 comment

Representing Long Volumetric Video with Temporal Gaussian Hierarchy (zju3dv.github.io)
This paper aims to address the challenge of reconstructing long volumetric videos from multi-view RGB videos.

Computer Vision, Video Processing, 3D Reconstruction

7 points by pr337h4m 216 days ago | 0 comments

Don't Look Twice: Faster Video Transformers with Run-Length Tokenization (rccchoudhury.github.io)
We present Run-Length Tokenization (RLT), a simple and efficient approach to speed up video transformers by removing redundant tokens from the input.

Computer Vision, Machine Learning, Video Processing, Deep Learning, Transformers

75 points by jasondavies 245 days ago | 16 comments
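
The core idea behind run-length tokenization (drop patches that repeat across consecutive frames and remember how long each one persisted) can be illustrated with a small NumPy sketch. This is a simplified illustration, not the authors' implementation: the function name run_length_tokenize, the per-frame square patches, the mean-absolute-difference test, and the default patch and threshold values are all assumptions made here for clarity.

```python
import numpy as np


def run_length_tokenize(video, patch=16, threshold=0.1):
    """Simplified run-length tokenization sketch (hypothetical helper).

    video: array of shape (T, H, W, C).
    Returns a list of (t, row, col, length) tuples, one per kept patch token,
    where `length` is how many consecutive frames that patch stayed
    approximately unchanged, starting at frame t.
    """
    T, H, W, C = video.shape
    rows, cols = H // patch, W // patch
    # Cut each frame into non-overlapping patches and flatten each patch:
    # result has shape (T, rows, cols, patch * patch * C).
    patches = (
        video[:, : rows * patch, : cols * patch]
        .reshape(T, rows, patch, cols, patch, C)
        .transpose(0, 1, 3, 2, 4, 5)
        .reshape(T, rows, cols, -1)
        .astype(np.float32)
    )
    tokens = []
    # open_run[r, c] is the index in `tokens` of the run currently active
    # at patch position (r, c).
    open_run = np.full((rows, cols), -1, dtype=np.int64)
    for t in range(T):
        for r in range(rows):
            for c in range(cols):
                if t > 0:
                    # Assumed redundancy test: mean absolute difference
                    # against the same patch position in the previous frame.
                    diff = np.abs(patches[t, r, c] - patches[t - 1, r, c]).mean()
                    if diff < threshold:
                        # Redundant patch: extend the existing run instead
                        # of emitting a new token.
                        idx = open_run[r, c]
                        t0, r0, c0, n = tokens[idx]
                        tokens[idx] = (t0, r0, c0, n + 1)
                        continue
                # First frame or changed content: start a new token and run.
                open_run[r, c] = len(tokens)
                tokens.append((t, r, c, 1))
    return tokens
```

On a completely static clip, every patch position collapses to a single token whose length equals the number of frames, so the transformer would see rows × cols tokens instead of T × rows × cols. How the run lengths are fed back into the model is a design choice this sketch leaves out.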

Generating high-quality thumbnails from videos (apple.com)

Video Processing, Image Generation, Machine Learning

51 points by Austin_Conlon 250 days ago | 11 comments

Show HN: FFmpeg-over-IP – Connect to remote FFmpeg servers (github.com/steelbrain)
Connect to remote ffmpeg servers. Are you tired of unsuccessfully trying to pass your GPU through to a Docker container running in a VM? So was I! ffmpeg-over-ip allows you to run an ffmpeg server on a machine with access to a GPU (Linux, Windows, or Mac) and connect to it from a remote machine. The only thing you need is Node.js installed and a shared filesystem (could be NFS, SMB, etc.) between the two machines.

Software, Video Processing, Remote Access, Docker

147 points by steelbrain 288 days ago | 72 comments

Video segmentation with Segment Anything 2 (SAM2) (roboflow.com)

Computer Vision, Video Processing, Generative AI, Machine Learning

32 points by SkalskiP 351 days ago | 3 comments

StreamPot: Run FFmpeg as an API with fluent-FFmpeg compatibility, queues and S3 (github.com/StreamPot)

API, Video Processing, FFmpeg, Cloud Storage, Software

218 points by thunderbong 356 days ago | 36 comments

Texture Enhancement for Video Super-Resolution (github.com/DachunKai)

Computer Vision, Video Processing, Deep Learning

179 points by smusamashah 388 days ago | 31 comments

The challenge of writing an on-demand transcoder (zoriya.dev)

Software Development, Video Processing, Technical Challenges

50 points by zoriya 400 days ago | 10 comments

Show HN: ffmpeg-english "capture from /dev/video0 every 1 second to jpg files" (github.com/dheera)