Hacker News with Generative AI: Parallel Computing

Comparing Parallel Functional Array Languages: Programming and Performance (arxiv.org)
Parallel functional array languages are an emerging class of programming languages that promise to combine low-effort parallel programming with good performance and performance portability.

Programming Languages, Parallel Computing, Performance, Functional Programming

91 points by vok 60 days ago | 23 comments

WebMonkeys: parallel GPU programming in JavaScript (2016) (github.com/VictorTaelin)
Allows you to spawn thousands of parallel tasks on the GPU with the simplest, dumbest API possible. It works on the browser (with browserify) and on Node.js. It is ES5-compatible and doesn't require any WebGL extension.

JavaScript, GPU Programming, Web Development, Parallel Computing

115 points by surprisetalk 72 days ago | 28 comments

Unlocking Ractors: Object_id (byroot.github.io)
In a previous post about ractors, I explained why I think it’s really unlikely you’d ever be able to run an entire application inside a ractor, but that they could still be situationally very useful to move CPU-bound work out of the main thread, and to unlock some parallel algorithm.

Programming, Parallel Computing, Concurrency, Performance Optimization, Software Design

78 points by ksec 79 days ago | 0 comments

Introduction to Parallel Computing Tutorial (llnl.gov)
In the simplest sense, parallel computing is the simultaneous use of multiple compute resources to solve a computational problem.

Parallel Computing, Tutorial, Computer Science

16 points by ibobev 93 days ago | 0 comments

An Educational Parallel Algorithm Collection (github.com/s-hironobu)
This is a parallel algorithm collection written in C. It contains fifteen programs that are explained in the book "The Art of Multiprocessor Programming (M. Herlihy, N. Shavit)".

Programming, C, Algorithms, Parallel Computing, Education

8 points by tanelpoder 96 days ago | 0 comments

Optimizing Matrix Multiplication (coffeebeforearch.github.io)
Matrix multiplication is an incredibly common operation across numerous domains. It is also known as being “embarrassingly parallel”. As such, one common optimization is parallelization across threads on a multi-core CPU or GPU. However, parallelization is not a panacea. Poorly parallelized code may provide minimal speedups (if any).

Optimization, Matrix Multiplication, Performance, Parallel Computing, Computer Science

3 points by jxmorris12 101 days ago | 0 comments

FFN Fusion: Rethinking Sequential Computation in Large Language Models (arxiv.org)
We introduce FFN Fusion, an architectural optimization technique that reduces sequential computation in large language models by identifying and exploiting natural opportunities for parallelization.

Generative AI, Machine Learning, Optimization, Parallel Computing

5 points by Dezash 110 days ago | 0 comments

I want a good parallel computer (raphlinus.github.io)
The GPU in your computer is about 10 to 100 times more powerful than the CPU, depending on workload. For real-time graphics rendering and machine learning, you are enjoying that power, and doing those workloads on a CPU is not viable. Why aren’t we exploiting that power for other workloads? What prevents a GPU from being a more general purpose computer?

GPU Computing, Parallel Computing, Hardware

233 points by raphlinus 116 days ago | 194 comments

Parallel Histogram Computation with CUDA (khushi-411.github.io)
This blog post introduces the parallel histogram pattern, in which any thread may update any output element, so threads must coordinate as they update the output values. It first covers using atomic operations to serialize the updates to each element, then studies an optimization technique: privatization. Let’s dig in!

CUDA, Parallel Computing, Optimization, Programming

8 points by coffeeaddict1 125 days ago | 0 comments
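
The privatization idea named in the excerpt above can be sketched in a few lines. This is a minimal illustration in Python, not the linked post's CUDA code: the function names `private_histogram` and `parallel_histogram` and the `num_workers` parameter are hypothetical. Each worker fills its own private histogram, so no coordination is needed while counting, and the private copies are merged once at the end. CPython threads will not make this faster; the sketch only shows the pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def private_histogram(chunk, num_bins):
    """Count one chunk into a histogram that only this worker touches."""
    local = [0] * num_bins
    for value in chunk:
        local[value] += 1  # no contention: `local` is private to this call
    return local


def parallel_histogram(data, num_bins, num_workers=4):
    """Split `data` into chunks, histogram each privately, then merge."""
    # Ceiling division so every element lands in exactly one chunk.
    chunk_size = max(1, -(-len(data) // num_workers))
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    merged = [0] * num_bins
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(private_histogram, chunks, [num_bins] * len(chunks))
        # The merge is the only step that writes to the shared output,
        # and it runs in a single thread here.
        for local in results:
            for bin_index, count in enumerate(local):
                merged[bin_index] += count
    return merged


if __name__ == "__main__":
    print(parallel_histogram([0, 1, 1, 2, 2, 2], num_bins=3, num_workers=2))
```

The design trade-off is extra memory (one histogram per worker) in exchange for far fewer contended updates to the shared output.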

Sorting algorithms with CUDA (ashwanirathee.com)
Building on my previous post on sorting algorithms, I implemented the same algorithms using CUDA to explore performance improvements through parallel computing.

CUDA, Sorting Algorithms, Parallel Computing

150 points by ashwani-rathee 126 days ago | 41 comments

Speeding up computational lithography with the power and parallelism of GPUs (semiengineering.com)
A new lithography library brings mask optimization operations to GPUs.

Computational Lithography, GPUs, Parallel Computing, Semiconductor Engineering

57 points by PaulHoule 133 days ago | 1 comment

Taichi: High-Performance Parallel Programming in Python (taichi-lang.org)
Taichi is a domain-specific language embedded in Python that helps you easily write portable, high-performance parallel programs.

Programming Languages, Python, Parallel Computing

6 points by PaulHoule 135 days ago | 0 comments

3FS – a parallel file system from DeepSeek (twitter.com)

File Systems, Parallel Computing, DeepSeek

18 points by k_sze 138 days ago | 0 comments

DualPipe: Bidirectional pipeline parallelism algorithm (github.com/deepseek-ai)
DualPipe is an innovative bidirectional pipeline parallelism algorithm introduced in the DeepSeek-V3 Technical Report. It achieves full overlap of forward and backward computation-communication phases, also reducing pipeline bubbles. For detailed information on computation-communication overlap, please refer to the profile data.

Machine Learning, Algorithms, Parallel Computing

180 points by mfiguiere 139 days ago | 20 comments

DeepSeek Open Source Optimized Parallelism Strategies, 3 repos (github.com/deepseek-ai)
Here, we publicly share profiling data from our training and inference framework to help the community better understand the communication-computation overlap strategies and low-level implementation details.

Open Source, Deep Learning, Optimization, Parallel Computing, AI

103 points by helloericsf 139 days ago | 8 comments

TabulaROSA: Tabular OS Massively Parallel Heterogeneous Compute Engines (2018) (arxiv.org)
The rise in computing hardware choices is driving a reevaluation of operating systems.

Operating Systems, Parallel Computing, Hardware

5 points by teleforce 141 days ago | 0 comments

Visualizing 6D Mesh Parallelism (main-horse.github.io)
This is a companion longpost for a fun project I’ve yet to finish. In here, I show the reader how I personally visualize the collective communications involved in a simple 2⁶ 6D parallel mesh.

Visualization, Parallel Computing, 6D Mesh, Computer Graphics

59 points by lnyan 210 days ago | 3 comments

Programming Language Memory Models (2021) (swtch.com)
Programming language memory models answer the question of what behaviors parallel programs can rely on to share memory between their threads.

Programming Languages, Memory Management, Parallel Computing

140 points by fanf2 215 days ago | 29 comments

Amdahl's Law (wikipedia.org)
In computer architecture, Amdahl's law (or Amdahl's argument) is a formula that shows how much faster a task can be completed when you add more resources to the system.

Computer Architecture, Performance Optimization, Parallel Computing

35 points by redbell 234 days ago | 4 comments
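
As a rough worked example of the formula in the Amdahl's Law entry above: if a fraction p of a task benefits from added resources and that part runs s times faster, the overall speedup is 1 / ((1 - p) + p / s). The function and parameter names below are illustrative only.

```python
def amdahl_speedup(parallel_fraction: float, resource_speedup: float) -> float:
    """Overall speedup when `parallel_fraction` of the work runs
    `resource_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if resource_speedup <= 0:
        raise ValueError("resource_speedup must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / resource_speedup)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as resources grow without limit."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # Half the task parallelized across 2 workers: 1 / (0.5 + 0.25)
    print(amdahl_speedup(0.5, 2))
    # With 95% of the task parallel, the bound is 1 / 0.05
    print(amdahl_limit(0.95))
```

The bound is the practical takeaway: with 95% of the work parallelizable, no amount of extra hardware yields more than about a 20x speedup.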

Show HN: Chili. Rust port of Spice, a low-overhead parallelization library (github.com/dragostis)

Programming, Rust, Libraries, Open Source, Parallel Computing

233 points by dragostis 299 days ago | 42 comments

Richard Feynman and the Connection Machine (1989) (longnow.org)
One day when I was having lunch with Richard Feynman, I mentioned to him that I was planning to start a company to build a parallel computer with a million processors. His reaction was unequivocal, "That is positively the dopiest idea I ever heard."

Richard Feynman, Parallel Computing, History

177 points by jmstfv 312 days ago | 49 comments

I want a good parallel computer [video] (youtube.com)

Parallel Computing, Video, Hardware

15 points by raphlinus 351 days ago | 0 comments

Parallel Nix Evaluation (determinate.systems)

Parallel Computing, Nix, System Administration, Software Development

7 points by tripdout 382 days ago | 0 comments

Welcome to the Parallel Future of Computation (higherorderco.com)

Parallel Computing, Technology

168 points by Epholys 425 days ago | 6 comments

Bend: A Parallel Language (pages.dev)

Programming, Languages, Parallel Computing

11 points by fofoz 425 days ago | 1 comment

Bend: A Python-Like Parallel Language for GPUs and Multicore CPUs (higherorderco.com)

Programming, Parallel Computing, GPUs, CPUs, Python

12 points by davikr 426 days ago | 1 comment

Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x (hao-ai-lab.github.io)