Hacker News with Generative AI: Monitoring

Show HN: rtcollector - A modular, RedisTimeSeries-native observability agent (github.com/xe-nvdk)
rtcollector is a lightweight, plugin-based agent for collecting system and application metrics, and pushing them to RedisTimeSeries.
Monitoring Node.js: Key Metrics You Should Track (last9.io)
Understand which metrics matter in Node.js applications, why they’re important, and how to track them effectively in production.
Grafana Assistant, a context-aware LLM agent built into Grafana Cloud (grafana.com)
Today, as part of the GrafanaCON 2025 keynote in Seattle, we previewed Grafana Assistant, our new LLM-powered agent in Grafana Cloud that helps you learn and solve problems in Grafana easier than ever.
Pgwatch: PostgreSQL Monitoring Solution (github.com/cybertec-postgresql)
🔬PGWATCH: PostgreSQL metrics monitor/dashboard
Monitoring my Minecraft server with OpenTelemetry and Prometheus (dash0.com)
One of the secret pleasures of life is to be paid for things you would do for free. On a completely unrelated note, this blog post documents my time figuring out how to monitor a Minecraft server with OpenTelemetry, Prometheus and Dash0.
Cloudflare's approach to global service health metrics and software releases (cloudflare.com)
Show HN: Neurox – GPU Observability for AI Infra (github.com/neuroxhq)
This Helm chart is designed to install Neurox. Neurox helps monitor your AI workloads running on your Kubernetes GPU cluster. Purpose-built dashboards and reports combine metrics and live Kubernetes runtime state data to help admins, developers, researchers, and finance auditors surface relevant insights. Visit our main website for information.
Show HN: Raindrop – Sentry for AI Products (raindrop.ai)
Raindrop sends you alerts when your AI misbehaves and links straight to the events, so you can dig into the conversations or traces, understand the root cause, and fix it—fast.
I gave up on self-hosted Sentry (2024) (bugsink.com)
In the early 2010s, I was a big fan of Sentry. It was a great tool for tracking errors in web applications. At the time, I was making software for law firms, so sending error reports to a third-party service was out of the question; I needed to host it myself. So I did.
Show HN: Coroot – eBPF-based, open source observability with actionable insights (github.com/coroot)
Coroot is an open-source APM & Observability tool, a DataDog and NewRelic alternative. Metrics, logs, traces, continuous profiling, and SLO-based alerting, supercharged with predefined dashboards and inspections.
Engineering a Trace Details Page That Handles a Million Spans (signoz.io)
Show HN: Dish: A lightweight HTTP and TCP socket monitoring tool written in Go (github.com/thevxn)
Tiny one-shot monitoring service; remote configuration of independent 'dish network' (via -source ${REMOTE_JSON_API_URL} flag); fast concurrent testing, low overall execution time, 10-sec timeout per socket by default; 0 dependencies.
JEP Draft: JFR Method Timing and Tracing (openjdk.org)
Extend JDK Flight Recorder (JFR) to support bytecode-based method timing and tracing for quick and easy use.
Gravity CI (gravity.ci)
Gravity monitors build artifact sizes to prevent accidental increases – right in your CI pipeline.
Some notes on Grafana Loki's new "structured metadata" (utoronto.ca)
Grafana Loki somewhat bills itself as "Prometheus for logs", and so it's unsurprising that it started with a data model much like Prometheus.
WinRing0: Why Windows is flagging your monitoring and fan control apps as a threat (theverge.com)
On Tuesday morning, some PC gamers woke up to discover their computers were seemingly under threat.
Show HN: Pulse – Maintain healthy OpenSearch and Elasticsearch clusters (pulse.support)
Pulse puts you in control of your search cluster monitoring and maintenance. Get more clarity, better performance, and lower costs.
Show HN: Subtrace – Wireshark for Docker Containers (github.com/subtrace)
Subtrace is Chrome DevTools for your backend. It tracks the API requests coming in and going out of your servers so that you can solve problems in production quickly.
Ask HN: What do you run instead of Datadog? (ycombinator.com)
Datadog has turned into an ever loving piece of shit. I am sick of their sales team grabbing us by the ankles and "Accidentally" charging for services we don't use. Now, this morning they changed something with their AWS integration that is causing 10X the API calls against our accounts (and thus, 10X guardduty costs on our end analyzing those API requests).
Grafana: Why observability needs FinOps, and vice versa (grafana.com)
Observability tools have changed the way we monitor infrastructure and applications, as teams get complete visibility into performance across complex, multi-cloud environments.
Datadog Dollars: Why Your Monitoring Bill Is Breaking the Bank (oneuptime.com)
Have you ever opened your monitoring bill and felt your heart skip a beat? You're not alone. In the world of digital infrastructure, many companies are experiencing sticker shock when they see their Datadog invoices. Let's unpack why your monitoring bill might be breaking the bank and explore how you can rein it back in.
How to monitor and debug Terraform (Terragrunt/OpenTofu) using OpenTelemetry (dash0.com)
This blog post provides a comprehensive guide on monitoring and debugging Terragrunt, Terraform/OpenTofu using OpenTelemetry.
Perforator – cluster-wide continuous profiling tool for large data centers (github.com/yandex)
Perforator is a production-ready, open-source Continuous Profiling app that can collect CPU profiles from your production without affecting its performance, made by Yandex and inspired by Google-Wide Profiling.
Top OpenTelemetry Collector Components (dash0.com)
Understanding and managing the performance of your applications can be a significant challenge – but it doesn’t have to be. This is where OpenTelemetry comes in, offering a powerful framework for collecting and exporting telemetry data (traces, metrics, and logs) from your applications.
Kubestatus: Open source tool to easily add status page to your K8s cluster (github.com/soub4i)
Kubestatus is a free and open-source tool to easily add a status page to your Kubernetes cluster that currently displays the status (operational, degraded or DOWN) of services. It is written in Go and uses the Kubernetes API to fetch information about the clusters and resources. Check the kubestatus-operand image.
Slum: The Shadow Library Uptime Monitor (open-slum.org)
Datadog acquires Quickwit (datadoghq.com)
To help our customers meet these requirements without sacrificing visibility or introducing multiple logging tools, we are pleased to announce that Quickwit—a popular open source distributed search engine—is joining Datadog.
Kubernetes horizontal pod autoscaling powered by an OpenTelemetry-native tool (dash0.com)
This blog post shows how to use Dash0 as the source of truth to automatically scale applications running on Kubernetes.
37signals Dev – Monitoring 10 Petabytes of Data in Pure Storage (37signals.com)
How we use Prometheus to have metrics and alerts for Pure Storage.
Using AZs can eat up your budget – From Prometheus to VictoriaMetrics (prezi.com)
By 2024, Prezi’s monitoring system, built around Prometheus, was becoming outdated. It was already 5+ years old, running on a deprecated internal platform and accumulating a significant amount of costs every month.