
Building a Robust Homelab Monitoring System with Grafana and Prometheus
Introduction
As the number of self-hosted applications on my Proxmox-based homelab grew, eventually reaching over 15, I realised the importance of having a centralised and powerful monitoring solution. While I initially relied on Uptime Kuma for basic uptime monitoring, I needed a more comprehensive and flexible tool to monitor system health, resource usage, and performance metrics. This led me to explore the combination of Grafana and Prometheus.
This blog post documents my journey of setting up a complete monitoring stack using Docker, Traefik, Grafana, and Prometheus. I’ll also share some insights into my architecture, best practices I followed, and lessons I learned along the way.
Key Components in My Setup
My stack includes:
- Proxmox VE: Virtualisation platform.
- Debian 12: Stable Linux base for Docker.
- Docker & Docker Compose: Containers & orchestration.
- Traefik Proxy: Reverse proxy & SSL termination.
- Grafana: Visualisation and alerting dashboard.
- Prometheus: Metrics collection and monitoring.
- Node Exporter: Host metrics exporter.
- cAdvisor: Docker container metrics exporter.
Proxmox VE
My virtualisation platform of choice is Proxmox VE, an incredibly reliable and user-friendly hypervisor. It enables me to run multiple virtual machines and containers efficiently. I created a new virtual machine for this project using a Debian 12 template, which serves as the Docker host.
Docker and Docker Compose
I installed Docker and Docker Compose on the Debian VM and chose to manage each application with its own docker-compose.yml file. This modular approach gives me fine-grained control over individual deployments and simplifies maintenance and troubleshooting.
Reverse Proxy with Traefik
To streamline external access and ensure secure connections, I integrated Traefik as my reverse proxy. Traefik automatically handles SSL certificate generation and renewal via Let’s Encrypt, making it incredibly convenient to deploy HTTPS-enabled services.
All containers that need to be accessed externally are connected to a shared Docker network named proxy. This allows Traefik to route traffic to each service based on the hostnames and labels defined in the Docker Compose configurations.
Example Traefik labels:
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.traefik.entrypoints=http"
  - "traefik.http.routers.traefik.rule=Host(`traefik.syjapp.com`)"
  - "traefik.http.middlewares.traefik-auth.basicauth.users=${TRAEFIK_DASHBOARD_CREDENTIALS}"
  - "traefik.http.middlewares.traefik-https-redirect.redirectscheme.scheme=https"
  - "traefik.http.middlewares.sslheader.headers.customrequestheaders.X-Forwarded-Proto=https"
  - "traefik.http.routers.traefik.middlewares=traefik-https-redirect"
  - "traefik.http.routers.traefik-secure.entrypoints=https"
  - "traefik.http.routers.traefik-secure.rule=Host(`traefik.syjapp.com`)"
  - "traefik.http.routers.traefik-secure.middlewares=traefik-auth"
  - "traefik.http.routers.traefik-secure.tls=true"
  - "traefik.http.routers.traefik-secure.tls.certresolver=cloudflare"
  - "traefik.http.routers.traefik-secure.tls.domains[0].main=syjapp.com"
  - "traefik.http.routers.traefik-secure.tls.domains[0].sans=*.syjapp.com"
  - "traefik.http.routers.traefik-secure.service=api@internal"
Grafana: Beautiful Dashboards and Alerts
Grafana is the heart of my monitoring dashboard. It connects to Prometheus and presents metrics in beautiful, customizable dashboards. Whether it’s CPU load, memory usage, or Docker container stats, Grafana makes it easy to visualise everything in real-time.
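The connection to Prometheus can be set up in the Grafana UI, or it can be provisioned from a file. The following is a minimal sketch of a data source provisioning file, assuming Grafana and Prometheus share a Docker network on which Prometheus resolves as prometheus, and that the file is mounted under /etc/grafana/provisioning/datasources/ inside the Grafana container. The file name and the data source name are hypothetical.

```yaml
# prometheus-datasource.yml (hypothetical file name)
apiVersion: 1

datasources:
  - name: Prometheus            # display name in Grafana (assumption)
    type: prometheus
    access: proxy               # Grafana's backend queries Prometheus, not the browser
    url: http://prometheus:9090 # container name and port on the shared Docker network
    isDefault: true
```

Provisioning the data source this way keeps the Grafana setup reproducible if the container or its storage is ever rebuilt.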
I added a few plugins, such as grafana-clock-panel and grafana-simple-json-datasource, to enhance the dashboard experience. Grafana also sits behind Traefik for secure, authenticated access.
Beyond visualisations, Grafana also provides robust alerting capabilities. I configured alerts to notify me via email and Telegram when specific thresholds are exceeded, like high CPU or disk usage.
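To make the thresholds concrete, here is a sketch of the kind of PromQL that sits behind a high CPU or low disk alert. It is written as a Prometheus rule file for readability; in Grafana the same expressions would go into the query of an alert rule. This is not my exact configuration: the file name, alert names, thresholds, and durations are assumptions, and the metrics are the standard ones exposed by Node Exporter.

```yaml
# homelab-alerts.yml (hypothetical file name)
groups:
  - name: homelab-host
    rules:
      - alert: HighCpuUsage
        # Non-idle CPU time per host, as a percentage, averaged over 5 minutes.
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
        for: 10m                 # must stay above the threshold this long (assumption)
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 90% on {{ $labels.instance }}"

      - alert: LowDiskSpace
        # Percentage of free space per filesystem, ignoring tmpfs and overlay mounts.
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 < 15
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Less than 15% disk space left on {{ $labels.instance }} ({{ $labels.mountpoint }})"
```

If used directly in Prometheus, a file like this would have to be listed under rule_files in the Prometheus configuration, and delivering notifications would need Alertmanager, which is not part of the stack described here.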
Prometheus: Powerful Metrics Collection
Prometheus is responsible for scraping and storing all metrics data. Its pull-based model and powerful query language, PromQL, make it an excellent choice for small- and large-scale monitoring setups.
My prometheus.yml config includes jobs for:
- Prometheus itself (self-monitoring)
- cAdvisor (container metrics)
- Node Exporter (host metrics)
Here’s a simplified example:
---
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  # external_labels:
  #   monitor: 'codelab-monitor'

# Scrape configurations: Prometheus itself, Node Exporter, and cAdvisor.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  # Example job for node_exporter (scraped via the Docker host's IP and published port)
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.80.40:9100']
  # Example job for cadvisor (scraped by container name over the shared Docker network)
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
Exporters: Gathering the Right Data
Node Exporter
Node Exporter collects system-level metrics from the Docker host, such as CPU, memory, disk, and network usage. Initially, I faced issues scraping data when Node Exporter was connected to the proxy network. Creating a separate network resolved the issue.
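One possible layout for such a separation is sketched below. It is an illustration rather than my exact file: the network name monitoring is hypothetical, and the host mounts and other options Node Exporter needs are omitted for brevity. Prometheus joins both networks so Traefik can still reach it, while Node Exporter stays off the proxy network entirely.

```yaml
services:
  prometheus:
    image: prom/prometheus
    networks:
      - proxy        # reachable by Traefik
      - monitoring   # reachable by the exporters (hypothetical network name)

  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    networks:
      - monitoring   # deliberately not attached to proxy

networks:
  proxy:
    external: true   # the shared network created for Traefik
  monitoring: {}     # private network created by this Compose project
```

With a layout like this, Prometheus could scrape node_exporter:9100 by container name instead of going through the host's IP address and a published port.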
cAdvisor
cAdvisor, developed by Google, collects resource usage and performance metrics from running Docker containers. It’s easy to deploy and integrates seamlessly with Prometheus. However, like Node Exporter and Prometheus, it has no authentication enabled by default, so isolating it within your internal network is essential.
Security Best Practices
- Traefik Authentication: Use middleware like basic auth or OAuth to protect sensitive dashboards.
- Network Isolation: Restrict access to Prometheus, cAdvisor, and Node Exporter using Docker networks or firewall rules.
- SSL Certificates: Ensure all external access is routed through HTTPS using Traefik.
- Data Persistence: Mount volumes for both Grafana and Prometheus to retain historical data across container restarts.
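As an illustration of the first point, the basic auth pattern used for the Traefik dashboard can be applied to any other router. The sketch below attaches it to a Prometheus HTTPS router named prometheus-secure. The middleware name prometheus-auth and the variable PROMETHEUS_CREDENTIALS are hypothetical; the variable is assumed to hold an htpasswd-style user:hash entry.

```yaml
services:
  prometheus:
    labels:
      # Define a basic-auth middleware from an htpasswd-style credential string.
      - "traefik.http.middlewares.prometheus-auth.basicauth.users=${PROMETHEUS_CREDENTIALS}"
      # Attach the middleware to the HTTPS router so every request must authenticate.
      - "traefik.http.routers.prometheus-secure.middlewares=prometheus-auth"
```

The same two labels, with the names adjusted, would work for cAdvisor or any other dashboard that has no login of its own.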
Future Enhancements
- Log Monitoring with Loki: Integrate Grafana Loki and Promtail for centralised log collection and analysis.
- External Monitoring: Use Blackbox Exporter to check external websites and APIs.
- Advanced Alerting: Configure more complex alert rules and expand notification channels.
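For the external monitoring idea, a scrape job for Blackbox Exporter could look roughly like the sketch below, which follows the relabelling pattern from the exporter's documentation. I have not deployed this yet: the container name blackbox and the probed URL are placeholders, and the http_2xx module is assumed to be defined in the exporter's own configuration.

```yaml
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]            # probe module expected to exist in blackbox.yml
    static_configs:
      - targets:
          - https://example.com     # placeholder for a site or API to check
    relabel_configs:
      # Pass the listed target to the exporter as the ?target= query parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label on the resulting series.
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape request to the exporter itself.
      - target_label: __address__
        replacement: blackbox:9115  # hypothetical container name, default exporter port
```

The relabelling is what lets a single exporter probe many URLs while each result still carries the URL it belongs to.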
If you prefer a visual walkthrough or want to follow along step-by-step, check out my video tutorial covering the entire setup.
In the video, I cover:
- Creating a Docker host on Proxmox
- Deploying Prometheus, Grafana, Node Exporter, cAdvisor, and a few other Docker apps
- Configuring Traefik with SSL and secure routing
- Setting up monitoring dashboards and alerts
- Troubleshooting common networking issues
Final Thoughts
This project gave me valuable hands-on experience with enterprise-grade tools. Setting up Grafana and Prometheus in a Docker environment helped me understand system monitoring from the ground up. With this setup in place, I now have deep visibility into my homelab’s health and performance, and the flexibility to grow and adapt it as needed.
If you’re running multiple services at home or in a lab environment, I highly recommend investing time in a similar setup. It’s a rewarding learning experience and gives you the tools to manage your infrastructure like a pro.
My docker-compose file for Grafana
services:
  grafana:
    image: grafana/grafana-enterprise
    container_name: grafana
    restart: unless-stopped
    #ports:
    #  - '3000:3000'
    user: "0"
    volumes:
      - ./grafana-storage:/var/lib/grafana
    networks:
      - proxy
    environment:
      - "GF_PLUGINS_PREINSTALL=grafana-clock-panel, grafana-simple-json-datasource"
    #Labels for traefik
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.grafana.entrypoints=http"
      - "traefik.http.routers.grafana.rule=Host(`grafana.syjapp.com`)"
      - "traefik.http.middlewares.grafana-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.routers.grafana.middlewares=grafana-https-redirect"
      - "traefik.http.routers.grafana-secure.entrypoints=https"
      - "traefik.http.routers.grafana-secure.rule=Host(`grafana.syjapp.com`)"
      - "traefik.http.routers.grafana-secure.tls=true"
      - "traefik.http.routers.grafana-secure.service=grafana"
      - "traefik.http.services.grafana.loadbalancer.server.port=3000"
      - "traefik.docker.network=proxy"
volumes:
  grafana-storage: {}
networks:
  proxy:
    external: true
My docker-compose file for Prometheus
services:
  prometheus:
    #ports:
    #  - 9090:9090
    image: prom/prometheus
    container_name: prometheus
    networks:
      proxy:
    command:
      - "--config.file=/etc/prometheus/prometheus.yaml"
    volumes:
      - "/home/sanju/docker/prometheus/config/prometheus.yaml:/etc/prometheus/prometheus.yaml:ro"
      - "prometheus-data:/prometheus"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.prometheus.entrypoints=http"
      - "traefik.http.routers.prometheus.rule=Host(`prometheus.syjapp.com`)"
      - "traefik.http.middlewares.prometheus-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.routers.prometheus.middlewares=prometheus-https-redirect"
      - "traefik.http.routers.prometheus-secure.entrypoints=https"
      - "traefik.http.routers.prometheus-secure.rule=Host(`prometheus.syjapp.com`)"
      - "traefik.http.routers.prometheus-secure.tls=true"
      - "traefik.http.routers.prometheus-secure.service=prometheus"
      - "traefik.http.services.prometheus.loadbalancer.server.port=9090"
      - "traefik.docker.network=proxy"
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    ports:
      - 9100:9100
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.52.1
    container_name: cadvisor
    restart: unless-stopped
    #ports:
    #  - 8080:8080
    networks:
      proxy:
    volumes:
      - /:/rootfs:ro
      - /run:/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    devices:
      - /dev/kmsg
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.cadvisor.entrypoints=http"
      - "traefik.http.routers.cadvisor.rule=Host(`cadvisor.syjapp.com`)"
      - "traefik.http.middlewares.cadvisor-https-redirect.redirectscheme.scheme=https"
      - "traefik.http.routers.cadvisor.middlewares=cadvisor-https-redirect"
      - "traefik.http.routers.cadvisor-secure.entrypoints=https"
      - "traefik.http.routers.cadvisor-secure.rule=Host(`cadvisor.syjapp.com`)"
      - "traefik.http.routers.cadvisor-secure.tls=true"
      - "traefik.http.routers.cadvisor-secure.service=cadvisor"
      - "traefik.http.services.cadvisor.loadbalancer.server.port=8080"
      - "traefik.docker.network=proxy"
volumes:
  prometheus-data:
    external: true
    name: prometheus-data
networks:
  proxy:
    external: true