There’s a moment every developer hits — you’re paying $50/month for a handful of small VPS instances, your services are scattered across three different cloud providers, and you realize you have no idea where half your stuff is running. That was me two years ago. Today, everything runs on hardware I own, in a setup I fully control.
This post walks through how I built my self-hosted infrastructure from the ground up — the decisions, the stack, and the lessons I picked up along the way.
## Why Self-Host?
The obvious answer is cost. Running a few services on cloud VPS instances is cheap, but once you start stacking databases, queues, monitoring, storage, and multiple apps — costs add up fast. A single bare-metal server can replace several cloud instances for a fraction of the recurring cost.
But cost isn’t the real reason. Control is. When you self-host, you own the data, you own the network, and you decide the rules. No vendor lock-in, no surprise pricing changes, no arbitrary rate limits. If something breaks, it’s on you — but at least you can actually fix it.
There’s also the learning aspect. Managing your own infrastructure teaches you things that no tutorial or managed service ever will. DNS propagation, firewall rules, disk I/O bottlenecks, certificate renewal failures at 3 AM — these are the experiences that make you a better engineer.
## The Hardware Layer
Everything starts with Proxmox, an open-source virtualization platform built on top of Debian. It gives you a clean web UI for managing virtual machines and containers, with support for clustering, live migration, and backups out of the box.
I run Proxmox on a dedicated server with enough RAM and cores to comfortably host 20+ services. The storage backend is ZFS — a filesystem that handles compression, snapshots, and data integrity verification natively. ZFS snapshots are incredibly useful for rollbacks. Before any risky upgrade, I snapshot the dataset, and if things go wrong, I can roll back in seconds.
```shell
# Create a snapshot before upgrading
zfs snapshot rpool/data/myservice@pre-upgrade

# Something went wrong? Roll back instantly
zfs rollback rpool/data/myservice@pre-upgrade
```

For lightweight services, I use LXC containers instead of full VMs. LXC gives you near-native performance with process-level isolation — perfect for running databases, reverse proxies, or any service that doesn’t need its own kernel. The resource overhead is minimal compared to a VM.
My general rule: LXC for infrastructure services, Docker for application workloads. This keeps things clean and separable.
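As a concrete sketch, creating one of those infrastructure LXCs from the Proxmox CLI looks roughly like this — the container ID, template version, and storage names are placeholders, not values from my actual setup:

```shell
# Create an unprivileged Debian container for a database service
# (adjust the template name to whatever `pveam available` lists on your host)
pct create 110 local:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst \
  --hostname postgres-lxc \
  --cores 2 --memory 2048 --swap 512 \
  --rootfs local-zfs:16 \
  --net0 name=eth0,bridge=vmbr0,ip=dhcp \
  --unprivileged 1

pct start 110
```

Unprivileged containers map root inside the container to an unprivileged UID on the host, which is the sensible default for anything network-facing.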
## Container Orchestration
Most of my application workloads run in Docker containers, orchestrated with Docker Compose. For the scale I operate at, Compose hits the sweet spot — declarative, version-controlled, and simple enough that I can understand exactly what’s running without consulting a dashboard.
A typical service looks like this:
```yaml
# docker-compose.yml
services:
  app:
    image: ghcr.io/my-org/my-app:latest
    restart: unless-stopped
    environment:
      DATABASE_URL: postgres://user:pass@db:5432/app
      REDIS_URL: redis://redis:6379
    labels:
      - traefik.enable=true
      - traefik.http.routers.app.rule=Host(`app.example.com`)
      - traefik.http.routers.app.tls.certresolver=cloudflare
    networks:
      - traefik
      - internal

  db:
    image: postgres:16-alpine
    restart: unless-stopped
    volumes:
      - pgdata:/var/lib/postgresql/data
    networks:
      - internal

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    networks:
      - internal

volumes:
  pgdata:

networks:
  traefik:
    external: true
  internal:
```

The pattern is consistent across all services: the app connects to an external Traefik network for ingress, while databases and caches live on an internal network that’s not exposed. Labels on the app container tell Traefik how to route traffic — no separate config files to maintain.
I also use Kubernetes for workloads that need horizontal scaling or more sophisticated scheduling. But honestly, for most self-hosted scenarios, Docker Compose is more than enough. K8s introduces significant operational complexity that’s only worth it when you genuinely need it.
## Reverse Proxy & SSL
Traefik is the centerpiece of my ingress layer. It automatically discovers services via Docker labels, handles TLS termination, and manages certificate renewal — all without manual intervention.
The Traefik configuration is minimal:
```yaml
# traefik.yml
entryPoints:
  web:
    address: ':80'
    http:
      redirections:
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ':443'

certificatesResolvers:
  cloudflare:
    acme:
      email: admin@example.com
      storage: /letsencrypt/acme.json
      dnsChallenge:
        provider: cloudflare

providers:
  docker:
    exposedByDefault: false
    network: traefik

api:
  dashboard: true
```

Every new service I deploy gets automatic HTTPS with zero extra configuration. I just add the Traefik labels to the Docker Compose file, and Traefik picks it up within seconds. The DNS challenge via Cloudflare means I can issue wildcard certificates and don’t need to expose port 80 for HTTP challenges.
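For completeness, the Traefik container itself can be wired up roughly like this. `CF_DNS_API_TOKEN` is the environment variable Traefik's Cloudflare DNS provider actually reads; the image tag, mount paths, and network name are assumptions for illustration:

```yaml
# docker-compose.yml for Traefik itself (sketch)
services:
  traefik:
    image: traefik:v3.1
    restart: unless-stopped
    ports:
      - '80:80'
      - '443:443'
    environment:
      # Cloudflare API token with DNS edit rights, for the ACME DNS-01 challenge
      CF_DNS_API_TOKEN: ${CF_DNS_API_TOKEN}
    volumes:
      - ./traefik.yml:/etc/traefik/traefik.yml:ro
      - ./letsencrypt:/letsencrypt
      - /var/run/docker.sock:/var/run/docker.sock:ro
    networks:
      - traefik

networks:
  traefik:
    external: true
```

Mounting the Docker socket read-only is what lets the Docker provider discover labeled containers; keep that socket off any network-exposed container except the proxy itself.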
Cloudflare also sits in front as a CDN and DDoS protection layer. DNS records point to Cloudflare, which proxies traffic to my server. This keeps my actual server IP hidden and adds an extra layer of caching for static assets.
## Networking
This is where things get interesting. My network setup has evolved significantly over time.
At the edge, I run pfSense as my primary firewall. It handles VLAN segmentation, NAT, and firewall rules. I separate my network into multiple VLANs — management, servers, IoT devices, and guest traffic are all isolated from each other. A compromised IoT device shouldn’t be able to reach my server VLAN, and guest WiFi shouldn’t see anything on my internal network.
UniFi access points and switches handle the physical layer. The UniFi controller runs as an LXC container on Proxmox, managing all network hardware from a single interface. Say what you will about Ubiquiti’s pricing, but the management experience is hard to beat for a home/small office setup.
For remote access, Tailscale is a game-changer. It creates a WireGuard-based mesh VPN that connects all my devices — laptops, phones, servers — into a single private network, regardless of where they physically are. No port forwarding, no dynamic DNS, no VPN server to maintain.
```shell
# Access my home server from anywhere
ssh user@server          # Just works, over Tailscale

# Access internal services without exposing them publicly
curl http://grafana:3000 # Only accessible via the Tailscale network
```

Services that don’t need to be public — like Grafana, Proxmox UI, or internal admin panels — are only accessible through Tailscale. This drastically reduces the attack surface. The only ports exposed to the internet are 80 and 443, both behind Cloudflare.
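Getting a new machine onto the tailnet takes about two commands (the install script URL is Tailscale's official one):

```shell
# Install the Tailscale client (official convenience script)
curl -fsSL https://tailscale.com/install.sh | sh

# Authenticate this machine and join the tailnet;
# --ssh additionally lets Tailscale handle SSH access for you
sudo tailscale up --ssh
```

After the one-time browser login, the machine gets a stable private address and name, with no firewall or router changes needed on either end.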
## Monitoring & Observability
Running your own infrastructure without monitoring is like driving at night with the headlights off. Prometheus scrapes metrics from every service, and Grafana turns those metrics into dashboards I can actually understand.
Every Docker host runs node_exporter and cadvisor for system and container metrics. Application services expose custom Prometheus endpoints where relevant. Prometheus collects everything and stores it with configurable retention.
```yaml
# prometheus.yml
scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - 'node-exporter:9100'
  - job_name: cadvisor
    static_configs:
      - targets:
          - 'cadvisor:8080'
  - job_name: traefik
    static_configs:
      - targets:
          - 'traefik:8080'
```

I have Grafana dashboards for CPU/memory/disk usage, container health, network throughput, and Traefik request rates. AlertManager sends notifications to Telegram when something goes wrong — disk usage above 85%, a container restarting in a loop, or Traefik returning too many 5xx errors.
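The disk-usage alert can be sketched as a Prometheus alerting rule — the metric names are node_exporter's real ones, but the filesystem filter and thresholds here are illustrative:

```yaml
# alerts.yml (sketch)
groups:
  - name: node
    rules:
      - alert: DiskUsageHigh
        # Fraction of space used on real filesystems, ignoring tmpfs/overlay
        expr: >
          (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
             / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) > 0.85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Disk usage above 85% on {{ $labels.instance }}"
```

The `for: 10m` clause is what keeps a brief spike from paging you — the condition has to hold for ten minutes before AlertManager fires.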
The monitoring stack itself runs on a separate LXC container so it stays up even if the Docker host has issues. You don’t want your monitoring to go down at the same time as the thing it’s monitoring.
## Backup Strategy
The one thing I’ve learned the hard way: backups that aren’t tested are not backups.
My strategy is layered:
- ZFS snapshots — automatic hourly snapshots with 7-day retention. Instant rollback for filesystem-level issues.
- Application-level backups — PostgreSQL `pg_dump` runs nightly via cron, compressed and stored on a separate ZFS dataset.
- Off-site replication — critical data is synced to MinIO on a separate machine using `rclone`, and the most important stuff goes to an off-site location.
```shell
# Nightly database backup via cron
0 3 * * * pg_dump -Fc mydb > /backups/mydb-$(date +\%Y\%m\%d).dump

# Sync to MinIO
0 4 * * * rclone sync /backups minio:backups --min-age 1h
```

I test restores quarterly. It’s tedious, but the one time you need a backup and it doesn’t work, you’ll wish you had tested it.
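A restore drill can be as simple as the following — the database and table names are made up for illustration:

```shell
# Restore the newest dump into a throwaway database
LATEST=$(ls -1t /backups/mydb-*.dump | head -n 1)
createdb restore_test
pg_restore --no-owner -d restore_test "$LATEST"

# Spot-check that real data actually came back, then clean up
psql -d restore_test -c 'SELECT count(*) FROM users;'
dropdb restore_test
```

The point isn't the commands — it's proving, end to end, that the file you've been faithfully shipping off-site can actually be turned back into a working database.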
## Provisioning with Ansible
When I started, I configured everything manually — SSH in, install packages, edit config files. That works for one server. It doesn’t work when you need to rebuild or replicate.
Ansible handles all server provisioning now. Every package, every config file, every cron job is defined in playbooks. If my server dies tomorrow, I can spin up a new Proxmox host and have everything running again by executing a single command.
```shell
ansible-playbook -i inventory site.yml
```

The playbooks cover base system setup, Docker installation, Traefik configuration, monitoring stack deployment, firewall rules, and user management. It’s not glamorous work, but it’s the difference between a one-hour recovery and a two-day scramble.
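A top-level playbook in that style might look like this — the role names are hypothetical; how you factor yours depends on your setup:

```yaml
# site.yml (sketch)
- hosts: docker_hosts
  become: true
  roles:
    - base        # users, SSH hardening, unattended upgrades
    - docker      # engine + compose plugin
    - traefik     # ingress config and letsencrypt storage
    - monitoring  # node_exporter, cadvisor

- hosts: monitoring
  become: true
  roles:
    - prometheus  # scrape configs, alert rules, Grafana provisioning
```

Each role owns its packages, templates, and handlers, so rebuilding a host is just re-running the play against a fresh inventory entry.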
## Lessons Learned
After running this setup for a while, a few things stand out:
Start simple. Don’t build the perfect infrastructure on day one. Start with Docker Compose on a single server. Add complexity only when you hit real limitations, not imagined ones.
Automate early. The second time you manually configure something, write an Ansible playbook for it. Your future self will thank you.
Network segmentation matters. VLANs and firewall rules feel like overkill until you have a security incident. It’s much easier to set up segmentation from the start than to retrofit it later.
Monitor everything, alert selectively. Collect all the metrics you can, but only alert on things that require immediate action. Alert fatigue is real and dangerous.
Document your setup. Not for others — for yourself in six months when you’ve forgotten why that one iptables rule exists. I keep a private wiki with network diagrams, service inventories, and runbooks for common operations.
## What’s Next
The infrastructure is never really "done." I’m currently exploring moving more workloads to Kubernetes for better resource utilization and looking into GitOps workflows with Flux or ArgoCD for automated deployments. I’m also considering adding a secondary node for Proxmox clustering to enable live migration and high availability.
But for now, this setup handles everything I throw at it — multiple web apps, databases, queues, monitoring, and storage — all on hardware I own, running software I control. It’s not perfect, but it’s mine.
If you’re thinking about self-hosting, my advice is simple: just start. Pick one service you’re currently paying for in the cloud, spin up a cheap used server or a mini PC, and move it over. You’ll learn more in a weekend than in a month of reading documentation.
Thanks for reading!