building a kubernetes homelab from scratch #1 — infrastructure and setup

This is the first post of a build series. I’m going from junior DevOps to Kubernetes expert in 6 to 12 months, in public. This series is the technical log of that.

What we’re building: a production-grade Kubernetes homelab, from scratch, on physical hardware. That phrase needs unpacking.

Production-grade means built the way a company would build it. Not a toy cluster on a cloud free tier. Real infrastructure decisions, real networking, real security baseline.

Kubernetes is the industry-standard tool for orchestrating containers. In a lot of companies you don’t deploy your app directly to a server anymore. You deploy it to Kubernetes, and Kubernetes handles scheduling, restarts, and scaling. It’s what a large share of companies running at real scale use right now.

Homelab means physical machines you own and control. Not VMs on a cloud provider. Actual hardware sitting somewhere, running Linux, connected over a network you manage.

From scratch means we’re doing it step by step, starting from nothing.

The idea came from Misha Vandenberg. He’s a senior DevOps engineer from the Netherlands who’s been pushing the homelab message hard: every DevOps engineer, every software developer who’s serious should build one. The skills you get are unmatched, and the CV project that comes out of it is one of the strongest arguments you can make to a hiring manager.

I was part of his community for a month in 2024 after winning a giveaway. His stuff is excellent. Also costs somewhere in the thousands per month. So we’re building our own version.

why bother

Three reasons I keep coming back to.

It gets you hired. At the end of this series, I’ll have something on my CV that shows I can work with real Kubernetes applications at a production-grade level. That’s a concrete, demonstrable skill. Not “I did a course.” Not “I watched some videos.” An actual running cluster with real workloads.

Kubernetes is where the money is. Platform engineers, infrastructure engineers, DevOps engineers, senior software engineers, cloud engineers. All of them need Kubernetes at some level. Salaries for senior roles in this space often reach six figures in the US and in the better-paying European markets. This skill compounds hard.

AI runs on Kubernetes. OpenAI has written publicly about scaling its Kubernetes clusters to thousands of nodes, and Anthropic and many other AI companies reportedly build on it too. With AI eating more and more of the infrastructure world, knowing Kubernetes deeply puts you in the path of that growth. And for now at least, it’s genuinely hard to replace. The operational complexity is too high for current AI agents to handle autonomously.

prerequisites before you start

Be honest with yourself here. Don’t skip this section.

Linux. You need to be comfortable in the terminal. File system navigation, processes, permissions, basic networking. Kubernetes runs on Linux. You access it over a terminal. There’s no GUI to save you.

Docker. Kubernetes orchestrates containers. If you don’t understand what a container is, how an image is built, and what Docker is doing, the Kubernetes layer on top will make no sense. Learn Docker first.

Hardware. You need at least one machine that isn’t your main laptop. An old laptop lying around, a Raspberry Pi, an old desktop, a second-hand mini PC. For a single-node setup you need one extra machine. For multi-node (recommended) you need two. If you don’t have one, a Raspberry Pi 4 or 5 is cheap enough to justify. Ask an AI which one to get for K3s.

A working laptop to connect from. Your main machine. This is where you’ll run kubectl commands and SSH into your nodes.

GitHub account connected to your CLI. You need to be able to clone your repos from the terminal. That means SSH keys set up between your local machine and GitHub. If you’re on Windows, set up WSL first. Honestly I’d recommend just dual-booting Linux on a spare machine if you can.

100 minutes a day. This is the one that matters most. This series runs 6 to 12 months. Not a sprint, a sustained build. If you can’t commit to consistent daily time, the project will die.
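A quick way to confirm the GitHub prerequisite above is to ask GitHub directly whether it accepts your key. This is a small sketch, not an official check: the function name github_ssh_ok is made up for this post, and it relies on the greeting GitHub prints when key auth succeeds.

```shell
# Return success if GitHub accepts this machine's SSH key.
# github_ssh_ok is a helper name invented for this example.
github_ssh_ok() {
    # GitHub ends the session with a non-zero exit status even when
    # authentication works, so look at the greeting, not the exit code.
    out=$(ssh -T git@github.com 2>&1) || true
    case "$out" in
        *"successfully authenticated"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   github_ssh_ok && echo "GitHub SSH is set up" || echo "fix your SSH keys first"
```

If it reports a failure, the usual cause is that your public key hasn’t been added to your GitHub account yet.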

single node vs multi-node

There are two ways to start.

Single-node: one machine running the whole cluster. Simpler, cheaper, works fine for learning. The downside is you miss the most important architectural concept in Kubernetes: the split between the control plane and the worker nodes.

Multi-node: one machine as the control plane (master node), one or more as worker nodes. This is closer to how production actually works, and it forces you to understand the distinction that defines Kubernetes.

I’m running two nodes. Here’s why that distinction matters.

In a production Kubernetes cluster, you never put everything on one machine. If one server goes down, the containers on it go down, and your users feel it. So you spread workloads across multiple nodes. If one fails, the others keep serving.

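In a typical production setup, the control plane (master node) doesn’t run your application containers. It orchestrates everything. It decides which worker node runs which container, watches for failures, processes your kubectl commands. You communicate with the control plane. The control plane manages the workers. (K3s, which this series uses, lets the server node run workloads by default, but the split is still the model to learn.)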

Worker nodes are where your actual application containers run. For day-to-day Kubernetes work you don’t talk to them directly. You send commands to the control plane and it handles the rest.

For my setup: one master node, one worker node, and my main laptop to send commands. That’s the baseline.

the network

All three devices need to talk to each other. I’m not exposing my nodes to the public internet for anyone to SSH into. Instead, everything lives inside a private network using Tailscale.

Tailscale is a mesh VPN built on WireGuard that makes this simple. Each device gets a private IP address inside your Tailscale network. Only devices in that network can reach each other. This handles the networking and the security in one step.

Go to tailscale.com, create an account, and add your main laptop first. Then we’ll add the nodes.

installing ubuntu on the nodes

Whatever hardware you’re using for your nodes, install Ubuntu Server. Not desktop. Server. It’s a minimal install without a GUI, which is what you want for a node that exists to run containers.

If you have a Raspberry Pi: ask an AI “how do I install Ubuntu Server on a Raspberry Pi [your model].” It’ll walk you through flashing the image.

If you have an old laptop or desktop: download Ubuntu Server from ubuntu.com, flash it to a USB stick with a tool like Rufus (Windows only; balenaEtcher works on macOS and Linux), and boot from it.

Connect the node to your network with an Ethernet cable if you can. Otherwise ask an AI how to configure WiFi on Ubuntu Server.

Once Ubuntu is installed, add the device to Tailscale:

# on the node
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

Follow the auth link it gives you. The node joins your Tailscale network and gets a stable IP inside it.
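You’ll need that IP for the SSH steps later. A minimal sketch for reading it on the node, assuming the tailscale CLI from the install step is on your PATH (tailnet_ip is just a helper name I picked):

```shell
# Print this machine's Tailscale IPv4 address, or fail with a hint.
# tailnet_ip is a helper name invented for this example.
tailnet_ip() {
    if ! tailscale ip -4 2>/dev/null; then
        echo "could not read a Tailscale IP; is tailscale installed and up?" >&2
        return 1
    fi
}

# Example (run on the node):
#   tailnet_ip
```

Write the address down, or look it up later from your laptop with tailscale status.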

SSH keys

Right now you can SSH into your node with a password (assuming you enabled the OpenSSH server during the Ubuntu install). That works, but it’s annoying and weaker than keys. Set up key-based auth.

Generate a key pair on your main laptop if you don’t have one:

ssh-keygen -t ed25519 -C "your@email.com"

Hit enter through the prompts to use the defaults (default file location, no passphrase). You now have a private key at ~/.ssh/id_ed25519 and a public key at ~/.ssh/id_ed25519.pub.

Copy the public key to your node:

ssh-copy-id username@<tailscale-ip-of-node>

Enter your password once. After that, SSH is passwordless. You can now ssh username@<tailscale-ip> and you’re in immediately.

Repeat this for every node.
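Typing the Tailscale IP every time gets old fast. One optional convenience is a host alias in ~/.ssh/config. The sketch below only prints an entry, so you can look at it before appending it to the file. The function name, the alias k3s-master, the IP, and the username are all placeholders, not values from my setup.

```shell
# Print an ~/.ssh/config entry for one node.
# Usage: ssh_host_entry <alias> <tailscale-ip> <username>
# ssh_host_entry is a helper name invented for this example.
ssh_host_entry() {
    printf 'Host %s\n' "$1"
    printf '    HostName %s\n' "$2"
    printf '    User %s\n' "$3"
    printf '    IdentityFile ~/.ssh/id_ed25519\n'
}

# Example (alias, IP, and username are placeholders):
#   ssh_host_entry k3s-master 100.64.0.10 ubuntu >> ~/.ssh/config
#   ssh k3s-master
```

Host, HostName, User, and IdentityFile are standard OpenSSH client options. After appending an entry, ssh followed by the alias connects with the right user and key.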

where we are

After this episode, the baseline is done:

Ubuntu Server running on the nodes
All devices connected over Tailscale
SSH key auth configured on the nodes
My main laptop can SSH into any node in one command

tailscale status  # should show all your devices with their IPs
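To check the SSH part of that list in one go, here is a rough sketch that tries key-only logins against each node. check_nodes is a name I made up, and the targets in the usage comment are placeholders.

```shell
# Check that key-based SSH works for each node without prompting.
# Usage: check_nodes user@host [user@host ...]
# check_nodes is a helper name invented for this example.
check_nodes() {
    failed=0
    for target in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password,
        # so success here means key auth is really working.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
            echo "ok    $target"
        else
            echo "FAIL  $target"
            failed=1
        fi
    done
    return "$failed"
}

# Example (usernames and IPs are placeholders):
#   check_nodes username@100.64.0.10 username@100.64.0.11
```

The function returns non-zero if any node fails. A node that still needs a password shows up as FAIL instead of prompting, which is the point of BatchMode.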

no AI rule

One thing I decided for this series: no AI assistance for the Kubernetes and Linux work itself. Basic setup questions are fine (how do I install Ubuntu on this hardware, how do I configure WiFi). But the actual Kubernetes work, the debugging, the configuration: I’m doing all of it without AI.

Honest take: it’s miserable. Yesterday I spent way too long on something I’d have solved in two minutes with Claude. But when I finally figured it out myself, the understanding stuck in a way it doesn’t when you just copy an answer.

At work I use AI heavily. It makes me faster and that’s what my employer pays for. But for building these foundational skills, I’m deliberately working without it. The reps matter more than the speed right now.

next episode

Installing K3s on the control plane and worker node, setting up kubectl, configuring kubeconfig so I can manage the cluster from my laptop, and maybe getting a first look at a dev environment on the cluster.

The video is already up if you want to follow along.