Scheduling & Node Affinity
Why Scheduling Matters
Picture a global shipping company. Packages (pods) must be delivered to warehouses (nodes) across continents. Some warehouses have more space, some specialize in fragile goods, and some are closer to customers. Without a smart dispatcher, deliveries would pile up in the wrong places, wasting resources and delaying shipments.
Kubernetes faces the same challenge. Pods must be scheduled onto nodes efficiently, respecting resource limits, policies, and workload requirements. The Scheduler is Kubernetes’ dispatcher, ensuring workloads are placed intelligently across the cluster.
Role of the Scheduler
- Resource Awareness: The Scheduler checks CPU, memory, and GPU availability before placing pods.
- Constraints: It respects rules like node affinity, taints, and tolerations.
- Policies: It spreads pods across zones for high availability.
- Priorities: Higher‑priority workloads are scheduled first and can preempt lower‑priority pods under resource pressure, ensuring resilience.
The Scheduler is the invisible hand of Kubernetes, balancing workloads across nodes.
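To make resource awareness concrete, a pod can declare resource requests, and the scheduler only considers nodes with that much unallocated capacity. A minimal sketch (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: app
    image: nginx:1.25
    resources:
      requests:          # used by the scheduler when filtering candidate nodes
        cpu: "500m"
        memory: "256Mi"
      limits:            # enforced at runtime, not at scheduling time
        cpu: "1"
        memory: "512Mi"
```

If no node has 500m CPU and 256Mi of memory free, the pod stays Pending until capacity appears.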
Node Affinity – Steering Workloads
Sometimes, workloads need to run on specific nodes. For example:
- A GPU‑intensive AI model must run on GPU nodes.
- A database pod must run on nodes with SSD storage.
- Compliance workloads must run on nodes in a specific region.
Node Affinity provides rules for steering pods toward (or away from) certain nodes.
- Required Affinity: Hard rules - pods must run on matching nodes.
- Preferred Affinity: Soft rules - pods should run on matching nodes, but can run elsewhere if needed.
Analogy: Node affinity is like shipping fragile goods to warehouses with climate control - you don’t just send them anywhere.
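The two rule types map to two fields in the pod spec. A sketch of the soft variant (the disktype label and weight are illustrative):

```yaml
# Preferred (soft) affinity: the scheduler favors SSD nodes,
# but falls back to other nodes if none match.
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80           # 1-100; higher weights count more in node scoring
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
```

Swapping `preferredDuringSchedulingIgnoredDuringExecution` for `requiredDuringSchedulingIgnoredDuringExecution` (with `nodeSelectorTerms`) turns the soft rule into a hard one: pods that match no node stay Pending.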
Taints and Tolerations
- Taints: Mark nodes as restricted (e.g. “only for GPU workloads”).
- Tolerations: Let specific pods opt in to running on tainted nodes; pods without a matching toleration are repelled.
Together, they ensure sensitive nodes aren’t overloaded with the wrong workloads.
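In practice, a taint on the node and a matching toleration on the pod work as a pair. A sketch (the node name and key are illustrative):

```yaml
# Reserve the node: kubectl taint nodes node1 dedicated=gpu:NoSchedule
# A pod that tolerates the taint can still land there:
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"   # must match the taint's effect
```

Note that a toleration only permits scheduling onto the tainted node; to actively steer the pod there, combine it with node affinity.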
Global Context
- Enterprises: Use scheduling policies to ensure compliance, performance, and cost optimization.
- Cloud Providers: Managed Kubernetes services integrate scheduling with autoscaling and multi‑zone resilience.
- Community: Scheduling strategies are evolving with AI‑driven placement and smarter resource allocation.
Hands‑On Exercise
- Taint a node and give it a matching label (node affinity matches labels, so the taint alone is not enough).
- Create a pod with node affinity targeting that label.
- Add a toleration to the pod spec to allow scheduling on the tainted node.
- Reflect: How do affinity, taints, and tolerations ensure workloads run in the right place?
Add a taint and a matching label to a node:
kubectl taint nodes node1 hardware=gpu:NoSchedule
kubectl label nodes node1 hardware=gpu
Create a pod with node affinity:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: hardware
            operator: In
            values:
            - gpu
  containers:
  - name: ai-model
    image: tensorflow/tensorflow:latest
Apply the manifest and check where the pod lands:
kubectl apply -f gpu-pod.yaml
kubectl get pods -o wide
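Because node1 carries the `hardware=gpu:NoSchedule` taint, the pod above will stay Pending until it tolerates that taint. The toleration the exercise asks for could look like this sketch, added under the pod's `spec` (key, value, and effect must match the taint exactly):

```yaml
  tolerations:
  - key: "hardware"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
```

With both the affinity rule and the toleration in place, the pod is steered to the GPU node and permitted to run there.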
The Hacker’s Notebook
- Scheduler is the dispatcher - balancing workloads across nodes.
- Node affinity is steering - ensuring pods land where they belong.
- Taints and tolerations are safeguards - protecting sensitive nodes.
- Lesson for engineers: Scheduling isn’t random - it’s policy‑driven orchestration.
- Hacker’s mindset: Treat scheduling rules as your strategy. With them, you can optimize performance, compliance, and resilience across global clusters.
