Kubernetes Troubleshooting Guide
This guide covers common issues and solutions when developing with Skaffold and Helm.
Quick Diagnostic Commands
# Check cluster status
kubectl cluster-info
kubectl get nodes
# Check all resources
kubectl get all
kubectl get pods,services,deployments,statefulsets
# Check resource usage
kubectl top nodes
kubectl top pods
# Check events
kubectl get events --sort-by=.metadata.creationTimestamp
Pod Issues
Pod Stuck in Pending
Symptoms: kubectl get pods shows Pending status
Diagnosis:
kubectl describe pod <pod-name>
kubectl get events | grep <pod-name>
Common Causes & Solutions:
Insufficient Resources
# Check node capacity
kubectl describe nodes
kubectl top nodes
# Solution: increase minikube resources
# (new settings apply only to a newly created cluster; minikube delete wipes all cluster data)
minikube config set memory 6144
minikube config set cpus 4
minikube delete && minikube start
Storage Issues
# Check persistent volumes and claims
kubectl get pv,pvc
# Solution: enable the storage addon
minikube addons enable storage-provisioner
Image Pull Issues
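# Check the image name and recent pull events
kubectl describe pod <pod-name> | grep -A 5 "Image"
# Check the image pull policy (describe does not show it)
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].imagePullPolicy}'
# Solution: build images inside minikube's Docker daemon
eval $(minikube docker-env)
skaffold build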
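Across many pods, a small filter can pick out the ones that are stuck in Pending or are running but not ready. This is a minimal sketch assuming the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, RESTARTS, AGE); `list_unhealthy_pods` is a hypothetical helper, not part of kubectl.

```bash
# list_unhealthy_pods (hypothetical helper): reads `kubectl get pods --no-headers`
# output on stdin and prints "NAME READY STATUS" for pods that need attention.
list_unhealthy_pods() {
  awk '{
    # Completed pods (finished Jobs) are expected to show 0/1 ready; skip them.
    if ($3 == "Completed") next
    split($2, ready, "/")
    # Flag anything not Running, or Running with fewer ready containers than total.
    if ($3 != "Running" || ready[1] + 0 < ready[2] + 0) print $1, $2, $3
  }'
}
```

Usage: `kubectl get pods --no-headers | list_unhealthy_pods`. Reading from stdin keeps the helper independent of the cluster, so the same filter also works on saved output.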
Pod Stuck in CrashLoopBackOff
Symptoms: Pod repeatedly restarts
Diagnosis:
kubectl logs <pod-name> --previous
kubectl describe pod <pod-name>
Common Solutions:
Check Application Logs
kubectl logs deployment/strato-control-plane --tail=50
Check Dependencies
# Verify the database is running
kubectl get pods | grep postgresql
kubectl logs strato-postgresql-0
Check Resource Limits
kubectl describe pod <pod-name> | grep -A 10 "Limits"
# If the last state shows Reason: OOMKilled, increase the memory limit in values-dev.yaml:
resources:
  limits:
    memory: 1Gi  # Increase from 512Mi
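To tell whether a CrashLoopBackOff comes from the memory limit, the previous container state can be queried directly rather than grepped out of `kubectl describe`. A sketch under stated assumptions: `was_oom_killed` is a hypothetical helper, and the `KUBECTL` variable is a hypothetical override (defaulting to `kubectl`) added only so a stub can stand in during testing.

```bash
# was_oom_killed (hypothetical helper): prints "container=OOMKilled" for each
# container in the pod whose previous run ended with reason OOMKilled.
# Exit status is 0 if at least one such container exists.
was_oom_killed() {
  local pod="$1" reasons
  # KUBECTL is a hypothetical override (defaults to kubectl), used for testing.
  reasons=$("${KUBECTL:-kubectl}" get pod "$pod" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"="}{.lastState.terminated.reason}{"\n"}{end}')
  grep '=OOMKilled$' <<<"$reasons"
}
```

Usage: `was_oom_killed <pod-name>`; if it prints a container, raising that container's memory limit is the likely fix.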
Pod Stuck in Init
Symptoms: Pod shows Init:0/1 or similar
Diagnosis:
kubectl describe pod <pod-name>
kubectl logs <pod-name> -c <init-container-name>
Solutions:
Check Init Container Dependencies
# Control plane waiting for PostgreSQL
kubectl logs <control-plane-pod> -c wait-for-postgresql
# Verify PostgreSQL is accepting connections
kubectl exec -it strato-postgresql-0 -- pg_isready
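When the `wait-for-postgresql` init container never completes, a bounded retry loop run from outside the cluster shows whether PostgreSQL eventually becomes ready or stays down. A minimal sketch assuming the `strato-postgresql-0` pod name used in this guide; `wait_for_postgres` and the `KUBECTL` override variable are hypothetical.

```bash
# wait_for_postgres (hypothetical helper): polls pg_isready inside the
# PostgreSQL pod until it succeeds or the attempts run out.
# Usage: wait_for_postgres [attempts] [delay-seconds]
wait_for_postgres() {
  local attempts="${1:-30}" delay="${2:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    # KUBECTL is a hypothetical override (defaults to kubectl), used for testing.
    if "${KUBECTL:-kubectl}" exec strato-postgresql-0 -- pg_isready >/dev/null 2>&1; then
      echo "PostgreSQL ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "PostgreSQL not ready after $attempts attempts" >&2
  return 1
}
```

Usage: `wait_for_postgres 30 2` polls up to 30 times, two seconds apart.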
Service Connectivity Issues
Can't Access Control Plane Web UI
Diagnosis:
# Check service and pods
kubectl get service strato-control-plane
kubectl get pods | grep control-plane
kubectl describe service strato-control-plane
Solutions:
Use Port Forwarding
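kubectl port-forward service/strato-control-plane 8080:8080
# port-forward runs in the foreground, so open the URL from a second terminal
# (open is macOS; use xdg-open on Linux)
open http://localhost:8080
Use minikube service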
minikube service strato-control-plane --url
# Open the returned URL
Check NodePort (if configured)
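# Get the minikube node IP
minikube ip
# Access via http://<minikube-ip>:30080
# With the docker driver on macOS or Windows the node IP is usually not reachable
# from the host; use minikube service instead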
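If the service is a NodePort, its URL can be assembled from the node IP and the service's `nodePort` field instead of hard-coding 30080. A sketch assuming the web UI is the first port in the service spec; `control_plane_url` and the `MINIKUBE`/`KUBECTL` override variables are hypothetical.

```bash
# control_plane_url (hypothetical helper): prints http://<node-ip>:<nodePort>
# for the strato-control-plane service, or fails if it has no nodePort.
control_plane_url() {
  local ip port
  # MINIKUBE and KUBECTL are hypothetical overrides, used for testing.
  ip=$("${MINIKUBE:-minikube}" ip) || return 1
  # Assumes the web UI is the first port listed in the service spec.
  port=$("${KUBECTL:-kubectl}" get service strato-control-plane \
    -o jsonpath='{.spec.ports[0].nodePort}') || return 1
  if [ -z "$port" ]; then
    echo "strato-control-plane has no nodePort (is it a NodePort service?)" >&2
    return 1
  fi
  echo "http://${ip}:${port}"
}
```

Usage: `curl -sI "$(control_plane_url)"`.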
Services Can't Communicate
Symptoms: Agent can't connect to Control Plane, or Control Plane can't reach database
Diagnosis:
# Test DNS resolution
kubectl exec -it deployment/strato-agent -- nslookup strato-control-plane
# Test port connectivity
kubectl exec -it deployment/strato-agent -- nc -zv strato-control-plane 8080
Solutions:
Check Service Names
kubectl get services
# Ensure environment variables reference these exact service names
Check Network Policies
kubectl get networkpolicies
# Expect none in a development cluster
Verify Service Endpoints
kubectl get endpoints
# Ensure each service lists at least one endpoint address (not <none>)
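To check endpoints for specific services rather than scanning the whole table, a loop can query each service's ready addresses. A sketch; `check_endpoints` and the `KUBECTL` override variable are hypothetical. An empty result usually means the service's selector matches no ready pods.

```bash
# check_endpoints (hypothetical helper): prints each named service that has
# no ready endpoint addresses; returns 1 if any service is missing them.
check_endpoints() {
  local svc ips missing=0
  for svc in "$@"; do
    # KUBECTL is a hypothetical override (defaults to kubectl), used for testing.
    ips=$("${KUBECTL:-kubectl}" get endpoints "$svc" \
      -o jsonpath='{.subsets[*].addresses[*].ip}')
    if [ -z "$ips" ]; then
      echo "no ready endpoints: $svc"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_endpoints strato-control-plane strato-postgresql`.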
Storage Issues
Database Data Loss
Symptoms: Database loses data between restarts
Diagnosis:
kubectl get pv,pvc
kubectl describe pvc data-strato-postgresql-0
Solutions:
Check Persistent Volume
# Ensure the PVC is Bound
kubectl get pvc
# Check the storage class
kubectl get storageclass
For minikube, ensure the storage addon is enabled
minikube addons enable storage-provisioner
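The two checks above (the claim is Bound, a storage class exists) can be turned into filters over piped or saved `kubectl` output. A minimal sketch assuming the default table layouts, where STATUS is the second column of `kubectl get pvc --no-headers` and the default class is marked `(default)` in `kubectl get storageclass`; both function names are hypothetical.

```bash
# unbound_pvcs (hypothetical helper): reads `kubectl get pvc --no-headers` on
# stdin and prints "NAME STATUS" for every claim that is not Bound.
unbound_pvcs() {
  awk '$2 != "Bound" { print $1, $2 }'
}

# has_default_storageclass (hypothetical helper): reads `kubectl get storageclass`
# on stdin and succeeds if some class is marked "(default)".
has_default_storageclass() {
  grep -q '(default)'
}
```

Usage: `kubectl get pvc --no-headers | unbound_pvcs` and `kubectl get storageclass | has_default_storageclass || echo "no default storage class"`.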
Disk Space Issues
Symptoms: Pods fail to start due to disk space
Diagnosis:
# Check node disk usage
kubectl describe nodes | grep -A 5 "Conditions"
# Check Docker disk usage
docker system df
Solutions:
# Clean up Docker
docker system prune -a
# Clean up minikube
minikube ssh -- docker system prune -a
# Increase minikube disk size
minikube config set disk-size 20GB
minikube delete && minikube start
Image Issues
Image Pull Errors
Symptoms: ErrImagePull or ImagePullBackOff
Diagnosis:
kubectl describe pod <pod-name> | grep -A 10 "Events"
Solutions:
Use Local Images
# Point the Docker CLI at minikube's Docker daemon
eval $(minikube docker-env)
# Build images locally
skaffold build
# Ensure imagePullPolicy is correct. In values-dev.yaml:
global:
  imagePullPolicy: Never
Check Image Names
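# List available images
docker images | grep strato
# Show Skaffold's global settings (such as default-repo and local-cluster)
skaffold config list
# Image names themselves are defined under build.artifacts in skaffold.yaml
grep -n "image:" skaffold.yaml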
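With `imagePullPolicy: Never`, every image a pod references must already exist in minikube's Docker daemon. A sketch that checks this, assuming `eval $(minikube docker-env)` has been run in the current shell; `check_pod_images_local` and the `KUBECTL`/`DOCKER` override variables are hypothetical.

```bash
# check_pod_images_local (hypothetical helper): lists the images used by a pod's
# containers and reports any that the current Docker daemon does not have.
check_pod_images_local() {
  local pod="$1" img missing=0
  while IFS= read -r img; do
    [ -n "$img" ] || continue
    # DOCKER is a hypothetical override (defaults to docker), used for testing.
    if ! "${DOCKER:-docker}" image inspect "$img" >/dev/null 2>&1; then
      echo "missing locally: $img"
      missing=1
    fi
  done < <("${KUBECTL:-kubectl}" get pod "$pod" \
    -o jsonpath='{range .spec.containers[*]}{.image}{"\n"}{end}')
  return "$missing"
}
```

Usage: run `eval $(minikube docker-env)`, then `check_pod_images_local <pod-name>`; any image it reports has to be built (for example with `skaffold build`) before the pod can start.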
Slow Image Builds
Solutions:
# Enable BuildKit (parallel build stages, better layer caching)
export DOCKER_BUILDKIT=1
# Keep Skaffold artifact caching on (the default, unless it was disabled)
export SKAFFOLD_CACHE_ARTIFACTS=true
# Use multi-stage builds efficiently
# Check the Dockerfile for optimal layer caching
Performance Issues
Slow Startup Times
Diagnosis:
# Check resource usage
kubectl top pods
kubectl top nodes
# Check startup times
kubectl get events | grep Started
Solutions:
Increase minikube Resources
minikube config set memory 8192
minikube config set cpus 6
minikube delete && minikube start
Optimize Resource Requests
# In values-dev.yaml: lower requests so pods schedule without waiting for node capacity
controlPlane:
  resources:
    requests:
      memory: 128Mi
      cpu: 50m
Use Faster Storage
# macOS on Intel: the hyperkit driver (not available on Apple Silicon)
minikube start --driver=hyperkit --disk-size=20GB
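Because `minikube config set` persists its settings, it is worth confirming them before running `minikube delete && minikube start`. A sketch assuming the values were set as plain numbers (megabytes for memory), as in the commands above; the defaults of 8192 and 6 come from those commands, and `check_minikube_resources` and the `MINIKUBE` override variable are hypothetical.

```bash
# check_minikube_resources (hypothetical helper): compares the persisted
# `minikube config` memory (MB) and cpus against minimums and reports shortfalls.
# Returns 1 if either setting is unset, non-numeric, or below its minimum.
check_minikube_resources() {
  local min_mem="${1:-8192}" min_cpus="${2:-6}" ok=0 key value minimum
  for key in memory cpus; do
    if [ "$key" = memory ]; then minimum=$min_mem; else minimum=$min_cpus; fi
    # MINIKUBE is a hypothetical override (defaults to minikube), used for testing.
    value=$("${MINIKUBE:-minikube}" config get "$key" 2>/dev/null)
    if ! [[ "$value" =~ ^[0-9]+$ ]]; then
      echo "$key: '${value:-unset}' (cannot compare; set it with: minikube config set $key $minimum)"
      ok=1
    elif [ "$value" -lt "$minimum" ]; then
      echo "$key: $value is below recommended $minimum"
      ok=1
    fi
  done
  return "$ok"
}
```

Usage: `check_minikube_resources 8192 6`.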
High Memory Usage
Diagnosis:
kubectl top pods --sort-by=memory
kubectl describe node minikube | grep -A 10 "Allocated resources"
Solutions:
Reduce Resource Limits
# values-dev.yaml
postgresql:
  resources:
    limits:
      memory: 256Mi  # Reduce from 512Mi
Disable Unused Services
# values-dev.yaml
agent:
  enabled: false
ovn:
  enabled: false
openvswitch:
  enabled: false
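To compare total pod memory against the minikube memory setting, the MEMORY column of `kubectl top pods` can be summed. A sketch assuming the column uses the binary suffixes Ki, Mi, or Gi (plain numbers are treated as bytes); `total_pod_memory_mi` is a hypothetical helper.

```bash
# total_pod_memory_mi (hypothetical helper): reads `kubectl top pods --no-headers`
# on stdin (NAME CPU MEMORY) and prints the summed MEMORY column in Mi.
total_pod_memory_mi() {
  awk '{
    v = $3
    if (v ~ /Gi$/)      { sub(/Gi$/, "", v); mi = v * 1024 }
    else if (v ~ /Mi$/) { sub(/Mi$/, "", v); mi = v + 0 }
    else if (v ~ /Ki$/) { sub(/Ki$/, "", v); mi = v / 1024 }
    else                { mi = v / 1048576 }  # plain bytes
    total += mi
  } END { printf "%.0fMi\n", total }'
}
```

Usage: `kubectl top pods --no-headers | total_pod_memory_mi`.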
Skaffold Issues
Skaffold Build Failures
Diagnosis:
skaffold diagnose
skaffold config list
Solutions:
Check Docker Context
docker context list
eval $(minikube docker-env)
Clear Skaffold Cache
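# Skaffold keeps its artifact cache in ~/.skaffold/cache by default; remove it
rm -f ~/.skaffold/cache
# Or bypass the cache for a single run
skaffold dev --cache-artifacts=false
Verbose Logging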
skaffold dev -v info
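A frequent cause of local-image build failures is running Skaffold in a shell where `eval $(minikube docker-env)` was never applied. A quick guard, assuming minikube's `docker-env` output exports `MINIKUBE_ACTIVE_DOCKERD` alongside `DOCKER_HOST` (as recent minikube versions do); `using_minikube_docker` is a hypothetical helper.

```bash
# using_minikube_docker (hypothetical helper): succeeds if this shell's Docker CLI
# was pointed at minikube, assuming `minikube docker-env` sets MINIKUBE_ACTIVE_DOCKERD.
using_minikube_docker() {
  if [ -n "${MINIKUBE_ACTIVE_DOCKERD:-}" ]; then
    echo "docker CLI targets minikube profile: $MINIKUBE_ACTIVE_DOCKERD (${DOCKER_HOST:-unset})"
    return 0
  fi
  echo "docker CLI targets the host daemon; run: eval \$(minikube docker-env)" >&2
  return 1
}
```

Usage: `using_minikube_docker && skaffold build`.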
File Sync Not Working
Symptoms: Code changes don't trigger updates
Diagnosis:
# Check Skaffold file watchers
skaffold dev -v debug 2>&1 | grep -i sync
Solutions:
Check Sync Configuration
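# skaffold.yaml (under build.artifacts[].sync)
sync:
  manual:
    - src: "Sources/**/*.swift"
      dest: /app/Sources
      strip: "Sources/"  # without strip, files would land in /app/Sources/Sources/
File Permissions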
# Ensure files are readable
ls -la control-plane/Sources/
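To confirm that the sync pattern matches any files at all, and that those files are readable, the glob can be expanded locally. This only approximates Skaffold's matcher using bash's `globstar` option (bash 4 or later), so edge cases may differ; `check_sync_sources` is a hypothetical helper, and treating `control-plane` as the artifact context is an assumption based on the path above.

```bash
# check_sync_sources (hypothetical helper): expands a sync glob relative to a
# context directory, reports unreadable matches, and prints the match count.
# Succeeds only if at least one file matches and all matches are readable.
check_sync_sources() {
  local context="$1" pattern="$2"
  (
    cd "$context" || exit 1
    shopt -s globstar nullglob  # ** spans directories; no match expands to nothing
    count=0 unreadable=0
    for f in $pattern; do
      count=$((count + 1))
      if [ ! -r "$f" ]; then
        echo "unreadable: $f"
        unreadable=1
      fi
    done
    echo "$count file(s) match $pattern"
    [ "$count" -gt 0 ] && [ "$unreadable" -eq 0 ]
  )
}
```

Usage: `check_sync_sources control-plane 'Sources/**/*.swift'`.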
Helm Issues
Template Rendering Errors
Diagnosis:
helm template strato helm/strato --values helm/strato/values-dev.yaml
helm lint helm/strato
Solutions:
Check Values Syntax
# Validate YAML syntax
yamllint helm/strato/values-dev.yaml
Debug Template Rendering
helm template strato helm/strato --values helm/strato/values-dev.yaml --debug
Check Dependencies
cd helm/strato
helm dependency build
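The validation steps in this section can be chained so that the first failing step is named. A sketch using the chart and values paths from this guide as defaults; `validate_chart` and the `HELM`/`YAMLLINT` override variables are hypothetical.

```bash
# validate_chart (hypothetical helper): runs yamllint, helm dependency build,
# helm lint and helm template in order, stopping at the first failure.
validate_chart() {
  local chart="${1:-helm/strato}" values="${2:-helm/strato/values-dev.yaml}"
  # HELM and YAMLLINT are hypothetical overrides, used for testing.
  local helm="${HELM:-helm}" yamllint="${YAMLLINT:-yamllint}"
  "$yamllint" "$values" || { echo "step failed: yamllint" >&2; return 1; }
  "$helm" dependency build "$chart" >/dev/null || { echo "step failed: helm dependency build" >&2; return 1; }
  "$helm" lint "$chart" --values "$values" >/dev/null || { echo "step failed: helm lint" >&2; return 1; }
  "$helm" template strato "$chart" --values "$values" >/dev/null || { echo "step failed: helm template" >&2; return 1; }
  echo "chart OK"
}
```

Usage: `validate_chart helm/strato helm/strato/values-dev.yaml`, run from the repository root.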
Network Debugging
DNS Resolution Issues
Diagnosis:
# Test DNS from inside pods
kubectl exec -it deployment/strato-control-plane -- nslookup strato-postgresql
kubectl exec -it deployment/strato-control-plane -- cat /etc/resolv.conf
Solutions:
# Check CoreDNS
kubectl get pods -n kube-system | grep coredns
kubectl logs -n kube-system deployment/coredns
Port Conflicts
Diagnosis:
# Check what's using port 8080
lsof -i :8080
netstat -tulpn | grep 8080
Solutions:
# Use different local port
kubectl port-forward service/strato-control-plane 8081:8080
# Or configure different NodePort
# In values-dev.yaml:
controlPlane:
  service:
    nodePort: 30081
Emergency Procedures
Complete Reset
# Stop everything
skaffold delete
minikube stop
# Reset minikube
minikube delete
minikube start --memory=4096 --cpus=2
# Rebuild dependencies
cd helm/strato
helm dependency build
cd ../..
# Restart development
skaffold dev
Backup Development Data
# Backup PostgreSQL data
kubectl exec strato-postgresql-0 -- pg_dump -U strato strato > backup.sql
# Restore data (to new cluster)
kubectl exec -i strato-postgresql-0 -- psql -U strato strato < backup.sql
Resource Monitoring
# Continuous monitoring
watch kubectl top pods
watch kubectl top nodes
# Resource alerts
kubectl get events --watch | grep -iE "failed|error|warning"
Getting Help
Check Logs First
kubectl logs deployment/strato-control-plane --tail=100
Check Recent Events
kubectl get events --sort-by=.metadata.creationTimestamp | tail -20
Gather Debug Info
kubectl get all > debug-resources.txt
kubectl describe pods > debug-pods.txt
kubectl top nodes > debug-usage.txt
Community Resources
- Kubernetes Slack: #kubectl channel
- Skaffold GitHub issues
- Helm GitHub issues