DevOps by Default Blog

Posts tagged "DevOps"

18 articles

Production Kubernetes Deployments with Helm

GOTRS 0.5.1 shipped with a production-ready Helm chart. Here’s what went into making it robust. The Problem: We started with Kustomize manifests in a k8s/ directory. They worked for simple deployments but became unwieldy as configuration options grew. Database selection, replica counts, …

Container-First Development: Lessons from GOTRS

The latest GOTRS release focused on something that doesn’t make for exciting screenshots but matters enormously: trust in the development workflow. The Problem: Every developer has a slightly different local environment. Different Go versions, different database clients, different shell …

Sustainable Infrastructure: Carbon-Aware Computing

Cloud computing consumes enormous energy. As infrastructure scales, so does environmental impact. We started measuring and optimising for carbon emissions alongside cost and performance. The Problem: Cloud made infrastructure invisible, including its environmental cost. Spinning up resources was …

AI-Assisted Operations: Practical Applications Beyond Hype

AI promises to revolutionise everything, including operations. After a year of experimentation, we’ve found where it genuinely helps and where it’s still hype. Spoiler: it’s not replacing engineers anytime soon. The Problem: Alert fatigue persisted despite tuning. Hundreds of alerts …

Building a Developer Portal with Backstage

“Who owns this service?” shouldn’t require Slack archaeology. Backstage gave us a single place for service catalogues, documentation, and developer workflows. The portal became the starting point for everything. The Problem: Tribal knowledge dominated. Which team owns the payment …

OpenTofu State Encryption: A Feature Terraform Lacks

OpenTofu 1.7 introduced client-side state encryption, a feature the community requested from Terraform for years without success. For us, it solved a compliance problem that previously required workarounds. The Problem: Terraform state contains secrets. Database passwords, API keys, and sensitive …

Supply Chain Security with SLSA and Sigstore

SolarWinds, Log4Shell, and countless smaller incidents proved that software supply chains are attack vectors. Compliance frameworks now require provenance verification. We implemented SLSA and Sigstore to meet requirements and build genuine trust. The Problem: “Where did this binary come …

OpenTofu: Responding to the Terraform License Change

HashiCorp’s August 2023 license change sent shockwaves through the infrastructure-as-code community. Terraform moved from MPL to BSL, and within weeks, OpenTofu emerged as an open-source fork. We had decisions to make. The Problem: The Business Source License isn’t open source. …

eBPF for Deep Observability Without Code Changes

Traditional observability requires instrumentation. Add libraries, modify code, redeploy. eBPF offers visibility into systems you can’t or won’t change, directly from the kernel. The Problem: Instrumenting legacy applications was impractical. Some had no source code access. Others were …

Platform Engineering: Beyond the DevOps Team

“You build it, you run it” sounds empowering until developers spend more time on infrastructure than features. Platform engineering offers a middle path between centralised ops and full developer responsibility. The Problem: DevOps promised developer autonomy. The reality? Developers …

Secrets Management with HashiCorp Vault

Secrets end up everywhere: environment variables, config files, CI systems, developer laptops. Centralising them isn’t just about security; it’s about knowing what credentials exist and who can access them. The Problem: Credential sprawl was rampant. The same database password existed in …

Cloud Cost Optimization Without Sacrificing Reliability

Cloud bills have a way of growing faster than the applications they support. When finance asked for a 30% reduction, we had to find savings without compromising reliability. The Problem: The bill had grown organically. Resources provisioned for load tests were never deleted. Development environments …

GitOps with ArgoCD: Making Kubernetes Declarative

After years of imperative deployment scripts and kubectl commands in CI pipelines, we adopted GitOps. The shift was more cultural than technical, and the benefits exceeded expectations. The Problem: Deployment scripts grew organically. Each application had slight variations. Some used Helm, others …

Log4Shell: Lessons for Vulnerability Response

December 2021 delivered Log4Shell, and the subsequent weeks were chaos. A month later, we’re reflecting on what worked, what didn’t, and what we’re changing permanently. The Problem: The vulnerability itself was severe: remote code execution with trivial exploitation. But the real …

Implementing the Three Pillars of Observability

Everyone talks about observability, but most organisations have monitoring with extra steps. We spent a year building genuine observability and learned what actually matters. The Problem: We had monitoring. Lots of it. Dashboards for everything. Alert fatigue was constant. When incidents occurred, we …

Terraform State Management at Scale

Terraform state is deceptively simple until you have multiple teams, dozens of repositories, and hundreds of resources. Then it becomes your biggest operational challenge. The Problem: Local state files don’t scale. The moment two people run terraform apply simultaneously, you have a race …

Kubernetes Namespace Isolation: Beyond the Basics

Running multiple teams on a shared Kubernetes cluster sounds efficient until one team’s runaway pod consumes all the cluster resources. We learned this the hard way. The Problem: Namespaces provide logical separation but not isolation. By default, pods in one namespace can communicate with pods …

Migrating CI Pipelines from Jenkins to GitHub Actions

After years of maintaining Jenkins servers, we finally made the switch to GitHub Actions. Here’s why it was worth the effort. The Problem: Jenkins served us well for a decade, but the maintenance burden grew unsustainable. Plugin updates broke builds. Java version conflicts caused headaches. …
