DevOps by Default Blog

Building a PaaS Core in Two Days


GoatFlow 0.8.0 was originally targeted for September 2026. We shipped it in March. The release adds nine major feature areas to the plugin platform: universal custom fields, a plugin UI system, organisations with multi-tenancy, encrypted secure settings, entity deletion with GDPR anonymisation, a CLI-driven marketplace, self-service authentication, reusable UI components, and accessibility improvements. This post covers the engineering decisions, the things that broke, and what we’d do differently.

Custom Fields: One Table Pair for Everything


The Problem

GoatFlow had OTRS-compatible dynamic fields on tickets and articles, but nothing for contacts, agents, queues, or organisations. Every plugin that needed additional fields on an entity had to create its own extension tables, manage its own migrations, and render its own form controls.

The Solution

Two new tables, gk_custom_field_def and gk_custom_field_value, use an EAV (Entity-Attribute-Value) pattern with denormalised typed columns. Instead of a single value column, the value table has val_text, val_int, val_decimal, val_date, val_datetime, and val_json. Each scalar field type maps to exactly one of these columns, so indexed queries work without JSON extraction functions.

Fifteen field types ship out of the box, including three GIS types: point (lat/lng stored in val_decimal and val_decimal2 for bounding-box queries), polygon (GeoJSON in val_json), and structured address (postcode extracted to val_text for indexed lookup).
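To make the typed-column idea concrete, here is a minimal sketch of how a field type could select its storage column. The field type names (other than text) and the field_id and entity_id columns are assumptions for illustration, not GoatFlow's actual schema.

```go
package main

import "fmt"

// columnFor maps a field type to the typed column holding its value.
// The type names here are illustrative assumptions, not GoatFlow's list.
var columnFor = map[string]string{
	"text":     "val_text",
	"integer":  "val_int",
	"decimal":  "val_decimal",
	"date":     "val_date",
	"datetime": "val_datetime",
	"polygon":  "val_json",
}

// EqualityQuery builds a parameterised lookup against the typed column,
// so an ordinary index on that column can serve it.
// field_id and entity_id are assumed column names.
func EqualityQuery(fieldType string) (string, error) {
	col, ok := columnFor[fieldType]
	if !ok {
		return "", fmt.Errorf("unknown field type %q", fieldType)
	}
	return "SELECT entity_id FROM gk_custom_field_value WHERE field_id = ? AND " + col + " = ?", nil
}

func main() {
	q, err := EqualityQuery("integer")
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
}
```

Because the column name comes from a fixed map rather than user input, concatenating it into the SQL string is safe; the values themselves stay as placeholders.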

Plugins declare fields in GKRegistration:

CustomFields: []plugin.CustomFieldSpec{
    {
        Name:       "sku_code",
        Label:      "SKU Code",
        EntityType: "contact",
        FieldType:  "text",
        Config:     json.RawMessage(`{"max_length":20}`),
    },
}

Field names are auto-prefixed with the plugin name to prevent collisions. The sandbox enforces prefix filtering — plugins can only access their own fields via CustomFieldsGet/CustomFieldsSet/CustomFieldsQuery.
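The prefix rule can be sketched in a few lines. The "_" separator and both function names below are assumptions; the post only says that names are prefixed with the plugin name and that the sandbox filters on that prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// QualifiedName prefixes a field name with its owning plugin.
// The "_" separator is an assumption.
func QualifiedName(plugin, field string) string {
	return plugin + "_" + field
}

// CanAccess reports whether a plugin may read or write a qualified field.
func CanAccess(plugin, qualified string) bool {
	return strings.HasPrefix(qualified, plugin+"_")
}

func main() {
	name := QualifiedName("inventory", "sku_code")
	fmt.Println(name, CanAccess("inventory", name), CanAccess("billing", name))
}
```

A bare prefix test has one gap worth knowing about: a plugin named shop would also match fields owned by a plugin named shop_extra. A production check would need to compare against the owner recorded on the field definition, or restrict which characters plugin names may contain.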

Legacy OTRS dynamic_field entries auto-migrate on startup. It’s a copy, not a move — the original tables are never modified, so downgrades work.

The Lesson

We initially put the REST API routes at /:entity_type/:id/custom-fields. This wildcard ate our entire ticket API — POST /api/v1/tickets matched as entity_type=tickets. Seven tests failed in ways that looked like auth issues but were actually routing conflicts. The fix was moving to /custom-fields/values/:entity_type/:id. We now have a standing rule: no wildcard route parameters in the first path segment.
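A rule like this is easy to enforce mechanically. The sketch below is a hypothetical startup check, not something GoatFlow is known to ship; it assumes gin-style patterns where ":name" is a parameter and "*name" is a catch-all.

```go
package main

import (
	"fmt"
	"strings"
)

// FirstSegmentIsWildcard reports whether the first path segment of a
// route pattern is a parameter (":name") or a catch-all ("*name").
func FirstSegmentIsWildcard(pattern string) bool {
	trimmed := strings.TrimPrefix(pattern, "/")
	first, _, _ := strings.Cut(trimmed, "/")
	return strings.HasPrefix(first, ":") || strings.HasPrefix(first, "*")
}

// CheckRoutes returns an error naming the first pattern that breaks the rule.
func CheckRoutes(patterns []string) error {
	for _, p := range patterns {
		if FirstSegmentIsWildcard(p) {
			return fmt.Errorf("route %q starts with a wildcard segment", p)
		}
	}
	return nil
}

func main() {
	fmt.Println(CheckRoutes([]string{"/custom-fields/values/:entity_type/:id"}))
	fmt.Println(CheckRoutes([]string{"/:entity_type/:id/custom-fields"}))
}
```

Run against the list of registered patterns in a unit test, this would have flagged the original route before it shadowed anything.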

Plugin UI System: Three Shells, Not One

The Problem

Plugins could register routes and return HTML wrapped in the base layout. But every plugin got the same chrome — full navigation bar, GoatFlow branding, agent-oriented layout. A customer-facing mobile app doesn’t want a navigation bar designed for desktop agents.

The Solution

Plugins declare independent UIs with one of three shell types:

  • Standard — full GoatFlow chrome, same as native pages
  • Minimal — slim header, optional bottom/top/side navigation, mobile-first
  • None — raw HTML, plugin controls everything

Each UI gets its own routes namespaced under /ui/{plugin}_{ui_id}/, its own branding (logo, colour, favicon), and an optional PWA manifest that generates automatically. Auth is configurable per UI type — session for agent/customer apps, PIN for kiosks, none for public pages.

The implementation uses a PluginCaller interface that the gin handler calls to invoke the plugin function, then wraps the response HTML in the correct shell template based on the UI’s config.
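In outline, that flow might look like the following. Only the PluginCaller name comes from the implementation; the method signature, the ShellType constants, and the placeholder markup are assumptions made for this sketch.

```go
package main

import "fmt"

type ShellType string

const (
	ShellStandard ShellType = "standard"
	ShellMinimal  ShellType = "minimal"
	ShellNone     ShellType = "none"
)

// PluginCaller invokes a plugin function and returns its HTML.
// The method signature is an assumption.
type PluginCaller interface {
	Call(function string) (string, error)
}

// staticCaller is a stand-in plugin that always returns the same HTML.
type staticCaller struct{ html string }

func (s staticCaller) Call(string) (string, error) { return s.html, nil }

// RenderUI calls the plugin, then wraps the result in the shell markup.
func RenderUI(c PluginCaller, shell ShellType, function string) (string, error) {
	body, err := c.Call(function)
	if err != nil {
		return "", fmt.Errorf("plugin call %q: %w", function, err)
	}
	switch shell {
	case ShellStandard:
		return "<nav>full chrome</nav><main>" + body + "</main>", nil
	case ShellMinimal:
		return "<header>slim</header><main>" + body + "</main>", nil
	case ShellNone:
		return body, nil // plugin controls the whole page
	}
	return "", fmt.Errorf("unknown shell type %q", shell)
}

func main() {
	out, err := RenderUI(staticCaller{html: "<p>hello</p>"}, ShellMinimal, "index")
	fmt.Println(out, err)
}
```

Keeping the caller behind an interface means the HTTP handler never needs to know how the plugin is hosted, and the shell choice stays a pure function of the UI's config.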

The Lesson

The nonce-based CSP approach sounded right in theory — generate a per-request nonce, add it to every <script> tag, set it in the CSP header. In practice, Alpine.js uses new AsyncFunction() internally (requires unsafe-eval), HTMX injects inline scripts from AJAX responses (no nonces), and the login page rendered through the dynamic engine didn’t have access to the gin context where the nonce was stored. We spent time on a solution that was fundamentally incompatible with our frontend stack. The CSP now uses unsafe-inline and unsafe-eval for scripts — XSS prevention comes from server-side bluemonday sanitisation, not browser-side CSP.

Organisations: Extending Sysconfig, Not Replacing It

The Problem

Multi-tenancy needs per-organisation configuration — different SLA rules, different branding, different feature flags. The obvious approach is a JSON settings column on the organisation table. But GoatFlow already has a database-stored configuration system (sysconfig_default + sysconfig_modified), and adding a second config system means admins learn two places, plugins can’t use ConfigGet, and the sysconfig admin UI doesn’t show org settings.

The Solution

A sysconfig_org table with the same key-value pattern as sysconfig_modified. Resolution cascades through four tiers: user preference → org override → system override → system default. Plugins call ConfigGet as normal — the platform resolves the org scope from the session context automatically.
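The cascade is a first-match lookup from the most specific tier to the least. This sketch uses in-memory maps in place of the sysconfig tables; the ConfigStore type and Resolve method are hypothetical names.

```go
package main

import "fmt"

// ConfigStore holds the four tiers as in-memory maps for illustration.
type ConfigStore struct {
	UserPrefs       map[int]map[string]string // user ID -> key -> value
	OrgOverrides    map[int]map[string]string // org ID -> key -> value
	SystemOverrides map[string]string
	SystemDefaults  map[string]string
}

// Resolve returns the most specific value for key:
// user preference, then org override, then system override, then default.
func (s *ConfigStore) Resolve(userID, orgID int, key string) (string, bool) {
	if v, ok := s.UserPrefs[userID][key]; ok {
		return v, true
	}
	if v, ok := s.OrgOverrides[orgID][key]; ok {
		return v, true
	}
	if v, ok := s.SystemOverrides[key]; ok {
		return v, true
	}
	v, ok := s.SystemDefaults[key]
	return v, ok
}

func main() {
	store := &ConfigStore{
		OrgOverrides:   map[int]map[string]string{7: {"sla.hours": "4"}},
		SystemDefaults: map[string]string{"sla.hours": "24"},
	}
	fmt.Println(store.Resolve(1, 7, "sla.hours")) // org 7 has an override
	fmt.Println(store.Resolve(1, 8, "sla.hours")) // org 8 falls back to the default
}
```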

Automatic query scoping in the SandboxedHostAPI was the second key piece. When a plugin calls DBQuery("SELECT * FROM ticket WHERE queue_id = ?"), the sandbox transparently appends AND org_id = ? for registered org-aware tables. The ScopeQuery function parses SQL to find the main table, checks the OrgAwareTables registry, detects table aliases, and injects the filter in the right position: appended to the existing WHERE clause if there is one, otherwise added as a new WHERE clause before any ORDER BY/GROUP BY/LIMIT.
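A heavily simplified sketch of that injection logic is below. It is not the real ScopeQuery: it uses regular expressions instead of a parser, ignores aliases, joins, subqueries and "?" characters inside string literals, and it parenthesises the existing condition so an OR cannot bypass the org filter. The function signature and the contents of the table registry are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// orgAwareTables stands in for the OrgAwareTables registry.
var orgAwareTables = map[string]bool{"ticket": true}

var (
	fromRe  = regexp.MustCompile(`(?i)\bFROM\s+([a-z_][a-z0-9_]*)`)
	whereRe = regexp.MustCompile(`(?i)\bWHERE\b`)
	tailRe  = regexp.MustCompile(`(?i)\s+(GROUP\s+BY|ORDER\s+BY|LIMIT)\b`)
)

// ScopeQuery adds an org_id filter to single-table SELECTs on org-aware
// tables and inserts orgID at the matching position in args.
func ScopeQuery(query string, args []any, orgID int) (string, []any) {
	m := fromRe.FindStringSubmatch(query)
	if m == nil || !orgAwareTables[strings.ToLower(m[1])] {
		return query, args
	}
	// Split off any GROUP BY / ORDER BY / LIMIT tail.
	head, tail := query, ""
	if loc := tailRe.FindStringIndex(query); loc != nil {
		head, tail = query[:loc[0]], query[loc[0]:]
	}
	if loc := whereRe.FindStringIndex(head); loc != nil {
		cond := strings.TrimSpace(head[loc[1]:])
		head = head[:loc[1]] + " (" + cond + ") AND org_id = ?"
	} else {
		head += " WHERE org_id = ?"
	}
	// The new placeholder follows every placeholder already in head.
	n := strings.Count(head, "?") - 1
	if n > len(args) {
		n = len(args)
	}
	scoped := make([]any, 0, len(args)+1)
	scoped = append(scoped, args[:n]...)
	scoped = append(scoped, orgID)
	scoped = append(scoped, args[n:]...)
	return head + tail, scoped
}

func main() {
	q, a := ScopeQuery("SELECT * FROM ticket WHERE queue_id = ?", []any{5}, 7)
	fmt.Println(q, a)
}
```

The argument bookkeeping is the part that is easy to get wrong: with positional placeholders, the org ID has to land between the WHERE arguments and any LIMIT argument, not simply at the end.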

The Lesson

The org context needed to flow from the gin middleware through to the pongo2 template renderer. But the dynamic engine (YAML routes + plugin routes) creates a separate gin context from the main router, so values set in the main router’s middleware aren’t visible in the dynamic engine’s handlers. The CSP nonce had the same problem, which we solved by having the template renderer read the CSP header (already set by middleware) back from the response. For org context, the middleware sets the value on both the gin context and the request context, and the HostAPI reads from the request context, which propagates correctly.

Secure Settings: Let the Platform Handle Crypto

The Problem

Plugins that need API keys or webhook secrets had to roll their own encryption or store values in plain text. Rolling your own crypto is how you get crypto bugs.

The Solution

AES-256-GCM encrypted storage via two new HostAPI methods: SecureConfigGet and SecureConfigSet. The platform manages the encryption key (from environment variable or auto-generated), nonce generation (random 12 bytes per encrypt), and authenticated encryption (GCM tag prevents tampering). Secrets are org-scoped with automatic fallback to global values.
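For readers who have not used GCM from Go, the core of such a scheme fits in two short functions using only the standard library. This is a sketch of the general technique, not GoatFlow's code: the Encrypt and Decrypt names and the nonce-prefixed blob layout are assumptions, and key management and org scoping are left out.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
	"io"
)

// newGCM builds an AES-GCM AEAD; a 32-byte key selects AES-256.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt returns nonce || ciphertext || tag with a fresh random nonce.
func Encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize()) // 12 bytes for standard GCM
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt splits off the nonce and verifies the tag before returning data.
func Decrypt(key, blob []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(blob) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := blob[:gcm.NonceSize()], blob[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32)
	if _, err := io.ReadFull(rand.Reader, key); err != nil {
		panic(err)
	}
	blob, _ := Encrypt(key, []byte("webhook-secret"))
	plain, err := Decrypt(key, blob)
	fmt.Println(string(plain), err)
}
```

Storing the nonce alongside the ciphertext is the conventional layout: the nonce is not secret, it only has to be unique per encryption under the same key.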

The admin display shows ••••••••abcd (last 4 characters) — enough to identify which key it is, not enough to use it.
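The masking itself is a one-liner in spirit. In this sketch the MaskSecret name, the fixed mask width (so the display does not leak the secret's length), and the choice to hide very short secrets entirely are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// MaskSecret hides everything except the last four characters.
// Secrets of four characters or fewer are masked completely.
func MaskSecret(secret string) string {
	const visible = 4
	mask := strings.Repeat("•", 8)
	r := []rune(secret)
	if len(r) <= visible {
		return mask
	}
	return mask + string(r[len(r)-visible:])
}

func main() {
	fmt.Println(MaskSecret("sk_live_1234abcd"))
}
```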

The Lesson

The sync.Once pattern for key initialisation caused test failures. SetKey (for testing) sets the variable directly, but GetKey went through sync.Once, which fires once and returns the cached result forever, so a key set after the first call was ignored. We changed GetKey to check for a non-nil key before falling through to the once-initialised value.
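The shape of the fix, as a minimal sketch: an explicit override is checked first, and sync.Once only guards the lazily loaded fallback. The variable names and the loadKey stand-in are assumptions; only SetKey and GetKey are named in the codebase.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu          sync.RWMutex
	overrideKey []byte // set by SetKey, typically from tests
	once        sync.Once
	cachedKey   []byte // loaded at most once
)

// loadKey stands in for reading the key from the environment
// or generating one.
func loadKey() []byte { return []byte("key-from-environment") }

// SetKey installs an explicit key; passing nil clears the override.
func SetKey(k []byte) {
	mu.Lock()
	overrideKey = k
	mu.Unlock()
}

// GetKey prefers the explicit override and only then falls through
// to the once-initialised value.
func GetKey() []byte {
	mu.RLock()
	k := overrideKey
	mu.RUnlock()
	if k != nil {
		return k
	}
	once.Do(func() { cachedKey = loadKey() })
	return cachedKey
}

func main() {
	fmt.Println(string(GetKey()))
	SetKey([]byte("test-key"))
	fmt.Println(string(GetKey()))
	SetKey(nil)
}
```

The mutex is not strictly part of the bug fix, but without it a test calling SetKey concurrently with GetKey would be a data race.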

Entity Deletion: Two Patterns, One Pipeline

The Problem

GoatFlow had inconsistent deletion: tickets used archive_flag, users used valid_id=2, articles used hard DELETE, and nothing had a recycle bin or GDPR anonymisation.

The Solution

A separate gk_recycle_bin table tracks soft-deleted entities rather than adding a deleted_at column to every existing table (which would be a breaking schema change). Each entity type registers how to soft-delete itself using its existing mechanism. PII anonymisation irreversibly replaces personal fields with [DELETED]; the records themselves remain, so business metrics stay accurate while the personal data is gone.
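One way to picture "each entity type registers how to soft-delete itself" is a small registry keyed by entity type. Everything here is a hypothetical sketch: the EntityHandler type, the users table name, and the restore values (archive_flag = 0, valid_id = 1) are assumptions layered on the archive_flag and valid_id=2 mechanisms the post describes.

```go
package main

import "fmt"

// EntityHandler describes how one entity type soft-deletes and restores
// using the mechanism its table already has.
type EntityHandler struct {
	SoftDeleteSQL string
	RestoreSQL    string
}

var handlers = map[string]EntityHandler{}

// Register records the handler for an entity type.
func Register(entityType string, h EntityHandler) { handlers[entityType] = h }

// SoftDeleteStatement looks up the statement for an entity type.
func SoftDeleteStatement(entityType string) (string, error) {
	h, ok := handlers[entityType]
	if !ok {
		return "", fmt.Errorf("no deletion handler for %q", entityType)
	}
	return h.SoftDeleteSQL, nil
}

func init() {
	Register("ticket", EntityHandler{
		SoftDeleteSQL: "UPDATE ticket SET archive_flag = 1 WHERE id = ?",
		RestoreSQL:    "UPDATE ticket SET archive_flag = 0 WHERE id = ?", // assumed
	})
	Register("user", EntityHandler{
		SoftDeleteSQL: "UPDATE users SET valid_id = 2 WHERE id = ?",
		RestoreSQL:    "UPDATE users SET valid_id = 1 WHERE id = ?", // assumed
	})
}

func main() {
	fmt.Println(SoftDeleteStatement("ticket"))
	fmt.Println(SoftDeleteStatement("article"))
}
```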

Plugin cascade handlers (CascadeSpec in GKRegistration) get called on both soft and hard delete. A tombstone log (gk_deletion_log) records that a deletion happened without recording what was deleted.

The Lesson

The entity-type dispatch functions (softDeleteEntity, restoreEntity, hardDeleteEntity) are large switch statements that are hard to unit test without real entity data. Coverage for the deletion package sits at 47% — the infrastructure around the switch statements (recycle bin CRUD, tombstone logging, cascade dispatch, retention config) is well covered, but the entity-specific SQL operations need integration tests with full fixtures.

Security Fixes Along the Way

Two vulnerabilities and one authorisation bug surfaced during development:

Stored XSS (gotrs-io/gotrs-ce#176): The ticket create handler and notes handler didn’t sanitise HTML before storage. The ticket detail handler rendered stored HTML with |safe. Defence-in-depth fix: bluemonday sanitisation on both write (four handlers) and read (two renderers). A security researcher was actively trying to chain this with a file upload bypass — we fixed it the same day it was reported.

Null byte injection: shell.php\x00.jpg passed extension validation (filepath.Ext returns .jpg) but the OS truncates at the null byte. Now rejected outright in validateFile() and stripped in sanitizeFilename().
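The check is small enough to show in full. The function names match the ones mentioned above, but the bodies and the extension allow-list are a sketch under assumptions, not the shipped code.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// allowedExt is an illustrative allow-list.
var allowedExt = map[string]bool{".jpg": true, ".png": true, ".pdf": true}

// validateFile rejects names containing a null byte before looking at
// the extension, since filepath.Ext only sees the text after the last dot.
func validateFile(name string) error {
	if strings.ContainsRune(name, 0) {
		return errors.New("filename contains a null byte")
	}
	if !allowedExt[strings.ToLower(filepath.Ext(name))] {
		return errors.New("file extension not allowed")
	}
	return nil
}

// sanitizeFilename strips null bytes as a second line of defence.
func sanitizeFilename(name string) string {
	return strings.ReplaceAll(name, "\x00", "")
}

func main() {
	name := "shell.php\x00.jpg"
	fmt.Println(filepath.Ext(name))     // the extension check alone sees .jpg
	fmt.Println(validateFile(name))     // rejected because of the null byte
	fmt.Println(sanitizeFilename(name)) // shell.php.jpg
}
```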

2FA JWT role bug: The 2FA login completion path generated JWTs with role=user and isAdmin=false — it was a copy-pasted code path that never checked admin group membership. Admin users with 2FA enabled couldn’t access any admin API endpoints. Fixed by extracting a shared resolveUserRole() function used by all three login paths. This is the kind of bug that DRY prevents.
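The shared-function shape looks roughly like this. Only resolveUserRole is a real name; the Claims struct, the "admin" group name, and the two login helpers are assumptions used to show that every path derives its role from the same place.

```go
package main

import "fmt"

// Claims holds the role-related fields that end up in the JWT.
type Claims struct {
	Role    string
	IsAdmin bool
}

// resolveUserRole is the single place that turns group membership into
// a role. The "admin" group name is an assumption.
func resolveUserRole(groups []string) Claims {
	for _, g := range groups {
		if g == "admin" {
			return Claims{Role: "admin", IsAdmin: true}
		}
	}
	return Claims{Role: "user", IsAdmin: false}
}

// The login paths below are hypothetical; each one delegates instead of
// hard-coding role=user.
func claimsForPasswordLogin(groups []string) Claims { return resolveUserRole(groups) }
func claimsForTwoFactorLogin(groups []string) Claims { return resolveUserRole(groups) }

func main() {
	groups := []string{"users", "admin"}
	fmt.Println(claimsForPasswordLogin(groups) == claimsForTwoFactorLogin(groups))
}
```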

Test Infrastructure

Cross-package test pollution was the recurring headache. The comprehensive test suite runs all packages sequentially against one shared database. MCP tests create fixtures in the 80000 ID range, api/v1 tests use 90000. When one package’s cleanup deletes data another package’s sync.Once fixtures depend on, tests fail with confusing auth errors.

The fix: verify-and-recreate. Before each test, check if a key fixture row still exists. If not, re-run the full fixture setup. Combined with TestMain teardown functions that clean up each package’s data after completion, the suite is now reliable.
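Stripped of database details, verify-and-recreate is a cheap existence check in front of the expensive setup. This sketch uses an in-memory map as a stand-in for the shared database; the fakeDB type, helper names, and fixture contents are hypothetical, with the 90000 ID taken from the range mentioned above.

```go
package main

import "fmt"

// fakeDB stands in for the shared test database.
type fakeDB struct{ rows map[int]string }

const keyFixtureID = 90000 // first ID in this package's fixture range

// setupFixtures creates every row this package's tests rely on.
func setupFixtures(db *fakeDB) {
	db.rows[keyFixtureID] = "test-agent"
	db.rows[keyFixtureID+1] = "test-queue"
}

// ensureFixtures checks one key row and re-runs the full setup if it
// is gone. It reports whether a rebuild was needed.
func ensureFixtures(db *fakeDB) bool {
	if _, ok := db.rows[keyFixtureID]; ok {
		return false
	}
	setupFixtures(db)
	return true
}

func main() {
	db := &fakeDB{rows: map[int]string{}}
	fmt.Println(ensureFixtures(db)) // first call builds the fixtures
	delete(db.rows, keyFixtureID)   // another package's cleanup runs
	fmt.Println(ensureFixtures(db)) // fixtures are rebuilt
}
```

In a real suite the call would sit at the top of each test (or in a shared helper), replacing the sync.Once guard that assumed fixtures survive for the whole run.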

By the Numbers

  • 9 major feature areas
  • 6 database migrations
  • 6 design specifications
  • 15 custom field types
  • 15 languages with native translations
  • ~3,500 lines of test code
  • 2 security vulnerabilities fixed
  • 1 JWT role bug that only affected admins with 2FA
  • 1 wildcard route that ate the ticket API

What We’d Do Differently

Write the design spec first, every time. The features that had specs (custom fields, plugin UIs, organisations) went smoothly. The ones where we jumped straight to code (CSP nonces) required rework.

Test with the product, not just the code. Our CSP test verified the header string was correct. It didn’t verify the login page actually worked. A test that loads the page in a real browser would have caught the breakage immediately.

One code path for auth token generation. Three separate places generated JWTs with subtly different role logic. The 2FA path used role=user because someone copy-pasted without the admin check. A shared function from the start would have prevented the bug entirely.


GoatFlow is open source under Apache-2.0. Source, containers, and Helm charts: github.com/goatkit/goatflow.