Platform

Keeping systems online and running 24/7!

Responsibilities

  • Manages, maintains and monitors all clusters 24/7

  • Manages, maintains and monitors global Lagoon Infrastructure 24/7

  • Monitors all production sites 24/7

  • Reacts to infrastructure alerts

  • Reacts to outages reported from customers via Client Support Team

  • Provides Out-of-Office Hours Customer Emergency phone support (see Being On-Call)

  • Continuously improves amazee.io Platform

  • Coordinates with external partners (e.g. VSHN, AWS, GCP, Azure, Fastly) to ensure stable operation

  • Guarantees platform and website uptime SLA

  • Coordinates with the Lagoon Team on Lagoon features, releases, and issues

  • Coordinates with and supports amazee.io Security Team

  • Monitors, analyses, and optimizes infrastructure costs with the help of knowledge from the Business Operations Team and tooling from the IT Team

  • Creates and updates Statuspage entries during outages and maintenance

  • Writes timely post-mortems for more significant outages (see Post Mortem Process)

Non-Responsibilities

  • 1st & 2nd Level Customer Support

  • Direct communication with customers during outages (communication happens via Statuspage)

  • Support for Client Application issues or requests

  • Maintaining the Lagoon Codebase

  • Creating reports on Site Uptime

Workstream

Roles

  • Platform Lead

  • Platform Engineer

Current Staffing