Platform

Keeping systems online and running 24/7!

Responsibilities

  • Manages, maintains and monitors all clusters 24/7
  • Manages, maintains and monitors global Lagoon Infrastructure 24/7
  • Monitors all production sites 24/7
  • Reacts to infrastructure alerts
  • Reacts to outages reported by customers via the Client Support Team
  • Provides Out-of-Office Hours Customer Emergency phone support (see Being On-Call)
  • Continuously improves the amazee.io Platform
  • Coordinates with external partners (e.g. VSHN, AWS, GCP, Azure, Fastly) to ensure stable operation
  • Guarantees platform and website uptime SLA
  • Coordinates with the Lagoon Team on Lagoon features, releases, and issues
  • Coordinates with and supports amazee.io Security Team
  • Monitors, analyses, and optimizes infrastructure costs with the help of knowledge from the Business Operations Team and tooling from the IT Team
  • Creates and updates Statuspage entries during outages and maintenance
  • Writes timely post-mortems for more significant outages (see Post Mortem Process)

Non-Responsibilities

  • 1st & 2nd Level Customer Support
  • Direct communication with customers during outages (communication happens via Statuspage)
  • Support for Client Application issues or requests
  • Maintaining the Lagoon codebase
  • Creating reports on site uptime

Workstream

Roles

Current Staffing