Platform
Keeping systems online and running 24/7!

Responsibilities

  • Manages, maintains and monitors all clusters 24/7
  • Manages, maintains and monitors global Lagoon Infrastructure 24/7
  • Monitors all production sites 24/7
  • Reacts to infrastructure alerts
  • Reacts to outages reported by customers via the Client Support Team
  • Provides Out-of-Office Hours Customer Emergency phone support (see Being On-Call)
  • Continuously improves the amazee.io Platform
  • Coordinates with external partners (e.g. VSHN, AWS, GCP, Azure, Fastly) to ensure stable operation
  • Guarantees platform and website uptime SLA
  • Coordinates with the Lagoon Team on Lagoon features, releases, and issues
  • Coordinates with and supports the amazee.io Security Team
  • Monitors, analyses, and optimizes infrastructure costs with the help of knowledge from the Business Operations Team and tooling from the IT Team
  • Creates and updates Statuspage entries during outages and maintenance
  • Writes timely post-mortems for more significant outages (see Post Mortem Process)

Non-Responsibilities

  • 1st & 2nd Level Customer Support
  • Direct communication with customers during outages (communication happens via Statuspage)
  • Support for client application issues or requests
  • Maintaining the Lagoon codebase
  • Creating reports on site uptime

Workstream

Roles

  • Platform Lead
  • Platform Engineer

Current Staffing
