Platform#
Keeping systems online and running 24/7!
Responsibilities#
- Manages, maintains and monitors all clusters 24/7
- Manages, maintains and monitors the global Lagoon infrastructure 24/7
- Monitors all production sites 24/7
- Reacts to infrastructure alerts
- Reacts to outages reported by customers via the Client Support Team
- Provides Out-of-Office Hours Customer Emergency phone support (see Being On-Call)
- Continuously improves the amazee.io Platform
- Coordinates with external partners (e.g. VSHN, AWS, GCP, Azure, Fastly) to ensure stable operation
- Guarantees platform and website uptime SLA
- Coordinates with the Lagoon Team on Lagoon features, releases, and issues
- Coordinates with and supports the amazee.io Security Team
- Monitors, analyses, and optimizes infrastructure costs with the help of knowledge from the Business Operations Team and tooling from the IT Team
- Creates and updates Statuspage entries during outages and maintenance
- Writes timely post-mortems for more significant outages (see Post Mortem Process)
Non-Responsibilities#
- 1st and 2nd level customer support
- Direct communication with customers during outages (communication happens via Statuspage)
- Support for client application issues or requests
- Maintaining the Lagoon codebase
- Creating reports on site uptime
Workstream#
Roles#
- Platform Lead
- Platform Engineer
- Platform Software Engineer
Current Staffing#
- Bastian Widmer (Team Lead)
- Brittany Mitchell
- Chandana Karunaratne
- Michael Schmid (Management Sponsor)
- Salvatore Pappalardo