Workstream Purpose
Keeping systems online and running 24/7! The purpose of the Platform engineering team is to ensure the availability and health of client infrastructure. We ensure that amazee.io has an effective product to sell and that it remains online and in top condition at all times. We are the team that enables amazee.io to provide the service it offers to clients.
Responsibilities
- Manage, maintain and monitor all clusters 24/7
- Manage, maintain and monitor global Lagoon infrastructure 24/7
- Monitor all production sites 24/7
- React to infrastructure alerts
- React to outages reported by clients via Client Support
- Provide emergency phone support outside office hours
- Continuously improve the amazee.io platform
- Coordinate with external partners such as AWS, GCP, Azure and Fastly to ensure stable operations
- Guarantee platform and website uptime SLAs
- Coordinate with the Lagoon team on Lagoon features, releases, and issues
- Coordinate with and support the amazee.io IT and Security team
- Monitor, analyze, and optimize infrastructure costs, drawing on knowledge from the Business Operations Team and tooling from the IT and Security team
- Create and update statuspage entries during outages and maintenance
- Write timely post-mortems for significant outages
Non-Responsibilities
- 1st- and 2nd-level support
- Direct communication with clients during outages (communication happens via statuspage)