Our client operates one of the most widely used mobile platforms in the world, offering on-demand transportation and mobile payments. Their mobile application has been downloaded onto more than 75 million devices. They raised over US $2 billion in their most recent funding round, have grown 3x year over year, and are in the process of building out their Research and Engineering team in Seattle.
We are looking for experienced site reliability engineers to help us operate, troubleshoot, and improve our real-time, on-demand transportation service. Our platform is written mainly in Go and hosted on AWS. We are heavy users of MySQL, Redis, and Kinesis.
Work with engineering teams to design and write code for systems that are highly available and scale seamlessly.
Plan for and eliminate any potential threats to stability, availability or security.
Improve monitoring, alerting and resilience of systems.
Write tools that support work such as capacity planning and debugging production issues across distributed systems.
Contribute to a culture of learning and responsibility by writing detailed postmortem reports.
Tackle live production issues while on call, with support from the other teams.
Experience in designing and writing software for production systems.
Knowledge of Unix fundamentals.
Experience crafting, analyzing, and troubleshooting distributed systems.
Knowledge of TCP/IP networking.
Good written and spoken English.
Experience with AWS.
Experience with a configuration management system such as Ansible.
Bachelor's degree in Computer Science; Master's preferred.