Come work with the most recognized and popular franchise in the world! As a subsidiary of one of the most profitable children's entertainment brands in existence, this company is responsible for the official website, brand management, licensing, marketing, and development outside of Asia.
After the incredible success of the augmented reality mobile game in 2016, all development was brought in-house and they haven't stopped growing since. Based in downtown Bellevue, they are well known for having a great work/life balance as well as an inclusive & collaborative culture.
The beautiful panoramic view from the office is icing on the cake.
The Operations Engineer will play a key role focused primarily on mitigating customer impact: monitoring our platforms, identifying & resolving issues, and working with development teams on escalations to ensure our services are highly performant, available, secure, scaled, and operationalized. Although the primary focus of this role is monitoring & mitigating issues on our platform, this role will also contribute to creating & updating alerts & health checks, scripting & automation, and operational & runbook documentation, and will partner with the Development, Test, and Security teams to put the right solutions in place for the business.
*Support our development, test, and production environments by investigating & resolving performance and functional issues with our service stack as well as upstream/downstream systems.
*Triage issues based on a clear understanding of the business problem and impact to ensure appropriate urgency in response.
*Troubleshoot and resolve complex production / application issues identified through alerts to ensure services are highly available & performant.
*Clearly and concisely document issues that cannot be fixed in the NOC and escalate them to on-call resources.
*Monitor our platforms & services using enterprise-class monitoring tools, review logs, and perform validation checks.
*Enhance monitoring of systems, services, and hardware to enable expedient identification and resolution of issues.
*Manage incidents during critical issues and ensure notifications to stakeholders are prompt, accurate, and consumable by non-technical audiences.
*Ensure that the incident ticketing system is regularly checked for high- or critical-priority tickets that need to be resolved or escalated to on-call teams.
*Work with developers and engineering leads to advocate for operational improvements in our software stack.
*Develop and improve operational documentation while working within the DevOps team to improve the supportability of Production.
*3+ years DevOps / Network Engineering / Application Engineering / Operations Engineering experience.
*2+ years' practical scripting and/or development experience in shell, Python and/or other automation tools.
*Understand and troubleshoot networking problems, configuration, and application workflow changes.
*Experience with Incident Management practices and processes.
*Experience with monitoring, metrics, and logging tools such as New Relic, Telegraf, Grafana or Nagios/Icinga.
*Experience working in a Linux environment and utilizing infrastructure in Amazon Web Services.