Verticalmove

Site Reliability Engineer - Dev-Ops/Systems Engineering



Ted Staeb Director of Talent Acquisition

Phone (Work): 415-795-7320
Email: ted@verticalmove.com



Job Info



Category Dev-Ops/Systems Engineering
Employment Type Full-Time Employment
Compensation $0.00 - $0.00
Location United States, CA - 94103


Client Introduction



Our client's creators have a truly impactful past, ranging from the "like" you use on Facebook to having been a driving force behind the creation of Google Maps. They have been featured in Forbes, TechCrunch, The New York Times, and Fortune Magazine!

Their next product, which was acquired by one of the top 5 SaaS companies in the US and the #1 most innovative company worldwide according to Forbes, is now industry-leading productivity software used worldwide.

Their SF-based Site Reliability Engineering (SRE) team makes sure that our product is fast, stable, always available, and well-insulated from unwanted outages or surprises.

Our team focuses on keeping our product running smoothly, consistently, and without manual toil, in both public cloud services and bare-metal on-premises environments. We continually improve our technology and processes to create easy-to-understand, robust scaling, observability, and automation. SRE works closely with the rest of engineering to keep shipping new features that delight our customers, quickly and with little risk. We work to continually reduce the complexity that comes with running large, feature-rich applications over the Internet, especially for enterprise customers.

We know we have a keen product, and a fantastic place to work, and we're looking for some great people to join our SRE team to keep everything that way!


Job Description



You would get the opportunity to:

• Continually evolve our operational reliability and simplicity, responding to changes in environment, requirements, or circumstances.
• Maintain our observability and automation at the level where we need it to be, by extending existing infrastructure, setting up open source projects, or even developing custom solutions when necessary.
• Investigate and repair bugs, mysterious occurrences, and production issues throughout the entire system, in concert with product and infrastructure engineers.
• Champion operational excellence and production quality across the entire company, via production readiness reviews, system refactoring projects, and leading by personal example.


Job Responsibilities



Skills and technologies you'd use (and learn or improve!) here:
• Building efficient scalable products on top of public or privately-run cloud services.
• Understanding, modifying, and writing Python code, both in our product codebase, and in supporting infrastructure for automation.
• Using configuration and orchestration tools to create repeatable, auditable, documented-in-code systems.
• Monitoring, tuning, and administering SQL databases for scalability, reliability, and performance.
• Designing systems for the sweet spot of long-term scale and reliability, while keeping manual maintenance and complexity costs down, and still shipping at reasonable speeds.


Experience





Required Experience



Things we're looking for in people who we want to join us:
• Keen interest in keeping a holistic view of entire systems in mind: patterns, architectures, data flows, lifecycles, edge cases, and risks.
• Excited about continually reducing complexity, and creating systems that are easily understandable, repeatable, and observable.
• Convinced of the importance of communication (both verbal and written/online), close team collaboration, and sharing information with others (creating documentation, or in-person training).
• Eager to learn best-in-field design and engineering practices from coworkers with a wealth of skills and experience, and to add your unique mark to what we're building.
• Drawn to understand (at a rough level) the basic skeleton of the stacks on which your system operates, from both network (SSL/TLS, HTTP, DNS, TCP, IP, CIDR, local networking, global routing) and host (process/daemon, system library, process supervisor, binary packaging, UNIX distribution, kernel) perspectives.
• Committed to focusing on the priorities and needs of our customers, your coworkers, and the direction of the company in general, and aligning your strategic goals to benefit them.

Bonus points (not at all required, but would let you hit the ground running!):
• Experience running medium-to-large user-facing services on public cloud services, particularly AWS.
• Experience running, scaling, tuning, and debugging production SQL databases, particularly MySQL on AWS RDS.
• Experience with configuration and orchestration management tools, particularly Terraform and Docker.
• Experience with and opinions about modern best-practice observability/debugging/logging/monitoring stacks.
• Comfortable writing Python (specifically, Python 3) scripts and libraries from scratch, and modifying existing code.
• Comfortable enough with JavaScript and CSS to understand and modify our web interfaces for internal-facing tools.


Required Education



Bachelor's degree or above in Computer Science or a similar field
