Staff Site Reliability Engineer

Date: May 1, 2024

Location: Stockholm, SE, 111 23

Company: Optimizely

At Optimizely, we're on a mission to help people unlock their digital potential. We do that by reinventing how marketing and product teams work to create and optimize digital experiences across all channels. With Optimizely One, our industry-first operating system for marketers, we offer teams flexibility and choice to build their stack their way with our fully SaaS, fully decoupled, and highly composable solution.  


We are proud to help more than 10,000 businesses, including H&M, PayPal, Zoom, and Toyota, enrich their customer lifetime value, increase revenue and grow their brands. Our innovation and excellence have earned us numerous recognitions as a leader by industry analysts such as Gartner, Forrester, and IDC, reinforcing our role as a trailblazer in MarTech. 

 

At our core, we believe work is about more than just numbers -- it's about the people. Our culture is dynamic and constantly evolving, shaped by every employee, their actions and their stories. With over 1,500 Optimizers spread across 12 global locations, our diverse team embodies the "One Optimizely" spirit, emphasizing collaboration and continuous improvement, while fostering a culture where every voice is heard and valued.

 

Join us and become part of a company that's empowering people to unlock their digital potential! 

Introduction

SREs at Optimizely are focused on making us the most reliable, performant, and trustworthy Digital Experience Optimization platform ever. Our engineering teams have built data pipelines that process 10 billion events daily and applications that support powerful experimentation and collaboration workflows at scale.

Our platforms are built on AWS and GCP. We use technologies such as Kafka, Samza, HBase, MySQL, and Postgres. We build and manage our systems using TravisCI, Jenkins, Docker, Kubernetes, Terraform, and Chef.

We use a combination of managed and self-hosted approaches. This is a unique opportunity to lead the engineering organization in areas of standardized automated infrastructure and service provisioning and orchestration, service-oriented architectural excellence, and forward-looking planning and execution of large technical projects.

Job Responsibilities

Assist with defining a roadmap for all engineering teams to utilize fully automated, self-service, highly scalable, cost-efficient, observable, auditable and reliable infrastructure services as standard practice.
Work on the execution of this roadmap across the engineering organization, collaborating with SREs and senior engineers across engineering while also performing hands-on work on the most critical challenges.
Provide expert technical guidance and ongoing engineering design review to teams planning and implementing large migrations, service-oriented architecture, broad architectural shifts, and capacity growth.
Build a metrics-driven operational culture standardizing our practices for SLO definition and review as well as for logging, monitoring, alerting, and on-call practices.
Make iterative improvements to blameless incident management processes, root cause analyses, outage prevention, and service recovery strategies across the engineering organization.
Partner closely with security, quality, and product teams to achieve high priority security, privacy, compliance, reliability and business-continuity objectives on our overall roadmap.
Propose and drive large improvements to production systems that have a significant impact on our business and engineering teams.
Mentor and coach engineers to be curious and effective at discovering and solving technical challenges.
Participate in the SRE 24/7/365 on-call rotation.

Knowledge and Experience

You have proven experience (7-10 years) demonstrating hands-on technical leadership and business impact in combining software engineering skills with systems engineering skills to solve complex automation and reliability challenges.
You have deep technical experience with various cloud providers, containerization technologies, automated deployment frameworks, orchestration frameworks, monitoring, logging, alerting, system internals, networking, databases, distributed systems, and service-oriented architecture.
You have the skills to implement load, stress, performance and reliability testing standards at scale to improve service, platform and infrastructure resiliency.
You promote openness, diversity of opinions and inclusive discussions at all times to evaluate a wide variety of ideas and perspectives in solving challenging problems.
You demonstrate clear decision making and good trade-offs in complex situations comprising multiple opinions, needs, teams, technologies, cloud providers, and architectural settings.
Multi-cloud experience (AWS, GCP, and Azure).
Monitoring expertise with DataDog, New Relic, or Nagios; CDN experience is highly desirable.
Expertise in AWS IAM, networking, security, and architecture, along with general AWS expertise, is a must.
You communicate effectively with stakeholders ranging from executives to junior engineers across the breadth and depth of the engineering organization.
You exemplify high accountability, integrity, and resilience to maintain focus on both big-picture goals and the milestones to get there.
You enable the engineering organization to innovate and deliver with greater speed and safety.
Proficiency in more than one programming language or infrastructure automation tool including any of: Python, Java, Bash, Terraform, Chef, or similar.
Monitoring expertise (any of DataDog, New Relic, Nagios, Honeycomb, or similar).
Experience with the ELK stack for centralized logging.
The ability to proactively examine all systems, tools, processes, and architectures with an open mind, and to make recommendations on scale, reliability, availability, and automation, is key.

Education

BS in Computer Science or equivalent industry experience.

Competencies

Displaying Technical Expertise
Critical Thinking
Testing and Troubleshooting
Demonstrating Initiative
Utilizing Feedback

Optimizely is committed to a diverse and inclusive workplace. Optimizely is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.