Staff Site Reliability Engineer

Job Description

Get to know Okta

Okta is The World’s Identity Company. We free everyone to safely use any technology—anywhere, on any device or app. Our Workforce and Customer Identity Clouds enable secure yet flexible access, authentication, and automation that transforms how people move through the digital world, putting Identity at the heart of business security and growth.

At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box - we’re looking for lifelong learners and people who can make us better with their unique experiences.

Join our team! We’re building a world where Identity belongs to you.

As a Staff Site Reliability Engineer you will champion all things pertaining to reliability at Okta for Auth0. Working closely with the Product Engineers, Quality Engineers, Platform Engineers and Architecture teams, your primary focus will be on ensuring production systems remain operational at all times, while continually setting and achieving long-term performance, reliability and scalability goals in a platform with an exponential growth plan for the coming years.

With Okta’s increased dedication to ensuring customer availability expectations are exceeded in every way, you will play a key role as we evolve our system architecture to meet the demands of enormous growth and support the hundreds of millions of users who rely on us to provide uninterrupted access to business-critical enterprise and consumer applications.

Skills

  • Exceptional communication skills, including technical writing in the English language
  • Systematic problem-solving approach, coupled with a strong sense of ownership and drive
  • Understanding of microservices, cloud infrastructure (AWS, Azure), databases (SQL, NoSQL, key/value), containers (Docker, Kubernetes), web technologies (WebSockets, HTTP) and networking (SSL, routing, VPN)
  • Live and breathe SLIs, SLOs, error budgets and SLAs
  • Strong belief in automating everything and reducing toil for yourself and teammates
  • Enjoys working as part of a team, and can also work effectively in a remote environment where tasks may be self-driven

Responsibilities

  • Work with other teams to run, own and improve incident response processes
  • Participate in regular on-call rotations to ensure 24/7 coverage of all critical systems
  • Use existing monitoring tools to identify problems and resolve and/or escalate to service teams
  • Implement changes to enable or improve infrastructure resilience, monitoring, and alerting

Experience

  • 7+ years as a Site Reliability Engineer or in a Cloud Operations/DevOps role
  • 6+ years using Go, shell scripting and Terraform
  • 2+ years as a software developer in a SaaS environment
  • 4+ years in a production environment supporting large-scale, mission-critical applications

Skills

  • Cloud Operations
  • DevOps
  • Site Reliability Engineering
  • SaaS
  • Software Development

Education

  • Master's Degree
  • Bachelor's Degree

Job Information

Job Posted Date

Feb 25, 2025

Experience

7 to 10 Years

Compensation (Annual in Lacs)

Best in the Industry

Work Type

Permanent

Type Of Work

8 hour shift

Category

Information Technology
