Job Description
Role is remote, anywhere from India
Want to lead a global team responsible for the most important product features – availability, reliability & security? Sumo’s SRE program focuses on continual data-driven evolution and improvement of the reliability, security, and efficiency of our global-scale technological presence. We are looking for a great leader with a passion for site reliability, continuous technology improvement, and reducing the operational workload of our own engineers, as well as of our customers who leverage our products for their own monitoring and reliability.
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Sumo’s services have reliability and uptime appropriate to users' needs, as well as the ability to quickly and continuously deliver value to our customers.
Responsibilities
Reliability Program:
- Drive the program that maintains excellent uptime numbers for our services.
- Manage error budgets and associated policies for key product SLOs.
- Promote a blameless post-mortem culture combined with developer operational accountability.
- Continuously reduce operational workload for engineers by means of infrastructure improvements and automation.
Cost Efficiency Program:
- Carry out projects that actively reduce our AWS spend.
- Manage AWS resource reservations for our whole infrastructure.
- Observe our current spend on cloud resources and improve our cost monitoring ecosystem.
Application Security Program:
- Help product teams develop secure applications for the Sumo Logic platform.
- Integrate and implement solutions improving Sumo Logic’s security posture.
- Lead security reviews and penetration tests at design and implementation stages.
- Partner with the Security Operations Center (SOC) and Compliance team on our security and compliance posture, vulnerability management, and threat modeling of our tech stack.
- Educate product teams on secure development best practices and Quality Engineering teams on continuous improvement of security testing.
Team Leadership
- Lead and grow a global team of SREs adept at building extremely high-volume, fault-tolerant, efficient, and scalable backend systems.
Technical Vision
- Partner with our technical leadership team to review technology choices on an ongoing basis, in anticipation of increased scale and ever-evolving technology, to meet the demands of a growing business. Leverage technical skills to analyze and improve the efficiency, scalability, and reliability of our backend systems.
Required Qualifications And Skills
- B.S. in Computer Science or a related discipline (M.S. or Ph.D. is a plus).
- 8+ years of industry experience with a proven track record of ownership, delivery, and operational excellence.
- 3+ years in a management role.
- Experience being responsible for key SLOs of a cloud-based SaaS: availability, uptime, performance, and security.
- Experience in multi-threaded programming and distributed systems.
- Object-oriented programming experience, for example in Java, Scala, or Go.
- Experience with high volumes of data using the latest technologies such as Kafka, Kubernetes and Docker.
- Agile software development experience (test-driven development, iterative and incremental development). Experience in big data and/or 24x7 commercial service is highly desirable.
- Hands-on experience with public cloud Infrastructure-as-a-service and Platform-as-a-service offerings - Amazon Web Services, Google Cloud Platform, etc.
About Us
Sumo Logic, Inc. empowers the people who power modern, digital business. Sumo Logic enables customers to deliver reliable and secure cloud-native applications through its Sumo Logic SaaS Analytics Log Platform, which helps practitioners and developers ensure application reliability, secure and protect against modern security threats, and gain insights into their cloud infrastructures. Customers worldwide rely on Sumo Logic to get powerful real-time analytics and insights across observability and security solutions for their cloud-native applications. For more information, visit www.sumologic.com.