Principal Site Reliability Engineer (SRE)

Job Description

  • India

About the Role

  • As the Principal SRE, you will lead platform-first initiatives to ensure the scalability, reliability, and performance of our technology platform, and you will play a pivotal role in improving the availability of our critical systems and services.

What will you do?

  • Lead and drive platform-first initiatives, with a focus on scalability, reliability, and performance of our technology platform.
  • Design, build, and maintain robust infrastructure supporting our distributed systems, leveraging technologies such as Kubernetes, Kafka, Postgres, Cassandra, and Redis.
  • Implement monitoring and alerting systems to guarantee high availability and performance, with a dedicated focus on SLA and availability metrics.
  • Collaborate with engineering and operations teams to identify critical components and systems requiring enhanced availability measures.
  • Design and implement strategies, tooling, and processes to enhance system uptime and reliability.
  • Continuously evaluate and recommend improvements to platform infrastructure and processes, enhancing efficiency and reliability.
  • Align the platform with customer needs and business goals by working closely with cross-functional teams.
  • Develop and maintain CI/CD pipelines for seamless deployment and release management.

What makes you a match?

  • Proven expertise in software development and engineering, with a strong emphasis on building large-scale distributed systems.
  • Proficiency in at least one programming language commonly used for building distributed systems, such as Golang, Java, or Python.
  • Extensive experience with cloud infrastructure providers (AWS, Azure, or GCP) and developing distributed systems using cloud services.
  • Strong expertise in container orchestration platforms, specifically Kubernetes. CKA certification is a plus.
  • Exceptional problem-solving skills and a passion for developing robust, scalable, and secure solutions.
  • Excellent communication skills to effectively collaborate with cross-functional teams.
  • Ability to share impactful tech stories, demonstrating the results of your technical contributions.

Skills

  • Golang
  • Python
  • Java
  • Kubernetes
  • AWS
  • Distributed Systems

Education

  • Master's Degree
  • Bachelor's Degree

Job Information

Job Posted Date

Oct 12, 2023

Experience

5-10 Years

Compensation (Annual in Lacs)

Best in the Industry

Work Type

Permanent

Type Of Work

8-hour shift

Category

Information Technology

Copyright © 2022 All Rights Reserved. Saas Talent