Senior Site Reliability Engineer

Job Description

The driving force behind our award-winning Data Retention platform at Mimecast
Dive into the forefront of innovation with our Data Retention engineering team, taking on a crucial operations role in developing the operational aspects of our archiving and security software and its associated platforms. Our team works with cutting-edge solutions that empower Mimecast customers to Work Protected™ at large scale.

Why Join Our Team?
“If you are interested in tackling complex problems with ingenuity and implementing new ideas to build and scale reliable, high-performing software in private and public cloud environments, then the Data Retention team is for you. We deal with data ingestion, backup, and search for our e-discovery and compliance customers.” – Hiring Manager

What You'll Do
As a Site Reliability Engineer within the Data Retention and managed services team, you'll play an integral role in ensuring our code, tools, and deployments are consistent, high quality, and continually optimised. Your responsibilities will include:

  • Deploy, configure and manage AWS infrastructure services using IaC tools such as Terraform and CloudFormation
  • Define and implement standards, automation, and tooling to enable self-service in the cloud
  • Evolve and maintain our Kubernetes platforms in both private and public cloud environments
  • Improve overall engineering processes, deployment pipelines, and operational procedures
  • Innovate and advocate for improvements in CI/CD processes and tools
  • Mentor and guide other engineers, fostering a culture of collaboration, continuous learning, and professional growth
  • Create and maintain comprehensive technical documentation for processes and procedures
  • Provide support during critical incidents and implement preventive measures
  • Configure and tune Postgres instances for high availability and security hardening
  • Manage services and servers on private cloud and tune them for reliability and scalability

What You'll Bring

  • Domain expertise in DevOps and Site Reliability Engineering
  • Proficiency in configuring, managing, scaling, and monitoring services running on Linux-based servers
  • Hands-on experience creating reusable IaC with tools such as Terraform, CloudFormation, and Helm
  • Proven ability to create and standardise CI/CD pipelines, automation technologies, and associated tools, optimised for both private and public cloud environments
  • Experience managing container platforms such as Kubernetes, AWS EKS, ECS, and Fargate
  • Expertise in implementing observability & monitoring tooling for application and infrastructure using tools like Prometheus, OpenTelemetry, Grafana, Elastic, LogScale, and CloudWatch
  • Strong scripting & automation skills in languages commonly used for SRE/DevOps (like Bash, Python, Groovy)
  • Hands-on experience with foundational AWS services like ALB, NLB, ECS, S3, ElastiCache, IAM, CloudWatch
  • Experience with continuous delivery principles and the pragmatics of managing build pipelines, artefact repositories, zero-downtime deployment, and modern cloud best practices
  • Excellent interpersonal, communication and collaboration skills that enable you to work effectively across teams
  • Bias for action and problem solving – eagerness to take initiative and make things happen

Skills

  • AWS
  • DevOps
  • CI/CD
  • IaC
  • Kubernetes
  • SRE
  • Python

Education

  • Master's Degree
  • Bachelor's Degree

Job Information

Job Posted Date

Nov 15, 2024

Experience

5-10 Years

Compensation (Annual in Lacs)

₹ Market Standard

Work Type

Permanent

Type Of Work

8 hour shift

Category

Information Technology

Copyright © 2022 All Rights Reserved. Saas Talent