Cloud Reliability Engineer - Ops & Automation

Job Description

We are looking for a Cloud Reliability Engineer to join our team and maintain the reliability and availability of our cloud-based infrastructure. The ideal candidate will work in shifts and handle on-call duties, managing incidents and change requests within defined SLAs to keep cloud operations running smoothly. The role also involves automating operational tasks, reducing manual intervention, and improving overall efficiency.

Responsibilities

  • Monitor and manage cloud-based infrastructure to ensure high availability, performance, and security.
  • Respond to alerts and incidents, troubleshooting and resolving issues swiftly to minimize downtime.
  • Perform root cause analysis and post-incident reviews to improve system reliability and prevent future incidents.
  • Handle change requests within established SLAs, ensuring seamless updates to the production environment.
  • Participate in a shift-based schedule and on-call rotation to support critical infrastructure.
  • Collaborate with Engineering and Field teams to resolve service requests in a timely manner.
  • Automate routine operational tasks to reduce manual interventions and operational toil.
  • Identify opportunities for further automation in cloud operations and implement solutions to streamline processes.
  • Assist in the optimisation and maintenance of monitoring and alerting systems for cloud environments.

Required Skills/Qualifications

  • Bachelor’s degree in Computer Science, Information Technology, or a related field, or equivalent work experience.
  • 3-5 years of experience in cloud operations, system administration, or related fields.
  • Familiarity with cloud platforms such as AWS, GCP, or Azure.
  • Experience in automating operational tasks using scripting languages (e.g., Python, Bash).
  • Strong problem-solving skills, particularly in managing incidents under pressure.
  • Understanding of ITIL processes, including incident and change management.
  • Familiarity with monitoring tools and incident management platforms.
  • A proactive mindset focused on improving operational processes and reducing manual work through automation.

Preferred Skills

  • Experience with cloud-native tools (e.g., CloudWatch, Stackdriver) and automation frameworks.
  • Basic knowledge of containers and Kubernetes (EKS/GKE/AKS).
  • Familiarity with Linux systems, cloud networking, and troubleshooting.
  • Experience with CI/CD pipelines and DevOps tools for automation and infrastructure as code.
  • Interest in identifying and implementing automation to reduce operational toil and improve efficiency.

Working Conditions

  • Shift-based work with on-call responsibilities.
  • Fast-paced, collaborative environment with an emphasis on automation and cloud operations.

Skills

  • Cloud Operations
  • System Administration
  • Cloud Platforms
  • Python
  • Bash
  • Linux Systems
  • CI/CD

Education

  • Master's Degree
  • Bachelor's Degree

Job Information

Job Posted Date

Sep 23, 2024

Experience

3 to 5 Years

Compensation (Annual in Lacs)

₹ Market Standard

Work Type

Permanent

Type Of Work

8 hour shift

Category

Information Technology

Copyright © 2022 All Rights Reserved. Saas Talent