Saas Talent

Cloud Operations Engineer 2

Job Description

About The Role
As a Cloud Operations Engineer 2, you'll play a pivotal role in ensuring the reliability, scalability, and performance of our systems. You will work closely with cross-functional teams to identify and resolve infrastructure issues, optimize system performance, and implement automation solutions to streamline processes. This role presents an exciting opportunity for individuals with 4+ years of experience in site reliability engineering to further develop their skills and make a significant impact in a fast-paced environment.

This opportunity is hybrid (Bangalore-based), with 3 days in the office and 2 days working from home.

What You'll Do

Troubleshooting and resolving issues: Identifying and addressing technical issues, such as performance bottlenecks or system failures, to minimize downtime and maintain service continuity.
Automation and scripting: Developing and implementing automation scripts and tools to streamline operational tasks, improve efficiency, and enhance scalability.
Collaborate with software development and operations teams to design, implement, and maintain highly available and scalable infrastructure solutions.
Monitor system performance, troubleshoot issues, and implement proactive measures to prevent downtime or service disruptions.
Develop and maintain tools for automation, monitoring, and deployment to improve efficiency and reliability.
Participate in incident response and post-mortem analysis to identify root causes and implement preventive measures.
Continuously evaluate and optimize system performance through capacity planning, performance tuning, and infrastructure upgrades.
Stay updated with industry best practices and emerging technologies to drive innovation and improve system reliability.
Work in a 24/7/365 service environment with on-call responsibilities.
Be willing to work in 2 shifts to ensure coverage and overlap with both European and United States time zones. Flexibility in working hours may be required to participate in on-call rotations and address critical issues outside regular business hours.

What You'll Bring

Bachelor's degree in Computer Science, Engineering, or a related field.
4+ years of experience in site reliability engineering, infrastructure operations, or related roles.
Proficiency in scripting and automation using languages such as Python, Shell, or Go.
Hands-on experience with cloud platforms such as AWS, Azure, or GCP.
Strong understanding of Linux/Unix systems and networking concepts.
Experience with containerization technologies like Docker and orchestration tools like Kubernetes.
Knowledge of monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or similar.
Excellent problem-solving skills and the ability to thrive in a fast-paced, collaborative environment.
Strong communication skills with the ability to effectively interact with cross-functional teams.

Bonus Points

Certification in cloud technologies (e.g., AWS Certified Solutions Architect, Azure Administrator Associate).
Experience with a container orchestration platform (e.g., Kubernetes).
Experience with infrastructure as code tools such as Terraform, Ansible, or Puppet.
Familiarity with CI/CD pipelines and version control systems like Git.

Skills

Python
Shell Scripting
Go
Cloud platform
Docker
SRE

Education

Master's Degree
Bachelor's Degree

Job Information

Job Posted Date

Oct 18, 2024

Experience

4 to 8 Years

Compensation (Annual in Lacs)

₹ Market Standard

Work Type

Permanent

Type Of Work

8-hour shift