Senior Site Reliability Engineer

Company Description
Nexthink is the leader in digital employee experience management software. The company provides IT leaders with unprecedented insight, allowing them to see, diagnose, and fix issues at scale that affect employees anywhere, with any application or network, before employees notice the issue. As the first solution to allow IT to progress from reactive problem solving to proactive optimization, Nexthink enables its more than 1,200 customers to provide better digital experiences to more than 15 million employees. Dual-headquartered in Lausanne, Switzerland, and Boston, Massachusetts, Nexthink has 9 offices worldwide.

Job Description

  • Manage and maintain our Kubernetes clusters, including deployment, configuration, and upgrades. Ensure the stability and scalability of the clusters to accommodate increasing demands.
  • Utilize your hands-on knowledge to automate routine tasks and streamline operations. Implement infrastructure as code (IaC) practices to facilitate rapid and reliable deployments, ensuring efficient resource provisioning and management.
  • Participate in an on-call rotation, responding promptly to critical incidents and resolving them. Your commitment to running the cloud infrastructure will be crucial to maintaining high availability.
  • Continuously assess the performance of our cloud infrastructure (AWS) and applications. Implement optimizations to enhance system efficiency and reduce response times.
  • Stay current with best practices, tools, and market trends. Evaluate and recommend innovative solutions for adoption within the company.
  • Participate in incident handling.
  • Work closely with cloud architects and the team’s technical lead to validate new system architecture proposals that support new features in the cloud.
  • Proactively identify potential issues and troubleshoot system anomalies. Collaborate with other teams to address incidents and implement preventive measures to reduce downtime.
  • Set up and maintain comprehensive monitoring and alerting systems to detect anomalies, capacity constraints, and potential performance bottlenecks. Ensure timely responses to alerts and alarms.
  • Maintain accurate and up-to-date documentation of processes, procedures, and troubleshooting guides to facilitate knowledge sharing and standardization.

Qualifications

  • Bachelor’s degree in Computer Science, Computer Engineering, or a related field, or 6+ years of relevant work experience.
  • Strong hands-on experience in managing Kubernetes clusters in a production environment.
  • Excellent communication and teamwork skills.
  • Knowledge of configuration automation (Ansible), CI/CD (Jenkins), and IaC (Terraform, Crossplane) for infrastructure management, plus proficiency in at least one scripting language (Bash, Python).
  • Extensive experience in Linux container technologies (e.g., Docker, LXC).
  • Good knowledge of Linux, mainly Debian and CentOS.
  • Familiarity with source code management solutions (GitHub, Bitbucket) and the Atlassian suite (JIRA, Confluence).
  • Experience working in an on-call rotation environment and running operations.
  • Proven problem-solving skills and the ability to troubleshoot complex technical issues.
  • Deep commitment to maintaining high system reliability and availability.
  • Extensive experience with the AWS cloud computing platform and related services.
  • Strong motivation and curiosity to learn new things and discover new technologies.
  • Ability to work autonomously.
  • Knowledge of monitoring and alerting systems (e.g., ELK, Prometheus, Kibana, New Relic, Datadog, PagerDuty).
  • Professional-level spoken English.

Skills

  • Kubernetes
  • Ansible
  • AWS
  • CI/CD
  • Elastic Stack (ELK)
  • IaC
  • Infrastructure Management

Education

  • Master's Degree
  • Bachelor's Degree

Job Information

Job Posted Date

Oct 08, 2024

Experience

6 to 8 Years

Compensation (Annual in Lacs)

₹ Market Standard

Work Type

Permanent

Type Of Work

8 hour shift

Category

Information Technology

Copyright © 2022 All Rights Reserved. Saas Talent