Systems Reliability Engineer

Job Description

About The Role

We are looking for a well-rounded, customer-facing professional to join our SRE team. The right candidate will be passionate about helping customers deploy, maintain, and troubleshoot distributed services and applications in cloud and on-premises infrastructure. You will be responsible for understanding how the services in the ThoughtSpot stack relate to each other, and you will use a breadth of tools and approaches to solve a broad spectrum of problems. You are the ideal candidate if you take ownership of customer issues and see problems through to resolution.

What You'll Do

  • Take a customer-first approach to troubleshooting, debugging, and diagnosing product issues.
  • Ensure prompt and accurate updates, meet SLAs and provide timely resolution to customer issues.
  • Create knowledge-base articles to document knowledge and help customers self-serve.
  • Understand the requirements and nuances of data centers and public cloud (VMware, AWS, Azure, GCP) features.
  • Maintain, monitor, and troubleshoot ThoughtSpot cloud infrastructure.
  • Work with Engineering teams to define and implement tools to enhance debuggability, supportability, availability, scalability, and performance.
  • Be an expert in cloud and on-premise infrastructure by developing automation and best practices.
  • Support a 24x7x365 organization by working rotational shifts and taking on-call responsibilities.
  • Understand cloud NetOps and SecOps aspects.

What You'll Bring

  • 3-9 years of relevant work experience troubleshooting Linux systems.
  • Experience in virtualization and cloud technologies.
  • Experience in enterprise customer support, on-call rotation for critical SRE systems, leading incident review and root cause analysis.
  • Ability to diagnose technical problems and work with Engineering on escalated issues.
  • Strong problem-solving skills, algorithmic thinking, and a strong foundation in how systems should work.
  • Understanding of the tools and frameworks required to operate and manage cloud infrastructure.
  • Strong customer service skills.
  • Solid communication skills and ability to work independently.
  • Ability to leverage automation, monitoring and data analysis to ensure high availability.

Additional Skills

  • Familiarity with programming languages like C/C++, Python, Go or Java.
  • Exposure to infrastructure and service monitoring tools.

Skills

  • Linux systems
  • Cloud technology
  • C/C++
  • Python
  • Reliability Engineering
  • Root Cause
  • Problem Solving

Education

  • Master's Degree
  • Bachelor's Degree

Job Information

Job Posted Date

Sep 23, 2024

Experience

2 to 10 Years

Compensation (Annual in Lacs)

₹ Market Standard

Work Type

Permanent

Type Of Work

8 hour shift

Category

Information Technology
