Job Description

Who we are

Mindtickle is the market-leading revenue productivity platform that combines on-the-job learning and deal execution to get more revenue per rep. Mindtickle is recognized as a market leader by top industry analysts and is ranked by G2 as the #1 sales onboarding and training product. We’re honoured to be recognized as a Leader in the first-ever Forrester Wave™: Revenue Enablement Platforms, Q3 2024!

Job Snapshot

As an SRE II, you will play a key role in ensuring the reliability, performance, and scalability of our mission-critical systems. You will work closely with engineering teams to design, implement, and maintain infrastructure that supports high-volume, data-intensive applications. Your expertise in monitoring, troubleshooting, and automation will drive operational excellence across our distributed environment.

What’s in it for you?

    • Maintain and improve the reliability, availability, and performance of high-volume, data-intensive applications.
    • Design, implement, and enhance monitoring, logging, and alerting solutions at scale.
    • Collaborate with development teams to optimize system architecture and reliability.
    • Manage and troubleshoot distributed systems in a Linux-based production environment.
    • Leverage AWS cloud services to scale infrastructure efficiently.
    • Utilize Kubernetes for container orchestration, ensuring optimal resource utilization and deployment strategies.
    • Implement CI/CD pipelines using GitLab to automate deployments and operational tasks.
    • Use infrastructure as code (IaC) tools such as Terraform and CloudFormation for provisioning and managing cloud resources.
    • Implement observability best practices using Grafana, Prometheus, Thanos, and Loki.
    • Perform root cause analysis (RCA) and proactively address performance bottlenecks and system failures.
    • Ensure security best practices and compliance across all infrastructure components.

We’d love to hear from you, if you:

    • Have 3+ years of experience in Site Reliability Engineering or related fields.
    • Possess strong Linux fundamentals with a deep understanding of system internals.
    • Have expertise in troubleshooting and problem-solving in distributed environments.
    • Have hands-on experience with logging and monitoring solutions at scale.
    • Are proficient in at least one programming language (preferably Python).
    • Have strong experience with AWS services and Kubernetes.
    • Have exposure to CI/CD pipelines, preferably using GitLab CI/CD.
    • Have experience with infrastructure as code (Terraform, CloudFormation).
    • Are familiar with observability tools such as Grafana, Prometheus, Thanos, and Loki.

Preferred Qualifications

    • Experience in performance tuning and capacity planning.
    • Knowledge of incident management and post-mortem analysis processes.
    • Familiarity with security best practices in cloud environments.
    • Experience in automating operational tasks using scripting and configuration management tools.

Skills

  • SRE
  • AWS
  • CI/CD
  • Python
  • Grafana
  • CloudFormation

Education

  • Master's Degree
  • Bachelor's Degree

Job Information

Job Posted Date

Apr 14, 2025

Experience

3 to 7 Years

Compensation (Annual in Lacs)

Best in the Industry

Work Type

Permanent

Type Of Work

8-hour shift

Category

Information Technology
