Saas Talent

Cloud Engineer II, Site Reliability Engineering

Job Description

Elevate your career journey by embracing a new challenge with Kinaxis. We are experts in tech, but it’s really our people who give us passion to always seek ways to do things better. As such, we’re serious about your career growth and professional development, because People matter at Kinaxis.

In 1984, we started out as a team of three engineers based in Ottawa, Canada. Today, we have grown to become a global organization with over 2000 employees around the world, and support 40,000+ users in over 100 countries. As a global leader in end-to-end supply chain management, we enable supply chain excellence for all industries. We are expanding our team in Chennai and around the world as we continue to innovate and revolutionize how we support our customers.

Our journey in India began in 2020 and we have been growing steadily since then! Building a high-trust and high-performance culture is important to us and we are proud to be Great Place to Work® Certified™. Our state-of-the-art office, located in the World Trade Centre in Chennai, offers our growing team space for expansion and collaboration.

Location
Chennai, India

About The Team
The Site Reliability Engineering team is responsible for the delivery, management, and monitoring of Kinaxis products and cloud infrastructure in our production offerings. We are responsible for ensuring service availability and performance to our customers globally, 7x24x365. We develop automation geared towards production operations using modern platform tooling like GitHub Actions CI/CD pipelines, Terraform, ArgoCD, and Ansible.

What you will do

Deliver customer excellence, making sure that Kinaxis meets all SLA objectives.
Apply software engineering principles to operational challenges with a focus on automation, self-healing, and monitoring solutions.
Manage the lifecycle of customer production systems: deploy, upgrade, configure, and decommission.
Triage service requests and incidents.
Excel at overcoming operational challenges.
Support workload migrations from our physical data centers to cloud environments.
Participate in an on-call rotation:
Investigate and resolve incidents.
Provide root cause analysis relating to production systems.
Work in a highly collaborative team.
Troubleshoot issues within the cloud platform.
Log defects, provide explanations for product behavior, and recommend workarounds to customer support.
Keep apprised of technology trends and identify and recommend opportunities to leverage them.

Technologies we use

Prior experience working in ITIL-based methodologies, including Incident and Change Management.
Practical experience with managing:
Applications (Windows, Linux).
Containers (Helm, Docker).
Orchestration/Automation (Kubernetes, Ansible, Terraform, Jenkins).
System monitoring and centralized logging (Datadog, Prometheus, ELK).
In-depth and proactive communication and documentation skills.
Ability to work independently, and as part of a team.
Expertise in one major cloud vendor: GCP, AWS, or Azure.
Experience adhering to all security and confidentiality practices and policies.
Must fulfill all security and confidentiality thresholds for this position (SOC 2, etc.).
Ability to plan and provide well-thought-out recommendations on technical issues.
Intermediate to advanced skills in information systems and cloud-related industry software tools, services, and hardware, along with Kinaxis product knowledge, are required.
Experience working with ticketing and self-service platforms such as ServiceNow.

What we are looking for

Bachelor's degree/diploma in Computer Science, Engineering, or equivalent related discipline.
Prior experience in an infrastructure engineering or site reliability engineering role.
3+ years of experience deploying and supporting distributed systems.
3+ years of experience with managing public cloud platforms (both console and API) like GCP, Azure or AWS.
3+ years of experience developing in Ansible, Terraform, PowerShell, Bash and Python.
Strong knowledge of system design to manage operational and reliability trade-offs.
Analytical, system, and design thinking skills with an inventive approach to work through deep, ambiguous, and progressively complex problems.
Highly adaptable and able to pivot based on prioritization and needs of the business; proactively solicits feedback to ensure alignment.
Agile and resilient in managing multiple projects with multiple sources of information.
A clear, concise, and professional communicator with the ability to present information and demonstrate knowledge to stakeholders at varying levels within the business.

Skills

Operating Systems
Cloud services
Kubernetes
Terraform
ITIL
Bash
PowerShell
Distributed Systems

Education

Master's Degree
Bachelor's Degree

Job Information

Job Posted Date

Dec 23, 2024

Experience

3 to 7 Years

Compensation (Annual in Lacs)

₹ Market Standard

Work Type

Permanent

Type Of Work

8 hour shift