Saas Talent

Senior Site Reliability Engineer

Job Description

Pune

Have you ever wanted to be on the ground floor of a well-funded, rapidly growing global startup that is disrupting the grocery industry? We are a dedicated team of professionals with a passion for grocery, and we help grocers thrive by making sure our team at Takeoff thrives. Our core values drive our decisions every day. We foster an accessible, approachable, and supportive environment, work together to reach new milestones, and motivate each other toward excellence. Our team is on a mission to transform the grocery industry for the better.

Are you looking to make an impact every day and help us disrupt a 100-year-old industry? If so, please continue reading!

Takeoff Technologies, a Massachusetts-based tech company, is the creator of the world's first automated micro-fulfillment centers (MFCs), which transform the way people access groceries. Our solution gives retailers the most cost-efficient way to fulfill their online grocery orders, using automated, hyperlocal micro-fulfillment centers.

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that our services, both our critical internal systems and our externally visible ones, have the reliability and uptime appropriate to users' needs and a fast rate of improvement. SREs also keep a close eye on the capacity and performance of our systems. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating manual work through automation.

*This is a hybrid role, with time split between our Pune office and remote work.

Responsibilities:

Lead the design of major software components, systems, and features to improve the availability, scalability, latency, and efficiency of our services.
Lead sustainable incident response, blameless postmortems, and production improvements.
Provide guidance to other team members on managing end-to-end availability and performance of business-critical services, building automation to prevent problem recurrence, and building automated responses for applicable service conditions.
Mentor and train other team members on design techniques and coding standards, and cultivate innovation and collaboration across multiple teams.
Design, build, and maintain CI/CD, testing, and operations infrastructure for our systems.
Manage individual project priorities, deadlines, and deliverables.

Basic Qualifications:

Bachelor's degree in Computer Science or a related field
8+ years of experience in an SRE, Systems Engineering, or DevOps role
4+ years of hands-on experience working with cloud technologies (GCP preferred)
4+ years of hands-on experience in one or more programming languages (Python or Go preferred)
3+ years of hands-on experience in Unix/Linux platforms
3+ years of experience both building end-to-end automated CI/CD pipelines and providing application and operations support
Experience with container technology, including Kubernetes and Docker
Experience provisioning infrastructure through IaC (preferably Terraform) and with cloud automation principles
Experience with configuration management and with monitoring tools such as ELK, Grafana, and Datadog

Preferred Qualifications:

Knowledge of automation tools such as Puppet, Chef, Ansible, or Salt in a production environment
Familiarity with NoSQL technologies
Good understanding of networking and related protocols, with a strong grasp of the fundamentals (HTTP, DNS, TLS)

Skills

SRE
Systems Engineering
DevOps
Python
Go
Linux/Unix
CI/CD
Kubernetes

Education

Master's Degree
Bachelor's Degree

Job Information

Job Posted Date

Apr 24, 2024

Experience

8 to 12 Years

Compensation (Annual in Lacs)

₹ Market Standard

Work Type

Permanent

Type Of Work

8-hour shift