Job Description
Have you ever wanted to be on the ground floor of a well-funded, rapidly growing global startup that is disrupting the grocery industry? We are a dedicated team of professionals with a passion for grocery, and we help grocers thrive by making sure our team at Takeoff thrives. Our core values drive our decisions every day. We foster an accessible, approachable, and supportive environment, work together to reach new milestones, and motivate each other towards excellence. Our team is on a mission to transform the grocery industry for the better.
Are you looking to make an impact daily and help us disrupt a 100-year-old industry? If so, please continue reading!
Takeoff Technologies, a Massachusetts-based tech company, is the creator of the world’s first automated micro-fulfillment center (MFC), which transforms the way people access groceries. Our solution provides retailers with the most cost-efficient way to fulfill their online grocery orders, using automated, hyperlocal micro-fulfillment centers.
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that our services, both internally critical and externally visible, have reliability and uptime appropriate to users' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on the capacity and performance of our systems. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation.
*This is a hybrid role: some time is spent in our Pune office and some time is spent working remotely.
Requirements:
- Lead designs of major software components, systems, and features to improve the availability, scalability, latency, and efficiency of our services.
- Lead sustainable incident response, blameless postmortems, and production improvements.
- Provide guidance to other team members on managing end-to-end availability and performance of business-critical services, building automation to prevent problem recurrence, and building automated responses for applicable service conditions.
- Mentor and train other team members on design techniques and coding standards, and cultivate innovation and collaboration across multiple teams.
- Design, build, and maintain CI/CD, testing, and operations infrastructure for our systems.
- Manage individual project priorities, deadlines, and deliverables.
Basic Qualifications:
- Bachelor's degree in Computer Science or a related field
- 8+ years of experience in an SRE, Systems Engineering, or DevOps role
- 4+ years of hands-on experience working with cloud technologies (GCP preferred)
- 4+ years of hands-on experience in one or more programming languages (Python or Go preferred)
- 3+ years of hands-on experience in Unix/Linux platforms
- 3+ years of experience both building end-to-end automated CI/CD pipelines and providing application and operations support
- Experience with container technology, including Kubernetes and Docker
- Experience provisioning infrastructure through IaC (preferably Terraform) and applying cloud automation principles
- Experience with configuration management and with monitoring tools such as ELK, Grafana, and Datadog
Preferred Qualifications:
- Knowledge of automation tools such as Puppet, Chef, Ansible, or Salt in a production environment
- Familiarity with NoSQL technologies
- Good understanding of networking and related protocols, with a strong grasp of the fundamentals (HTTP, DNS, TLS)