Job Description
Company Name: Asimily
Website: https://asimily.com/
About Company
Asimily is an IoT analytics startup focused on solving security and operational use cases for connected devices in specific verticals (healthcare, buildings, industrial control systems, etc.). We are funded by a top-tier investor, have built the product, and are working with customers. The founders have deep experience in products, the market, and technology, alongside a strong Engineering leader. One of the founders has run the connected device business unit at a Fortune 500 company.
We are looking for a Site Reliability Engineer to ensure the reliability, scalability, and performance of our systems and applications. You will design, implement, and maintain robust infrastructure and monitoring solutions, and collaborate cross-functionally to identify and resolve performance issues and drive system improvements.
Role: Site Reliability Engineer (Rotation US Shift)
Experience: 4 to 7 Years
Location: Pune - Hadapsar (Hybrid)
Work Time: Rotation US Shift (7:00 pm to 2:00 am IST)
Role & Responsibilities of the Site Reliability Engineer
- Incident Response and Observability: Handle on-call duties, troubleshoot Linux systems, and work with monitoring tools like Prometheus/Grafana.
- Operational Efficiency: Lead efforts to maintain the high availability of multiple products in a multi-region cloud environment, focusing on improving operational performance.
- Automation: Develop tools and automation to enhance operational efficiency and reduce issue resolution times.
- Cloud Operations: Set up and maintain automation runbooks and tools for monitoring and maintaining cloud-based applications.
- Remote Work and On-call Duties: Participate in regular on-call responsibilities and contribute to our startup environment across global time zones.
Key Responsibilities:
- Architect, build and maintain scalable infrastructure on GCP, AWS, and on-premises environments.
- Develop and implement monitoring and alerting solutions for hybrid infrastructure.
- Use Ansible for configuration management to automate deployment and scaling processes.
- Collaborate with cross-functional teams to establish best practices for system reliability and performance.
- Participate in bi-weekly on-call rotations, responding to critical incidents during India and USA daytime shifts.
Qualifications and Skills:
- Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience.
- Proven experience as a Site Reliability Engineer or similar role with hybrid cloud and on-premises infrastructure.
- Strong proficiency in a programming language (e.g., Python, Java) and scripting/automation expertise.
- Deep understanding of cloud platforms, particularly GCP, with hands-on experience in services like Compute Engine, Cloud Storage, and Pub/Sub.
- Experience with Docker and container orchestration solutions (Kubernetes experience is a plus but not required).
- Proficiency in Ansible for automating deployment and configuration tasks.
- Familiarity with monitoring and logging tools like Prometheus, Grafana, and the ELK stack.
- Understanding of networking principles and protocols (TCP/IP, DNS, HTTP, load balancing, VPNs).
- Experience with Unix-based operating systems, particularly Linux.
- Working experience with version control systems like Git.
- Knowledge of Google Cloud Infrastructure and developing Google Cloud CLI scripts is an advantage.
- Experience with Postgres database management.
- Strong problem-solving skills and the ability to troubleshoot complex issues.
- Effective communication and collaboration skills, especially in remote teams.
Preferred Qualifications:
- Relevant certifications such as Google Cloud Professional Cloud Architect.
- Previous experience working in a globally distributed on-call environment with hybrid cloud and on-premises infrastructure.
- Knowledge of infrastructure security best practices and tools, including IAM and security groups specific to GCP.
- Experience with CI/CD toolsets such as Jenkins or GitLab.
- Experience with project management tools such as Asana is an added advantage.
Soft Skills:
- Comfortable working in a fast-paced and dynamic environment, willing to collaborate diligently in a cross-functional, multi-geo team setup to meet project deadlines.
- Demonstrates patience and tolerance when troubleshooting issues.
- Exhibits excellent communication skills (written, verbal, & virtual) and has a strong drive, self-motivation, logical thinking, and attention to detail.
- Passionate about adopting new technologies, software, and processes, with the ability to multitask effectively in a fast-paced environment with multiple deadlines.