Site Reliability Engineering Manager
10 Years of Experience
Pune, Maharashtra, India
88882*****
65
-
Not Available
With a 9-year tenure in IT, I lead a high-performing SRE team focused on streamlining processes and ensuring comprehensive observability across cloud platforms. As SRE Manager, I drive improvements in system reliability through strategic leadership. Drawing on extensive hands-on experience, I focus on tuning complex IT infrastructure for efficiency and performance. Our commitment to innovation drives us to explore emerging technologies and deliver solutions that optimize system performance.
CERTIFICATIONS:
● HashiCorp Certified: Terraform Associate (003)
● AWS Certified Solutions Architect – Associate
● Gremlin Certified Chaos Engineering Practitioner
● Red Hat Certified System Administrator
● ITIL® Foundation Certificate in IT Service Management
● All 4 PagerDuty Certifications
● 5 Sumo Logic Certifications
TECHNICAL SKILLS:
● Cloud: Amazon Web Services (AWS)
● Programming: Python
● Infrastructure as Code: Terraform, CloudFormation
● OS: Unix/Linux
● Scripting: Shell Scripting
● Containers and Orchestration: Docker, Kubernetes
● CI/CD: Jenkins, Git/GitLab
● Observability Tools: Prometheus, Grafana, AppDynamics, Datadog, Sumo Logic
● Incident Management: PagerDuty
● Agile Tools: Polarion, Jira, Azure DevOps
Siemens Digital Industries Software, IT Services & Solutions, Information Technology & Services
Siemens Digital Industries Software
NICE CXone
Siemens Digital Industries Software, NICE CXone, Amdocs, Tata Consultancy Services
Job Title : Site Reliability Engineering Manager
Company name : Siemens Digital Industries Software
Period : October 2023 - Present
Summary : ● Defining and monitoring Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to measure performance and reliability accurately, and managing error budgets to balance innovation with reliability.
● Leading automation initiatives that streamline operational workflows, improve SRE team efficiency, and minimize manual intervention.
● Implementing robust monitoring practices (metrics, logs, traces, synthetic monitoring, RUM, APM) to establish comprehensive observability and detect anomalies quickly, ensuring system reliability.
● Orchestrating incident management processes to mitigate service disruptions promptly, with a focus on reducing Mean Time to Acknowledge (MTTA) and Mean Time to Resolve (MTTR). Conducting postmortems and capacity planning to prevent recurrence.
● Overseeing change management processes to minimize disruptions and uphold reliability in production systems.
● Architecting scalable and reliable distributed systems, advocating immutable infrastructure practices, and promoting DevOps methodologies to improve software delivery and reliability.
● Advocating for chaos engineering practices to systematically test system resilience and enhance overall reliability.
● Developing comprehensive disaster recovery plans to ensure business continuity, including risk identification, critical system prioritization, robust backups, and regular testing.
● Collaborating closely with security teams to embed best practices into operational workflows, implementing controls, conducting audits, and ensuring compliance.
● Investing in the professional growth of SRE team members through mentorship, training, performance evaluations, and career development plans to nurture excellence and support long-term aspirations.
● Ensuring effective communication with stakeholders, providing regular updates on system metrics, gathering feedback, and transparently addressing concerns to maintain alignment with business objectives.
Location : Pune, Maharashtra, India
Job Title : Site Reliability Engineer
Company name : Siemens Digital Industries Software
Period : September 2021 - January 2024
Summary : Responsibilities:
● Lead requirement gathering for new software-as-a-service (SaaS) products and onboard them to the site reliability engineering (SRE) practice.
● Design and implement a comprehensive SRE strategy for new SaaS products.
● Ensure high availability, performance, and reliability of systems through proactive monitoring and troubleshooting.
● Set up full stack observability tooling (infrastructure, application, business transaction, synthetic monitoring).
● Automate tasks and processes to improve efficiency and reduce toil.
● Implement SRE practices, including Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.
● Participate in incident response and postmortem analysis.
● Assume on-call responsibilities to support production systems.
● Manage and maintain infrastructure, including implementing security measures.
● Collaborate with other teams, such as development and operations, to ensure smooth system operation.
● Implement and maintain continuous integration and delivery pipelines.
● Manage and maintain documentation, including Standard Operating Procedures and Runbooks.
● Participate in capacity planning and resource optimization.
Location : Pune, Maharashtra, India
Job Title : Site Reliability Engineer
Company name : NICE CXone
Period : October 2019 - September 2021
Summary : Responsibilities:
● Implement the SRE solution for the CXone product and lead requirement gathering to identify its reliability needs.
● Develop and automate infrastructure to improve the efficiency and effectiveness of existing systems, with the aim of reducing manual work (toil).
● Manage systems at scale and ensure the reliability and high uptime of critical internal services and externally visible systems.
● Continuously monitor system capacity and performance.
● Participate in incident response and postmortem analysis.
● Assume on-call responsibilities to support production systems.
● Collaborate with and contribute to other teams within the organization.
● Assist with the deployment of new products and services.
● Facilitate communication and collaboration between the Network Operations Center (NOC) and Research and Development (R&D) teams.
● Introduce Chaos Engineering and conduct drills to test and improve the resilience of systems.
● Implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and an Error Budget to measure and improve the reliability of systems.
Location : Pune Area, India
Job Title : Site Reliability Engineer
Company name : Amdocs
Period : February 2019 - October 2019
Summary : Responsibilities:
● Provide third-level support for the Fraud View application on Unix/Windows platforms.
● Automate repetitive tasks to streamline the application and improve its efficiency.
● Troubleshoot, debug, evaluate, and resolve computer-identified alarms.
● Perform deep-dive investigations into issues and conduct root cause analysis.
● Handle change management, incident management, and problem management for the Globe Telecom service.
● Coordinate with all stakeholders to ensure timely delivery of changes and resolution of production issues.
● Create and maintain documentation for new business processes, knowledge articles, and operating procedures.
Location : Pune Area, India
Job Title : Production Support Engineer
Company name : Tata Consultancy Services
Period : September 2014 - February 2019
Summary : Responsibilities:
● Provide second-level support for enterprise service bus (ESB) applications on Unix/Windows platforms.
● Troubleshoot, debug, evaluate, and resolve computer-identified alarms.
● Perform deep-dive investigations into issues and take part in root cause analysis meetings.
● Change management, Incident management, Problem management for J
Title : HashiCorp Certified: Terraform Associate (003)
Period : September 2023 - September 2025
Summary : 61d986ac-04c4-4ea7-950f-4e698b8f3dcf, credly.com, https://www.credly.com/badges/61d986ac-04c4-4ea7-950f-4e698b8f3dcf
Issuing Authority : HashiCorp
Title : Sumo Logic Fundamentals Certified
Period : August 2022 - August 2024
Summary : 3s4nqhqfoxpg, skilljar.com, https://verify.skilljar.com/c/3s4nqhqfoxpg
Issuing Authority : Sumo Logic
Title : Incident Responder Certification
Period : June 2022 - Present
Summary : 78d1674c-b510-4c9b-93a3-039a4b3ed672, credly.com, https://www.credly.com/badges/78d1674c-b510-4c9b-93a3-039a4b3ed672?source=linked_in_profile
Issuing Authority : PagerDuty
Title : PagerDuty API Certification
Period : June 2022 - Present
Summary : f4e2d897-af2e-4b91-b26e-c8ebbb75fb0d, credly.com, https://www.credly.com/badges/f4e2d897-af2e-4b91-b26e-c8ebbb75fb0d?source=linked_in_profile
Issuing Authority : PagerDuty
Title : PagerDuty Customer Service Operations Certification
Period : June 2022 - Present
Summary : 5a65de2d-d2a2-4b42-ba98-dd255b6fa57e, credly.com, https://www.credly.com/badges/5a65de2d-d2a2-4b42-ba98-dd255b6fa57e?source=linked_in_profile
Issuing Authority : PagerDuty
Title : PagerDuty Foundational Practitioner Certification
Period : June 2022 - Present
Summary : a8bc205d-5b95-4d80-973c-75cb7213b57b, credly.com, https://www.credly.com/badges/a8bc205d-5b95-4d80-973c-75cb7213b57b?source=linked_in_profile
Issuing Authority : PagerDuty
Title : Prometheus | The Complete Hands-On for Monitoring & Alerting
Period : October 2021 - Present
Summary : UC-26560953-df36-4c29-baf5-fcd128755f79, ude.my, https://ude.my/UC-26560953-df36-4c29-baf5-fcd128755f79
Issuing Authority : Udemy
Title : Gremlin Certified Chaos Engineering Practitioner
Period : June 2021 - Present
Summary : 33883034, credential.net, https://www.credential.net/11608eca-d133-410d-9265-887c38bb1338#gs.4cgjbk
Issuing Authority : Gremlin
Title : Leadership Fundamentals
Period : September 2020 - Present
Summary : linkedin.com, https://www.linkedin.com/learning/certificates/f7ad576585194cedfbbae1ca9c3a9b378fd49d4d5265ca420bab8de9abd39544?trk=backfilled_certificate
Issuing Authority : LinkedIn
Title : Leadership Foundations: Leadership Styles and Models
Period : August 2020 - Present
Summary : linkedin.com, https://www.linkedin.com/learning/certificates/d21e7dc77c49351f05607045872613443a7bf9e6890bd7eea2963214dc75043c?trk=backfilled_certificate
Issuing Authority : LinkedIn
Title : Site Reliability Engineering: Service-Level Agreements and Objectives
Period : June 2020 - Present
Summary : linkedin.com, https://www.linkedin.com/learning/certificates/6157ef71c02158792ff4870fffa9a8915521cac03b8b83c25563a7b08c305b70?trk=backfilled_certificate
Issuing Authority : LinkedIn
Title : DevOps Foundations: Site Reliability Engineering
Period : May 2020 - Present
Summary : linkedin.com, https://www.linkedin.com/learning/certificates/5ca67aa84a10495929ba2fd9eddc461b46b7b5a0ff53f10447f9a779a9eb4e31?trk=backfilled_certificate
Issuing Authority : LinkedIn
Title : Leading without Formal Authority
Period : May 2020 - Present
Summary : linkedin.com, http://www.linkedin.com/learning/leading-without-formal-authority?trk=flagship-lil_details_certification
Issuing Authority : LinkedIn
Title : Red Hat System Administrator
Period : January 2018 - Present
Summary : 180-019-770, redhat.com, https://www.redhat.com/rhtapps/services/verify?certId=180-019-770
Issuing Authority : Red Hat
Title : ITIL® Foundation Certificate in IT Service Management
Period : April 2016 - Present
Summary : GR750235527SD, google.com, https://drive.google.com/file/d/0B5z_VLgwn8lHRnFTMkV4Z3ZRVG1ibHRFTEItNldHZTVKMVZz/view?usp=sharing
Issuing Authority : PeopleCert
Title : AWS Certified Solutions Architect – Associate
Period : November 2020 - November 2023
Summary : 2e0cf673-3e40-4623-a265-c599006f5da5, youracclaim.com, https://www.youracclaim.com/badges/2e0cf673-3e40-4623-a265-c599006f5da5?source=linked_in_profile
Issuing Authority : Amazon Web Services (AWS)
Title : Sumo Logic Metrics Mastery Certified
Period : September 2022 - September 2023
Summary : fc3toxyft6h4, skilljar.com, https://verify.skilljar.com/c/fc3toxyft6h4
Issuing Authority : Sumo Logic
Title : Sumo Logic Search Mastery Certified
Period : September 2022 - September 2023
Summary : 5bqutvbzwtzu, skilljar.com, https://verify.skilljar.com/c/5bqutvbzwtzu
Issuing Authority : Sumo Logic
Title : Sumo Logic Administration Certified
Period : August 2022 - August 2023
Summary : co5ibmhxix9i, skilljar.com, https://verify.skilljar.com/c/co5ibmhxix9i
Issuing Authority : Sumo Logic
Title : Sumo Logic Cloud Observability Fundamentals Certified
Period : August 2022 - August 2023
Summary : b8r5mcyaahjn, skilljar.com, https://verify.skilljar.com/c/b8r5mcyaahjn
Issuing Authority : Sumo Logic
English, Hindi, Marathi