Site Reliability Engineering Lead
3 weeks ago
COMPANY OVERVIEW
A well-established client of ours in Kuala Lumpur is seeking a Site Reliability Engineering Lead.
JOB RESPONSIBILITIES
- Team Leadership:
Lead and mentor a team of SREs, fostering a culture of ownership, collaboration, and continuous improvement.
Define clear goals, performance metrics, and development plans for the team.
- System Reliability & Performance:
Design and implement strategies to improve system reliability, scalability, and performance.
Conduct root cause analysis of production incidents and develop preventive solutions.
- Infrastructure Management:
Oversee the deployment, monitoring, and management of production environments.
Collaborate with development teams to design cloud-native infrastructure and architecture.
- Automation & CI/CD:
Drive automation of operational processes, reducing manual intervention and response times.
Optimize CI/CD pipelines to ensure smooth and rapid deployments.
- Incident Management:
Establish incident response protocols and lead efforts during major incidents.
Ensure robust monitoring and alerting systems are in place to proactively detect issues.
- Collaboration & Communication:
Act as a liaison between engineering, operations, and other teams to align objectives.
Share insights and best practices with internal stakeholders to enhance overall system resilience.
JOB REQUIREMENTS
- Technical Expertise:
Strong experience with cloud platforms (AWS, Azure, Google Cloud) and infrastructure-as-code tools (Terraform, Ansible, etc.).
Proficiency in programming/scripting languages (Python, Go, Shell, etc.).
Deep knowledge of Kubernetes, containerization, and distributed systems.
- Leadership Skills:
Proven track record of leading SRE or DevOps teams and managing large-scale production environments.
Strong decision-making, prioritization, and problem-solving capabilities.
- Monitoring & Metrics:
Expertise in implementing and using monitoring tools (Prometheus, Grafana, Datadog, etc.) and logging systems.
Familiarity with service-level objectives (SLOs), service-level agreements (SLAs), and error budgets.
- Soft Skills:
Excellent communication and collaboration skills for working with cross-functional teams.
Ability to mentor and upskill team members, fostering a learning-oriented culture.
- Experience:
At least 8 years of experience in SRE, DevOps, or related roles with a focus on reliability engineering.
-
Site Reliability Engineering Lead
2 days ago
Kuala Lumpur, Kuala Lumpur, Malaysia Tetrasparks Sdn Bhd Full time
We are seeking an experienced and dynamic Site Reliability Engineering Lead to drive the reliability, scalability, and performance of our production systems in the iGaming industry. In this role, you will lead a team...
-
Site Reliability Engineer
2 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia Businesslist Full time
A successful Site Reliability Engineer will possess:
Hands-on experience with Azure to manage and optimize cloud infrastructure.
Exceptional problem-solving abilities, with the resilience to perform well under pressure.
Proficiency in Infrastructure as Code (e.g., Terraform) to automate and streamline processes.
Expertise in containerization technologies like...
-
Site Reliability Engineering Director
7 days ago
Kuala Lumpur, Kuala Lumpur, Malaysia SWIFT Full time
About the Position
We are seeking a highly skilled Senior Site Reliability Engineer to lead our team responsible for ensuring the reliability, uptime, and performance of our mission-critical systems. As a Senior SRE Manager, you will be responsible for providing technical leadership, mentorship, and guidance to your team members, as well as collaborating with...
-
Reliability Engineering Expert Lead
2 days ago
Kuala Lumpur, Kuala Lumpur, Malaysia beBee Careers Full time
Job Summary:
We are seeking a highly skilled and experienced Site Reliability Engineering Lead to join our team. As a key member of our organization, you will be responsible for leading a team of engineers in designing, implementing, and maintaining high-quality cloud infrastructure solutions.
Main Responsibilities:
Lead a team of engineers in the design and...
-
Site Reliability Engineer
2 days ago
Kuala Lumpur, Kuala Lumpur, Malaysia Businesslist Full time
Ensure high availability and performance of AWS infrastructure, applications, and services.
Design, architect, and implement auto-scaling, load balancing, and failover strategies in AWS environments.
Automate repetitive tasks, workflows, and deployments using scripting or other technologies (e.g., PowerShell, Python, Terraform, Ansible).
Deploy...
-
Site Reliability Engineer
3 days ago
Kuala Lumpur, Kuala Lumpur, Malaysia Unison Consulting Full time
Site Reliability Engineer (Mandarin Speaker)
Overview: As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help bridge the gap between development and operations, ensuring robust, scalable, and responsive...
-
Site Reliability Engineer
3 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia Randstad (Schweiz) AG Full time
Site Reliability Engineer / SRE (Hybrid) | KL
Hybrid
Overview: As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help bridge the gap between development and operations, ensuring robust, scalable, and responsive infrastructure. This role emphasizes strong system...
-
Site Reliability Engineer Professional
2 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia Businesslist Full time
A highly skilled Site Reliability Engineer will play a pivotal role in ensuring the smooth operation of our cloud infrastructure. They will be responsible for managing and optimizing our Azure environment to meet the evolving needs of our business.
This is an exciting opportunity for a technical expert to join our team at Businesslist, where we are dedicated...
-
Site Reliability Engineer
3 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia Hunters International Sdn Bhd Full time
Overview: As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help bridge the gap between development and operations, ensuring robust, scalable, and responsive infrastructure. This role emphasizes strong system architecture and design principles, focusing on...
-
Site Reliability Engineer
2 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia ServeDeck Innovation Sdn Bhd Full time
Responsibilities
Design, manage, and optimize Continuous Integration and Continuous Delivery.
Monitor applications and cloud infrastructure to ensure system security and availability using the real-time dashboard and active alerting...