Site Reliability Engineer

2 weeks ago


Kuala Lumpur, Kuala Lumpur, Malaysia Hunters International Sdn Bhd Full time

Overview:

As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help bridge the gap between development and operations, ensuring robust, scalable, and responsive infrastructure. This role emphasizes strong system architecture and design principles, focusing on key SRE practices such as Service Level Objectives (SLOs), Service Level Indicators (SLIs), and the reduction of operational toil. You will collaborate closely with diverse teams to drive reliability improvements and foster a culture of continuous learning and accountability.

Responsibilities:

  • Design and implement resilient system architectures that support high availability and scalability.
  • Develop automation tools and scripts to enhance operational efficiency and reduce manual effort.
  • Define, track, and analyze SLOs and SLIs to ensure reliability and performance meet business needs.
  • Conduct thorough post-mortem analyses following incidents, driving continuous improvement through root cause identification and solution implementation.
  • Collaborate with development and operations teams to establish best practices in system reliability and incident management.
  • Troubleshoot and resolve issues related to database performance, network connectivity, and deployment failures, including diagnosing problems at the underlying platform level (e.g., Kubernetes, virtual machines).
  • Ensure that issues are resolved within the stipulated Service Level Agreements (SLAs), maintaining high standards of service delivery.
  • Identify and troubleshoot performance bottlenecks across systems, providing actionable recommendations for enhancements.
  • Maintain detailed documentation of processes and incident responses to support knowledge sharing and compliance.

Requirements:

  • Proficiency in Mandarin is a must, in order to liaise with stakeholders from China.
  • Proficiency in programming languages such as Python, Golang, Java, or similar, focusing on operational efficiency.
  • Demonstrated experience in system architecture and design, prioritizing reliability and scalability.
  • Strong understanding of SRE principles, including SLOs, SLIs, toil reduction, and incident post-mortems.
  • Experience with cloud environments (e.g., AWS, Azure, Google Cloud) and their operational management.
  • Strong expertise in Linux system administration.
  • Proven experience in troubleshooting application support issues with a focus on performance and connectivity.
  • Familiarity with networking concepts and effective troubleshooting techniques.
  • Excellent problem-solving abilities and a proactive approach to operational challenges.
  • Ability to work independently while effectively collaborating within a team environment.
  • Familiarity with monitoring tools and performance optimization techniques.
  • Experience in scripting or automation for system administration tasks.
  • Hands-on knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and their services.
  • Familiarity with DevOps practices and frameworks, including CI/CD, infrastructure as code, and containerization.

Location:

To be based at the client's site (Klang Valley)

Remuneration:

Up to MYR 19,000 (Based on relevant experience)

Consultant in Charge:

Rodney Chong | rodney.chong@hunters-in.com | 016 838 2188

This is a contract position with the possibility of conversion to permanent staff.

  • Reliability Engineer

    2 weeks ago


    Kuala Lumpur, Kuala Lumpur, Malaysia The Chemical Engineer Full time

    About us: At ExxonMobil, our vision is to lead in energy innovations that advance modern living and a net-zero future. As one of the world's largest publicly traded energy and chemical companies, we are powered by a unique and diverse workforce fueled by the pride in what we do and what we stand for. The success of our Upstream, Product Solutions and Low Carbon...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Businesslist Full time

    A successful Site Reliability Engineer will possess: Hands-on experience with Azure to manage and optimize cloud infrastructure. Exceptional problem-solving abilities, with the resilience to perform well under pressure. Proficiency in Infrastructure as Code (e.g., Terraform) to automate and streamline processes. Expertise in containerization technologies like...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Teknowiz Full time

    Site Reliability Engineer (DevOps Consultant). We are urgently hiring for one of our Big4 clients in Malaysia. Job Title: Site Reliability Engineer (DevOps). Location: KL/Johor/Penang (Onsite). Job Overview: As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help...

  • Reliability Leader

    2 weeks ago


    Kuala Lumpur, Kuala Lumpur, Malaysia The Chemical Engineer Full time

    About ExxonMobil: We are a leading energy and chemical company that drives innovation to advance modern living and a net-zero future. As an experienced Reliability Engineer, you will provide strategic leadership in the areas of reliability engineering, maintenance optimization, and asset management. You will support technology development and implementation to...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Aarorn Technologies Sdn Bhd Full time

    Onsite. Language proficiency: Mandarin. 5+ years of experience as SRE. Overview: Site Reliability Engineer (SRE). As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help bridge the gap between development and operations, ensuring robust, scalable, and responsive...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Ibroad Solutions Full time

    About the Role: Ibroad Solutions is seeking a highly skilled Site Reliability Engineer/DevOps Specialist to join our team. This role involves designing and implementing resilient system architectures that support high availability and scalability. The ideal candidate will have a strong understanding of SRE principles, including SLOs, SLIs, toil reduction, and...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Ant International Full time

    Senior Site Reliability Engineer (DevOps). Ant International powers the future of global commerce with digital innovation for everyone and every business to thrive. In close collaboration with partners, we support merchants of all sizes worldwide to realize their growth aspirations through a comprehensive range of tech-driven digital payment and financial...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Randstad (Schweiz) AG Full time

    Site Reliability Engineer / SRE (Hybrid) | KL. Hybrid. Overview: As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help bridge the gap between development and operations, ensuring robust, scalable, and responsive infrastructure. This role emphasizes strong system...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Businesslist Full time

    A highly skilled Site Reliability Engineer will play a pivotal role in ensuring the smooth operation of our cloud infrastructure. They will be responsible for managing and optimizing our Azure environment to meet the evolving needs of our business. This is an exciting opportunity for a technical expert to join our team at Businesslist, where we are dedicated...


  • Kuala Lumpur, Kuala Lumpur, Malaysia KTH HR CONSULTING ZONE Full time

    As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help bridge the gap between development and operations, ensuring robust, scalable, and responsive infrastructure. This role emphasizes strong system architecture and design principles, focusing on key SRE...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Ant International Full time

    Senior Site Reliability Engineer. At Ant International, we empower merchants of all sizes to thrive in a rapidly changing global commerce landscape. Our comprehensive range of digital payment and financial services solutions enables businesses to realize their growth aspirations. We are seeking a seasoned Senior Site Reliability Engineer for our Malaysia Tech...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Smart Teq Solution Sdn Bhd Full time

    We're seeking an experienced Site Reliability Engineer Lead to join our team at Smart Teq Solution Sdn Bhd. About the Role: This role involves ensuring all our infrastructure runs at optimal condition. Provide deployment, patches, and updates on all services running on public cloud and on-premise. Work closely with developers to provide complete, up-to-date, and...


  • Kuala Lumpur, Kuala Lumpur, Malaysia iSoftStone Full time

    Site Reliability Engineer. Ensure the stability of Alibaba Apsara Stack and the cloud services running on it. Carry out health checks, operation and maintenance, and troubleshooting tasks....


  • Kuala Lumpur, Kuala Lumpur, Malaysia ServeDeck Innovation Sdn Bhd Full time

    Responsibilities: Design, manage and optimize Continuous Integration and Continuous Delivery. Monitor application and infrastructure (cloud) to ensure system security and availability using the real time dashboard and active alerting...


  • Kuala Lumpur, Kuala Lumpur, Malaysia AIA Hong Kong and Macau Full time

    Site Reliability Engineer, Principal. Locations: Kuala Lumpur, MY-AIA Malaysia. Time type: Full time. Posted 30+ days ago. Job requisition ID: JR-45612. At AIA we've started an exciting movement to create a healthier, more sustainable future for everyone. As pioneering innovators for over 100 years,...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Smart Teq Solution Sdn Bhd Full time

    Responsibilities: Ensure all our infrastructure is running in optimal condition. Provide deployment, patches and updates on all services that are running on public cloud and on premise. Identify and resolve support tickets that are related to our infrastructure and services. Work closely with...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Hunters International Sdn Bhd Full time

    Overview: Hunters International Sdn Bhd is a leading provider of specialized recruitment services. As a Site Reliability Engineer, you will play a crucial role in maintaining the reliability and performance of our critical systems. We are seeking an experienced system architect to join our team. Your expertise will help bridge the gap between development and...


  • Kuala Lumpur, Kuala Lumpur, Malaysia WCC Full time

    Join us in providing work that matters. WCC has changed lives since 1996. We are a group of highly ambitious professionals who believe in the greater story. WCC is more than just a software organization; we are a community that strives for improving human life. We provide software that matters. Our product is an advanced Search and Match engine used in...


  • Kuala Lumpur, Kuala Lumpur, Malaysia JAC Recruitment Full time

    COMPANY OVERVIEW: A well-established client of ours in Kuala Lumpur is seeking a Site Reliability Engineering Lead. JOB RESPONSIBILITIES: Team Leadership: Lead and mentor a team of SREs, fostering a culture of ownership, collaboration, and continuous improvement. Define clear goals, performance metrics, and development plans for the team. System Reliability &...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Randstad (Schweiz) AG Full time

    Job Description: This is a key role in maintaining the reliability and performance of critical services. The Site Reliability Engineer will bridge the gap between development and operations, ensuring robust, scalable, and responsive infrastructure.