Site Reliability Engineer

7 days ago


Kuala Lumpur, Kuala Lumpur, Malaysia KTH HR CONSULTING ZONE Full time

As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help bridge the gap between development and operations, ensuring robust, scalable, and responsive infrastructure. This role emphasizes strong system architecture and design principles, focusing on key SRE practices such as Service Level Objectives (SLOs), Service Level Indicators (SLIs), and the reduction of operational toil. You will collaborate closely with diverse teams to drive reliability improvements and foster a culture of continuous learning and accountability.

Key Responsibilities:

  1. Design and implement resilient system architectures that support high availability and scalability.
  2. Develop automation tools and scripts to enhance operational efficiency and reduce manual effort.
  3. Define, track, and analyze SLOs and SLIs to ensure reliability and performance meet business needs.
  4. Conduct thorough post-mortem analyses following incidents, driving continuous improvement through root cause identification and solution implementation.
  5. Collaborate with development and operations teams to establish best practices in system reliability and incident management.
  6. Troubleshoot and resolve issues related to database performance, network connectivity, and deployment failures, including diagnosing problems at the underlying platform level (e.g., Kubernetes, virtual machines).
  7. Ensure that issues are resolved within the stipulated Service Level Agreements (SLAs), maintaining high standards of service delivery.
  8. Identify and troubleshoot performance bottlenecks across systems, providing actionable recommendations for enhancements.
  9. Maintain detailed documentation of processes and incident responses to support knowledge sharing and compliance.

Qualifications:

  1. Proficiency in programming languages such as Python, Golang, Java, or similar, with a focus on operational efficiency.
  2. Demonstrated experience in system architecture and design, prioritizing reliability and scalability.
  3. Strong understanding of SRE principles, including SLOs, SLIs, toil reduction, and incident post-mortems.
  4. Experience with cloud environments (e.g., AWS, Azure, Google Cloud) and their operational management.
  5. Strong expertise in Linux system administration.
  6. Proven experience in troubleshooting application support issues with a focus on performance and connectivity.
  7. Familiarity with networking concepts and effective troubleshooting techniques.
  8. Excellent problem-solving abilities and a proactive approach to operational challenges.
  9. Ability to work independently while effectively collaborating within a team environment.

Preferred Skills:

  1. Familiarity with monitoring tools and performance optimization techniques.
  2. Experience in scripting or automation for system administration tasks.
  3. Knowledge of networking concepts and troubleshooting methodologies.
  4. Hands-on knowledge of cloud platforms (e.g., AWS, Azure, Google Cloud) and their services.
  5. Familiarity with DevOps practices and frameworks, including CI/CD, infrastructure as code, and containerization.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Engineering and Information Technology

Industries

Human Resources Services


  • Kuala Lumpur, Kuala Lumpur, Malaysia Chinasoft International (CSI) Full time

    Company Description: We suggest you enter details here. Role Description: This is a full-time on-site role for a Site Reliability Engineer based in WP. Kuala Lumpur. The Site Reliability Engineer will be responsible for maintaining system reliability and availability. Daily tasks will include troubleshooting issues, ensuring proper infrastructure setup, and...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Teknowiz Full time

    Site Reliability Engineer (DevOps Consultant). We are urgently hiring for one of our Big4 clients in Malaysia. Job Title: Site Reliability Engineer (DevOps). Location: KL/Johor/Penang (Onsite). Job Overview: As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help...


  • Kuala Lumpur, Kuala Lumpur, Malaysia MindPec Solutions Full time

    Role: Senior Linux Site Reliability Engineer. Client: IT Services. Working Mode: On-site (Monday to Friday, 9 AM to 6 PM). Job Type: Permanent. Experience: More than 3 years of experience as a Site Reliability Engineer or DevOps Engineer. Applicants: Open to local candidates who can speak English and Chinese. Job Description: Develop, deploy, and maintain reliable,...


  • Kuala Lumpur, Kuala Lumpur, Malaysia PERSOLKELLY Malaysia Full time

    This job is for a Site Reliability Engineer focusing on network systems. You might like this job because you'll automate tasks, ensure services run smoothly with minimal downtime, and work with coding languages like Java or Python in a dynamic tech environment. Job Description: Ability to debug scripts and automate routine tasks in OS, network, database or...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Ant International Full time

    Senior Site Reliability Engineer (DevOps). Ant International powers the future of global commerce with digital innovation for everyone and every business to thrive. In close collaboration with partners, we support merchants of all sizes worldwide to realize their growth aspirations through a comprehensive range of tech-driven digital payment and financial...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Glints Full time

    Site Reliability Engineer. Ready to elevate your career with a globally recognized professional services firm? We are seeking a skilled DevOps / SRE Specialist to join our team. You'll be at the forefront of transforming business challenges into cutting-edge technology solutions, working alongside diverse...


  • Kuala Lumpur, Kuala Lumpur, Malaysia iSoftStone Full time

    Site Reliability Engineer. Posted 2 weeks ago. Ensure the stability of Alibaba Apsara Stack and the cloud services running on it. Carry out health checks, operation and maintenance, and troubleshooting tasks....


  • Kuala Lumpur, Kuala Lumpur, Malaysia AIA Hong Kong and Macau Full time

    Site Reliability Engineer, Principal. Locations: Kuala Lumpur, MY-AIA Malaysia. Time type: Full time. Posted: 30+ days ago. Job requisition ID: JR-45612. At AIA we've started an exciting movement to create a healthier, more sustainable future for everyone. As pioneering innovators for over 100 years,...


  • Kuala Lumpur, Kuala Lumpur, Malaysia WCC Full time

    Join us in providing work that matters. WCC has changed lives since 1996. We are a group of highly ambitious professionals who believe in the greater story. WCC is more than just a software organization; we are a community that strives to improve human life. We provide software that matters. Our product is an advanced Search and Match engine used in...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Razer Inc. Full time

    About Razer Inc.: Razer is a leading gaming company that empowers gamers and tech enthusiasts around the world. We strive to create innovative products and experiences that inspire creativity and self-expression. As a Senior Site Reliability Engineer, you will be part of our team responsible for ensuring the reliability and scalability of our infrastructure. You...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Razer Inc. Full time

    Senior Site Reliability Engineer. Locations: Bangsar South. Time type: Full time. Posted: 30+ days ago. Job requisition ID: JR2024004634. Joining Razer will place you on a global mission to revolutionize the way the world games. Razer is a place to do great work, offering you the opportunity to make an impact globally while working across a global team...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Unison Consulting Full time

    We are seeking a skilled Site Reliability Engineer (SRE) to join our team at Unison Consulting. As an SRE, you will be responsible for ensuring the reliability and performance of our services. Responsibilities: Design and implement automated operational processes. Manage and monitor Linux-based systems for performance, availability, and security. Collaborate with...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Jones Lang LaSalle Incorporated Full time

    Reliability Engineering & Operations Director. Remote type: On-site. Locations: Kuala Lumpur, Malaysia. Time type: Full time. Posted: yesterday. Job requisition ID: REQ420122. JLL empowers you to shape a brighter way. Our people at JLL and JLL Technologies are shaping the future of real estate for a better world by combining world class...


  • Kuala Lumpur, Kuala Lumpur, Malaysia DUG Technology Full time

    We are a technology company at the forefront of high-performance computing (HPC) with a strong foundation in applied physics. Our innovative hardware and software solutions for the global technology and resource sectors enable our clients to leverage large and complex data sets. We operate three world-class green supercomputer clusters, running a large suite...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Jones Lang LaSalle Incorporated Full time

    Job Overview: Applications are invited for a Reliability Engineering & Operations Director position at Jones Lang LaSalle Incorporated. This role is responsible for developing and implementing reliability-centered maintenance strategies across the region, overseeing preventive and predictive maintenance programs for all facilities and assets, and supporting the...


  • Kuala Lumpur, Kuala Lumpur, Malaysia PEOPLE PROFILERS Full time

    Job Description: We are seeking an experienced Reliability Engineering Specialist to join our team. In this role, you will be responsible for the management and delivery of one or more systems within a platform, leveraging agile practices. The ideal candidate will have 3 to 6 years of relevant experience in DevOps, SRE, and a full understanding of Site...


  • Kuala Lumpur, Kuala Lumpur, Malaysia JLL Full time

    JLL empowers you to shape a brighter way. Our people at JLL and JLL Technologies are shaping the future of real estate for a better world by combining world class services, advisory and technology for our clients. We are committed to hiring the best, most talented people and empowering them to thrive, grow meaningful careers and to find a place where they...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Teknowiz Full time

    About Teknowiz: We are a forward-thinking company that delivers innovative IT solutions to our clients. Our mission is to provide top-notch services that meet the evolving needs of modern businesses. As a Site Reliability Engineer at Teknowiz, your primary focus will be on maintaining the uptime and performance of critical systems. This involves designing and...

  • Reliability Engineer

    2 weeks ago


    Kuala Lumpur, Kuala Lumpur, Malaysia Exxon Mobil Full time

    At ExxonMobil, our vision is to lead in energy innovations that advance modern living and a net-zero future. As one of the world's largest publicly traded energy and chemical companies, we are powered by a unique and diverse workforce fueled by the pride in what we do and what we stand for. The success of our Upstream, Product Solutions and Low Carbon...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Exxon Mobil Full time

    At ExxonMobil, our vision is to lead in energy innovations that advance modern living and a net-zero future. As one of the world's largest publicly traded energy and chemical companies, we are powered by a unique and diverse workforce fueled by the pride in what we do and what we stand for. The success of our Upstream, Product Solutions and Low Carbon...