VCF Platform Reliability Engineer

3 days ago


Greater Kuala Lumpur, Malaysia Chemcastle Sdn Bhd Full time 800,000 - 1,000,000 per year

Description:

POSITION OVERVIEW: Software Development Analyst

POSITION GENERAL DUTIES AND TASKS:

A Site Reliability Engineer (SRE) for VMware Cloud Foundation (VCF) focuses on ensuring the reliability, availability, and performance of the VCF platform through automation, monitoring, and proactive problem-solving. This role involves developing and implementing strategies to improve the platform's stability, collaborating with development and operations teams, and contributing to the overall VCF roadmap.

Key Responsibilities:

Platform Reliability and Availability:

Design and implement strategies to ensure high availability and performance of VCF, including monitoring, alerting, and incident response.

Automation and Tooling:

Develop and maintain automation scripts and tools to streamline operations, improve efficiency, and reduce manual intervention.

Performance Monitoring and Optimization:

Monitor system performance, identify bottlenecks, and implement solutions to optimize resource utilization and overall performance.

Incident Management and Resolution:

Participate in incident response, troubleshoot complex issues, and contribute to post-incident reviews to prevent recurrence.

Collaboration and Communication:

Work closely with development, operations, and other teams to ensure seamless integration and efficient operations.

Documentation and Knowledge Sharing:

Create and maintain comprehensive documentation for system configurations, troubleshooting procedures, and operational best practices.

Continuous Improvement:

Identify areas for improvement in the VCF platform and work with relevant teams to implement changes and enhancements.

Required Skills and Experience:

Strong understanding of VMware Cloud Foundation components, including vSphere, vSAN, NSX, and vRealize Suite.

5-6 years of relevant experience in VCF is required.

Proficiency in scripting languages such as Python, Go, or PowerShell.

Experience with automation tools and frameworks, such as Ansible, Terraform, or SaltStack.

Solid understanding of cloud computing concepts and principles.

Experience with monitoring and alerting tools, such as Prometheus, Grafana, or vRealize Operations.

Strong problem-solving and troubleshooting skills.

Excellent communication and collaboration skills.



  • Greater Kuala Lumpur, Malaysia Chemcastle Sdn Bhd Full time 60,000 - 120,000 per year

    Description: POSITION OVERVIEW: Software Development Specialist. POSITION GENERAL DUTIES AND TASKS: A Site Reliability Engineer (SRE) for VMware Cloud Foundation (VCF) focuses on ensuring the reliability, availability, and performance of the VCF platform through automation, monitoring, and proactive problem-solving. This role involves developing and...


  • Kuala Lumpur, Kuala Lumpur, Malaysia POWER IT SERVICES Full time 120,000 - 240,000 per year

    Platform Reliability Engineer. Experience: 5–7 Years. Role Overview: Responsible for engineering, operating, and maintaining internal container platforms (Broadcom Tanzu, Kubernetes) with a focus on reliability, resiliency, security, and automation. Key Responsibilities: Manage and optimize Tanzu Application Service and Kubernetes clusters. Ensure platform...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Tata Consultancy Services (TCS) Full time 90,000 - 120,000 per year

    Roles & Responsibilities: Job Purpose: Platform Reliability Engineer (PRE) is responsible for engineering, operating, and maintaining GEL's internal container platform and its supporting infrastructure, with a strong focus on reliability, resiliency, and security. As a Senior PRE within GEL's Infrastructure team, you will play a pivotal role in designing,...


  • Kuala Lumpur, Kuala Lumpur, Malaysia HCLTech Full time 120,000 - 240,000 per year

    About the role: Platform Reliability Engineer (PRE) is responsible for engineering, operating, and maintaining GEL's internal container platform and its supporting infrastructure, with a strong focus on reliability, resiliency, and security. As a Senior PRE within GEL's Infrastructure team, you will play a pivotal role in designing, building, and operating...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Abhidi Solution Sdn Bhd Full time 72,000 - 120,000 per year

    Key Responsibilities: Maintain and optimize internal container platform (Tanzu + Kubernetes), including provisioning, monitoring, outage response, and capacity planning. Improve platform reliability, security, and scalability by identifying issues, applying Tanzu updates, and automating manual workflows. Participate in 24/7 on-call rotation to resolve...


  • Kuala Lumpur, Kuala Lumpur, Malaysia Abhidi Solution Full time 80,000 - 120,000 per year

    Please find the skill set requirements for Tanzu (TAS & TKGi). TAS: Tanzu Application Service (TAS), BOSH, Ops Manager, CF CLI. Different types of tiles using the Tanzu platform (BOSH Director, antivirus, Harbor, compliance scanner, Healthwatch). TKGi: Tanzu Kubernetes Grid Integrated Edition (basic architecture). Kubernetes: lifecycle activities include upgrading the cluster...

  • Platform Engineer

    3 days ago


    Kuala Lumpur, Kuala Lumpur, Malaysia Hyred APAC Full time 100,000 - 150,000 per year

    About the client: Our client specialises in building Agentic AI systems. Role Description: Our client is looking to onboard a Platform Engineer to serve as one of the principal architects of SupplyOS, their agentic AI platform that powers procurement, inventory, manufacturing, and logistics operations across Asia. You will help define the long term technical...

  • Reliability Engineer

    16 hours ago


    Kuala Lumpur Centre, Kuala Lumpur, Malaysia ExxonMobil Malaysia Full time 120,000 - 240,000 per year

    As an experienced Reliability Engineer, you will provide strategic, cross-functional, and technical leadership in the areas of reliability engineering, maintenance optimization, and asset management through all stages of project development and asset operation. You will support technology development and implementation to enhance reliability and...

  • Reliability Engineer

    16 hours ago


    Kuala Lumpur, Kuala Lumpur, Malaysia ExxonMobil Full time 150,000 - 250,000 per year

    Location: Kuala Lumpur, 14, MY. Company: ExxonMobil. About us: At ExxonMobil, our vision is to lead in energy innovations that advance modern living and a net-zero future. As one of the world's largest publicly traded energy and chemical companies, we are powered by a unique and diverse workforce fueled by the pride in what we do and what we stand for. The success...


  • Greater Kuala Lumpur, Malaysia Cognizant Full time 100,000 - 120,000 per year

    Job Title: Integration Platform Engineer Expert. Location: Puchong, Malaysia. The Data Integration product provides ETL (Extract, Transform & Load) platforms with Informatica PowerCenter, PCCE, CDI and Talend Cloud for several entities around the world. Responsibilities: Management and creation / update of jobs TWS (Tivoli IBM); Installation and upgrade of...