Data Center System Operations Engineer

1 day ago


Johor Bahru, Johor, Malaysia | CREW by HRNET | Full time

Our client is seeking a Data Center System Operations Engineer to join the team and support the daily operation of a state-of-the-art GPU cluster. The role ensures the reliability, scalability, and efficiency of data center operations, supporting high-performance GPU infrastructure for cutting-edge AI workloads.

  • Work location: Kulai, Johor Bahru
  • Working hours: Monday-Friday, 9am-6pm

Key Responsibilities

  • Oversee daily operations of GPU clusters and data center systems
  • Monitor system health, performance, and capacity using industry-standard tools and frameworks
  • Respond to and resolve operational incidents, ensuring minimal downtime and maximum availability
  • Manage the deployment, configuration, and optimization of GPU servers, network devices, and supporting infrastructure (e.g., CPU servers and storage)
  • Perform hardware diagnostics and preventative maintenance for GPU servers, storage, and networking equipment
  • Troubleshoot system issues related to hardware, operating systems, and applications
  • Work closely with cross-functional teams, including network engineers, system administrators, and developers, to support AI workloads
  • Maintain accurate documentation for system configurations, processes, and incident reports
  • Implement and enforce security best practices in system operations
  • Identify and propose improvements to enhance system performance, reduce costs, and optimize resource utilization

Requirements

  • Bachelor's degree in Computer Science, Information Technology, Electrical Engineering, or a related field. Equivalent experience will be considered
  • 2+ years of experience in system operations within IT infrastructure or cloud services
  • Hands-on experience in IT hardware replacement
  • Experience in data center operations, system administration, or a similar role
  • Familiarity with GPU hardware (e.g., NVIDIA GPUs) and AI/ML workloads is a strong advantage
  • Experience with storage systems (e.g., NVMe, SAN, NAS), networking concepts, and protocols (e.g., TCP/IP, RDMA) will be advantageous
  • Knowledge of operating ticketing systems and of troubleshooting processes in CPU/GPU clusters
  • Familiarity with networking concepts, including TCP/IP, VLANs, and load balancing
  • Understanding of Linux fundamentals and Kubernetes environments
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana) and logging frameworks

Shortlisted candidates will be contacted.

Jeremiah Lim (R

CREW by HRnet | HRnet Ventures Pte Ltd

EA24C2435


