Senior SRE Software Engineer, Storage and Data

Found in: Talent MY C2 - 1 week ago


Bandar Baru Bangi, Malaysia NVIDIA Full time

SRE at NVIDIA ensures that our DGX Cloud platform remains reliable and performant enough to meet our users' needs. You will play a critical role in ensuring the reliability, availability, and performance of the storage infrastructure for NVIDIA DGX GPU cloud platforms, collaborating with cross-functional teams to design, build, and maintain scalable, fault-tolerant storage solutions that support our mission-critical applications and services. Your expertise in storage systems and reliability engineering will be instrumental in minimizing downtime, improving system efficiency, and enhancing the overall user experience.

SRE is also a mindset and a set of engineering approaches to running efficient production systems, with a focus on eliminating manual work through modern automation practices and performance tuning. We promote self-direction to work on meaningful projects while striving to build an environment that provides the support and mentorship needed to learn and grow.

What You Will Be Doing:

Develop strategies to ensure the reliability and availability of storage systems, including redundancy, failover, and disaster recovery plans.

Continuously analyze and fine-tune storage systems for optimal performance, including throughput optimization, caching, and latency reduction. Identify and resolve performance bottlenecks to enhance overall system efficiency.

Develop and maintain automation scripts and tools to streamline storage provisioning, configuration, and maintenance tasks.

Implement monitoring and alerting systems to proactively identify and address issues.

Participate in an on-call rotation to respond promptly to storage-related incidents, conduct root cause analysis of outages, and implement preventive measures.

Collaborate with cross-functional teams, including Compute SRE, development, and networking, to ensure seamless integration of large-scale storage solutions.

Work with AI/ML workloads to capture and correlate behavior in large clusters and workflows, which are otherwise hard to understand.

What We Need To See:

BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), with 5+ years of practical experience or equivalent.

Proven experience in storage system administration and site reliability engineering.

Experience with Git, RESTful APIs, Linux service operation, networking, complexity analysis, AWS S3, software design, and maintaining large-scale Linux-based systems.

Experience with one or more of the following languages and tools: Ansible, Bash, Python, Go, YAML, Java.

Good knowledge of infrastructure configuration management tools like Ansible, Chef, Puppet, and Terraform.

Experience using observability and tracing tools such as InfluxDB, Prometheus, the Elastic (OpenSearch) stack, and Grafana.

Ways to stand out from the crowd:

Experience with storage solutions such as OpenStack Swift (object), AWS S3 (object), DDN, and Lustre.

Strong Linux and network troubleshooting skills using a range of command-line tools.

A demonstrated SRE mindset, a customer-first approach, and a passion for ensuring customer satisfaction and success.

Interest in crafting, analyzing, and fixing large-scale distributed systems. Strong debugging skills with a systematic problem-solving approach to identifying and resolving complex problems.

Experience using or running large private and public cloud systems based on Kubernetes, OpenStack, and Docker.


  • Senior System Software Engineer

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    We are seeking senior software engineers to work on next-generation graphics and computing products. Our charter is to build the most stressful set of applications a GPU or high performance computing server would see in its life cycle. The best candidates will have strong C++ programming skills, thorough knowledge of graphics concepts and algorithms, a...

  • Senior Software Engineer – Omniverse

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    NVIDIA is a worldwide technology company. Our work in visual computing - the art and science of computer graphics - has led to thousands of patented inventions, breakthrough technologies, deep industry relationships and a globally recognized brand. NVIDIA’s Simulation Technology division builds industry leading technology for high-end virtual experiences...

  • Senior Tool and Methodology Development Software Engineer

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers,...

  • Senior Design Engineer, Coherent High Speed Interconnect

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    NVIDIA is looking for a Senior Design Engineer for our Coherent High Speed Interconnect team! The NVLINK-C2C enables the creation of a new class of integrated products with NVIDIA partners, built via chiplets, allowing NVIDIA GPUs, DPUs, and CPUs to be coherently interconnected with custom silicon. To learn more about NVIDIA's ultra-fast chip interconnect...

  • Senior Architect

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    We are looking for a senior architect for the Multimedia IP team in Taiwan. The team is involved in IP developments for multiple SOC product lines. What you will be doing: Define future aspects of our display subsystem/IP. Hardware architecture end-to-end lifecycle ownership. Cross-disciplinary role driving...

  • Senior ASIC Physical Design Engineer

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    ASIC-PD team is hiring both junior and senior engineers, whose work scope is physical design from RTL to GDSII: design quality check, synthesis, formal check, partitioning, constraint (for both design and process), async check, timing analysis/fixing/signoff, also all related flow. Join us and you will work with experts in all these areas; you will...

  • Senior Verification Engineer

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers,...

  • Senior SOCD Design Automation Engineer

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    The NVIDIA SOC Design (SOCD) team is looking for a highly motivated and creative design automation engineer. As someone who is passionate about design automation, you will develop key aspects of our in-house front-end chip development Tools, Flow, and Methodology (TFM) to enable the vision of our chip development and assembly. Our mission is to enable SOCD...

  • Senior Mask Layout Design Engineer

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    We are looking for a Senior Mask Layout Design Engineer – someone who is excited to join a growing group of diverse individuals responsible for handling challenging high-speed mixed-signal circuit designs. NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined...

  • AI Algorithms SW Engineer

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    NVIDIA is searching for a Deep Learning algorithms architect to develop artificial intelligence (AI), computer vision algorithms and applications for our Metropolis for Factories and Manufacturing platforms. Artificial Intelligence is transforming how we collect, inspect, and analyze different kinds of sensor data that impacts everything from manufacturing...

  • System Level Test Engineer

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    Do you want to join the team that is responsible for creating the diagnostic test programs which serve as a final test for millions of chips before they are sent to our customers? We bring together different pieces of software, firmware, and settings, working through issues with several teams to deliver these programs to our manufacturing partners. We...

  • Senior Mixed Signal Design Verification Engineer

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    We are looking for an Engineer to verify the design and implementation of the world’s leading SoCs and GPUs. This position offers the opportunity to have real impact in a dynamic, technology-focused company impacting product lines ranging from consumer graphics to self-driving cars and the growing field of artificial intelligence. We have crafted a team...

  • Operation Planning Engineer

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    We are looking for an RDSS Intern to join NVIDIA's Taiwan Inventory Control team based in the Hsinchu Office. We play a key role in the execution of all NVIDIA's manufacturing plans in Taiwan by maintaining the accuracy, integrity, and timeliness of all inventory records and transactions. In addition to acquiring relevant technical knowledge, you will also...

  • Test Engineer, CatDV

    Found in: Talent MY C2 - 1 week ago


    Bangi, Malaysia Quantum Corporation Full time

    As a valuable member of the CatDV team, your role will involve actively contributing to our software release process and enhancing our automation testing efforts. Your contributions will play a pivotal role in ensuring consistent and high-quality product releases, ultimately elevating the overall quality of our software and documentation. Job...

  • NPI Test Engineer

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing — with the GPU acting as the brains of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are...

  • Senior Mixed Signal/Analog Circuit Designer

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    We are looking for a Senior Mixed-Signal / Analog Circuit Designer – someone who is excited to join a growing group of diverse individuals responsible for handling challenging high-speed mixed-signal circuit designs. NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 sparked the growth of the PC gaming market,...

  • Research Intern, Design Automation

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    NVIDIA Design Automation research group is seeking research interns in the areas of AI for chip design and GPU accelerated EDA. Deep learning and GPU advancement are changing the landscape of chip design. In this internship, you will gain a broad perspective across areas including EDA algorithms and software, machine learning, and VLSI chip design...

  • Factory Planner

    Found in: Talent MY C2 - 1 week ago


    Bandar Baru Bangi, Malaysia NVIDIA Full time

    We are looking for an RDSS Intern to join NVIDIA's Taiwan Inventory Control team based in the Hsinchu Office. We play a key role in the execution of all NVIDIA's manufacturing plans in Taiwan by maintaining the accuracy, integrity, and timeliness of all inventory records and transactions. In addition to acquiring relevant technical knowledge, you will also...

  • Internal Audit Senior

    Found in: Talent MY C2 - 1 week ago


    Bangi, Malaysia Quantum Corporation Full time

    The Internal Audit Senior will play a crucial role in supporting comprehensive audit programs, focusing on operational and compliance effectiveness, including adherence to the Sarbanes-Oxley Act (SOX), across the organization. This position involves actively participating in risk assessments, evaluating internal controls, and contributing strategic insights...