Site Reliability Engineer
3 weeks ago
Responsibilities
- Ensure all of our infrastructure is running in optimal condition.
- Deploy, patch, and update all services running in the public cloud and on premises.
- Identify and resolve support tickets related to our infrastructure and services.
- Work closely with developers to provide complete, up-to-date, and readable documentation.
- Develop SRE task-related documentation for future reference and better tracing.
- Monitor our services using Grafana and identify any bottlenecks; take immediate action and troubleshoot when necessary.
- Maintain and enhance our monitoring system, including but not limited to Grafana, VictoriaMetrics, and Alertmanager.
- Work closely with cross-department teams to deliver updates and patches to our services using our CI/CD tools.
- Analyze system logs to gain a better understanding of service outages and issues.
- Perform preventive maintenance on our system and infrastructure.
- Always willing to learn new technologies and tools.
- At least 1 year of experience in DevOps, network engineering, or an SRE-related field is required.
- Familiarity with Linux and networking.
- Able to work and solve problems independently when required.
- Hands-on experience with Bash scripting.
- Basic understanding of how cloud infrastructure (Alicloud, AWS, GCP, etc.) works.
- Able to be on call.
- Willing to learn new technologies such as Grafana, Terraform, GitLab CI/CD, ArgoCD, and Ansible.
- Understand how Docker and Kubernetes work.
- Programming experience (Python and Golang).
- A basic understanding of Terraform, Ansible, and Packer is a plus.
- Hands-on experience with cloud computing, Kubernetes, GitLab, etc.
- Hands-on experience with Terraform and Ansible.
Entry level
Employment type: Full-time
Job function: Information Technology
Industries: IT Services and IT Consulting
Referrals increase your chances of interviewing at Smart Teq Solution Sdn Bhd by 2x.
Get notified about new Site Reliability Engineer jobs in Greater Kuala Lumpur.
-
Reliability Engineer
3 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia
The Chemical Engineer
Full time
About us
At ExxonMobil, our vision is to lead in energy innovations that advance modern living and a net-zero future. As one of the world's largest publicly traded energy and chemical companies, we are powered by a unique and diverse workforce fueled by the pride in what we do and what we stand for. The success of our Upstream, Product Solutions and Low Carbon...
-
Site Reliability Engineer
2 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia
Businesslist
Full time
A successful Site Reliability Engineer will possess:
- Hands-on experience with Azure to manage and optimize cloud infrastructure.
- Exceptional problem-solving abilities, with the resilience to perform well under pressure.
- Proficiency in Infrastructure as Code (e.g., Terraform) to automate and streamline processes.
- Expertise in containerization technologies like...
-
Site Reliability Engineer
3 days ago
Kuala Lumpur, Kuala Lumpur, Malaysia
Businesslist
Full time
- Ensure high availability and performance of AWS infrastructure, applications, and services.
- Design, architect, and implement auto-scaling, load balancing, and failover strategies in AWS environments.
- Automate repetitive tasks, workflows, and deployments using scripting or other technologies (e.g., PowerShell, Python, Terraform, Ansible, etc.).
- Deploy...
-
Site Reliability Engineer
4 days ago
Kuala Lumpur, Kuala Lumpur, Malaysia
Unison Consulting
Full time
Site Reliability Engineer (Mandarin Speaker)
Overview: As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help bridge the gap between development and operations, ensuring robust, scalable, and responsive...
-
Site Reliability Engineer
3 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia
Randstad (Schweiz) AG
Full time
Site Reliability Engineer / SRE (Hybrid) | KL
Hybrid
Overview: As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help bridge the gap between development and operations, ensuring robust, scalable, and responsive infrastructure. This role emphasizes strong system...
-
Site Reliability Engineer Professional
2 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia
Businesslist
Full time
A highly skilled Site Reliability Engineer will play a pivotal role in ensuring the smooth operation of our cloud infrastructure. They will be responsible for managing and optimizing our Azure environment to meet the evolving needs of our business. This is an exciting opportunity for a technical expert to join our team at Businesslist, where we are dedicated...
-
Site Reliability Engineering Director
7 days ago
Kuala Lumpur, Kuala Lumpur, Malaysia
SWIFT
Full time
About the Position
We are seeking a highly skilled Senior Site Reliability Engineer to lead our team responsible for ensuring the reliability, uptime, and performance of our mission-critical systems. As a Senior SRE Manager, you will be responsible for providing technical leadership, mentorship, and guidance to your team members, as well as collaborating with...
-
Site Reliability Engineer
3 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia
Hunters International Sdn Bhd
Full time
Overview: As a Site Reliability Engineer (SRE), you will play a key role in maintaining the reliability and performance of critical services. Your expertise will help bridge the gap between development and operations, ensuring robust, scalable, and responsive infrastructure. This role emphasizes strong system architecture and design principles, focusing on...
-
Site Reliability Engineer
4 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia
iSoftStone
Full time
Ensure the stability of Alibaba Apsara Stack and the cloud services running on it. Carry out health checks, operation and maintenance, and troubleshooting tasks...
-
Site Reliability Engineer
2 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia
ServeDeck Innovation Sdn Bhd
Full time
Responsibilities
- Design, manage, and optimize Continuous Integration and Continuous Delivery.
- Monitor application and infrastructure (cloud) to ensure system security and availability using the real-time dashboard and active alerting...