System Reliability Engineer, Consultant
1 week ago
At AIA we've started an exciting movement to create a healthier, more sustainable future for everyone.
As pioneering innovators for over 100 years, we're now transforming our organisation to be faster, simpler and more connected, because we want to be even better equipped to develop digital solutions and experiences that help more people live Healthier, Longer, Better Lives.
To get there, we need people with tech/digital/analytics expertise and passion to help develop positive, sustainable change through digitally enhanced experiences that will impact the lives of millions of people and create a healthier future for everyone.
If you believe in developing a better tomorrow, read on.
About the Role
The purpose of this role is to ensure the reliability, scalability, and performance of enterprise systems and services by applying software engineering principles to operations. The System/Site Reliability Engineer will collaborate with development and operations teams to build robust automation, monitor system health, respond to incidents, and continuously improve service availability and efficiency. This role is critical in bridging the gap between software development and IT operations, fostering a culture of resilience, observability, and proactive problem-solving.
Job Description
Ensure System Reliability and Availability
- Oversee application performance and report any deviations or issues
- Collaborate with application engineers and developers on root cause identification
Incident Management and Root Cause Analysis
- Participate in incident response for production outages as a subject matter advisor
- Provide insights from monitoring and from in-depth code and database reviews
- Assist with Application Operations post-mortem reviews
Automation and Tooling
- Automate operational tasks such as monitoring and recovery.
- Develop scripts and tools to reduce manual toil and improve efficiency.
Monitoring and Observability
- Implement robust telemetry systems to monitor application health, latency, and error rates.
- Manage the Dynatrace platform and its integration with all application services
- Assist application teams with dashboard design and setup
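As an illustration of the error-rate and latency signals mentioned above, here is a minimal Python sketch (Python being one of the scripting languages listed in the requirements). It assumes a hypothetical `Request` record holding a latency and an HTTP status code, counts 5xx responses as errors, and uses the nearest-rank percentile method. It is not tied to Dynatrace or any other tool named in this posting.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Request:
    # Hypothetical telemetry record: one entry per served request.
    latency_ms: float
    status: int  # HTTP status code


def error_rate(requests):
    """Fraction of requests that returned a 5xx (server error) status."""
    if not requests:
        return 0.0
    errors = sum(1 for r in requests if r.status >= 500)
    return errors / len(requests)


def percentile(values, pct):
    """Nearest-rank percentile for 0 < pct <= 100."""
    if not values or not 0 < pct <= 100:
        raise ValueError("need at least one value and 0 < pct <= 100")
    ordered = sorted(values)
    rank = ceil(pct / 100 * len(ordered))  # 1-based rank, always >= 1 here
    return ordered[rank - 1]


sample = [
    Request(120, 200),
    Request(80, 200),
    Request(450, 503),
    Request(95, 200),
    Request(300, 200),
]
print(f"error rate: {error_rate(sample):.0%}")
print(f"p95 latency: {percentile([r.latency_ms for r in sample], 95)} ms")
```

In practice such values would come from the monitoring platform rather than an in-memory list; the sketch only shows the arithmetic behind the two signals.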
Security and Compliance
- Collaborate with Security teams to ensure systems meet regulatory and security standards (e.g., PCI-DSS, GDPR).
- Implement access controls, encryption, and audit mechanisms where required within the SRE team's scope.
Capacity Planning and Performance Optimization
- Assist in analyzing usage trends to forecast demand and scale infrastructure accordingly.
- Participate in optimizing resource utilization to balance cost and performance.
Work closely with development, QA, and infrastructure teams to embed reliability into the SDLC. Promote SRE principles across teams to foster a culture of resilience and accountability.
Maintain clear operational documentation, runbooks, and architecture diagrams.
Job Requirements
Bachelor's degree in Computer Science, Software Engineering, Information Technology, or a related field.
3–5 years of experience in Site Reliability Engineering, DevOps, or Software Engineering roles.
Prior experience supporting front-end applications in production environments, preferably in financial services or regulated industries.
Front-end performance monitoring: ability to instrument front-end code for custom metrics and traces.
Experience with Real User Monitoring (RUM), Synthetic Monitoring, and Application Performance Monitoring (APM) tools (e.g., New Relic, Dynatrace, Datadog).
Proficiency in setting up dashboards and alerts using tools like Dynatrace, Grafana, Prometheus, Elastic Stack, or Splunk.
Familiarity with OpenTelemetry standards for distributed tracing.
Scripting skills in Python, Bash, or JavaScript for automation and tooling.
Experience with CI/CD pipelines and Git-based workflows (e.g., GitHub Flow).
Hands-on experience with cloud platforms (AWS, Azure).
Familiarity with containerization (Docker) and orchestration (Kubernetes).
Understanding of secure coding practices for front-end applications.
Awareness of financial compliance standards (e.g., PCI-DSS).
Build a career with us as we help our customers and the community live Healthier, Longer, Better Lives.
You must provide all requested information, including Personal Data, to be considered for this career opportunity. Failure to provide such information may influence the processing and outcome of your application. You are responsible for ensuring that the information you submit is accurate and up-to-date.
-
Site Reliability Engineering Manager
7 days ago
Kuala Lumpur, Kuala Lumpur, Malaysia
Avensys Consulting
Full time
If you are passionate about playing a key role in the success of a Top-Notch company, we want to hear from you. Our client's project is a well-established brand in Information Technology Industry who is now looking for a passionate and driven Site Reliability Engineering Lead. This is an exciting opportunity to expand your skill set, achieve job satisfaction...
-
Reliability Engineer
2 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia
ExxonMobil
Full time, 150,000 - 250,000 per year
Location: Kuala Lumpur, 14, MY
Company: ExxonMobil
About us
At ExxonMobil, our vision is to lead in energy innovations that advance modern living and a net-zero future. As one of the world's largest publicly traded energy and chemical companies, we are powered by a unique and diverse workforce fueled by the pride in what we do and what we stand for. The success...
-
Reliability Engineer
2 weeks ago
Kuala Lumpur Centre, Kuala Lumpur, Malaysia
ExxonMobil Malaysia
Full time, 120,000 - 240,000 per year
As an experienced Reliability Engineer, you will provide strategic, cross-functional, and technical leadership in the areas of reliability engineering, maintenance optimization, and asset management through all stages of project development and asset operation. You will support technology development and implementation to enhance reliability and...
-
Reliability Qualification Engineer
1 week ago
Kuala Lumpur, Kuala Lumpur, Malaysia
NXP Semiconductors
Full time
Review reliability qualification strategies to meet the NXP Reliability Policy and Requirement (RPR) and Customer specific requirement. Peer of local and global reliability qualification process regarding to reliability knowledge to participate a thorough risk-analysis based on FMEA analysis, RKM, knowledge of structural similarity rules and applicable...
-
Business Strategy and Development, Consultant
2 weeks ago
Kuala Lumpur, MY-AIA Malaysia
AIA
Full time
FIND YOUR 'BETTER' AT AIA
We don't simply believe in being 'The Best'. We believe in better - because there's no limit to how far 'better' can take us. We believe in empowering every one of our people to find their 'better' - in the work they do, the career they build, the life they live and the difference they make. So that together we can support even more...
-
Associate Director, System Support
1 week ago
Kuala Lumpur, MY-AIA Malaysia
AIA
Full time
FIND YOUR 'BETTER' AT AIA
We don't simply believe in being 'The Best'. We believe in better - because there's no limit to how far 'better' can take us. We believe in empowering every one of our people to find their 'better' - in the work they do, the career they build, the life they live and the difference they make. So that together we can support even more...
-
Platform Reliability Engineer
2 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia
POWER IT SERVICES
Full time, 120,000 - 240,000 per year
Platform Reliability Engineer
Experience: 5–7 Years
Role Overview: Responsible for engineering, operating, and maintaining internal container platforms (Broadcom Tanzu, Kubernetes) with a focus on reliability, resiliency, security, and automation.
Key Responsibilities: Manage and optimize Tanzu Application Service and Kubernetes clusters. Ensure platform...
-
Senior System Engineer
5 days ago
Kuala Lumpur, Kuala Lumpur, Malaysia
PT. Wide Technologies Indonesia
Full time
We are an established IT Consulting firm, partnering with top regional and global clients to deliver robust and scalable technology solutions. To support our rapid growth, we are seeking an experienced Senior/Lead System Engineer/Implementation Engineer to take technical ownership of application and system deployment initiatives. This role is hands-on and...
-
Site Reliability Engineer
7 days ago
Kuala Lumpur, Kuala Lumpur, Malaysia
PERSOL APAC
Full time
About the company: We have partnered with a renowned global leader in information and communications technology (ICT) infrastructure and smart devices. They are providing full-stack, all-scenario solution for products and services carriers, enterprises, governments, and individual consumers worldwide. Our client is looking for enthusiastic Site Reliability...
-
Reliability Engineer
2 weeks ago
Kuala Lumpur, Kuala Lumpur, Malaysia
John Crane
Full time, 60,000 - 120,000 per year
Company Description
John Crane, a business of Smiths Group, is a global leader in mission-critical flow control solutions for energy and process industries that enable efficient and sustainable operations. Our products include mechanical seals and systems, couplings, bearings, filtration systems, and predictive digital monitoring technologies. We have a global...