Site Reliability Engineer

1 month ago


Malaysia The Hague Tech Full time

Experian Decision Analytics (DA) integrates predictive data and analytics into valuable business decisions, providing greater insight into decision performance and helping companies keep pace with changing business priorities. It does this by applying expert consulting, analytical tools, software and systems to convert data into valuable business decisions.

Our expertise spans a variety of industries, and we provide software to some of the world's largest financial institutions, telcos and other blue-chip companies. The crown jewel in our software suite is PowerCurve, which provides best-in-class decisioning across the whole customer life-cycle, from customer acquisition to in-life and collections, as well as fraud detection and identity resolution. PowerCurve runs on both hosted and cloud platforms.

The ideal candidate will have either experience in operations, a passion for automation and an interest in software development, or experience in software development, a passion for automation and an interest in operational excellence. Incident management skills and the ability to act rationally and calmly during a crisis would be an added bonus. The role involves working occasional peak weekends as well as some on-call duty. This is the beginning of a growing team, and we are looking for individuals to grow with it.

As a senior engineer, you will be expected to role-model behaviours and provide technical leadership within the team.

You will lead the team's technical vision, bridging the gap between platforms, infrastructure, automation and software. You will review and design non-functional requirements, prioritise key areas of operational architecture, and guide both operational staff and software feature engineers on SRE best practice.

Job Description

Primary Accountabilities:

  • Uptime of Experian One – Experian's Cloud SaaS offering for Decision Analytics.
  • Enhancing and automating the monitoring and alerting of our platform
  • Responding to incidents and restoring service, but also identifying issues before they happen.
  • Gaining a strong understanding of the systems to efficiently triage issues and find owners for problem resolution
  • Identifying recurring issues and manual processes, and ensuring they never occur again
  • Incident management; able to co-ordinate others and be co-ordinated during service disruptions with a focus on restoring availability
  • Writing complex queries using various tools, and leading others to excel in this area
  • Reviewing system designs and implementations to identify resiliency, scalability and monitoring issues prior to implementation
  • Strong knowledge of Kubernetes, Infrastructure as Code and high-availability principles.
  • Excellent communication skills in English with colleagues across the globe.

Working Practices and Relationships:

  • Strong relationships with other members of the SRE team, primarily based in Kuala Lumpur but also in London, Arizona and Sofia
  • Working relationships with colleagues in other departments and with third parties who support backing applications
  • Collaborative relationships with developers, security and architects to influence them to build resilient, maintainable solutions
  • Proficiency in one programming or scripting language and willingness to apply software development best practices to an operational role
  • Work closely with the development team to bring new software releases to production more effectively
  • Develop and deploy OSS tools for use within the Cloud Operations group
  • Develop, deploy and manage a highly scalable, high-availability platform that monitors and maintains service performance and availability metrics
  • Automate everything, from deployment, monitoring and management to incident response; treat 'Everything as Code'
  • Take ownership of our configuration management platforms
  • Collaborate with developers to bring new features and services into production
  • Assist in identifying and mitigating security threats to meet strict security compliance requirements
  • Develop and improve operational practices and procedures
  • Produce high-level design documentation where required
  • Innovate in all areas: people, process and technology
  • Ensure close collaboration between Development and Operations, enabling smoother operation between the teams

More about you

  • Direct experience of supporting complex, highly scaled systems in production
  • Linux knowledge, and experience troubleshooting issues and predicting them in advance
  • Networking, troubleshooting and monitoring
  • Cloud Native application designs for high performance, scalability and resilience
  • Incident management and co-ordination, blameless post-incident reviews (PIRs)
  • Kubernetes, OpenShift, Splunk, Dynatrace, ThousandEyes, ServiceNow, Jira, Jenkins, Python, Prometheus
  • Infrastructure as Code, GitOps
  • Line management, coaching or mentoring.

Key Behaviors

  • Excellent communication skills. Written and verbal fluency in English is required
  • Highly organized, with good attention to detail
  • #CustomerObsessed
  • Working across boundaries: geographical, team, language and cultural
  • Curious and willing and able to learn new technologies and practices
  • Cloud aware, you understand how cloud technologies differ from other technical approaches and are able to explain these to others.
  • Lives and breathes availability and operational excellence in technology

Is this you?

  • You strive to remove repetitive tasks from your daily existence
  • You are a keen follower of technology trends
  • You believe that software is to be used, not admired
  • You solve for the future as well as the immediate
  • You empower others to deliver
  • You build trust, make conflict constructive, create commitment, and drive accountability and results
  • You are articulate, clear, concise, and you can tailor your approach to the audience
  • You can manage stakeholders at all levels and influence decision making

Why this role is critical to us

As part of the next phase of DA's growth, Engineering Services is looking to expand the Site Reliability Engineering team to offer round-the-clock global cover. As an organisation, we are convinced that everything should be automated and that software should run software, and we believe in the Site Reliability Engineering model. We have established a platform using cutting-edge technology such as Kubernetes, containers, pipelines and monitoring. Continuous improvement is required to ensure our platform and tools remain reliable and scalable enough to meet increasing demand across regions. This position is crucial to maintaining the pace at which we scale the SRE team.

Qualifications

  • Proven track record of managing AWS workloads, with particular focus on ECS and EKS clusters
  • Good understanding of key architectural principles such as high availability, disaster recovery and platform resiliency
  • DevOps skillset including Terraform and Jenkins
  • Excellent interpersonal and communication skills
  • Ability to innovate and show initiative

Additional Information

Experian Careers - Creating a better tomorrow together

