Cloud Service Reliability Engineer / Team Lead - Glasgow

This advert is now closed.

This technology infrastructure organization delivers a wide range of products and services and partners with all lines of business to provide high-quality service delivery, exceptional project execution and financially disciplined, cost-effective processes. The objective is to balance business alignment with the centralized delivery of core products and services. The group is designed to address the unique infrastructure needs of specific lines of business while leveraging economies of scale across the firm.

Cloud Development

Cloud Development is a small team responsible for architecting, designing and implementing a new, cutting-edge cloud platform that transforms our business applications into scalable, elastic systems that can be instantiated on demand. The team comprises three pillars: Client Engagement, Core Engineering and Service Reliability Engineering (SRE).

Job Summary

The Service Reliability Engineer is a highly technical role on the SRE team, responsible for operating the cloud platform and for developing and implementing the operational tools and processes needed to improve reliability. Cloud operations include deployment and maintenance of the platform infrastructure, as well as diagnosis and remediation of support issues or their escalation to Core Engineering or Client Engagement where appropriate. The Service Reliability Engineer is expected to drive technical innovation and efficiency in infrastructure operations through hands-on development of tools and automation, and by providing feedback and design contributions to the portfolio of cloud products.

Core Responsibilities

  • Development and operation of support tools for control, instrumentation and investigation.
  • Development and execution of tests to measure and ensure levels of availability and performance.
  • Development and documentation of mechanisms for deployment, monitoring and maintenance.
  • Diagnosis and remediation of service-level issues, engaging with internal support and engineering teams and third-party suppliers as required.
  • Deployment, upgrade, configuration and maintenance of the cloud platform.
  • Tracking and management of platform usage and capacity.

Requirements

  • 7 to 10 years of software development experience in Java/J2EE technologies with the Tomcat application server.
  • Experience with Java frameworks and libraries (Spring, Hibernate, JUnit, JDBC).
  • Experience with build tools (Maven/Jenkins), IDEs (Eclipse) and source code management (Git/Stash).
  • Strong analytical and problem-solving abilities.
  • Good organizational, management, written and communication skills.
  • The candidate must be a self-starter who is able to work in a fast-paced, results-driven environment.
  • Basic understanding of operating systems, storage, networking, virtualization, web, database and messaging services, with the ability to dive deep into any of these areas when necessary.
  • UNIX/Linux systems administration, troubleshooting, performance analysis and shell scripting a plus.
  • Experience with cloud technologies like OpenStack and Cloud Foundry a plus.
  • Experience with automated build and configuration management systems like Ansible, Puppet and Chef a plus.
  • Familiarity with the ELK stack (Elasticsearch, Logstash, Kibana) a plus.
  • Ability to configure, tune and diagnose issues with Ceph storage a plus.
  • Experience with other programming languages like Python a plus.
  • Experience with data services like MySQL databases or messaging a plus.
  • Bachelor’s degree in Computer Science, Information Systems or related field. Advanced degree a plus.