Site Reliability Engineer – Hosting Operations, Vancouver, BC

Full time

Job Detail

  • Career Level Staff
  • Experience 3 Years
  • Gender Any
  • Industry Information Technology
  • Qualifications Bachelor's Degree

Job Description

Join Absolute’s Cloud Engineering & Hosting Operations teams and be part of our new core infrastructure and cloud initiatives. Our team is building the foundation for the company's next generation of services on top of Kubernetes and AWS. This is a global, high-throughput set of services that processes data for hundreds of millions of devices per day. Be part of architecting the core application stack for throughput and resilience at Absolute.

Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems. SRE ensures that all services – both our internally critical and our externally visible systems – have reliability and uptime appropriate to users’ needs and a fast rate of improvement while keeping an ever-watchful eye on capacity and performance.

Do you have a passion for our core mission?

  • Solving scalability and efficiency challenges in cloud environments.
  • Digging to the root cause of production and scalability issues and ensuring they are fixed for good.
  • Automate all the things! Be passionate about automation, from infrastructure-as-code through to continuous integration and deployments.
  • Building high-availability, high-resilience systems based on Linux and open source technologies.
  • Applying security best practices through continuous monitoring, architecture, networking, and automation.

Accountabilities Will Include:

  • Management and development of our production observability infrastructure.
  • Support the build-out and ongoing maintenance of core application components such as Kubernetes, Kafka, Elasticsearch and other highly scalable systems.
  • Support the core Linux and on-call teams with incident management, investigation and remediation of production issues.
  • Work with cloud engineering and product development teams to educate them on sound operational practices.
  • Work with and promote observability at all layers of the infrastructure, from hardware to network to containers to application layers.

What You’ll Need:

  • Experience with the configuration management systems Ansible and Puppet.
  • Hands-on technical experience in Kubernetes orchestration.
  • Comfort with frequent, incremental code testing and deployment.
  • Experience with current observability tools such as Prometheus, Thanos, Grafana, Jaeger, etc.
  • Ability to use a wide variety of open source technologies and cloud services.
  • Experience with AWS.
  • Solid grasp of network fundamentals and the building blocks of highly available cloud infrastructure, such as load balancers, network filtering and security, proxies, and service mesh architectures.
