[Remote] Staff Site Reliability Engineer - Kubernetes

Remote | Full-time

Note: This is a remote position open to candidates in the USA. Okta is a company focused on securing identities in the AI era, and it is seeking a Staff Site Reliability Engineer to build and manage Kubernetes platforms. The role involves architecting reliable, scalable, and secure Kubernetes-based platforms on AWS, ensuring high availability and performance while optimizing costs and automation.

Responsibilities

  • Kubernetes Platform Creation: Design, implement, and maintain highly available, scalable, and fault-tolerant Kubernetes platforms. Ensure clusters are optimized for production workloads, providing high resilience and operational efficiency
  • AWS Infrastructure Management: Build, manage, and optimize AWS cloud infrastructure, including EKS, ECS, S3, VPCs, RDS, IAM, and more. Implement best practices for cost management, scaling, and security within AWS
  • Helm Management: Utilize Helm to automate and streamline the deployment of applications and services to Kubernetes clusters. Create, maintain, and manage Helm charts for production-ready deployments
  • Karpenter Implementation: Implement and manage Karpenter to dynamically scale Kubernetes clusters in response to workload demands
  • Istio Service Mesh Management: Configure and manage Istio to provide service-to-service communication, security, and observability within the Kubernetes clusters. Enable fine-grained traffic management, service discovery, and policy enforcement
  • Platform Automation & Scaling: Automate the deployment, scaling, and management of infrastructure and applications. Work with CI/CD pipelines to ensure a seamless flow from development to production with minimal downtime
  • Incident Management & Troubleshooting: Respond to incidents, troubleshoot, and resolve system issues related to performance, availability, and security in a timely and effective manner
  • Security & Compliance: Design and implement secure cloud infrastructure with appropriate access controls, network security, and compliance frameworks
  • Documentation & Knowledge Sharing: Create and maintain detailed documentation for Kubernetes platform setup, operational procedures, and best practices. Promote knowledge sharing across teams

Skills

  • 4+ years of experience with Kubernetes/Helm
  • 4+ years of experience with Terraform
  • 5+ years of experience with AWS
  • Experience with multi-region cloud environments
  • Proven experience with AWS (EC2, RDS, S3, CloudFormation, IAM, etc.) and a solid understanding of cloud-native architectures
  • Strong expertise in Kubernetes platform creation, management, and optimization (e.g., setting up highly available clusters, networking, and storage)
  • Hands-on experience with Helm for Kubernetes application deployment and management
  • Practical experience with Karpenter for dynamic scaling of Kubernetes clusters and optimizing resource usage
  • Expertise in managing and securing Istio for service mesh, including traffic management, security, and observability features
  • Proficiency in CI/CD pipelines and automation tools (e.g., Jenkins, GitLab, CircleCI, Terraform, Ansible, Spinnaker)
  • Strong scripting and automation skills in Python, Bash, or Go for infrastructure management and platform automation
  • Experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, CloudWatch, and ELK Stack
  • Understanding of security best practices for cloud platforms and Kubernetes (e.g., role-based access control (RBAC), encryption, and compliance frameworks)
  • Familiarity with Docker and containerization principles
  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent professional experience)
  • Certifications (Preferred): CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), or AWS Certified DevOps Engineer

Benefits

  • Equity (where applicable)
  • Bonus
  • Benefits, including health, dental and vision insurance
  • 401(k)
  • Flexible spending account
  • Paid leave (including PTO and parental leave)
  • Immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one

Company Overview

  • Okta is an identity management platform that secures critical resources from cloud to ground for workforce and customers. Founded in 2009 and headquartered in San Francisco, California, USA, it has a workforce of 5,001-10,000 employees. Its website is http://www.okta.com.