[Remote] Staff Site Reliability Engineer - Kubernetes
Note: This is a remote position open to candidates in the USA. Okta is a company focused on securing identities in the AI era, and it is seeking a Staff Site Reliability Engineer to build and manage Kubernetes platforms. The role involves architecting reliable, scalable, and secure Kubernetes-based platforms on AWS, ensuring high availability and performance while optimizing costs and automation.
Responsibilities
- Kubernetes Platform Creation: Design, implement, and maintain highly available, scalable, and fault-tolerant Kubernetes platforms. Ensure clusters are optimized for production workloads, providing high resilience and operational efficiency
- AWS Infrastructure Management: Build, manage, and optimize AWS cloud infrastructure, including EKS, ECS, S3, VPCs, RDS, IAM, and more. Implement best practices for cost management, scaling, and security within AWS
- Helm Management: Utilize Helm to automate and streamline the deployment of applications and services to Kubernetes clusters. Create, maintain, and manage Helm charts for production-ready deployments
- Karpenter Implementation: Implement and manage Karpenter to dynamically scale Kubernetes clusters in response to workload demands
- Istio Service Mesh Management: Configure and manage Istio to provide service-to-service communication, security, and observability within the Kubernetes clusters. Enable fine-grained traffic management, service discovery, and policy enforcement
- Platform Automation & Scaling: Automate the deployment, scaling, and management of infrastructure and applications. Work with CI/CD pipelines to ensure a seamless flow from development to production with minimal downtime
- Incident Management & Troubleshooting: Respond to incidents, troubleshoot, and resolve system issues related to performance, availability, and security in a timely and effective manner
- Security & Compliance: Design and implement secure cloud infrastructure with appropriate access controls, network security, and compliance frameworks
- Documentation & Knowledge Sharing: Create and maintain detailed documentation for Kubernetes platform setup, operational procedures, and best practices. Promote knowledge sharing across teams
Skills
- 4+ years of experience with Kubernetes/Helm
- 4+ years of experience with Terraform
- 5+ years of experience with AWS
- Experience with multi-region cloud environments
- Proven experience with AWS (EC2, RDS, S3, CloudFormation, IAM, etc.) and solid understanding of cloud-native architectures
- Strong expertise in Kubernetes platform creation, management, and optimization (e.g., setting up highly available clusters, networking, and storage)
- Hands-on experience with Helm for Kubernetes application deployment and management
- Practical experience with Karpenter for dynamic scaling of Kubernetes clusters and optimizing resource usage
- Expertise in managing and securing Istio for service mesh, including traffic management, security, and observability features
- Proficiency in CI/CD pipelines and automation tools (e.g., Jenkins, GitLab, CircleCI, Terraform, Ansible, Spinnaker)
- Strong scripting and automation skills in Python, Bash, or Go for infrastructure management and platform automation
- Experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, CloudWatch, and ELK Stack
- Understanding of security best practices for cloud platforms and Kubernetes (e.g., role-based access control (RBAC), encryption, and compliance frameworks)
- Familiarity with Docker and containerization principles
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent professional experience)
- Certifications (preferred): CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), or AWS Certified DevOps Engineer
Benefits
- Equity (where applicable)
- Bonus
- Benefits, including health, dental and vision insurance
- 401(k)
- Flexible spending account
- Paid leave (including PTO and parental leave)
- Immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one
Company Overview