[Remote] Site Reliability Engineer II, tvScientific
Note: This is a remote position open to candidates in the USA. Pinterest is a platform that inspires creativity and helps users plan memorable experiences. The company is seeking a Site Reliability Engineer to operate, scale, and enhance a cloud-native platform on AWS, with a focus on improving infrastructure reliability and operational maturity.
Responsibilities
- Ensuring the reliability, availability, and performance of production infrastructure and platform services
- Operating and scaling Kubernetes platforms, including governance and support for multi-tenant workloads
- Managing GitOps-based deployment workflows using ArgoCD and Helm
- Supporting infrastructure provisioning and change management through Terraform/Terragrunt
- Building and supporting CI/CD automation and deployment workflows using GitHub Actions
- Participating in incident response, root cause analysis, and post-incident improvement initiatives
- Reducing operational toil through scripting, tooling, and process automation
- Advancing observability practices across logs, metrics, traces, dashboards, and alerting
- Supporting secure secrets integration, IAM-aware operations, and platform guardrails
- Partnering closely with application, security, and platform teams to improve reliability and delivery outcomes
Skills
- 4+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Cloud Infrastructure
- Strong hands-on experience operating AWS in production environments
- Solid expertise in Kubernetes, including cluster operations, troubleshooting, workload reliability, and platform administration
- Experience with Kubernetes multi-tenancy, including namespaces, RBAC, quotas, policies, and tenant isolation patterns
- Experience implementing and operating ArgoCD within a GitOps delivery model
- Strong hands-on experience with Helm
- Experience with Terraform/Terragrunt for infrastructure provisioning and environment management
- Solid scripting and automation skills using Bash and/or Python
- Experience building, maintaining, or supporting CI/CD pipelines, ideally using GitHub Actions
- Strong troubleshooting skills across Linux, containers, IAM, networking, and distributed systems
- Experience with monitoring, alerting, and observability in production environments
- Demonstrated ownership mindset with experience handling incidents and resolving production issues
- Strong collaboration and communication skills, with the ability to work effectively across engineering, security, and platform teams
- Bachelor's degree in computer science, engineering, or a related field, or equivalent experience
- Demonstrated ability to use AI to improve the speed and quality of your day-to-day work
- Strong track record of critical evaluation and verification of AI-assisted work (e.g., testing, source-checking, data validation, peer review)
- High integrity and ownership: you protect sensitive data, avoid over-reliance on AI, and remain accountable for final decisions and deliverables
Benefits
- The position is eligible for equity.
- Information regarding the culture at Pinterest and benefits available for this position can be found here.
Company Overview