
Cloud Engineer

Remote, Full-time

Position Overview:

ShyftLabs is seeking a highly skilled Cloud Engineer (Senior, Data Platforms) to join our team and lead the design, implementation, and management of cloud infrastructure for our innovative GenAI applications. This role will be instrumental in building a robust platform that enables rapid experimentation and deployment while maintaining enterprise-grade security and reliability.

ShyftLabs is a growing data product company, founded in early 2020, that works primarily with Fortune 500 companies. We deliver digital solutions that help accelerate business growth across industries by creating value through innovation.

Job Responsibilities

  • Cloud Infrastructure Management

  • Design, provision, and maintain cloud resources across AWS (primary), with capabilities to work in Azure and Google Cloud environments
  • Manage end-to-end infrastructure for full-stack GenAI applications, including:
      • Database systems (Aurora, RDS, DynamoDB, DocumentDB, etc.)
      • Security groups and IAM policies
      • VPC architecture and network design
      • Container orchestration and serverless compute (ECS, EKS, Lambda)
      • Storage solutions (S3, EFS, etc.)
      • CDN configuration (CloudFront)
      • DNS management (Route53)
      • Load balancing and auto-scaling
  • Data & AI Platforms

  • Design feature stores, vector stores, data ingestion frameworks, and lakehouse architectures
  • Manage data governance, lineage, masking, and access controls around data products
  • Serverless Architecture

  • Design and implement serverless solutions using AWS Lambda, API Gateway, and EventBridge
  • Optimize serverless applications for performance, cost, and scalability
  • Implement event-driven architectures and asynchronous processing patterns
  • Manage serverless deployment pipelines and monitoring
  • Disaster Recovery & High Availability

  • Architect and implement comprehensive disaster recovery strategies
  • Design multi-region failover capabilities with automated recovery procedures
  • Implement RTO/RPO requirements through backup strategies and replication
  • Build auto-failover mechanisms using Route53 health checks and failover routing
  • Create and maintain disaster recovery runbooks and testing procedures
  • Ensure data durability through cross-region replication and backup strategies
  • Platform Development

  • Build and maintain a self-service platform enabling rapid experimentation and testing of GenAI applications
  • Implement Infrastructure as Code (IaC) using Terraform for consistent and repeatable deployments
  • Create streamlined CI/CD pipelines that support local-to-dev-to-prod workflows
  • Design systems that minimize deployment time and maximize developer productivity
  • Establish quick feedback loops between development and deployment
  • Monitoring & Operations

  • Implement comprehensive monitoring, observability, and alerting solutions
  • Set up logging aggregation and analysis tools
  • Ensure high availability and disaster recovery capabilities
  • Optimize cloud costs while maintaining performance
  • DevOps Excellence
  • Champion DevOps best practices across the organization
  • Automate infrastructure provisioning and application deployment
  • Implement security best practices and compliance requirements
  • Create documentation and runbooks for operational procedures

Basic Qualifications

  • Technical Skills
  • 5+ years of hands-on experience with AWS services
  • 2+ years of hands-on experience with Databricks
  • Expert-level knowledge of AWS core services (EC2, VPC, IAM, S3, RDS, Lambda, ECS/EKS)
  • Expert-level knowledge of Databricks capabilities
  • Familiarity with SageMaker, Bedrock, or Anthropic/Claude API integration
  • Strong proficiency with Terraform for infrastructure automation
  • Demonstrated experience with containerization (Docker, Kubernetes)
  • Solid understanding of networking concepts (subnets, routing, security groups, VPN)
  • Experience with CI/CD tools (Jenkins, GitLab CI, GitHub Actions, AWS CodePipeline)
  • Proficiency in scripting languages (Python, Bash, PowerShell)
  • Serverless & Event-Driven Architecture

  • Extensive experience with AWS Lambda, API Gateway, ECS, and Step Functions
  • Knowledge of serverless frameworks (SAM, Serverless Framework)
  • Experience with event-driven patterns using SNS, SQS, and EventBridge
  • Understanding of serverless best practices and optimization techniques
  • Disaster Recovery & Business Continuity

  • Proven experience designing and implementing DR strategies in AWS
  • Expertise in multi-region architectures and data replication
  • Experience with AWS backup services and cross-region failover
  • Knowledge of RTO/RPO planning and implementation
  • Hands-on experience with Route53 health checks and failover routing policies
  • Cloud Platform Experience

  • Primary: AWS (extensive experience required)
  • Secondary: Azure and Google Cloud Platform (working knowledge)
  • Multi-cloud architecture understanding
  • Monitoring & Observability

  • Experience with monitoring tools (CloudWatch, Datadog, Prometheus, Grafana)
  • Log management systems (ELK stack, Splunk, CloudWatch Logs)
  • APM tools and distributed tracing

Preferred Qualifications

  • AWS certifications (Solutions Architect, DevOps Engineer)
  • Databricks Certifications
  • Experience with open-source LLMs, embedding models, and RAG-based applications
  • Experience with chaos engineering and resilience testing
  • Knowledge of security frameworks and compliance (SOC2, HIPAA, PCI)
  • Experience implementing complex build systems for mono-repo micro-services architectures
  • Background in building developer platforms or internal tools
  • Experience with Infrastructure as Code testing frameworks

Additional Information

    We are proud to offer a competitive salary alongside a strong healthcare insurance and benefits package. The role is preferably hybrid, with 2 days per week spent in the office, and flexibility for client engagement needs. We pride ourselves on the growth of our employees, offering extensive learning and development resources.

    ShyftLabs is an equal-opportunity employer committed to creating a safe, diverse and inclusive environment. We encourage qualified applicants of all backgrounds including ethnicity, religion, disability status, gender identity, sexual orientation, family status, age, nationality, and education levels to apply. If you are contacted for an interview and require accommodation during the interviewing process, please let us know.
