Infrastructure Engineer, Cloud Operations

Indefinite
Full time
Jaipur, India

Copado is a leading provider of DevOps and release management solutions, committed to helping organizations accelerate their digital transformations. Our mission is to empower businesses to deliver high-quality applications at speed and scale. With a strong focus on automation, CI/CD, and cloud technologies, we provide comprehensive solutions that streamline development, testing, and deployment processes. At Copado, we are dedicated to innovation, excellence, and customer success.

The mission of the Infrastructure Engineer is to ensure the reliability, security, and efficiency of Copado's existing multi-cloud environment. This role is dedicated to the operational excellence of our current GCP and AWS infrastructure. As a hands-on operator, you will maintain the health of our cloud resources, troubleshoot production incidents, and execute infrastructure updates that keep our backend services and CI/CD pipelines stable.

What Success Looks Like

At the end of the first 12-18 months, success will be measured by the following outcomes (listed in order of importance):

Operational Stability & Uptime: Maintained 99.9% availability for core cloud services by proactively monitoring GCP and AWS environments and resolving infrastructure alerts swiftly.
Infrastructure Maintenance: Successfully executed routine maintenance tasks, including patching, security updates, and resource resizing across existing EC2/Compute Engine instances and managed databases.
Incident Response & Troubleshooting: Acted as a primary responder for infrastructure incidents, reducing Mean Time to Resolution (MTTR) through effective root cause analysis and documentation.
Pipeline Support: Maintained and optimized existing CI/CD pipelines (GitLab/Azure DevOps) to ensure smooth deployment of updates for engineering teams.
Cost & Resource Optimization: Identified and remediated idle or over-provisioned resources in AWS and GCP, contributing to monthly cloud cost efficiencies.
IaC Execution: Implemented defined infrastructure changes using existing Infrastructure-as-Code (Terraform/CloudFormation) templates to support feature releases.
Monitoring & Alerting: Tuned and improved dashboards and alerts (e.g., GCP Cloud Logging, AWS CloudWatch, Prometheus, Grafana, and Loki) to reduce noise and ensure critical alerts are actionable.
Security Compliance: Supported security audits and implemented required policy changes (IAM adjustments, security group updates) as defined by the security team.

What You'll Be Doing

Act as a hands-on operator: maintain the health of cloud resources, troubleshoot production incidents, and execute infrastructure updates.
Automate routine operational tasks and maintenance jobs using scripting languages like Bash and Python.
Manage and troubleshoot running containers (Docker/Kubernetes) in a production environment, including hands-on work with managed solutions like GKE and EKS.
Read, understand, and modify existing Terraform or CloudFormation scripts to apply updates.
Set up and respond to alerts in tools like CloudWatch, Prometheus, or Datadog.
Troubleshoot failed builds or deployments within CI/CD pipelines (GitLab/GitHub Actions/Azure DevOps).
Communicate status updates clearly during incidents and work collaboratively with development teams to resolve blockers.

Required Technical Competencies

Cloud Proficiency (GCP & AWS): Solid hands-on experience with core services (Compute, Storage, Networking, IAM) in both GCP and AWS, focused on administration and maintenance rather than complex architecture.
Operational Scripting: Proficiency in scripting (Bash, Python) to automate routine operational tasks and maintenance jobs.
Container Operations: Competence in managing and troubleshooting running containers (Docker/Kubernetes) in a production environment, with experience in GKE and EKS.
Infrastructure as Code (IaC) Usage: Ability to read, understand, and modify existing Terraform or CloudFormation scripts to apply updates.
Monitoring & Observability: Experience setting up and responding to alerts in tools like CloudWatch, Prometheus, or Datadog.
CI/CD Familiarity: Understanding of how pipelines work (GitLab/GitHub Actions/Azure DevOps) and ability to troubleshoot failed builds or deployments.

Required Cultural Competencies

Reliability Focused: Prioritizes stability and uptime above all; is thorough and cautious when making changes to production environments.
Problem Solver: Enjoys digging into logs and metrics to find the root cause of an issue.
Collaborative Communication: Clearly communicates status updates during incidents and works well with development teams to resolve blockers.
Bias for Action: Takes ownership of operational tickets and drives them to resolution without needing constant supervision.
Eagerness to Learn: Actively seeks to deepen knowledge of cloud architecture and DevOps best practices.
