DevOps Engineer

micro1
EMEA (Remote)
Contract
Applications go directly to the hiring team

Full Description

Job Title: DevOps Engineer

Job Type: Contractor

Location: Remote

Job Summary:

Join our customer's team as a DevOps Engineer on a specialized, high-intensity project dedicated to training and optimizing AI models within advanced containerized environments. This is an expert-level, terminal-intensive engagement in which your ability to troubleshoot, recover, and optimize dynamic infrastructure will directly influence project success. Strong performance may lead to further engagement or a transition to subsequent project phases.

Key Responsibilities:

* Architect, maintain, and optimize containerized environments for large-scale AI model training and data processing.

* Rapidly diagnose and resolve issues in live systems, employing advanced terminal-native problem-solving skills.

* Implement dynamic infrastructure recovery protocols to ensure high system availability and resilience.

* Collaborate closely with cross-functional teams to streamline CI/CD pipelines and automate critical workflows.

* Manage, monitor, and troubleshoot long-running server processes, proactively identifying and addressing resource bottlenecks and failures.

* Replan and recover interrupted processes in Dockerized sandboxes, minimizing downtime and maximizing efficiency.

* Contribute technical expertise to system builds, server administration, and infrastructure management for cutting-edge AI workloads.

Required Skills and Qualifications:

* Proven expertise as a DevOps Engineer in production environments, with hands-on terminal proficiency.

* Mastery of dynamic infrastructure recovery, error resolution, and live process management in complex, containerized setups.

* Demonstrated skill in Docker, Kubernetes, and related container orchestration technologies.

* Strong programming abilities in Python, with proficiency in Bash scripting and familiarity with JavaScript/TypeScript, Go, Rust, or C/C++.

* Experience with build systems, package managers, databases, web servers, ML frameworks, version control, and cryptography tools.

* Exceptional troubleshooting ability, especially in multi-step, mid-execution replanning scenarios.

* Systems-first mindset with a passion for optimizing large-scale, mission-critical environments.

Preferred Qualifications:

* Hands-on experience supporting AI/ML model training pipelines or high-availability compute clusters.

* Background in security, cryptography, or compliance within containerized or cloud-based environments.

* Contributions to open-source DevOps tooling or infrastructure projects.
