HPC Operations Lead
LTM
HPC Operations Lead / HPC Service Manager (Contract)
Pyrénées-Atlantiques, France | 3 months (renewable) | Start: ASAP
About Us
LTM is a global technology consulting and digital solutions company, partner to more than 700 clients and powered by nearly 90,000 talented professionals across more than 30 countries. Technology expertise: ERP, analytics, AI, cloud computing, and cybersecurity. For more information, please visit www.ltm.com.
Job Description
We are looking for an experienced HPC Operations Lead / Service Manager to steer and coordinate the end‑to‑end operations of large‑scale High Performance Computing (HPC) platforms, both on‑premises and in the cloud (Azure HPC).
This is a governance and orchestration role, not hands‑on engineering. You will act as the central point of coordination between HPC users, operations teams (L1/L2), expert providers (L3), and hardware vendors, ensuring reliable, secure, and high‑performance HPC service delivery.
Location
* Onsite - Pyrénées-Atlantiques, France
Client Expectations
* Senior HPC operations background
* Clear leadership and operational steering (pilotage) capability
* Immediate availability
* Comfortable working in a bilingual (FR/EN) environment
Key Responsibilities
HPC Operations & Service Pilotage
* Own daily production and service continuity of HPC platforms (on‑prem & cloud)
* Ensure availability, performance, and operational continuity (MCO, "maintien en condition opérationnelle")
* Coordinate security compliance (MCS, "maintien en condition de sécurité") with cybersecurity teams
* Lead incident, problem, and change management across stakeholders
Vendor & Stakeholder Coordination
* Act as the primary interface between users, operations teams, vendors, and experts
* Lead and facilitate Steering, Technical, and Operational Committees
* Ensure SLA adherence and delivery commitments
User & Business Engagement
* Maintain strong relationships with scientific, R&D, and AI users
* Align HPC operations with business workloads and expectations
* Provide clear service reporting and operational visibility
Governance & Continuous Improvement
* Monitor capacity, usage, and performance trends
* Anticipate risks (capacity, obsolescence, bottlenecks)
* Coordinate platform transitions, upgrades, and next‑generation systems
* Ensure production readiness for new infrastructure and releases
Technical Environment (Operational Understanding Required)
* Large‑scale HPC clusters (CPU & GPU), on‑prem & Azure HPC
* Very large Lustre storage environments
* Schedulers: SLURM, LSF
* OS: RHEL, SUSE SLES
* HPC ecosystem: MPI, Lustre, CUDA awareness
* Monitoring & observability: Grafana, Nagios, Elastic, etc.
Note: deep hands‑on engineering is not required; strong operational understanding is essential.
Required Experience & Skills
Must‑Have
* Strong experience in HPC operations or HPC service delivery
* Proven operational pilotage / service coordination background
* Experience managing complex environments with multiple vendors
* Solid incident, change, and service governance experience
* Comfortable in business‑critical, high‑visibility environments
Soft Skills
* Strong communication and leadership skills
* Ability to orchestrate teams and get things done through others
* Structured, assertive, and delivery‑focused mindset
Languages
* French: Fluent
* English: Fluent
Benefits Package
* Excellent daily rate
* Flexible working
* Learning, development, and professional growth opportunities
* Collaborative and inclusive work environment