Machine Learning Engineer – Lumen Enterprise Models (SWE-focused LLMs)
Job title: Machine Learning Engineer – Lumen Enterprise Models (SWE-focused LLMs)
Location: London; fully in-office by default
Start date: ASAP
Reports to: CEO
Compensation: £80,000 - £110,000
___________________________________________________________________________
Cosine at a glance
At Cosine, we’re building autonomous AI engineers that plan, write, and ship code inside real development workflows.
Cosine is designed for on-premise and virtual private cloud (VPC) deployments, including fully air-gapped environments. We build our agent tooling entirely in-house and post-train open-source models to deliver reliable, enterprise-grade coding performance in security-critical settings.
In 2024, Cosine achieved a 72% score on OpenAI’s SWE-Lancer benchmark, placing us among the strongest real-world software-engineering AI systems evaluated.
YC-backed and well-funded, Cosine was founded by experienced operators focused on building dependable, production-grade AI.
This role is based in our Hoxton office, five days a week, because close collaboration, fast feedback, and shared context matter for the problems we’re solving.
___________________________________________________________________________
The role
We’re looking for an ML engineer to own large-scale training of our Lumen Enterprise models – our open‑source–based software engineering LLMs.
You’ll work on supervised fine-tuning (SFT), reinforcement learning (RL), and continued pretraining on top of open-source base models to push state-of-the-art performance on real software engineering tasks: reading and modifying large codebases, using tools, and reasoning about complex systems.
If you enjoy working close to the metal with PyTorch and distributed training, and you like making big models actually work in practice, this role is for you.
___________________________________________________________________________
About The Role
In this role you will:
* Take open-source base models (code + general LLMs) and turn them into high-performance Lumen Enterprise SWE agents via SFT and RL.
* Design and run large-scale training experiments on multi-node GPU clusters, including long-context training and MoE-style architectures.
* Build and iterate on large-scale RL loops where models write code, run tests or tools, and get rewarded (or penalized) accordingly.
* Work hands-on across the stack: custom PyTorch dataloaders, distributed training primitives, RL objectives, and evaluation on real-world repos and tasks.
You’ll collaborate closely with infra, product, and research to decide what to train next, how to train it, and how to measure whether it’s actually better for engineers.
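To give candidates a flavour of the execution-grounded RL loops described above, here is a deliberately minimal, framework-free sketch (all names and numbers are illustrative, not our production code): a reward for a model-written function can be computed by executing it against unit tests and scoring the pass rate.

```python
def execution_reward(candidate_src: str, tests: list[str]) -> float:
    """Score model-written code by executing it against unit tests.

    Reward is the fraction of tests that pass; code that fails to even
    load gets 0.0. (A real pipeline sandboxes all of this, of course.)
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)  # "compile + import" the candidate
    except Exception:
        return 0.0
    passed = 0
    for test in tests:
        try:
            exec(test, namespace)  # each test is a bare assert
            passed += 1
        except Exception:
            pass
    return passed / len(tests)

# A toy candidate the "model" wrote, with a bug on empty input
candidate = """
def running_max(xs):
    out = []
    cur = xs[0]          # bug: crashes when xs is empty
    for x in xs:
        cur = max(cur, x)
        out.append(cur)
    return out
"""
tests = [
    "assert running_max([3, 1, 4]) == [3, 3, 4]",
    "assert running_max([]) == []",   # fails: IndexError on xs[0]
]
reward = execution_reward(candidate, tests)  # 0.5: one of two tests pass
```

In practice the interesting work is everything around this core: sandboxing, timeouts, reward shaping, and defending against reward hacking.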
What You’ll Do
* Participate in end-to-end training of Lumen Enterprise SWE models:
* Supervised fine-tuning on curated code and conversation datasets.
* RL on top of those models to align them with software-engineering objectives.
* Occasional continued pretraining on domain-specific / long-context corpora.
* Design, implement, and iterate on RL training pipelines.
* Build and maintain large-scale PyTorch training code:
* Write and optimize custom dataloaders and batching strategies.
* Use PyTorch distributed primitives (DDP/FSDP and related) to scale training.
* Operate large multi-node training jobs:
* Launch and debug multi-GPU, multi-node runs (Slurm, k8s or similar schedulers).
* Diagnose issues around NCCL, hangs, load balancing, and performance regressions.
* Track experiment configs, checkpoints, and metrics across many runs.
* Work on long-context and code-focused training:
* Train models on long-context data (e.g. long documents, repos, multi-file tasks) and understand the tradeoffs between context length, batch size, and stability.
* Design novel, opinionated reward functions for training SWE agents.
* Improve evaluation for SWE models:
* Help maintain/extend an evaluation suite for code models (unit tests, benchmark suites, repo-level tasks).
* Analyze failure modes and feed them back into data and training plans.
* Collaborate:
* Work closely with infra engineers on performance and reliability.
* Stay up to date with the latest research in the space, sharing knowledge across the team at lunch-and-learns and regular stand-ups.
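As one small, concrete example of the dataloader and long-context work above (a simplified sketch; real pipelines add attention masking, truncation, and distributed sharding): greedily packing variable-length documents into fixed context windows reduces padding waste when training at long context lengths.

```python
def pack_sequences(doc_lengths: list[int], context_len: int) -> list[list[int]]:
    """Greedy first-fit packing: place each document (given by its token
    length) into the first context window with room, opening a new window
    otherwise. Returns, per window, the indices of the documents packed
    into it."""
    windows: list[list[int]] = []   # document indices per window
    remaining: list[int] = []       # free tokens left in each window
    for idx, length in enumerate(doc_lengths):
        if length > context_len:
            raise ValueError(f"doc {idx} ({length} tokens) exceeds context")
        for w, free in enumerate(remaining):
            if length <= free:
                windows[w].append(idx)
                remaining[w] -= length
                break
        else:
            windows.append([idx])
            remaining.append(context_len - length)
    return windows

# Six documents into 8192-token windows: 3 windows instead of 6,
# so far fewer padding tokens per batch.
docs = [5000, 3000, 7000, 1000, 2000, 4000]
print(pack_sequences(docs, 8192))  # [[0, 1], [2, 3], [4, 5]]
```

The real versions of these decisions (packing vs. padding, window boundaries vs. document boundaries) interact with batch size and training stability, which is exactly the tradeoff space this role owns.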
___________________________________________________________________________
What We’re Looking For
* Strong experience training deep learning models in production:
* Typically 3–5+ years working as an ML engineer / applied scientist, including hands-on responsibility for training and shipping models.
* Deep proficiency with PyTorch and its primitives:
* Comfort implementing custom training loops, losses, and dataloaders.
* Hands-on experience with torch.distributed (DDP/FSDP-style training, distributed data loading, gradient scaling, etc.).
* Experience training large sequence models or LLMs:
* Have trained models at ≥70B parameters end-to-end on multi-GPU setups.
* Understand practical issues: stability, init, scaling laws, gradient accumulation, curriculum and sampling strategies.
* Experience with SFT and RL on top of LLMs:
* Have implemented or meaningfully modified at least one RLVR system (e.g. PPO-style, GRPO-style, or similar).
* Comfortable working with advantages, policy ratios, KL penalties, and sequence-level rewards.
* Strong software engineering background:
* You can read, debug, and write non-trivial production code (Python, plus familiarity with at least one of: TypeScript, Go).
* You care about code quality, correctness, and maintainability as much as model metrics.
* High level of Git proficiency.
* Distributed systems / training ops experience:
* Practical experience running multi-node jobs on GPU clusters (Slurm, Kubernetes, or managed cloud equivalents).
* Familiarity with GPU performance tuning: memory usage, mixed precision, throughput vs. latency tradeoffs.
* Data engineering instincts:
* Comfortable working with large-scale datasets, object storage, dataset sharding, and filtering.
* Know that data quality and sampling strategies matter as much as architecture.
* Clear communication and ownership:
* Can take a vague modelling goal (“make Lumen Enterprise better at X”) and turn it into a concrete plan of experiments.
* Comfortable documenting decisions and walking others through tradeoffs.
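For candidates wondering what "working with advantages, policy ratios, KL penalties, and sequence-level rewards" looks like in code, here is a deliberately minimal, framework-free sketch of a GRPO-style sequence-level objective: group-normalised rewards, PPO-style ratio clipping, and a crude KL penalty against a reference policy. All constants are illustrative, and this is a toy on scalars rather than our actual training loss.

```python
import math

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: normalise each sampled completion's
    reward by the mean and std of its group (the completions sampled
    for the same prompt)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) + 1e-8
    return [(r - mean) / std for r in rewards]

def policy_loss(logp_new, logp_old, logp_ref, rewards,
                clip_eps=0.2, kl_coef=0.05):
    """Clipped surrogate loss with a KL penalty, per sequence, averaged.
    Log-probs are sequence-level sums; lower loss = better policy."""
    advs = grpo_advantages(rewards)
    total = 0.0
    for lp_n, lp_o, lp_r, adv in zip(logp_new, logp_old, logp_ref, advs):
        ratio = math.exp(lp_n - lp_o)                 # policy ratio
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        surrogate = min(ratio * adv, clipped * adv)   # PPO-style clipping
        kl = lp_n - lp_r                              # crude per-sequence KL proxy
        total += -(surrogate - kl_coef * kl)
    return total / len(rewards)
```

If you can read this, spot where it diverges from a production implementation (token-level credit, better KL estimators, off-policy corrections), and argue about it, you will feel at home here.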
Nice to have
You don’t need all of these, but the more you have, the faster you’ll hit the ground running:
* Continued pretraining and long-context experience:
* Have run continued pretraining on domain-specific or long-context corpora.
* Familiarity with techniques like RoPE scaling, YaRN-style extrapolation, context parallelism, or similar.
* Code-focused RL and evaluation:
* Experience building RL loops where rewards come from code execution (tests, linters, static analysis, fuzzing, runtime traces).
* Familiarity with evaluation benchmarks for code models (e.g. HumanEval, MBPP, SWE-bench, or internal equivalents).
* Experience with modern LLM training stacks:
* Experience with large MoE models and expert/tensor parallelism is a plus.
* Serving and online training:
* Experience tuning inference for open-source serving frameworks, e.g. vLLM, SGLang.
* Safety, robustness, and reward shaping:
* Experience with LLM-as-a-judge, reward hacking detection, or robustness evaluation.
* Open-source contributions or research:
* Contributions to open-source LLM tooling, RL libraries, or relevant research papers in LLM training / RLHF / code models.
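As a taste of the long-context techniques mentioned above: linear position interpolation, one simple form of RoPE scaling, rescales positions back into the trained range before computing rotary angles. A bare-bones sketch (illustrative only; real implementations work on tensors and often use YaRN-style frequency-dependent scaling instead):

```python
import math

def rope_angles(position: int, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> list[float]:
    """Rotary angles for one position across dim/2 frequency pairs.

    scale > 1 implements linear position interpolation: a model trained
    to 4k tokens can be run at 8k with scale=2, because position 8000
    then receives the angles the model saw at position 4000."""
    pos = position / scale
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]

# Position 8000 with 2x interpolation == position 4000 unscaled
assert rope_angles(8000, 64, scale=2.0) == rope_angles(4000, 64)
```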
___________________________________________________________________________
Why join Cosine
* Direct impact: Your work shapes the next generation of Lumen Enterprise SWE models that engineers use every day.
* Real scale: You’ll work with large, modern open-source models, long context lengths, and multi-node training runs.
* Full-stack ML engineering: From custom PyTorch code and distributed systems to data curation, RL design and MLOps.
* Research + pragmatism: You’ll stay close to the latest literature on SFT, RL, and code LLMs, but you’ll be judged by shipped improvements, not just ideas.
If this sounds like a fit, this is a role where you can meaningfully push the frontier of open-source–based software engineering models.
___________________________________________________________________________
Cosine is an equal opportunity employer.
We value diverse backgrounds, perspectives, and ways of thinking, and we’re committed to creating an inclusive and respectful workplace.
We encourage applications from anyone who meets the role requirements, even if you don’t meet every single qualification. If you need reasonable adjustments at any stage of the hiring process, we’re happy to discuss them.
___________________________________________________________________________
Compensation, Benefits & Ways Of Working
We’re an in-office team, five days a week, by design. We believe the work we’re doing benefits from being together, collaborating closely, and building shared context.
What You Can Expect
* Competitive salary, benchmarked to the market
* Equity / share options, so you share in the upside you help create
* 30 days’ holiday + bank holidays
* Genuine 9–5 working hours — we don’t expect late nights or weekend work
* Work hard in the office, collaborate closely, and switch off properly
* Dog-friendly office — bring your dog to work
* Daily lunch provided
* Monthly team breakfasts
* Monthly socials
* Pension
* High-quality equipment to do your best work
We care about focus, sustainability, and doing great work — not performative overwork. We value people who show up, contribute thoughtfully, collaborate well with their colleagues, and then go home.
This role won’t suit everyone. But if you want structure, clarity, strong collaboration, and a team that takes both the work and work-life balance seriously, it’s a great place to be.
___________________________________________________________________________