(Senior) Data Engineer (f/m/x) - Remote Sensing & AI Pipelines
LiveEO
Build the Market Leader in Satellite Analytics with us at LiveEO
We are looking for a Senior Data Engineer to build the high-performance data backbone for our multitemporal, multimodal Earth observation models. While our ML Engineers focus on model architecture, you will own the infrastructure, ingestion, and refinement pipelines that combine very high-resolution optical and Synthetic Aperture Radar (SAR) data into production-ready datasets.
This is a high-impact role at the intersection of Big Data and AI. You will ensure that our "data engine" is scalable, deterministic, and capable of handling petabytes of geospatial information to enable semantic understanding across sensors and time.
LiveEO is a young, dynamic team that thrives on big challenges and fast learning cycles—we move quickly, stay curious, and genuinely enjoy building together. We’re on a mission to break the “curse of Earth Observation”: turning incredible satellite data into reliable, actionable decisions that people can trust and use in real operations. In this role, you’ll work in a fun, high-ownership environment where ambitious technical problems (multimodal SAR/optical foundation models) meet real-world impact—and where your ideas can go from whiteboard to production in tight, collaborative iterations.
You’ll sit within LiveEO’s AI team and partner closely with downstream product teams to translate model capabilities into measurable business value and production-ready workflows. You’ll also work hand-in-hand with our dedicated data annotation team to define labeling guidelines, drive feedback loops on data quality, and ensure training/evaluation datasets reflect real-world edge cases.
Tech stack & tools you will work with:
* Ray (distributed compute)
* Prefect (workflow orchestration)
* AWS (cloud infrastructure)
* Datastores: PostgreSQL (metadata / operational data)
* Python (core development)
* PyTorch + PyTorch Lightning (model training, experimentation)
* Databricks + MLflow (experiment tracking, model registry)
* Geospatial stack: GDAL, Rasterio, GeoPandas, STAC (EO data handling and standardization)
Your challenge
* Build Scalable Data Pipelines: Design and maintain robust ETL/ELT workflows using Prefect and Ray to ingest, process, and standardize massive volumes of satellite imagery.
* EO Data Management: Own the standardization of high-resolution SAR and optical imagery, focusing on normalization, tiling/chipping, and co-registration sanity checks to ensure data integrity.
* Infrastructure & Tooling: Optimize our cloud-native stack on AWS, leveraging Databricks and PostgreSQL to manage metadata and operational data stores.
* Collaborative AI Support: Partner closely with ML Engineers to deliver production-ready data components and inference interfaces that downstream teams can depend on.
* Data Quality & Diagnostics: Work hand-in-hand with the data annotation team to automate feedback loops on data quality and ensure datasets reflect real-world edge cases.
* System Reliability: Implement monitoring signals and deterministic evaluation frameworks to ensure pipeline reproducibility across various geographies and acquisition conditions.
Your profile
* Strong Software Engineering: Mastery of Python with a focus on clean, maintainable, and testable code.
* Data Orchestration & Compute: Proficiency in using Prefect (or Airflow) and distributed computing frameworks like Ray or Anyscale.
* Cloud & Big Data: Deep expertise in AWS infrastructure and Databricks for large-scale data processing.
* Database Management: Strong knowledge of PostgreSQL and managing complex metadata at scale.
* Pragmatic Delivery: A mindset that balances building robust, long-term infrastructure with the need for practical, iterative delivery.
* Geospatial Stack: Experience with GDAL, Rasterio, GeoPandas, and STAC for handling Earth Observation data is a plus.
* ML Integration: Familiarity with PyTorch Lightning and MLflow to better support the ML R&D lifecycle is a plus.
* SAR Experience: Basic knowledge of SAR preprocessing libraries and data formats is a plus.
Your Benefits
* The opportunity to create a product that can improve business processes and lives across the globe.
* Flexible working hours and hybrid work model - we trust our employees to get their work done while maintaining a healthy work-life balance.
* We empower employees to drive their own career development, take initiative and have the freedom to be creative and bold.
* Not an overtime culture - we make sure overtime happens only when necessary and is always offset with time off and rest.
* A collaborative and learning environment - frequent internal workshops, knowledge sharing sessions, journal clubs and hackathons.
* Office located in the centre of Berlin Kreuzberg with free fruit, nuts and drinks.
* Potential to participate in the employee stock option program.
* Urban Sports membership and BVG subsidy, corporate pension program.
* A diverse and vibrant international environment of 30+ different nationalities.
About Us
LiveEO is a well-funded startup founded in 2018 and based in Berlin. Our primary service is modelling risk to our customers’ assets and infrastructure from vegetation, ground deformation and change detection. We currently have around 160 employees from all over the world with a variety of backgrounds.