Data Engineer
GEP Worldwide
Overview
GEP is a global leader in procurement and supply chain transformation. We help enterprises harness the power of AI and digital technology to stay ahead in the new economy. Through AI-driven solutions, we enable businesses to operate with greater efficiency and effectiveness, gain competitive advantage, and maximize both business and shareholder value. We partner with Fortune 500 and Global 2000 enterprises across industries to build high-performing, resilient, and sustainable supply chains.
We're hiring a Data Engineer to join our team in the US. You'll be part of the team that builds and maintains the data infrastructure — the pipelines, platforms, and systems that keep data flowing reliably from source to insight.
What You Will Do:
* Design, build, and maintain data pipelines that move and transform large volumes of data reliably, at scale, every day
* Embed AI into your engineering work — whether that's RAG pipelines, LLM-driven workflows, or model scoring built directly into the data systems you own
* Build the data infrastructure that Data Scientists need to train, retrain, and run ML models in production
* Work with cloud platforms (AWS, Azure, or GCP) to design and operate cloud-native data solutions
* Use orchestration tools like Apache Airflow to schedule, monitor, and manage pipeline workflows
* Set up data quality checks and automated testing so problems are caught before they reach downstream systems
* Collaborate closely with Data Scientists, Engineers, and Product teams to turn business requirements into data infrastructure
* Contribute to a codebase that is production-grade — reviewed, tested, documented, and built to last
What You Should Bring:
Required:
* Hands-on Data Engineering experience building and owning production pipelines
* Strong Python skills used for real pipeline and data engineering work, not just analysis
* PySpark or Apache Spark experience for processing data at scale
* Solid SQL skills — schema design, query optimization, working with relational databases
* Experience with at least one major cloud platform (AWS, Azure, or GCP) using data services, not just compute
* A track record of embedding AI in your data engineering work — RAG pipelines, LLM-driven workflows, ML model infrastructure, or similar. This is required, not a bonus.
Nice to have:
* Hands-on experience with Databricks or Microsoft Fabric (either is fine — we care more about platform depth than the specific tool)
* Apache Airflow for pipeline orchestration
* MLOps experience — model versioning, monitoring, and deployment
* Exposure to LangChain, vector databases, or similar GenAI tooling
* A BS or MS in Computer Science, Engineering, or a related field
Salary Range: $90,000 – $105,000 annually, based on experience and qualifications.
Additional Compensation: Eligible for a performance-based bonus tied to individual and company performance.
Benefits: Comprehensive health coverage (medical, dental, and vision), 401(k) plan with a 3% company match, paid time off, company holidays, and professional development opportunities.