Data Engineer

KPIT
Columbus, IN
Full-time
Key tools: PySpark, Spark, Azure
Applications go directly to the hiring team

Full Description

About KPIT

KPIT is reimagining the future of mobility, forging ahead with group companies and partners to shape a world that is cleaner, smarter, and safer.

With over 25 years of specialized expertise in Mobility, KPIT is accelerating the transformation towards Software and AI-Defined Vehicles through its advanced solutions, platforms, and products—propelled by mobility-infused AI frameworks, software craftsmanship, and systems integration mastery.

Vision in Motion

Fueled by 2000+ vehicle production programs and powering 20+ million vehicles on the road with KPIT software, our experience is unmatched. At the same time, we push boundaries, developing solutions that enable Mobility OEMs to innovate at speed and scale.

Job Summary:

Leads projects for the design, development, and maintenance of a data and analytics platform. Processes, stores, and makes data available to analysts and other consumers effectively and efficiently. Works with key business stakeholders, IT experts, and subject-matter experts to plan, design, and deliver optimal analytics and data science solutions. May work on one or more product teams at a time.

Responsibilities:

* Design and automate deployment of our distributed system for ingesting and transforming data from various types of sources (relational, event-based, unstructured).

* Own end-to-end delivery of AI and ML solutions, from problem definition to production deployment.

* Build and maintain data pipelines using PySpark and Spark in Azure Databricks.

* Design and implement a framework to continuously monitor and troubleshoot data quality and data integrity issues.

* Implement data governance processes and methods for managing metadata, access, and retention of data for internal and external users.

* Design and provide guidance on building reliable, efficient, scalable, high-quality data pipelines, with monitoring and alerting mechanisms, that combine a variety of sources using ETL/ELT tools or scripting languages.

* Perform feature engineering and data preparation for machine learning models.

* Deploy ML models and AI agents for real business use cases.

* Design and implement agentic AI workflows, including multi-step reasoning and tool usage.

* Track experiments, manage models, and support deployment using MLflow.

* Define and execute model evaluation frameworks covering both ML and AI agent performance.

* Design and implement physical data models to define database structure, optimizing performance through efficient indexing and table relationships.

* Participate in optimizing, testing, and troubleshooting data pipelines.

* Design, develop, and operate large-scale data storage and processing solutions using distributed and cloud-based platforms and technologies (e.g., Azure, AWS, Spark, PySpark, Scala, advanced SQL, Hadoop).

* Use innovative, modern tools, techniques, and architectures to partially or fully automate the most common, repeatable, and tedious data preparation and integration tasks, minimizing manual, error-prone processes and improving productivity. Assist with renovating the data management infrastructure to drive automation in data integration and management.

* Ensure the timeliness and success of critical analytics initiatives by using agile development practices such as DevOps, Scrum, and Kanban.

* Coach and develop less experienced team members.

* Work independently on ambiguous business problems and convert them into scalable solutions.

* Collaborate with other data engineers, analysts, solution architects, and business teams to deliver solutions.

* Guide and support team members on data engineering, ML, and AI best practices.

* Write clean, production-ready, well-documented code.
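To illustrate the data-quality monitoring work described above, here is a minimal sketch of one common check: flagging columns whose null rate exceeds a threshold. On Databricks this would typically be a PySpark DataFrame aggregation; plain Python is used here only to keep the example self-contained, and all names are illustrative.

```python
def null_rates(rows, columns):
    """Return the fraction of None values per column across rows (dicts)."""
    counts = {c: 0 for c in columns}
    for row in rows:
        for c in columns:
            if row.get(c) is None:
                counts[c] += 1
    total = len(rows) or 1  # avoid division by zero on an empty batch
    return {c: counts[c] / total for c in columns}


def failing_columns(rows, columns, threshold=0.1):
    """Columns whose null rate exceeds the allowed threshold, sorted by name."""
    rates = null_rates(rows, columns)
    return sorted(c for c, r in rates.items() if r > threshold)


if __name__ == "__main__":
    # Hypothetical vehicle telemetry batch: each column has 1 null out of 4 rows.
    sample = [
        {"vin": "A1", "speed": 62},
        {"vin": "A2", "speed": None},
        {"vin": None, "speed": 58},
        {"vin": "A4", "speed": 60},
    ]
    print(failing_columns(sample, ["vin", "speed"], threshold=0.2))
```

In a production pipeline, a check like this would run after each ingest step and feed an alerting mechanism rather than printing to stdout.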

Requirements:

* College, university, or equivalent degree in a relevant technical discipline, or relevant equivalent experience required.

* At least 5 years of experience in data engineering, with a strong background in Azure Databricks and Scala/Python.

* Experience processing and transforming unstructured data programmatically.

* Hands-on experience building data pipelines using Scala/Python.

* Experience with big data technologies such as Apache Spark, Structured Streaming, advanced SQL, Databricks, Delta Lake, and Azure/AWS.

* Strong analytical and problem-solving skills with the ability to troubleshoot Spark applications and resolve data pipeline issues.

* Familiarity with version control systems like Git and CI/CD pipelines.

* Experience with Azure Databricks and MLflow.

* Good understanding of ML workflows, model development, and evaluation.

* Knowledge of MLOps fundamentals such as CI/CD, versioning, and monitoring.

* Ability to build end-to-end data and ML solutions.

* Exposure to production ML or AI systems.

* Understanding of data engineering and data modeling basics.

* Ability to work independently on loosely defined problems.

* Strong problem-solving and communication skills.

* Mentoring experience is a plus.

* Experience with AI agents, LLMs, or agentic AI systems is preferred.

Compensation and Benefits:

Along with competitive pay, as a full-time KPIT employee, you are eligible for the following benefits:

* Geo Blue PPO and HSA plan.

* MetLife – Dental and Vision plan.

* Healthcare and dependent care flexible spending accounts (FSA).

* 401k with employer match.

* Company-paid Basic Life and Long-term disability insurance.

* Voluntary benefits including critical illness, hospital indemnity, accident, theft, and legal services coverage.

* Employee Assistance Program.

* Paid Holidays.

* Employee discounts and perks.

* Gym benefit.