The Data Engineer with AI/ML role bridges traditional data engineering (ETL/ELT) and data science, ensuring that data is clean, accessible, and optimized for model training, real-time inference, and Generative AI (GenAI) workloads.

Responsibilities

* AI/ML Pipeline Development: Design and maintain robust ETL/ELT pipelines that feed data into machine learning models.

* Data Preparation & Feature Engineering: Automate data preprocessing (normalization, encoding, augmentation) and collaborate with data scientists to create feature stores.

* Infrastructure Optimization: Implement data lakes, warehouses, and vector databases (e.g., Pinecone, Weaviate) optimized for AI workloads.

* MLOps & Deployment: Implement MLOps practices, including CI/CD, model versioning, model performance monitoring, and automated retraining workflows.

* Real-time Streaming: Implement real-time data streaming for live inference using tools like Kafka or Flink.

* Data Governance & Security: Ensure data quality, lineage tracking, and compliance with privacy standards (GDPR, CCPA) in AI contexts.

Qualifications

* Education: Bachelor’s or Master’s in Computer Science, Data Engineering, or a related field.

* Experience: Typically 3–7 or more years in data engineering or related ML roles.

Required Skills

* Programming: Strong proficiency in Python and SQL; experience with Scala or Java is often preferred.

* Big Data Technologies: Experience with Apache Spark, Hadoop, Hive, and Kafka.

* Cloud Platforms: Proficiency with the cloud services of AWS, Azure, or GCP.

* ML Frameworks: Familiarity with TensorFlow, PyTorch, or scikit-learn.

* Data Tools: Experience with Airflow, dbt, and MLflow.

Data Engineer with AI/ML
