Data Engineer with AI/ML
SharpAtoms
Full Description
This role bridges traditional data engineering (ETL/ELT) and data science, ensuring data is clean, accessible, and optimized for model training, real-time inference, and Generative AI (GenAI) workloads.
Responsibilities
* AI/ML Pipeline Development: Design and maintain robust ETL/ELT pipelines that feed data into machine learning models.
* Data Preparation & Feature Engineering: Automate data preprocessing (normalization, encoding, augmentation) and collaborate with data scientists to create feature stores.
* Infrastructure Optimization: Implement data lakes, warehouses, and vector databases (e.g., Pinecone, Weaviate) optimized for AI workloads.
* MLOps & Deployment: Implement MLOps practices, including CI/CD, model versioning, model performance monitoring, and automated retraining workflows.
* Real-time Streaming: Implement real-time data streaming for live inference using tools such as Kafka or Flink.
* Data Governance & Security: Ensure data quality, lineage tracking, and compliance with privacy standards (GDPR, CCPA) in AI contexts.
Qualifications
* Education: Bachelor’s or Master’s in Computer Science, Data Engineering, or a related field.
* Experience: Typically 3 to 7 or more years in data engineering or related ML roles.
Required Skills
* Programming: High proficiency in Python and SQL; experience with Scala or Java is preferred.
* Big Data Technologies: Experience with Apache Spark, Hadoop, Hive, and Kafka.
* Cloud Platforms: Proficiency in AWS, Azure, or GCP cloud services.
* ML Frameworks: Familiarity with TensorFlow, PyTorch, or Scikit-learn.
* Data Tools: Experience with Airflow, dbt, and MLflow.