Data Science Engineer
Predictive Sales AI, a Spectrum Communications & Consulting LLC Brand
Job Title: Data Science Engineer
Location: Chicago, IL (Remote/Hybrid)
About Us
At Predictive Sales AI (PSAI), we’re redefining how technology and intelligence transform digital marketing. Our AI-powered software enables home services businesses to make smarter, faster decisions—fueling growth through automation, prediction, and precision.
We are seeking a Data Science Engineer with strong data engineering and MLOps expertise to build scalable, production-grade ML and data platforms that directly impact customer growth and retention.
Job Overview
As a Data Science Engineer, you will design and operate the data + machine learning foundations behind PSAI’s predictive products. You will build scalable pipelines and robust warehouse/lakehouse models across CRM, marketing, product events, and external datasets — ensuring reliability, accuracy, and business continuity at scale.
This Role Requires
* 4+ years in data-centric engineering
* Proven experience deploying ML models via pipelines
* Deep expertise in Python, SQL, and Azure infrastructure
* Architectural ownership through data contracts and resilient modeling
Key Responsibilities
* Build scalable batch and near-real-time ingestion pipelines using Azure Data Factory, APIs, event streams, and external connectors.
* Develop ML-ready datasets across CRM, marketing automation platforms, product telemetry, and geospatial data sources.
* Design performant, well-modeled warehouse/lakehouse systems in Azure Synapse or Databricks.
* Train and deploy predictive models (lead scoring, churn prediction, forecasting) through reproducible pipelines.
* Build time-aware, leakage-resistant feature pipelines for production ML use cases.
* Support full MLOps lifecycle using Azure Machine Learning, including experiment tracking, model registry, and deployment.
* Implement automated validation, anomaly detection, reconciliation, and monitoring for pipelines and warehouse models.
* Design and enforce data contracts to prevent upstream schema changes from breaking downstream ML workflows.
* Own pipeline SLAs, alerting, incident response, and durable improvements through postmortems.
* Optimize processing for very large datasets (>100GB) through partitioning, incremental loads, distributed compute, and query tuning.
* Improve cost efficiency across compute/storage in Azure environments.
* Maintain clean, testable, production-ready Python codebases using:
  * Object-oriented patterns
  * Type hinting
  * CI/CD workflows via Azure DevOps
* Package models and pipelines using Docker for consistent deployment across dev/staging/prod.
* Communicate architectural trade-offs and technical debt in business terms to Product, RevOps, and leadership.
* Partner with Engineering on instrumentation and scalable data integration.
* Mentor junior engineers through pairing, code reviews, and documentation best practices.
Desired Traits
We are looking for someone organized, proactive, and detail-oriented who will work closely with teams across the company. Here's what we value:
* Ownership mindset with a reliability-first approach
* Strong SQL/Python skills and high attention to data quality
* Scales systems thoughtfully (performance- and cost-aware, maintainable designs)
* Collaborative communicator across engineering, RevOps, and analytics
* Documents well and supports others through reviews/mentorship
Required Skills and Experience
* Master’s degree in Data Science, Computer Science, Statistics, Engineering, or a closely related quantitative field (preferred).
* 4+ years in data engineering, ML engineering, or data platform development.
* Minimum 2 years deploying ML models into production workflows.
* Experience building pipelines and warehouse systems at scale (>100GB datasets).
* Demonstrated adaptability in fast-changing technical and business environments.
* Python (Expert): pandas, polars, scikit-learn; PyTorch, transformers; production engineering (OOP, testing, typing)
* SQL (Expert): advanced analytics, recursive CTEs, query tuning, Azure Synapse optimization
* Azure Data & ML Stack: Data Factory (ETL/ELT), Azure ML (MLOps), Key Vault, Databricks/Spark, Docker deployment
* Distributed & Large-Scale Compute: Spark, Ray, Dask; GPU acceleration with RAPIDS (plus)
* Geospatial & Specialized Data: GeoPandas, Shapely, rasterio
* AI Automation & LLMs: LangChain/Semantic Kernel, agentic workflows
* DevOps & CI/CD: Azure DevOps pipelines, Gitflow, rebasing, clean version control
Why Join Us?
* Innovative Environment: Be part of a forward-thinking company that values creativity and encourages the exploration of new ideas.
* Professional Growth: Access opportunities for continuous learning and career advancement within a supportive and dynamic team.
* Comprehensive Benefits: Enjoy a competitive salary, performance-based bonuses, flexible work arrangements, and a robust benefits package.
* Collaborative Culture: Work in a team-oriented environment where collaboration and mutual respect drive our success.
If you're ready to be part of an innovative, growth-oriented team, apply today!