
Data Engineer

Qloo
New York, NY
Full-time
Applications go directly to the hiring team

Full Description

About Us

At Qloo, we harness large-scale behavioral and catalog data to power recommendations and insights across entertainment, dining, travel, retail, and more. Our platform is built on a modern AWS data stack and supports analytics, APIs, and machine-learning models used by leading brands. We are looking for an experienced Data Engineer to help evolve and scale this platform.

Role Overview

As a Data Engineer at Qloo, you will design, build, and operate the pipelines that move data from external vendors, internal systems, and public sources into our S3-based data lake and downstream services. You’ll work across AWS Glue, EMR (Spark), Athena/Hive, and Airflow (MWAA) to ensure that our data is accurate, well-modeled, and efficiently accessible for analytics, indexing, and machine-learning workloads.

You should be comfortable owning end-to-end data flows, from ingestion and transformation to quality checks, monitoring, and performance tuning.

Responsibilities

* Design, develop, and maintain batch data pipelines using Python, Spark (EMR), and AWS Glue, loading data from S3, RDS, and external sources into Hive/Athena tables.

* Model datasets in our S3/Hive data lake to support analytics (Hex), API use cases, Elasticsearch indexes, and ML models.

* Implement and operate workflows in Airflow (MWAA), including dependency management, scheduling, retries, and alerting via Slack.

* Build robust data quality and validation checks (schema validation, freshness/volume checks, anomaly detection) and ensure issues are surfaced quickly with monitoring and alerts.

* Optimize jobs for cost and performance (partitioning, file formats, join strategies, proper use of EMR/Glue resources).

* Collaborate closely with data scientists, ML engineers, and application engineers to understand data requirements and design schemas and pipelines that serve multiple use cases.

* Contribute to internal tooling and shared libraries that make working with our data platform faster, safer, and more consistent.

* Document pipelines, datasets, and best practices so the broader team can easily understand and work with our data.
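To make the data quality responsibilities above concrete, here is a minimal, stdlib-only sketch of the kinds of checks a pipeline might run before publishing a dataset. All names, fields, and thresholds are illustrative, not Qloo's actual tooling:

```python
from datetime import datetime, timedelta, timezone

# Illustrative sketch only: schema, thresholds, and function names are
# hypothetical, not part of Qloo's platform.

EXPECTED_SCHEMA = {"id": int, "title": str, "updated_at": str}

def check_schema(records):
    """Return a list of errors for records missing fields or with wrong types."""
    errors = []
    for i, rec in enumerate(records):
        for field, ftype in EXPECTED_SCHEMA.items():
            if field not in rec:
                errors.append(f"row {i}: missing field '{field}'")
            elif not isinstance(rec[field], ftype):
                errors.append(f"row {i}: field '{field}' is not {ftype.__name__}")
    return errors

def check_freshness(records, max_age=timedelta(hours=24)):
    """True if the newest record is no older than max_age."""
    newest = max(datetime.fromisoformat(r["updated_at"]) for r in records)
    return datetime.now(timezone.utc) - newest <= max_age

def check_volume(row_count, baseline, tolerance=0.5):
    """True if this run's row count is within tolerance of a recent baseline."""
    return abs(row_count - baseline) / baseline <= tolerance

records = [
    {"id": 1, "title": "a", "updated_at": datetime.now(timezone.utc).isoformat()},
    {"id": 2, "title": "b", "updated_at": datetime.now(timezone.utc).isoformat()},
]
assert check_schema(records) == []
assert check_freshness(records)
assert check_volume(row_count=len(records), baseline=2)
```

In practice checks like these would run as tasks inside the Airflow DAG, with failures routed to Slack alerts rather than plain assertions.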

Qualifications

* Bachelor’s degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.

* Experience with Python and distributed data processing using Spark (PySpark) on EMR or a similar environment.

* Hands-on experience with core AWS data services, ideally including:
  * S3 (data lake, partitioning, lifecycle management)
  * AWS Glue (jobs, crawlers, Data Catalog)
  * EMR or other managed Spark platforms
  * Athena/Hive and SQL for querying large datasets
  * Relational databases such as RDS (PostgreSQL, MySQL, or similar)

* Experience building and operating workflows in Airflow (MWAA experience is a plus).

* Strong SQL skills and familiarity with data modeling concepts for analytics and APIs.

* Solid understanding of data quality practices (testing, validation frameworks, monitoring/observability).

* Comfortable working in a collaborative environment, managing multiple projects, and owning systems end-to-end.
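For candidates less familiar with the S3 partitioning mentioned above: Hive-style data lakes encode partition values directly in the object key, which lets Athena and Spark prune unneeded data at query time. A tiny sketch (the bucket and dataset names are hypothetical):

```python
from datetime import date

def partition_key(dataset, day, prefix="s3://example-data-lake"):
    """Build a Hive-style partitioned S3 prefix: .../dataset/dt=YYYY-MM-DD/."""
    return f"{prefix}/{dataset}/dt={day.isoformat()}/"

print(partition_key("events", date(2024, 1, 15)))
# s3://example-data-lake/events/dt=2024-01-15/
```

A query filtered on `dt` then reads only the matching prefixes instead of scanning the whole dataset.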

We Offer

* Competitive salary and benefits package, including health insurance, retirement plan, and paid time off.

* The opportunity to shape a modern cloud-based data platform that powers real products and ML experiences.

* A collaborative, low-ego work environment where your ideas are valued and your contributions are visible.

* Flexible work arrangements (remote and hybrid options) and a healthy respect for work-life balance.
