
LLM/AI Data Engineer

Lifescale Analytics
United States
Full-time
14,000,000 – 14,500,000 / year
AI tools:
LLM
OpenAI API
Applications go directly to the hiring team

Full Description

Lifescale Analytics helps organizations unlock the power of data through advanced analytics, AI, and modern digital solutions. We partner with forward-thinking clients to design and implement scalable, high-impact technologies that drive measurable business outcomes.

We are currently seeking an LLM/AI Data Engineer to support a client engagement. In this role, you will work at the intersection of data engineering and AI, designing and validating high-quality, production-grade data pipelines with integrated LLM capabilities. This opportunity is remote, but candidates must live in the United States.

Applicants must be U.S. citizens, may be subject to a government security investigation, and must currently meet the eligibility requirements for access to classified government information. Candidates must have lived in the United States for the past 5 years.

The Employer will not sponsor applicants for any employment visas, at hiring or in the future, including but not limited to H-1B visas. Corp-to-Corp or subcontract personnel will not be considered for this position.

What You’ll Do

* Design, build, and operate LLM-assisted analytics pipelines in structured data environments

* Implement retrieval-augmented generation (RAG) and structured data grounding patterns

* Validate and improve LLM output quality, consistency, and traceability

* Develop and maintain production-grade ETL/ELT pipelines

* Review and test pipelines to identify logic errors, data gaps, and performance issues

* Define and track pipeline SLAs (latency, throughput, data freshness)

* Build and enforce data quality frameworks and validation processes

* Document engineering processes including QC logs, test cases, and schema documentation

* Collaborate with cross-functional teams to ensure scalable and auditable data systems

* Other duties as assigned

Required Skills & Experience:

LLM-Integrated Data Engineering

* Experience designing, building, or operating LLM-assisted analytics pipelines

* Experience validating and improving LLM output quality and reliability

Strong understanding of:

* Prompt engineering for structured outputs

* Retrieval-Augmented Generation (RAG) patterns

* Structured-data grounding & hallucination mitigation

Production Data Engineering

At least 4 years of experience in:

* Data engineering

* ETL/ELT pipeline development

* Data quality assurance in production environments

* Proven experience working with high-volume structured data systems

Technical Stack Proficiency

* Advanced proficiency in SQL and Python

* Experience with tools such as dbt, Spark, or similar frameworks

Hands-on experience with Snowflake, including:

* Snowpark or equivalent transformation frameworks

* Data modeling and performance optimization

* Snowflake Cortex

Pipeline Validation & Data Quality

* Ability to design and implement data quality frameworks

Experience reviewing and validating production pipelines:

* Logic validation and transformation accuracy

* Data completeness and integrity checks

* Identification of edge cases and failure modes

Benchmarking & Performance Engineering

* Ability to benchmark and optimize pipelines against performance targets

Experience defining and measuring:

* Pipeline latency

* Throughput

* Data freshness SLAs

Auditability & Documentation

* Experience supporting auditable and explainable data systems

Strong documentation practices, including:

* QC logs and validation reports

* Test case design and execution records

* Schema and lineage documentation

* Issue tracking and remediation workflows

Preferred Qualifications (Nice-to-Have)

Experience supporting U.S. Department of Defense (DoD) environments:

* Air Force Life Cycle Management Center (AFLCMC)

* Army Materiel Command (AMC)

Familiarity with Palantir Foundry:

* Ontology modeling concepts

* Data product consumption patterns

Experience with defense datasets:

* Government-Industry Data Exchange Program (GIDEP)

* Federal Logistics Information System (FED-LOG)

Exposure to:

* Entity resolution and part matching

* ERP data integration into analytics platforms

* Data normalization across fragmented systems

Education

* Bachelor’s degree in Computer Science, Data Engineering, or related field (or equivalent experience)

Who we are:

Lifescale Analytics is a small business that provides specialized expertise in data and analytics. Founded in 2012, the Lifescale Analytics team has years of experience delivering a range of customized data management services and solutions, including Data Management/Analytics, Big Data Solutions, Cloud Services, Business Intelligence, and Data Science, with a focus on building strong portfolios and programs. Through experience and innovation, we help businesses, pharmaceutical companies, financial institutions, and government agencies manage their biggest asset, their data, and make proactive decisions based on it. Our specialists are skilled at managing, refining, analyzing, and visualizing information to increase the value clients get from their IT and data science investments. This position will remain remote unless the client requires employees to report on-site.

For more information, please visit our website at www.lifescaleanalytics.com
