Data Scientist

Focus GTS
Washington DC-Baltimore Area
Contract
AI tools:
AWS SageMaker
PyTorch
OpenAI API

The ideal candidate is a Data Scientist with 1-4 years of experience and a degree in Computer Science or Statistics. This individual will:

~Oversee day-to-day operations of AWS and Microsoft 365, including multi-factor authentication work with Duo (an authentication tool)

~Write models to find key document sets (often within millions of records of text data)

~Need strong statistics skills (core math capabilities)

Key Responsibilities

* Develop Python-based pipelines for extracting and processing text and metadata from documents, including both native text and image-based content.

* Design and implement AI workflows using open-source and commercial large language models for classification, summarization, extraction, and analysis tasks.

* Build and maintain vector indexes and retrieval-augmented generation (RAG) workflows to support document-heavy legal use cases.

* Implement prompt templates and prompt design patterns to support consistency and reuse across client matters.

* Deploy, operate, and support AI workflows in AWS environments, including use of SageMaker for model training, experimentation, and inference.

* Apply traditional machine learning techniques (e.g., logistic regression, random forest, decision trees) where appropriate alongside LLM-based approaches.

* Support statistical validation efforts, including sampling, metric calculation, and basic error analysis to evaluate model performance.

* Work with SQL and MySQL to support data analysis, validation, and pipeline integration.

* Produce clear, detailed documentation describing data sources, model behavior, validation results, and assumptions to support transparency and review.

Required Qualifications

* Experience in applied machine learning, NLP, and data engineering.

* Strong Python proficiency with experience building data processing or ML pipelines.

* Experience extracting and processing text from structured and unstructured documents, including images (e.g., OCR workflows).

* Hands-on experience working with open-source and/or commercial large language models.

* Experience deploying or supporting ML workflows in AWS, including SageMaker.

* General competency with SQL and MySQL for data storage, querying, and analysis.

* Foundational understanding of statistical validation concepts, including sampling and performance metrics.

* Experience with prompt templating and structured prompt design.

* Experience building or working with vector indexes and RAG frameworks.

* Working knowledge of classical machine learning models, including logistic regression.

* Strong attention to detail and a disciplined approach to documentation.

Applications go to the hiring team directly