Data Scientist
Focus GTS
The ideal candidate is a Data Scientist with 1-4 years of experience and a degree in Computer Science or Statistics. This individual will:
~Oversee day-to-day operations of AWS, Microsoft 365, and multi-factor authentication with the Duo tool
~Write models to find key document sets (often within millions of records of text data)
~Need strong statistics skills (core math capabilities)
Key Responsibilities
* Develop Python-based pipelines for extracting and processing text and metadata from documents, including both native text and image-based content.
* Design and implement AI workflows using open-source and commercial large language models for classification, summarization, extraction, and analysis tasks.
* Build and maintain vector indexes and retrieval-augmented generation (RAG) workflows to support document-heavy legal use cases.
* Implement prompt templates and prompt design patterns to support consistency and reuse across client matters.
* Deploy, operate, and support AI workflows in AWS environments, including use of SageMaker for model training, experimentation, and inference.
* Apply traditional machine learning techniques (e.g., logistic regression, random forest, decision trees) where appropriate alongside LLM-based approaches.
* Support statistical validation efforts, including sampling, metric calculation, and basic error analysis to evaluate model performance.
* Work with SQL and MySQL to support data analysis, validation, and pipeline integration.
* Produce clear, detailed documentation describing data sources, model behavior, validation results, and assumptions to support transparency and review.
Required Qualifications
* Experience in applied machine learning, NLP, and data engineering.
* Strong Python proficiency with experience building data processing or ML pipelines.
* Experience extracting and processing text from structured and unstructured documents, including images (e.g., OCR workflows).
* Hands-on experience working with open-source and/or commercial large language models.
* Experience deploying or supporting ML workflows in AWS, including SageMaker.
* General competency with SQL and MySQL for data storage, querying, and analysis.
* Foundational understanding of statistical validation concepts, including sampling and performance metrics.
* Experience with prompt templating and structured prompt design.
* Experience building or working with vector indexes and RAG frameworks.
* Working knowledge of classical machine learning models, including logistic regression.
* Strong attention to detail and a disciplined approach to documentation.