AI Architect

UsefulBI Corporation
Foster City, CA
Full-time
AI tools: GPT, Claude, Pinecone, SageMaker, LangChain
Applications go directly to the hiring team

Full Description

Position: AI Architect

Location: Foster City, CA/Raleigh, NC

Work Model: Hybrid – 3 days onsite / 2 days remote

About the Role

We are looking for a hands-on AI Architect to design, build, and deploy production-grade Generative AI systems on AWS. This role goes beyond experimentation: you will architect secure, scalable, and cost-efficient GenAI solutions used by real users in enterprise environments.

You will work closely with engineering, data, and product teams to deliver LLM-powered applications, including RAG-based document intelligence, chatbots, and AI assistants.

Key Responsibilities

* Architect and implement Generative AI solutions using LLMs (GPT, Claude, Mixtral, etc.)

* Design and deploy Retrieval-Augmented Generation (RAG) pipelines for document Q&A and enterprise search

* Build semantic search and embedding pipelines using vector databases (FAISS, OpenSearch, Pinecone)

* Select and optimize LLMs, prompts, and inference strategies for accuracy, latency, and cost

* Implement hallucination mitigation techniques (grounding, prompt constraints, validation layers)

* Design secure, scalable architectures on AWS (Bedrock, SageMaker, Lambda, API Gateway, S3)

* Fine-tune models using PEFT techniques (LoRA, QLoRA) when required

* Partner with MLOps teams to productionize models with CI/CD, monitoring, and rollback

* Optimize GenAI systems for cost, latency, and throughput

* Collaborate onsite with cross-functional teams (3 days/week in Raleigh)

Required Skills & Experience

Generative AI & LLMs

* Strong understanding of LLM architectures and inference

* Hands-on experience with RAG systems in production

* Prompt engineering, temperature/top-p tuning

* Knowledge of LoRA / QLoRA / PEFT techniques

* Experience mitigating hallucinations and improving factuality

Embeddings & Retrieval

* Semantic embeddings (Sentence-BERT, OpenAI, etc.)

* Chunking strategies and metadata handling

* Vector similarity search (cosine, dot-product)

* Vector databases: FAISS, OpenSearch, Pinecone

AWS & Cloud Architecture

* AWS AI/ML services: Bedrock, SageMaker

* Serverless & APIs: Lambda, API Gateway

* Data storage: S3, DynamoDB

* Security: IAM, KMS, VPC, CloudTrail

* Experience designing enterprise-grade, compliant systems

Programming & Frameworks

* Python (strong)

* Experience with LangChain, Haystack, FastAPI (or similar)

* Familiarity with async processing and caching layers

MLOps & Production

* Model versioning and monitoring

* CI/CD for ML systems

* Rollback strategies and drift detection

* Performance and cost monitoring

Nice to Have

* Experience with knowledge graphs integrated into GenAI

* PDF/document ingestion pipelines (OCR, Textract)

* Multi-tenant GenAI architectures

* Healthcare / Pharma / regulated industry experience

* Exposure to self-hosted open-source LLMs

Qualifications

* Bachelor’s or Master’s degree in Computer Science, AI/ML, or related field

* 7+ years in software/ML engineering, with 2+ years in GenAI/LLMs

* Proven experience deploying AI systems to production
