AI Architect
UsefulBI Corporation
Position: AI Architect
Location: Foster City, CA/Raleigh, NC
Work Model: Hybrid – 3 days onsite / 2 days remote
About the Role
We are looking for a hands-on AI Architect to design, build, and deploy production-grade Generative AI systems on AWS. This role goes beyond experimentation: you will architect secure, scalable, and cost-efficient GenAI solutions used by real users in enterprise environments.
You will work closely with engineering, data, and product teams to deliver LLM-powered applications, including RAG-based document intelligence, chatbots, and AI assistants.
Key Responsibilities
* Architect and implement Generative AI solutions using LLMs (GPT, Claude, Mixtral, etc.)
* Design and deploy Retrieval-Augmented Generation (RAG) pipelines for document Q&A and enterprise search
* Build semantic search and embedding pipelines using vector databases (FAISS, OpenSearch, Pinecone)
* Select and optimize LLMs, prompts, and inference strategies for accuracy, latency, and cost
* Implement hallucination mitigation techniques (grounding, prompt constraints, validation layers)
* Design secure, scalable architectures on AWS (Bedrock, SageMaker, Lambda, API Gateway, S3)
* Fine-tune models using PEFT techniques (LoRA, QLoRA) when required
* Partner with MLOps teams to productionize models with CI/CD, monitoring, and rollback
* Optimize GenAI systems for cost, latency, and throughput
* Collaborate onsite with cross-functional teams (3 days/week in Foster City or Raleigh)
Required Skills & Experience
Generative AI & LLMs
* Strong understanding of LLM architectures and inference
* Hands-on experience with RAG systems in production
* Prompt engineering, temperature/top-p tuning
* Knowledge of LoRA / QLoRA / PEFT techniques
* Experience mitigating hallucinations and improving factuality
Embeddings & Retrieval
* Semantic embeddings (Sentence-BERT, OpenAI, etc.)
* Chunking strategies and metadata handling
* Vector similarity search (cosine, dot-product)
* Vector databases: FAISS, OpenSearch, Pinecone
AWS & Cloud Architecture
* AWS AI/ML services: Bedrock, SageMaker
* Serverless & APIs: Lambda, API Gateway
* Data storage: S3, DynamoDB
* Security: IAM, KMS, VPC, CloudTrail
* Experience designing enterprise-grade, compliant systems
Programming & Frameworks
* Python (strong)
* Experience with LangChain, Haystack, FastAPI (or similar)
* Familiarity with async processing and caching layers
MLOps & Production
* Model versioning and monitoring
* CI/CD for ML systems
* Rollback strategies and drift detection
* Performance and cost monitoring
Nice to Have
* Experience with knowledge graphs integrated into GenAI
* PDF/document ingestion pipelines (OCR, Textract)
* Multi-tenant GenAI architectures
* Healthcare / Pharma / regulated industry experience
* Exposure to self-hosted open-source LLMs
Qualifications
* Bachelor’s or Master’s degree in Computer Science, AI/ML, or related field
* 7+ years in software/ML engineering, with 2+ years in GenAI/LLMs
* Proven experience deploying AI systems to production