We are sharing a specialised part-time consulting opportunity for bilingual professionals with strong English and Arabic fluency, excellent analytical judgment, and the ability to evaluate AI-generated responses across diverse real-world topics.

This role supports an exciting collaboration with leading AI teams focused on improving the quality, usefulness, and reliability of general-purpose conversational AI systems used across a wide range of everyday and professional scenarios.

Selected professionals will assess model-generated responses, conduct fact-checking using trusted public sources and external tools, provide structured human feedback, and help ensure that advanced AI systems communicate in ways that are accurate, well-reasoned, clear, and aligned with human expectations.

Key Responsibilities

Professionals in this role may contribute to:

Response Evaluation & Quality Review

Evaluate LLM-generated responses based on how effectively they answer user queries

Assess reasoning quality, clarity, tone, and completeness across diverse topics and use cases

Ensure model responses align with expected conversational behavior and system guidelines

Fact-Checking & Annotation

Conduct fact-checking using trusted public sources and external tools

Generate high-quality human evaluation data by annotating strengths, areas for improvement, and factual inaccuracies

Apply consistent annotations using defined taxonomies, benchmarks, and detailed evaluation guidelines

AI Feedback & Generalist Analysis

Identify factual inaccuracies, reasoning errors, and communication gaps in model responses

Produce clear, consistent, and reproducible evaluation artifacts

Help improve AI response quality and user experience through structured feedback and analytical review

Ideal Profile

Strong candidates may have:

A Bachelor's degree

Native-level or primary fluency in Arabic (ILR 5, equivalent to CEFR C2)

Strong English fluency and excellent written communication skills

Significant experience using large language models and an understanding of how and why people use them

Strong attention to detail and the ability to identify subtle issues others may miss

Adaptability across topics, domains, and customer requirements

A background in areas requiring structured analytical thinking such as research, policy, analytics, linguistics, engineering, or related fields

Excellent college-level mathematics skills

Current location in Egypt, Saudi Arabia, UAE, or USA only

Preferred Qualifications

Prior experience with RLHF, model evaluation, or data annotation work

Experience writing or editing high-quality written content

Experience comparing multiple outputs and making fine-grained qualitative judgments

Familiarity with evaluation rubrics, benchmarks, or quality scoring systems

Why This Opportunity

Contribute specialised bilingual expertise to a high-impact AI collaboration

Help improve how advanced language models behave in real-world conversational settings

Work at the frontier of human-in-the-loop AI development with meaningful impact on systems used by millions

Flexible remote contract work with competitive compensation

Contract Details

Independent contractor role

Fully remote with flexible scheduling

Open to full-time or part-time contract work

Compensation of $22.64 per hour

Fluent language skills required in both English and Arabic

Location restricted to Egypt, Saudi Arabia, UAE, and USA

Projects may be extended, shortened, or concluded early depending on project needs and performance

Weekly payments via Stripe or Wise

Work will not involve access to confidential or proprietary information from any employer, client, or institution

Please note: We are unable to support H-1B or STEM OPT candidates at this time

Start date: Immediate

About The Platform

This opportunity is available through a leading AI-driven work platform that connects domain experts with frontier AI research projects.

Experts contribute to improving advanced AI systems by providing specialised expertise across real-world workflows, structured evaluation, model training support, and domain-specific human feedback.

By submitting this application, you acknowledge that your information may be processed by 24-MAG LLC for recruitment and opportunity matching in accordance with our Privacy Policy: https://www.24-mag.com/privacy-policy

Remote | Generalist - English & Arabic — $22.64/hour
