AI Systems Engineer (LLM Performance, Cost & Reliability) | Audit → Recommend → Implement

🌍 Remote, USA 🚀 Full-time 🕐 Posted Recently

Job Description

Overview

Jules is a mobile AI-powered style and dating photo coach. We analyze outfit photos and dating profile images, score them, and give actionable feedback using LLMs and vision models.

The product is live and thoughtfully architected.

What we need now is systems-level optimization.

We’re looking for a senior engineer to audit, optimize, and harden our LLM infrastructure — reducing latency and cost while improving reliability and consistency — without changing product flows or UX.

This is not a greenfield build.

This is not prompt polishing.

This is a real production system that needs to scale.

What You’ll Do

Phase 1 — Audit

Audit all LLM usage across the system:

FitCheck (vision)

PicReview (vision)

Comparison modes

Conversational chat

Analyze:

Latency bottlenecks (user-perceived and backend)

Cost per request / feature / user

Model usage vs actual requirements

Prompt size, retries, determinism, and waste

Review existing cost instrumentation and update pricing assumptions

Deliverable:

A written audit outlining:

Current performance & cost profile

Clear problem areas

Ranked list of optimization opportunities with estimated impact

Phase 2 — Optimize & Implement

Implement agreed optimizations directly in the codebase, which may include:

Multi-model routing (cheap → expensive fallback)

Vision + text model rationalization

Caching (hash-based, context-based, or result reuse)

Async coordination improvements (queues, batching, retries)

Prompt minimization and structural refactors (not stylistic rewrites)

More accurate cost tracking and reporting

Ensure output stability and scoring consistency are preserved
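As a rough illustration of the first item above, cheap-to-expensive model routing usually means trying the cheapest tier first and escalating only when its result fails validation. The sketch below is a minimal, hypothetical version — the tier names, validation shape, and error handling are illustrative assumptions, not Jules' actual routing system:

```typescript
// Hypothetical sketch of cheap → expensive model routing with fallback.
// Tier names and the ok/fail validation shape are illustrative assumptions.

type ModelCall = (prompt: string) => Promise<{ text: string; ok: boolean }>;

interface Route {
  name: string;
  call: ModelCall;
}

// Try the cheapest tier first; escalate only when the cheap result
// fails validation or the call itself errors out.
async function routeWithFallback(
  prompt: string,
  routes: Route[],
): Promise<{ model: string; text: string }> {
  let lastError: unknown = null;
  for (const route of routes) {
    try {
      const result = await route.call(prompt);
      if (result.ok) return { model: route.name, text: result.text };
    } catch (err) {
      lastError = err; // model/network failure: fall through to the next tier
    }
  }
  throw new Error(`All model tiers failed: ${String(lastError)}`);
}
```

The key cost property is that the expensive tier is only billed for requests the cheap tier could not handle, so the validation predicate (not shown here) does most of the work.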

Deliverables:

Merged code changes

Before/after latency and cost comparison

Clear documentation of decisions and tradeoffs

What You Will Not Do

To be explicit:

❌ Redesign product flows, UX, or scoring logic

❌ Rewrite Jules’ persona or tone

❌ “Improve” the product by adding features

❌ Push unnecessary infra churn before instrumentation

❌ Suggest fine-tuning as a first solution

Your job is to make the engine faster, cheaper, and more reliable, not change the car.

Technical Environment (You’ll Be Working Inside This)

Frontend: React Native (Expo, TypeScript)

Backend: Node.js + Express

Database: MongoDB

AI: OpenAI (GPT-4o for vision, GPT-4.1-mini for chat)

Infra: Cloudinary (images), Firebase Auth, Segment, Sentry

Architecture: Async API calls, structured JSON outputs, prompt routing system

Full architecture documentation will be provided on engagement start.
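One concrete pattern the caching work might take in this stack is a hash-keyed result cache, where the key covers everything that affects the output (model, prompt, and an image hash for the vision features). This is a minimal sketch under assumed names — the class, TTL, and key fields are illustrative, not the existing instrumentation:

```typescript
import { createHash } from "crypto";

// Hypothetical hash-based cache for LLM results. The cache key covers
// every input that affects the output, so identical requests can reuse
// the stored response instead of paying for a second model call.
type CacheEntry = { value: string; expiresAt: number };

class LlmResultCache {
  private store = new Map<string, CacheEntry>();

  constructor(private ttlMs: number) {}

  // Derive a stable key from model + prompt + (optional) image hash.
  key(model: string, prompt: string, imageHash?: string): string {
    return createHash("sha256")
      .update(`${model}\n${prompt}\n${imageHash ?? ""}`)
      .digest("hex");
  }

  get(key: string): string | undefined {
    const entry = this.store.get(key);
    if (!entry || entry.expiresAt < Date.now()) return undefined;
    return entry.value;
  }

  set(key: string, value: string): void {
    this.store.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}
```

Including the image hash in the key matters for the vision features: the same prompt against a different photo must miss the cache, while a re-submitted photo can hit it.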

What We’re Looking For

Required

Deep experience optimizing production LLM systems

Strong intuition for cost vs latency vs quality tradeoffs

Hands-on backend engineering skills (Node.js)

Experience with:

model routing

async systems

caching strategies

deterministic LLM outputs

Nice to Have

Vision model experience

Experience evaluating multiple inference providers

Prior startup or zero-to-scale experience

Engagement Details

Type: Short-term contract

Length: TBD

Scope: Audit → Recommend → Implement

Potential extension: Yes, based on results

Timezone: Flexible, but anchored to Pacific Time


Ready to Apply?


🚀 Apply Now
