Building the AI Ingredient Scanner: A Multi-Agent Approach

· 4 min read
Uday Tamma
Building AI-Powered Applications

The AI Ingredient Scanner started as an exploration into multi-agent LLM architectures and evolved into a full-stack application with mobile support and multi-language OCR.

Project Vision

Create an application that analyzes food and cosmetic ingredient labels, providing personalized safety assessments based on user profiles (allergies, skin type, dietary restrictions).

Phase 1: Multi-Agent Architecture

The Agent Design

Built a three-agent system, with each agent playing a specific role (a sketch of the control flow follows the list):

  1. Research Agent: Retrieves ingredient safety data

    • Primary: Qdrant vector database with pre-indexed safety information
    • Fallback: Google Search for unknown ingredients
    • Caches results for performance
  2. Analysis Agent: Generates comprehensive reports

    • Powered by Gemini 2.0 Flash
    • Considers user profile for personalization
    • Produces structured safety assessments
  3. Critic Agent: Quality validation

    • 5-gate validation system
    • Checks for accuracy, completeness, and relevance
    • Can request re-analysis if quality thresholds aren't met
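
The production system orchestrates these agents with LangChain and LangGraph (see the tech stack below), but the control flow itself is simple enough to sketch. The following is a minimal TypeScript sketch of the research → analysis → critic loop; the types, the stubbed agent functions, and the three-attempt retry limit are all illustrative assumptions, not the actual implementation.

// Minimal sketch of the three-agent loop. All shapes and stubs here
// are illustrative assumptions; the real system wires the agents
// together with LangGraph.
interface QualityReport {
  passed: boolean;        // did the report clear all five gates?
  failedGates: string[];  // e.g. ["completeness", "relevance"]
}

async function research(ingredients: string[]): Promise<string> {
  // Real agent: Qdrant lookup first, Google Search fallback, cached.
  return `safety data for: ${ingredients.join(", ")}`;
}

async function analyze(data: string, profile: string): Promise<string> {
  // Real agent: Gemini 2.0 Flash, personalized to the user profile.
  return `report from ${data} for profile: ${profile}`;
}

async function critique(report: string): Promise<QualityReport> {
  // Real agent: 5-gate validation; stubbed as always-pass here.
  return { passed: true, failedGates: [] };
}

async function runPipeline(ingredients: string[], profile: string) {
  const data = await research(ingredients);
  let report = await analyze(data, profile);

  // The critic can bounce a report back for re-analysis; bounding the
  // loop keeps a persistently failing report from cycling forever.
  for (let attempt = 0; attempt < 3; attempt++) {
    const verdict = await critique(report);
    if (verdict.passed) return report;
    report = await analyze(
      `${data}\nAddress failed gates: ${verdict.failedGates.join(", ")}`,
      profile
    );
  }
  throw new Error("Report failed quality gates after 3 attempts");
}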

Tech Stack (Phase 1)

  • LLM: Google Gemini 2.0 Flash
  • Vector DB: Qdrant Cloud
  • Framework: LangChain + LangGraph
  • UI: Streamlit
  • Observability: LangSmith tracing

Key Features

  • PDF export with colored safety bars
  • Share via Email/WhatsApp/Twitter
  • User profiles for personalized analysis
  • Ingredient-by-ingredient breakdown

Phase 2: Mobile App & OCR

The Mobile Challenge

Users wanted to scan labels directly from products. This required:

  • Native camera integration
  • OCR for text extraction
  • Multi-language support (labels aren't always in English)

Solution Architecture

[Mobile App] --> [FastAPI Backend] --> [Multi-Agent System]
      |                  |
      v                  v
[Camera/Gallery]  [OCR + Translation]

React Native/Expo Implementation

Built the mobile app with Expo for cross-platform support:

  • ImageCapture: Camera interface with gallery picker
  • IngredientCard: Expandable details with safety metrics
  • ProfileSelector: Allergies, skin type, preferences
  • Dark/Light theme toggle

Multi-Language OCR

Implemented OCR support for nine languages:

  • Auto-detection of source language
  • Translation to English for analysis
  • Original text preserved in results

Languages supported: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese
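
Preserving the original text means an OCR result carries the detected language and the raw extraction alongside the English translation that the agents consume. The interface below is an assumed shape for illustration, not the actual API schema.

// Assumed shape of an OCR result; the real schema may differ.
interface OcrResult {
  detectedLanguage: string; // e.g. "ja" for a Japanese label
  originalText: string;     // raw extraction, preserved verbatim
  translatedText: string;   // English text handed to the analysis agents
}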

FastAPI REST Backend

Created dedicated endpoints for mobile (a client-side sketch follows the list):

  • POST /ocr - Extract text from images
  • POST /analyze - Run ingredient analysis
  • Swagger docs at /docs
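
From the app's side the contract is small. Below is a hedged sketch of calling both endpoints from React Native; the routes are the ones listed above and the base URL matches the production API shown later, but the multipart field name, request body fields, and response shapes are assumptions.

const API_BASE = "https://api.zeroleaf.dev";

// POST /ocr: upload a label photo, receive the extracted text.
// The "image" field name and { text } response are assumptions.
async function extractText(imageUri: string): Promise<string> {
  const form = new FormData();
  // React Native's FormData accepts a { uri, name, type } file object.
  form.append("image", { uri: imageUri, name: "label.jpg", type: "image/jpeg" } as any);
  const res = await fetch(`${API_BASE}/ocr`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`OCR failed: ${res.status}`);
  const { text } = await res.json();
  return text;
}

// POST /analyze: send ingredients plus the user profile, get a report.
async function analyzeIngredients(ingredients: string, profile: object) {
  const res = await fetch(`${API_BASE}/analyze`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ingredients, profile }),
  });
  if (!res.ok) throw new Error(`Analysis failed: ${res.status}`);
  return res.json();
}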

Phase 3: Web Platform Support

The Web Export Challenge

After building the mobile app, the next step was making it accessible via web browsers. Expo provides web support through react-native-web, but some components needed platform-specific implementations.

Platform-Specific Components

Created dual implementations for components that differ between native and web:

ImageCapture.tsx       # Native: expo-camera, expo-image-picker
ImageCapture.web.tsx   # Web: MediaDevices API, file input

React Native's bundler automatically selects the correct file based on platform.
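
Callers never branch on platform; a single import resolves to the right file at bundle time. A small sketch (the file names and props shape here are mine, for illustration):

// ImageCapture.types.ts — shared props contract both implementations
// satisfy (assumed shape).
export interface ImageCaptureProps {
  onImage: (uri: string) => void; // receives the captured photo's URI
  mode?: "camera" | "gallery";    // see "Mode switching" below
}

// ScanScreen.tsx — one import covers both platforms: the bundler
// resolves './ImageCapture' to ImageCapture.web.tsx on web and to
// ImageCapture.tsx on iOS/Android.
import ImageCapture from "./ImageCapture";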

Web Camera Implementation

The web version uses browser APIs (sketched after the list):

  • navigator.mediaDevices.getUserMedia() for camera access
  • Falls back to file picker if camera unavailable
  • Canvas API for image capture from video stream
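
A minimal sketch of that flow (the function names are mine; the real component wraps this in React state):

// Open the camera, grab one frame via canvas, return a JPEG data URL.
async function captureFromCamera(video: HTMLVideoElement): Promise<string> {
  // Async permission prompt; rejects if denied or no camera is present.
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  video.srcObject = stream;
  await video.play();

  // Draw the current video frame onto a canvas and encode it.
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);

  // Release the camera once the frame is captured.
  stream.getTracks().forEach((t) => t.stop());
  return canvas.toDataURL("image/jpeg");
}

// Fallback when the camera is unavailable: a plain file input.
function pickFromFileInput(onImage: (dataUrl: string) => void): void {
  const input = document.createElement("input");
  input.type = "file";
  input.accept = "image/*";
  input.onchange = () => {
    const file = input.files?.[0];
    if (!file) return;
    const reader = new FileReader();
    reader.onload = () => onImage(reader.result as string);
    reader.readAsDataURL(file);
  };
  input.click();
}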

API Environment Detection

Updated the API service to auto-detect environment:

import { Platform } from 'react-native';

// LOCAL_IP and PRODUCTION_API are module constants defined elsewhere.
const getApiBaseUrl = (): string => {
  if (Platform.OS === 'web') {
    return 'https://api.zeroleaf.dev'; // Production
  }
  // Native: LAN dev server during development, production API otherwise.
  return __DEV__ ? LOCAL_IP : PRODUCTION_API;
};

Testing Suite

Added comprehensive Jest tests (one sketched after the list):

  • Type validation tests for API contracts
  • Component rendering tests
  • Theme context behavior tests
  • API service tests
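
As an example of the contract-test style, here is a hedged sketch; the validator and the expected fields are assumptions, not the project's actual code.

// api.contract.test.ts — sketch; field names are assumptions.
import { describe, expect, it } from "@jest/globals";

// Hypothetical runtime check that an /analyze response carries the
// fields the UI depends on.
function isAnalysisResponse(x: any): boolean {
  return typeof x?.overallSafety === "string" && Array.isArray(x?.ingredients);
}

describe("analyze API contract", () => {
  it("accepts a well-formed response", () => {
    expect(isAnalysisResponse({ overallSafety: "safe", ingredients: [] })).toBe(true);
  });

  it("rejects a response missing ingredients", () => {
    expect(isAnalysisResponse({ overallSafety: "safe" })).toBe(false);
  });
});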

Browser-Specific Challenges

Building for web uncovered platform differences:

  1. Camera initialization: the browser camera requires an async permission flow with loading states (sketched after this list)
  2. File picker: web uses a native <input type="file"> element instead of expo-image-picker
  3. Mode switching: added a mode prop to ImageCapture for direct camera vs. gallery access
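
For the first point, a sketch of wrapping the permission prompt in explicit loading/denied states so the UI can show a spinner or fall back to the file picker; the hook name and state values are mine:

import { useEffect, useRef, useState } from "react";

// Hypothetical hook: surface the async permission flow as UI state.
function useWebCamera() {
  const videoRef = useRef<HTMLVideoElement | null>(null);
  const [status, setStatus] = useState<"loading" | "ready" | "denied">("loading");

  useEffect(() => {
    let stream: MediaStream | undefined;
    navigator.mediaDevices
      .getUserMedia({ video: true })
      .then((s) => {
        stream = s;
        if (videoRef.current) {
          videoRef.current.srcObject = s;
          setStatus("ready");
        }
      })
      .catch(() => setStatus("denied")); // render the file-picker fallback
    // Stop the camera track when the component unmounts.
    return () => stream?.getTracks().forEach((t) => t.stop());
  }, []);

  return { videoRef, status };
}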

Deployment

Service        Platform           URL
Backend API    Railway            api.zeroleaf.dev
Streamlit UI   Railway            ingredient-analyzer.zeroleaf.dev
Web App        Cloudflare Pages   scanner.zeroleaf.dev
Mobile         Expo Go / Native   -

Lessons Learned

  1. Agent orchestration matters: The critic agent catches errors that would slip through a single-agent approach.

  2. Vector DB as primary source: Faster and more reliable than web search for known ingredients.

  3. Mobile-first considerations: Camera permissions, image sizing, and network handling add complexity.

  4. Multi-language is hard: OCR accuracy varies by language and image quality.

  5. Platform abstractions help: React Native Web makes cross-platform development feasible, but platform-specific components still need careful handling.

  6. Environment detection is crucial: Automatically switching between development and production APIs reduces configuration errors.

What's Next

  • App store deployment (iOS/Android)
  • Barcode scanning for product lookup
  • Ingredient history and favorites
  • Community-contributed safety data