Building the AI Ingredient Scanner: A Multi-Agent Approach

· 4 min read
Uday Tamma
Building AI-Powered Applications

The AI Ingredient Scanner started as an exploration into multi-agent LLM architectures and evolved into a full-stack application with mobile support and multi-language OCR.

Project Vision

Create an application that analyzes food and cosmetic ingredient labels, providing personalized safety assessments based on user profiles (allergies, skin type, dietary restrictions).

Phase 1: Multi-Agent Architecture

The Agent Design

Built a three-agent system, with each agent playing a specific role (a sketch of the control flow follows the list):

  1. Research Agent: Retrieves ingredient safety data

    • Primary: Qdrant vector database with pre-indexed safety information
    • Fallback: Google Search for unknown ingredients
    • Caches results for performance
  2. Analysis Agent: Generates comprehensive reports

    • Powered by Gemini 2.0 Flash
    • Considers user profile for personalization
    • Produces structured safety assessments
  3. Critic Agent: Quality validation

    • 5-gate validation system
    • Checks for accuracy, completeness, and relevance
    • Can request re-analysis if quality thresholds aren't met
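
The production system orchestrates these agents with LangChain and LangGraph (see the tech stack below), but the control flow itself is simple enough to sketch. The following is a minimal TypeScript sketch of the research → analysis → critic loop; the types, the stubbed agent functions, and the three-attempt retry limit are all illustrative assumptions, not the actual implementation.

// Minimal sketch of the three-agent loop. All shapes and stubs here
// are illustrative assumptions; the real system wires the agents
// together with LangGraph.
interface QualityReport {
  passed: boolean;        // did the report clear all five gates?
  failedGates: string[];  // e.g. ["completeness", "relevance"]
}

async function research(ingredients: string[]): Promise<string> {
  // Real agent: Qdrant lookup first, Google Search fallback, cached.
  return `safety data for: ${ingredients.join(", ")}`;
}

async function analyze(data: string, profile: string): Promise<string> {
  // Real agent: Gemini 2.0 Flash, personalized to the user profile.
  return `report from ${data} for profile: ${profile}`;
}

async function critique(report: string): Promise<QualityReport> {
  // Real agent: 5-gate validation; stubbed as always-pass here.
  return { passed: true, failedGates: [] };
}

async function runPipeline(ingredients: string[], profile: string) {
  const data = await research(ingredients);
  let report = await analyze(data, profile);

  // The critic can bounce a report back for re-analysis; bounding the
  // loop keeps a persistently failing report from cycling forever.
  for (let attempt = 0; attempt < 3; attempt++) {
    const verdict = await critique(report);
    if (verdict.passed) return report;
    report = await analyze(
      `${data}\nAddress failed gates: ${verdict.failedGates.join(", ")}`,
      profile
    );
  }
  throw new Error("Report failed quality gates after 3 attempts");
}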

Tech Stack (Phase 1)

  • LLM: Google Gemini 2.0 Flash
  • Vector DB: Qdrant Cloud
  • Framework: LangChain + LangGraph
  • UI: Streamlit
  • Observability: LangSmith tracing

Key Features

  • PDF export with colored safety bars
  • Share via Email/WhatsApp/Twitter
  • User profiles for personalized analysis
  • Ingredient-by-ingredient breakdown

Phase 2: Mobile App & OCR

The Mobile Challenge

Users wanted to scan labels directly from products. This required:

  • Native camera integration
  • OCR for text extraction
  • Multi-language support (labels aren't always in English)

Solution Architecture

[Mobile App] --> [FastAPI Backend] --> [Multi-Agent System]
      |                  |
      v                  v
[Camera/Gallery]  [OCR + Translation]

React Native/Expo Implementation

Built the mobile app with Expo for cross-platform support:

  • ImageCapture: Camera interface with gallery picker
  • IngredientCard: Expandable details with safety metrics
  • ProfileSelector: Allergies, skin type, preferences
  • Dark/Light theme toggle

Multi-Language OCR

Implemented OCR support for nine languages:

  • Auto-detection of source language
  • Translation to English for analysis
  • Original text preserved in results

Languages supported: English, Spanish, French, German, Italian, Portuguese, Japanese, Korean, Chinese
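
Preserving the original text means an OCR result carries the detected language and the raw extraction alongside the English translation that the agents consume. The interface below is an assumed shape for illustration, not the actual API schema.

// Assumed shape of an OCR result; the real schema may differ.
interface OcrResult {
  detectedLanguage: string; // e.g. "ja" for a Japanese label
  originalText: string;     // raw extraction, preserved verbatim
  translatedText: string;   // English text handed to the analysis agents
}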

FastAPI REST Backend

Created dedicated endpoints for mobile (a client-side sketch follows the list):

  • POST /ocr - Extract text from images
  • POST /analyze - Run ingredient analysis
  • Swagger docs at /docs
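
From the app's side the contract is small. Below is a hedged sketch of calling both endpoints from React Native; the routes are the ones listed above and the base URL matches the production API shown later, but the multipart field name, request body fields, and response shapes are assumptions.

const API_BASE = "https://api.zeroleaf.dev";

// POST /ocr: upload a label photo, receive the extracted text.
// The "image" field name and { text } response are assumptions.
async function extractText(imageUri: string): Promise<string> {
  const form = new FormData();
  // React Native's FormData accepts a { uri, name, type } file object.
  form.append("image", { uri: imageUri, name: "label.jpg", type: "image/jpeg" } as any);
  const res = await fetch(`${API_BASE}/ocr`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`OCR failed: ${res.status}`);
  const { text } = await res.json();
  return text;
}

// POST /analyze: send ingredients plus the user profile, get a report.
async function analyzeIngredients(ingredients: string, profile: object) {
  const res = await fetch(`${API_BASE}/analyze`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ ingredients, profile }),
  });
  if (!res.ok) throw new Error(`Analysis failed: ${res.status}`);
  return res.json();
}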

Phase 3: Web Platform Support

The Web Export Challenge

After building the mobile app, the next step was making it accessible via web browsers. Expo provides web support through react-native-web, but some components needed platform-specific implementations.

Platform-Specific Components

Created dual implementations for components that differ between native and web:

ImageCapture.tsx       # Native: expo-camera, expo-image-picker
ImageCapture.web.tsx   # Web: MediaDevices API, file input

React Native's bundler automatically selects the correct file based on platform.
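
Callers never branch on platform; a single import resolves to the right file at bundle time. A small sketch (the file names and props shape here are mine, for illustration):

// ImageCapture.types.ts — shared props contract both implementations
// satisfy (assumed shape).
export interface ImageCaptureProps {
  onImage: (uri: string) => void; // receives the captured photo's URI
  mode?: "camera" | "gallery";    // see "Mode switching" below
}

// ScanScreen.tsx — one import covers both platforms: the bundler
// resolves './ImageCapture' to ImageCapture.web.tsx on web and to
// ImageCapture.tsx on iOS/Android.
import ImageCapture from "./ImageCapture";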

Web Camera Implementation

The web version uses browser APIs (sketched after the list):

  • navigator.mediaDevices.getUserMedia() for camera access
  • Falls back to file picker if camera unavailable
  • Canvas API for image capture from video stream
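
A minimal sketch of that flow (the function names are mine; the real component wraps this in React state):

// Open the camera, grab one frame via canvas, return a JPEG data URL.
async function captureFromCamera(video: HTMLVideoElement): Promise<string> {
  // Async permission prompt; rejects if denied or no camera is present.
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  video.srcObject = stream;
  await video.play();

  // Draw the current video frame onto a canvas and encode it.
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);

  // Release the camera once the frame is captured.
  stream.getTracks().forEach((t) => t.stop());
  return canvas.toDataURL("image/jpeg");
}

// Fallback when the camera is unavailable: a plain file input.
function pickFromFileInput(onImage: (dataUrl: string) => void): void {
  const input = document.createElement("input");
  input.type = "file";
  input.accept = "image/*";
  input.onchange = () => {
    const file = input.files?.[0];
    if (!file) return;
    const reader = new FileReader();
    reader.onload = () => onImage(reader.result as string);
    reader.readAsDataURL(file);
  };
  input.click();
}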

API Environment Detection

Updated the API service to auto-detect environment:

import { Platform } from 'react-native';

// LOCAL_IP and PRODUCTION_API are module constants defined elsewhere.
const getApiBaseUrl = (): string => {
  if (Platform.OS === 'web') {
    return 'https://api.zeroleaf.dev'; // Production
  }
  // Native: LAN dev server during development, production API otherwise.
  return __DEV__ ? LOCAL_IP : PRODUCTION_API;
};

Testing Suite

Added comprehensive Jest tests (one sketched after the list):

  • Type validation tests for API contracts
  • Component rendering tests
  • Theme context behavior tests
  • API service tests
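
As an example of the contract-test style, here is a hedged sketch; the validator and the expected fields are assumptions, not the project's actual code.

// api.contract.test.ts — sketch; field names are assumptions.
import { describe, expect, it } from "@jest/globals";

// Hypothetical runtime check that an /analyze response carries the
// fields the UI depends on.
function isAnalysisResponse(x: any): boolean {
  return typeof x?.overallSafety === "string" && Array.isArray(x?.ingredients);
}

describe("analyze API contract", () => {
  it("accepts a well-formed response", () => {
    expect(isAnalysisResponse({ overallSafety: "safe", ingredients: [] })).toBe(true);
  });

  it("rejects a response missing ingredients", () => {
    expect(isAnalysisResponse({ overallSafety: "safe" })).toBe(false);
  });
});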

Browser-Specific Challenges

Building for web uncovered platform differences:

  1. Camera initialization: the browser camera requires an async permission flow with loading states (sketched after this list)
  2. File picker: web uses a native <input type="file"> element instead of expo-image-picker
  3. Mode switching: added a mode prop to ImageCapture for direct camera vs. gallery access
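
For the first point, a sketch of wrapping the permission prompt in explicit loading/denied states so the UI can show a spinner or fall back to the file picker; the hook name and state values are mine:

import { useEffect, useRef, useState } from "react";

// Hypothetical hook: surface the async permission flow as UI state.
function useWebCamera() {
  const videoRef = useRef<HTMLVideoElement | null>(null);
  const [status, setStatus] = useState<"loading" | "ready" | "denied">("loading");

  useEffect(() => {
    let stream: MediaStream | undefined;
    navigator.mediaDevices
      .getUserMedia({ video: true })
      .then((s) => {
        stream = s;
        if (videoRef.current) {
          videoRef.current.srcObject = s;
          setStatus("ready");
        }
      })
      .catch(() => setStatus("denied")); // render the file-picker fallback
    // Stop the camera track when the component unmounts.
    return () => stream?.getTracks().forEach((t) => t.stop());
  }, []);

  return { videoRef, status };
}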

Deployment

Service        Platform           URL
Backend API    Railway            api.zeroleaf.dev
Streamlit UI   Railway            ingredient-analyzer.zeroleaf.dev
Web App        Cloudflare Pages   scanner.zeroleaf.dev
Mobile         Expo Go / Native   -

Lessons Learned

  1. Agent orchestration matters: The critic agent catches errors that would slip through a single-agent approach.

  2. Vector DB as primary source: Faster and more reliable than web search for known ingredients.

  3. Mobile-first considerations: Camera permissions, image sizing, and network handling add complexity.

  4. Multi-language is hard: OCR accuracy varies by language and image quality.

  5. Platform abstractions help: React Native Web makes cross-platform development feasible, but platform-specific components still need careful handling.

  6. Environment detection is crucial: Automatically switching between development and production APIs reduces configuration errors.

What's Next

  • App store deployment (iOS/Android)
  • Barcode scanning for product lookup
  • Ingredient history and favorites
  • Community-contributed safety data