Private · Local · Zero Cloud

Private Vision Relay

A two-model local AI system. Your iPhone captures images and sends them privately over Tailscale to your Linux workstation, where Gemma 4 E4B analyzes them and MedGemma provides specialist medical interpretation, all without a single byte leaving your network.

📱

iPhone 13 (PRODUCT)RED

4 GB RAM · 512 GB Storage · A15 Bionic · 12 MP Camera · Safari UI

Tailscale → 100.92.189.116

🔒 Tailscale

WireGuard®

💻

Linux Workstation

16 GB RAM · 6 GB VRAM · 16-Core CPU · Ubuntu · XNNPACK

Tailscale → 100.65.243.70

Architecture

Two-Stage Vision + Medical Pipeline

📸 Image Capture iPhone

The user photographs a document (a handwritten prescription, lab report, X-ray, or receipt) or picks an existing image from the gallery. Safari's native camera picker handles HEIC, JPEG, and PNG.

↓

🔒 Tailscale Relay WireGuard

The image uploads over the encrypted WireGuard mesh to 100.65.243.70:8000, with zero public internet exposure. The FastAPI backend normalizes it to a JPEG of at most 800 px and base64-encodes the result for the downstream models.
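The normalization step can be sketched as follows. This is a minimal illustration, not the project's actual code: `fit_within` and `to_base64` are hypothetical helper names, the 800 px limit is assumed to apply to the longest edge, and the pixel resize itself is assumed to be done by an imaging library, which is left out so the sketch needs only the standard library.

```python
import base64

MAX_SIDE = 800  # assumed to cap the longest edge after normalization


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Return dimensions scaled down so the longest edge is at most max_side.

    Aspect ratio is preserved; images that are already small enough are unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to the nearest pixel, but never collapse an edge to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


def to_base64(jpeg_bytes: bytes) -> str:
    """Encode already-compressed JPEG bytes as ASCII base64 for a JSON payload."""
    return base64.b64encode(jpeg_bytes).decode("ascii")
```

A 4032 × 3024 iPhone photo would be reduced to 800 × 600 under these assumptions, while a 640 × 480 image would pass through at its original size.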

↓

👁 Gemma 4 E4B — Vision Analysis CPU · 3.65 GB

LiteRT-LM loads the Gemma 4 E4B model on CPU via XNNPACK. The vision encoder processes image patches, then the language model generates a detailed analysis: OCR of handwritten text, structural interpretation, and clinical observations. Tokens stream to the iPhone in real time via SSE (Server-Sent Events). Hindi and other Indic scripts are recognized natively.
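Token streaming over SSE comes down to writing small text frames to an open HTTP response. The sketch below shows the wire format only. `sse_frame` and `stream_tokens` are hypothetical names, and the `vision` and `done` event labels are assumptions rather than anything the project documents.

```python
from typing import Iterable, Iterator, Optional


def sse_frame(data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events frame.

    Each line of the payload gets its own "data:" field, and a blank line
    terminates the frame.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Clients strip only one space after the colon, so tokens that begin
    # with a space (common in LLM output) arrive intact.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


def stream_tokens(tokens: Iterable[str], event: str = "vision") -> Iterator[str]:
    """Yield one SSE frame per generated token, then a closing 'done' frame."""
    for token in tokens:
        yield sse_frame(token, event=event)
    yield sse_frame("end", event="done")
```

In a FastAPI app, a generator like `stream_tokens` would typically be wrapped in a streaming response with the `text/event-stream` media type, and the page would append each token to the card as it arrives.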

↓

🧬 Medical Routing — Agentic Decision Auto-Detect

The server scans both the user's prompt and Gemma 4's response for ~80 medical keywords (symptoms, medications, imaging terms, anatomy). If medical content is detected, or the user forces the medical stage, the pipeline chains to MedGemma with both the original image and Gemma 4's full analysis as context.
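A minimal sketch of this routing decision, assuming whole-word, case-insensitive matching. The keyword set here is a small illustrative subset, not the real list of roughly 80 terms, and `should_route_to_medical` is a hypothetical function name.

```python
import re

# Illustrative subset only; the actual keyword list is not shown in this page.
MEDICAL_KEYWORDS = frozenset({
    "prescription", "tablet", "dosage", "diagnosis", "hypertension",
    "x-ray", "mri", "fever", "pulse", "bp",
})

# One compiled alternation with word boundaries, so "tablet" does not
# match inside "tabletop". Longer terms come first in the alternation.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(k) for k in sorted(MEDICAL_KEYWORDS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def should_route_to_medical(prompt: str, vision_text: str, force: bool = False) -> bool:
    """Decide whether to chain to the medical model.

    Returns True when the user forces it, or when either the prompt or
    the vision model's response contains a medical keyword.
    """
    if force:
        return True
    return any(_PATTERN.search(text) for text in (prompt, vision_text))
```

Checking the vision model's response as well as the prompt means a neutral question such as "What does this say?" still reaches the medical stage when the image turns out to be a prescription.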

↓

🏥 MedGemma 1.5 — Medical Specialist GPU · Ollama

MedGemma 1.5 4B runs on Ollama using the 6 GB GPU. It receives the image directly (multimodal via base64) plus Gemma 4's OCR and structural analysis. This dual-input approach means MedGemma can see the image itself while also leveraging Gemma 4's superior text extraction for handwritten Indic prescriptions. The response streams back to the iPhone as a separate blue "MEDICAL" card.
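The dual-input request might be assembled along these lines. This sketch only builds the JSON body for Ollama's `/api/generate` endpoint, which accepts base64 images in an `images` list; it does not send anything. The model tag, the prompt wording, and the `build_medgemma_request` name are all assumptions for illustration.

```python
import json

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"  # Ollama's default local address
MEDGEMMA_MODEL = "medgemma-1.5-4b"  # hypothetical tag; use the name the model has locally


def build_medgemma_request(image_b64: str, vision_analysis: str, user_prompt: str) -> dict:
    """Build a request body that carries both the image and the stage 1 text."""
    prompt = (
        "You are reviewing a medical document.\n\n"
        f"User question:\n{user_prompt}\n\n"
        "Text and structure extracted by a first-pass vision model:\n"
        f"{vision_analysis}\n\n"
        "Use the attached image together with the extracted text."
    )
    return {
        "model": MEDGEMMA_MODEL,
        "prompt": prompt,
        "images": [image_b64],  # raw base64 string, no data-URL prefix
        "stream": True,         # ask Ollama to stream the reply in chunks
    }


if __name__ == "__main__":
    body = build_medgemma_request("/9j/", "Diagnosis: ...", "Explain this prescription")
    print(json.dumps(body, indent=2))
```

The body would then be POSTed to `OLLAMA_GENERATE_URL`, and each streamed chunk relayed to the phone as it arrives.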

Live Example

Handwritten Hindi Prescription

Hindi Prescription — Adi Shankar Hospital

📄 Adi Shankar Hospital, Jabalpur (M.P.)

Vision · Stage 1 Gemma 4 E4B — 193s on CPU

Patient: Ajay Kumar, 44/M, Sattar, Jokapur
Vitals: BP 100/90 · Pulse 85/min · Temp 99.1°F
Diagnosis: White Coat Hypertension
Medications:
• Inderal-64 — 1 tablet twice daily (morning)
• Amlodipine Plus 6 — 1 tablet at night
Advice: कम नमक का सेवन (low salt intake), ABPM monitoring 8–5 days, योगा/ध्यान (yoga/meditation)
Doctor: Dr. Aditya Parihar

Medical · Stage 2 MedGemma 1.5 4B — GPU via Ollama

Clinical Assessment: 44-year-old male with Stage 1 Hypertension (diastolic 90 mmHg elevated). Dual antihypertensive therapy prescribed — beta-blocker (Propranolol) + calcium channel blocker (Amlodipine).

Key Observations: Inderal suggests possible angina or arrhythmia risk. ABPM referral indicates white-coat hypertension workup. Low-salt diet and yoga are evidence-based lifestyle interventions.

Recommended: Regular BP monitoring, ECG if cardiac symptoms persist, medication adherence review at follow-up.

Under the Hood

Key Design Decisions

🧠

CPU + GPU Split

Gemma 4 E4B runs on CPU via XNNPACK (3.65 GB model, 16 GB RAM). MedGemma runs on GPU via Ollama (6 GB VRAM). Because one model lives in system RAM and the other in VRAM, both stay loaded at the same time without memory contention.

Zero cloud inference

🔗

Agentic Routing

The server auto-detects medical content using keyword matching on both the user prompt and Gemma 4's response. Medical images chain to MedGemma automatically — no manual toggle needed.

~80 medical keywords

👁

Dual-Model Vision

MedGemma receives both the raw image and Gemma 4's full text analysis. This is critical for handwritten Indic prescriptions where Gemma 4's OCR captures text that MedGemma alone might miss.

Image + context forwarding

🔒

Privacy by Design

The devices talk over a Tailscale WireGuard mesh, so there is no port forwarding and no public IP. Images are deleted immediately after inference. Nothing is sent to a cloud service, and patient data stays on your devices.

HIPAA-aligned architecture
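One way to guarantee the delete-after-inference behavior is to scope the uploaded file to a context manager, so cleanup runs even when a model call fails. This is a sketch under that assumption; `ephemeral_image` is a hypothetical helper, not the project's code.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def ephemeral_image(data: bytes, suffix: str = ".jpg") -> Iterator[str]:
    """Write image bytes to a temp file and delete it when the block exits."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        # Runs on success and on error alike, so no image outlives its request.
        try:
            os.remove(path)
        except FileNotFoundError:
            pass
```

Inference code would run inside `with ephemeral_image(upload_bytes) as path:` and the file is gone as soon as the block ends.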