A two-model local AI system. Your iPhone captures images, sends them privately over Tailscale to your laptop, where Gemma 4 E4B analyzes them and MedGemma provides specialist medical interpretation — all without a single byte leaving your network.
User photographs a document — handwritten prescription, lab report, X-ray, receipt — or picks an existing image from the gallery. Safari's native camera picker handles HEIC, JPEG, PNG.
The image uploads over Tailscale's encrypted WireGuard mesh to 100.65.243.70:8000. Zero public internet exposure. The FastAPI backend normalizes the image to a ≤800px JPEG and base64-encodes it for the downstream models.
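The normalization step could be sketched roughly as below. This is a sketch under assumptions, not the backend's actual code: it assumes the 800px limit applies to the longer side, it leaves the pixel resize itself to an imaging library (not shown), and the names `fit_within` and `to_base64` are hypothetical.

```python
import base64

MAX_SIDE = 800  # assumption: the 800px limit bounds the longer side


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Return dimensions scaled down to fit max_side, preserving aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # small images are left alone, never upscaled
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def to_base64(jpeg_bytes: bytes) -> str:
    """Encode already-compressed JPEG bytes as ASCII base64 text."""
    return base64.b64encode(jpeg_bytes).decode("ascii")
```

A typical 4032x3024 iPhone photo would come out at 800x600 under this assumption, which keeps the vision encoder's patch count and the upload payload small.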
LiteRT-LM loads the Gemma 4 E4B model on CPU via XNNPACK. The vision encoder processes image patches, then the language model generates a detailed analysis: OCR of handwritten text, structural interpretation, and clinical observations. Tokens stream to the iPhone in real time via Server-Sent Events (SSE). Hindi/Indic scripts are recognized natively.
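Token streaming over SSE comes down to writing one `data:` frame per token, each terminated by a blank line. The sketch below shows one way to frame tokens. The JSON field names (`source`, `token`) and the closing `done` event are assumptions for illustration, not the project's actual wire format.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str], source: str = "gemma") -> Iterator[str]:
    """Yield one SSE frame per token, then a final 'done' event."""
    for token in tokens:
        # JSON-encoding escapes newlines inside a token, so a token can
        # never break the frame. ensure_ascii=False keeps Indic text readable.
        payload = json.dumps({"source": source, "token": token}, ensure_ascii=False)
        yield f"data: {payload}\n\n"
    # Hypothetical end-of-stream marker so the client knows when to stop.
    yield "event: done\ndata: {}\n\n"
```

In FastAPI, a generator like this can be returned through `StreamingResponse(..., media_type="text/event-stream")`, and the browser reads it with `EventSource` or a streaming `fetch`.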
The server scans both the user's prompt and Gemma 4's response for ~80 medical keywords (symptoms, medications, imaging terms, anatomy). If medical content is detected, or if the user explicitly forces medical mode, the pipeline chains to MedGemma with both the original image and Gemma 4's full analysis as context.
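The routing decision described above might look something like this. The keyword set here is a small hypothetical subset (the real list has roughly 80 entries and is not reproduced in this document), and whole-word matching is an assumption about how the matching is done.

```python
import re

# Hypothetical subset of the ~80 keywords, for illustration only.
MEDICAL_KEYWORDS = frozenset({
    "prescription", "dosage", "tablet", "mg", "fever", "x-ray",
    "fracture", "hemoglobin", "diagnosis", "radiograph",
})

# Words made of letters and hyphens, so "x-ray" stays one token.
_WORD = re.compile(r"[a-z][a-z\-]*")


def is_medical(text: str) -> bool:
    """True if any keyword appears as a whole word in the text."""
    words = set(_WORD.findall(text.lower()))
    return not MEDICAL_KEYWORDS.isdisjoint(words)


def should_chain_to_medgemma(prompt: str, gemma_response: str,
                             forced: bool = False) -> bool:
    """Chain when the user forces it or either text looks medical."""
    return forced or is_medical(prompt) or is_medical(gemma_response)
```

Checking Gemma 4's response as well as the prompt matters because a user may simply ask "What is this?" while the model's OCR output is what reveals that the image is a prescription.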
MedGemma 1.5 4B runs on Ollama using the 6 GB GPU. It receives the image directly (multimodal via base64) plus Gemma 4's OCR and structural analysis. This dual-input approach means MedGemma can see the image itself while also leveraging Gemma 4's superior text extraction for handwritten Indic prescriptions. Response streams back to the iPhone as a separate blue "MEDICAL" card.
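The dual-input handoff can be illustrated by building the request body for Ollama's `/api/generate` endpoint, which accepts base64 images in an `images` list alongside a text `prompt`. This is a sketch under assumptions: the model tag `medgemma` and the prompt wording are hypothetical, and the function only builds the payload without sending it.

```python
import json


def build_medgemma_request(image_b64: str, gemma_analysis: str, user_prompt: str,
                           model: str = "medgemma") -> dict:
    """Combine the raw image and Gemma 4's analysis into one Ollama request."""
    prompt = (
        "A general vision model produced the following OCR and structural "
        "analysis of the attached image:\n\n"
        f"{gemma_analysis}\n\n"
        f"User request: {user_prompt}\n\n"
        "Using both the image and the analysis above, give a medical interpretation."
    )
    return {
        "model": model,         # hypothetical tag; use the name the model was pulled under
        "prompt": prompt,
        "images": [image_b64],  # bare base64 string, no "data:" URI prefix
        "stream": True,         # Ollama then replies with newline-delimited JSON chunks
    }


if __name__ == "__main__":
    body = build_medgemma_request("/9j/AAA", "Tab. Paracetamol 500 mg", "Explain this")
    print(json.dumps(body, indent=2)[:200])
```

The body would be POSTed to the local Ollama server (port 11434 by default), and each streamed chunk forwarded to the iPhone as part of the "MEDICAL" card.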
Gemma 4 E4B runs on CPU via XNNPACK (3.65 GB model in 16 GB of system RAM). MedGemma runs on the GPU via Ollama (6 GB VRAM). Because one model lives in system RAM and the other in VRAM, both run simultaneously without memory contention.
Zero cloud inference
The server auto-detects medical content using keyword matching on both the user prompt and Gemma 4's response. Medical images chain to MedGemma automatically; no manual toggle is needed.
~80 medical keywords
MedGemma receives both the raw image and Gemma 4's full text analysis. This is critical for handwritten Indic prescriptions where Gemma 4's OCR captures text that MedGemma alone might miss.
Image + context forwarding
Tailscale WireGuard mesh: no port forwarding, no public IP. Images are deleted immediately after inference. Nothing touches the internet. Patient data stays on your devices.
HIPAA-aligned architecture
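The "deleted immediately after inference" guarantee is easiest to keep when deletion sits in a `finally` block, so the file is removed even if a model call raises. A minimal sketch, assuming the image is staged as a temporary file; `run_with_ephemeral_image` and the `infer` callback are hypothetical names, not the project's API.

```python
import os
import tempfile
from typing import Callable


def run_with_ephemeral_image(image_bytes: bytes, infer: Callable[[str], str]) -> str:
    """Write the image to a temp file, run inference, and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".jpg")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(image_bytes)
        return infer(path)  # the callback receives the temp file path
    finally:
        # Runs on success and on error alike, so no image outlives its request.
        if os.path.exists(path):
            os.remove(path)
```

If the pipeline can keep the image entirely in memory, for example as the base64 string it already forwards to the models, skipping the temp file altogether is simpler still.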