PrivacyLens — Shitij Mathur

Stack

TypeScript
React 18
Vite
Transformers.js
Tailwind CSS
Chrome MV3
Express

What it does

PrivacyLens reads the text, PDFs, and images you're about to send to an AI service and runs the entire scan in your browser. When it finds personal data, it stops the message before it leaves your device. A fast regex pass gives you feedback as you type, and an on-device model (openai/privacy-filter, a 1.5B-parameter sparse mixture-of-experts token classifier that runs through Transformers.js) does a deeper semantic pass when you submit. Nothing is sent until you approve it or redact what it found.
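A minimal sketch of what the keystroke-time regex pass could look like, assuming each finding is reported as a labelled character span with a confidence score. The `Finding` shape, the `regexScan` name, the three patterns, and their scores are hypothetical illustrations, not PrivacyLens's actual rule set.

```typescript
// Hypothetical finding shape: a labelled character span with a 0-1 confidence.
interface Finding {
  label: string;
  start: number; // inclusive character offset
  end: number; // exclusive character offset
  score: number;
  source: "regex" | "model";
}

// Illustrative patterns and scores only; not the project's actual rules.
const PATTERNS: { label: string; re: RegExp; score: number }[] = [
  { label: "EMAIL", re: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, score: 0.95 },
  { label: "US_SSN", re: /\b\d{3}-\d{2}-\d{4}\b/g, score: 0.9 },
  { label: "PHONE", re: /\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/g, score: 0.8 },
];

// Run every pattern over the text and return all matches in reading order.
function regexScan(text: string): Finding[] {
  const findings: Finding[] = [];
  for (const { label, re, score } of PATTERNS) {
    // matchAll iterates over a copy of the global regex, so the shared
    // pattern's lastIndex is never modified between calls.
    for (const m of text.matchAll(re)) {
      const start = m.index ?? 0;
      findings.push({ label, start, end: start + m[0].length, score, source: "regex" });
    }
  }
  return findings.sort((a, b) => a.start - b.start);
}
```

With no network round trip and no model call, a pass like this is cheap enough to rerun on each input change for typical prompt lengths, which is what makes as-you-type feedback practical.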

How it works

Two scanners work together. A regex pass reacts to each keystroke, and an in-browser model runs token classification when you submit. Their results are merged by confidence score, so overlapping matches from the two scanners resolve to a single finding.
It handles more than plain text. pdfjs-dist pulls text out of PDFs, Tesseract.js runs OCR on images in the browser, and the Canvas API blacks out the detected regions so you can compare the original against the redacted copy.
The Ethics Logic Gate is a hard block in code. If the scan finds any personal data, the send pipeline stops and the send button is disabled. You can still override the block, but only after a second confirmation.
The Chrome extension (Manifest V3) watches the input box on ChatGPT, Claude, Gemini, Perplexity, and Copilot, and you review what it caught in a side panel.
It runs in two modes: as a full-stack app with an Express server that proxies requests to Claude, or as a static build with no backend, in which detection falls back to the pattern scanner and the in-browser model.
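The merge step and the Ethics Logic Gate described above can be sketched together. This assumes both scanners emit character spans with a 0-to-1 confidence score; `mergeByConfidence`, `evaluateGate`, `requestOverride`, `confirmOverride`, `canSend`, and the `GateState` variants are hypothetical names. The merge here is greedy (highest score first, dropping anything that overlaps a span already kept), which is one reasonable reading of "merged by confidence" and not necessarily the project's exact rule. The gate also blocks when a file could not be read, following the project's rule that unreadable files are blocked by default.

```typescript
// Hypothetical finding shape: a character span with a label and a 0-1 confidence.
interface Finding {
  label: string;
  start: number; // inclusive
  end: number; // exclusive
  score: number;
  source: "regex" | "model";
}

const overlaps = (a: Finding, b: Finding): boolean => a.start < b.end && b.start < a.end;

// Greedy merge: visit findings from highest to lowest score and keep each one
// only if it does not overlap a finding that has already been kept.
function mergeByConfidence(regexHits: Finding[], modelHits: Finding[]): Finding[] {
  const ranked = [...regexHits, ...modelHits].sort((a, b) => b.score - a.score);
  const kept: Finding[] = [];
  for (const f of ranked) {
    if (!kept.some((k) => overlaps(k, f))) kept.push(f);
  }
  return kept.sort((a, b) => a.start - b.start); // back to reading order
}

// Gate states: sending is possible only from "clear" or "overridden".
type GateState =
  | { kind: "clear" }
  | { kind: "blocked"; reason: string }
  | { kind: "confirming" } // first override click received, send still disabled
  | { kind: "overridden" };

function evaluateGate(findings: Finding[], allFilesReadable: boolean): GateState {
  // Fail closed: a file the scanner could not read blocks the send.
  if (!allFilesReadable) return { kind: "blocked", reason: "unreadable file" };
  if (findings.length > 0) return { kind: "blocked", reason: `${findings.length} finding(s)` };
  return { kind: "clear" };
}

// Step 1 of the override: arm it, but keep the send disabled.
function requestOverride(state: GateState): GateState {
  return state.kind === "blocked" ? { kind: "confirming" } : state;
}

// Step 2 of the override: only an armed override can be confirmed.
function confirmOverride(state: GateState): GateState {
  return state.kind === "confirming" ? { kind: "overridden" } : state;
}

const canSend = (state: GateState): boolean =>
  state.kind === "clear" || state.kind === "overridden";
```

Because `canSend` returns true only for `clear` or `overridden`, a single click on the override control still leaves the send disabled; the block is lifted only by the explicit second step.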
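One way the image path could map detections back to pixels is to rebuild the OCR text from word boxes, find every word that overlaps a detected span, and paint a filled rectangle over it. The sketch assumes each OCR word carries a bounding box with `x0, y0, x1, y1` corners, similar to the word boxes Tesseract.js can report (the exact result shape depends on the Tesseract.js version and output options). `layoutWords`, `redactionRects`, `paintRedactions`, and the `FillSurface` interface are hypothetical; a real `CanvasRenderingContext2D` provides the `fillStyle` and `fillRect` members used here.

```typescript
// Assumed OCR word shape: recognized text plus its pixel bounding box.
interface OcrWord {
  text: string;
  bbox: { x0: number; y0: number; x1: number; y1: number };
}

// A detected span as character offsets into the text built by layoutWords.
interface Span {
  start: number; // inclusive
  end: number; // exclusive
}

interface Rect { x: number; y: number; w: number; h: number }

// The slice of CanvasRenderingContext2D that painting needs.
interface FillSurface {
  fillStyle: unknown;
  fillRect(x: number, y: number, w: number, h: number): void;
}

// Join words with single spaces and record where each word starts.
// The scanners must run over this same string so offsets line up.
function layoutWords(words: OcrWord[]): { text: string; offsets: number[] } {
  const offsets: number[] = [];
  let text = "";
  for (const word of words) {
    if (text.length > 0) text += " ";
    offsets.push(text.length);
    text += word.text;
  }
  return { text, offsets };
}

// Every word whose characters overlap any detected span becomes one box.
function redactionRects(words: OcrWord[], spans: Span[]): Rect[] {
  const { offsets } = layoutWords(words);
  const rects: Rect[] = [];
  words.forEach((word, i) => {
    const wordStart = offsets[i];
    const wordEnd = wordStart + word.text.length;
    if (spans.some((s) => s.start < wordEnd && wordStart < s.end)) {
      const { x0, y0, x1, y1 } = word.bbox;
      rects.push({ x: x0, y: y0, w: x1 - x0, h: y1 - y0 });
    }
  });
  return rects;
}

// Black out each box on the target surface.
function paintRedactions(surface: FillSurface, rects: Rect[]): void {
  surface.fillStyle = "#000";
  for (const r of rects) surface.fillRect(r.x, r.y, r.w, r.h);
}
```

Painting onto a copy of the image rather than the original canvas keeps the unredacted version available for comparison.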

Key decisions

Detection runs locally rather than in the cloud, so the privacy guarantee comes from the architecture instead of a written policy. No content reaches a server before you approve it.
The gate blocks the send instead of warning about it, because you cannot recall a message once an AI service has it. The safe default is to hold the message until you decide.
The model is a sparse mixture of experts with 1.5B parameters in total but only about 50M active per token, because each token is routed to the top 4 of its 128 experts. That keeps per-token compute low enough for semantic detection to run in the browser without a GPU.
Files the scanner cannot read are blocked by default, so personal data cannot slip through in an unsupported format.
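A generic sketch of top-k routing in a sparse mixture-of-experts layer, the mechanism behind the top-4-of-128 routing described for the model. For each token, a router scores every expert, only the four highest-scoring experts run, and their outputs are mixed with softmax weights computed over those four. Running 4 of 128 experts touches about 3% of the expert weights per token, roughly matching the ratio of about 50M active to 1.5B total parameters. Renormalizing the softmax over only the selected experts is a common choice, but whether openai/privacy-filter does exactly this is an assumption; `topK`, `gateWeights`, and `moeForward` are hypothetical names, and experts are plain functions here rather than real network layers.

```typescript
type Vec = number[];
type Expert = (x: Vec) => Vec; // assumed to return a vector the same length as x

// Indices of the k largest router logits.
function topK(logits: Vec, k: number): number[] {
  return logits
    .map((value, index) => ({ value, index }))
    .sort((a, b) => b.value - a.value)
    .slice(0, k)
    .map((e) => e.index);
}

// Softmax over the selected logits only, so the k gate weights sum to 1.
// Subtracting the max keeps Math.exp from overflowing.
function gateWeights(logits: Vec, chosen: number[]): Vec {
  const max = Math.max(...chosen.map((i) => logits[i]));
  const exps = chosen.map((i) => Math.exp(logits[i] - max));
  const total = exps.reduce((sum, v) => sum + v, 0);
  return exps.map((v) => v / total);
}

// Only the k chosen experts are evaluated; the rest are skipped for this token.
function moeForward(x: Vec, routerLogits: Vec, experts: Expert[], k = 4): Vec {
  const chosen = topK(routerLogits, k);
  const weights = gateWeights(routerLogits, chosen);
  const out = new Array<number>(x.length).fill(0);
  chosen.forEach((expertIndex, j) => {
    const y = experts[expertIndex](x);
    for (let d = 0; d < out.length; d++) out[d] += weights[j] * y[d];
  });
  return out;
}
```

The saving is in compute, not storage: all 128 experts still have to be loaded, but each token pays for only four of them.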
Built for the ethics theme of the Kiro Spark Challenge and released under the MIT license.