# Browser Automation via Mobile Phone

## Overview

Bridge is a system for programmatic browser automation that uses a real mobile browser (Firefox Android) as its execution engine. A Python server sends commands to a browser extension over WebSocket, enabling scraping and interaction with websites that rely on JavaScript rendering, SPAs, cookie state, or anti-bot detection, all through an actual browser session on a physical device.

Unlike headless browser tools (Puppeteer, Playwright), Bridge operates on a real phone with a real browser fingerprint, making it suitable for sites that block automated browsers.

## Architecture

```
Python scripts (CLI)
        |
        | (Unix socket or in-process)
        v
Python WebSocket server
        |
        | WSS (TLS via reverse proxy)
        v
Firefox extension (phone browser)
        |
        | tabs.executeScript / content script
        v
Target website DOM
```

### Components

1. **Firefox Extension**: Runs on Firefox Android (Nightly). Connects to the server via WebSocket, authenticates with a shared token, and executes commands (navigate, run JS, click, fill, wait, screenshot, etc.) against the active browser tab.
2. **WebSocket Server**: Python asyncio server using the `websockets` library. Accepts one extension connection, authenticates it, and exposes a command interface. Also listens on a Unix domain socket for local CLI commands.
3. **Python Client SDK** (`BridgeClient`): Thin async wrapper around the server's command interface. Used by site-specific automation scripts. Can connect in-process (for long-running orchestration) or via the Unix control socket (for one-off CLI commands).
4. **Site Modules**: Per-website automation scripts that combine navigation, JS extraction, and result formatting. Each module is a self-contained CLI tool.
5. **Reverse Proxy**: Nginx terminates TLS (via Let's Encrypt) and proxies `wss://` to the local WebSocket server. This allows the phone extension to connect securely over the internet.

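The WebSocket server's role (authenticate the extension, then expose a command interface) can be sketched without any network code. The following is a minimal illustration, not Bridge's actual implementation: the class name `CommandRouter`, its methods, the `send` callable, and the 30-second default timeout are all hypothetical. It assumes JSON messages in which the extension first sends `{ type: "auth", token }`, each command carries an `id`, and each result echoes that `id` together with a `success` flag.

```python
import asyncio
import hmac
import itertools
import json


class CommandRouter:
    """Hypothetical helper: pairs command ids with results for one extension."""

    def __init__(self, send, token):
        self._send = send                # async callable that writes one text frame
        self._token = token              # shared secret expected from the extension
        self._ids = itertools.count(1)   # monotonically increasing command ids
        self._pending = {}               # command id -> Future awaiting its result
        self.authenticated = False

    def handle_auth(self, raw):
        """Accept the connection only if the auth message carries the right token."""
        msg = json.loads(raw)
        supplied = str(msg.get("token", "")).encode()
        # compare_digest avoids leaking the token through timing differences.
        ok = msg.get("type") == "auth" and hmac.compare_digest(
            supplied, self._token.encode()
        )
        self.authenticated = ok
        return ok

    async def send_command(self, command, params=None, timeout=30.0):
        """Send one command and wait for the result with the matching id."""
        if not self.authenticated:
            raise RuntimeError("extension is not authenticated")
        cmd_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._pending[cmd_id] = future
        try:
            await self._send(json.dumps({
                "type": "command",
                "id": cmd_id,
                "command": command,
                "params": params or {},
            }))
            result = await asyncio.wait_for(future, timeout)
        finally:
            # Always forget the id, whether the command succeeded or timed out.
            self._pending.pop(cmd_id, None)
        if not result.get("success"):
            raise RuntimeError(result.get("error", "command failed"))
        return result.get("data")

    def handle_message(self, raw):
        """Resolve the pending future that matches an incoming result message."""
        msg = json.loads(raw)
        future = self._pending.get(msg.get("id"))
        if msg.get("type") == "result" and future is not None and not future.done():
            future.set_result(msg)
```

In a real server, `send` could be the `send` coroutine of a `websockets` connection, with the first incoming frame passed to `handle_auth` and later frames to `handle_message`. Keeping the transport behind a plain callable means the id-matching logic can be exercised without a phone attached.
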

---

## Extension

### Manifest (v2, Firefox)

```json
{
  "manifest_version": 2,
  "name": "Bridge",
  "permissions": ["activeTab", "tabs", "<all_urls>", "cookies", "webNavigation", "storage"],
  "background": { "scripts": ["background.js"], "persistent": true },
  "content_scripts": [{
    "matches": ["<all_urls>"],
    "js": ["content.js"],
    "run_at": "document_idle"
  }],
  "browser_specific_settings": {
    "gecko": { "id": "bridge@local" }
  }
}
```

The extension requires Manifest V2 because Firefox Android Nightly supports sideloading `.xpi` files signed via AMO (addons.mozilla.org) as unlisted add-ons.

### Background Script

The background script is the core of the extension. It:

1. **Loads a stored auth token** from `browser.storage.local`.
2. **Connects to the WebSocket server** and sends `{ type: "auth", token: "..." }`.
3. **Receives commands** as `{ type: "command", id, command, params }` messages.
4. **Dispatches** each command to a handler function.
5. **Returns results** as `{ type: "result", id, success, data?, error? }`.
6. **Emits events** (e.g. `pageLoaded`) when navigation completes.
7. **Auto-reconnects** with exponential backoff and jitter on disconnection.
8. **Sends heartbeat pings** every 25 seconds to keep the connection alive.

#### Supported Commands

| Command | Params | Description |
|---|---|---|
| `getPageInfo` | (none) | Returns `{ url, title }` of the active tab |
| `navigate` | `{ url }` | Navigates the active tab to a URL |
| `executeJs` | `{ code, context? }` | Executes JavaScript in the active tab. `context: "page"` runs in the page's own JS context (needed for accessing page-scope variables); the default, `"content"`, runs via `tabs.executeScript` |
| `getHtml` | `{ selector? }` | Returns `outerHTML` of a selector match, or the full document |
| `click` | `{ selector }` | Clicks the first element matching a CSS selector |
| `fill` | `{ selector, value }` | Sets a form field's value and dispatches `input`/`change` events |
| `scroll` | `{ y?, selector? }` | Scrolls by `y` pixels, or scrolls an element into view |
| `waitFor` | `{ selector, timeout? }` | Waits for a CSS selector to appear (MutationObserver-based), default 10s timeout |
| `screenshot` | (none) | Returns a `data:image/png;base64,...` screenshot of the visible tab |
| `getCookies` | `{ domain? }` | Returns cookies for a domain or the current page |

#### Token Prompt

If no token is stored, the extension injects a full-screen overlay into the current page (via the content script) prompting the user to enter the token phrase. This is more reliable on Firefox Android than using extension popups. The token is persisted in `browser.storage.local`.

#### Content Script

The content script serves two purposes:

1. **Page-context JS execution**: When `executeJs` is called with `context: "page"`, the content script injects a `