# Browser Automation via Mobile Phone
## Overview
Bridge is a system for programmatic browser automation that uses a real mobile browser (Firefox Android) as its execution engine. A Python server sends commands to a browser extension over WebSocket, enabling scraping and interaction with websites that rely on JavaScript rendering, SPAs, cookie state, or anti-bot detection — all through an actual browser session on a physical device.
Unlike browser automation frameworks such as Puppeteer or Playwright, which drive an instrumented (often headless) desktop browser, Bridge operates on a real phone with a real browser fingerprint, making it suitable for sites that block automated browsers.
## Architecture
```
Python scripts (CLI)
|
| (Unix socket or in-process)
v
Python WebSocket server
|
| WSS (TLS via reverse proxy)
v
Firefox extension (phone browser)
|
| tabs.executeScript / content script
v
Target website DOM
```
### Components
1. **Firefox Extension** — Runs on Firefox Android (Nightly). Connects to the server via WebSocket, authenticates with a shared token, and executes commands (navigate, run JS, click, fill, wait, screenshot, etc.) against the active browser tab.
2. **WebSocket Server** — Python asyncio server using the `websockets` library. Accepts one extension connection, authenticates it, and exposes a command interface. Also listens on a Unix domain socket for local CLI commands.
3. **Python Client SDK** (`BridgeClient`) — Thin async wrapper around the server's command interface. Used by site-specific automation scripts. Can connect in-process (for long-running orchestration) or via the Unix control socket (for one-off CLI commands).
4. **Site Modules** — Per-website automation scripts that combine navigation, JS extraction, and result formatting. Each module is a self-contained CLI tool.
5. **Reverse Proxy** — Nginx terminates TLS (via Let's Encrypt) and proxies `wss://` to the local WebSocket server. This allows the phone extension to connect securely over the internet.
---
## Extension
### Manifest (v2, Firefox)
```json
{
  "manifest_version": 2,
  "name": "Bridge",
  "permissions": ["activeTab", "tabs", "<all_urls>", "cookies", "webNavigation", "storage"],
  "background": { "scripts": ["background.js"], "persistent": true },
  "content_scripts": [{ "matches": ["<all_urls>"], "js": ["content.js"], "run_at": "document_idle" }],
  "browser_specific_settings": { "gecko": { "id": "bridge@local" } }
}
```
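The extension targets Manifest V2, which provides the persistent background page and `tabs.executeScript` that Bridge relies on. It is packaged as an `.xpi` signed via AMO (addons.mozilla.org) as an unlisted add-on, which Firefox Android Nightly can sideload.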
### Background Script
The background script is the core of the extension. It:
1. **Loads a stored auth token** from `browser.storage.local`.
2. **Connects to the WebSocket server** and sends `{ type: "auth", token: "..." }`.
3. **Receives commands** as `{ type: "command", id, command, params }` messages.
4. **Dispatches** each command to a handler function.
5. **Returns results** as `{ type: "result", id, success, data?, error? }`.
6. **Emits events** (e.g. `pageLoaded`) when navigation completes.
7. **Auto-reconnects** with exponential backoff + jitter on disconnection.
8. **Sends heartbeat pings** every 25 seconds to keep the connection alive.
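The reconnect schedule in step 7 lives in the extension's JavaScript, but the arithmetic can be sketched in Python. This is a minimal illustration of capped exponential backoff with full jitter; the function name `reconnect_delay` and the `base` and `cap` values are assumptions, not the extension's actual settings.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before reconnect attempt `attempt` (0-based)."""
    # Clamp the exponent so very long outages cannot overflow a float.
    exponent = min(attempt, 32)
    # Exponential growth, capped so waits never exceed `cap` seconds.
    ceiling = min(cap, base * (2 ** exponent))
    # Full jitter: a random point in [0, ceiling) spreads out reconnects.
    return ceiling * rng()


# With the random factor pinned to 0.5 the schedule is easy to inspect.
delays = [round(reconnect_delay(n, rng=lambda: 0.5), 2) for n in range(8)]
print(delays)
```

Resetting `attempt` to 0 after a successful authentication keeps a brief network drop from inheriting a long delay.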
#### Supported Commands
| Command | Params | Description |
|---|---|---|
| `getPageInfo` | — | Returns `{ url, title }` of the active tab |
| `navigate` | `{ url }` | Navigates the active tab to a URL |
| `executeJs` | `{ code, context? }` | Executes JavaScript in the active tab. `context: "page"` runs in the page's own JS context (needed for accessing page-scope variables); default `"content"` runs via `tabs.executeScript` |
| `getHtml` | `{ selector? }` | Returns `outerHTML` of a selector match, or the full document |
| `click` | `{ selector }` | Clicks the first element matching a CSS selector |
| `fill` | `{ selector, value }` | Sets a form field's value and dispatches `input`/`change` events |
| `scroll` | `{ y?, selector? }` | Scrolls by `y` pixels, or scrolls an element into view |
| `waitFor` | `{ selector, timeout? }` | Waits for a CSS selector to appear (MutationObserver-based), default 10s timeout |
| `screenshot` | — | Returns a `data:image/png;base64,...` screenshot of the visible tab |
| `getCookies` | `{ domain? }` | Returns cookies for a domain or the current page |
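Since `screenshot` returns a data URL rather than raw bytes, the caller usually has to decode it before saving. The sketch below shows one way to do that on the Python side; `decode_png_data_url` is a hypothetical helper, not part of the SDK described here.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_png_data_url(data_url):
    """Return the raw PNG bytes from a `data:image/png;base64,...` URL."""
    prefix = "data:image/png;base64,"
    if not data_url.startswith(prefix):
        raise ValueError("expected a base64 PNG data URL")
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(data_url[len(prefix):], validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("payload is not a PNG image")
    return raw
```

The returned bytes can be written directly to a file opened in binary mode, for example `open("shot.png", "wb").write(png_bytes)`.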
#### Token Prompt
If no token is stored, the extension injects a full-screen overlay into the current page (via the content script) prompting the user to enter the token phrase. This is more reliable on Firefox Android than using extension popups. The token is persisted in `browser.storage.local`.
#### Content Script
The content script serves two purposes:
1. **Page-context JS execution** — When `executeJs` is called with `context: "page"`, the content script injects a `<script>` element into the page and communicates results back via `window.postMessage`. This allows access to the page's own JavaScript scope (e.g., `__NEXT_DATA__`, Angular services).
2. **Token prompt overlay** — Renders and manages the token input UI.
#### Firefox Android Compatibility
- `getActiveTab()` uses 3 fallback strategies because `browser.tabs.query({ currentWindow: true })` is unreliable on Android.
- Auth rejection (WebSocket close code 4001) clears the stored token and re-prompts.
- All `ws.send()` calls are wrapped in try/catch.
### Building and Signing
The extension must be signed via the AMO API for Firefox Android to accept it:
```bash
web-ext sign \
--source-dir=extension \
--api-key="$AMO_API_KEY" \
--api-secret="$AMO_API_SECRET" \
--channel=unlisted \
--artifacts-dir=dist
```
Install on the phone: download the signed `.xpi`, enable the debug menu in Firefox Nightly (Settings > About > tap the logo 5x), then go to Settings > Advanced > Install add-on from file.
---
## Server
### WebSocket Server (`ws_server.py`)
A single-file asyncio server with two interfaces:
**WebSocket interface** (for the extension):
- Listens on `127.0.0.1:8767` (behind the reverse proxy).
- First message must be `{ type: "auth", token }` matching the `BRIDGE_TOKEN` environment variable.
- On auth failure, closes with code 4001.
- Uses `websockets` library with `ping_interval=10, ping_timeout=10` for dead connection detection.
- Tracks pending commands as `{ id: Future }` — each `send_command()` creates a Future resolved when the extension sends back a matching result.
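The pending-command bookkeeping can be sketched as follows. This is an illustration of the `{ id: Future }` idea, not the contents of `ws_server.py`: the class name `CommandChannel`, the injected `send` callable, and the use of `RuntimeError` for failed commands are all assumptions.

```python
import asyncio
import json
import uuid


class CommandChannel:
    """Hypothetical helper that matches results to commands by id."""

    def __init__(self, send):
        self._send = send    # async callable that transmits one text frame
        self._pending = {}   # command id -> Future awaiting the result

    async def send_command(self, command, params=None, timeout=30.0):
        cmd_id = str(uuid.uuid4())
        future = asyncio.get_running_loop().create_future()
        self._pending[cmd_id] = future
        try:
            await self._send(json.dumps({
                "type": "command", "id": cmd_id,
                "command": command, "params": params or {},
            }))
            # Resolved by handle_message() when the matching result arrives.
            return await asyncio.wait_for(future, timeout)
        finally:
            # Always drop the entry, including on timeout or cancellation.
            self._pending.pop(cmd_id, None)

    def handle_message(self, raw):
        msg = json.loads(raw)
        if msg.get("type") != "result":
            return
        future = self._pending.get(msg.get("id"))
        if future is None or future.done():
            return  # unknown id, or the caller already timed out
        if msg.get("success"):
            future.set_result(msg.get("data"))
        else:
            future.set_exception(RuntimeError(msg.get("error", "command failed")))
```

Removing the entry in `finally` means a late result for a timed-out command is ignored instead of leaking a Future.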
**Unix socket interface** (for local CLI):
- Listens at `/tmp/bridge-control.sock` (mode 0600).
- Accepts JSON messages: `{ command, params, timeout }`.
- Forwards to the extension and returns the result.
- Enables one-off commands without embedding the server in each script.
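A one-off client for the control socket might look like the sketch below. The message framing is not specified above, so this assumes one newline-terminated JSON object per request and per response; the function name `send_control_command` and the extra five seconds of slack on the read are also assumptions.

```python
import asyncio
import json


async def send_control_command(command, params=None, timeout=30,
                               path="/tmp/bridge-control.sock"):
    """Send one command to the running server and return its JSON reply."""
    # Raise the stream limit: screenshot data URLs exceed the 64 KiB default.
    reader, writer = await asyncio.open_unix_connection(path, limit=16 * 1024 * 1024)
    try:
        request = {"command": command, "params": params or {}, "timeout": timeout}
        # Assumed framing: one JSON document per line.
        writer.write(json.dumps(request).encode("utf-8") + b"\n")
        await writer.drain()
        # Wait slightly longer than the server-side timeout for the reply.
        line = await asyncio.wait_for(reader.readline(), timeout + 5)
        if not line:
            raise ConnectionError("control socket closed without a response")
        return json.loads(line)
    finally:
        writer.close()
        await writer.wait_closed()
```

Called as `asyncio.run(send_control_command("getPageInfo"))`, this mirrors what `python -m server cmd getPageInfo` does from the shell.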
### CLI Entry Point (`__main__.py`)
```bash
# Start the server (blocks, waits for extension)
python -m server
# Send a one-off command to the running server
python -m server cmd getPageInfo
python -m server cmd executeJs 'document.title'
python -m server cmd navigate 'https://example.com'
```
### Deployment
The server runs as a systemd service on any Linux machine reachable from the internet (a cloud server, home server, etc.):
**systemd unit:**
```ini
[Unit]
Description=Bridge WebSocket Server
After=network.target

[Service]
Type=simple
WorkingDirectory=/opt/bridge
EnvironmentFile=/opt/bridge/.env
ExecStart=/opt/bridge/.venv/bin/python -m server
Restart=always
RestartSec=3

[Install]
# Lets `systemctl enable bridge` start the service at boot
WantedBy=multi-user.target
**Environment file** (`.env`):
```
BRIDGE_TOKEN=your shared secret phrase
```
**Nginx reverse proxy:**
```nginx
location / {
    proxy_pass http://127.0.0.1:8767;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 86400s;
    proxy_send_timeout 86400s;
}
```
TLS is provided by Let's Encrypt / certbot with the nginx plugin.
---
## Python Client SDK
### `BridgeClient`
```python
import asyncio

from bridge.client import BridgeClient


async def main():
    # Connect via the Unix control socket (talks to the running server)
    client = BridgeClient.connect()

    info = await client.get_page_info()  # -> { url, title }
    await client.navigate("https://example.com")
    result = await client.execute_js("document.title")
    await client.click("#submit-button")
    await client.fill("#email", "user@example.com")
    await client.wait_for(".results-loaded", timeout=15000)
    html = await client.get_html(".product-card")
    await client.scroll(y=500)
    screenshot_data_url = await client.screenshot()
    cookies = await client.get_cookies(domain=".example.com")


# The client methods are coroutines, so they must run inside an event loop
asyncio.run(main())
```
### `run_js_file()`
For complex extraction logic, JavaScript is stored in separate `.js` files and loaded at runtime:
```python
result = await client.run_js_file("sites/mysite/js/extract_data.js")
```
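Keyword arguments are substituted into the file with `str.format()`, so the script refers to them as `{query}` and any literal braces in the JavaScript must be doubled (`{{` and `}}`):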
```python
result = await client.run_js_file("sites/mysite/js/search.js", query="coffee shops")
```
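Because `str.format()` treats every brace as a placeholder delimiter, templated scripts need some care. The sketch below shows the substitution step in isolation. `render_js` is a hypothetical stand-in, and quoting values with `json.dumps` is an assumption about how one might avoid broken or injected JavaScript, not a description of what `run_js_file()` does internally.

```python
import json


def render_js(template, **params):
    """Fill `{name}` placeholders with JavaScript literals."""
    # json.dumps produces a literal that is also valid JavaScript, so quotes
    # and backslashes in the value cannot break out of the string.
    quoted = {name: json.dumps(value) for name, value in params.items()}
    return template.format(**quoted)


# Literal JS braces are doubled so that str.format leaves them alone.
template = (
    "var q = {query}; "
    'JSON.stringify({{ query: q, hits: document.querySelectorAll("a").length }});'
)
rendered = render_js(template, query='coffee "shops"')
print(rendered)
```

If values are inserted without quoting, the `.js` file has to supply its own quotes around each placeholder and the caller must ensure the value contains none.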
### Optional Command Logging
`BridgeClient` accepts a `run_id` for logging all commands to a SQLite database (command name, params, result, duration, errors). Useful for debugging and replay analysis.
---
## Site Modules
### Structure
Each site module lives under `sites/<sitename>/` with the following convention:
```
sites/
  mysite/
    __init__.py
    search.py            # Main CLI script
    detail.py            # Optional detail extraction
    js/
      extract_search.js  # JS executed in the browser to extract data
      extract_detail.js
      accept_cookies.js
    MYSITE.md            # Documentation: DOM structure, selectors, quirks
```
### Pattern
Every site module follows the same pattern:
1. **Navigate** to the target URL (often with SPA cache-busting: navigate to `about:blank` first).
2. **Wait** for the page to load (`asyncio.sleep` or `client.wait_for`).
3. **Dismiss overlays** (cookie banners, login prompts).
4. **Execute JS** to extract structured data from the DOM.
5. **Return JSON** from the JS to Python.
6. **Pretty-print** results for CLI output.
7. **Optionally drill down** into detail views.
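Steps 1 to 6 can be sketched as a small skeleton. It assumes a client exposing the `navigate`, `wait_for`, and `run_js_file` methods described in the SDK section; the function names, the `ready_selector` default, and the `name`/`price` result fields are illustrative assumptions.

```python
import asyncio
import json


async def search(client, url, js_path, ready_selector=".results-loaded"):
    """Navigate, wait, dismiss overlays, extract, and return parsed JSON."""
    await client.navigate("about:blank")       # step 1: SPA cache-busting
    await client.navigate(url)
    await client.wait_for(ready_selector, timeout=15000)           # step 2
    await client.run_js_file("sites/mysite/js/accept_cookies.js")  # step 3
    raw = await client.run_js_file(js_path)    # step 4: extraction script
    return json.loads(raw)                     # step 5: JSON string -> dict


def format_results(payload):
    """Step 6: one numbered line per result for CLI output."""
    return [
        f"{index}. {item['name']}  {item['price']}"
        for index, item in enumerate(payload["results"])
    ]
```

Keeping the formatter separate from the extraction makes it straightforward to add a `--detail N` branch that reuses the parsed payload.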
### JS Extraction Pattern
Extraction scripts are plain JavaScript that runs in the browser tab. They query the DOM, build a data structure, and return it as a JSON string:
```javascript
// sites/mysite/js/extract_search.js
var items = document.querySelectorAll(".product-card");
var results = [];
items.forEach(function(item) {
  results.push({
    name: item.querySelector(".title").textContent.trim(),
    price: item.querySelector(".price").textContent.trim(),
    url: item.querySelector("a").href
  });
});
JSON.stringify({ count: results.length, results: results });
```
The last expression in the script is the return value. `JSON.stringify()` is used because `tabs.executeScript` serializes return values, and complex DOM-derived objects may not survive serialization.
### CLI Interface Pattern
Each module is invoked as a Python module with consistent argument conventions:
```bash
# Basic search
python -m sites.mysite.search "query term"
# Drill into a specific result (by 0-based index)
python -m sites.mysite.search "query term" --detail 0
# Pagination
python -m sites.mysite.search "query term" --page 2
# Site-specific flags
python -m sites.mysite.search "query term" --direct --class business
```
The `--detail N` pattern is universal: search first, then drill into result N for more information.
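A module's argument handling for these conventions could be set up with `argparse` roughly as follows. This is a sketch of the shared flags only; `build_parser` and the help strings are assumptions, and site-specific flags such as `--direct` would be added per module.

```python
import argparse


def build_parser():
    """Parser for the flags shared by every site module."""
    parser = argparse.ArgumentParser(prog="python -m sites.mysite.search")
    parser.add_argument("query", help="search term")
    parser.add_argument("--detail", type=int, metavar="N", default=None,
                        help="drill into result N (0-based index)")
    parser.add_argument("--page", type=int, default=1,
                        help="results page to fetch")
    return parser


args = build_parser().parse_args(["query term", "--detail", "0"])
print(args.query, args.detail, args.page)
```

Leaving `--detail` as `None` by default lets the script distinguish "no drill-down requested" from "drill into result 0".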
### Example: Library Catalog Search
```bash
# Search for books
python -m sites.library.search "borges"
# 0. Collected fictions Jorge Luis Borges Book [available]
# 1. Labyrinths Jorge Luis Borges Book [available]
# 2. The Aleph and other stories Jorge Luis Borges Book
# Show detail for first result (copies, availability, ISBN)
python -m sites.library.search "borges" --detail 0
# Page 2 of results
python -m sites.library.search "borges" --page 2
# Show 50 results per page
python -m sites.library.search "borges" --per-page 50
```
### Example: Flight Aggregator
```bash
# One-way flight search
python -m sites.flights.search AMS BKK 2026-03-17
# Return trip
python -m sites.flights.search AMS BKK 2026-03-17 --return 2026-03-24
# Direct flights only, business class
python -m sites.flights.search AMS BKK 2026-03-17 --direct --class business
# Show booking providers for flight #0
python -m sites.flights.search AMS BKK 2026-03-17 --detail 0
# Load more results
python -m sites.flights.search AMS BKK 2026-03-17 --more
```
### Example: Online Grocery Store (Search + Cart)
```bash
# Search products
python -m sites.grocery.search "milk"
# Add first result to cart
python -m sites.grocery.search "milk" --add 0
# Add 3 of something
python -m sites.grocery.search "milk" --add 0 -q 3
# Remove from cart (set quantity to 0)
python -m sites.grocery.search "milk" --add 0 -q 0
# Show product detail
python -m sites.grocery.search "milk" --detail 0
```
### Example: Google Maps Business Search
```bash
# Search for businesses, visits each result to extract full details
python -m sites.gmaps.search "coffee shops amsterdam"
# Outputs: name, website, phone, address, rating for each business
# Saves full results to data/gmaps_coffee_shops_amsterdam.json
```
---
## SPA and Anti-Detection Patterns
### SPA Cache Busting
Single-page applications cache state between navigations. To force a fresh page load:
```python
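import asyncio

# `client` and `target_url` come from the surrounding site module
info = await client.get_page_info()
if 'targetsite.com' in info.get('url', ''):
    # Already on the site: leave it first so the SPA cannot reuse cached state
    await client.navigate("about:blank")
    await asyncio.sleep(1)
await client.navigate(target_url)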
```
### Cookie Banner Dismissal
Most sites show a cookie consent banner on first visit. Each module handles this with a small JS snippet:
```javascript
var btn = document.getElementById("cookie-accept");
if (btn) { btn.click(); "accepted"; } else { "no_banner"; }
```
### Angular / React Input Filling
Frameworks that use virtual DOMs or change detection often don't respond to `element.value = x`. Workarounds:
- **Angular (contenteditable):** Use `document.execCommand("insertText")` + dispatch `InputEvent`.
- **React:** Set the value, then dispatch `input` and `change` events with `{ bubbles: true }`.
- **Clear first:** Use Selection API + `execCommand("delete")` rather than setting `textContent = ""`.
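For the React case, the JavaScript can be generated on the Python side and passed to `execute_js`. The helper below is a hypothetical sketch: it sets the value and dispatches bubbling `input` and `change` events as described above, and whether that is sufficient depends on the site and framework version.

```python
import json


def build_fill_js(selector, value):
    """Return JS that fills an input and fires bubbling input/change events."""
    # json.dumps turns the Python strings into safely quoted JS literals.
    return (
        "(function () {"
        f" var el = document.querySelector({json.dumps(selector)});"
        " if (!el) { return 'not_found'; }"
        f" el.value = {json.dumps(value)};"
        " el.dispatchEvent(new Event('input', { bubbles: true }));"
        " el.dispatchEvent(new Event('change', { bubbles: true }));"
        " return 'filled';"
        " })();"
    )


js = build_fill_js("#email", "user@example.com")
print(js)
```

The generated snippet evaluates to `'filled'` or `'not_found'`, so the caller can tell a missing element apart from a successful fill.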
### Avoiding Direct Navigation
Some sites return "Access Denied" when navigating directly to product URLs. The workaround is to search first, then click the product link in the search results page:
```python
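import json

# Don't do this: direct navigation triggers bot detection.
# await client.navigate("https://store.com/product/12345")

# Do this instead: click through from the search results page.
# `href` is the product link taken from the extracted search results.
selector = f'a[href="{href}"]'
await client.execute_js(f"document.querySelector({json.dumps(selector)}).click()")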
```
### Extracting Data from Hidden Elements
Some SPAs render detail panels off-screen or with `visibility: hidden`. Use `textContent` instead of `innerText` (which respects CSS visibility):
```javascript
var panel = document.querySelector(".detail-panel");
// innerText skips text hidden with visibility: hidden, so it can come back empty
// textContent returns the full text regardless of CSS visibility
var data = panel.textContent;
```
---
## Database
An optional SQLite database (`data/bridge.db`) logs automation runs:
**Schema:**
- `runs` — Tracks each automation invocation (recipe name, start/end time, status, error).
- `results` — Stores extracted data per URL per run.
- `command_log` — Every command sent to the extension (command name, params, result, duration_ms, error).
This is useful for debugging failed extractions and measuring performance.
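One possible shape for two of these tables is sketched below with the standard `sqlite3` module. Only the column meanings come from the description above; the exact column names, types, and the `log_command` helper are assumptions, and the `results` table is omitted for brevity.

```python
import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id INTEGER PRIMARY KEY,
    recipe TEXT NOT NULL,
    started_at REAL NOT NULL,
    ended_at REAL,
    status TEXT,
    error TEXT
);
CREATE TABLE IF NOT EXISTS command_log (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES runs(id),
    command TEXT NOT NULL,
    params TEXT,
    result TEXT,
    duration_ms REAL,
    error TEXT
);
"""


def log_command(db, run_id, command, params, result, duration_ms, error=None):
    """Append one command to the log; params and result are stored as JSON."""
    db.execute(
        "INSERT INTO command_log (run_id, command, params, result, duration_ms, error)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (run_id, command, json.dumps(params), json.dumps(result), duration_ms, error),
    )


db = sqlite3.connect(":memory:")  # the real file would be data/bridge.db
db.executescript(SCHEMA)
run_id = db.execute(
    "INSERT INTO runs (recipe, started_at, status) VALUES (?, ?, ?)",
    ("mysite.search", time.time(), "running"),
).lastrowid
log_command(db, run_id, "navigate", {"url": "https://example.com"}, None, 812.5)
log_command(db, run_id, "executeJs", {"code": "document.title"}, "Example", 40.0)
slowest = db.execute(
    "SELECT command, duration_ms FROM command_log ORDER BY duration_ms DESC LIMIT 1"
).fetchone()
print(slowest)
```

A query like the last one is the kind of performance check the log makes possible: it surfaces the slowest command of a run.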
---
## Setup Checklist
### Server
1. Have a server reachable from the internet (cloud instance, home server, etc.) with a public IP and domain name.
2. Install Python 3.9+, nginx, certbot.
3. Create a Python venv and install `websockets`.
4. Set `BRIDGE_TOKEN` in a `.env` file.
5. Deploy the server code, nginx config, and systemd unit.
6. Obtain a TLS certificate with certbot.
7. Start the service: `systemctl start bridge`.
### Extension
1. Set the `WS_URL` constant in `background.js` to your `wss://` server URL.
2. Sign the extension via the AMO API (`web-ext sign`).
3. Install Firefox Nightly on an Android phone.
4. Enable the debug menu (Settings > About > tap logo 5x).
5. Install the `.xpi` via Settings > Advanced > Install add-on from file.
6. Visit any webpage — the token prompt appears.
7. Enter the same token phrase configured on the server.
### Adding a New Site Module
1. Create `sites/<sitename>/` with `__init__.py` and `search.py`.
2. Create `sites/<sitename>/js/` with extraction scripts.
3. Write a documentation file `sites/<sitename>/SITENAME.md` with:
- DOM structure and CSS selectors used.
- Known quirks and limitations.
- SPA behavior notes.
4. Follow the standard pattern: navigate, wait, dismiss overlays, extract, format.
5. Use `BridgeClient.connect()` for the client.
6. Make it runnable as `python -m sites.<sitename>.search "query"`.
---
## Protocol Reference
### WebSocket Messages (Server <-> Extension)
**Extension -> Server:**
```json
{ "type": "auth", "token": "shared secret" }
{ "type": "pong", "id": "uuid" }
{ "type": "result", "id": "uuid", "success": true, "data": ... }
{ "type": "result", "id": "uuid", "success": false, "error": "message" }
{ "type": "event", "id": "uuid", "event": "pageLoaded", "data": { "url": "...", "title": "..." } }
{ "type": "ping", "id": "uuid" }
```
**Server -> Extension:**
```json
{ "type": "auth_result", "success": true }
{ "type": "auth_result", "success": false }
{ "type": "command", "id": "uuid", "command": "executeJs", "params": { "code": "..." } }
{ "type": "ping", "id": "uuid" }
{ "type": "pong", "id": "uuid" }
```
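Validating the first `auth` message on the server side could look like the sketch below. The function name `check_auth` is hypothetical, and the constant-time comparison via `hmac.compare_digest` is a suggested precaution, not something the server is documented to do.

```python
import hmac
import json

AUTH_FAILED_CLOSE_CODE = 4001  # close code used when authentication fails


def check_auth(raw_message, expected_token):
    """Return True only for a well-formed auth message with the right token."""
    try:
        msg = json.loads(raw_message)
    except (TypeError, ValueError):
        return False  # not JSON at all
    if not isinstance(msg, dict) or msg.get("type") != "auth":
        return False
    token = msg.get("token")
    if not isinstance(token, str) or not expected_token:
        return False  # missing token, or the server has no token configured
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.encode("utf-8"), expected_token.encode("utf-8"))
```

When this returns `False`, the server would close the connection with `AUTH_FAILED_CLOSE_CODE`, which prompts the extension to clear its stored token.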
### Control Socket Messages (CLI -> Server)
**Request:**
```json
{ "command": "executeJs", "params": { "code": "document.title" }, "timeout": 30 }
```
**Response:**
```json
{ "success": true, "data": "Page Title" }
{ "success": false, "error": "Extension not connected" }
```
---
## Dependencies
- **Server:** Python 3.9+, `websockets` (single pip dependency)
- **Extension:** Firefox 68+ (Manifest V2), no external dependencies
- **Infrastructure:** nginx (reverse proxy + TLS), certbot (Let's Encrypt), systemd
- **Optional:** SQLite (command logging), `web-ext` (extension signing)
## Limitations
- **Single browser tab** — Commands target the active tab. Running multiple automations concurrently is not supported.
- **Single extension connection** — The server accepts one extension at a time.
- **Timing-dependent** — Extraction relies on `asyncio.sleep()` waits for page loads. Adjust delays per site and network conditions.
- **Phone must stay awake** — The browser must remain in the foreground (or at least active) during automation. Screen-off or app switching may disconnect the WebSocket.
- **Manual cookie/login state** — Login sessions are managed by the real browser. If a site requires login, log in manually first; the automation uses the existing session.
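One way to reduce the timing dependence is to poll for a condition instead of sleeping for a fixed interval. The helper below is a generic sketch; `poll_until` and its defaults are assumptions rather than part of the SDK.

```python
import asyncio
import time


async def poll_until(check, timeout=10.0, interval=0.25):
    """Await `check()` repeatedly until it returns a truthy value."""
    deadline = time.monotonic() + timeout
    while True:
        value = await check()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        await asyncio.sleep(interval)


# Hypothetical usage inside a site module, where `client` is a BridgeClient:
#   await poll_until(lambda: client.execute_js(
#       "document.querySelectorAll('.product-card').length > 0"))
```

Unlike a fixed `asyncio.sleep()`, this returns as soon as the page is ready and fails loudly when it never becomes ready.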