riff


A native iOS client for the self-hosted Claude box you already run. If you have a Mac on a tailnet with the claude CLI (the OpenClaw setup, or any self-hosted Claude Code box), Riff puts it on your phone as a real terminal — full SwiftTerm fidelity driving the live claude REPL over SSH — with the Action Button dictating voice straight into the session. Not a messaging bridge: the actual terminal.

  • Terminal = zero server. Remote Login on, your phone's SSH key in authorized_keys, tmux + claude on the Mac. That's it. (Voice is an optional add-on — see Voice server quickstart.)
  • Vendor-neutral. Plain SSH + tmux + Claude Code. Riff does not use or require OpenClaw — OpenClaw users are just the audience who've already done the prerequisites. If you stop running OpenClaw, Riff keeps working.
  • TestFlight: <TESTFLIGHT_PUBLIC_LINK — fill in after the App Store Connect external group + public link are created (Phase 4.1)>

See Getting started (any user) and the OpenClaw quickstart to set up in ~30 seconds. The rest of this README is the canonical spec.


A custom iOS app that puts Mark's Mac mini claude CLI on his phone as a live terminal, with the Action Button driving voice dictation straight into that terminal. Riff opens to a full-screen SwiftTerm view attached over SSH to a persistent tmux session running claude; press the Action Button, talk, and the transcribed text is typed into the live REPL.

Current primary surface: the SSH terminal + voice-inject (2026-05-22). The original build (and builds through 20) was a chat client — an iMessage-style thread that round-tripped through riff_server → a claude poll session → APNs. That chat UI is shelved (files kept, unlinked from the view tree) in favor of talking to claude directly in a terminal. See Terminal architecture and the Kept / shelved map. The chat-era docs below (chat client, conversation store, APNs reply path) describe the dormant path; they still run server-side but the terminal never calls them.

The Action Button on iPhone 15 Pro and later is a configurable hardware button; today the only sensible voice path on a locked iPhone is Siri, which conflicts with Wispr Flow (which Mark keeps on for everything else). Blink (the obvious off-the-shelf SSH terminal) has no Action-Button voice dictation into the session — that single capability is the reason Riff exists: the shipped record→Scribe pipeline supplies it, repointed to type into the terminal instead of posting to a chat thread.

Terminal architecture (primary surface, 2026-05-22)

iPhone (Terminal tab — SwiftTerm full-bleed)
   │ keystrokes (TerminalViewDelegate.send)        ▲ output bytes (feed)
   ▼                                               │
┌──────────────────────────────────────────────────────────┐
│ TerminalController (owns transport + reconnect state)      │
│   • TerminalSurface: UIViewRepresentable<TerminalView>     │
│   • VoiceInjectController: record → /transcribe-only →     │
│     inject transcript via transport.write (same byte path) │
└───────────────┬────────────────────────────────────────────┘
                │  TerminalTransport (the swap seam)
                ▼
┌──────────────────────────────────────────────────────────┐
│ SSHTransport (SwiftNIO SSH, Apache-2.0)                    │
│   ClientBootstrap → TCP :22 → NIOSSHHandler (client)       │
│     • auth: on-device ed25519 key (Keychain) via publickey │
│     • host key: TOFU pin (trust on 1st connect, hard-fail  │
│       on change — PinnedHostKeyDelegate + HostKeyStore)    │
│     └─ session channel → pty-req (xterm-256color, cols×rows)│
│          → exec `tmux new-session -A -s riff -c ~/agents …`│
│          • inbound channel/stderr bytes → feed() SwiftTerm  │
│          • write() → channel stdin ; resize() → window-chg  │
└───────────────┬────────────────────────────────────────────┘
                │  SSH over Tailscale (no Funnel — tailnet only)
                ▼
┌──────────────────────────────────────────────────────────┐
│ Mac mini : sshd (Remote Login) → tmux session `riff`       │
│   • $SHELL -lc 'exec tmux -L riff new-session -A -s riff    │
│     -c "$HOME" env -u TMUX -u TMUX_PANE claude …'           │
│     (login shell → PATH resolves tmux/claude; no abs paths) │
│   • the live `claude` REPL the phone drives                │
│   • survives disconnects; every (re)connect re-attaches to │
│     the SAME session (claude keeps its context)            │
└────────────────────────────────────────────────────────────┘

The terminal talks to claude directly over SSH. The chat round-trip through riff_server (/riff/message, conversation store, multi-turn replay, APNs) is bypassed entirely: claude's own CLI context is the memory. The only riff_server call on the terminal's hot path is POST /riff/transcribe-only (Scribe text for voice-inject; no claude, no APNs, no conversation write).

The TerminalTransport seam (and why mosh is deferred)

TerminalTransport (in Terminal/TerminalTransport.swift) abstracts the byte channel: connect(), write(), resize(), disconnect(), onOutput, onClosed. The rendering surface, keystroke path, resize logic, and voice-inject are all written against this protocol, so the transport underneath is swappable.
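
To make the seam concrete, here is a minimal sketch of what such a protocol could look like, plus a loopback transport used only for illustration. The member names come from the list above; the exact signatures (synchronous calls, closure-typed callbacks, the cols/rows labels) and the LoopbackTransport type are assumptions, not the contents of Terminal/TerminalTransport.swift.

```swift
import Foundation

// Hypothetical shape of the seam. The real protocol may be async, throwing,
// or use different parameter types.
protocol TerminalTransport: AnyObject {
    var onOutput: ((Data) -> Void)? { get set }
    var onClosed: ((Error?) -> Void)? { get set }
    func connect()
    func write(_ bytes: Data)
    func resize(cols: Int, rows: Int)
    func disconnect()
}

// Illustration-only transport: echoes every write back as output.
final class LoopbackTransport: TerminalTransport {
    var onOutput: ((Data) -> Void)?
    var onClosed: ((Error?) -> Void)?
    private(set) var size = (cols: 80, rows: 24)
    private var isOpen = false

    func connect() { isOpen = true }

    func write(_ bytes: Data) {
        guard isOpen else { return }   // writes before connect are dropped
        onOutput?(bytes)
    }

    func resize(cols: Int, rows: Int) { size = (cols: cols, rows: rows) }

    func disconnect() {
        guard isOpen else { return }
        isOpen = false
        onClosed?(nil)                 // nil error = deliberate close
    }
}
```

Anything written against the protocol (rendering, keystrokes, voice-inject) can be exercised with a stand-in like this without opening an SSH connection.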

Riff ships SSH only and stays permissively licensed (SwiftTerm MIT + SwiftNIO SSH Apache-2.0). An earlier plan staged mosh as the eventual transport (instant local echo + roaming across network changes). It is deliberately not built here: mosh is GPLv3+, and Riff is being built so it could be sold — shipping a GPL transport (or any reused Blink mosh component) would impose GPL distribution obligations and is famously incompatible with App Store terms. The clean-licensing requirement outranks the roaming nicety. If mosh is ever revisited, the TerminalTransport seam is where a MoshTransport would drop in — but only behind a deliberate licensing decision. For now, network changes are handled by auto-reconnect (below), not roaming.

Auto-reconnect

SSH drops on a network change (Wi-Fi ↔ cellular, tailnet re-route). The TerminalController watches transport.onClosed; on an unexpected drop it re-SSHes and re-execs the tmux new-session -A attach line on a bounded backoff (1, 2, 4, 8, 15s), surfacing connecting / connected / reconnecting / disconnected (tap to retry) as a status chip. Because the tmux session lives server-side, a reconnect re-attaches to the same live claude with its context intact. A deliberate disconnect() (or app teardown) does not trigger reconnect.
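
The bounded backoff can be sketched as a small value type. The delays are the ones listed above; the type name, the reset-on-connect behavior, and the reading of "bounded" as "stop after the last delay and show the tap-to-retry state" are assumptions.

```swift
import Foundation

// Hypothetical helper; only the delay values come from the text.
struct ReconnectBackoff {
    let delays: [TimeInterval] = [1, 2, 4, 8, 15]
    private(set) var attempt = 0

    // Next delay to wait before re-SSHing, or nil once the schedule is
    // exhausted (assumed to map to "disconnected (tap to retry)").
    mutating func next() -> TimeInterval? {
        guard attempt < delays.count else { return nil }
        defer { attempt += 1 }
        return delays[attempt]
    }

    // Assumed to be called after a successful connect.
    mutating func reset() { attempt = 0 }
}
```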

The bottom button bar floods waveformRed while (re)connecting, but the glow is debounced (connectingGlowDelay, ~0.3s): it appears only if the connecting/reconnecting state persists past that, so a fast connect — a new session over the already-live SSH/tmux, or a fast reconnect — does not flash red. Red-on-tap is reserved for the hold-to-close affordance. (Future, not built: a minimum-show floor so a glow that does appear can't flash-and-vanish if connect finishes just after the threshold.)
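
The debounce reduces to one predicate: show the glow only when the connecting state has lasted at least the delay. This is a sketch; the function name and the Date-based bookkeeping are hypothetical, and 0.3s is the approximate value quoted above.

```swift
import Foundation

let connectingGlowDelay: TimeInterval = 0.3   // approximate value from the text

// Hypothetical predicate. `connectingSince` is nil whenever the state is not
// connecting/reconnecting.
func shouldShowConnectingGlow(connectingSince: Date?, now: Date) -> Bool {
    guard let start = connectingSince else { return false }
    return now.timeIntervalSince(start) >= connectingGlowDelay
}
```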

Voice-inject (the unique value)

Action Button → .riffToggle → VoiceInjectController.toggle():

  • not recording → start the mic (reused RecordingViewModel + AudioFileWriter, unchanged AAC/m4a capture);
  • recording → stop, upload the clip to POST /riff/transcribe-only, get back {transcript}, and type it into the terminal via transport.write, the same byte path keystrokes take.

Inject does NOT auto-press Return by default (autoSubmitVoice, a Settings toggle, default off). The transcript lands at the claude prompt; Mark eyeballs it, edits a misheard word, and presses Return himself. STT is imperfect and a wrong auto-submitted prompt to a coding agent is costly. Flip the toggle on to auto-Return (\r, 0x0D) if editing turns out to be rare. The setting governs every stop gesture uniformly — the mic-button stop, the Action Button, and a tap on the recording waveform all honor it.
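
As a sketch of the inject step, the bytes handed to transport.write are the transcript's UTF-8, with a carriage return appended only when the toggle is on. The function name is hypothetical; autoSubmitVoice mirrors the Settings toggle named above.

```swift
import Foundation

// Hypothetical helper: builds the bytes that would be written to the transport.
func voiceInjectPayload(transcript: String, autoSubmitVoice: Bool) -> Data {
    var bytes = Data(transcript.utf8)
    if autoSubmitVoice {
        bytes.append(0x0D)   // \r, the same byte the Return key sends
    }
    return bytes
}
```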

While a clip is transcribing the recording LED grid stays on screen with every bar lit (with a gentle ~0.9s breathing pulse) — the lit grid is the progress indicator; there is no spinner on the mic button. The grid clears the instant the status returns to idle (or shows the error text on failure).

SSH key / authorization

The ed25519 keypair is generated on-device on first run and the private key is stored in the iOS Keychain (never UserDefaults, never the bundle, never the repo). The public key is shown in Settings → SSH public key as a copy-to-clipboard authorized_keys line. The user pastes it into the Mac's ~/.ssh/authorized_keys once to authorize the phone; revoke by deleting that line. (See Terminal/SSHKeyStore.swift.)

Host-key trust (TOFU pinning)

Riff pins the Mac's SSH host key on a Trust-On-First-Use basis (Terminal/HostKeyStore.swift + PinnedHostKeyDelegate in SSHTransport.swift), replacing the original accept-any-key delegate:

  • First connect to a host:port → the presented host key is trusted silently and persisted (keyed by host:port, in the shared UserDefaults suite — host keys are public, so they do not go in the Keychain).
  • Every later connect → the presented key must match the pinned one, exactly as ssh checks known_hosts.
  • A CHANGED key is never auto-accepted — the connection HARD-FAILS with a man-in-the-middle warning and surfaces on the status chip as "Disconnected" with the mismatch reason. This is the security win over accept-any.
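
The three rules above amount to a small decision function. The sketch below uses an in-memory dictionary in place of the persisted store; the type and case names are hypothetical and do not reflect the actual HostKeyStore or PinnedHostKeyDelegate API.

```swift
import Foundation

enum HostKeyDecision: Equatable {
    case trustedOnFirstUse   // no pin yet: accept and persist
    case matchesPin          // same key as the pin: accept
    case mismatch            // changed key: hard-fail, never auto-accepted
}

// In-memory stand-in for the persisted pins, keyed by "host:port".
struct PinnedHostKeys {
    private var pins: [String: Data] = [:]

    mutating func evaluate(host: String, port: Int, presented: Data) -> HostKeyDecision {
        let id = "\(host):\(port)"
        guard let pinned = pins[id] else {
            pins[id] = presented
            return .trustedOnFirstUse
        }
        // A mismatch leaves the existing pin untouched.
        return pinned == presented ? .matchesPin : .mismatch
    }

    // Mirrors "Reset trusted host key": the next connect re-TOFUs.
    mutating func reset(host: String, port: Int) {
        pins["\(host):\(port)"] = nil
    }
}
```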

The pinned key's SHA256 fingerprint is shown in Settings → Host key, where a careful user can verify it out-of-band against ssh-keygen -lf /etc/ssh/ssh_host_ed25519_key.pub on the Mac (or ssh-keyscan). Settings → Reset trusted host key clears the pin for the current host:port so a user who legitimately reinstalled the Mac (new host key) or pointed Riff at a different box can re-TOFU on the next connect — this is the documented recovery for the hard-fail.

v1 policy (deliberate): silent auto-TOFU on first use + hard-fail on mismatch, with no inline "trust this key?" modal. The eager multi-session bootstrap (SessionManager.bootstrap) connects several sessions concurrently, so a blocking prompt would race N ways. The Settings fingerprint review/reset is the careful-user affordance instead; an interactive first-use confirm (which must serialize the first connect) is a possible fast-follow.

Mac-side contract (what must be true)

  1. Remote Login (SSH) ON: System Settings → General → Sharing → Remote Login = ON (or sudo systemsetup -setremotelogin on). Without it the SSH connect is refused. The agent cannot verify this non-interactively; Mark confirms.
  2. The phone's public key is in ~/.ssh/authorized_keys (copied from Settings → SSH public key).
  3. A persistent tmux session named riff. Optional to pre-create: the iOS SSHTransport runs tmux -L riff new-session -A -s riff (attach-or-create) on connect, so the session is created on first phone connection. To have it exist before the phone connects (snappier first attach), the optional scripts/riff-tmux-up.sh (+ a per-user LaunchAgent) brings it up at login. A terminal-only user does NOT need it.
  4. The PATH footgun: solved by a login shell (no absolute paths). A non-login SSH exec shell has neither Homebrew nor ~/.local/bin on PATH, which is why the original build hardcoded absolute paths to tmux + claude. The de-Marked build instead runs everything through the user's login shell ($SHELL -lc 'exec tmux -L riff new-session -A -s riff -c "$HOME" env -u TMUX -u TMUX_PANE claude …'), which sources the user's profile so tmux and claude resolve on $PATH on ANY Mac. No absolute paths, no bundled launcher script. The riff.tmux.conf essentials are applied inline (tmux \;-chained set -g). env -u TMUX is the truecolor trick (claude downgrades to 256-color when it sees $TMUX). The start directory defaults to $HOME and is configurable in Settings (riff.ssh.startDir). The harness (the command launched on that exec slot) is a single free-text string (riff.ssh.harness, default the literal claude): a binary name, an absolute path, or any shell command (codex, /usr/local/bin/codex, aider --model x), run verbatim, mirroring how the start directory is configurable. The string claude (the default) or empty keeps the native Claude launch (RiffTmux.claudeLaunch(), worktree fork intact); any OTHER string launches inline via RiffTmux.launchFor instead of claude. See RiffTmux in SSHTransport.swift; mirror any change in riff-tmux-up.sh.
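
A rough sketch of how the login-shell line in item 4 could be composed follows. It only shows the outer single-quote wrapping around the inner tmux command; both function names are hypothetical, the inline set -g chain and the elided claude arguments are left out, and the start directory is assumed to be already validated (how RiffTmux actually escapes it is not shown here).

```swift
import Foundation

// POSIX single-quote wrapping: close the quote, emit an escaped quote, reopen.
func singleQuoted(_ s: String) -> String {
    "'" + s.replacingOccurrences(of: "'", with: "'\\''") + "'"
}

// Hypothetical composer for the attach-or-create line.
// `startDir` is placed inside double quotes so the login shell expands $HOME.
func loginShellAttachCommand(session: String, startDir: String, harness: String) -> String {
    let inner = "exec tmux -L riff new-session -A -s \(session) "
        + "-c \"\(startDir)\" env -u TMUX -u TMUX_PANE \(harness)"
    return "$SHELL -lc " + singleQuoted(inner)
}
```

Because the whole inner command travels as one single-quoted argument, nothing in it is interpreted by the non-login exec shell; expansion happens only inside the login shell, where PATH is already set up.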

Worktree (opt-in, default OFF)

A Worktree toggle in iOS Settings ▸ Claude (riff.claude.worktree, default false) makes each NEW session run claude inside its own git worktree on its own branch, isolating concurrent sessions' working trees. OFF (the default) ⇒ behavior is exactly today's. The toggle ONLY buys isolation — there is no auto-merge, no conflict resolution, no branch UI, no reaper; you merge riff/<session> yourself if you want it.

How it works when ON: the new session's tmux env carries RIFF_WORKTREE=1 + RIFF_START_DIR=<start dir> (spliced as tmux -e flags in RiffTmux.createCommand), and the launch routes through scripts/riff-claude.sh if it's on the Mac's login $PATH (else it falls back to the inline launch — no wedge, nothing to install for a worktree-off user). If RIFF_START_DIR is a git work tree, the script creates-or-reuses a worktree at ~/Library/Application Support/riff/worktrees/<repo>/<session> on branch riff/<session> (forked from origin/HEAD's default branch, else current HEAD), serialized by an atomic mkdir lock (no flock on macOS), then cds in. Any failure falls back to cd "$RIFF_START_DIR" — a worktree problem never wedges a session. All path logic lives in the shell scripts, never in Swift.

On session close (long-press the + and release with your finger still on the button — while held the whole bottom bar fades to the waveform red as the hold affordance), SessionManager.closeCurrent best-effort fires scripts/riff-worktree-remove.sh <session> over the no-PTY control channel (gated on the toggle) to reclaim the worktree dir — keeping the branch. It's fire-and-forget so it can't stall/break close; a leftover worktree is harmless (the next create's worktree prune tidies it). To use the feature, put both scripts on the Mac's login $PATH (e.g. symlink into ~/bin).

New Session customization (harness + launch directory)

The harness is a single free-text string — the command a new session launches, run verbatim: a binary name, an absolute path, or any shell command (claude, /usr/local/bin/codex, codex, aider --model x). It defaults to the literal claude. The string claude or empty keeps the native Claude launch (RiffTmux.claudeLaunch(), worktree fork + exact claude args intact); any OTHER string launches inline via RiffTmux.launchFor instead. There is no Claude/Shell/Custom picker — typing $SHELL or bash reaches a bare shell through the same verbatim path.

One shared store, last-write-wins. The harness lives under the single key riff.ssh.harness; the launch directory is the existing riff.ssh.startDir. BOTH surfaces — iOS Settings ▸ Session (Harness + Directory) and the double-tap "New Session" sheet — read AND write those same two keys. The sheet is pre-filled from the current Settings.startDir + Settings.harness (NOT from a recents MRU), and on Create it persists the edited values back to those keys, so the next double-tap and iOS Settings both default to the new values. There is no separate "default" vs "last-used" — they are the same stored values. (Settings.recentStartDirs survives only as a tappable cwd-autocomplete affordance, never as the pre-fill.)

The + button's tap is an instant new session with those shared defaults. When the harness is claude/empty the tap is byte-identical to before this feature: the create path resolves a nil spec (SessionManager.defaultSpecForPlainTap() returns nil), so it still routes through claudeLaunch() with the worktree fork intact. Any other harness builds a spec carrying Settings.startDir + .custom(command) and launches it inline via launchFor (no worktree script; the launcher only knows claude).

The + ALWAYS creates a genuinely-new session and NEVER attaches to an existing one. The next free riff-N (N ≥ 2) is computed across the union of the live server's session names AND the in-memory pages, not the in-memory list alone. Before minting the name, createSession re-enumerates the live -L riff sessions (best-effort; a failed refresh falls back to in-memory only) so it cannot pick a name that exists server-side but was dropped from the in-memory list off-LAN (which is exactly how the + used to attach to a forgotten/orphaned session). As a belt-and-suspenders for a name that races in between the refresh and the create, a new-session -d that tmux rejects with duplicate session: NAME is surfaced as DuplicateSessionError; the manager then bumps N and retries (bounded), never attaching to the collision. Recovering forgotten/orphaned sessions you lost access to is reconcile()'s job (foreground / reconnect; see below), NOT the +'s; the two roles stay distinct.

The + button's long-press (0.5s) exactly preserves the existing close behavior: hold and release with the finger still on the button → Close Session (the bottom bar floods red while held, same as before). The customization menu is a separate gesture: hold, then slide the finger ≥44pt off the button before lifting → a small menu (a SwiftUI .confirmationDialog) opens instead of closing. (44pt is the original close-tolerance radius, so anything that would have closed before still closes; the menu only appears on a deliberate slide-off.)

The menu:

  • New Session… → opens NewSessionMenu, a compact bottom card with exactly two fields (identical to iOS Settings) + a Create Session button:
      • Directory: a text field pre-filled from the current Settings.startDir, accepting $HOME / ~ / an absolute path (expanded by the login shell at -c "<dir>"). A short recents list offers tappable MRU suggestions, but the pre-fill is startDir, not the recents. The cwd is the injection-sensitive field and is contained inside the outer login-shell -lc '…' (covered by cwdWithSpacesAndQuotesIsContainedSafely).
      • Harness: a single monospaced text field pre-filled from Settings.harness, holding a command run verbatim after the env -u TMUX -u TMUX_PANE prefix (e.g. codex, aider --model x; quote your own args, the cwd is escaped for you). claude/empty keeps the native Claude launch.
      • Create Session persists both edited fields back to Settings.startDir + Settings.harness (the shared store, so the next double-tap and iOS Settings default to them), records the dir into the recents suggestions (Settings.recentStartDirs, capped 6, deduped), then creates.
  • Close Session (destructive): closes the current session. (This is the same action as the plain hold-and-release-on-button; it's in the menu too so a slide-off hold can still reach it.)

How the create is plumbed (bake-at-create): the sheet builds a NewSessionSpec { cwd, harness } and calls SessionManager.createSession(spec:) → SessionManaging.createSession(named:spec:) → RiffTmux.newDetached(name, cwd:harness:) → createCommand. The cwd/harness are optional and default to nil, falling back to the globals; a nil/nil create is byte-identical to today (pinned by noOverrideIsByteIdenticalToTodaysNewDetached). Every harness still launches under env -u TMUX -u TMUX_PANE via the login shell (truecolor preserved), composed by RiffTmux.launchFor from the same live ClaudeArgs globals as the no-override path. The string↔Harness mapping is Settings.harness(from:) / Harness.rawString; the model + builder are NewSessionSpec.swift; the sheet is NewSessionMenu.swift.

Worktree interaction (MVP scope): when Worktree is ON and a per-session cwd override is given, that override is used for RIFF_START_DIR (the worktree forks from the chosen base) — for the claude harness only. A custom harness combined with worktree-on is out of MVP: the launcher script only knows how to launch claude, so a custom command runs the inline launch and does NOT route through riff-claude.sh even if the toggle is on. Most users have worktree OFF, so this is a corner of a corner. Not in MVP: a remote directory browser (text field + recents only).

Session reconcile & enumeration robustness (self-heal a desynced list)

The in-memory session list can silently desync from the live -L riff tmux server when bootstrap enumeration fails quietly — an off-LAN slow handshake or an early channel close that yields an empty/partial result indistinguishable from "no sessions exist." The app would then show only the base riff page even though riff-2 … riff-N were alive server-side, and recovery used to require a force-quit on LAN. Three mechanisms close that gap:

  1. management run() distinguishes a timeout / early-empty-close from a genuine empty list, and THROWS so callers can retry. The ; echo __RIFF_EOF__ sentinel is the dividing line: a successful zero-session list-sessions still echoes the sentinel → the buffer is non-empty → .output("") (a real empty list, NOT a throw). Only a handshake that dies before the sentinel echoes resolves to a typed ManagementError (.timeout on a hard-timeout with an empty buffer; .closedEmpty on an early onClosed(nil); .connect on a real connect/auth failure). So "server up, zero sessions" still yields exactly the base page and never errors — while the false-empty that lost sessions is now a retryable signal.
  2. Bootstrap retries enumeration with backoff (enumerateWithRetry, default [0, 1, 2]s, 3 attempts) before falling back to base-only. The base session still comes up immediately (time-to-first-paint is unchanged — only the extra-page enumeration waits); a genuine empty list returns [] on the first try (no retry), and a truly-unreachable Mac exhausts the retries and leaves just the base page (which drives its own visible reconnect).
  3. The app re-enumerates + reconcile()s on foreground and after a reconnect (TerminalScreen observes scenePhase == .active and the rising edge of the active session into .connected). reconcile() diffs the in-memory pages against a fresh live list WITHOUT tearing down healthy connections: it appends a page (+ eager-connect) for every live session the list forgot, removes a page whose session vanished server-side (EXCEPT the base riff, always kept — it's attach-or-create and may be mid-creation), never reorders a surviving page, and anchors the active session by NAME across the reshuffle. A transient enumeration failure is a no-op (keep what we have) — reconcile never drops a live page just because one probe timed out, and it never kills server-side (the × is the only kill). So a desync self-heals without a relaunch; the old force-quit recovery is gone.
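
The reconcile rules in item 3 can be modeled as a pure function over session names. This is only a sketch: the real reconcile() also owns connections and eager-connects new pages, and the signature, the nil-means-failed-probe convention, and the fallback to index 0 when the active session vanished are assumptions.

```swift
import Foundation

// Hypothetical pure model of the diff over page names.
func reconcile(pages: [String], live: [String]?, active: String,
               base: String = "riff") -> (pages: [String], activeIndex: Int) {
    // A failed probe (nil) is a no-op: keep what we have.
    guard let live = live else {
        return (pages, pages.firstIndex(of: active) ?? 0)
    }
    let liveSet = Set(live)
    // Survivors keep their existing order; the base page is never dropped.
    var result = pages.filter { $0 == base || liveSet.contains($0) }
    // Append live sessions the in-memory list forgot.
    for name in live where !result.contains(name) {
        result.append(name)
    }
    // Anchor the active session by name across the reshuffle.
    return (result, result.firstIndex(of: active) ?? 0)
}
```

Note that nothing here ever asks the server to kill a session; the function only changes which pages the client shows.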

The no-PTY management exec's hard timeout is Settings.managementTimeout (riff.ssh.managementTimeout, default 9s — raised from the old 4s because off-LAN no-PTY handshakes can exceed 4s, which is what caused the silent enumeration loss). Internal key only; no Settings row.
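
Mechanisms 1 and 2 can be sketched together: a classifier that treats the sentinel as the line between "real empty list" and "handshake died", and a retry loop around it. The ManagementError cases and enumerateWithRetry are named in the text; the signatures, the parseSessionList helper, the handling of a partial buffer without the sentinel, and the blocking sleep are assumptions made for illustration.

```swift
import Foundation

enum ManagementError: Error, Equatable { case timeout, closedEmpty, connect }

let sentinel = "__RIFF_EOF__"

// Hypothetical classifier. `buffer` is whatever the no-PTY exec produced
// before it ended; no sentinel means the list cannot be trusted.
func parseSessionList(buffer: String, timedOut: Bool) throws -> [String] {
    guard let end = buffer.range(of: sentinel) else {
        throw timedOut ? ManagementError.timeout : ManagementError.closedEmpty
    }
    // Sentinel seen: everything before it is the (possibly empty) list.
    return buffer[..<end.lowerBound]
        .split(separator: "\n")
        .map { String($0) }
}

// Retries only when the probe throws; a genuine empty list returns [] at once.
// nil means every attempt failed and the caller keeps the base page only.
func enumerateWithRetry(delays: [TimeInterval] = [0, 1, 2],
                        probe: () throws -> [String]) -> [String]? {
    for delay in delays {
        if delay > 0 { Thread.sleep(forTimeInterval: delay) }
        if let names = try? probe() { return names }
    }
    return nil
}
```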

Requirements (regression checklist) — multi-session integrity (Features 4)

  • M1 — New Session ALWAYS creates a NEW session (never attaches to an existing one), even when the in-memory list is stale off-LAN: the name is the lowest free riff-N across server ∪ memory, and a duplicate session collision is detected and bumped/retried.
  • M2 — a desynced session list self-heals on foreground / reconnect (live sessions the in-memory list forgot appear as their own pages; pages whose session vanished server-side drop) with no relaunch. A transient probe failure keeps the current pages; reconcile never kills server-side.
  • M3 — a genuine empty server (zero sessions) still yields exactly the base page and does not error (the sentinel keeps a real empty list off the retry/throw path).
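
M1 can be illustrated with a name picker plus a bounded retry on collision. DuplicateSessionError is named in the text; its stored property, the other names, the retry bound, and the closure standing in for the new-session -d exec are hypothetical.

```swift
import Foundation

// Lowest free riff-N (N ≥ 2) across the union of live and in-memory names.
func nextFreeSessionName(server: Set<String>, memory: Set<String>) -> String {
    let taken = server.union(memory)
    var n = 2
    while taken.contains("riff-\(n)") { n += 1 }
    return "riff-\(n)"
}

struct DuplicateSessionError: Error { let name: String }
enum CreateError: Error { case retriesExhausted }

// Hypothetical create loop. `create` stands in for the `new-session -d` exec
// and throws DuplicateSessionError when tmux reports a duplicate session.
func createNewSession(server: Set<String>, memory: Set<String>, maxRetries: Int = 3,
                      create: (String) throws -> Void) throws -> String {
    var known = server
    for _ in 0...maxRetries {
        let name = nextFreeSessionName(server: known, memory: memory)
        do {
            try create(name)
            return name
        } catch let dup as DuplicateSessionError {
            known.insert(dup.name)   // bump past the collision, never attach
        }
    }
    throw CreateError.retriesExhausted
}
```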

Share Extension (image OR video → active session)

A Share Extension (RiffShare, bundle id mark.riff.share) puts Riff in the iOS share sheet for a single image OR a single video — share a screenshot, a photo, or a screen recording (from Photos or any source) and it gets attached to the active terminal session exactly as the in-app photo/video button does. Photos are encoded JPEG (q 0.85 — a full-res camera photo is a few MB, not ~30 MB as PNG); videos are deposited as their original bytes, no transcoding.

The extension can't foreground the host app itself (iOS forbids a share extension from opening its container — only Today widgets may), so the media is queued, then a one-tap notification brings Riff to the front to attach it — no Shortcut required:

  1. RiffShare/ShareViewController.swift (a bare UIViewController, programmatic principal class — no storyboard, NSExtensionPrincipalClass in the plist) pulls the first media attachment (preferring a movie, else an image): an image is re-encoded to JPEG; a video's original file is copied via loadFileRepresentation. It writes the file into the App Group container (group.mark.riff) under share-inbox/<epoch>-<uuid>.<ext> (the real extension — .jpg/.mov/.mp4) via SharedImageInbox.deposit(_:ext:), then posts an "Open in Riff" local notification (Shared/ShareNotification.swift) and completeRequests immediately (no compose UI — the share feels instant). A local notification scheduled from an app extension is attributed to the containing app, so it shows up as a Riff banner.
  2. Tap the "Open in Riff" notification → Riff foregrounds and the media attaches: the one-tap, no-Shortcut handoff. The host drains the inbox on foreground: RiffApp posts .riffSharedMediaAvailable on scenePhase == .active, and the AppDelegate UNUserNotificationCenterDelegate re-posts it on the tap too (belt-and-suspenders if the scene was already active). TerminalScreen observes it (it owns the SessionManager) and runs each queued file through the same RiffClient.uploadMedia → injectText path as the photo/video button, consolidated on SessionManager.attachSharedMedia (it derives a content type from the extension; video routes through a dedicated large-upload URLSession). The Mac path is typed into the active session's input line, and claude can Read/ffprobe/inspect it. Notification auth is the same grant as push (requested once at launch) and the handoff degrades gracefully: if notifications are denied, the deposit still lands and attaches the next time you open Riff. A multi-item share coalesces into a single banner (stable request id); one tap drains the whole queue. No Shortcut is required; the optional ShareToRiffIntent App Intent is a Siri/Shortcuts convenience, not a prerequisite.
  3. Cold launch / no active session yet: a file shared while Riff was terminated stays in the inbox; the host re-drains on bootstrap and again once the active session reaches .connected (it is NOT dropped). A file is removed only AFTER a successful upload+inject; a failed upload leaves it queued for the next foreground. SharedImageInbox.purgeStale caps leftover lifetime at 24h so a permanently-failing file can't accumulate. An in-flight Set<URL> guards two quick foregrounds from uploading the same file twice (dedupe).
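
The inbox bookkeeping in the steps above (file naming, the 24h cap, and the in-flight dedupe) can be sketched with three small helpers. All names here are hypothetical, whole-second epoch precision is an assumption, and none of this is the actual SharedImageInbox API.

```swift
import Foundation

// share-inbox/<epoch>-<uuid>.<ext>; seconds precision is assumed.
func inboxFileName(now: Date, id: UUID, ext: String) -> String {
    "\(Int(now.timeIntervalSince1970))-\(id.uuidString).\(ext)"
}

struct InboxEntry {
    let url: URL
    let deposited: Date
}

// Entries older than maxAge (24h in the text) are candidates for purging.
func staleEntries(_ entries: [InboxEntry], now: Date,
                  maxAge: TimeInterval = 24 * 60 * 60) -> [URL] {
    entries
        .filter { now.timeIntervalSince($0.deposited) > maxAge }
        .map { $0.url }
}

// Guards two quick foregrounds from uploading the same file twice.
struct InFlightUploads {
    private var urls = Set<URL>()

    // True if the caller may start the upload; false if one is already running.
    mutating func begin(_ url: URL) -> Bool { urls.insert(url).inserted }

    // Called after the upload ends, whether it succeeded or failed.
    mutating func finish(_ url: URL) { _ = urls.remove(url) }
}
```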

Right-session targeting: shared media (and the in-app picker) inject into the last-active session, not always the base riff. SessionManager persists the active session by name (Settings.lastActiveSessionName, set on page/create/close) and restores currentIndex from it at the end of bootstrap() — so Riff reopens on the session you last used and media lands there.

Size cap: uploads are bounded at 200 MB (RiffClient.maxUploadBytes mirrors the server's MAX_UPLOAD_BODY). The in-app picker pre-checks the file size and shows "Video too large (NNN MB > 200 MB)" rather than starting a doomed upload; the server's 413 is the backstop.

SharedImageInbox.swift lives in ios/Shared/ and is compiled into both the host and the extension target — it is the single source of the container contract (the type name stays SharedImageInbox for less churn; it handles media). The extension carries only the App Group entitlement (no APNs / audio / location), so its first provisioning against the pre-registered mark.riff.share App ID succeeds.

Session paging — architecture & the native-pager trap (READ before touching SessionPager)

Why this is severity-1: the horizontal pager is the only way to reach your other tmux sessions. If the swipe breaks, you are trapped in whichever session is showing — every other session is unreachable. A dead swipe is not a polish bug, it's "half the app is inaccessible."

The trap — it cost two long sieges (builds 62–75, then 107–112). Riff hosts the terminal stack in SwiftUI via UIViewControllerRepresentable. In that embedding, UIPageViewController's own scroll pan never recognizes a touch, and neither does SwiftTerm's UIScrollView pan — SwiftUI's gesture layer suppresses both. An on-device gesture probe (build 111) proved it conclusively: a raw catch-all UIPanGestureRecognizer added to the pager view logs every swipe (so the touch does arrive), but the native pager pan stays silent and its dataSource is never queried. Every attempt to make the native pager work failed because they all lean on that dead pan:

| build | attempt | why it failed |
|---|---|---|
| 62–63 | velocity gate / direction-locked vertical pan | a 1-finger scroll pan on the deep terminal excludes the pager's 1-finger pan ("deeper view wins") |
| 66 | TWO-finger scroll so paging is 1-finger-clear | worked then only because the view tree predated the current embedding |
| 67–68 | require(toFail:) arbitration | deadlocked / "fought the pager"; native pan still never fired |
| 71–72 | drop custom code, isScrollEnabled=true "native cooperation" | the native pan is suppressed; this is the regression that broke it for good |
| 75–78 | give up, discrete pageRelative snap on release | not interactive; later read as "swipe does nothing" |
| 107–110 | restore 66/67/68 configs, toggle isScrollEnabled | all still depend on the dead native pan |

The fix (build 112): own the gesture. There is no UIPageViewController and no native paging. SessionPager → PagerHostVC lays every session's persistent SessionPageVC side-by-side in a content strip and drives EVERYTHING off ONE plain UIPanGestureRecognizer we add ourselves (the only kind that fires here):

  • horizontal-dominant drag → translate the strip 1:1 with the finger; release past ⅓ width (or a velocity flick) commits to the neighbour, else snaps back;
  • vertical-dominant drag → forward SGR mouse-wheel to tmux (+ a momentum glide);
  • axis is latched once per gesture, BOTH are single-finger, and the pan uses cancelsTouchesInView=false + simultaneous recognition so taps / typing / keyboard / link-tap / swipe-down all still work.
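
The axis latch and the release rule can be expressed as two pure functions, independent of UIKit. This is a sketch under assumptions: the names are hypothetical, the flick threshold is an invented placeholder, and clamping at the first/last page is omitted.

```swift
import Foundation

enum PanAxis { case horizontal, vertical }

// Decided once per gesture from the first meaningful translation.
func latchAxis(dx: Double, dy: Double) -> PanAxis {
    abs(dx) > abs(dy) ? .horizontal : .vertical
}

enum PanOutcome: Equatable { case commitNext, commitPrevious, snapBack }

// Commit past one third of the page width or on a flick; otherwise snap back.
// `flickThreshold` (points per second) is a placeholder value.
func releaseOutcome(translationX: Double, velocityX: Double, pageWidth: Double,
                    flickThreshold: Double = 500) -> PanOutcome {
    let pastThird = abs(translationX) > pageWidth / 3
    let flicked = abs(velocityX) > flickThreshold
    guard pastThird || flicked else { return .snapBack }
    // Dragging left (negative x) reveals the next page.
    let direction = pastThird ? translationX : velocityX
    return direction < 0 ? .commitNext : .commitPrevious
}
```

Keeping the decision logic free of gesture-recognizer types makes at least this part checkable off-device, even though the gesture delivery itself is not.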

terminalView.isScrollEnabled is false (in SessionController) so SwiftTerm's own pan can't compete; we forward the wheel ourselves anyway (a tmux attach has no local scrollback).

Scrolled-up input → copy-mode (the stray-q fix, #2). A wheel-up forwards an SGR mouse-wheel to tmux, which (via its default WheelUpPane binding) enters copy-mode with copy-mode -e. The -e flag means tmux auto-exits copy-mode the moment the pane scrolls back to the live bottom — there is NO explicit cancel at the bottom. Before delivering any input while scrolled, SessionController.exitScrollbackIfNeeded sends a copy-mode cancel keystroke to snap the pane back to the live bottom first — otherwise tmux eats the typed/dictated text as copy-mode key bindings.

That cancel keystroke is F12 (\u{1b}[24~), not a bare q (SessionController.copyModeCancel is the single source of truth). .tmux.conf binds F12 to send -X cancel in both copy-mode tables (copy-mode + copy-mode-vi) and root-guards it: bind -T root F12 if -F '#{pane_in_mode}' { send -X cancel }. At a live prompt pane_in_mode is 0, so tmux consumes F12 as a silent no-op — it never reaches Claude Code. This is what makes the cancel leak-proof: unlike the old bare q (a literal character at the prompt), firing F12 against a live prompt writes nothing. Deploy ordering matters: the tmux binds must be live on the -L riff server (tmux source-file ~/.tmux.conf) BEFORE a build that emits F12 ships, or F12 forwards raw to Claude Code in the gap.

SessionController still tracks net scroll depth (scrollState/scrollDepth: wheel-up adds ticks, wheel-down subtracts, floored at 0) and only emits the cancel when scrollDepth > 0 — but this is now a best-effort optimization (skip the write when we were never scrolled), not the correctness mechanism. The client-side counter can desync from tmux's real position: an over-scroll past the top of scrollback inflates it (so it reads >0 while tmux is already at the live bottom — the stray-q direction), and streaming output that shifts the live bottom can leave it at 0 while still in copy-mode (the build-174 input-eaten direction). Correctness now comes from the emitted key being harmless at the live prompt (the root guard), not from the counter being exact. The stray-q direction is fully fixed; the rarer input-eaten direction is not fully solved here (it needs a tmux-side signal) — this change doesn't regress it (the cancel still fires whenever depth>0) but don't mistake it for closed.
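
As a rough model of the counter described above (a Python sketch, not the Swift in SessionController): wheel ticks adjust a depth floored at 0, and the F12 cancel is prepended to input only when the depth is positive. The class and method names are hypothetical, and resetting the depth after the cancel is an assumption.

```python
COPY_MODE_CANCEL = "\x1b[24~"  # F12, the cancel keystroke named in the text


class ScrollTracker:
    """Best-effort client-side estimate of how far the pane is scrolled up."""

    def __init__(self) -> None:
        self.depth = 0

    def wheel(self, ticks: int) -> None:
        """Positive ticks scroll up into history, negative ticks scroll down."""
        self.depth = max(0, self.depth + ticks)  # floored at 0

    def bytes_for_input(self, text: str) -> str:
        """Prefix the cancel key when we believe the pane is scrolled."""
        if self.depth > 0:
            self.depth = 0  # assumption: the cancel returns us to the live bottom
            return COPY_MODE_CANCEL + text
        return text
```

The counter only decides whether to spend the extra write. Safety comes from the tmux root guard, which makes a stray F12 at the live prompt a no-op.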

THE RULE: do not reintroduce UIPageViewController, and do not rely on any nested UIScrollView/native pan for paging in this SwiftUI embedding — it will look like it should work and silently won't (that exact assumption burned dozens of builds). Paging lives on the custom pan in PagerHostVC. If paging breaks, first confirm PagerHostVC still owns the pan and nobody re-enabled terminalView.isScrollEnabled.

Requirements (regression checklist) — session paging & render

Paging + vertical scroll are driven by ONE custom pan in PagerHostVC.handlePan; the terminal render also races page layout. The gesture/render rows (P0–P9) are NOT testable in the simulator — verify on device on every change to SessionPager.swift, SessionController.terminalView setup, or SessionManager geometry. The geometry framing invariant (G0) IS now simulator-tested (RiffTests/PagerGeometryTests); its keyboard-timing path still needs on-device verification (see ▸ Testing).

Paging (#4 — interactive finger-tracking):

  • P0 — every session is reachable by swiping (it's the only way; a dead swipe = trapped in one session — the severity-1 case above).
  • P1 — horizontal drag tracks the finger 1:1. The adjacent session is revealed as you drag (NOT a release-only snap). Release past ~⅓ width (or a flick) COMMITS; a short release SNAPS BACK.
  • P2 — no flicker / no double-slide on commit. currentIndex syncs via manager.page(to:) in the animation completion; the page doesn't re-slide.
  • P3 — vertical scroll still works (drag forwards SGR mouse-wheel to tmux).
  • P4 — vertical flick momentum glides and decays to a stop.
  • P5 — axis disambiguated cleanly (latched once per gesture: near-horizontal never scrolls, near-vertical never pages).

Preserved interactions (must survive ANY paging change — the historic churn zone):

  • P6 — tap raises the keyboard; typing reaches claude; the accessory row works.
  • P7 — swipe-DOWN dismisses the keyboard while it's up (gated to keyboard-up).
  • P8 — tapping a URL opens it; tapping empty space still raises the keyboard.
  • P9 — the +//photo bottom-bar buttons and the draggable cluster are unaffected.

Geometry (the terminal must FILL its page in BOTH keyboard states — builds 114→117):

  • G0 — terminal fills the screen with the keyboard DOWN and UP: no black void below it, no rows clipped under the bottom bar / home indicator.

    The trap: PagerHostVC hand-set its child page frames in viewDidLayoutSubviews, which missed the layout pass when the keyboard toggled. Result — keyboard-up the page (and the terminal in it) collapsed to a ~12-row sliver with a huge void below (on-device probe: page 140pt inside a 460pt strip); at other times it rendered full-screen with claude's bottom rows cut off under the button bar. UIPageViewController sized its children to its bounds on every pass automatically; a custom pager must too.

    The fix is Auto Layout, NOT manual frames: the terminal is pinned to its SessionPageVC view, and each page is pinned to the content strip (top/bottom = full height, width = pager width, leading constant = i×pageWidth). The constraint engine then re-fits them on every bounds change. If a void or a bottom-clip ever returns, look for a view.frame = that should be a constraint — manual child framing in this embedding silently misses keyboard-driven resizes.

    The framing invariant (given a layout pass, the terminal fills its page at any bounds) is now simulator-tested — RiffTests/PagerGeometryTests hosts PagerHostVC in a UIWindow and asserts it at a keyboard-down and a keyboard-up height (see ▸ Testing). The keyboard-driven TIMING path (the actual builds-114→117 failure — a forced layoutIfNeeded can't reproduce a missed pass) still requires on-device verification with the keyboard up, backed by the #if DEBUG layoutPages assert.

  • G1 — the center dictation button responds to the FIRST tap at cold launch, with NO keyboard toggle first: kill Riff → cold launch → tap the Flying-V → dictation starts. ROOT-CAUSED in build 165 (traced in code, not guessed).

    History — six fixes (builds 155/156/158/160/163) each guessed at the MECHANISM and missed: the gesture-type theories (duck audio → route the tap through a Button → add a LongPressGesture) all failed, the recognizer was never it; build 163's coldLaunchNudge (a 1pt frame nudge forcing a relayout) also failed. Build 164 stopped guessing and shipped a Settings-gated hit-test probe to NAME the layer eating the touch.

    The actual cause was then found by reading the code: bottomSafeInset reads the key window's home-indicator inset, but there is no key window during the first layout passes after a cold launch, so it returns 0 (its own doc-comment admits this). keyboardLift then can't cancel the home-indicator double-count, so lift computes as the full ~34pt home inset while the keyboard is down — rendering the stack in the keyboard-UP geometry (over-tall + .clipped()), which desyncs the bar's hittable region from where it's drawn. The button is visible but the touch lands in dead space. The first keyboard toggle re-reads a now-valid inset → lift snaps to 0 → taps work. This uniquely explains why 163's nudge failed: a frame nudge never touched lift.

    The fix (homeIndicatorInset(windowInset:proxyInset:)): when the window read is 0, fall back to the GeometryReader's own bottom inset — at cold launch the keyboard is down, so the proxy bottom IS the home indicator → lift cancels to 0 on the very first layout. Once a key window exists the window read wins, so the hard-won smooth-slide path is byte-for-byte unchanged. Guarded by KeyboardLiftTests.coldLaunchUsesProxyInsetWhenNoKeyWindow (red→green: lift 34→0) and keyWindowInsetWinsOverProxyWhenPresent (slide-path regression guard).

    The probe stays in (Settings ▸ Diagnostic ▸ Hit-test probe — toggled live, no rebuild) as the device-side check: with it on, a cold-launch reading should now show lift: 0 and down/act/dict all incrementing on the first tap. If the dead tap ever recurs, the counters name the layer:

      • down=0 → the touch never reached the Button (a view ABOVE it — the pager pan recognizer, SwiftTerm, or a misplaced hit region — swallowed it);
      • down↑ but act=0 → press recognized, tap canceled before firing (a competing recognizer claimed the sequence);
      • act↑ but dict=0 → an early-return guard (inputHoldFired/clusterDidReposition) ate it;
      • dict↑ but no dictation → onInputTap/VoiceInjectController is the culprit.

    Device-only (no simulator repro of cold-launch touch timing); reproduce by kill → launch → first-tap, ×3–5.
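
The G1 fallback is a pure function, so it is easy to model outside the app. A minimal Python sketch, assuming a simplified lift model (proxy bottom inset minus the home-indicator share); the real keyboardLift math is not shown in this README, so keyboard_lift here is hypothetical and only reproduces the 34→0 behaviour the tests guard.

```python
def home_indicator_inset(window_inset: float, proxy_inset: float) -> float:
    """Prefer the key window's inset; fall back to the proxy when it reads 0."""
    return window_inset if window_inset > 0 else proxy_inset


def keyboard_lift(proxy_bottom: float, window_inset: float) -> float:
    """Hypothetical lift model: whatever bottom inset is left after the home indicator."""
    return max(0.0, proxy_bottom - home_indicator_inset(window_inset, proxy_bottom))


# Cold launch: no key window yet (window inset 0), keyboard down, proxy bottom = 34pt.
cold_launch_lift = keyboard_lift(proxy_bottom=34.0, window_inset=0.0)

# Keyboard up with a valid key window: the window read wins, lift is the remainder.
keyboard_up_lift = keyboard_lift(proxy_bottom=336.0, window_inset=34.0)
```

Under this model the pre-fix behaviour corresponds to subtracting a window inset of 0, which leaves the full 34pt as lift while the keyboard is down.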

(Render requirements R1–R5 are added with the single-column render fix.)

Requirements (regression checklist) — Share Extension (Feature 3, #5)

The extension hands media (images + videos) to the host through the App Group; the host reuses the in-app attach path (SessionManager.attachSharedMedia). Verify on device — the App Group container + share sheet don't work in the simulator.

  • S1 — Riff appears in the iOS share sheet for an IMAGE or a VIDEO (screenshot / photo / screen recording), and NOT for text/URLs (NSExtensionActivationSupportsImageWithMaxCount: 1 + NSExtensionActivationSupportsMovieWithMaxCount: 1, in BOTH project.yml and the authoritative RiffShare/Info.plist).
  • S2 — sharing media, then opening Riff, types the uploaded server path into the ACTIVE session's input line — identical to the in-app photo/video button (same uploadMedia → injectText path via SessionManager.attachSharedMedia). Photos arrive .jpg; videos arrive .mov/.mp4, full and playable (no transcode).
  • S3 — cold launch: media shared while Riff was terminated lands after bootstrap connects (not dropped; stays in the inbox until a session is .connected, re-drained on the connection-state change).
  • S4 — dedupe: shared media is injected exactly once, even across two quick foregrounds (in-flight Set<URL> guard + remove-on-success; <epoch>-<uuid>.<ext> filenames are unique).
  • S5 — cleanup: processed files are deleted from the App Group inbox; purgeStale removes anything older than 24h that never got processed.
  • S6 — minimal entitlements: RiffShare carries ONLY the App Group (group.mark.riff) — no APNs/audio/location — so first-build provisioning against the pre-registered mark.riff.share App ID succeeds.
  • S7 — the extension returns fast (completeRequest), no long compose UI; the host, not the extension, foregrounds and attaches.
  • S8 — right-session targeting: media injects into the LAST-ACTIVE session, not always base riff. Settings.lastActiveSessionName persists by name; the bootstrap restore reopens Riff on it.
  • S9 — size cap: a >200 MB video shows "Video too large…" and does NOT upload (RiffClient.exceedsUploadCap pre-check; server 413 backstop).
  • S10 — after sharing a photo/video, an "Open in Riff" notification appears; tapping it foregrounds Riff and the media attaches to the active session — no Shortcut, no manual reopen (ShareNotification.post from the extension; the tap is attributed to the host, where the .active drain attaches).
  • S11 — exactly-once across the tap: the media is injected once even though the tap and scenePhase == .active both fire a drain (in-flight Set<URL> + SharedImageInbox.newlyClaimable + remove-on-success).
  • S12 — graceful degradation: with notifications denied/undetermined, the deposit still lands and attaches on the next manual foreground — no hard dependency on the permission (post silently no-ops; the inbox + foreground drain still run).
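
The exactly-once and cleanup rules (S4, S5, S11) boil down to a claim set plus a stale sweep. A minimal Python sketch of that bookkeeping, not the Swift in SharedImageInbox: the class and function names are hypothetical, and file modification time is assumed as the age signal for the 24h purge.

```python
import time
import uuid
from pathlib import Path
from typing import Iterable, List, Optional, Set

STALE_AFTER_S = 24 * 60 * 60  # purge anything older than 24h (from S5)


def inbox_filename(ext: str, now: Optional[float] = None) -> str:
    """Build a unique <epoch>-<uuid>.<ext> name as described in S4."""
    epoch = int(time.time() if now is None else now)
    return f"{epoch}-{uuid.uuid4()}.{ext}"


class InboxClaims:
    """In-flight guard: a file is handed out once until it is finished."""

    def __init__(self) -> None:
        self.in_flight: Set[Path] = set()

    def newly_claimable(self, paths: Iterable[Path]) -> List[Path]:
        fresh = [p for p in paths if p not in self.in_flight]
        self.in_flight.update(fresh)
        return fresh

    def finish(self, path: Path, success: bool) -> None:
        self.in_flight.discard(path)
        if success:
            path.unlink(missing_ok=True)  # remove-on-success


def purge_stale(inbox: Path, now: Optional[float] = None) -> int:
    """Delete files older than the stale window; return how many were removed."""
    now = time.time() if now is None else now
    removed = 0
    for p in inbox.iterdir():
        if p.is_file() and now - p.stat().st_mtime > STALE_AFTER_S:
            p.unlink()
            removed += 1
    return removed
```

Two drains that fire close together (the notification tap and the foreground transition) both call newly_claimable; only the first gets the file, and a failed upload releases the claim so a later drain can retry.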

Getting started (any user)

A fresh install (distributed build) opens to a guided onboarding flow (Onboarding/OnboardingView.swift), not the terminal — because a distributed build ships with no host/user baked in, so dropping straight into a terminal would just fail to connect. The flow walks you through:

  1. What you need — a Mac with Remote Login on, Tailscale on this phone and that Mac, and tmux + claude installed on the Mac.
  2. Authorize this phone — copy the on-device SSH public key (or the whole echo '<key>' >> ~/.ssh/authorized_keys command) and run it on the Mac.
  3. Host & user — your Mac's Tailscale name (your-mac.tailXXXX.ts.net) and your macOS account short name (whoami).
  4. Connect — a one-tap test (SessionManagement.listSessions, a cheap SSH round-trip) proves auth + reachability, with actionable errors ("Key not authorized — did you paste the public key…", "Can't reach the host — is Tailscale on…"). On success you land in the terminal.

The onboarding writes the same UserDefaults keys Settings edits — it's just a guided first pass. Settings remains the canonical editor, and Settings → Re-run setup clears the completion flag to return to the flow (for users who change Macs). The voice button is an optional power-up (see Voice server quickstart); skipping it leaves the terminal fully functional.

Mark's own dev build pre-seeds his host/user/start-dir and marks onboarding complete (Settings.seedDevDefaultsIfNeeded, #if DEBUG), so he goes straight to the terminal. Only DISTRIBUTED (Release/TestFlight) builds ship empty defaults and run onboarding.

OpenClaw quickstart

Already running an OpenClaw box? You've done Riff's prerequisites — a Mac on a tailnet with claude installed. Riff is a native iOS client for that same self-hosted Claude box, in a real terminal instead of a messaging bridge. The 30-second version:

  1. On the Mac your OpenClaw runs on, confirm Remote Login is ON (sudo systemsetup -setremotelogin on) — OpenClaw itself doesn't require it, so this is usually the one missing piece.
  2. In Riff → Settings → copy the SSH public key; on the Mac run echo '<key>' >> ~/.ssh/authorized_keys.
  3. In Riff set Host = your Mac's Tailscale name (the same *.ts.net OpenClaw reaches it by) and User = your macOS short name (whoami).
  4. Connect. You're now driving the same claude install OpenClaw uses, but in a real terminal.

In the app, the onboarding's "Already self-hosting (OpenClaw, etc.)? Quick setup" button jumps straight to those steps.

Riff does not use or require OpenClaw — it talks to the standard Claude Code CLI over plain SSH. If you stop running OpenClaw, Riff keeps working. OpenClaw users are the target audience, not a runtime dependency: no Riff code path imports, shells out to, or assumes OpenClaw. (Optional voice button → see Voice server quickstart; the terminal alone is complete.)

Voice server quickstart (optional)

The voice button (Action Button → speak → transcript types into the live claude terminal) is the one feature that needs a server — a tiny riff_server on your own Mac that runs ElevenLabs Scribe v2 on your clip. The terminal needs none of this.

Minimum setup (verified against riff_server.py boot requirements):

  • Two env keys in ~/.env: RIFF_SHARED_SECRET (any 32-byte hex — openssl rand -hex 32; fatal-at-boot) and ELEVENLABS_API_KEY (your own ElevenLabs key; /riff/transcribe-only returns 503 without it, but the server still boots). No APNs keys or .p8 — those are for the shelved chat path.
  • Install: ./install.sh --voice-only on the Mac. This validates ONLY those two keys (skipping the APNs/.p8 checks that the default install — tuned to Mark's box — would otherwise wall a voice-only stranger on), installs the server under launchd on port 8902, and prints the secret.
  • Pair: paste that secret into Riff → Settings → Voice server (stored in the iOS Keychain) and set the same Host.

Full walkthrough: server/QUICKSTART.md.
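
The boot rules above (one fatal key, one optional key) can be sketched as follows. This is an illustration of the described behaviour, not the code in riff_server.py: the function names and the BootError type are hypothetical.

```python
import secrets
from typing import Dict, Mapping, Optional


class BootError(RuntimeError):
    """Raised when a fatal-at-boot setting is missing (hypothetical type)."""


def new_shared_secret() -> str:
    """32 random bytes as hex, equivalent in shape to `openssl rand -hex 32`."""
    return secrets.token_hex(32)


def load_voice_config(env: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """RIFF_SHARED_SECRET is required; ELEVENLABS_API_KEY may be absent."""
    secret = env.get("RIFF_SHARED_SECRET", "").strip()
    if not secret:
        raise BootError("RIFF_SHARED_SECRET is required")
    return {"secret": secret, "elevenlabs_key": env.get("ELEVENLABS_API_KEY") or None}


def transcribe_only_status(config: Mapping[str, Optional[str]]) -> int:
    """The server still boots without the ElevenLabs key, but transcription returns 503."""
    return 503 if config["elevenlabs_key"] is None else 200
```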

Voice is materially more setup than the terminal (a server process + an ElevenLabs account + a second secret). It's positioned as an optional power-up, not table stakes — the terminal alone is the product; voice is the differentiator on top.

Kept / shelved (the chat-era pieces)

| Piece | Disposition |
| --- | --- |
| RecordingViewModel + AudioFileWriter | KEPT, reused as-is. Capture half of voice-inject is unchanged; only the sink moved. stopAndTranscribe() is the new transcribe-only variant of stopAndSend(). |
| Action Button App Intent (RiffIntents.swift, RiffToggleBus, .riffToggle) | KEPT, receiver repointed. The intent/shortcut/bus/notification are unchanged; the .riffToggle handler moved from ChatViewModel.toggleVoice to VoiceInjectController.toggle() (TerminalScreen is the receiver; ContentView pulls the Terminal tab front). |
| Scribe v2 transcription (POST /riff/audio) | KEPT. A new POST /riff/transcribe-only shares the same Scribe helper but returns just {transcript} (no conversation write, no claude, no APNs). |
| ChatView / ChatViewModel / MessageStore | SHELVED (files compile, unlinked). Removed from the TabView; MessageStore is still injected by RiffApp so a re-link Just Works. ChatView's own .onReceive(.riffToggle) is dormant (not in the tree). |
| riff_server.py chat endpoints (/riff/message, /riff/conversation, conversation store, multi-turn replay) | SHELVED in place. Still run; the terminal never calls them. The shelved chat UI needs them if re-linked. |
| APNs push of replies | SHELVED. The terminal shows replies live; nothing to push. Registration code stays dormant. |
| RiffWidget (weather widget) | REMOVED (2026-05-28). Extracted into the standalone ~/tops/ app (its own project, bundle IDs, App Group group.mark.tops). Riff no longer ships a "Riff Weather" widget. The App Group group.mark.riff was retained: HostKeyStore (SSH known-hosts) and SharedImageInbox (RiffShare inbox) still use it. |
| HMAC shared-secret auth (X-Riff-Secret) | KEPT for /riff/transcribe-only (and everything else). |

Scope

Phase 1 (the whole iOS app): press the Action Button → Riff app launches into foreground (works while locked) → audio session opens in the background-audio mode → recording starts in onAppear → Mark talks for as long as he likes (no auto-send; pauses to think are free) → he taps Send (middle-left, thumb-reach) or presses the Action Button again → the phone uploads the recorded audio to the Mac mini over Tailscale → riff_server transcribes it via ElevenLabs Scribe v2 (server-side; the key never leaves the Mac) → server drops a riff poll event with the transcript and waits → reply comes back two ways in parallel: synchronous response body (instant in-app) + APNs push (readable on the lock screen if the phone has gone to sleep). No unlocking required at any point; the only button is Send.

The capture flow was reworked (2026-05-22): recording is manual-only (the old 1.2s silence auto-send was removed because it cut Mark off mid-thought and SFSpeechRecognizer had a hard ~60s on-device ceiling), and transcription moved off-device to ElevenLabs Scribe v2 for materially better accuracy. See Why server-side cloud STT below.

Phase 2 (watchOS): Apple Watch counterpart. Tap a complication or Smart Stack widget to start recording; transcript travels to Mac mini through the paired iPhone (or directly over Tailscale on a cellular Watch); reply lands as a Watch haptic + notification. Mark doesn't own a Watch yet — defer the build until hardware arrives. Xcode simulator won't help (no mic in sim).

Why no PushToTalk

An earlier draft proposed Apple's PushToTalk framework for true lock-screen hold-to-talk. Dropped because (1) the entitlement is gated on "VOIP communications" use cases and solo-AI-assistant apps historically get rejected, and (2) it doesn't actually solve a different problem from Phase 1. iOS does not surface Action Button press/release events to apps regardless of entitlement — the button is a "launch this thing" trigger, not a held-down switch. Phase 1's foreground-audio-session-with-silence-detect achieves the same workflow (talk while locked, reply on lock screen) without the Apple review risk.

Distribution (2026-05-24): now TestFlight-bound for the OpenClaw / self-host crowd. Riff was personal-only (Xcode-direct-to-device); it's now being prepared for distribution via TestFlight to people who already run a self-hosted Claude box (a Mac on a tailnet with claude). See Distributing Riff (TestFlight) and First-time setup. Still out of scope: cross-platform (Android), a hosted multi-tenant riff_server backend (so users needn't self-host the voice server), and a full public App Store release — the latter is gated on the hosted backend, because Apple's Guideline 2.1 (App Completeness) means a self-host-required app looks dead to a reviewer with no Mac mini. TestFlight does not face that bar the same way (testers are told to self-host), which is exactly why it's the right channel for the self-host era.

Architecture

iPhone (Chat tab / Action Button)
   │ type or press-to-talk
   ▼
┌─────────────────────────────────┐
│ Riff.app (SwiftUI chat client)  │
│  • ChatView: iMessage-style     │
│    bubble thread + compose bar  │
│    (mic-when-empty / ▲-when-text)│
│  • MessageStore: ordered thread │
│    persisted to a container JSON│
│  • voice: AVAudioEngine tap →   │
│    AAC/m4a file (mono 16kHz)    │
│  • signs body w/ shared key     │
└────────────────┬────────────────┘
        text │   │ voice
  POST /riff/message  POST /riff/audio (X-Riff-Conversation-Id)
   {conversation_id,   raw audio body, HMAC over the bytes
    text}              metadata in X-Riff-* headers
                 │ over Tailscale (no Funnel — tailnet only)
                 ▼
┌──────────────────────────────────────────────────────┐
│ Mac mini : riff_server.py:8902                        │
│  • verifies HMAC over the raw body                    │
│  • voice: POST audio → ElevenLabs Scribe v2 → text    │
│    (key never leaves the Mac)                         │
│  • _converse(): append the user turn to               │
│    conversations/<id>.jsonl, render the last 30 turns │
│    as a multi-turn event, drop it to the riff poll    │
│    session, append Claude's reply                     │
│  • also writes the legacy sessions/ record            │
│  • sends APNs push w/ reply                            │
└────────────────┬───────────────────────────────────────┘
                 │ APNs (HTTP/2)  +  HTTP reply body
                 ▼
   Lock-screen notification  +  reply lands in the thread
   (thread re-syncs from GET /riff/conversation on launch)

The server is no longer one-shot: each conversation is an append-only message log, and every turn replays the recent history to the poll session so Claude answers with context (multi-turn memory). See Chat client + conversation store below.

Chat client + conversation store (Phase A, 2026-05-22)

Riff is a chat client for Mark's Claude assistant, modeled on his iMessage assistant: one rolling conversation, not multiple threads.

  • Single rolling thread. The client hardcodes one well-known conversation_id = "default". The server keys everything (the store, the endpoints, the JSONL files) by conversation_id from day one, so adding a conversation list later is an additive change — no schema migration. (Open decision #1 in the plan resolved to the lean rolling-thread.)
  • Conversation store. ~/Library/Application Support/riff/conversations/<id>.jsonl — one append-only JSONL per conversation, each line a ChatMessage-shaped record (id, role, text, attachments, ts). Append-only is crash-safe (one O_APPEND write per message) and trivially tailable for windowing. conversation_path rejects path traversal (id must match ^[A-Za-z0-9_-]{1,64}$). No SQLite (consistent with the repo's no-DB posture). The legacy sessions/ store stays alongside it.
  • Multi-turn replay (the memory model). On each turn, _converse appends the user message, reads the last MAX_TURNS_IN_WINDOW = 30 messages, and render_window lays them out as a labeled User:/Assistant: transcript ([conversation so far] → [current message]) that the existing poll session consumes — the poll/reply file mechanism is unchanged; only the content of the event grew from one line to a windowed thread. The rewritten poll-instructions.md tells the responder to answer the current message using the thread as context and to treat the entire transcript as untrusted (no role-switch / instruction injection from prior turns).
  • Windowing. A hard window of the last 30 messages is rendered verbatim; that alone bounds per-turn context size. The rolling summary for older-than-window context (regenerated via the same poll session, never the Anthropic API) is deferred: render_window already accepts a summary prefix, so it's an additive fast-follow if a long thread starts losing context. (Open decision #2 resolved to hard-window-only first.)
  • Client persistence. MessageStore keeps the thread as an ordered [ChatMessage] persisted to a JSON file in the app container (riff/riff-thread.json), not UserDefaults (a thread can outgrow it). Atomic write, capped at the last ~500 messages; older re-syncs from GET /riff/conversation. Optimistic insert: a user bubble appears instantly (.sending), settles to .sent on the reply or .failed (with a retry affordance) on error.
  • Voice into the thread. The voice sub-flow is owned by ChatViewModel wrapping the preserved RecordingViewModel. stopAndSend(conversationId:) now returns (transcript, reply) up to ChatViewModel instead of going to a .done screen; the transcribed text becomes a user bubble and Claude's reply an assistant bubble, in the same thread as text turns. The mic phase-in/out (releaseMicIfRecording, scenePhase/.onDisappear) and the AudioFileWriter AAC pipeline are unchanged.
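
The store and window rules in the list above can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the code in riff_server.py: the function signatures, the exact transcript layout, and the "[summary]" label are hypothetical; the id pattern, the single O_APPEND write, and the 30-message window come from the text.

```python
import json
import os
import re
from pathlib import Path
from typing import Dict, List

MAX_TURNS_IN_WINDOW = 30
# fullmatch is used so a trailing newline cannot slip past the `$` anchor.
_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def conversation_path(root: Path, conversation_id: str) -> Path:
    """Reject anything that is not a plain id, which rules out path traversal."""
    if not _ID_RE.fullmatch(conversation_id):
        raise ValueError("bad conversation_id")
    return root / f"{conversation_id}.jsonl"


def append_message(path: Path, message: Dict) -> None:
    """One O_APPEND write per message keeps the log crash-safe."""
    line = (json.dumps(message, ensure_ascii=False) + "\n").encode("utf-8")
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        os.write(fd, line)
    finally:
        os.close(fd)


def read_window(path: Path, limit: int = MAX_TURNS_IN_WINDOW) -> List[Dict]:
    """Return the last `limit` messages, oldest first."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        lines = [ln for ln in f if ln.strip()]
    return [json.loads(ln) for ln in lines[-limit:]]


def render_window(messages: List[Dict], summary: str = "") -> str:
    """Lay out history plus the current message as a labeled transcript (layout assumed)."""
    *history, current = messages  # requires at least one message
    labels = {"user": "User", "assistant": "Assistant"}
    parts = [f"[summary]\n{summary}"] if summary else []
    parts.append("[conversation so far]")
    parts += [f"{labels.get(m['role'], m['role'])}: {m['text']}" for m in history]
    parts += ["[current message]", f"User: {current['text']}"]
    return "\n".join(parts)
```

Because the window is taken from the tail of the file, the per-turn event size is bounded by 30 messages regardless of how long the log grows.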

Never the Anthropic API. Both the chat reply and any future summary route through the existing poll session / run_claude only (per ~/CLAUDE.md).

Why server-side cloud STT (ElevenLabs Scribe v2)

Decision (2026-05-22): transcription happens server-side via ElevenLabs Scribe v2, not on-device. The phone records audio and uploads it; the Mac transcribes it. This replaced the original on-device SFSpeechRecognizer path.

Two on-device failures forced the swap:

  1. The ~60s on-device session ceiling. SFSpeechRecognizer caps a single one-shot transcription at ~1 minute, which broke the "unlimited-length dictation" goal outright.
  2. Accuracy on technical jargon. On-device Speech mangled terms like "Kalshi", "Hyperliquid", "git rebase" — the exact vocabulary Mark dictates most.

Scribe v2 is the best mainstream STT API (~2.2% WER; the only other serious candidate was OpenAI gpt-4o-transcribe at ~4.1%). The trade-off is explicit and accepted: an audio upload + a network STT round-trip (and the latency/cost that implies) in exchange for materially better accuracy and no on-device model-download UX. Confirmed batch model id: scribe_v2 (verified live against POST /v1/speech-to-text, 2026-05-22). The realtime variant is the distinct scribe_v2_realtime (not used). Overridable via ELEVENLABS_STT_MODEL in ~/.env; fallback id is scribe_v1.

Batch, not streaming (v1). The whole clip uploads on Send and the server transcribes it in one Scribe call, then runs Claude, then returns the reply. There is no live transcript while speaking — only the waveform + a "Recording…" indicator, then "Transcribing…", then the reply. The user already waits on Claude's answer, so adding the STT latency to a wait he already tolerates is a small marginal cost for a large simplicity win (one HTTP POST, no WebSocket relay, no partial-result reassembly).

Streaming deferred (Phase 5+, not built). If the missing live preview turns out to bother Mark in practice, add scribe_v2_realtime via a WS relay through riff_server (live partials streamed back over a WS the app holds open) as a follow-up. It is a materially bigger build (a phone↔server↔ElevenLabs WS relay) and a pricier model (~$0.39/hr vs ~$0.22–0.28/hr batch); spec it then, not now.

Privacy delta

Audio now leaves the device — to the Mac, then to ElevenLabs — whereas the old path transcribed on-device and only ~2KB of text left the phone. This is a deliberate, accepted trade for accuracy. The /riff/audio endpoint stays tailnet-only (no Funnel), so the audio→Mac leg never touches the public internet; only the Mac→ElevenLabs leg does (over TLS, exactly as newsfeed TTS already sends audio to the same account).

Why Tailscale, not Funnel

Funnel is internet-public; Riff would expose a microphone-attached endpoint to the world. Tailscale ACL keeps the endpoint reachable only from Mark's devices on the tailnet — the iPhone is already on it. That's the auth layer: device on tailnet = trusted. Layer a single shared HMAC secret on top so a stolen iPhone with the tailnet still joined can be revoked by rotating the secret server-side.

Reply path

APNs notification with the response truncated to ~250 chars (the iOS notification body limit). Tapping the notification opens the app to the chat thread, where the full reply is the latest assistant bubble (the thread re-syncs from GET /riff/conversation on launch).

For Mark's typical voice command ("what's on my calendar today", "send a text to Sarah saying I'm running late", "what did the kalshi calibration sweep find") the reply will fit in the notification.
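
Truncating the reply for the push body might look like the sketch below. The ~250 character figure is from the text; the function name, the whitespace collapsing, and the trailing ellipsis are assumptions about how the server could do it, not a description of what it does.

```python
APNS_BODY_LIMIT = 250  # approximate notification body budget from the text


def notification_body(reply: str, limit: int = APNS_BODY_LIMIT) -> str:
    """Collapse whitespace and cut the reply to fit, marking the cut with an ellipsis."""
    text = " ".join(reply.split())
    if len(text) <= limit:
        return text
    return text[: limit - 1].rstrip() + "…"
```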

Interface

iOS app

| Screen | Purpose |
| --- | --- |
| Chat | iMessage-style conversation thread (ChatView) — the primary surface. A scrolling bubble thread (user right/accent, assistant left/gray; oldest top, newest bottom; auto-scrolls to the newest). A compose bar that mirrors iMessage: a multiline-growing TextField, a mic button when the field is empty and a send (▲) when it has text. Tapping the mic transforms the bar into a live waveform + Recording… with send/cancel inline; the server transcribes the clip (no live transcript) and the spoken text becomes a user bubble. A typing indicator (three dots) shows while a turn is in flight; a failed send shows a red "tap to retry" on the bubble. Empty state: "Start a conversation". The thread persists across launches (MessageStore, a container JSON file) and re-syncs from GET /riff/conversation on launch. The mic phases in/out — held only while the Chat tab is active, released on background / tab-switch (so a podcast keeps playing). |
| ~~History~~ | Removed. The chat thread is the persistent history Mark scrolls. HistoryView.swift / SessionStore.swift remain in the repo (unlinked) for one release, then get deleted in a later cleanup. |
| Terminal (bottom bar) | The center Flying-V button: tap = one-shot dictation; long-press = enter/exit conversation mode (CallKit hands-free call — see Conversation mode). While in a call it shows a coral pulse. (Phase 3: "Hey Siri, call Riff" reaches the same mode.) |
| Terminal — + button | Tap = new session with the shared defaults (instant; claude/empty harness is byte-identical to before). Long-press + release on the button = Close Session (preserved exactly — bar floods red while held). Long-press + slide ≥44pt off the button opens the New Session… menu: New Session… → a card to set the launch directory and harness (a single free-text command, default claude), both pre-filled from the shared store and persisted back on Create; Close Session (destructive) closes the current one. Baked at create time via NewSessionSpec → RiffTmux.newDetached(cwd:harness:); a claude/empty harness create stays byte-identical. One shared store (riff.ssh.harness + riff.ssh.startDir), last-write-wins; Settings.recentStartDirs is a cwd-autocomplete affordance only. See New Session customization. |
| Settings | Tailscale endpoint, shared secret, notification/push diagnostics, "Send test (ping)". The iOS Settings.app pane (Settings.bundle/Root.plist) is grouped into bare sections (no per-setting footer descriptions — deliberately uncluttered): Connection (Host / User), Voice (auto-submit dictation / Narrate Claude's replies, riff.voice.narrate, default OFF, see Narration (output voice)), Session (Harness: riff.ssh.harness, a single free-text command, default claude, the command a +-tap launches, shared with the double-tap sheet; plus Directory: riff.ssh.startDir), and Claude (skip-permissions / Worktree: default off, see Worktree / Effort: free-text, default max / Tokens: auto-compact window, default 200000, blank = Claude's default). Applied at NEW-session create time. |

Mac mini server

POST /riff/message: chat text path (Phase A). JSON body {conversation_id, text, device_id}, HMAC over the bytes (stays under MAX_BODY = 8 KB). Routes through _converse: appends the user turn to conversations/<id>.jsonl, renders the last MAX_TURNS_IN_WINDOW = 30 messages as a multi-turn event, drops it to the riff poll session, appends Claude's reply, writes the legacy sessions/ record, fires APNs. Returns {reply, conversation_id, message_id, ts_reply}. Failure modes: empty text → 400; bad/traversal conversation_id → 400; bad HMAC → 401; Claude timeout → 504 (the timeout reply is still appended to the thread).

GET /riff/conversation?conversation_id=default&limit=200 — the conversation's messages oldest-first ({messages:[…], conversation_id}), so the client syncs the thread on launch. limit (default 200, max 1000) returns the last N. Bad/traversal id → 400.
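
The limit handling for this endpoint can be sketched as a clamp plus a tail slice. The default of 200 and the maximum of 1000 are from the text; the function names, and treating a missing or unparseable value as the default, are assumptions.

```python
from typing import List, Optional

DEFAULT_LIMIT = 200
MAX_LIMIT = 1000


def parse_limit(raw: Optional[str]) -> int:
    """Clamp the query parameter into [1, MAX_LIMIT]; fall back to the default."""
    if raw is None or raw == "":
        return DEFAULT_LIMIT
    try:
        n = int(raw)
    except ValueError:
        return DEFAULT_LIMIT  # assumption: garbage falls back rather than erroring
    return max(1, min(n, MAX_LIMIT))


def last_messages(messages: List[dict], limit: int) -> List[dict]:
    """Return the last N messages, still oldest-first."""
    return messages[-limit:]
```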

GET /riff/narrate-poll?after=<epoch_float>&cwd_slug=<optional>: output narration long-poll. The terminal's Claude REPL has no reply-text channel back to the app (the byte stream is a repainting TUI — unspeakable). So this endpoint tails the session transcript JSONL (~/.claude/projects/<cwd-slug>/<session>.jsonl) instead of scraping the terminal: it locates the active transcript (newest-mtime *.jsonl, constrained to cwd_slug if given, else global-newest — robust to the Worktree feature's different slug and to a mid-session cd), finds the latest completed assistant turn newer than after (a line with type=="assistant", not isSidechain, message.stop_reason=="end_turn", and a non-empty text block — interim "let me check…" preambles and thinking blocks are skipped, so each turn is spoken once), strips it to speakable prose (drops code fences, inline code, URLs, file paths, #/@ refs, markdown markers; caps at NARRATE_MAX_CHARS = 1200), and synthesizes MP3 via ElevenLabs TTS. Holds up to NARRATE_POLL_HOLD_S = 25 s polling every 0.4 s. Returns 200 {ts, audio_b64, chars} (base64 MP3) on a hit, 204 if no new turn within the hold window (the client immediately re-polls with the same after), 503 if ELEVENLABS_API_KEY is absent at boot, 502 on a TTS non-200/timeout. HMAC over an empty body (a GET — the client signs Data(), like /riff/sessions and /riff/health). Because it returns only the single latest turn newer than the cursor, a backlog (two fast turns) collapses to "speak only the latest". Gated client-side by the riff.voice.narrate toggle (default OFF). See Narration (output voice).
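
The turn-selection and stripping rules for narration can be sketched as below. This is a simplified Python illustration, not the server's code: the function names are hypothetical, records are assumed to arrive as (timestamp, parsed JSON line) pairs with the text under message.content blocks of type "text", and the stripping covers only code fences, inline code, URLs, and markdown markers (file paths and #/@ refs are left out for brevity).

```python
import re
from typing import Dict, Iterable, Optional, Tuple

NARRATE_MAX_CHARS = 1200


def turn_text(rec: Dict) -> str:
    """Join the text blocks of a transcript record (block shape is an assumption)."""
    content = (rec.get("message") or {}).get("content") or []
    if isinstance(content, str):
        return content
    return "\n".join(
        b.get("text", "") for b in content if isinstance(b, dict) and b.get("type") == "text"
    )


def is_completed_turn(rec: Dict) -> bool:
    """A finished main-thread assistant turn with something to say."""
    if rec.get("type") != "assistant" or rec.get("isSidechain"):
        return False
    if (rec.get("message") or {}).get("stop_reason") != "end_turn":
        return False
    return bool(turn_text(rec).strip())


def latest_completed_turn(
    records: Iterable[Tuple[float, Dict]], after: float
) -> Optional[Tuple[float, str]]:
    """Scan oldest-first and keep only the newest qualifying turn past the cursor."""
    hit = None
    for ts, rec in records:
        if ts > after and is_completed_turn(rec):
            hit = (ts, turn_text(rec))
    return hit


def speakable(text: str) -> str:
    """Reduce markdown to plain prose and cap the length."""
    text = re.sub(r"```.*?```", " ", text, flags=re.S)  # fenced code
    text = re.sub(r"`[^`]*`", " ", text)                # inline code
    text = re.sub(r"https?://\S+", " ", text)           # URLs
    text = re.sub(r"[*_#>]+", " ", text)                # markdown markers
    return " ".join(text.split())[:NARRATE_MAX_CHARS]
```

Returning only the newest hit is what makes a backlog of fast turns collapse to a single spoken reply.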

POST /riff/audio: voice capture path. Body: the raw audio bytes (AAC/m4a), HMAC-signed over the exact bytes. Per-request metadata travels in headers: X-Riff-Device-Id, X-Riff-Session-Id, X-Riff-Ts, X-Riff-Audio-Format (m4a/wav), and optionally X-Riff-Conversation-Id. The server POSTs the audio to ElevenLabs Scribe v2 (multipart file + model_id, xi-api-key header), gets the transcript, then:

  • with X-Riff-Conversation-Id → routes the transcript through the same multi-turn _converse path as /riff/message (voice and text share memory), returning {reply, conversation_id, message_id, ts_reply, transcript} (the spliced-in transcript lets the client render the spoken user bubble without a refetch);
  • without the header → keeps its original one-shot behavior (writes only a sessions/ record, no conversation append), returning {reply, session_id, ts_reply} as back-compat for any caller on the old path.

Body cap MAX_AUDIO_BODY = 25 MB (≈30 min mono-16kHz AAC), enforced per-path so text endpoints stay tight at MAX_BODY = 8 KB. Failure modes: empty body → 400; bad HMAC → 401; over 25 MB → 413; Scribe timeout/error → 502 (error session record written; Claude not called with an empty transcript); Claude timeout → 504; ELEVENLABS_API_KEY missing at boot → 503 (STT not configured). Scribe leg bounded by SCRIBE_TIMEOUT_S = 30, Claude leg by CLAUDE_TIMEOUT_S = 60 (worst-case server-held time ≈90 s).
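The per-path body cap might look like the sketch below. It is an assumption-laden illustration: body_status is a hypothetical helper, the byte values assume binary units (1 KB = 1024 bytes), and the 413 for oversized text bodies is assumed, since the spec states 413 only for /riff/audio.

```python
MAX_BODY = 8 * 1024                # text endpoints (assumes 1 KB = 1024 bytes)
MAX_AUDIO_BODY = 25 * 1024 * 1024  # /riff/audio only (assumes 1 MB = 1024 * 1024 bytes)


def body_status(path: str, body_len: int) -> int:
    """Return 200 if a POST body may proceed, else the HTTP status to reject it with."""
    if body_len == 0:
        return 400  # empty body
    cap = MAX_AUDIO_BODY if path == "/riff/audio" else MAX_BODY
    if body_len > cap:
        return 413  # stated for /riff/audio; assumed for the text paths
    return 200
```

Choosing the cap from the path before reading the body keeps the large allowance confined to the one endpoint that needs it.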

POST /riff/transcript — body: {transcript, ts, device_id, session_id} → drops a riff poll event carrying the transcript, waits for the session's reply file, returns {reply, session_id}, fires APNs push. Kept for tests / back-compat / any future text client; the iOS app uses /riff/message and /riff/audio instead. This path is one-shot (no conversation memory).

GET /riff/sessions?since=<id> — paginated list of past sessions (legacy + chat). Each chat turn also writes a sessions/ record (now tagged with conversation_id), so this endpoint stays meaningful.

GET /riff/health — heartbeat.

All endpoints require an X-Riff-Secret: <hmac> header. Unauthenticated requests → 401.
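A minimal sketch of signing and checking that header, in Python. The spec says only "HMAC over the bytes"; the SHA-256 digest and hex encoding used here are assumptions, and sign / verify are hypothetical helper names.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes = b"") -> str:
    """Hex HMAC of the exact request bytes. GET requests sign the empty body."""
    # Assumption: SHA-256 + hex. The spec does not name the digest or encoding.
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, header_value: str) -> bool:
    """Constant-time comparison against the X-Riff-Secret header value."""
    return hmac.compare_digest(sign(secret, body), header_value)
```

Signing the raw bytes (rather than a re-serialized JSON object) matters: any re-encoding on either side would change the digest and produce a spurious 401.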

Action Button configuration — Riff App Shortcut (App Intent)

The Action Button is bound to the Riff App Shortcut — an AppIntent (RiffToggleIntent) auto-registered via AppShortcutsProvider (RiffShortcuts in RiffIntents.swift). iOS runs the intent's perform() in the app's process (foregrounding it via openAppWhenRun = true), which re-runs deterministically on every press. perform() posts the existing .riffToggle notification → ChatView calls vm.toggleVoice() and ContentView forces the Chat tab to the front. The toggle contract:

  • Not recording (idle / error / done / permission) → launch + start recording.
  • Already recording → send (only if audio was captured; a stray press on a silent session is a no-op).
  • Already sending → no-op (a send is in flight).

There is no auto-stop: recording runs through arbitrarily long pauses and ends only on a Send tap, the Action Button (a second press = send), or Cancel.
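The toggle contract above reduces to a small decision table. The sketch below expresses it in Python for illustration only (the app itself is Swift); VoiceState and on_toggle are hypothetical names, and the state list mirrors the ones named in the contract.

```python
from enum import Enum


class VoiceState(Enum):
    IDLE = "idle"
    ERROR = "error"
    DONE = "done"
    PERMISSION = "permission"
    RECORDING = "recording"
    SENDING = "sending"


def on_toggle(state: VoiceState, has_audio: bool) -> str:
    """Map one Action Button press to an action name: 'start', 'send', or 'noop'."""
    if state is VoiceState.SENDING:
        return "noop"  # a send is already in flight
    if state is VoiceState.RECORDING:
        return "send" if has_audio else "noop"  # stray press on a silent session
    return "start"  # idle / error / done / permission
```

Because the function depends only on the current state and whether audio was captured, the same press always yields the same action, which is the determinism the App Intent path relies on.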

Why an App Intent, not the old riff://toggle URL (fix, 2026-05-22): the URL scheme's .onOpenURL does not reliably re-fire when the app is already foregrounded, so a second Action-Button press (the send) did nothing once Riff was open — the confirmed second-press-to-send bug. An App Shortcut runs perform() in-process every press regardless of foreground state, so start → send is deterministic. The intent reaches the live ChatViewModel through the same .riffToggle notification Phase A already wired (a tiny RiffToggleBus posts it), so the intent and the retained URL fallback share one path.

Set up the Action Button (one-time):

  1. Settings → Action Button → swipe to Shortcut → tap Choose a Shortcut.
  2. Pick the auto-registered Riff shortcut ("Toggle Riff" / "Riff Toggle Recording"). It appears automatically after install; there is no manual Shortcut to build (unlike the old "Open URLs riff://toggle" flow this replaced).

That's it — no Shortcuts-app authoring step.

URL scheme retained as a fallback. riff://toggle (CFBundleURLTypes + RiffApp.onOpenURL) is kept as a zero-cost documented fallback; it posts the same .riffToggle notification, so it's a safety net if the App Shortcut ever needs re-adding. The App Intent is the primary Action-Button binding.

Locked-screen Face ID requirement (OS limit — documented, not engineered around). iOS requires a Face ID / passcode unlock to launch Riff from a locked screen, even via the App Intent (confirmed on-device 2026-05-22 — same as the URL scheme behaved). This is an OS boundary, not a Riff bug, and there is no Lock Screen widget / Control Center hack that bypasses the unlock for a foregrounding app. The App Intent's win is reliability when the phone is already unlocked / the app is foregrounded — which is exactly the broken second-press case it fixes.

Requested features — working checklist

Live backlog of Mark's requests (this batch opened 2026-05-27). Every actionable item ships with a test per Tests & the ship gate (global CLAUDE.md) — feature or bug — and is checked against this list before shipping. Status: [ ] queued · [~] in progress · [x] done.

  • [x] Both guitar marks version-controlled: FlyingVShape.swift (in-app) + icon-1024.png (app icon, had been gitignored by *.png). Done 32b987b.
  • [x] One guitar mark, one source of truth — the in-app glyph and the app icon now derive from the SAME Swift source (FlyingVShape + the new RiffMark style view); the icon PNG is rendered from it (no longer hand-exported), so the fins-rounded look and stroke/dot proportions can't drift between surfaces. See Assets — the guitar mark (single source of truth). RiffMarkRenderTests + the FlyingVShapeTests single-source guards. Pending your on-device icon look.
  • [x] Dot in the in-app guitar — option A, baked into FlyingVShape (filled FlyingVDot layer) + the icon; tested (FlyingVShapeTests). 05b6a1c. Live in build 130 — pending your on-device look.
  • [x] Keyboard rise/fall animation — terminal + button bar track the keyboard in true LOCKSTEP. body wraps the stack in a GeometryReader, reads the keyboard's live safeAreaInsets.bottom each frame (→ keyboardLift), holds the stack full-height (size.height + lift, constant → no SwiftTerm reflow), .ignoresSafeArea(.keyboard) + .offset(y: -lift). No separate animation to diverge. Build 144, confirmed on-device "works very well". KeyboardLiftTests. (Dead-ends 139–143: manual .offset+.easeOut double-lifted / overtook the keyboard.)
  • [x] Dictation: duck other audio, don't stop it, don't reroute it — the real ask (the "full spectrum" wording was a misread). Recording used mode .measurement, which grabs the audio route exclusively → a podcast/music STOPPED when dictation started; switched to .default so other audio ducks + resumes. a55351e, build 133. (The rainbow LED I'd built was reverted, LED left as-is — 62b4fbe.) Build 158: also dropped .defaultToSpeaker + added .allowBluetoothA2DP so the ducked audio stays on AirPods instead of rerouting to the phone speaker (the side effect Mark hit). The intended non-call workflow: press → waveform → dictate → press to stop; other audio ducks (not stops), output unchanged, built-in mic (never grabs the AirPods HFP mic). Option set lives in RecordingViewModel.dictationCategoryOptions, guarded by DictationAudioOptionsTests.
  • [~] Auto-compact "Context limit" — root-caused + Riff-side fix done 2026-05-27 (pending Mark's confirm it fires). My earlier "NOT a Riff bug" call was WRONG (twice: first blamed value/semantics, then tmux env propagation). Ground truth from the 2.1.153 binary: CLAUDE_AUTOCOMPACT_PCT_OVERRIDE is parsed by jd8() into an internal testPctOverride field — it does NOT drive user-facing auto-compact. The real lever is CLAUDE_CODE_AUTO_COMPACT_WINDOW (a TOKEN count), read by Kl(); help text: "…takes precedence. Auto-compact summarizes the conversation when context usage approaches this limit." So the pct var was inert everywhere — the =20 on every claude process (Riff, poll, digest alike) came from Mark's ~/.zshrc:125 export (global), NOT from Riff. Two fixes: (1) Mark's ~/.zshrc:125 → change CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=20 to CLAUDE_CODE_AUTO_COMPACT_WINDOW=200000 (token budget; fixes laptop + Riff-by-inheritance + digest at once — new sessions only). (2) Riff ClaudeArgs.env() now emits CLAUDE_CODE_AUTO_COMPACT_WINDOW=<contextTokens> straight through (no ÷1M pct math); AutocompactArgsTests rewritten to lock the token var. CAVEAT: no Settings UI binds contextTokens yet, so Riff's own path is latent — the zshrc line is the live lever today (a Settings field is the obvious follow-up). Verify on device: /context shows "Auto-compact window: N tokens (from CLAUDE_CODE_AUTO_COMPACT_WINDOW)".
  • [x] Preserve shared-image format (stop JPEG compression) — original bytes pass through the picker, clipboard, and share extension, labeled by a magic-byte sniff (Shared/ImageFormat.swift); lossless PNG only as a last resort, never JPEG. Tested (ImageFormatTests). f348789. Live in build 131.
  • [x] Dictation button must not touch the keyboard — removed the record-start dismissKeyboard() in onInputTap; the mic only starts/stops recording now. Build 145.
  • [x] Dictation waveform shows (confirmed by Mark on build 152) — long saga, two real bugs: (1) my build-144 keyboard refactor dropped the voice.spectrum dependency through the GeometryReader/computed-property, freezing the LEDGrid (broke 144/147/149; 146 "worked" only because a debug overlay read voice there) → fixed build 150 with an invisible reactivity anchor that reads voice in the keyboard closure (see [[reference_riff_waveform_reactivity_anchor]]); (2) build 150 unfroze the UI, revealing the true blocker — setActive(true) intermittently throws "Session activation failed" so recording never starts → fixed build 151 (deactivate + retry once on activation failure). The wrongly-added stale-engine guard (147) was reverted (148).
  • [x] Dictation transcript ends with a trailing space: inject() writes trimmed + " ". Tested (VoiceInjectTests). Build 148.
  • [x] Clickable hyperlinks — literal markdown links [label](url) are now tappable anywhere on their span, not just the bare-url substring. handleLinkTap falls back to markdownLinkURL(in:atColumn:) (scans the tapped line, built from getCharData in visible-row coords) when SwiftTerm finds no explicit/implicit link. OSC-8 + bare URLs keep working via the existing term.link(at:.explicitAndImplicit) path. Tested (MarkdownLinkTests). Build 149.
  • [x] Repositioning the cluster shouldn't press the buttons (confirmed by Mark on build 152) — sliding the cluster slid the photo / Flying-V buttons under the finger, so their own gestures saw ~0 movement and misfired as taps (accidental dictation / photo picker). Added a clusterDidReposition flag set when the cluster actually slides + gating the photo and Flying-V actions; cleared on the next runloop so the same-event button release stays suppressed. (sessionButton was already immune — it freezes the cluster on touch.) Gesture timing → on-device verification.
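The markdown-link fallback from the Clickable hyperlinks item can be illustrated with a column hit-test over one line of text. This is a Python sketch of the idea only: the regex is an assumption, markdown_link_url is a stand-in for the Swift markdownLinkURL(in:atColumn:), and it treats the column as a plain string index (wide-character terminal cells are ignored).

```python
import re

# Assumption: a simple [label](url) pattern with no nested brackets or spaces in the URL.
_MD_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def markdown_link_url(line: str, column: int):
    """Return the URL of the [label](url) span that covers `column`, else None."""
    for match in _MD_LINK.finditer(line):
        if match.start() <= column < match.end():
            return match.group(2)
    return None
```

Hit-testing the whole match span, from the opening bracket to the closing parenthesis, is what makes the label tappable and not just the bare URL substring.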

Cleared 2026-05-27 (per Mark — not pursuing)

  • App icon de-neon/flat guitar (a561a50, build 135 — shipped, cosmetic final-look approval dropped).
  • Center the pedalboard (12d924b, build 132 — shipped the padding bump; not chasing the terminal-gap further).
  • Decouple weather widget (full standalone split): DONE (2026-05-28). The widget + its LocationCache were extracted into a standalone app at ~/tops/ (own project, bundle IDs mark.tops / mark.tops.TopsWidget, App Group group.mark.tops) and removed from Riff: RiffWidget/, Riff/LocationCache.swift, and RiffTests/WeatherLogicTests.swift are gone, the RiffWidget target + dependency dropped, and NSLocationWhenInUseUsageDescription removed (Riff no longer uses CoreLocation). The App Group group.mark.riff was retained for HostKeyStore + SharedImageInbox. The new app's first device install needs a one-time Xcode-GUI profile mint (new App IDs can't be created headlessly); see ~/tops/README.md ▸ Signing & first install. Riff's next OTA drops the widget.

Assets — the guitar mark (single source of truth)

The Riff brand mark (the Flying-V silhouette) has one source: the Swift geometry in ios/Riff/FlyingVShape.swift (FlyingVShape + FlyingVDot) plus the shared style in ios/Riff/RiffMark.swift. Both surfaces render RiffMark, so they can't drift:

  • In-app glyph (the center-button mark, TerminalScreen.swift): RiffMark(ink: …).frame(width: 30, height: 36) — transparent, tight bbox fit (the silhouette fills the frame). ink is white idle, conversationCoral in-call.
  • Home-Screen app icon (Assets.xcassets/AppIcon.appiconset/icon-1024.png): rendered by RiffMark(drawBackground: true) — opaque #18160f background, the padded authoring-box framing (the full 512 author box maps into the 1024 square, so the V keeps its natural margin and does not touch the edges, matching the original icon).

Stroke and dot are sized in 512 authoring units (RiffMark.strokeAuthor ≈ 13.4, FlyingVDot r = 14) and multiplied by FlyingVShape.fitScale(in:), so the SAME author weight reproduces ~1.2pt over the 30×36 glyph and ~26.8px over the 1024 icon. The two fins (vertices 5/7) are rounded via FlyingVShape.finRound = 26; the headstock tip and inner notch stay sharp.
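The scaling arithmetic can be checked with a few lines of Python. This is a worked illustration, not the Swift fitScale(in:) itself: fit_scale is a hypothetical stand-in, 13.4 is the approximate strokeAuthor quoted above, and the glyph scale is back-derived from the quoted ~1.2pt stroke because the silhouette's bounding box is not given here.

```python
AUTHOR_BOX = 512.0        # authoring units
STROKE_AUTHOR = 13.4      # RiffMark.strokeAuthor (approximate, per the text)
DOT_RADIUS_AUTHOR = 14.0  # FlyingVDot r


def fit_scale(content_w: float, content_h: float, frame_w: float, frame_h: float) -> float:
    """Uniform scale that fits content inside a frame without distortion."""
    return min(frame_w / content_w, frame_h / content_h)


# Icon: the full 512 author box maps into the 1024 square, so the scale is 2.
icon_scale = fit_scale(AUTHOR_BOX, AUTHOR_BOX, 1024.0, 1024.0)
icon_stroke_px = STROKE_AUTHOR * icon_scale          # about 26.8 px
icon_dot_radius_px = DOT_RADIUS_AUTHOR * icon_scale  # 28 px

# Glyph: the scale implied by the quoted ~1.2pt stroke (tight bbox fit in 30x36).
glyph_scale_implied = 1.2 / STROKE_AUTHOR            # roughly 0.09
```

Since both surfaces multiply the same author-unit constants by their own fit scale, changing strokeAuthor once moves the glyph and the icon together.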

The icon PNG is NOT hand-maintained — it is rendered from the Swift source:

cd ios && ./render-icon.sh        # rebuild + run RiffIconGen → overwrites icon-1024.png
git diff --stat                   # should show only icon-1024.png changed
# review the PNG, then commit it together with any FlyingVShape/RiffMark change

render-icon.sh runs xcodegen generate, builds the RiffIconGen macOS type: tool target (a CLI tool — no code-signing/provisioning, so it runs headless over SSH; CODE_SIGNING_ALLOWED=NO), and runs it. The tool compiles the SAME FlyingVShape.swift + RiffMark.swift as the app, rasterizes via ImageRenderer at 1024 px (AppKit NSHostingView fallback if cgImage is nil), and writes an opaque 8-bit RGB (no-alpha) PNG to match the asset's format. The RiffIconGen scheme is separate from Riff, so ./test.sh (iOS) never builds it.

Rule: after editing FlyingVShape.swift or RiffMark.swift, re-run render-icon.sh and commit the regenerated icon-1024.png in the same change. The ~/www/riff-icon-*.html / riff-fin-rounding.html pages are previews only (annotated as such) — they re-encode the geometry by hand and ship nothing.

Testing

A simulator-only Swift Testing harness (ios/RiffTests, target RiffTests) — no physical device, no real SSH, no Mac tmux, no network, no ElevenLabs. Built with Swift Testing (import Testing, @Test/#expect), not XCTest; the toolchain (Xcode 26.4.1 / Swift 6.3.1) ships Testing.framework for the iOS simulator and xcodebuild test auto-discovers @Test in the bundle.unit-test target. The bundle is hosted in Riff.app (TEST_HOST/BUNDLE_LOADER) so @testable import Riff resolves and the geometry test gets a real UIWindow.

Policy (per global CLAUDE.md ▸ Tests & the ship gate): every change adds a test here — new features and bug fixes alike. A bug-fix test is red without the fix, green with it; a feature test exercises the new behavior. Added in the same change. And ./test.sh must pass before shipping any build (/riff-ota, /riff-update, /riff-publish) — a green suite is a deploy precondition.

What's covered

  • Logic (SessionManagerTests, SessionControllerTests, LinkNormalizationTests): session naming/ordering (base first, deduped, stable), nextSessionName reuse of a freed middle name, the soft cap, page clamp + active-input-dirty sync, the setGeometry degenerate guard + fan-out to every session, closeCurrent navigation (left / clamp-to-zero / recreate base on last close), input-dirty tracking (printable → dirty, CR/LF → clean, escape sequences ignored), the scroll-wheel SGR byte sequences (ESC[<64;1;1M up / ESC[<65;1;1M down, capped at 8 ticks), the net scroll-depth / copy-mode-state model (scrollState: wheel-up arms, wheel-down floored at 0, depth==0 ⇒ back at the live bottom so no stray q — the scroll-up-then-down case), the stray-q leak guard (scrolledUpCancelIsTheSafeKeyNeverBareQ, desyncedScrollDepthStillEmitsSafeCancelNeverBareQ: the emitted copy-mode cancel is the tmux-bound safe key F12 = \u{1b}[24~, never a bare q/0x71, even when the scroll counter has desynced past the live bottom), and link normalization. Driven through the public surface with an injected MockSessionManagement (canned, synchronous — no SSH) plus a RecordingTransport (captures emitted bytes).
  • Geometry G0 (PagerGeometryTests): hosts PagerHostVC in a UIWindow with 2–3 unconnected mock sessions and asserts the Auto-Layout-pin invariant — every page fills the pager bounds and every terminal fills its page — at BOTH a keyboard-down (393×852) and a keyboard-up (393×516) window height.
  • Input routing (PagerGeometryTests.firstResponderFollowsActivePageOnCommit): with the keyboard up, committing a swipe must hand FIRST RESPONDER to the incoming session (SwiftTerm routes typed bytes there) — guards the build-129 "typing always goes to the first session" bug. Drives commitPageChange directly, so no pan synthesis is needed.
  • Guitar mark single source of truth (FlyingVShapeTests, RiffMarkRenderTests — see Assets — the guitar mark): the two fins round while the rest stays sharp (finsAreRoundedButTheRestStaysSharp), fitScale is the bbox uniform scale, the shared strokeAuthor reproduces the 1.2pt in-app glyph weight, the stroke scales with the mark (icon ≈ 26.8px), the icon framing keeps the authoring margin (no bbox-fill), and the icon consumes the SAME shape + dot as the glyph. The render smoke (RiffMarkRenderTests) rasterizes RiffMark via ImageRenderer on the sim and asserts a 1024×1024 NON-blank image (a real mix of #18160f bg + white ink) — the in-sim twin of what the RiffIconGen tool ships, with no repo write/signing.
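The scroll-depth model exercised by the Logic tests above can be sketched as follows. This is a Python illustration of the described behavior, not the Swift SessionController: the ScrollModel class and its method names are hypothetical, while the byte sequences, the 8-tick cap, and the F12 safe-cancel key come from the test descriptions.

```python
ESC = "\x1b"
WHEEL_UP = f"{ESC}[<64;1;1M"    # SGR mouse wheel-up
WHEEL_DOWN = f"{ESC}[<65;1;1M"  # SGR mouse wheel-down
SAFE_CANCEL = f"{ESC}[24~"      # F12, the tmux-bound copy-mode cancel
MAX_TICKS = 8


class ScrollModel:
    """Tracks net scroll depth so a cancel key is sent only while scrolled up."""

    def __init__(self) -> None:
        self.depth = 0

    def wheel(self, ticks: int) -> str:
        """Positive ticks scroll up, negative scroll down. Returns the bytes to send."""
        n = min(abs(ticks), MAX_TICKS)
        if ticks > 0:
            self.depth += n  # wheel-up arms copy-mode
            return WHEEL_UP * n
        self.depth = max(0, self.depth - n)  # floored at the live bottom
        return WHEEL_DOWN * n

    def cancel_before_input(self) -> str:
        """Emit the safe cancel key if scrolled up. Never emits a bare 'q'."""
        if self.depth == 0:
            return ""  # already at the live bottom: nothing to cancel
        self.depth = 0
        return SAFE_CANCEL
```

Using a key that tmux binds explicitly, and not q, means that even a desynced counter can at worst send a harmless F12 into the REPL instead of typing a stray character.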

How to run

cd ios && ./test.sh                                  # all tests, default sim
SIM='iPhone 17' ./test.sh                            # override the simulator
./test.sh -only-testing:RiffTests/PagerGeometryTests # forward extra args
make test                                            # Makefile parity

CI / copy-paste (the raw command test.sh runs):

xcodebuild test -project ios/Riff.xcodeproj -scheme Riff \
  -destination 'platform=iOS Simulator,name=iPhone 17 Pro'

test.sh runs xcodegen generate first (so the target exists on a fresh checkout — Riff.xcodeproj is gitignored) and depends on no output formatter (xcbeautify/xcpretty are not installed). A clean run includes a SwiftTerm + swift-nio-ssh build and a simulator boot — it is multi-minute.

The honest limit (load-bearing). The original builds-114→117 bug was a SwiftUI-hosting timing failure — the UIViewControllerRepresentable embedding missed the keyboard-driven layout pass that re-ran the manual framing code, collapsing the terminal to a ~12-row sliver. PagerGeometryTests guards the framing math + the Auto-Layout-pin invariant (given a layout pass, the terminal fills its page at any bounds) — it cannot reproduce the timing bug, because a unit test that calls layoutIfNeeded() itself forces the very pass that was being missed. The complementary on-device guard is a #if DEBUG assert in PagerHostVC.layoutPages (deferred one runloop tick so it reads settled bounds, gated on !settling && axis == .undecided) that trips if the visible terminal doesn't fill the pager — stripped in Release, so it only ever fires in a DEBUG/dev build on device, exactly where a regression would surface.

Future: XCUITest end-to-end (NOT built). True coverage of the keyboard-up render needs an XCUITest launching the app with SSHTransport swapped for a scripted in-process MockTransport (feeds canned bytes, never touches the network), selected via a launch arg / env (e.g. RIFF_TEST_MODE=1) read in RiffApp/SessionController. TerminalTransport is already the seam; what's missing is the launch-arg selection + a deterministic mock + XCUITest flake-management. It would raise the keyboard, screenshot, and assert the terminal still fills the area above the keyboard. Materially larger than this harness — deferred until the geometry bug recurs in a way the DEBUG assert doesn't catch. (XCUITest is the only reason XCTest would re-enter this repo, and only for that one tier.)

SourceKit noise. In-editor diagnostics here lie — "No such module UIKit"/"No such module SwiftTerm"/"cannot find type" are all false positives. A phase is green only when xcodebuild test exits 0 and prints ** TEST SUCCEEDED **, never on the absence of editor squiggles.

Repo layout

riff/
├── README.md                       # this file (spec + install + workflow)
├── install.sh                      # Mac mini server install + ios bootstrap
├── ios/
│   ├── project.yml                 # XcodeGen manifest (Riff + RiffShare + RiffTests iOS targets + RiffIconGen macOS tool; Riff + RiffIconGen schemes)
│   ├── Makefile                    # `make project`, `make sim`, `make sim-run`, `make test`
│   ├── test.sh                     # run the RiffTests suite on the simulator (SIM= override; forwards args)
│   ├── render-icon.sh              # render icon-1024.png from the Swift source via the RiffIconGen tool (headless, no signing) — see Assets
│   ├── Riff.xcconfig.example       # committed; copy to Riff.xcconfig
│   ├── Riff.xcconfig               # generated by --bootstrap-ios; gitignored
│   ├── Riff/
│   │   ├── RiffApp.swift           # @main; injects MessageStore + SessionStore; defines Notification.Name.riffToggle; riff://toggle fallback handler
│   │   ├── RiffIntents.swift       # Action Button App Intent (RiffToggleIntent) + AppShortcutsProvider (RiffShortcuts); RiffToggleBus posts .riffToggle in-process
│   │   ├── AppDelegate.swift       # APNs token capture + register-device POST
│   │   ├── ContentView.swift       # TabView: Chat / Settings (History removed)
│   │   ├── ChatView.swift          # iMessage-style thread + compose bar (mic/▲ swap)
│   │   ├── ChatViewModel.swift     # send orchestration (text + voice), optimistic insert, sync
│   │   ├── MessageStore.swift      # ordered [ChatMessage], container-JSON persistence (atomic, ~500 cap)
│   │   ├── RecordingViewModel.swift # AVAudioSession + AVAudioEngine tap → AAC/m4a file (AudioFileWriter); stopAndSend returns (transcript, reply)
│   │   ├── NarrationController.swift # output voice: long-polls /riff/narrate-poll, AVAudioPlayer + ducking session; owned by TerminalScreen
│   │   ├── Conversation/           # hands-free CallKit conversation mode
│   │   │   ├── CallController.swift          # CXProvider/CXCallController wrapper; owns the call + the call-owned AVAudioSession (callOwnsAudioSession)
│   │   │   ├── ConversationController.swift  # orchestrator: ties the call to narration + voice + speech; single funnel for both entry points; half-duplex gate
│   │   │   └── SpeechTurnController.swift     # Phase 2: on-device SFSpeechRecognizer + AVAudioEngine + NLTagger endpointing; per-turn lifecycle, turn-gated
│   │   ├── RiffClient.swift        # HMAC-signed POST + actor wrapper (postMessage / fetchConversation / postAudio / pollNarration)
│   │   ├── SessionStore.swift      # legacy; unlinked from ContentView (kept one release)
│   │   ├── HistoryView.swift       # legacy; unlinked from ContentView (kept one release)
│   │   ├── SettingsView.swift      # host/secret/version + push state + test
│   │   ├── Settings.swift          # static config from Bundle.main
│   │   ├── Riff.entitlements       # aps-environment: development
│   │   ├── Info.plist              # UIBackgroundModes: audio, remote-notification, voip (voip required for CallKit conversation mode)
│   │   ├── FlyingVShape.swift      # the Flying-V geometry (FlyingVShape + FlyingVDot + fitScale); single source for BOTH the glyph and the icon
│   │   ├── RiffMark.swift          # shared style view: stroke (author units × fitScale), ink, dot, optional dark bg; tight glyph fit vs padded icon framing
│   │   ├── Terminal/               # SSH+tmux terminal surface: SessionManager (+ attachSharedImage), SessionController, SessionManagement (+ SessionManaging), SessionPager (PagerHostVC/SessionPageVC), SSHTransport, TerminalTransport
│   │   └── Util/Hex.swift          # Data <-> hex helpers
│   ├── Shared/
│   │   └── SharedImageInbox.swift  # App Group share-inbox contract; compiled into BOTH Riff and RiffShare
│   ├── RiffShare/                  # Share Extension target (mark.riff.share): image/video → App Group inbox → host drains
│   │   ├── ShareViewController.swift  # programmatic principal class; deposits the shared JPEG/movie, completeRequest fast
│   │   ├── Info.plist              # NSExtension (com.apple.share-services, image-only activation rule)
│   │   └── RiffShare.entitlements  # App Group only (group.mark.riff)
│   ├── RiffIconGen/                # macOS CLI tool (type: tool, no signing): renders icon-1024.png from FlyingVShape + RiffMark
│   │   └── RiffIconGen.swift       # @main ImageRenderer @ 1024px → opaque RGB PNG (AppKit NSHostingView fallback); NOT named main.swift (would force top-level-code mode)
│   └── RiffTests/                  # Swift Testing unit suite (simulator-only)
│       ├── SessionManagerTests.swift       # naming/paging/geometry-guard/close
│       ├── SessionControllerTests.swift    # input-dirty / scroll-wheel SGR / link-norm
│       ├── PagerGeometryTests.swift        # G0 Auto-Layout-pin invariant in a UIWindow
│       ├── FlyingVShapeTests.swift         # fin-rounding + single-source guards (fitScale, stroke proportionality, icon framing)
│       ├── RiffMarkRenderTests.swift       # ImageRenderer smoke: RiffMark → 1024² non-blank (bg + ink) on the sim
│       └── Mocks/                          # MockSessionManagement (canned, no SSH) + RecordingTransport (captures bytes)
├── server/
│   ├── riff_server.py              # aiohttp endpoint on :8902 (chat store + multi-turn replay)
│   ├── poll-instructions.md        # riff poll session contract (multi-turn responder)
│   ├── apns.py                     # HTTP/2 + .p8 token auth
│   └── tests/
│       ├── test_riff_server.py     # HMAC, claude stub, APNs mock, conversation store + windowing
│       ├── test_narrate.py         # strip_to_prose, transcript tail, /riff/narrate-poll (stubbed synth)
│       ├── test_apns.py            # JWT signing, header shape, truncation
│       ├── fixtures/sample_transcript.jsonl  # representative end_turn / interim / sidechain lines
│       └── manual_apns_smoke.py    # real-network smoke (--yes-real-apns)
└── LaunchAgents/
    └── com.mark.riff-server.plist  # KeepAlive, RunAtLoad

Dependencies

iOS

  • iOS 17+ (Action Button requires iPhone 15 Pro / 16 / 17 line; iOS 17 ships the modern Action Button API).
  • Xcode 15+.
  • Frameworks: AVFAudio / AVFoundation (mic capture + AAC encode via AVAudioFile + AVAudioConverter), CallKit (conversation mode — outgoing call + the call-owned audio session), Speech (SFSpeechRecognizer on-device capture, conversation mode only — see below), NaturalLanguage (NLTagger function-word endpointing in conversation mode), UserNotifications, Network (Tailscale resolution), Crypto + Security (on-device ed25519 keygen + Keychain storage for the SSH identity), WatchConnectivity (Phase 2).
  • Speech scope: normal tap-dictation transcribes server-side via ElevenLabs Scribe v2 (moved off-device 2026-05-22, because on-device STT had a ~60s continuous ceiling). Speech is back only for hands-free conversation mode, where a fresh SFSpeechRecognizer request per turn dodges that ceiling.
  • Phase 3 (Siri "call Riff") will add a SiriKit Calling surface (com.apple.developer.siri + an Intents extension + NSSiriUsageDescription); not yet built. PushKit is NOT used (VoIP-wake is deferred).
  • SwiftPM dependencies (added 2026-05-22 for the terminal):
  • SwiftTerm (github.com/migueldeicaza/SwiftTerm, MIT, pinned from: 1.13.0) — terminal emulator + UIKit TerminalView (feed(byteArray:) to paint, TerminalViewDelegate.send to capture keystrokes; auto-provides the Esc/Ctrl/Tab/arrows keyboard accessory row).
  • SwiftNIO SSH (github.com/apple/swift-nio-ssh, Apache-2.0, pinned from: 0.13.0) — pure-Swift SSH client driving the interactive PTY into tmux. No C dependency, no OpenSSL.
  • Both are permissively licensed by design (Riff may be sold; see Why mosh is deferred). mosh is NOT a dependency — it is GPLv3+ and was dropped.
  • Build note (Xcode 26): SwiftTerm bundles Metal shaders, so a device build needs the Metal Toolchain component (xcodebuild -downloadComponent MetalToolchain, one-time). Without it the build fails at CompileMetalFile Shaders.metal (all Swift still compiles).

Mac mini

  • Python 3.11+ (matches ~/agents/ standard).
  • aiohttp for the server endpoint and the outbound multipart POST to ElevenLabs Scribe v2 (already a dep; no new package — no elevenlabs Python SDK).
  • httpx[http2] for APNs HTTP/2 push.
  • pyjwt[crypto] for the APNs JWT (the .p8 key flow).
  • A running riff poll session (from poll-bringup; transcripts are dropped as events and the server waits for the reply file, per Mark's "never call Anthropic API" rule — rides the Claude Code subscription).
  • ElevenLabs Scribe v2 for STT — reached over the public internet via TLS, billed to the same account that funds newsfeed TTS (same ELEVENLABS_API_KEY). See Cost & latency below.

Apple Developer Program

  • Membership active (the dev email Mark received).
  • APNs auth key (.p8) — generated once in Apple Developer console, stored at ~/.ssh/apns_riff.p8 mode 0600. Not committed.
  • Bundle ID registered under Mark's team.
  • Push Notifications capability + Background Modes: audio on the Riff target.

Secrets

Per the global CLAUDE.md, all secrets in ~/.env:

RIFF_SHARED_SECRET=<32-byte hex>
APNS_TEAM_ID=<10-char alphanumeric>
APNS_KEY_ID=<10-char alphanumeric>
APNS_BUNDLE_ID=mark.riff
APNS_PRIVATE_KEY_PATH=/Users/mark/.ssh/apns_riff.p8
ELEVENLABS_API_KEY=<elevenlabs key>          # SHARED with newsfeed TTS — same var
# ELEVENLABS_STT_MODEL=scribe_v2             # optional override (default scribe_v2)
# RIFF_NARRATE_TTS_MODEL=eleven_multilingual_v2  # optional: narration TTS model
# RIFF_NARRATE_VOICE_ID=21m00Tcm4TlvDq8ikWAM     # optional: narration voice (default Rachel)

Server reads these on boot.

RIFF_SHARED_SECRET in distributed (TestFlight) builds is USER-ENTERED, not baked. Baking Mark's secret into a distributed .ipa would hand every tester a working credential to his riff_server. So:

  • Mark's local dev build (./install.sh --bootstrap-ios) bakes the real secret + host into ios/Riff.xcconfig (gitignored) for one-tap use.
  • A distributed build ships an EMPTY RIFF_SHARED_SECRET / MAC_MINI_HOST (see /riff-publish, which refuses to publish if Mark's real values are baked). A voice-button user pastes the secret their own riff_server printed (./install.sh --voice-only) into Settings → Voice server, where it lives in the Keychain (Terminal/HMACSecretStore.swift, service mark.riff.hmac) — it IS a secret, unlike the public host key.
  • Settings.sharedSecret reads the Keychain value first, falling back to the baked xcconfig value, so both paths work without code changes. The terminal needs no secret at all; only the voice button does.

ELEVENLABS_API_KEY is the same var newsfeed's TTS already uses: riff_server.py's load_dotenv() injects ~/.env, so the server picks it up with no new plumbing. Reused verbatim (do not invent a new key name). If it's absent, /riff/audio returns 503 (STT not configured) but the rest of the server (health, text path) stays up, unlike RIFF_SHARED_SECRET, which is fatal at boot. The key never leaves the Mac; only the Mac→ElevenLabs leg sends it (as the xi-api-key header, over TLS).

The same ELEVENLABS_API_KEY powers output narration (TTS), with no extra key. Two optional overrides tune the narration voice, both with safe defaults so the feature works out of the box:

  • RIFF_NARRATE_TTS_MODEL: the ElevenLabs model_id (default eleven_multilingual_v2, a currently-shipping model; eleven_v3 works on this account too but access varies by account, so the committed default stays on multilingual_v2). A model the account can't use returns a non-200, which the poll surfaces as a quiet 502 with no crash.
  • RIFF_NARRATE_VOICE_ID: the TTS voice_id (default 21m00Tcm4TlvDq8ikWAM, the ElevenLabs stock "Rachel" voice). STT needs no voice, so there was no pre-existing Riff voice; swap this to a preferred clone via ~/.env.
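How the server might resolve these variables at boot, as a hedged Python sketch. load_config and the returned key names are hypothetical; the variable names, defaults, and the fatal-versus-optional split are taken from this section.

```python
import os


def load_config(env=os.environ) -> dict:
    """Resolve the Riff server settings from the environment (after ~/.env is loaded)."""
    secret = env.get("RIFF_SHARED_SECRET")
    if not secret:
        # Fatal at boot: without it no request can be authenticated.
        raise RuntimeError("RIFF_SHARED_SECRET is required")
    return {
        "shared_secret": secret,
        # Optional: when None, the voice endpoints answer 503 but the server stays up.
        "elevenlabs_api_key": env.get("ELEVENLABS_API_KEY"),
        "stt_model": env.get("ELEVENLABS_STT_MODEL", "scribe_v2"),
        "tts_model": env.get("RIFF_NARRATE_TTS_MODEL", "eleven_multilingual_v2"),
        "voice_id": env.get("RIFF_NARRATE_VOICE_ID", "21m00Tcm4TlvDq8ikWAM"),
    }
```

Passing the environment as a parameter keeps the function testable with a plain dict and leaves the process environment untouched.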

Narration (output voice)

The input half of voice (Scribe STT → terminal) has a mirror: output narration speaks each completed Claude turn aloud via ElevenLabs TTS. It's opt-in, OFF by default (riff.voice.narrate in iOS Settings ▸ Voice) — it's loud and costs credits.

Why tail the transcript, never scrape the terminal. Riff renders the live claude CLI through SwiftTerm — a repainting TUI of ANSI escapes, box-drawing, a spinner, and a token counter. That byte stream is unspeakable, and there's no reply-text channel from the terminal back to the app. So the server reads the session transcript JSONL that claude writes to ~/.claude/projects/<cwd-slug>/<session>.jsonl — structured, one line per content block — and extracts the clean prose of the latest completed (stop_reason=="end_turn") assistant turn. This sidesteps ANSI parsing entirely and gets the exact text Claude emitted.
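A minimal sketch of the transcript read described above. The field names used here (type, timestamp, message.stop_reason, message.content[].type/text) are assumptions about the JSONL schema, and latest_completed_turn is a hypothetical name, not the server's actual function. It also simplifies by treating each qualifying line as a whole turn.

```python
import json


def latest_completed_turn(jsonl_text):
    """Return (timestamp, prose) for the last end_turn assistant line, or None."""
    latest = None
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # a half-written final line is expected while tailing
        if not isinstance(record, dict) or record.get("type") != "assistant":
            continue
        message = record.get("message")
        if not isinstance(message, dict) or message.get("stop_reason") != "end_turn":
            continue
        content = message.get("content")
        blocks = content if isinstance(content, list) else []
        # Keep only text blocks; tool calls and other block types are not speakable.
        texts = [b.get("text", "") for b in blocks
                 if isinstance(b, dict) and b.get("type") == "text"]
        prose = "\n".join(t for t in texts if t).strip()
        if prose:
            latest = (record.get("timestamp"), prose)
    return latest
```

Because only structured text blocks are read, no ANSI stripping is needed at any point.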

Flow. NarrationController (@MainActor, owned by TerminalScreen beside VoiceInjectController) runs a long-poll loop against GET /riff/narrate-poll, seeding its cursor to "now" at start() so it never speaks history already on screen. On a hit it advances the cursor past the turn's timestamp before playing (so the next poll can't replay it; two fast turns collapse to latest-wins), then plays the MP3 with AVAudioPlayer. The audio session uses .playAndRecord + .duckOthers (no .allowBluetooth, same rationale as RecordingViewModel) so a show on another device ducks rather than stops, and deactivates with .notifyOthersOnDeactivation afterward to restore it.
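The cursor rule in the flow above (seed to "now", advance before playing, latest-wins) can be sketched as a small pure function. This is an illustration only: pick_turn is a hypothetical name, and numeric timestamps are an assumption.

```python
def pick_turn(turns, cursor):
    """Choose what to speak from (timestamp, text) pairs newer than the cursor.

    Returns (text_or_None, new_cursor). When several turns arrived since the
    last poll, only the newest is spoken (latest-wins).
    """
    fresh = [turn for turn in turns if turn[0] > cursor]
    if not fresh:
        return None, cursor
    timestamp, text = max(fresh, key=lambda turn: turn[0])
    # The caller stores new_cursor BEFORE starting playback, so a poll that
    # fires mid-playback cannot return the same turn again.
    return text, timestamp
```

Seeding the cursor with the current time at start() is what keeps history already on screen from being spoken.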

Coordination with the mic. Both narration and recording use .playAndRecord, so TerminalScreen drives narration.setRecording(_:) off voice.status: while recording/transcribing, narration pauses its loop and releases the session so RecordingViewModel can claim .measurement mode cleanly. Narration is also interrupted (skip()) the moment Mark acts — submitting a prompt or starting to dictate — since he's moved on. On background the loop stops; on foreground it restarts if the toggle is on. The toggle is read live each iteration, so flipping it takes effect on the next poll without an app relaunch.

Phase 1 ships whole-message (non-streaming) REST synthesis — simple, and a ~1.5 KB turn returns in a couple seconds. Sentence-chunked WS streaming (to cut start latency) is a deferred Phase 4, built only if the latency annoys.

Conversation mode (CallKit, hands-free) — drive with the phone locked

Hands-free, in-car use: conversation mode presents as a real outgoing system phone call (CallKit) held open for the whole session. A live call is the legitimate iOS mechanism that grants locked-screen background audio, a live mic while locked, CarPlay/Bluetooth routing, and native call controls (mute/end). It is user-initiated, not a keep-alive daemon — the app suspends normally when no call is active.

Two entry points → one identical loop. (1) Long-press the center Flying-V guitar button (tap is still one-shot dictation; long-press toggles the call). (2) "Hey Siri, call Riff" (Phase 3, via the SiriKit Calling domain — the route that can start the call from the lock screen without an unlock; not yet built). While in a call the center button shows a slow coral pulse.

The loop is turn-gated and half-duplex (no barge-in in v1):

  1. Call starts → it's your turn → the mic opens.
  2. You speak; an on-device endpoint detector finalizes the utterance (below).
  3. The transcript is injected into the active session AND auto-submitted (conversation mode always submits; you can't edit while driving).
  4. Claude's reply narrates (the call's output). The mic is CLOSED while narration plays.
  5. Narration ends → the mic auto-reopens → repeat.

Capture = Apple on-device Speech (SpeechTurnController): SFSpeechRecognizer + SFSpeechAudioBufferRecognitionRequest with requiresOnDeviceRecognition = true, fed by an AVAudioEngine input tap, streaming partial results. A fresh recognition request per turn (started on mic-open, stopped on endpoint) sidesteps the on-device recognizer's continuous-duration ceiling — the same ceiling that sank the chat-era on-device STT (which is why normal tap-dictation uses the ElevenLabs Scribe batch path; conversation mode is the one place Apple Speech is used, precisely because per-turn requests dodge the limit). Speech needs only NSSpeechRecognitionUsageDescription + a runtime requestAuthorization — no portal capability.

Endpointing is on-device, NO LLM, zero API cost. While listening, track time since the last newly-recognized words. On a ~3s pause, run NLTagger(.lexicalClass) on the final token: if it's a conjunction/preposition/determiner/filler ("and", "to", "the", "because", "um", …) you're mid-thought → keep listening; otherwise finalize. A ~6s total-silence backstop finalizes regardless (a complete clause you paused on). No wake word, no cue. (These thresholds measure time-since-last-recognized-WORD, not acoustic silence — the on-device recognizer reports partials with lag — so they run generous; a too-eager cutoff means bump pauseSeconds. Tunable via a Settings dial if 3s isn't the sweet spot.)
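The endpoint decision above can be sketched as follows. This is not the app's Swift code: a small hard-coded word set stands in for NLTagger(.lexicalClass), and should_finalize, FUNCTION_WORDS, and the parameter names are hypothetical. The 3s and 6s defaults come from the text.

```python
# Stand-in for the lexical-class check: words that suggest a mid-thought pause.
FUNCTION_WORDS = {"and", "or", "but", "to", "of", "the", "a", "because", "um", "uh"}


def should_finalize(transcript, seconds_since_last_word,
                    pause_seconds=3.0, backstop_seconds=6.0):
    """Decide whether the current utterance is finished."""
    words = transcript.lower().split()
    if not words:
        return False  # nothing recognized yet, so there is nothing to finalize
    if seconds_since_last_word >= backstop_seconds:
        return True  # silence backstop: finalize regardless of the last word
    if seconds_since_last_word < pause_seconds:
        return False  # still inside the normal pause window
    last_word = words[-1].strip(".,!?;:")
    return last_word not in FUNCTION_WORDS
```

For example, "turn left and" after a 3.5s pause keeps listening, while "open the log file" after the same pause finalizes.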

The single call-owned audio session (the load-bearing rule). Under an active call the audio session is owned by the call — CallKit sets the category (.playAndRecord / .voiceChat / .allowBluetooth) and activates it in provider(_:didActivate:); the app never calls setActive/setCategory while the call owns it. ConversationController pushes a callOwnsSession flag into NarrationController, RecordingViewModel, and SpeechTurnController; all three short-circuit their own session setup/teardown while it's true. This is the build-96/97 dictation regression (guard !recording) generalized — "the call owns the session, nobody else deactivates it." SpeechTurnController has zero setActive/setCategory calls; it only runs its engine on the already-active call-owned session. Narration is forced on for the call (ignores the global riff.voice.narrate toggle — a silent "call" makes no sense).

voip background mode is required. On-device, CallKit only activated the call's audio session (fired didActivate) once voip was added to UIBackgroundModes (build 100); with audio alone the call brought up no audio. voip is present and load-bearing. It does not mean Riff rings you unprompted — we register for no VoIP pushes (PushKit wake is deferred, not built).

App Store note. CallKit-for-an-AI-assistant carries real, irreducible review risk (Apple may decide it isn't a "genuine" call). Defenses: real two-way audio, user-initiated only, honest call metadata, clean lifecycle. Fallback is to gate it behind a disclosure or ship it TestFlight/sideload-only. Unknowable in advance.

Requirements (regression checklist)

Conversation mode shares the bottom-bar gesture surface with dictation, sessions, and the photo button, and it coordinates three audio controllers — so changes regress each other easily (e.g. the long-press gesture once froze cluster dragging). Check every change to TerminalScreen's bottom bar or the Conversation/ controllers against this list before shipping. Each item is verified on-device (CallKit / Speech / locked behavior can't be tested in the simulator).

Bottom-bar gestures (one shared cluster):

  • G1: tap the guitar = one-shot dictation (Scribe path), exactly as before conversation mode existed.
  • G2: long-press the guitar (~0.5s, near-stationary) = toggle conversation mode on/off.
  • G3: the cluster stays draggable left/center/right, INCLUDING by grabbing the guitar. A horizontal drag repositions and must NOT be misread as a tap/long-press, and must NOT freeze. (Build-103 regression: the guitar's press handling must not set the cluster-freeze flag on touch-down.)
  • G4: vertical swipe on the bar hides (down) / raises (up) the keyboard.
  • G5: + tap = new session (global defaults); hold + release ON the button = Close Session (preserved exactly); hold + slide ≥44pt OFF the button = open the New Session… menu (New Session… → cwd+harness sheet; Close Session, destructive); send glyph: tap = send, hold = clear; photo = attach. Unchanged by conversation mode. growWork only floods the bar red at the 0.5s threshold; it no longer pops the menu. The classification happens on RELEASE (sessionButtonRelease): on-button hold (moved < 44) → .closeSession (defers past the red-flood fade, as before); slid-off hold (moved ≥ 44) → .presentMenu (no manager mutation, fires immediately). The dirty-state hold (.clear) still defers past the fade.
  • Which glyph (+ vs the send glyph): the dirty signal. inputDirty flips the button to the send glyph when input lands on the line (a printable byte through write(), or a non-empty dictation transcript) and back to + (new session) on submit (CR/LF). An empty / whitespace-only dictation does NOT flip it (build 174-era fix): VoiceInjectController.inject trims and writes nothing on empty input, so the dictation-end no longer marks dirty unconditionally, and no send glyph appears on an empty box. Known limitation (unchanged): manually backspacing a line empty still leaves the send glyph showing, since the heuristic watches bytes sent, not Claude's input widget.
  • G6: tapping a link is keyboard-neutral. Tap a URL / markdown link in the terminal: it opens in Safari and the keyboard ends in the same state it started (down stays down, up stays up). SwiftTerm's simultaneous focus-tap would otherwise raise it; the bridge snapshots keyboard state at touch-down (shouldReceive) and resigns first responder iff it wasn't already up (dismissKeyboardAfterLinkTap).
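The G5 release-time classification can be sketched as a pure function, which is also a convenient shape for a regression test. This is an illustration, not sessionButtonRelease itself: classify_release is a hypothetical name, the "send" and "newSession" labels are mine (the source names only .closeSession, .presentMenu, and .clear), and treating a dirty hold as clear regardless of slide distance is an assumption.

```python
HOLD_SECONDS = 0.5       # threshold at which the bar floods red
SLIDE_OFF_POINTS = 44.0  # finger travel that counts as sliding off the button


def classify_release(held_seconds, moved_points, input_dirty):
    """Classify a session-button gesture at the moment the finger lifts."""
    is_hold = held_seconds >= HOLD_SECONDS
    if input_dirty:
        # Dirty line: the button acts as send (tap) or clear (hold).
        return "clear" if is_hold else "send"
    if not is_hold:
        return "newSession"
    # Clean line, held: decide by how far the finger travelled before release.
    return "presentMenu" if moved_points >= SLIDE_OFF_POINTS else "closeSession"
```

Deciding on release rather than at the 0.5s threshold is what lets the same hold end as either Close Session or the menu.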

Conversation call (CallKit):

  • C1: entering places a real CallKit call (native call UI); the guitar shows a coral pulse while mode == .active.
  • C2: narration plays while the phone is LOCKED, routed to CarPlay/Bluetooth.
  • C3: native end + mute, and a second long-press, end/mute the call; on end the app suspends, mic released, narration stopped.
  • C4: voip is in UIBackgroundModes (required; without it CXCallController.request fails Code 1 "unentitled" and nothing happens).
  • C5: no setActive/setCategory while the call owns the session (Narration/Recording/SpeechTurn all gate on callOwnsSession). Static-grep on every audio change.
  • C6: no crash on the mic-reopen after narration (validate the input format before installTap; clear any stale tap first; build 102).

Hands-free turn loop (Phase 2):

  • T1: turn-gated, half-duplex: mic open only while narration is NOT speaking; auto-opens after narration ends and at call start. No barge-in in v1.
  • T2: capture is on-device SFSpeechRecognizer (conversation mode only; tap-dictation stays on Scribe), a fresh request per turn.
  • T3: endpoint = NLTagger function-word check after a ~3s pause + ~6s silence backstop. No LLM / claude -p, no stop word, no wake word.
  • T4: finalize injects AND submits. The Return is a SEPARATE, ~0.3s-delayed keystroke: an inline \r in the same burst as the text is swallowed by Claude Code's input as a literal newline (lands the text without submitting).
  • T5: a manual tap-dictation mid-call releases the hands-free mic (the two AVAudioEngines must never contend for the hardware input).
  • T6: the first conversation prompts for Speech Recognition permission.
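T4's two-write submit can be sketched like this. It is an illustration under assumptions: inject_and_submit is a hypothetical name, write stands for whatever sends bytes to the terminal session, and the empty-input short-circuit mirrors the trimming behavior described for VoiceInjectController.inject.

```python
import time


def inject_and_submit(write, text, delay_seconds=0.3, sleep=time.sleep):
    """Type the transcript, then send Return as a separate, delayed keystroke."""
    text = text.strip()
    if not text:
        return False  # empty or whitespace-only dictation writes nothing
    write(text.encode("utf-8"))
    # The pause keeps the Return out of the text burst, where it would be
    # taken as a literal newline instead of a submit.
    sleep(delay_seconds)
    write(b"\r")
    return True
```

Passing sleep as a parameter keeps the delay testable without waiting in real time.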

Cost & latency (Scribe v2)

  • Cost: Scribe v2 batch ≈$0.22–0.28 per hour of audio (the realtime variant scribe_v2_realtime, not used, is ≈$0.39/hr). For Mark's usage (short voice commands, seconds to a couple minutes) this is fractions of a cent per request; a 30-second command ≈ $0.002. Billed to the existing ElevenLabs account that funds newsfeed TTS.
  • Latency (per request): upload + STT + Claude. The upload is a small AAC clip over the tailnet (sub-second for typical clips); Scribe v2 batch returns a short clip in a few seconds; Claude is the existing wait (bounded by CLAUDE_TIMEOUT_S = 60). Net: the perceived wait grows by the upload + Scribe time (a few seconds for normal commands) on top of today's Claude-only wait — the explicit trade for ≈2.2% WER.
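The per-request cost figure above is plain proportional arithmetic on the quoted hourly rates. A small sketch (scribe_cost_usd is a hypothetical helper name):

```python
def scribe_cost_usd(clip_seconds, usd_per_hour):
    """Cost of one clip at a per-hour-of-audio rate."""
    return clip_seconds / 3600.0 * usd_per_hour


# A 30-second command at the quoted batch range of $0.22 to $0.28 per hour.
low = scribe_cost_usd(30, 0.22)
high = scribe_cost_usd(30, 0.28)
print(f"${low:.4f} to ${high:.4f}")
```

Both ends of the range round to roughly $0.002, matching the estimate in the cost bullet.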

Install

Two install entry points, both run from ~/riff/:

# Mac mini server: deps, ~/bin symlinks, launchd job, env validation.
./install.sh

# iOS bootstrap: writes ios/Riff.xcconfig from ~/.env, then xcodegen.
./install.sh --bootstrap-ios

# Health check (HMAC-signed curl to /riff/health).
./install.sh --health

# Tear down launchd + ~/bin symlinks (keeps Application Support state).
./install.sh --uninstall

The default install prints a final status table (env keys present, .p8 permissions, launchd job loaded, server reachable, ios xcconfig + Xcode project present). Re-running is idempotent.

State lives under ~/Library/Application Support/riff/:

~/Library/Application Support/riff/
├── sessions/                    # one JSON per voice command
│   ├── _index.jsonl             # append-only history index
│   └── <session_id>.json
└── devices.json                 # {device_id: {push_token_hex, env, ts}}

The launchd plist lives at ~/Library/LaunchAgents/com.mark.riff-server.plist (copied, not symlinked, because launchd distrusts symlinked plists). Logs land at ~/Library/Logs/riff-server.log.

Distributing Riff (TestFlight)

Riff ships to the OpenClaw / self-host crowd via TestFlight (external testers + a public link). The build pipeline is automated by the /riff-publish skill (skills/riff-publish/, the distribution counterpart to /riff-update). Publishing is deliberate — each upload burns a build number and may trigger Beta App Review — so it's invoked by hand, never on every commit.

/riff-publish (the automated build)

skills/riff-publish/riff-publish.sh runs: guardrail → bump CFBundleVersion (host + widget in lockstep) → xcodegen → archive (Release) → export via ios/ExportOptions.plist (method = app-store-connect, automatic signing, upload symbols) → upload via xcrun altool (App Store Connect API key from ~/.env: ASC_KEY_ID + ASC_ISSUER_ID; the .p8 at ~/.appstoreconnect/private_keys/AuthKey_<ASC_KEY_ID>.p8). --no-upload produces the .ipa only (upload via Xcode Organizer instead).

GUARDRAIL (load-bearing): /riff-publish REFUSES to publish if ios/Riff.xcconfig bakes a real RIFF_SHARED_SECRET (non-empty, non-zeros) or Mark's MAC_MINI_HOST. A baked secret in a distributed .ipa is a committed credential handed to every tester. Distributed builds ship an empty secret/host; a voice-button tester enters the secret in-app (Settings → Voice server). To produce a clean build: printf 'RIFF_SHARED_SECRET =\nMAC_MINI_HOST =\n' > ios/Riff.xcconfig && (cd ios && xcodegen generate).
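The guardrail check can be sketched as below. The real check lives in the riff-publish.sh shell script; this Python version is only an illustration, with hypothetical function names, and it treats any non-empty MAC_MINI_HOST as a blocker (an assumption, stricter than "Mark's host").

```python
def parse_xcconfig(text):
    """Parse KEY = value lines; in xcconfig files '//' starts a comment."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def is_real_secret(value):
    """Non-empty and not the all-zeros placeholder."""
    return bool(value) and set(value) != {"0"}


def publish_blockers(xcconfig_text):
    """Return the settings that should make a publish refuse to proceed."""
    settings = parse_xcconfig(xcconfig_text)
    blockers = []
    if is_real_secret(settings.get("RIFF_SHARED_SECRET", "")):
        blockers.append("RIFF_SHARED_SECRET")
    if settings.get("MAC_MINI_HOST", ""):
        blockers.append("MAC_MINI_HOST")
    return blockers
```

An empty list means the file is safe to ship; the clean-build printf above produces exactly such a file.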

App Store Connect (one-time, web console — Mark)

  1. App record for bundle id mark.riff (Apps → +). Even a TestFlight-only app needs the record.
  2. TestFlight test info + a reviewer note explaining the self-host model (you drive your own Mac over SSH; provide demo Mac creds if feasible, else explain why a reviewer needs their own box). This is for Beta App Review, which the first external build must pass. Internal testers (up to 100 on the team) do NOT require it, so that's the fast initial loop.
  3. An external testing group ("OpenClaw / self-host beta") + a public TestFlight link (up to 10,000 testers) for self-enrollment — the distribution surface the landing page points at.
  4. An App Store Connect API key (Users and Access → Integrations) → store ASC_KEY_ID + ASC_ISSUER_ID in ~/.env, .p8 in ~/.appstoreconnect/….

Versioning, privacy, ATS

  • CFBundleVersion must be unique + monotonic per upload; /riff-publish bumps it (host + widget together). CFBundleShortVersionString is the marketing version (0.1.0).
  • Privacy questionnaire: Riff collects ~nothing. The terminal sends keystrokes only to the user's own Mac; audio leaves the device only if the user opts into voice → their own server → ElevenLabs. Answer "no data collected by us."
  • ATS: NSAllowsArbitraryLoads: true is needed for the plaintext SSH/HTTP-over-tailnet transport (Tailscale provides the encryption). TestFlight generally accepts it; the App Store (Tier-B) may demand a justification.
  • APNs: aps-environment: development (project.yml). TestFlight runs against production APNs. The terminal doesn't push (chat/APNs is shelved), so this is irrelevant to the distributed app — but a development-only entitlement uploaded to App Store Connect can trip validation. Verify the archive validates; if it complains, flip Release to production or drop the entitlement from the distributed build. (Verify-then-decide — don't blindly flip Mark's working dev build.)

Cadence

  • TestFlight builds expire 90 days after upload — testers lose access; re-run /riff-publish to refresh so the beta doesn't silently go dark.
  • A new external build may need Beta App Review re-approval if it adds capabilities; metadata-only changes usually don't.
  • Internal loop first: upload → install from the TestFlight app on Mark's device → confirm onboarding + terminal on a Release build (catches Release-only issues: empty xcconfig secret, production APNs, signing) → then external Beta App Review.

Distributing to Mark's own phone (OTA — the primary dev-deploy)

/riff-update installs to Mark's device over the CoreDevice tunnel (devicectl), which requires the phone on the same LAN and keeps dropping off-LAN ("device unavailable", error 1011). The /riff-ota skill (skills/riff-ota/) replaces it as the primary dev-deploy: it builds an ad-hoc-signed .ipa and hosts it + an itms-services manifest + a one-tap install page on the Tailscale Funnel (public HTTPS, Let's Encrypt cert). Mark installs by opening the install page in Safari and tapping Install Riff — from anywhere, off-LAN, over cellular, no cable. devicectl / /riff-update is now the cabled fallback.

~/.claude/skills/riff-ota/riff-ota.sh             # Debug build → publish → print the URL (no iMessage)
~/.claude/skills/riff-ota/riff-ota.sh --release   # Release build (TestFlight-equivalent smoke)
~/.claude/skills/riff-ota/riff-ota.sh --send      # ALSO iMessage the URL (default OFF — Riff reads it inline)

No-real-secret requirement (load-bearing)

A Funnel-hosted .ipa is publicly downloadable by anyone with the URL, so the OTA build must never bake a real RIFF_SHARED_SECRET or Mark's host. Two facts make this work with zero setup:

  • The secret is an all-zeros placeholder. Mark's RIFF_SHARED_SECRET (in ~/.env and ios/Riff.xcconfig) is 64 zeros — harmless to publish, and it's what the tailnet-only riff_server authenticates against. So /riff-ota ships it as-is (does NOT empty it) — that's what lets voice/narration authenticate on an OTA build with no in-app entry step. The build's verify REFUSES to publish if the baked secret is ever a REAL (non-zero) value, so a public .ipa still can never leak a real secret.
  • The host is emptied. /riff-ota passes only MAC_MINI_HOST="" as a build setting on archive (empty host → onboarding, the "clean" choice), leaving ios/Riff.xcconfig byte-for-byte untouched (checksummed before/after) so /riff-update's cabled flow keeps its baked config.

It then unzips the exported .ipa and verifies the embedded Info.plist carries no real secret (empty or all-zeros only) and no host before publishing.
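The post-export verification can be sketched with the standard zipfile and plistlib modules. The real check is part of riff-ota.sh; the function names here are hypothetical, and the Info.plist key names (RIFF_SHARED_SECRET, MAC_MINI_HOST) are an assumption about how the build settings are embedded.

```python
import io
import plistlib
import zipfile


def baked_values(ipa_bytes):
    """Read (secret, host) from Payload/<App>.app/Info.plist inside an .ipa."""
    with zipfile.ZipFile(io.BytesIO(ipa_bytes)) as ipa:
        for name in ipa.namelist():
            parts = name.split("/")
            is_app_plist = (len(parts) == 3 and parts[0] == "Payload"
                            and parts[1].endswith(".app") and parts[2] == "Info.plist")
            if is_app_plist:
                info = plistlib.loads(ipa.read(name))  # XML or binary plist
                return info.get("RIFF_SHARED_SECRET", ""), info.get("MAC_MINI_HOST", "")
    raise ValueError("no Payload/*.app/Info.plist in archive")


def ota_safe(ipa_bytes):
    """True when the secret is empty or all zeros and no host is baked."""
    secret, host = baked_values(ipa_bytes)
    secret_ok = secret == "" or set(secret) == {"0"}
    return secret_ok and host == ""
```

Checking the exported artifact, not just the build inputs, catches a secret that slipped in through any path.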

If Mark ever sets a REAL (non-zero) secret server-side, the OTA path breaks: the build correctly refuses to bake it publicly, AND the terminal-primary build has no in-app secret-entry UI (Settings moved to the iOS Settings.app — Host + Voice toggles, but no secret field). A real secret would require adding a secret-entry UI first. Today's all-zeros placeholder sidesteps this entirely.

The flow

skills/riff-ota/riff-ota.sh: secret-less build-setting override → bump CFBundleVersion (host + widget lockstep) → xcodegen → archive (Debug default; --release for Release) → export via the committed ios/ExportOptions-adhoc.plist (method = development — signs with Mark's Apple Development cert + the development profile that embeds his registered device UDID, the same provisioning devicectl uses; itms-services installs it like an ad-hoc build. release-testing/ad-hoc need a Distribution cert this keychain lacks, and minting one over SSH hits "No Accounts" — so development is the working, lower-friction path) → verify the .ipa is secret-less → publish Riff.ipa + manifest.plist + install.html to ~/www/riff-ota/ → print/bb-send the install-page URL (https://marks-mac-mini.tail20af9f.ts.net/riff-ota/install.html).

The webpage-server serves the routes via dedicated handlers — the manifest as text/xml (itms-services refuses octet-stream) and the .ipa with Range/206 (resumable on cellular). See webpage-server/README.md.
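For readers unfamiliar with Range/206, a minimal sketch of single-range handling follows. It is not the webpage-server's handler: parse_range and respond are hypothetical names, multi-range requests are ignored, and an unsatisfiable range falls back to a full 200 response here where a stricter server would answer 416.

```python
def parse_range(header, size):
    """Return (start, end) inclusive for one 'bytes=' range, or None."""
    if not header or not header.startswith("bytes=") or size <= 0:
        return None
    spec = header[len("bytes="):]
    if "," in spec or "-" not in spec:
        return None  # multi-range and malformed specs are not handled
    first, last = (part.strip() for part in spec.split("-", 1))
    try:
        if first == "":
            count = int(last)  # suffix form: the last N bytes
            return (max(size - count, 0), size - 1) if count > 0 else None
        start = int(first)
        end = int(last) if last else size - 1
    except ValueError:
        return None
    if start >= size or end < start:
        return None
    return start, min(end, size - 1)


def respond(body, range_header):
    """Return (status, headers, payload) for a full or partial response."""
    byte_range = parse_range(range_header, len(body))
    if byte_range is None:
        return 200, {"Accept-Ranges": "bytes", "Content-Length": str(len(body))}, body
    start, end = byte_range
    chunk = body[start:end + 1]
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": f"bytes {start}-{end}/{len(body)}",
        "Content-Length": str(len(chunk)),
    }
    return 206, headers, chunk
```

A client that lost its connection partway through can then ask for only the remaining bytes (for example "bytes=7-") and resume.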

One-time setup (Mark)

  • Onboarding (one tap-through). The OTA build ships an empty baked host, so the first launch shows onboarding (seedDevDefaultsIfNeeded no-ops without a baked host). Enter host marks-mac-mini.tail20af9f.ts.net, user mark. This writes UserDefaults that persist across install-over, so subsequent OTA builds open straight to the terminal.
  • No secret entry needed. Voice/narration work out of the box: the all-zeros placeholder secret ships baked and matches the server (see No-real-secret requirement). There's no in-app secret-entry field in the terminal-primary build anyway; the terminal needs no secret regardless.
  • Ad-hoc provisioning profile (only if export fails). If automatic signing can't mint one, do the one-time portal step: developer.apple.com → Profiles → + → Distribution ▸ Ad Hoc → App ID mark.riff → device 00008140-001C308A2101401C → generate → download, then re-run.

Gotchas

  • Open the install page in Safari. itms-services:// links are intercepted by SpringBoard; the raw link in Messages/Mail may do nothing — hence a landing page, not the bare URL.
  • Provisioning-profile expiry (~1 year): a build signed with an expired development profile won't launch (or install). Re-running /riff-ota re-signs, so routine use self-heals — re-run before debugging a months-old OTA build that won't launch.
  • iCloud Private Relay can break Funnel pages (TLS error on both cellular and Wi-Fi) — toggle it off for the install.

Workflows

First-time setup

  1. Apple Developer account: create App ID mark.riff, enable Push Notifications + Background Audio capabilities, generate an APNs auth key (.p8), download it to ~/.ssh/apns_riff.p8, and chmod 600 it.
  2. Add RIFF_SHARED_SECRET, APNS_TEAM_ID, APNS_KEY_ID, APNS_BUNDLE_ID, APNS_PRIVATE_KEY_PATH to ~/.env.
  3. From ~/riff/: ./install.sh (validates env + .p8, installs pip deps if missing, symlinks server/riff_server.py and server/apns.py into ~/bin/, copies the launchd plist into ~/Library/LaunchAgents/, bootstraps the job).
  4. ./install.sh --bootstrap-ios — writes ios/Riff.xcconfig from ~/.env and runs xcodegen to (re)generate Riff.xcodeproj.
  5. Sign Xcode into the Apple Developer team. Open Xcode → Settings (Cmd-,) → Accounts → "+" → Apple ID. Sign in with the Apple ID that owns team 6C63UU27YB. Without this, xcodebuild for a device target fails with "No Accounts" / "No profiles for 'mark.riff'".
  6. Enable Developer Mode on the iPhone. Settings → Privacy & Security → Developer Mode → on. Phone reboots.
  7. Register App ID mark.riff in the Apple Developer console (Identifiers → "+" → App IDs → App; bundle ID Explicit mark.riff; capability: Push Notifications). The auto-signing flow in step 8 needs this entry to exist.
  8. Plug iPhone into the Mac mini via USB-C, unlock, "Trust This Computer" prompt → trust. From ~/riff/ios/:

     xcodebuild -project Riff.xcodeproj -scheme Riff \
       -destination 'id=<your-iphone-udid>' \
       -configuration Debug \
       -allowProvisioningUpdates build install

     On the iPhone: Settings → General → VPN & Device Management → trust the developer profile.
  9. Open the Riff app once on the phone. Grant mic + speech + push permissions when prompted. Confirm the Settings tab shows "Push token: " — that means the device-token POST succeeded and ~/Library/Application Support/riff/devices.json now lists the phone.
  10. iOS Settings → Action Button → swipe to "Shortcut" → Choose a Shortcut → pick the auto-registered Riff shortcut ("Toggle Riff" / "Riff Toggle Recording"). It appears automatically after install (the App Intent's AppShortcutsProvider registers it) — no manual Shortcut to author. See Action Button configuration for the rationale and the locked-screen Face ID limit.
  11. Smoke-test: press Action Button, say "ping", expect a reply notification within ~3s.

Typical use (after setup)

  1. Press Action Button.
  2. Riff app launches (instant) and starts recording.
  3. Speak the command — take as long as you like; pauses are fine.
  4. Tap Send (middle-left), or press the Action Button again, to end recording and upload the audio. (Cancel discards.)
  5. The Mac transcribes via Scribe v2, runs Claude, and a notification arrives with the reply, readable on the lock screen.

Apple Watch (Phase 2, when Mark gets a Watch)

  1. Smart Stack widget or complication tap → recording starts on Watch.
  2. Audio goes through paired iPhone if non-cellular, direct over Tailscale (Tailscale supports cellular Watches) if cellular.
  3. Reply: haptic tap + notification on the Watch face.

Constraints / gotchas

  • Action Button is a one-shot launch trigger, not a held-down switch. iOS does not surface Action Button press/release events to apps. The Action Button opens a Shortcut → riff://toggle, which launches the app (or, if already recording, sends). The audio session starts in onAppear and ends only on a manual Send / Cancel / toggle. The press-and-hold walkie-talkie ergonomic is genuinely unavailable on iOS — see the "Why no PushToTalk" section in Scope.
  • Lock-screen recording requires Background Audio + an active audio session that started while the device was unlocked. When the Action Button launches Riff on a locked phone, iOS shows the app and lets it run, but interaction (tap Send) requires Face ID. The Action-Button toggle (a second press = send) sidesteps the need to tap the on-screen button — pending the locked-screen verification noted in Action Button configuration.
  • No on-device transcription ceiling anymore. The old SFSpeechRecognizer ~1-minute one-shot cap is gone — recording is unbounded (cap is the server's MAX_AUDIO_BODY = 25 MB ≈ 30 min of AAC). Transcription is server-side via Scribe v2; the phone only records audio.
  • Wispr Flow conflict: Wispr Flow holds the system audio session for system-wide dictation. Riff explicitly opens its own audio session with .playAndRecord category and .allowBluetooth. Wispr Flow should yield the session when Riff activates — verify on first install.
  • Tailnet name resolution from a freshly-launched app: the first call after a cold launch occasionally times out while the device's tailnet routes warm up. Build a 1.5s connection timeout + one retry into RiffClient.
  • APNs push delivery is not real-time-guaranteed. For a voice reply to feel "instant" the response should come back via the same HTTP request that delivered the transcript (synchronous reply in the response body), with APNs as a redundant notification path for when the user has already locked the phone. The app handles both.
  • App Store distribution is out of scope. A 7-day free signing certificate works for personal use; for a longer window, sign with the paid developer membership.
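The "1.5s connection timeout + one retry" guidance from the tailnet cold-launch bullet above can be sketched generically. RiffClient is the iOS client, so this Python version only illustrates the shape of the policy at the TCP level; connect_with_retry and its parameters are hypothetical.

```python
import socket


def connect_with_retry(host, port, timeout_seconds=1.5, retries=1,
                       connect=socket.create_connection):
    """Try to connect; on failure retry a fixed number of times, then re-raise."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return connect((host, port), timeout=timeout_seconds)
        except OSError as error:  # timeouts and refusals are OSError subclasses
            last_error = error
    raise last_error
```

With a short timeout, the worst case on a cold launch is about 3 seconds before the error surfaces, and the second attempt usually lands once routes are warm.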

Weather widget (feels-like temp + clothing + icon)

EXTRACTED & REMOVED (2026-05-28). This widget is no longer part of Riff. It was split into a standalone app at ~/tops/ (own project, bundle IDs mark.tops / mark.tops.TopsWidget, App Group group.mark.tops). The design notes below are retained as history — the live code now lives in ~/tops/. Riff kept the App Group group.mark.riff only for HostKeyStore + SharedImageInbox.

Riff bundles a Home Screen + lock-screen widget that shows the apparent (feels-like) temperature for Mark's current location plus an at-a-glance clothing recommendation and a weather icon. Apple's stock weather widget gives true temperature + wind separately and forces him to mental-math the wind-chill; this widget shows the answer directly, and one glance at the SHIRT / SWEATER / JACKET / PARKA label answers "what do I throw on before I walk out the door."

Why bundle it inside Riff instead of a separate app

Riff already has paid Apple Developer signing, the App ID mark.riff registered, the iPhone UDID added to team 6C63UU27YB, and an install.sh that knows how to rebuild + deploy via xcodebuild. A WidgetKit extension is a sibling target to the iOS app, sharing the parent App ID for signing. Adding a separate "Weather" app would mean registering another App ID, registering the device again (the dance from phase 3), and a second project tree to maintain. Bundling is strictly less work for an additive feature.

The widget extension has its own bundle ID mark.riff.WeatherWidget (Apple convention is <parent-bundle-id>.<extension-name>). No second App ID registration in the Developer console — extension bundle IDs automatically inherit the parent App ID's entitlements + provisioning.

Widget families

family | size | content
.accessoryRectangular | lock-screen rectangle (~160×40pt) | Headline display, lock screen. Two-line layout, whole stack centered via VStack(alignment:.center) + outer .frame(alignment:.center) (an HStack maxWidth:.infinity does not center reliably in the accessory frame; never use a greedy Spacer, which flings temp/icon to opposite edges). Top row: large feels-like temp + large worst-of-day weather icon as a tight group (fixed 10pt gap), centered w.r.t. the bottom line. Bottom line: <wind> MPH · <actual>° · <CLOTHING>, U+00B7 middle-dot separated, single line with minimumScaleFactor(0.6) + lineLimit(1). Starting fonts 34/30/13pt, tuned by eye. The icon reflects the worst weather expected across today's local calendar day, not current conditions (see Weather icon mapping).
.systemMedium | Home Screen, 4×2 (~330×155pt) | Unchanged: the dense three-column layout (WIND left, big feels-like center with actual XX° beneath, precip-% + current-conditions icon + clothing right) at a roomier scale (70pt center). Optional: add it to Home Screen if you want the richer current-conditions display.
.accessoryInline | single line near the clock | 73°F feels like (appends · 30% rain when probability ≥ 20%).
.accessoryCircular | circle widget | Just 73°.

User picks the Home Screen medium widget from the Home Screen widget picker (long-press an empty area → "+" top-left → search "Riff Weather" → pick the medium / 4×2 variant → Add Widget). The accessory variants appear in the lock-screen widget picker (long-press lock screen → Customize → tap widget row → "Riff"). Tap-to-open routes to the Riff host app's Recording tab from any family — same as launching Riff from anywhere else. No new in-app surface for the widget.

Clothing recommendation

The right column of the medium widget shows one of four labels picked from feels-like temperature plus an actual-rate "raining now" boolean:

feels-like F   raining now   ->  label
< 50           any           ->  PARKA
50–59          true          ->  JACKET
50–59          false         ->  SWEATER
>= 60          true          ->  JACKET   (rain trumps shirt-weather)
>= 60          false         ->  SHIRT

"Raining now" is current.precipitation > 0 mm/h, not the hourly probability — chosen to fix the misty-day false-negative case where probability reads 0% but it's actively spitting. The threshold is tunable; raise to e.g. > 0.1 if trace amounts trip JACKET too often.

Edge: when feels_like_f == nil (placeholder previews or fully degraded states), the label defaults to SHIRT — least alarming.
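The clothing table and its nil edge case translate directly into a small function. This is a sketch, not the widget's Swift code: clothing_label and is_raining_now are hypothetical names, and treating the 50–59 band as "below 60" for fractional temperatures is an assumption.

```python
def is_raining_now(precip_mm_per_h, threshold=0.0):
    """Actual-rate check; raise the threshold if trace amounts trip JACKET too often."""
    return precip_mm_per_h is not None and precip_mm_per_h > threshold


def clothing_label(feels_like_f, raining_now):
    """Map feels-like temperature (Fahrenheit) and rain state to a label."""
    if feels_like_f is None:
        return "SHIRT"  # placeholder or degraded state: least alarming
    if feels_like_f < 50:
        return "PARKA"  # cold wins regardless of rain
    if raining_now:
        return "JACKET"  # rain trumps sweater and shirt weather
    return "SWEATER" if feels_like_f < 60 else "SHIRT"
```

For example, clothing_label(72, is_raining_now(0.4)) gives JACKET even though 72°F alone would be shirt weather.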

Weather icon mapping

The medium widget's icon is an SF Symbol picked from Open-Meteo's WMO current.weather_code plus current.is_day:

weather_code         icon
{0, 1}               sun.max.fill   (is_day == 1)
{0, 1}               moon.fill      (is_day == 0)
{2, 3}               cloud.fill
{45, 48}             cloud.fog.fill
{51..67, 80..82}     umbrella.fill
{71..77, 85..86}     snowflake
{95..99}             cloud.bolt.fill
anything else / nil  cloud.fill

Thunderstorm uses cloud.bolt.fill rather than umbrella.fill — thunderstorm is a distinct hazard from generic rain and benefits from its own glyph. One-line change in iconName(...) to flip later.
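The icon table above reads naturally as a chain of range checks. A minimal sketch (icon_name mirrors the role of iconName(...) but is not the widget's Swift code):

```python
def icon_name(weather_code, is_day=True):
    """Map a WMO weather code (or None) to an SF Symbol name."""
    if weather_code is None:
        return "cloud.fill"
    if weather_code in (0, 1):
        return "sun.max.fill" if is_day else "moon.fill"
    if weather_code in (2, 3):
        return "cloud.fill"
    if weather_code in (45, 48):
        return "cloud.fog.fill"
    if 51 <= weather_code <= 67 or 80 <= weather_code <= 82:
        return "umbrella.fill"
    if 71 <= weather_code <= 77 or 85 <= weather_code <= 86:
        return "snowflake"
    if 95 <= weather_code <= 99:
        return "cloud.bolt.fill"  # flip this line to umbrella.fill to merge with rain
    return "cloud.fill"  # unknown codes fall back to a neutral cloud
```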

Lock-screen rectangular widget — worst-of-day icon. The .accessoryRectangular icon does not use current weather_code. Instead it shows the worst (most weather-severe) WMO code across today's local calendar day, so a glance at the lock screen at 8am warns of an afternoon thunderstorm even when the sky is currently clear. Today's per-hour codes (hourly.weather_code[], all 24 local slots — see Data source) are reduced to one code by the pure worstWeatherCode(of:) severity rank in OpenMeteoClient.swift:

severity (highest first → surfaced)   WMO codes
thunderstorm                          95, 96, 99  (also 97, 98)
snow / snow showers                   71, 73, 75, 77, 85, 86
rain / rain showers                   61, 63, 65, 66, 67, 80, 81, 82
drizzle / freezing drizzle            51, 53, 55, 56, 57
fog                                   45, 48
cloud (partly / overcast)             2, 3
clear / mainly clear                  0, 1
unknown code                          ranked just above clear (mild)

The tier ranges in worstWeatherCode(of:) deliberately match the iconName(...) range groupings, so the surfaced icon and the severity ranking never disagree (rain vs drizzle is split into two severity tiers within the single shared umbrella.fill icon group — a stricter split that cannot change which icon shows). On a severity tie the first-seen code at the max tier is returned (cosmetically irrelevant — any member of a tier maps to the same icon).

If the hourly forecast fetch fails or returns an empty array, worst_weather_code is nil and the rectangular icon gracefully falls back to the current weather_code (rectIconCode = entry.worst_weather_code ?? entry.weather_code). The medium widget's icon is always current weather_code + is_day and is unaffected by this. The rectangular icon is rendered with the day variant (it is a daily summary; only codes 0/1 branch on is_day, so storm/rain/snow codes are unaffected).
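
The worst-of-day reduction and its fallback could look roughly like the sketch below. The numeric tier values and the `severityRank` helper are assumptions for illustration; only the ordering of the tiers, the first-seen tie rule, and the nil-on-empty behavior come from the description above.

```swift
// Hypothetical severity rank; higher is worse. Tier numbers are illustrative,
// only their relative order follows the table above.
func severityRank(_ code: Int) -> Int {
    switch code {
    case 95...99:           return 7  // thunderstorm
    case 71...77, 85...86:  return 6  // snow / snow showers
    case 61...67, 80...82:  return 5  // rain / rain showers
    case 51...57:           return 4  // drizzle / freezing drizzle
    case 45, 48:            return 3  // fog
    case 2, 3:              return 2  // cloud
    case 0, 1:              return 0  // clear / mainly clear
    default:                return 1  // unknown: just above clear
    }
}

// Reduce today's hourly codes to the single most severe one.
// Returns nil for an empty array so the caller can fall back.
func worstWeatherCode(of codes: [Int]) -> Int? {
    var worst: Int? = nil
    for code in codes {
        // Strict ">" keeps the first-seen code when tiers tie.
        if let current = worst, severityRank(code) <= severityRank(current) {
            continue
        }
        worst = code
    }
    return worst
}

// Fallback as described: use the current code when the hourly array is absent.
func rectIconCode(hourlyCodes: [Int], currentCode: Int?) -> Int? {
    return worstWeatherCode(of: hourlyCodes) ?? currentCode
}
```

Because every code in a tier maps to the same icon, the tie rule only affects which integer is stored, not what the widget draws.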

Data source — Open-Meteo

https://api.open-meteo.com/v1/forecast?latitude=<lat>&longitude=<lng>&current=temperature_2m,relative_humidity_2m,wind_speed_10m,precipitation,weather_code,is_day&hourly=precipitation_probability,weather_code&forecast_days=1&timezone=auto&temperature_unit=fahrenheit&wind_speed_unit=mph

  • Free, no auth, no quota worries for personal use.
  • forecast_hours=1 was removed. Verified live (2026-05-16): forecast_hours=1 + forecast_days=1 together collapse both hourly arrays to a single element (precipitation_probability len 1, weather_code len 1) — so the worst-of-day reduction would only see one hour. With forecast_days=1 + timezone=auto only, both hourly arrays are the full 24-slot local calendar day (pp len 24, wc len 24), index 0 = location-local midnight, last slot = local 23:00. timezone=auto localizes the ISO-8601 timestamps in hourly.time and current.time but does not change any numeric value the widget reads.
  • Response: current.temperature_2m (true temp, °F — also the feels-like input; see Feels-like model below), current.relative_humidity_2m (0–100 — NWS heat-index input), current.wind_speed_10m (mph — NWS wind-chill input), current.time (local ISO-8601 — used to index the current hour), hourly.precipitation_probability[] (24 local-day slots; the widget reads the slot whose hourly.time equals current.time, falling back to [0] if the timestamps don't align — % chance of precip in the current hour), hourly.weather_code[] (24 local-day per-hour WMO codes — reduced to the single worst code by severity via worstWeatherCode(of:); drives the lock-screen rectangular icon), current.precipitation (mm/h — drives the "raining now" boolean for clothing logic; documented as mm regardless of temperature_unit), current.weather_code (WMO integer code — drives the medium icon and is the rectangular icon's fallback when the hourly array is absent), current.is_day (0/1 — selects sun vs moon for clear codes).
  • ~120ms typical response time over the tailnet's egress.
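
As a rough illustration of the request and the current-hour lookup described in the bullets above, the sketch below builds the URL with URLComponents and picks the precipitation-probability slot. The function names are hypothetical and the real OpenMeteoClient.swift may be organized differently; the query parameters are copied from the URL above.

```swift
import Foundation

// Hypothetical URL builder using the query parameters listed above.
// Note there is no forecast_hours parameter, per the finding above.
func forecastURL(latitude: Double, longitude: Double) -> URL? {
    var components = URLComponents(string: "https://api.open-meteo.com/v1/forecast")
    components?.queryItems = [
        URLQueryItem(name: "latitude", value: String(latitude)),
        URLQueryItem(name: "longitude", value: String(longitude)),
        URLQueryItem(name: "current",
                     value: "temperature_2m,relative_humidity_2m,wind_speed_10m,precipitation,weather_code,is_day"),
        URLQueryItem(name: "hourly", value: "precipitation_probability,weather_code"),
        URLQueryItem(name: "forecast_days", value: "1"),
        URLQueryItem(name: "timezone", value: "auto"),
        URLQueryItem(name: "temperature_unit", value: "fahrenheit"),
        URLQueryItem(name: "wind_speed_unit", value: "mph"),
    ]
    return components?.url
}

// Pick the slot whose hourly.time equals current.time; fall back to slot 0
// when the timestamps do not align. Returns nil only for an empty array.
func currentHourProbability(hourlyTimes: [String],
                            probabilities: [Int],
                            currentTime: String) -> Int? {
    if let index = hourlyTimes.firstIndex(of: currentTime), index < probabilities.count {
        return probabilities[index]
    }
    return probabilities.first
}
```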

Apple's WeatherKit is the alternative and was considered. Rejected for three reasons: (a) it requires enabling the WeatherKit capability on the App ID in the Developer console (extra step), (b) it requires WeatherKit token issuance per request (extra JWT signing), and (c) it's a Mac-side thing not relevant to a widget extension. Open-Meteo's only downside is that location metadata travels to a third-party server; this is documented under Risks.

Feels-like model

The big number is feels-like, computed locally by apparentTempF(tempF:windMph:humidityPct:) in OpenMeteoClient.swift using the Steadman shade apparent-temperature model (Australian BOM):

AT = Ta + 0.33·e − 0.70·ws − 4.00              (Ta °C, ws m/s)
e  = (RH/100)·6.105·exp(17.27·Ta / (237.7+Ta)) (vapour pressure, hPa)

History of this decision (don't undo it without reading):

  1. Started as Open-Meteo's apparent_temperature — ran ~6 °F colder than Apple at ~47 °F. That field is this Steadman model plus a solar-radiation term; the radiation term is what over-cooled it.
  2. Switched to the US NWS convention (wind chill < 50 °F, heat index ≥ 80 °F, air temp between). Fatal flaw for this use: the 50–80 °F band has no adjustment at all — feels-like == air temp, wind ignored. That band covers most of Mark's weather, so it showed feels==actual even at 67 °F/11 mph. Rejected.
  3. Now: plain Steadman, no radiation term. Continuous across all temperatures, wind- and humidity-sensitive (no dead band) — this is how Apple's "feels like" behaves directionally.

Not Apple-identical: Apple's formula is proprietary. Steadman reads a few °F below actual whenever it's breezy (that is apparent temperature; ~62–63 °F at 67 °F/11 mph). If it lands consistently off Apple in one direction, add a flat calibration offset in apparentTempF (noted in its doc comment). Inputs all come from the same current= block, and their units already match the function's parameters (°F, mph, RH 0–100). The formula itself is stated in °C and m/s, so the function has to convert before applying it and convert the result back to °F.
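
A minimal sketch of the Steadman calculation, using the function name and parameter labels given above. The body is a reconstruction from the two formula lines, not a copy of OpenMeteoClient.swift. The unit conversions are an assumption based on the formula being stated in °C and m/s, and no calibration offset is applied.

```swift
import Foundation

// Sketch of the plain Steadman (no radiation term) apparent temperature.
// Inputs are in the widget's units; the formula itself works in °C and m/s.
func apparentTempF(tempF: Double, windMph: Double, humidityPct: Double) -> Double {
    let ta = (tempF - 32.0) * 5.0 / 9.0        // °F to °C
    let ws = windMph * 0.44704                 // mph to m/s
    // Water vapour pressure in hPa.
    let e = (humidityPct / 100.0) * 6.105 * exp(17.27 * ta / (237.7 + ta))
    let atC = ta + 0.33 * e - 0.70 * ws - 4.00
    return atC * 9.0 / 5.0 + 32.0              // °C back to °F
}
```

With an assumed relative humidity of 65 %, `apparentTempF(tempF: 67, windMph: 11, humidityPct: 65)` comes out a little above 62 °F, in line with the ~62–63 °F figure quoted above. More wind lowers the result and more humidity raises it, with no dead band.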

Location

CoreLocation with When-In-Use permission. Widgets re-acquire location on each timeline reload (Apple permits this for widgets). Fallback chain:

  1. CLLocationManager.requestLocation() — fresh one-shot fix.
  2. CLLocationManager.location — last cached coord from CoreLocation.
  3. UserDefaults key lastKnownCoord — written by the host Riff app on every foreground launch, read by the widget when 1 and 2 fail.
  4. If all three fail, widget renders a "Tap to grant location" prompt that opens the host app, which deep-links to Settings.

The NSLocationWhenInUseUsageDescription plist string is shared between host and widget: "Riff shows the feels-like temperature for your current location on the lock screen." Both plists need it; the widget's plist is auto-generated by XcodeGen.
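
The fallback chain above reduces to "first source that has a coordinate wins". The sketch below models that decision with plain value types so it runs without CoreLocation; `Coord`, `LocationResult`, and `resolveLocation` are hypothetical names, and the real provider would obtain the three inputs from CLLocationManager and the shared UserDefaults.

```swift
// Hypothetical value types standing in for CoreLocation coordinates.
struct Coord: Equatable {
    let lat: Double
    let lng: Double
}

enum LocationResult: Equatable {
    case coordinate(Coord, source: String)
    case needsPermissionPrompt   // widget renders "Tap to grant location"
}

// Walk the chain in the documented order and stop at the first hit.
func resolveLocation(freshFix: Coord?,
                     cachedFix: Coord?,
                     lastKnownCoord: Coord?) -> LocationResult {
    if let c = freshFix { return .coordinate(c, source: "requestLocation") }
    if let c = cachedFix { return .coordinate(c, source: "location") }
    if let c = lastKnownCoord { return .coordinate(c, source: "lastKnownCoord") }
    return .needsPermissionPrompt
}
```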

Refresh cadence

TimelineProvider returns 4 entries spaced 30 minutes apart on every reload, so the widget shows fresh-looking numbers for 2 hours without a re-fetch. Apple's documented budget for a frequently viewed widget is typically 40 to 70 timeline reloads per day; 48 reloads/day (one every 30 min) falls inside that range.

If the phone has been locked overnight and the widget's last reload was 8h ago, the displayed number may be stale — the widget shows a tiny relative-time stamp (updated 14m ago) so Mark can tell at a glance. On unlock, the widget re-renders within ~5s under normal operation.
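
The entry spacing and the relative-time stamp can be sketched with Foundation alone. Both function names are hypothetical, and the hour format ("updated 8h ago") is an assumption, since only the minutes form appears above.

```swift
import Foundation

// Dates for the timeline entries: `count` entries, `spacingMinutes` apart.
func entryDates(from start: Date, count: Int = 4, spacingMinutes: Int = 30) -> [Date] {
    return (0..<count).map { start.addingTimeInterval(Double($0 * spacingMinutes) * 60) }
}

// Short staleness stamp shown on the widget.
func relativeStamp(fetchedAt: Date, now: Date) -> String {
    let minutes = max(0, Int(now.timeIntervalSince(fetchedAt) / 60))
    if minutes < 60 { return "updated \(minutes)m ago" }
    return "updated \(minutes / 60)h ago"   // assumed format for longer gaps
}
```

With the defaults, the last entry starts 90 minutes after the first, so the 2-hour coverage mentioned above assumes the next reload lands about 30 minutes after that.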

Code structure (additive to Riff)

ios/
├── Riff/                          # existing host app (unchanged)
└── RiffWidget/                    # widget extension target
    ├── RiffWidgetBundle.swift     # @main bundle, lists WeatherWidget
    ├── WeatherWidget.swift        # Widget definition + family list
    ├── WeatherProvider.swift      # TimelineProvider: CoreLocation + Open-Meteo
    ├── WeatherEntry.swift         # TimelineEntry: temp_f, feels_like_f, wind_mph, precip_prob_pct, precip_mm_h, weather_code, worst_weather_code, is_day, fetched_at
    ├── WeatherView.swift          # SwiftUI views per family (medium/rect/inline/circular); pure helpers clothingLabel(...) and iconName(...) live at the bottom of this file (file-private; testable in isolation if a test target is ever added)
    ├── OpenMeteoClient.swift      # Stdlib URLSession wrapper, decodes JSON
    └── Info.plist                 # generated by XcodeGen

ios/project.yml gains a RiffWidget target with type: app-extension, extensionType: WidgetKit, the standard widget entitlements (com.apple.security.application-groups shared with the host so the lastKnownCoord UserDefaults handoff works), and the WhenInUse location plist string.

Phase 3 of Riff (widget add)

| phase | scope | depends on | gating |
| --- | --- | --- | --- |
| 3 | Lock-screen feels-like-temp widget (Open-Meteo + CoreLocation + WidgetKit) | Phase 1 shipped + device install completed (already done) | none past current state |

No new Apple-console steps. No new env vars or secrets. The widget ships in the same xcodebuild build install pass as the host app.

Verification

  1. Install build, launch Riff once, grant location permission.
  2. Long-press an empty area of the Home Screen → "+" top-left → search "Riff Weather" → pick the medium (4×2) variant → Add Widget. (Lock-screen accessory variants use the lock-screen widget picker: Customize → tap widget row → "Riff".)
  3. Within a minute the widget should populate with a number; if it sticks on "—" check Settings → Privacy → Location Services → Riff is set to "While Using".
  4. Walk around the block — refresh should pick up location change at next 30-min reload (or sooner on lock-state change).
  5. Sanity-check the clothing label against the on-screen feels-like number and the truth table above (< 50 → PARKA, 50–59 and dry → SWEATER, >= 60 and dry → SHIRT, any temp with rain → JACKET/PARKA).

Risks (widget-specific)

| risk | mitigation |
| --- | --- |
| CoreLocation flakiness in widget extensions (Apple permits but advises caching) | Three-step fallback chain ending in app-shared UserDefaults; "Tap to grant" prompt when all sources fail. |
| Open-Meteo egress: device lat/lng → third-party server | Documented. Mark accepts this for free, no-auth, no-quota access. Alternative is WeatherKit (Apple-first-party) at the cost of more setup; logged but not chosen for v1. |
| Stale data after long lock | Relative timestamp visible on the widget. On unlock the widget re-renders, which usually triggers a refresh within Apple's budget. |
| Widget exhausts daily timeline budget | 30-min cadence yields 48 reloads/day vs Apple's ~70/day budget. Plenty of headroom. |
| User denies location permission | Widget renders "Tap to grant" prompt; tap opens host app's Recording tab, which surfaces a one-shot Location permission ask. If still denied, widget shows "—". |

Phases

| phase | scope | depends on | gating |
| --- | --- | --- | --- |
| 1 | Action Button → app → record → on-device transcribe → POST → riff poll event → synchronous reply + APNs push | Apple Developer membership, APNs key, Tailscale on iPhone | none past membership |
| 2 | watchOS target, Smart Stack widget, complication, voice + haptic reply | Phase 1 shipped, Mark has an Apple Watch | Watch hardware |
| 3 | Lock-screen feels-like-temp widget | Phase 1 shipped | none |

Roadmap / backlog (added 2026-05-22)

Direction set with Mark on 2026-05-22 — Riff is evolving from a one-shot voice Q&A into a full native chat client for his Claude assistant (mirroring how he already talks to it over iMessage), with best-in-class frictionless voice input.

  • Manual unlimited dictation + Scribe v2 STT: built 2026-05-22 (commit fa1f8bf); server deployed + verified end-to-end; iOS client build pending device install. No auto-send (pause as long as you want), middle-left Send, Action Button riff://toggle (press = start, press = send), audio uploaded to the Mac → ElevenLabs Scribe v2 → Claude. Replaced the on-device SFSpeechRecognizer path (~60s cap + first-word VAD cutoff). The mic phases in/out: it is held only while the record screen is active, and released the instant the app backgrounds or the user switches tabs (scenePhase.background / .onDisappear → setActive(false, .notifyOthersOnDeactivation)), so it never hogs the mic or keeps another app's audio (a podcast) interrupted; returning to the record screen re-acquires.
  • Full iMessage-style chat interface — building incrementally across three phases.
  • Phase A — chat thread + persistence + text input + multi-turn server memory. Built 2026-05-22. Scrolling bubble thread (ChatView), text and voice input into one rolling conversation, on-device persistence (MessageStore), and a server that keeps each conversation's history (conversations/<id>.jsonl) and replays the last 30 turns to the poll session every turn (_converse + render_window). New endpoints POST /riff/message + GET /riff/conversation; /riff/audio gained an optional X-Riff-Conversation-Id (one-shot stays back-compat without it). The History tab was removed (the thread is the history). See Chat client + conversation store. Deploy needs a riff-server reload and a riff poll-session restart — poll-instructions.md changed.
  • Phase B — file / image attachments. Not built. Compose-bar attachment button (PhotosPicker + document picker); phone uploads files with a message (multipart); the server saves them under attachments/ and references the paths so the poll session can Read them (images → vision like diet-log, docs → read). The Phase A compose bar already renders a disabled attachment button as the placeholder.
  • Phase C: Action Button App Intent. Built 2026-05-22 (client-only; no server / poll change). Replaced the riff://toggle URL re-fire (broken second-press-to-send when foregrounded) with an in-process AppIntent (RiffToggleIntent) auto-registered as an App Shortcut (RiffShortcuts in RiffIntents.swift). perform() runs in-process on every press and posts the existing .riffToggle notification (via a tiny RiffToggleBus), so the Phase A wiring (ChatView → toggleVoice(), ContentView → Chat tab) is reused verbatim. openAppWhenRun = true foregrounds the app. The riff://toggle URL scheme + .onOpenURL are kept as a zero-cost documented fallback (same notification). The locked-screen Face ID requirement is an OS limit (confirmed on-device 2026-05-22), documented not engineered around. Bind via Settings → Action Button → Shortcut → the auto-registered Riff shortcut. See Action Button configuration.
  • TTS read-aloud toggle — ElevenLabs voice-out; when on, Riff speaks the reply (in addition to / instead of showing it). Enables hands-free. ELEVENLABS_API_KEY already present (shared with newsfeed TTS).
  • "Hey Siri, Riff" invocation — a Siri Shortcut / App Intent that starts Riff listening. Combined with the read-aloud toggle this is the full hands-free + in-car loop (speak → answer spoken back), using only supported APIs.
  • Apple Watch — see Phase 2 (watchOS target), when Mark has a Watch.
  • CarPlay — investigated, not viable. Apple gates CarPlay to fixed app categories (audio / nav / comms / EV / …) with locked templates; a custom AI-assistant chat UI doesn't qualify and the entitlement isn't grantable for it. The in-car experience is delivered via "Hey Siri, Riff" + read-aloud instead — no CarPlay entitlement needed.

Risks

| risk | mitigation |
| --- | --- |
| ~~On-device Speech accuracy poor for technical jargon~~ (resolved 2026-05-22) | Resolved by the cloud swap. Transcription moved off-device to ElevenLabs Scribe v2 (~2.2% WER) precisely to fix jargon accuracy ("Kalshi", "Hyperliquid", "git rebase"). The audio is uploaded to the Mac, which transcribes it. |
| Scribe model id (scribe_v2) drift | Module constant ELEVENLABS_STT_MODEL (one-line change), overridable via ~/.env; fallback id scribe_v1. Confirmed live 2026-05-22: scribe_v2 accepted, text field present. |
| Large-upload retry double-billing | A blind retry after a timeout could re-run Scribe + double-post to the poll session. RiffClient.postAudio retries only on connection-never-established errors (cannotConnectToHost/notConnectedToInternet), never on timedOut. |
| Audio leaves the device (privacy) | Deliberate, accepted trade for accuracy. /riff/audio is tailnet-only (no Funnel), so audio→Mac never hits the public internet; only Mac→ElevenLabs does (TLS, same as newsfeed TTS). See Privacy delta. |
| AAC encode / AVAudioFile format friction on device | Tap buffers (hardware format) are converted to the file's mono-16kHz processing format via AVAudioConverter in AudioFileWriter before writing. Documented fallback if it ever fights the format on a specific route: a parallel AVAudioRecorder (Option B). |
| Tailscale on iPhone disconnects (reset, OS update) | App detects connection failure, surfaces "Tailscale offline" in the recording UI; airplane-mode mid-record then Send fails with the normal offline error (no transcript fallback, by design). Mark re-enables Tailscale and retries. No iMessage fallback in v1 (would re-introduce the original friction). |
| APNs auth key compromise | Stored at ~/.ssh/apns_riff.p8 mode 600, not committed; rotate by generating a new key in Apple Developer console and updating ~/.env. |
| Wispr Flow doesn't yield the audio session | Document the Settings flag to disable Wispr Flow temporarily; surface an "audio session unavailable" error in the app if recording fails to start. |
| Personal-use signing expires every 7 days for free certificates | Mark has paid membership; use his team's certificate for 1-year expiry. Document re-signing cadence in install.sh. |
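
The retry rule in the double-billing row can be expressed as a small predicate over URLError codes. This is a sketch of the policy as described, with a hypothetical function name; it is not the code in RiffClient.postAudio.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Retry only when the connection was never established, so the server
// cannot have started Scribe or posted to the poll session.
func shouldRetryUpload(after error: Error) -> Bool {
    guard let urlError = error as? URLError else { return false }
    switch urlError.code {
    case .cannotConnectToHost, .notConnectedToInternet:
        return true
    default:
        return false   // includes .timedOut: the request may already be running server-side
    }
}
```

A timeout is deliberately excluded because the upload may have reached the server even though the client never saw a response.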

Divergences from the original spec

The spec was written before the build; these are the deltas the build introduced and that future readers should know about.

  • Pivot to an SSH terminal + voice-inject (2026-05-22): the biggest direction change since the chat build. The primary surface is no longer the iMessage-style chat thread — it's a SwiftTerm terminal attached over SSH to a persistent tmux session running claude, with the Action Button dictating transcribed text straight into the live REPL. The chat UI (ChatView/ChatViewModel/MessageStore) and the chat server endpoints are shelved (kept, unlinked) — see Terminal architecture + Kept / shelved. New iOS code under Riff/Terminal/: TerminalTransport (the swap seam), SSHTransport (SwiftNIO SSH → PTY → tmux), TerminalSurface (SwiftUI wrap of SwiftTerm TerminalView), TerminalController (transport + auto-reconnect), TerminalScreen (the view), VoiceInjectController (record → transcribe-only → inject), SSHKeyStore (on-device ed25519 → Keychain). New server endpoint POST /riff/transcribe-only. New Mac-side scripts/riff-tmux-up.sh + LaunchAgents/com.mark.riff-tmux.plist.
  • mosh deliberately NOT built (2026-05-22): an earlier plan staged mosh as the eventual transport (instant local echo + roaming). Dropped because mosh is GPLv3+ and Riff is being built so it could be sold — a GPL transport imposes distribution obligations and is App-Store-incompatible. SwiftTerm (MIT) + SwiftNIO SSH (Apache-2.0) keep the stack permissive. Network changes are handled by SSH auto-reconnect (re-attach to the persistent tmux session), not roaming. The TerminalTransport seam is where a MoshTransport could land if the licensing decision ever changes.
  • SSH client = SwiftNIO SSH (not libssh2): pure-Swift, no C build, no OpenSSL. The interactive PTY is an exec of the tmux attach-or-create line under a requested PTY (not a plain login shell + injected keystrokes — the exec-under-PTY path is more robust and avoids shell-prompt timing races).
  • Host key: TOFU pinning (hardened 2026-05-24): the original AcceptAllHostKeysDelegate (trust any host key, lean on the tailnet ACL) was replaced by PinnedHostKeyDelegate + HostKeyStore — trust-on-first-use, pin thereafter, hard-fail a changed key. See Host-key trust (TOFU pinning). This was the security gate for distributing Riff beyond Mark.
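
The trust-on-first-use rule in the host-key bullet above boils down to a three-way decision. The sketch below uses hypothetical names and an in-memory store to show the logic only; the real PinnedHostKeyDelegate and HostKeyStore are not shown in this README and will differ, for example in how pins are persisted.

```swift
// Hypothetical outcome of checking a presented host key against a pin.
enum HostKeyDecision: Equatable {
    case trustAndPin   // first use: no pin yet, accept and remember
    case accept        // matches the stored pin
    case reject        // key changed: hard-fail
}

func evaluateHostKey(presented: [UInt8], pinned: [UInt8]?) -> HostKeyDecision {
    guard let pinned = pinned else { return .trustAndPin }
    return presented == pinned ? .accept : .reject
}

// Illustrative in-memory store keyed by host name.
struct InMemoryHostKeyStore {
    private var pins: [String: [UInt8]] = [:]

    init() {}

    mutating func check(host: String, presentedKey: [UInt8]) -> HostKeyDecision {
        let decision = evaluateHostKey(presented: presentedKey, pinned: pins[host])
        if decision == .trustAndPin {
            pins[host] = presentedKey   // pin on first use only
        }
        return decision
    }
}
```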

  • State path: an earlier draft of this README placed sessions under $HOME/riff/sessions/. Implementation moved every Riff artifact (sessions/, _index.jsonl, devices.json) under ~/Library/Application Support/riff/ to match the global CLAUDE.md convention for per-project artifacts (newsfeed, trade, webpage-server all live there too).

  • Simulator target: spec listed iPhone 16 Pro as the dev sim. The Mac mini only has iPhone 17 series simulators installed (17, 17e, 17 Pro, 17 Pro Max). Phase 2 verification used iPhone 17 Pro. The hardware target is Mark's iPhone 16 Pro Max (RogersNet, iPhone17,2 — note: that "17,2" is the Apple model identifier for the 16 Pro Max, not the iPhone 17 line).
  • Wispr Flow yield: still unverified on real device as of this README. The audio-session yield assumption rests on Wispr Flow being a normal foreground-mic app that yields .playAndRecord to whichever app activates it most recently. The app surfaces "audio session unavailable" on failure; if that happens in practice, the user-action workaround is to force-quit Wispr Flow before pressing the Action Button. Will update this paragraph after first device smoke.
  • Action Button: still requires the user to set "Open App: Riff" in iOS Settings — no programmatic surface for that. Documented in the install steps above.
  • Push registration UI: the Settings tab in the iOS app shows the current notification authorization status, a hex-truncated push token preview, the timestamp of the most recent successful registration, and any registration error. This is for diagnostic visibility — ~/Library/Application Support/riff/devices.json is the authoritative record on the server.
  • HMAC keying (2026-05-15): the server originally keyed hmac.new(secret.encode("utf-8"), …), which used the 64-char hex string as a 64-byte ASCII key. The iOS client builds a CryptoKit SymmetricKey from Data.fromHex(hex), a 32-byte raw key — so every request from iOS 401'd. The server now keys with bytes.fromhex(secret) to match the iOS interpretation, and main() fails fast at boot if the secret isn't valid hex.
  • Recording permission gate (2026-05-15): the recording flow now has a .permissionRequired(missing:) phase that surfaces a Settings deep-link when mic permission is denied, instead of silently entering .recording against a dead engine ("(listening…)" with a flat waveform). Mic permission uses the iOS 17 AVAudioApplication.requestRecordPermission API. The Recording tab now also shows a manual "Start Recording" / "Try Again" button when in .idle / .error so the app is recoverable without quitting.
  • Capture rework, manual send + server-side Scribe v2 STT (2026-05-22): the biggest change since Phase 1. (1) Auto-send removed: the 1.2s silence/stability timer is gone; recording ends only on a manual Send, Cancel, or Action-Button toggle, so Mark can pause to think without being cut off. (2) Send moved to the middle-left (a circular thumb-reach button) with Cancel demoted to a slim bottom bar. (3) Action Button now opens riff://toggle (a custom URL scheme + .onOpenURL → .riffToggle notification → vm.toggle() / Record-tab select) instead of "Open App: Riff". (Superseded by the Phase C App Intent, 2026-05-22: the URL scheme is now a fallback; the Action Button binds to the Riff App Shortcut. See Action Button configuration + the Phase C roadmap entry.) (4) SFSpeechRecognizer ripped out entirely (import Speech, the speech-auth, and NSSpeechRecognitionUsageDescription all gone); the phone records audio to an AAC/m4a file (mono 16kHz ~32kbps via AVAudioFile + AVAudioConverter, encapsulated in an off-actor AudioFileWriter so the audio render thread writes without crossing @MainActor isolation) and uploads the raw bytes to the new POST /riff/audio. (5) riff_server transcribes server-side via ElevenLabs Scribe v2 (transcribe_elevenlabs, shared _run_and_respond tail with the text path) and feeds the transcript into the unchanged poll/Claude pipeline. (6) The live transcript UI is gone (waveform + "Recording…" stay; .sending reads "Transcribing…"). Client timeouts raised to 95s request / 150s resource to cover the two-stage Scribe+Claude budget plus the upload. Batch only; streaming (scribe_v2_realtime) is deferred.
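
The HMAC keying divergence listed above comes down to two readings of the same 64-character secret: 64 ASCII bytes versus 32 raw bytes. The sketch below shows the difference with a made-up secret. `bytesFromHex` is a hypothetical stand-in for the Data.fromHex helper, whose implementation is not shown in this README.

```swift
// Hypothetical hex decoder; returns nil for odd length or non-hex characters,
// mirroring the server's fail-fast check on an invalid secret.
func bytesFromHex(_ hex: String) -> [UInt8]? {
    let chars = Array(hex)
    guard chars.count % 2 == 0 else { return nil }
    var out: [UInt8] = []
    out.reserveCapacity(chars.count / 2)
    var i = 0
    while i < chars.count {
        guard let hi = chars[i].hexDigitValue,
              let lo = chars[i + 1].hexDigitValue else { return nil }
        out.append(UInt8(hi * 16 + lo))
        i += 2
    }
    return out
}

// Made-up 64-character hex secret, for illustration only.
let secretHex = String(repeating: "ab", count: 32)

// Old server reading: the hex text itself as key material (64 ASCII bytes).
let asciiKey = Array(secretHex.utf8)

// iOS reading, and the server's reading after the fix: decoded raw bytes (32).
let rawKey = bytesFromHex(secretHex)
```

Here `asciiKey.count` is 64 and `rawKey?.count` is 32. Different key bytes give different HMAC digests for the same message, which is why every iOS request failed authentication until the server switched to the decoded form.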

See also

  • ~/agents/imessage-dispatcher/ — the iMessage path this app is designed to bypass. The voice → claude-CLI plumbing on the Mac side is conceptually similar.
  • ~/agents/scripts/bb-send.sh — iMessage relay; used today for replies in the iMessage flow. Riff replies use APNs instead.
  • ~/agents/webpage-server/ — HTTP server pattern; riff_server.py follows the same launchd-managed shape.
  • Apple docs: Speech framework on-device transcription, APNs HTTP/2 token auth, Background Modes (audio).