riff
A native iOS client for the self-hosted Claude box you already run. If you
have a Mac on a tailnet with the claude CLI (the OpenClaw setup, or any
self-hosted Claude Code box), Riff puts it on your phone as a real terminal —
full SwiftTerm fidelity driving the live claude REPL over SSH — with the
Action Button dictating voice straight into the session. Not a messaging
bridge: the actual terminal.
- Terminal = zero server. Remote Login on, your phone's SSH key in authorized_keys, tmux + claude on the Mac. That's it. (Voice is an optional add-on; see Voice server quickstart.)
- Vendor-neutral. Plain SSH + tmux + Claude Code. Riff does not use or require OpenClaw; OpenClaw users are just the audience who've already done the prerequisites. If you stop running OpenClaw, Riff keeps working.
- TestFlight:
<TESTFLIGHT_PUBLIC_LINK — fill in after the App Store Connect external group + public link are created (Phase 4.1)>
See Getting started (any user) and the OpenClaw quickstart to set up in ~30 seconds. The rest of this README is the canonical spec.
A custom iOS app that puts Mark's Mac mini claude CLI on his phone as a
live terminal, with the Action Button driving voice dictation straight
into that terminal. Riff opens to a full-screen SwiftTerm view attached over
SSH to a persistent tmux session running claude; press the Action Button,
talk, and the transcribed text is typed into the live REPL.
Current primary surface: the SSH terminal + voice-inject (2026-05-22). The original build (and builds through 20) was a chat client: an iMessage-style thread that round-tripped through riff_server → a claude poll session → APNs. That chat UI is shelved (files kept, unlinked from the view tree) in favor of talking to claude directly in a terminal. See Terminal architecture and the Kept / shelved map. The chat-era docs below (chat client, conversation store, APNs reply path) describe the dormant path; that path still runs server-side, but the terminal never calls it.
The Action Button on iPhone 15 Pro and later is a configurable hardware button; today the only sensible voice path on a locked iPhone is Siri, which conflicts with Wispr Flow (which Mark keeps on for everything else). Blink (the obvious off-the-shelf SSH terminal) has no Action-Button voice dictation into the session — that single capability is the reason Riff exists: the shipped record→Scribe pipeline supplies it, repointed to type into the terminal instead of posting to a chat thread.
Terminal architecture (primary surface, 2026-05-22)
iPhone (Terminal tab — SwiftTerm full-bleed)
│ keystrokes (TerminalViewDelegate.send) ▲ output bytes (feed)
▼ │
┌──────────────────────────────────────────────────────────┐
│ TerminalController (owns transport + reconnect state) │
│ • TerminalSurface: UIViewRepresentable<TerminalView> │
│ • VoiceInjectController: record → /transcribe-only → │
│ inject transcript via transport.write (same byte path) │
└───────────────┬────────────────────────────────────────────┘
│ TerminalTransport (the swap seam)
▼
┌──────────────────────────────────────────────────────────┐
│ SSHTransport (SwiftNIO SSH, Apache-2.0) │
│ ClientBootstrap → TCP :22 → NIOSSHHandler (client) │
│ • auth: on-device ed25519 key (Keychain) via publickey │
│ • host key: TOFU pin (trust on 1st connect, hard-fail │
│ on change — PinnedHostKeyDelegate + HostKeyStore) │
│ └─ session channel → pty-req (xterm-256color, cols×rows)│
│ → exec `tmux new-session -A -s riff -c ~/agents …`│
│ • inbound channel/stderr bytes → feed() SwiftTerm │
│ • write() → channel stdin ; resize() → window-chg │
└───────────────┬────────────────────────────────────────────┘
│ SSH over Tailscale (no Funnel — tailnet only)
▼
┌──────────────────────────────────────────────────────────┐
│ Mac mini : sshd (Remote Login) → tmux session `riff` │
│ • $SHELL -lc 'exec tmux -L riff new-session -A -s riff │
│ -c "$HOME" env -u TMUX -u TMUX_PANE claude …' │
│ (login shell → PATH resolves tmux/claude; no abs paths) │
│ • the live `claude` REPL the phone drives │
│ • survives disconnects; every (re)connect re-attaches to │
│ the SAME session (claude keeps its context) │
└────────────────────────────────────────────────────────────┘
The terminal talks to claude directly over SSH. The chat round-trip
through riff_server (/riff/message, conversation store, multi-turn replay,
APNs) is bypassed entirely — claude's own CLI context is the memory. The
only riff_server call on the terminal's hot path is POST /riff/transcribe-only
(Scribe text for voice-inject; no claude, no APNs, no conversation write).
The TerminalTransport seam (and why mosh is deferred)
TerminalTransport (in Terminal/TerminalTransport.swift) abstracts the byte
channel: connect(), write(), resize(), disconnect(), onOutput,
onClosed. The rendering surface, keystroke path, resize logic, and
voice-inject are all written against this protocol, so the transport underneath
is swappable.
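As a rough illustration of the seam, the protocol might look like the sketch below. The member names come from the list above; the parameter types, the callback signatures, and the LoopbackTransport stand-in are assumptions made for the example, not the contents of Terminal/TerminalTransport.swift.

```swift
import Foundation

// Hypothetical sketch of the transport seam. Member names follow the README;
// parameter and callback types are assumptions.
protocol TerminalTransport: AnyObject {
    var onOutput: ((Data) -> Void)? { get set }   // bytes arriving from the remote side
    var onClosed: ((Error?) -> Void)? { get set } // nil error = clean close (assumed)
    func connect() throws
    func write(_ bytes: Data)
    func resize(cols: Int, rows: Int)
    func disconnect()
}

// Loopback stand-in: echoes writes back as output, so code written against the
// protocol can be exercised without SSH.
final class LoopbackTransport: TerminalTransport {
    var onOutput: ((Data) -> Void)?
    var onClosed: ((Error?) -> Void)?
    private(set) var size = (cols: 80, rows: 24)
    private var connected = false

    func connect() throws { connected = true }
    func write(_ bytes: Data) { if connected { onOutput?(bytes) } }
    func resize(cols: Int, rows: Int) { size = (cols: cols, rows: rows) }
    func disconnect() { connected = false; onClosed?(nil) }
}

let transport: TerminalTransport = LoopbackTransport()
var received = Data()
var closed = false
transport.onOutput = { received.append($0) }
transport.onClosed = { _ in closed = true }
try transport.connect()
transport.write(Data("ls\r".utf8))
transport.disconnect()
```

Because the caller only holds a TerminalTransport, swapping the loopback for SSHTransport (or a future transport) would not touch the rendering or voice-inject code.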
Riff ships SSH only and stays permissively licensed (SwiftTerm MIT +
SwiftNIO SSH Apache-2.0). An earlier plan staged mosh as the eventual
transport (instant local echo + roaming across network changes). It is
deliberately not built here: mosh is GPLv3+, and Riff is being built so
it could be sold — shipping a GPL transport (or any reused Blink mosh
component) would impose GPL distribution obligations and is famously
incompatible with App Store terms. The clean-licensing requirement outranks the
roaming nicety. If mosh is ever revisited, the TerminalTransport seam is where
a MoshTransport would drop in — but only behind a deliberate licensing
decision. For now, network changes are handled by auto-reconnect (below),
not roaming.
Auto-reconnect
SSH drops on a network change (Wi-Fi ↔ cellular, tailnet re-route). The
TerminalController watches transport.onClosed; on an unexpected drop it
re-SSHes and re-execs the tmux new-session -A attach line on a bounded
backoff (1, 2, 4, 8, 15s), surfacing connecting / connected / reconnecting /
disconnected (tap to retry) as a status chip. Because the tmux session lives
server-side, a reconnect re-attaches to the same live claude with its
context intact. A deliberate disconnect() (or app teardown) does not
trigger reconnect.
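The backoff schedule above can be sketched as a doubling delay clamped at 15 seconds. The struct name and the choice to stop (return nil) after the fifth attempt are assumptions for illustration; the README only states the delays.

```swift
import Foundation

// Hypothetical sketch: delays double from 1s and clamp at a 15s ceiling,
// reproducing the documented 1, 2, 4, 8, 15s schedule. Giving up (nil) after
// the last attempt is an assumption.
struct ReconnectBackoff {
    let maxAttempts = 5
    let ceiling: TimeInterval = 15

    /// Delay before the given 0-based attempt, or nil once the budget is spent.
    func delay(forAttempt attempt: Int) -> TimeInterval? {
        guard attempt >= 0, attempt < maxAttempts else { return nil }
        return min(pow(2.0, Double(attempt)), ceiling)
    }
}

let backoff = ReconnectBackoff()
let schedule = (0..<6).map { backoff.delay(forAttempt: $0) }
```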
The bottom button bar floods waveformRed while (re)connecting, but the glow is
debounced (connectingGlowDelay, ~0.3s): it appears only if the
connecting/reconnecting state persists past that, so a fast connect — a new
session over the already-live SSH/tmux, or a fast reconnect — does not flash
red. Red-on-tap is reserved for the hold-to-close affordance. (Future, not built:
a minimum-show floor so a glow that does appear can't flash-and-vanish if connect
finishes just after the threshold.)
Voice-inject (the unique value)
Action Button → .riffToggle → VoiceInjectController.toggle():
- not recording → start the mic (reused RecordingViewModel + AudioFileWriter,
unchanged AAC/m4a capture);
- recording → stop, upload the clip to POST /riff/transcribe-only, get back
{transcript}, and type it into the terminal via transport.write — the
same byte path keystrokes take.
Inject does NOT auto-press Return by default (autoSubmitVoice, a Settings
toggle, default off). The transcript lands at the claude prompt; Mark
eyeballs it, edits a misheard word, and presses Return himself. STT is
imperfect and a wrong auto-submitted prompt to a coding agent is costly. Flip
the toggle on to auto-Return (\r, 0x0D) if editing turns out to be rare. The
setting governs every stop gesture uniformly — the mic-button stop, the Action
Button, and a tap on the recording waveform all honor it.
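A minimal sketch of the inject step, assuming a hypothetical helper named injectionBytes: the transcript goes out as UTF-8 bytes, and a carriage return is appended only when the auto-submit toggle is on.

```swift
import Foundation

// Hypothetical helper: the bytes typed into the terminal for a transcript.
// autoSubmit mirrors the documented autoSubmitVoice toggle (default off).
func injectionBytes(for transcript: String, autoSubmit: Bool = false) -> Data {
    var bytes = Data(transcript.utf8)
    if autoSubmit {
        bytes.append(0x0D) // carriage return, the same byte the Return key sends
    }
    return bytes
}

let manual = injectionBytes(for: "fix the failing test")
let auto = injectionBytes(for: "fix the failing test", autoSubmit: true)
```

With the default, the text sits at the prompt for review; with auto-submit on, the only difference is the single trailing 0x0D.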
While a clip is transcribing the recording LED grid stays on screen with every bar lit (with a gentle ~0.9s breathing pulse) — the lit grid is the progress indicator; there is no spinner on the mic button. The grid clears the instant the status returns to idle (or shows the error text on failure).
SSH key / authorization
The ed25519 keypair is generated on-device on first run and the private
key is stored in the iOS Keychain (never UserDefaults, never the bundle,
never the repo). The public key is shown in Settings → SSH public key as a
copy-to-clipboard authorized_keys line. The user pastes it into the Mac's
~/.ssh/authorized_keys once to authorize the phone; revoke by deleting that
line. (See Terminal/SSHKeyStore.swift.)
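For reference, an authorized_keys line for an ed25519 key is the key type, a base64 blob of two length-prefixed fields (the type name and the 32-byte public key), and a comment. The sketch below builds that line from raw key bytes; the function names and the comment text are hypothetical and not taken from SSHKeyStore.swift.

```swift
import Foundation

// An SSH wire-format "string": 4-byte big-endian length, then the bytes.
func sshString(_ bytes: Data) -> Data {
    let n = UInt32(bytes.count)
    var out = Data([UInt8(n >> 24), UInt8((n >> 16) & 0xFF), UInt8((n >> 8) & 0xFF), UInt8(n & 0xFF)])
    out.append(bytes)
    return out
}

// Hypothetical formatter for the copy-to-clipboard line.
func authorizedKeysLine(rawPublicKey: Data, comment: String) -> String? {
    guard rawPublicKey.count == 32 else { return nil } // ed25519 public keys are 32 bytes
    let keyType = "ssh-ed25519"
    let blob = sshString(Data(keyType.utf8)) + sshString(rawPublicKey)
    return "\(keyType) \(blob.base64EncodedString()) \(comment)"
}

// Placeholder key bytes for illustration only (not a real key).
let line = authorizedKeysLine(rawPublicKey: Data(repeating: 0, count: 32), comment: "riff-iphone")
```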
Host-key trust (TOFU pinning)
Riff pins the Mac's SSH host key on a Trust-On-First-Use basis
(Terminal/HostKeyStore.swift + PinnedHostKeyDelegate in SSHTransport.swift),
replacing the original accept-any-key delegate:
- First connect to a host:port → the presented host key is trusted silently and persisted (keyed by host:port, in the shared UserDefaults suite; host keys are public, so they do not go in the Keychain).
- Every later connect → the presented key must match the pinned one, exactly as ssh checks known_hosts.
- A CHANGED key is never auto-accepted: the connection HARD-FAILS with a man-in-the-middle warning and surfaces on the status chip as "Disconnected" with the mismatch reason. This is the security win over accept-any.
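The three rules above reduce to a small decision function. This is a sketch under stated assumptions: an in-memory dictionary stands in for the UserDefaults suite, and HostKeyPins and HostKeyDecision are illustrative names, not the HostKeyStore API.

```swift
import Foundation

enum HostKeyDecision: Equatable {
    case trustedFirstUse   // no pin yet: pin silently
    case matchesPin        // same key as the pinned one
    case mismatch          // key changed: hard-fail, never auto-accept
}

// Hypothetical TOFU store keyed by "host:port".
struct HostKeyPins {
    private var pins: [String: Data] = [:]
    init() {}

    mutating func evaluate(host: String, port: Int, presentedKey: Data) -> HostKeyDecision {
        let id = "\(host):\(port)"
        guard let pinned = pins[id] else {
            pins[id] = presentedKey
            return .trustedFirstUse
        }
        return pinned == presentedKey ? .matchesPin : .mismatch
    }

    // Clearing the pin lets the next connect trust on first use again.
    mutating func reset(host: String, port: Int) {
        pins["\(host):\(port)"] = nil
    }
}

var pins = HostKeyPins()
let keyA = Data([1, 2, 3]), keyB = Data([9, 9, 9])
let first = pins.evaluate(host: "mac-mini", port: 22, presentedKey: keyA)
let again = pins.evaluate(host: "mac-mini", port: 22, presentedKey: keyA)
let changed = pins.evaluate(host: "mac-mini", port: 22, presentedKey: keyB)
pins.reset(host: "mac-mini", port: 22)
let afterReset = pins.evaluate(host: "mac-mini", port: 22, presentedKey: keyB)
```

Note that a mismatch leaves the old pin in place, so a changed key keeps failing until the pin is explicitly reset.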
The pinned key's SHA256 fingerprint is shown in Settings → Host key, where
a careful user can verify it out-of-band against
ssh-keygen -lf /etc/ssh/ssh_host_ed25519_key.pub on the Mac (or ssh-keyscan).
Settings → Reset trusted host key clears the pin for the current host:port so
a user who legitimately reinstalled the Mac (new host key) or pointed Riff at a
different box can re-TOFU on the next connect — this is the documented recovery
for the hard-fail.
v1 policy (deliberate): silent auto-TOFU on first use + hard-fail on mismatch, with no inline "trust this key?" modal. The eager multi-session bootstrap (SessionManager.bootstrap) connects several sessions concurrently, so a blocking prompt would race N ways. The Settings fingerprint review/reset is the careful-user affordance instead; an interactive first-use confirm (which must serialize the first connect) is a possible fast-follow.
Mac-side contract (what must be true)
- Remote Login (SSH) ON: System Settings → General → Sharing → Remote Login = ON (or sudo systemsetup -setremotelogin on). Without it the SSH connect is refused. The agent cannot verify this non-interactively; Mark confirms.
- The phone's public key is in ~/.ssh/authorized_keys (copied from Settings → SSH public key).
- A persistent tmux session named riff. Optional to pre-create: the iOS SSHTransport runs tmux -L riff new-session -A -s riff (attach-or-create) on connect, so the session is created on first phone connection. To have it exist before the phone connects (snappier first attach), the optional scripts/riff-tmux-up.sh (+ a per-user LaunchAgent) brings it up at login. A terminal-only user does NOT need it.
- The PATH footgun, solved by a login shell (no absolute paths). A non-login SSH exec shell has neither Homebrew nor ~/.local/bin on PATH, which is why the original build hardcoded absolute paths to tmux + claude. The de-Marked build instead runs everything through the user's login shell, $SHELL -lc 'exec tmux -L riff new-session -A -s riff -c "$HOME" env -u TMUX -u TMUX_PANE claude …', which sources the user's profile so tmux and claude resolve on $PATH on ANY Mac. No absolute paths, no bundled launcher script. The riff.tmux.conf essentials are applied inline (tmux \;-chained set -g). env -u TMUX is the truecolor trick (claude downgrades to 256-color when it sees $TMUX). The start directory defaults to $HOME and is configurable in Settings (riff.ssh.startDir). The harness, the command launched on that exec slot, is a single free-text string (riff.ssh.harness, default the literal claude): a binary name, an absolute path, or any shell command (codex, /usr/local/bin/codex, aider --model x), run verbatim, mirroring how the start directory is configurable. The string claude (the default) or empty keeps the native Claude launch (RiffTmux.claudeLaunch(), worktree fork intact); any OTHER string launches inline via RiffTmux.launchFor instead of claude. See RiffTmux in SSHTransport.swift; mirror any change in riff-tmux-up.sh.
Worktree (opt-in, default OFF)
A Worktree toggle in iOS Settings ▸ Claude (riff.claude.worktree,
default false) makes each NEW session run claude inside its own git
worktree on its own branch, isolating concurrent sessions' working trees. OFF
(the default) ⇒ behavior is exactly today's. The toggle ONLY buys isolation —
there is no auto-merge, no conflict resolution, no branch UI, no reaper; you
merge riff/<session> yourself if you want it.
How it works when ON: the new session's tmux env carries RIFF_WORKTREE=1 +
RIFF_START_DIR=<start dir> (spliced as tmux -e flags in RiffTmux.createCommand),
and the launch routes through scripts/riff-claude.sh if it's on the Mac's
login $PATH (else it falls back to the inline launch — no wedge, nothing to
install for a worktree-off user). If RIFF_START_DIR is a git work tree, the
script creates-or-reuses a worktree at
~/Library/Application Support/riff/worktrees/<repo>/<session> on branch
riff/<session> (forked from origin/HEAD's default branch, else current HEAD),
serialized by an atomic mkdir lock (no flock on macOS), then cds in. Any
failure falls back to cd "$RIFF_START_DIR" — a worktree problem never wedges a
session. All path logic lives in the shell scripts, never in Swift.
On session close (long-press the + and release with your finger still on the button —
while held the whole bottom bar fades to the waveform red as the hold affordance),
SessionManager.closeCurrent best-effort fires
scripts/riff-worktree-remove.sh <session> over the no-PTY control channel
(gated on the toggle) to reclaim the worktree dir — keeping the branch. It's
fire-and-forget so it can't stall/break close; a leftover worktree is harmless
(the next create's worktree prune tidies it). To use the feature, put both
scripts on the Mac's login $PATH (e.g. symlink into ~/bin).
New Session customization (harness + launch directory)
The harness is a single free-text string — the command a new session
launches, run verbatim: a binary name, an absolute path, or any shell command
(claude, /usr/local/bin/codex, codex, aider --model x). It defaults to the
literal claude. The string claude or empty keeps the native Claude
launch (RiffTmux.claudeLaunch(), worktree fork + exact claude args intact); any
OTHER string launches inline via RiffTmux.launchFor instead. There is no
Claude/Shell/Custom picker — typing $SHELL or bash reaches a bare shell through
the same verbatim path.
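The string-to-harness rule can be sketched as a two-case enum. The .custom case is mentioned elsewhere in this README; the initializer shape and the whitespace trimming are assumptions made for the example.

```swift
import Foundation

// Hypothetical model of the documented rule: "claude" or an empty string keeps
// the native Claude launch; any other string is a custom command run verbatim.
enum Harness: Equatable {
    case claude
    case custom(String)

    init(from raw: String) {
        // Trimming surrounding whitespace is an assumption, not documented behavior.
        let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)
        self = (trimmed.isEmpty || trimmed == "claude") ? .claude : .custom(trimmed)
    }
}

let native = Harness(from: "")
let shell = Harness(from: "$SHELL")
let aider = Harness(from: "aider --model x")
```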
One shared store, last-write-wins. The harness lives under the single key
riff.ssh.harness; the launch directory is the existing riff.ssh.startDir. BOTH
surfaces — iOS Settings ▸ Session (Harness + Directory)
and the double-tap "New Session" sheet — read AND write those same two keys.
The sheet is pre-filled from the current Settings.startDir +
Settings.harness (NOT from a recents MRU), and on Create it persists the
edited values back to those keys, so the next double-tap and iOS Settings both
default to the new values. There is no separate "default" vs "last-used" — they are
the same stored values. (Settings.recentStartDirs survives only as a tappable
cwd-autocomplete affordance, never as the pre-fill.)
The + button's tap is an instant new session with those shared defaults.
When the harness is claude/empty the tap is byte-identical to before this
feature: the create path resolves a nil spec
(SessionManager.defaultSpecForPlainTap() returns nil), so it still routes through
claudeLaunch() with the worktree fork intact. Any other harness builds a spec
carrying Settings.startDir + .custom(command) and launches it inline via
launchFor (no worktree script — the launcher only knows claude). The +
ALWAYS creates a genuinely-new session and NEVER attaches to an existing one.
The next free riff-N (N ≥ 2) is computed across the union
of the live server's session names AND the in-memory pages — not the in-memory
list alone. Before minting the name, createSession re-enumerates the live
-L riff sessions (best-effort; a failed refresh falls back to in-memory only)
so it cannot pick a name that exists server-side but was dropped from the
in-memory list off-LAN (which is exactly how the + used to attach to a
forgotten/orphaned session). As a belt-and-suspenders for a name that races in
between the refresh and the create, a new-session -d that tmux rejects with
duplicate session: NAME is surfaced as DuplicateSessionError; the manager
then bumps N and retries (bounded), never attaching to the collision.
Recovering forgotten/orphaned sessions you lost access to is reconcile()'s
job (foreground / reconnect — see below), NOT the +'s; the two roles stay
distinct. Its long-press (0.5s) exactly preserves the existing close behavior:
hold and release with the finger still on the button → Close Session (the bottom
bar floods red while held, same as before). The customization menu is a separate
gesture: hold, then slide the finger ≥44pt off the button before lifting → a small
menu (a SwiftUI .confirmationDialog) opens instead of closing. (44pt is the
original close-tolerance radius, so anything that would have closed before still closes
— the menu only appears on a deliberate slide-off.) The menu:
- New Session… → opens NewSessionMenu, a compact bottom card with exactly two fields (identical to iOS Settings) + a Create Session button:
  - Directory: a text field pre-filled from the current Settings.startDir, accepting $HOME / ~ / an absolute path (expanded by the login shell at -c "<dir>"). A short recents list offers tappable MRU suggestions, but the pre-fill is startDir, not the recents. The cwd is the injection-sensitive field and is contained inside the outer login-shell -lc '…' (covered by cwdWithSpacesAndQuotesIsContainedSafely).
  - Harness: a single monospaced text field pre-filled from Settings.harness, holding a command run verbatim after the env -u TMUX -u TMUX_PANE prefix (e.g. codex, aider --model x; quote your own args, the cwd is escaped for you). claude/empty keeps the native Claude launch.
  - Create Session persists both edited fields back to Settings.startDir + Settings.harness (the shared store, so the next double-tap and iOS Settings default to them), records the dir into the recents suggestions (Settings.recentStartDirs, capped 6, deduped), then creates.
- Close Session (destructive): closes the current session. (This is the same action as the plain hold-and-release-on-button; it's in the menu too so a slide-off hold can still reach it.)
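The name minting described earlier in this section (lowest free riff-N across server and memory, with a bounded retry on a duplicate) can be sketched as follows. DuplicateSessionError is named in the README; the function names, the closure-based create hook, and the maxRetries value are assumptions for illustration.

```swift
import Foundation

// Lowest free riff-N (N >= 2) across the union of live server names and in-memory pages.
func nextFreeSessionName(serverNames: Set<String>, inMemoryNames: Set<String>) -> String {
    let taken = serverNames.union(inMemoryNames)
    var n = 2
    while taken.contains("riff-\(n)") { n += 1 }
    return "riff-\(n)"
}

struct DuplicateSessionError: Error { let name: String }

// Hypothetical bounded retry: if the create loses a race and the server reports
// a duplicate, mark that name taken and mint the next one. Never attaches.
func createNewSession(serverNames: Set<String>, inMemoryNames: Set<String>,
                      maxRetries: Int = 3,
                      create: (String) throws -> Void) throws -> String {
    var taken = serverNames
    var lastError: Error = DuplicateSessionError(name: "")
    for _ in 0...maxRetries {
        let name = nextFreeSessionName(serverNames: taken, inMemoryNames: inMemoryNames)
        do {
            try create(name)
            return name
        } catch let error as DuplicateSessionError {
            taken.insert(error.name)
            lastError = error
        }
    }
    throw lastError
}

// Simulated race: riff-3 appeared server-side after the refresh.
let racedNames: Set<String> = ["riff-3"]
let created = try createNewSession(serverNames: ["riff", "riff-2"], inMemoryNames: ["riff"]) { name in
    if racedNames.contains(name) { throw DuplicateSessionError(name: name) }
}
```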
How the create is plumbed (bake-at-create): the sheet builds a
NewSessionSpec { cwd, harness } and calls SessionManager.createSession(spec:)
→ SessionManaging.createSession(named:spec:) →
RiffTmux.newDetached(name, cwd:harness:) → createCommand. The cwd/harness are
optional and default to nil, falling back to the globals; a nil/nil create is
byte-identical to today (pinned by noOverrideIsByteIdenticalToTodaysNewDetached).
Every harness still launches under env -u TMUX -u TMUX_PANE via the login shell
(truecolor preserved), composed by RiffTmux.launchFor from the same live
ClaudeArgs globals as the no-override path. The string↔Harness mapping is
Settings.harness(from:) / Harness.rawString; the model + builder are
NewSessionSpec.swift; the sheet is NewSessionMenu.swift.
Worktree interaction (MVP scope): when Worktree is ON and a per-session
cwd override is given, that override is used for RIFF_START_DIR (the worktree
forks from the chosen base) — for the claude harness only. A custom
harness combined with worktree-on is out of MVP: the launcher script only knows
how to launch claude, so a custom command runs the inline launch and does NOT
route through riff-claude.sh even if the toggle is on. Most users have worktree
OFF, so this is a corner of a corner. Not in MVP: a remote directory
browser (text field + recents only).
Session reconcile & enumeration robustness (self-heal a desynced list)
The in-memory session list can silently desync from the live -L riff tmux
server when bootstrap enumeration fails quietly — an off-LAN slow handshake or
an early channel close that yields an empty/partial result indistinguishable from
"no sessions exist." The app would then show only the base riff page even
though riff-2 … riff-N were alive server-side, and recovery used to require a
force-quit on LAN. Three mechanisms close that gap:
1. Management run() distinguishes a timeout / early-empty-close from a genuine empty list, and THROWS so callers can retry. The ; echo __RIFF_EOF__ sentinel is the dividing line: a successful zero-session list-sessions still echoes the sentinel → the buffer is non-empty → .output("") (a real empty list, NOT a throw). Only a handshake that dies before the sentinel echoes resolves to a typed ManagementError (.timeout on a hard-timeout with an empty buffer; .closedEmpty on an early onClosed(nil); .connect on a real connect/auth failure). So "server up, zero sessions" still yields exactly the base page and never errors, while the false-empty that lost sessions is now a retryable signal.
2. Bootstrap retries enumeration with backoff (enumerateWithRetry, default [0, 1, 2]s, 3 attempts) before falling back to base-only. The base session still comes up immediately (time-to-first-paint is unchanged; only the extra-page enumeration waits); a genuine empty list returns [] on the first try (no retry), and a truly-unreachable Mac exhausts the retries and leaves just the base page (which drives its own visible reconnect).
3. The app re-enumerates + reconcile()s on foreground and after a reconnect (TerminalScreen observes scenePhase == .active and the rising edge of the active session into .connected). reconcile() diffs the in-memory pages against a fresh live list WITHOUT tearing down healthy connections: it appends a page (+ eager-connect) for every live session the list forgot, removes a page whose session vanished server-side (EXCEPT the base riff, always kept: it's attach-or-create and may be mid-creation), never reorders a surviving page, and anchors the active session by NAME across the reshuffle. A transient enumeration failure is a no-op (keep what we have): reconcile never drops a live page just because one probe timed out, and it never kills server-side (the × is the only kill). So a desync self-heals without a relaunch; the old force-quit recovery is gone.
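The sentinel rule from the first mechanism can be sketched as a parser that only trusts output ending in the sentinel. The sentinel text and the error case names are from this README; the CloseReason type and the function shape are assumptions, and as a simplification any buffer without the sentinel is treated as incomplete.

```swift
import Foundation

enum ManagementError: Error, Equatable { case timeout, closedEmpty, connect }
enum CloseReason { case hardTimeout, channelClosed, connectFailed } // hypothetical

let sentinel = "__RIFF_EOF__"

/// Returns session names when the sentinel arrived, else throws a retryable error.
func parseListSessions(buffer: String, reason: CloseReason) throws -> [String] {
    guard let end = buffer.range(of: sentinel) else {
        switch reason {
        case .hardTimeout: throw ManagementError.timeout
        case .channelClosed: throw ManagementError.closedEmpty
        case .connectFailed: throw ManagementError.connect
        }
    }
    // Everything before the sentinel is the real listing (possibly empty).
    return buffer[..<end.lowerBound]
        .split(whereSeparator: \.isNewline)
        .map(String.init)
}
```

A sentinel-only buffer yields an empty array (a genuine empty list), while an empty buffer throws, which is the distinction the retry logic depends on.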
The no-PTY management exec's hard timeout is Settings.managementTimeout
(riff.ssh.managementTimeout, default 9s — raised from the old 4s because
off-LAN no-PTY handshakes can exceed 4s, which is what caused the silent
enumeration loss). Internal key only; no Settings row.
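The reconcile() rules described in this section can be expressed as a pure function over session names. This is a sketch, not the SessionManager code: pages are modeled as plain strings, and a nil live list stands in for a transient enumeration failure.

```swift
import Foundation

// Hypothetical pure-function version of the reconcile rules.
func reconcile(pages: [String], live: [String]?, base: String = "riff") -> [String] {
    guard let live = live else { return pages }  // failed probe: keep what we have
    let liveSet = Set(live)
    // Survivors keep their existing order; the base page is always kept.
    var result = pages.filter { $0 == base || liveSet.contains($0) }
    // Append every live session the in-memory list forgot.
    for name in live where !result.contains(name) {
        result.append(name)
    }
    return result
}

// Anchoring the active session by name: look its index up again afterwards.
let pagesAfter = reconcile(pages: ["riff", "riff-2", "riff-3"], live: ["riff-3", "riff-5"])
let activeIndex = pagesAfter.firstIndex(of: "riff-3")
```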
Requirements (regression checklist) — multi-session integrity (Features 4)
- M1: New Session ALWAYS creates a NEW session (never attaches to an existing one), even when the in-memory list is stale off-LAN: the name is the lowest free riff-N across server ∪ memory, and a duplicate session collision is detected and bumped/retried.
- M2: a desynced session list self-heals on foreground / reconnect (live sessions the in-memory list forgot appear as their own pages; pages whose session vanished server-side drop) with no relaunch. A transient probe failure keeps the current pages; reconcile never kills server-side.
- M3: a genuine empty server (zero sessions) still yields exactly the base page and does not error (the sentinel keeps a real empty list off the retry/throw path).
Share Extension (image OR video → active session)
A Share Extension (RiffShare, bundle id mark.riff.share) puts Riff in the
iOS share sheet for a single image OR a single video — share a screenshot, a
photo, or a screen recording (from Photos or any source) and it gets attached to
the active terminal session exactly as the in-app photo/video button does. Photos
are encoded JPEG (q 0.85 — a full-res camera photo is a few MB, not ~30 MB as
PNG); videos are deposited as their original bytes, no transcoding.
The extension can't foreground the host app itself (iOS forbids a share extension from opening its container — only Today widgets may), so the media is queued, then a one-tap notification brings Riff to the front to attach it — no Shortcut required:
1. RiffShare/ShareViewController.swift (a bare UIViewController, programmatic principal class: no storyboard, NSExtensionPrincipalClass in the plist) pulls the first media attachment (preferring a movie, else an image): an image is re-encoded to JPEG; a video's original file is copied via loadFileRepresentation. It writes the file into the App Group container (group.mark.riff) under share-inbox/<epoch>-<uuid>.<ext> (the real extension: .jpg/.mov/.mp4) via SharedImageInbox.deposit(_:ext:), then posts an "Open in Riff" local notification (Shared/ShareNotification.swift) and completeRequests immediately (no compose UI, so the share feels instant). A local notification scheduled from an app extension is attributed to the containing app, so it shows up as a Riff banner.
2. Tap the "Open in Riff" notification → Riff foregrounds and the media attaches: the one-tap, no-Shortcut handoff. The host drains the inbox on foreground: RiffApp posts .riffSharedMediaAvailable on scenePhase == .active, and the AppDelegate UNUserNotificationCenterDelegate re-posts it on the tap too (belt-and-suspenders if the scene was already active). TerminalScreen observes it (it owns the SessionManager) and runs each queued file through the same RiffClient.uploadMedia → injectText path as the photo/video button, consolidated on SessionManager.attachSharedMedia (it derives a content type from the extension; video routes through a dedicated large-upload URLSession). The Mac path is typed into the active session's input line, and claude can Read / ffprobe / inspect it. Notification auth is the same grant as push (requested once at launch) and the handoff degrades gracefully: if notifications are denied, the deposit still lands and attaches the next time you open Riff. A multi-item share coalesces into a single banner (stable request id); one tap drains the whole queue. No Shortcut is required; the optional ShareToRiffIntent App Intent is a Siri/Shortcuts convenience, not a prerequisite.
3. Cold launch / no active session yet: a file shared while Riff was terminated stays in the inbox; the host re-drains on bootstrap and again once the active session reaches .connected (it is NOT dropped). A file is removed only AFTER a successful upload+inject; a failed upload leaves it queued for the next foreground. SharedImageInbox.purgeStale caps leftover lifetime at 24h so a permanently-failing file can't accumulate. An in-flight Set<URL> guards two quick foregrounds from uploading the same file twice (dedupe).
Right-session targeting: shared media (and the in-app picker) inject into the
last-active session, not always the base riff. SessionManager persists the
active session by name (Settings.lastActiveSessionName, set on
page/create/close) and restores currentIndex from it at the end of bootstrap()
— so Riff reopens on the session you last used and media lands there.
Size cap: uploads are bounded at 200 MB (RiffClient.maxUploadBytes mirrors
the server's MAX_UPLOAD_BODY). The in-app picker pre-checks the file size and
shows "Video too large (NNN MB > 200 MB)" rather than starting a doomed upload;
the server's 413 is the backstop.
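A minimal sketch of the client-side pre-check, assuming a hypothetical uploadRejection helper. The README does not say whether the 200 MB cap counts a megabyte as 10^6 or 2^20 bytes; 2^20 is assumed here, as is rounding the displayed size up.

```swift
import Foundation

// Assumed to mirror the documented 200 MB cap (MB taken as 2^20 bytes).
let maxUploadBytes: Int64 = 200 * 1024 * 1024

/// Returns nil when the file may be uploaded, else the user-facing message.
func uploadRejection(fileSizeBytes: Int64) -> String? {
    guard fileSizeBytes > maxUploadBytes else { return nil }
    let sizeMB = Int((Double(fileSizeBytes) / 1_048_576).rounded(.up))
    return "Video too large (\(sizeMB) MB > 200 MB)"
}

let small = uploadRejection(fileSizeBytes: 10 * 1024 * 1024)
let large = uploadRejection(fileSizeBytes: 250 * 1024 * 1024)
```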
SharedImageInbox.swift lives in ios/Shared/ and is compiled into both the
host and the extension target — it is the single source of the container contract
(the type name stays SharedImageInbox for less churn; it handles media). The
extension carries only the App Group entitlement (no APNs / audio / location),
so its first provisioning against the pre-registered mark.riff.share App ID
succeeds.
Session paging — architecture & the native-pager trap (READ before touching SessionPager)
Why this is severity-1: the horizontal pager is the only way to reach your other tmux sessions. If the swipe breaks, you are trapped in whichever session is showing — every other session is unreachable. A dead swipe is not a polish bug, it's "half the app is inaccessible."
The trap — it cost two long sieges (builds 62–75, then 107–112). Riff hosts the
terminal stack in SwiftUI via UIViewControllerRepresentable. In that embedding,
UIPageViewController's own scroll pan never recognizes a touch, and neither
does SwiftTerm's UIScrollView pan — SwiftUI's gesture layer suppresses both. An
on-device gesture probe (build 111) proved it conclusively: a raw catch-all
UIPanGestureRecognizer added to the pager view logs every swipe (so the touch
does arrive), but the native pager pan stays silent and its dataSource is never
queried. Every attempt to make the native pager work failed because they all lean
on that dead pan:
| build | attempt | why it failed |
|---|---|---|
| 62–63 | velocity gate / direction-locked vertical pan | a 1-finger scroll pan on the deep terminal excludes the pager's 1-finger pan ("deeper view wins") |
| 66 | TWO-finger scroll so paging is 1-finger-clear | worked then only because the view tree predated the current embedding |
| 67–68 | require(toFail:) arbitration | deadlocked / "fought the pager"; native pan still never fired |
| 71–72 | drop custom code, isScrollEnabled=true "native cooperation" | the native pan is suppressed; this is the regression that broke it for good |
| 75–78 | give up, discrete pageRelative snap on release | not interactive; later read as "swipe does nothing" |
| 107–110 | restore 66/67/68 configs, toggle isScrollEnabled | all still depend on the dead native pan |
The fix (build 112) — own the gesture. There is no UIPageViewController and
no native paging. SessionPager → PagerHostVC lays every session's persistent
SessionPageVC side-by-side in a content strip and drives EVERYTHING off ONE plain
UIPanGestureRecognizer we add ourselves (the only kind that fires here):
- horizontal-dominant drag → translate the strip 1:1 with the finger; release past
⅓ width (or a velocity flick) commits to the neighbour, else snaps back;
- vertical-dominant drag → forward SGR mouse-wheel to tmux (+ a momentum glide);
- axis is latched once per gesture, BOTH are single-finger, and the pan uses
cancelsTouchesInView=false + simultaneous recognition so taps / typing /
keyboard / link-tap / swipe-down all still work.
terminalView.isScrollEnabled is false (in SessionController) so SwiftTerm's
own pan can't compete; we forward the wheel ourselves anyway (a tmux attach has no
local scrollback).
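The gesture decisions above (latch an axis once, then commit or snap back on release) can be sketched as pure logic with no UIKit. The one-third-width rule is from this README; the function names and the flick velocity threshold of 500 pt/s are assumptions, not values taken from PagerHostVC.

```swift
import Foundation

enum PanAxis { case horizontal, vertical }

// Latch once per gesture from the first meaningful movement.
func latchAxis(dx: Double, dy: Double) -> PanAxis {
    abs(dx) > abs(dy) ? .horizontal : .vertical
}

/// Page offset on release: +1 = next page, -1 = previous page, 0 = snap back.
func pageDelta(translationX: Double, velocityX: Double, pageWidth: Double,
               flickVelocity: Double = 500) -> Int {
    let pastThird = abs(translationX) > pageWidth / 3
    let flicked = abs(velocityX) > flickVelocity
    guard pastThird || flicked else { return 0 }
    // Dragging left (negative x) reveals the next page on the right.
    let direction = pastThird ? translationX : velocityX
    return direction < 0 ? 1 : -1
}

let axis = latchAxis(dx: 30, dy: 5)
let commit = pageDelta(translationX: -200, velocityX: 0, pageWidth: 390)
let snapBack = pageDelta(translationX: 40, velocityX: 0, pageWidth: 390)
let flick = pageDelta(translationX: 20, velocityX: 900, pageWidth: 390)
```

A real handler would also clamp the result at the first and last page; that is omitted here to keep the sketch short.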
Scrolled-up input → copy-mode (the stray-q fix, #2). A wheel-up forwards an SGR
mouse-wheel to tmux, which (via its default WheelUpPane binding) enters copy-mode
with copy-mode -e. The -e flag means tmux auto-exits copy-mode the moment the
pane scrolls back to the live bottom — there is NO explicit cancel at the bottom.
Before delivering any input while scrolled, SessionController.exitScrollbackIfNeeded
sends a copy-mode cancel keystroke to snap the pane back to the live bottom first
— otherwise tmux eats the typed/dictated text as copy-mode key bindings.
That cancel keystroke is F12 (\u{1b}[24~), not a bare q
(SessionController.copyModeCancel is the single source of truth). .tmux.conf binds
F12 to send -X cancel in both copy-mode tables (copy-mode + copy-mode-vi) and
root-guards it: bind -T root F12 if -F '#{pane_in_mode}' { send -X cancel }. At a
live prompt pane_in_mode is 0, so tmux consumes F12 as a silent no-op — it never
reaches Claude Code. This is what makes the cancel leak-proof: unlike the old bare q
(a literal character at the prompt), firing F12 against a live prompt writes nothing.
Deploy ordering matters: the tmux binds must be live on the -L riff server
(tmux source-file ~/.tmux.conf) BEFORE a build that emits F12 ships, or F12 forwards
raw to Claude Code in the gap.
SessionController still tracks net scroll depth (scrollState/scrollDepth:
wheel-up adds ticks, wheel-down subtracts, floored at 0) and only emits the cancel when
scrollDepth > 0 — but this is now a best-effort optimization (skip the write when
we were never scrolled), not the correctness mechanism. The client-side counter can
desync from tmux's real position: an over-scroll past the top of scrollback inflates
it (so it reads >0 while tmux is already at the live bottom — the stray-q direction),
and streaming output that shifts the live bottom can leave it at 0 while still in
copy-mode (the build-174 input-eaten direction). Correctness now comes from the
emitted key being harmless at the live prompt (the root guard), not from the counter
being exact. The stray-q direction is fully fixed; the rarer input-eaten direction is
not fully solved here (it needs a tmux-side signal) — this change doesn't regress it
(the cancel still fires whenever depth>0) but don't mistake it for closed.
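The counter-plus-harmless-key idea above can be modeled in a few lines. This is a minimal Python sketch, not the Swift in SessionController: the class and method names are hypothetical, and resetting the depth after emitting the cancel is an assumption. Only the floor-at-0 rule, the depth > 0 gate, and the F12 sequence come from the text.

```python
F12 = "\x1b[24~"  # the cancel keystroke named above (F12 escape sequence)


class ScrollTracker:
    """Best-effort model of the client-side net scroll depth (hypothetical)."""

    def __init__(self) -> None:
        self.depth = 0  # net wheel-up ticks, never negative

    def wheel(self, ticks: int) -> None:
        # Positive ticks = wheel-up (into scrollback), negative = wheel-down.
        self.depth = max(0, self.depth + ticks)

    def bytes_before_input(self) -> str:
        # Emit the cancel only when we believe the pane is scrolled.
        # If the counter is wrong, the tmux root guard turns F12 into a
        # no-op at the live prompt, so a spurious cancel writes nothing.
        if self.depth > 0:
            self.depth = 0  # assumption: the cancel returns us to the bottom
            return F12
        return ""
```

The design point is that the counter only saves a write; a wrong positive reading costs one silent keystroke rather than a stray character at the prompt.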
THE RULE: do not reintroduce UIPageViewController, and do not rely on
any nested UIScrollView/native pan for paging in this SwiftUI embedding — it will
look like it should work and silently won't (that exact assumption burned dozens
of builds). Paging lives on the custom pan in PagerHostVC. If paging breaks, first
confirm PagerHostVC still owns the pan and nobody re-enabled
terminalView.isScrollEnabled.
Requirements (regression checklist) — session paging & render
Paging + vertical scroll are driven by ONE custom pan in PagerHostVC.handlePan;
the terminal render also races page layout. The gesture/render rows (P0–P9) are NOT
testable in the simulator — verify on device on every change to SessionPager.swift,
SessionController.terminalView setup, or SessionManager geometry. The geometry
framing invariant (G0) IS now simulator-tested (RiffTests/PagerGeometryTests);
its keyboard-timing path still needs on-device verification (see ▸ Testing).
Paging (#4 — interactive finger-tracking):
- P0 — every session is reachable by swiping (it's the only way; a dead swipe =
trapped in one session — the severity-1 case above).
- P1 — horizontal drag tracks the finger 1:1. The adjacent session is revealed
as you drag (NOT a release-only snap). Release past ~⅓ width (or a flick) COMMITS;
a short release SNAPS BACK.
- P2 — no flicker / no double-slide on commit. currentIndex syncs via
manager.page(to:) in the animation completion; the page doesn't re-slide.
- P3 — vertical scroll still works (drag forwards SGR mouse-wheel to tmux).
- P4 — vertical flick momentum glides and decays to a stop.
- P5 — axis disambiguated cleanly (latched once per gesture: near-horizontal
never scrolls, near-vertical never pages).
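As a rough illustration of P1 and P5, the latch and the release decision might look like the Python sketch below. It is not the code in PagerHostVC.handlePan: the function names are hypothetical and the flick velocity threshold is an invented placeholder, since the text only says "a flick".

```python
from enum import Enum


class Axis(Enum):
    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"


# Hypothetical threshold in points per second; the source gives no number.
FLICK_VELOCITY = 500.0


def latch_axis(dx: float, dy: float) -> Axis:
    # P5: decided once per gesture from the first movement, then never changed.
    return Axis.HORIZONTAL if abs(dx) > abs(dy) else Axis.VERTICAL


def should_commit(dx: float, velocity_x: float, page_width: float) -> bool:
    # P1: commit when released past roughly one third of the page width,
    # or when the release is a flick; otherwise the page snaps back.
    passed_third = abs(dx) > page_width / 3
    flicked = abs(velocity_x) > FLICK_VELOCITY
    return passed_third or flicked
```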
Preserved interactions (must survive ANY paging change — the historic churn zone):
- P6 — tap raises the keyboard; typing reaches claude; the accessory row works.
- P7 — swipe-DOWN dismisses the keyboard while it's up (gated to keyboard-up).
- P8 — tapping a URL opens it; tapping empty space still raises the keyboard.
- P9 — the +/✓/photo bottom-bar buttons and the draggable cluster are
unaffected.
Geometry (the terminal must FILL its page in BOTH keyboard states — builds 114→117):
- G0 — terminal fills the screen with the keyboard DOWN and UP: no black void
below it, no rows clipped under the bottom bar / home indicator. The trap:
PagerHostVC hand-set its child page frames in viewDidLayoutSubviews, which
missed the layout pass when the keyboard toggled. Result — keyboard-up the page
(and the terminal in it) collapsed to a ~12-row sliver with a huge void below
(on-device probe: page 140pt inside a 460pt strip); at other times it rendered
full-screen with claude's bottom rows cut off under the button bar. UIPageViewController
sized its children to its bounds on every pass automatically; a custom pager must
too. The fix is Auto Layout, NOT manual frames: the terminal is pinned to its
SessionPageVC view, and each page is pinned to the content strip (top/bottom =
full height, width = pager width, leading constant = i×pageWidth). The constraint
engine then re-fits them on every bounds change. If a void or a bottom-clip ever
returns, look for a view.frame = that should be a constraint — manual child
framing in this embedding silently misses keyboard-driven resizes. The framing
invariant (given a layout pass, the terminal fills its page at any bounds) is now
simulator-tested — RiffTests/PagerGeometryTests hosts PagerHostVC in a UIWindow
and asserts it at a keyboard-down and a keyboard-up height (see ▸ Testing). The
keyboard-driven TIMING path (the actual builds-114→117 failure — a forced
layoutIfNeeded can't reproduce a missed pass) still requires on-device
verification with the keyboard up, backed by the #if DEBUG layoutPages assert.
- G1 — the center dictation button responds to the FIRST tap at cold launch,
with NO keyboard toggle first: kill Riff → cold launch → tap the Flying-V →
dictation starts. ROOT-CAUSED in build 165 (traced in code, not guessed).
History — six fixes (builds 155/156/158/160/163) each guessed at the MECHANISM
and missed: the gesture-type theories (duck audio → route the tap through a
Button → add a LongPressGesture) all failed, the recognizer was never it;
build 163's coldLaunchNudge (a 1pt frame nudge forcing a relayout) also failed.
Build 164 stopped guessing and shipped a Settings-gated hit-test probe to NAME
the layer eating the touch. The actual cause was then found by reading the code:
bottomSafeInset reads the key window's home-indicator inset, but there is no
key window during the first layout passes after a cold launch, so it returns 0
(its own doc-comment admits this). keyboardLift then can't cancel the
home-indicator double-count, so lift computes as the full ~34pt home inset
while the keyboard is down — rendering the stack in the keyboard-UP geometry
(over-tall + .clipped()), which desyncs the bar's hittable region from where
it's drawn. The button is visible but the touch lands in dead space. The first
keyboard toggle re-reads a now-valid inset → lift snaps to 0 → taps work. This
uniquely explains why 163's nudge failed: a frame nudge never touched lift.
The fix (homeIndicatorInset(windowInset:proxyInset:)): when the window read
is 0, fall back to the GeometryReader's own bottom inset — at cold launch the
keyboard is down, so the proxy bottom IS the home indicator → lift cancels to 0
on the very first layout. Once a key window exists the window read wins, so the
hard-won smooth-slide path is byte-for-byte unchanged. Guarded by
KeyboardLiftTests.coldLaunchUsesProxyInsetWhenNoKeyWindow (red→green: lift 34→0)
and keyWindowInsetWinsOverProxyWhenPresent (slide-path regression guard).
The probe stays in (Settings ▸ Diagnostic ▸ Hit-test probe — toggled live, no
rebuild) as the device-side check: with it on, a cold-launch reading should now
show lift: 0 and down/act/dict all incrementing on the first tap. If the
dead tap ever recurs, the counters name the layer:
- down=0 → the touch never reached the Button (a view ABOVE it — the pager
pan recognizer, SwiftTerm, or a misplaced hit region — swallowed it);
- down↑ act=0 → press recognized, tap canceled before firing (a competing
recognizer claimed the sequence);
- act↑ dict=0 → an early-return guard (inputHoldFired/clusterDidReposition) ate it;
- dict↑ but no dictation → onInputTap/VoiceInjectController is the culprit.
Device-only (no simulator repro of cold-launch touch timing); reproduce by
kill → launch → first-tap, ×3–5.
(Render requirements R1–R5 are added with the single-column render fix.)
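The G1 fallback can be illustrated with a small Python model. The first function mirrors the rule stated for homeIndicatorInset(windowInset:proxyInset:); the lift formula in the second function is an assumption made only to show the 34 to 0 change, not the app's actual keyboardLift.

```python
def home_indicator_inset(window_inset: float, proxy_inset: float) -> float:
    # With no key window (cold launch) the window read is 0, so fall back
    # to the GeometryReader's bottom inset. A valid window read always wins.
    return window_inset if window_inset > 0 else proxy_inset


def keyboard_lift(proxy_bottom: float, home_inset: float) -> float:
    # Hypothetical formula: bottom inset minus the home-indicator inset.
    return max(0.0, proxy_bottom - home_inset)


# Cold launch, keyboard down, 34pt home indicator, no key window yet.
before_fix = keyboard_lift(34.0, 0.0)
after_fix = keyboard_lift(34.0, home_indicator_inset(0.0, 34.0))
```

Under these assumptions `before_fix` is the full 34pt and `after_fix` is 0, which matches the red to green change the KeyboardLiftTests guard describes.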
Requirements (regression checklist) — Share Extension (Feature 3, #5)
The extension hands media (images + videos) to the host through the App Group;
the host reuses the in-app attach path (SessionManager.attachSharedMedia).
Verify on device — the App Group container + share sheet don't work in the
simulator.
- S1 — Riff appears in the iOS share sheet for an IMAGE or a VIDEO (screenshot
/ photo / screen recording), and NOT for text/URLs
(NSExtensionActivationSupportsImageWithMaxCount: 1 + NSExtensionActivationSupportsMovieWithMaxCount: 1, in BOTH project.yml and the authoritative RiffShare/Info.plist).
- S2 — sharing media, then opening Riff, types the uploaded server path into
the ACTIVE session's input line — identical to the in-app photo/video button
(same uploadMedia→injectText path via SessionManager.attachSharedMedia). Photos arrive .jpg; videos arrive .mov/.mp4, full and playable (no transcode).
- S3 — cold launch: media shared while Riff was terminated lands after
bootstrap connects (not dropped; stays in the inbox until a session is
.connected, re-drained on the connection-state change).
- S4 — dedupe: shared media is injected exactly once, even across two
quick foregrounds (in-flight Set<URL> guard + remove-on-success; <epoch>-<uuid>.<ext> filenames are unique).
- S5 — cleanup: processed files are deleted from the App Group inbox;
purgeStale removes anything older than 24h that never got processed.
- S6 — minimal entitlements: RiffShare carries ONLY the App Group (group.mark.riff) — no APNs/audio/location — so first-build provisioning against the pre-registered mark.riff.share App ID succeeds.
- S7 — the extension returns fast (completeRequest), no long compose UI; the host, not the extension, foregrounds and attaches.
- S8 — right-session targeting: media injects into the LAST-ACTIVE session,
not always base riff. Settings.lastActiveSessionName persists by name; the bootstrap restore reopens Riff on it.
- S9 — size cap: a >200 MB video shows "Video too large…" and does NOT upload (RiffClient.exceedsUploadCap pre-check; server 413 backstop).
- S10 — after sharing a photo/video, an "Open in Riff" notification appears;
tapping it foregrounds Riff and the media attaches to the active session — no
Shortcut, no manual reopen (ShareNotification.post from the extension; the tap is attributed to the host, where the .active drain attaches).
- S11 — exactly-once across the tap: the media is injected once even though the
tap and scenePhase == .active both fire a drain (in-flight Set<URL> + SharedImageInbox.newlyClaimable + remove-on-success).
- S12 — graceful degradation: with notifications denied/undetermined, the deposit
still lands and attaches on the next manual foreground — no hard dependency on
the permission (post silently no-ops; the inbox + foreground drain still run).
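The exactly-once and cleanup rows (S4, S5, S11) rest on three small mechanisms: unique filenames, an in-flight guard, and remove-on-success. The Python sketch below models them under stated assumptions. The class and function names are hypothetical, the real code is Swift (SharedImageInbox and SessionManager.attachSharedMedia), and the `attach` callback stands in for the upload and inject path.

```python
import time
import uuid
from pathlib import Path

STALE_AFTER_S = 24 * 60 * 60  # S5: anything older than 24h is purged


def inbox_filename(ext: str) -> str:
    # S4: <epoch>-<uuid>.<ext>, unique per deposit.
    return f"{int(time.time())}-{uuid.uuid4()}.{ext}"


class InboxDrainer:
    """Hypothetical model of the host-side drain."""

    def __init__(self, inbox: Path, attach) -> None:
        self.inbox = inbox
        self.attach = attach  # callable(Path) -> bool, True on success
        self.in_flight = set()

    def drain(self) -> int:
        injected = 0
        for path in sorted(self.inbox.iterdir()):
            if path in self.in_flight:
                continue  # another drain already owns this file
            self.in_flight.add(path)
            try:
                if self.attach(path):
                    path.unlink()  # remove-on-success
                    injected += 1
                # on failure the file stays for the next drain
            finally:
                self.in_flight.discard(path)
        return injected


def purge_stale(inbox: Path, now: float) -> int:
    removed = 0
    for path in list(inbox.iterdir()):
        if now - path.stat().st_mtime > STALE_AFTER_S:
            path.unlink()
            removed += 1
    return removed
```

A second drain that starts while the first is still attaching (the notification tap plus the foreground, for example) skips the claimed file, so the media is injected once.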
Getting started (any user)
A fresh install (distributed build) opens to a guided onboarding flow
(Onboarding/OnboardingView.swift), not the terminal — because a distributed
build ships with no host/user baked in, so dropping straight into a terminal
would just fail to connect. The flow walks you through:
- What you need — a Mac with Remote Login on, Tailscale on this phone and that Mac, and tmux + claude installed on the Mac.
- Authorize this phone — copy the on-device SSH public key (or the whole
echo '<key>' >> ~/.ssh/authorized_keys command) and run it on the Mac.
- Host & user — your Mac's Tailscale name (your-mac.tailXXXX.ts.net) and your macOS account short name (whoami).
- Connect — a one-tap test (SessionManagement.listSessions, a cheap SSH round-trip) proves auth + reachability, with actionable errors ("Key not authorized — did you paste the public key…", "Can't reach the host — is Tailscale on…"). On success you land in the terminal.
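For the "Authorize this phone" step, the copyable command is just the public key wrapped in an append to authorized_keys. A minimal Python sketch of composing it safely follows; the function name is hypothetical and the app builds this string in Swift, so treat it as an illustration of the quoting concern only.

```python
import shlex


def authorize_command(public_key: str) -> str:
    # shlex.quote protects the shell from spaces or quotes in the key's
    # trailing comment field; strip() drops a trailing newline from a paste.
    quoted = shlex.quote(public_key.strip())
    return f"echo {quoted} >> ~/.ssh/authorized_keys"
```

Using `>>` appends, so keys already authorized on the Mac are left in place.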
The onboarding writes the same UserDefaults keys Settings edits — it's just a guided first pass. Settings remains the canonical editor, and Settings → Re-run setup clears the completion flag to return to the flow (for users who change Macs). The voice button is an optional power-up (see Voice server quickstart); skipping it leaves the terminal fully functional.
Mark's own dev build pre-seeds his host/user/start-dir and marks onboarding complete (Settings.seedDevDefaultsIfNeeded, #if DEBUG), so he goes straight to the terminal. Only DISTRIBUTED (Release/TestFlight) builds ship empty defaults and run onboarding.
OpenClaw quickstart
Already running an OpenClaw box? You've done Riff's prerequisites — a Mac on
a tailnet with claude installed. Riff is a native iOS client for that same
self-hosted Claude box, in a real terminal instead of a messaging bridge. The
30-second version:
- On the Mac your OpenClaw runs on, confirm Remote Login is ON
(sudo systemsetup -setremotelogin on) — OpenClaw itself doesn't require it, so this is usually the one missing piece.
- In Riff → Settings → copy the SSH public key; on the Mac run
echo '<key>' >> ~/.ssh/authorized_keys.
- In Riff set Host = your Mac's Tailscale name (the same *.ts.net OpenClaw reaches it by) and User = your macOS short name (whoami).
- Connect. You're now driving the same claude install OpenClaw uses, but in a real terminal.
In the app, the onboarding's "Already self-hosting (OpenClaw, etc.)? Quick setup" button jumps straight to those steps.
Riff does not use or require OpenClaw — it talks to the standard Claude Code CLI over plain SSH. If you stop running OpenClaw, Riff keeps working. OpenClaw users are the target audience, not a runtime dependency: no Riff code path imports, shells out to, or assumes OpenClaw. (Optional voice button → see Voice server quickstart; the terminal alone is complete.)
Voice server quickstart (optional)
The voice button (Action Button → speak → transcript types into the live
claude terminal) is the one feature that needs a server — a tiny
riff_server on your own Mac that runs ElevenLabs Scribe v2 on your clip.
The terminal needs none of this.
Minimum setup (verified against riff_server.py boot requirements):
- Two env keys in ~/.env: RIFF_SHARED_SECRET (any 32-byte hex — openssl rand -hex 32; fatal-at-boot) and ELEVENLABS_API_KEY (your own ElevenLabs key; /riff/transcribe-only returns 503 without it, but the server still boots). No APNs keys or .p8 — those are for the shelved chat path.
- Install: ./install.sh --voice-only on the Mac. This validates ONLY those two keys (skipping the APNs/.p8 checks that the default install — tuned to Mark's box — would otherwise wall a voice-only stranger on), installs the server under launchd on port 8902, and prints the secret.
- Pair: paste that secret into Riff → Settings → Voice server (stored in the iOS Keychain) and set the same Host.
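The two keys behave differently at boot: a missing secret is fatal, while a missing ElevenLabs key only disables transcription. The Python sketch below illustrates that split. It is not taken from riff_server.py; the function names and the returned dictionary shape are hypothetical.

```python
import secrets


def new_shared_secret() -> str:
    # 32 random bytes as 64 hex characters, the same shape that
    # `openssl rand -hex 32` produces.
    return secrets.token_hex(32)


def check_voice_env(env: dict) -> dict:
    # Fatal at boot: without the shared secret no request can be verified.
    if not env.get("RIFF_SHARED_SECRET"):
        raise SystemExit("RIFF_SHARED_SECRET missing: refusing to boot")
    # Not fatal: the server boots, but transcription is unavailable.
    return {"stt_configured": bool(env.get("ELEVENLABS_API_KEY"))}


def transcribe_only_status(state: dict) -> int:
    # /riff/transcribe-only answers 503 when no ElevenLabs key is configured.
    return 200 if state["stt_configured"] else 503
```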
Full walkthrough: server/QUICKSTART.md.
Voice is materially more setup than the terminal (a server process + an ElevenLabs account + a second secret). It's positioned as an optional power-up, not table stakes — the terminal alone is the product; voice is the differentiator on top.
Kept / shelved (the chat-era pieces)
| Piece | Disposition |
|---|---|
| RecordingViewModel + AudioFileWriter | KEPT, reused as-is. Capture half of voice-inject is unchanged; only the sink moved. stopAndTranscribe() is the new transcribe-only variant of stopAndSend(). |
| Action Button App Intent (RiffIntents.swift, RiffToggleBus, .riffToggle) | KEPT, receiver repointed. The intent/shortcut/bus/notification are unchanged; the .riffToggle handler moved from ChatViewModel.toggleVoice to VoiceInjectController.toggle() (TerminalScreen is the receiver; ContentView pulls the Terminal tab front). |
| Scribe v2 transcription (POST /riff/audio) | KEPT. A new POST /riff/transcribe-only shares the same Scribe helper but returns just {transcript} (no conversation write, no claude, no APNs). |
| ChatView / ChatViewModel / MessageStore | SHELVED (files compile, unlinked). Removed from the TabView; MessageStore is still injected by RiffApp so a re-link Just Works. ChatView's own .onReceive(.riffToggle) is dormant (not in the tree). |
| riff_server.py chat endpoints (/riff/message, /riff/conversation, conversation store, multi-turn replay) | SHELVED in place. Still run; the terminal never calls them. The shelved chat UI needs them if re-linked. |
| APNs push of replies | SHELVED. The terminal shows replies live; nothing to push. Registration code stays dormant. |
| RiffWidget (weather widget) | REMOVED (2026-05-28). Extracted into the standalone ~/tops/ app (its own project, bundle IDs, App Group group.mark.tops). Riff no longer ships a "Riff Weather" widget. The App Group group.mark.riff was retained — HostKeyStore (SSH known-hosts) and SharedImageInbox (RiffShare inbox) still use it. |
| HMAC shared-secret auth (X-Riff-Secret) | KEPT for /riff/transcribe-only (and everything else). |
Scope
Phase 1 (the whole iOS app): press the Action Button → Riff app
launches into foreground (works while locked) → audio session opens
in the background-audio mode → recording starts in onAppear →
Mark talks for as long as he likes (no auto-send; pauses to think are
free) → he taps Send (middle-left, thumb-reach) or presses the
Action Button again → the phone uploads the recorded audio to the
Mac mini over Tailscale → riff_server transcribes it via ElevenLabs
Scribe v2 (server-side; the key never leaves the Mac) → server drops
a riff poll event with the transcript and waits → reply comes back two
ways in parallel: synchronous response body (instant in-app) + APNs push
(readable on the lock screen if the phone has gone to sleep). No
unlocking required at any point; the only button is Send.
The capture flow was reworked (2026-05-22): recording is manual-only
(the old 1.2s silence auto-send was removed because it cut Mark off
mid-thought and SFSpeechRecognizer had a hard ~60s on-device ceiling),
and transcription moved off-device to ElevenLabs Scribe v2 for
materially better accuracy. See Why server-side cloud STT below.
Phase 2 (watchOS): Apple Watch counterpart. Tap a complication or Smart Stack widget to start recording; transcript travels to Mac mini through the paired iPhone (or directly over Tailscale on a cellular Watch); reply lands as a Watch haptic + notification. Mark doesn't own a Watch yet — defer the build until hardware arrives. Xcode simulator won't help (no mic in sim).
Why no PushToTalk
An earlier draft proposed Apple's PushToTalk framework for true
lock-screen hold-to-talk. Dropped because (1) the entitlement is gated
on "VOIP communications" use cases and solo-AI-assistant apps
historically get rejected, and (2) it doesn't actually solve a
different problem from Phase 1. iOS does not surface Action Button
press/release events to apps regardless of entitlement — the button
is a "launch this thing" trigger, not a held-down switch. Phase 1's
foreground audio session (originally with silence detection, since replaced by manual Send) achieves the same
workflow (talk while locked, reply on lock screen) without the Apple
review risk.
Distribution (2026-05-24): now TestFlight-bound for the OpenClaw / self-host
crowd. Riff was personal-only (Xcode-direct-to-device); it's now being
prepared for distribution via TestFlight to people who already run a
self-hosted Claude box (a Mac on a tailnet with claude). See
Distributing Riff (TestFlight) and First-time setup. Still out of scope:
cross-platform (Android), a hosted multi-tenant riff_server backend (so users
needn't self-host the voice server), and a full public App Store release —
the last of these is gated on the hosted backend, because Apple's Guideline 2.1 (App
Completeness) means a self-host-required app looks dead to a reviewer with no Mac
mini. TestFlight does not face that bar the same way (testers are told to
self-host), which is exactly why it's the right channel for the self-host era.
Architecture
iPhone (Chat tab / Action Button)
│ type or press-to-talk
▼
┌─────────────────────────────────┐
│ Riff.app (SwiftUI chat client) │
│ • ChatView: iMessage-style │
│ bubble thread + compose bar │
│ (mic-when-empty / ▲-when-text)│
│ • MessageStore: ordered thread │
│ persisted to a container JSON│
│ • voice: AVAudioEngine tap → │
│ AAC/m4a file (mono 16kHz) │
│ • signs body w/ shared key │
└────────────────┬────────────────┘
text │ │ voice
POST /riff/message POST /riff/audio (X-Riff-Conversation-Id)
{conversation_id, raw audio body, HMAC over the bytes
text} metadata in X-Riff-* headers
│ over Tailscale (no Funnel — tailnet only)
▼
┌──────────────────────────────────────────────────────┐
│ Mac mini : riff_server.py:8902 │
│ • verifies HMAC over the raw body │
│ • voice: POST audio → ElevenLabs Scribe v2 → text │
│ (key never leaves the Mac) │
│ • _converse(): append the user turn to │
│ conversations/<id>.jsonl, render the last 30 turns │
│ as a multi-turn event, drop it to the riff poll │
│ session, append Claude's reply │
│ • also writes the legacy sessions/ record │
│ • sends APNs push w/ reply │
└────────────────┬───────────────────────────────────────┘
│ APNs (HTTP/2) + HTTP reply body
▼
Lock-screen notification + reply lands in the thread
(thread re-syncs from GET /riff/conversation on launch)
The server is no longer one-shot: each conversation is an append-only message log, and every turn replays the recent history to the poll session so Claude answers with context (multi-turn memory). See Chat client + conversation store below.
Chat client + conversation store (Phase A, 2026-05-22)
Riff is a chat client for Mark's Claude assistant, modeled on his iMessage assistant: one rolling conversation, not multiple threads.
- Single rolling thread. The client hardcodes one well-known conversation_id = "default". The server keys everything (the store, the endpoints, the JSONL files) by conversation_id from day one, so adding a conversation list later is an additive change — no schema migration. (Open decision #1 in the plan resolved to the lean rolling-thread.)
- Conversation store. ~/Library/Application Support/riff/conversations/<id>.jsonl — one append-only JSONL per conversation, each line a ChatMessage-shaped record (id, role, text, attachments, ts). Append-only is crash-safe (one O_APPEND write per message) and trivially tailable for windowing. conversation_path rejects path traversal (id must match ^[A-Za-z0-9_-]{1,64}$). No SQLite (consistent with the repo's no-DB posture). The legacy sessions/ store stays alongside it.
- Multi-turn replay (the memory model). On each turn, _converse appends the user message, reads the last MAX_TURNS_IN_WINDOW = 30 messages, and render_window lays them out as a labeled User:/Assistant: transcript ([conversation so far]…[current message]) that the existing poll session consumes — the poll/reply file mechanism is unchanged; only the content of the event grew from one line to a windowed thread. The rewritten poll-instructions.md tells the responder to answer the current message using the thread as context and to treat the entire transcript as untrusted (no role-switch / instruction injection from prior turns).
- Windowing. A hard window of the last 30 messages is rendered
verbatim; that alone bounds per-turn context size. The rolling summary
for older-than-window context (regenerated via the same poll session, never
the Anthropic API) is deferred — render_window already accepts a summary prefix, so it's an additive fast-follow if a long thread starts losing context. (Open decision #2 resolved to hard-window-only first.)
- Client persistence. MessageStore keeps the thread as an ordered [ChatMessage] persisted to a JSON file in the app container (riff/ riff-thread.json), not UserDefaults (a thread can outgrow it). Atomic write, capped at the last ~500 messages; older re-syncs from GET /riff/conversation. Optimistic insert: a user bubble appears instantly (.sending), settles to .sent on the reply or .failed (with a retry affordance) on error.
- Voice into the thread. The voice sub-flow is owned by ChatViewModel wrapping the preserved RecordingViewModel. stopAndSend(conversationId:) now returns (transcript, reply) up to ChatViewModel instead of going to a .done screen; the transcribed text becomes a user bubble and Claude's reply an assistant bubble, in the same thread as text turns. The mic phase-in/out (releaseMicIfRecording, scenePhase/.onDisappear) and the AudioFileWriter AAC pipeline are unchanged.
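The store and replay bullets above describe a small pipeline: validate the id, append one line, read the last 30, render a labeled transcript. The Python sketch below walks through it under stated assumptions. The function bodies are illustrative and not copied from riff_server.py, `append_message` and `last_messages` are hypothetical names, and the exact transcript layout is a guess built from the [conversation so far] and [current message] labels.

```python
import json
import os
import re
from pathlib import Path

ID_RE = re.compile(r"^[A-Za-z0-9_-]{1,64}$")
MAX_TURNS_IN_WINDOW = 30


def conversation_path(root: Path, conversation_id: str) -> Path:
    # fullmatch (rather than match) also rejects an id with a trailing
    # newline, which "$" alone would let through in Python.
    if not ID_RE.fullmatch(conversation_id):
        raise ValueError("bad conversation_id")
    return root / f"{conversation_id}.jsonl"


def append_message(path: Path, message: dict) -> None:
    line = (json.dumps(message) + "\n").encode("utf-8")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, line)  # one O_APPEND write per message
    finally:
        os.close(fd)


def last_messages(path: Path, limit: int = MAX_TURNS_IN_WINDOW) -> list:
    with open(path, encoding="utf-8") as f:
        return [json.loads(row) for row in f if row.strip()][-limit:]


def render_window(messages: list) -> str:
    # Requires at least one message: the last one is the current turn.
    *history, current = messages
    lines = ["[conversation so far]"]
    for m in history:
        label = "User" if m["role"] == "user" else "Assistant"
        lines.append(f"{label}: {m['text']}")
    lines.append("[current message]")
    lines.append(current["text"])
    return "\n".join(lines)
```

Because the window is read from the tail of the file, per-turn context stays bounded no matter how long the thread grows.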
Never the Anthropic API. Both the chat reply and any future summary route
through the existing poll session / run_claude only (per ~/CLAUDE.md).
Why server-side cloud STT (ElevenLabs Scribe v2)
Decision (2026-05-22): transcription happens server-side via ElevenLabs
Scribe v2, not on-device. The phone records audio and uploads it; the Mac
transcribes it. This replaced the original on-device SFSpeechRecognizer
path.
Two on-device failures forced the swap:
- The ~60s on-device session ceiling. SFSpeechRecognizer caps a single one-shot transcription at ~1 minute, which broke the "unlimited-length dictation" goal outright.
- Accuracy on technical jargon. On-device Speech mangled terms like "Kalshi", "Hyperliquid", "git rebase" — the exact vocabulary Mark dictates most.
Scribe v2 is the best mainstream STT API (~2.2% WER; the only other serious
candidate was OpenAI gpt-4o-transcribe at ~4.1%). The trade-off is
explicit and accepted: an audio upload + a network STT round-trip (and the
latency/cost that implies) in exchange for materially better accuracy and
no on-device model-download UX. Confirmed batch model id: scribe_v2
(verified live against POST /v1/speech-to-text, 2026-05-22). The realtime
variant is the distinct scribe_v2_realtime (not used). Overridable via
ELEVENLABS_STT_MODEL in ~/.env; fallback id is scribe_v1.
Batch, not streaming (v1). The whole clip uploads on Send and the server transcribes it in one Scribe call, then runs Claude, then returns the reply. There is no live transcript while speaking — only the waveform + a "Recording…" indicator, then "Transcribing…", then the reply. The user already waits on Claude's answer, so adding the STT latency to a wait he already tolerates is a small marginal cost for a large simplicity win (one HTTP POST, no WebSocket relay, no partial-result reassembly).
Streaming deferred (Phase 5+, not built). If the missing live preview
turns out to bother Mark in practice, add scribe_v2_realtime via a WS relay
through riff_server (live partials streamed back over a WS the app holds
open) as a follow-up. It is a materially bigger build (a phone↔server↔
ElevenLabs WS relay) and a pricier model (~$0.39/hr vs ~$0.22–0.28/hr batch);
spec it then, not now.
Privacy delta
Audio now leaves the device — to the Mac, then to ElevenLabs — whereas the
old path transcribed on-device and only ~2KB of text left the phone. This is
a deliberate, accepted trade for accuracy. The /riff/audio endpoint stays
tailnet-only (no Funnel), so the audio→Mac leg never touches the public
internet; only the Mac→ElevenLabs leg does (over TLS, exactly as newsfeed
TTS already sends audio to the same account).
Why Tailscale, not Funnel
Funnel is internet-public; with it, Riff would expose a microphone-attached endpoint to the world. Tailscale ACLs keep the endpoint reachable only from Mark's devices on the tailnet, and the iPhone is already on it. That's the auth layer: device on tailnet = trusted. A single shared HMAC secret is layered on top, so a stolen iPhone that is still joined to the tailnet can be locked out by rotating the secret server-side.
Reply path
APNs notification with the response truncated to ~250 chars (the iOS
notification body limit). Tapping the notification opens the app to the chat
thread, where the full reply is the latest assistant bubble (the thread
re-syncs from GET /riff/conversation on launch).
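The truncation step can be sketched in Python as below. The function name and the trailing ellipsis are assumptions; the text only gives the ~250-character target.

```python
def notification_body(reply: str, limit: int = 250) -> str:
    # Short replies pass through untouched; long ones are cut so the
    # result, including the ellipsis, never exceeds the limit.
    if len(reply) <= limit:
        return reply
    return reply[: limit - 1].rstrip() + "…"
```

The full reply is never lost: it is still the latest assistant bubble in the thread when the notification is tapped.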
For Mark's typical voice command ("what's on my calendar today", "send a text to Sarah saying I'm running late", "what did the kalshi calibration sweep find") the reply will fit in the notification.
Interface
iOS app
| Screen | Purpose |
|---|---|
| Chat | iMessage-style conversation thread (ChatView) — the primary surface. A scrolling bubble thread (user right/accent, assistant left/gray; oldest top, newest bottom; auto-scrolls to the newest). A compose bar that mirrors iMessage: a multiline-growing TextField, a mic button when the field is empty and a send (▲) when it has text. Tapping the mic transforms the bar into a live waveform + Recording… with send/cancel inline; the server transcribes the clip (no live transcript) and the spoken text becomes a user bubble. A typing indicator (three dots) shows while a turn is in flight; a failed send shows a red "tap to retry" on the bubble. Empty state: "Start a conversation". The thread persists across launches (MessageStore, a container JSON file) and re-syncs from GET /riff/conversation on launch. The mic phases in/out — held only while the Chat tab is active, released on background / tab-switch (so a podcast keeps playing). |
| ~~History~~ | Removed. The chat thread is the persistent history Mark scrolls. HistoryView.swift / SessionStore.swift remain in the repo (unlinked) for one release, then get deleted in a later cleanup. |
| Terminal (bottom bar) | The center Flying-V button: tap = one-shot dictation; long-press = enter/exit conversation mode (CallKit hands-free call — see Conversation mode). While in a call it shows a coral pulse. (Phase 3: "Hey Siri, call Riff" reaches the same mode.) |
| Terminal — + button | Tap = new session with the shared defaults (instant; claude/empty harness is byte-identical to before). Long-press + release on the button = Close Session (preserved exactly — bar floods red while held). Long-press + slide ≥44pt off the button opens the New Session… menu: New Session… → a card to set the launch directory and harness (a single free-text command, default claude), both pre-filled from the shared store and persisted back on Create; Close Session (destructive) closes the current one. Baked at create time via NewSessionSpec → RiffTmux.newDetached(cwd:harness:); a claude/empty harness create stays byte-identical. One shared store (riff.ssh.harness + riff.ssh.startDir), last-write-wins; Settings.recentStartDirs is a cwd-autocomplete affordance only. See New Session customization. |
| Settings | Tailscale endpoint, shared secret, notification/push diagnostics, "Send test (ping)". The iOS Settings.app pane (Settings.bundle/Root.plist) is grouped into bare sections (no per-setting footer descriptions — deliberately uncluttered): Connection (Host / User), Voice (auto-submit dictation / Narrate Claude's replies, riff.voice.narrate, default OFF, see Narration (output voice)), Session (Harness — riff.ssh.harness, a single free-text command, default claude, the command a +-tap launches, shared with the double-tap sheet — + Directory, riff.ssh.startDir), and Claude (skip-permissions / Worktree default off, see Worktree / Effort free-text, default max / Tokens auto-compact window, default 200000, blank = Claude's default). Applied at NEW-session create time. |
Mac mini server
POST /riff/message — chat text path (Phase A). JSON body
{conversation_id, text, device_id}, HMAC over the bytes (stays under
MAX_BODY = 8 KB). Routes through _converse: appends the user turn to
conversations/<id>.jsonl, renders the last MAX_TURNS_IN_WINDOW = 30
messages as a multi-turn event, drops it to the riff poll session, appends
Claude's reply, writes the legacy sessions/ record, fires APNs. Returns
{reply, conversation_id, message_id, ts_reply}. Failure modes: empty text →
400; bad/traversal conversation_id → 400; bad HMAC → 401; Claude timeout →
504 (the timeout reply is still appended to the thread).
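The request checks for this endpoint can be pictured as the Python sketch below. Several things are assumptions rather than facts from the server: the hash (HMAC-SHA256, hex-encoded), the order of the checks, and the function names. The status codes and the id pattern are the ones listed above, and the signature is taken to be whatever the client sends in the shared-secret header.

```python
import hashlib
import hmac
import json
import re

ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def sign(secret: bytes, body: bytes) -> str:
    # Assumption: HMAC-SHA256 over the raw bytes, hex-encoded.
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def check_message_request(secret: bytes, body: bytes, signature: str) -> int:
    expected = sign(secret, body).encode("ascii")
    # compare_digest is constant-time; encoding avoids a TypeError on
    # non-ASCII header values.
    if not hmac.compare_digest(expected, signature.encode("utf-8")):
        return 401
    try:
        payload = json.loads(body)
    except ValueError:
        return 400
    if not isinstance(payload, dict):
        return 400
    if not ID_RE.fullmatch(str(payload.get("conversation_id", ""))):
        return 400  # bad or traversal-style conversation_id
    if not str(payload.get("text", "")).strip():
        return 400  # empty text
    return 200
```

Signing the exact bytes received, before any JSON parsing, means a body altered in transit fails verification before the server interprets it.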
GET /riff/conversation?conversation_id=default&limit=200 — the
conversation's messages oldest-first ({messages:[…], conversation_id}),
so the client syncs the thread on launch. limit (default 200, max 1000)
returns the last N. Bad/traversal id → 400.
GET /riff/narrate-poll?after=<epoch_float>&cwd_slug=<optional> — output
narration long-poll. The terminal's Claude REPL has no reply-text channel
back to the app (the byte stream is a repainting TUI — unspeakable). So this
endpoint tails the session transcript JSONL (~/.claude/projects/<cwd-slug>/
<session>.jsonl) instead of scraping the terminal: it locates the active
transcript (newest-mtime *.jsonl, constrained to cwd_slug if given, else
global-newest — robust to the Worktree feature's different slug and to a
mid-session cd), finds the latest completed assistant turn newer than
after (a line with type=="assistant", not isSidechain,
message.stop_reason=="end_turn", and a non-empty text block — interim
"let me check…" preambles and thinking blocks are skipped, so each turn is
spoken once), strips it to speakable prose (drops code fences, inline code,
URLs, file paths, #/@ refs, markdown markers; caps at NARRATE_MAX_CHARS =
1200), and synthesizes MP3 via ElevenLabs TTS. Holds up to
NARRATE_POLL_HOLD_S = 25 s polling every 0.4 s. Returns 200
{ts, audio_b64, chars} (base64 MP3) on a hit, 204 if no new turn within
the hold window (the client immediately re-polls with the same after),
503 if ELEVENLABS_API_KEY is absent at boot, 502 on a TTS
non-200/timeout. HMAC over an empty body (a GET — the client signs
Data(), like /riff/sessions and /riff/health). Because it returns only
the single latest turn newer than the cursor, a backlog (two fast turns)
collapses to "speak only the latest". Gated client-side by the
riff.voice.narrate toggle (default OFF). See Narration (output voice).
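The turn-selection rule above can be sketched like this. It is an illustration under stated assumptions: the transcript line shape (an ISO-8601 `timestamp` field and a `message.content` list of typed blocks) is assumed, and the function names are hypothetical. Only the filter itself (type == "assistant", not isSidechain, stop_reason == "end_turn", non-empty text) comes from the text.

```python
import json
from datetime import datetime


def _epoch(entry: dict) -> float:
    """Assumed: each line carries an ISO-8601 `timestamp` field."""
    stamp = entry["timestamp"].replace("Z", "+00:00")
    return datetime.fromisoformat(stamp).timestamp()


def _text_of(entry: dict) -> str:
    """Join the text blocks of a message; thinking and other blocks are ignored."""
    blocks = (entry.get("message") or {}).get("content") or []
    parts = [b.get("text", "") for b in blocks if b.get("type") == "text"]
    return "\n".join(p for p in parts if p.strip())


def latest_completed_turn(jsonl_lines, after: float):
    """Return (ts, text) of the newest completed assistant turn newer than `after`, or None."""
    best = None
    for raw in jsonl_lines:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate a partially written last line
        if not isinstance(entry, dict) or "timestamp" not in entry:
            continue
        if entry.get("type") != "assistant" or entry.get("isSidechain"):
            continue
        if (entry.get("message") or {}).get("stop_reason") != "end_turn":
            continue  # interim preambles are not end_turn, so they are skipped
        text = _text_of(entry)
        if not text:
            continue
        ts = _epoch(entry)
        if ts > after and (best is None or ts > best[0]):
            best = (ts, text)
    return best
```

Because only the single newest match is returned, two turns that land between polls collapse to the later one, which matches the "speak only the latest" behavior described above.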
POST /riff/audio — voice capture path. Body: the raw audio bytes
(AAC/m4a), HMAC-signed over the exact bytes. Per-request metadata in headers:
X-Riff-Device-Id, X-Riff-Session-Id, X-Riff-Ts, X-Riff-Audio-Format
(m4a/wav), and optionally X-Riff-Conversation-Id. The server POSTs
the audio to ElevenLabs Scribe v2 (multipart file + model_id, xi-api-key
header), gets the transcript, then:
- with X-Riff-Conversation-Id → routes the transcript through the
same multi-turn _converse path as /riff/message (voice and text
share memory), returning {reply, conversation_id, message_id, ts_reply,
transcript} (the spliced-in transcript lets the client render the spoken
user bubble without a refetch);
- without the header → keeps its original one-shot behavior (writes
only a sessions/ record, no conversation append), returning
{reply, session_id, ts_reply} — back-compat for any caller on the old
path.
Body cap MAX_AUDIO_BODY = 25 MB (≈30 min mono-16kHz AAC), enforced per-path
so text endpoints stay tight at MAX_BODY = 8 KB. Failure modes: empty body →
400; bad HMAC → 401; over 25 MB → 413; Scribe timeout/error → 502 (error
session record written; Claude not called with an empty transcript); Claude
timeout → 504; ELEVENLABS_API_KEY missing at boot → 503 (STT not
configured). Scribe leg bounded by SCRIBE_TIMEOUT_S = 30, Claude leg by
CLAUDE_TIMEOUT_S = 60 (worst-case server-held time ≈90 s).
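The per-path body cap and the failure-mode table for /riff/audio can be summarized in a small decision function. This is a sketch, not the server's code: the function names and the order of the checks are assumptions, and "8 KB" / "25 MB" are interpreted here as binary units, which the text does not specify.

```python
MAX_BODY = 8 * 1024                 # text endpoints (assumed binary KB)
MAX_AUDIO_BODY = 25 * 1024 * 1024   # /riff/audio only (assumed binary MB)
SCRIBE_TIMEOUT_S = 30
CLAUDE_TIMEOUT_S = 60


def body_cap(path: str) -> int:
    """The cap is chosen per path, so text endpoints stay tight."""
    return MAX_AUDIO_BODY if path == "/riff/audio" else MAX_BODY


def audio_status(body_len: int, hmac_ok: bool, stt_configured: bool,
                 scribe_ok: bool = True, claude_ok: bool = True) -> int:
    """Map the documented /riff/audio failure modes to HTTP status codes."""
    if not stt_configured:
        return 503  # ELEVENLABS_API_KEY missing at boot
    if not hmac_ok:
        return 401
    if body_len == 0:
        return 400
    if body_len > body_cap("/riff/audio"):
        return 413
    if not scribe_ok:
        return 502  # Scribe timeout/error: Claude is not called
    if not claude_ok:
        return 504
    return 200


# The two legs run in sequence, so the worst-case hold is their sum.
WORST_CASE_HOLD_S = SCRIBE_TIMEOUT_S + CLAUDE_TIMEOUT_S
```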
POST /riff/transcript — body: {transcript, ts, device_id, session_id} →
drops a riff poll event carrying the transcript, waits for the session's
reply file, returns {reply, session_id}, fires APNs push. Kept for tests /
back-compat / any future text client; the iOS app uses /riff/message and
/riff/audio instead. This path is one-shot (no conversation memory).
GET /riff/sessions?since=<id> — paginated list of past sessions (legacy +
chat). Each chat turn also writes a sessions/ record (now tagged with
conversation_id), so this endpoint stays meaningful.
GET /riff/health — heartbeat.
All endpoints require an X-Riff-Secret: <hmac> header. Unauthenticated
requests → 401.
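A minimal sketch of the signing scheme, for orientation. The text says only that the header carries an HMAC over the exact request bytes (and over an empty body for GETs); the digest (SHA-256), the hex encoding, and the function names below are assumptions, not confirmed details of riff_server.py or RiffClient.swift.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes = b"") -> str:
    """Value for the X-Riff-Secret header. GET requests sign the empty body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, header_value: str) -> bool:
    """Constant-time comparison, so the check does not leak how many characters matched."""
    expected = sign(secret, body).encode()
    return hmac.compare_digest(expected, header_value.encode())


if __name__ == "__main__":
    secret = b"example-shared-secret"  # placeholder, not a real credential
    body = b'{"conversation_id": "default", "text": "hi", "device_id": "d1"}'
    header = sign(secret, body)
    print(verify(secret, body, header))         # True
    print(verify(secret, body + b" ", header))  # False: any byte change breaks the signature
```

Signing the exact bytes (rather than a re-serialized JSON object) is what lets the same scheme cover raw audio uploads and empty-body GETs.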
Action Button configuration — Riff App Shortcut (App Intent)
The Action Button is bound to the Riff App Shortcut — an AppIntent
(RiffToggleIntent) auto-registered via AppShortcutsProvider
(RiffShortcuts in RiffIntents.swift). iOS runs the intent's perform()
in the app's process (foregrounding it via openAppWhenRun = true),
which re-runs deterministically on every press. perform() posts the
existing .riffToggle notification → ChatView calls vm.toggleVoice()
and ContentView forces the Chat tab to the front. The toggle contract:
- Not recording (idle / error / done / permission) → launch + start recording.
- Already recording → send (only if audio was captured; a stray press on a silent session is a no-op).
- Already sending → no-op (a send is in flight).
There is no auto-stop: recording runs through arbitrarily long pauses and ends only on a Send tap, the Action Button (a second press = send), or Cancel.
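The toggle contract is a three-state machine, sketched here for clarity. The names VoiceState and on_toggle are hypothetical, and the "not recording" sub-states (idle / error / done / permission) are collapsed into a single IDLE state for brevity.

```python
from enum import Enum


class VoiceState(Enum):
    IDLE = "idle"            # stands in for idle / error / done / permission
    RECORDING = "recording"
    SENDING = "sending"


def on_toggle(state: VoiceState, audio_captured: bool) -> tuple[VoiceState, str]:
    """Return (next_state, action) for one Action Button press."""
    if state is VoiceState.SENDING:
        return state, "noop"  # a send is already in flight
    if state is VoiceState.RECORDING:
        if audio_captured:
            return VoiceState.SENDING, "send"
        return state, "noop"  # stray press on a silent session
    return VoiceState.RECORDING, "start"


if __name__ == "__main__":
    state = VoiceState.IDLE
    state, action = on_toggle(state, audio_captured=False)
    print(state, action)  # first press starts recording
    state, action = on_toggle(state, audio_captured=True)
    print(state, action)  # second press sends
```

Note there is no timer-driven transition out of RECORDING, mirroring the no-auto-stop rule: only a press, a Send tap, or Cancel ends a recording.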
Why an App Intent, not the old riff://toggle URL (fix, 2026-05-22):
the URL scheme's .onOpenURL does not reliably re-fire when the app is
already foregrounded, so a second Action-Button press (the send) did
nothing once Riff was open — the confirmed second-press-to-send bug. An App
Shortcut runs perform() in-process every press regardless of foreground
state, so start → send is deterministic. The intent reaches the live
ChatViewModel through the same .riffToggle notification Phase A already
wired (a tiny RiffToggleBus posts it), so the intent and the retained URL
fallback share one path.
Set up the Action Button (one-time):
1. Settings → Action Button → swipe to Shortcut → tap Choose a
Shortcut.
2. Pick the auto-registered Riff shortcut ("Toggle Riff" /
"Riff Toggle Recording"). It appears automatically after install — there
is no manual Shortcut to build (unlike the old "Open URLs
riff://toggle" flow this replaced).
That's it — no Shortcuts-app authoring step.
URL scheme retained as a fallback. riff://toggle (CFBundleURLTypes +
RiffApp.onOpenURL) is kept as a zero-cost documented fallback; it posts the
same .riffToggle notification, so it's a safety net if the App Shortcut ever
needs re-adding. The App Intent is the primary Action-Button binding.
Locked-screen Face ID requirement (OS limit — documented, not engineered around). iOS requires a Face ID / passcode unlock to launch Riff from a locked screen, even via the App Intent (confirmed on-device 2026-05-22 — same as the URL scheme behaved). This is an OS boundary, not a Riff bug, and there is no Lock Screen widget / Control Center hack that bypasses the unlock for a foregrounding app. The App Intent's win is reliability when the phone is already unlocked / the app is foregrounded — which is exactly the broken second-press case it fixes.
Requested features — working checklist
Live backlog of Mark's requests (this batch opened 2026-05-27). Every actionable item
ships with a test per Tests & the ship gate (global CLAUDE.md) — feature or bug —
and is checked against this list before shipping. Status: [ ] queued · [~] in
progress · [x] done.
- [x] Both guitar marks version-controlled: FlyingVShape.swift (in-app) + icon-1024.png (app icon, had been gitignored by *.png). Done 32b987b.
- [x] One guitar mark, one source of truth: the in-app glyph and the app icon now derive from the SAME Swift source (FlyingVShape + the new RiffMark style view); the icon PNG is rendered from it (no longer hand-exported), so the fins-rounded look and stroke/dot proportions can't drift between surfaces. See Assets — the guitar mark (single source of truth). RiffMarkRenderTests + the FlyingVShapeTests single-source guards. Pending your on-device icon look.
- [x] Dot in the in-app guitar: option A, baked into FlyingVShape (filled FlyingVDot layer) + the icon; tested (FlyingVShapeTests). 05b6a1c. Live in build 130, pending your on-device look.
- [x] Keyboard rise/fall animation: terminal + button bar track the keyboard in true LOCKSTEP. body wraps the stack in a GeometryReader, reads the keyboard's live safeAreaInsets.bottom each frame (→ keyboardLift), holds the stack full-height (size.height + lift, constant → no SwiftTerm reflow), .ignoresSafeArea(.keyboard) + .offset(y: -lift). No separate animation to diverge. Build 144, confirmed on-device "works very well". KeyboardLiftTests. (Dead-ends 139–143: manual .offset + .easeOut double-lifted / overtook the keyboard.)
- [x] Dictation: duck other audio, don't stop it, don't reroute it. This is the real ask (the "full spectrum" wording was a misread). Recording used mode .measurement, which grabs the audio route exclusively → a podcast/music STOPPED when dictation started; switched to .default so other audio ducks + resumes. a55351e, build 133. (The rainbow LED I'd built was reverted, LED left as-is: 62b4fbe.) Build 158: also dropped .defaultToSpeaker + added .allowBluetoothA2DP so the ducked audio stays on AirPods instead of rerouting to the phone speaker (the side effect Mark hit). The intended non-call workflow: press → waveform → dictate → press to stop; other audio ducks (not stops), output unchanged, built-in mic (never grabs the AirPods HFP mic). Option set lives in RecordingViewModel.dictationCategoryOptions, guarded by DictationAudioOptionsTests.
- [~] Auto-compact "Context limit": root-caused + Riff-side fix done 2026-05-27 (pending Mark's confirm it fires). My earlier "NOT a Riff bug" call was WRONG (twice: first blamed value/semantics, then tmux env propagation). Ground truth from the 2.1.153 binary: CLAUDE_AUTOCOMPACT_PCT_OVERRIDE is parsed by jd8() into an internal testPctOverride field; it does NOT drive user-facing auto-compact. The real lever is CLAUDE_CODE_AUTO_COMPACT_WINDOW (a TOKEN count), read by Kl(); help text: "…takes precedence. Auto-compact summarizes the conversation when context usage approaches this limit." So the pct var was inert everywhere: the =20 on every claude process (Riff, poll, digest alike) came from Mark's ~/.zshrc:125 export (global), NOT from Riff. Two fixes: (1) Mark's ~/.zshrc:125 → change CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=20 to CLAUDE_CODE_AUTO_COMPACT_WINDOW=200000 (token budget; fixes laptop + Riff-by-inheritance + digest at once, new sessions only). (2) RiffClaudeArgs.env() now emits CLAUDE_CODE_AUTO_COMPACT_WINDOW=<contextTokens> straight through (no ÷1M pct math); AutocompactArgsTests rewritten to lock the token var. CAVEAT: no Settings UI binds contextTokens yet, so Riff's own path is latent; the zshrc line is the live lever today (a Settings field is the obvious follow-up). Verify on device: /context shows "Auto-compact window: N tokens (from CLAUDE_CODE_AUTO_COMPACT_WINDOW)".
- [x] Preserve shared-image format (stop JPEG compression): original bytes pass through the picker, clipboard, and share extension, labeled by a magic-byte sniff (Shared/ImageFormat.swift); lossless PNG only as a last resort, never JPEG. Tested (ImageFormatTests). f348789. Live in build 131.
- [x] Dictation button must not touch the keyboard: removed the record-start dismissKeyboard() in onInputTap; the mic only starts/stops recording now. Build 145.
- [x] Dictation waveform shows (confirmed by Mark on build 152). Long saga, two real bugs: (1) my build-144 keyboard refactor dropped the voice.spectrum dependency through the GeometryReader/computed-property, freezing the LEDGrid (broke 144/147/149; 146 "worked" only because a debug overlay read voice there) → fixed build 150 with an invisible reactivity anchor that reads voice in the keyboard closure (see [[reference_riff_waveform_reactivity_anchor]]); (2) build 150 unfroze the UI, revealing the true blocker: setActive(true) intermittently throws "Session activation failed" so recording never starts → fixed build 151 (deactivate + retry once on activation failure). The wrongly-added stale-engine guard (147) was reverted (148).
- [x] Dictation transcript ends with a trailing space: inject() writes trimmed + " ". Tested (VoiceInjectTests). Build 148.
- [x] Clickable hyperlinks: literal markdown links [label](url) are now tappable anywhere on their span, not just the bare-url substring. handleLinkTap falls back to markdownLinkURL(in:atColumn:) (scans the tapped line, built from getCharData in visible-row coords) when SwiftTerm finds no explicit/implicit link. OSC-8 + bare URLs keep working via the existing term.link(at:.explicitAndImplicit) path. Tested (MarkdownLinkTests). Build 149.
- [x] Repositioning the cluster shouldn't press the buttons (confirmed by Mark on build 152): sliding the cluster slid the photo / Flying-V buttons under the finger, so their own gestures saw ~0 movement and misfired as taps (accidental dictation / photo picker). Added a clusterDidReposition flag set when the cluster actually slides + gating the photo and Flying-V actions; cleared on the next runloop so the same-event button release stays suppressed. (sessionButton was already immune: it freezes the cluster on touch.) Gesture timing → on-device verification.
Cleared 2026-05-27 (per Mark — not pursuing)
- App icon de-neon/flat guitar (a561a50, build 135: shipped, cosmetic final-look approval dropped).
- Center the pedalboard (12d924b, build 132: shipped the padding bump; not chasing the terminal-gap further).
- Decouple weather widget, full standalone split: DONE (2026-05-28). The widget + its LocationCache were extracted into a standalone app at ~/tops/ (own project, bundle IDs mark.tops / mark.tops.TopsWidget, App Group group.mark.tops) and removed from Riff: RiffWidget/, Riff/LocationCache.swift, and RiffTests/WeatherLogicTests.swift are gone, the RiffWidget target + dependency dropped, and NSLocationWhenInUseUsageDescription removed (Riff no longer uses CoreLocation). The App Group group.mark.riff was retained for HostKeyStore + SharedImageInbox. The new app's first device install needs a one-time Xcode-GUI profile mint (new App IDs can't be created headlessly); see ~/tops/README.md ▸ Signing & first install. Riff's next OTA drops the widget.
Assets — the guitar mark (single source of truth)
The Riff brand mark (the Flying-V silhouette) has one source: the Swift
geometry in ios/Riff/FlyingVShape.swift (FlyingVShape + FlyingVDot) plus the
shared style in ios/Riff/RiffMark.swift. Both surfaces render RiffMark, so
they can't drift:
- In-app glyph (the center-button mark, TerminalScreen.swift): RiffMark(ink: …).frame(width: 30, height: 36), transparent, tight bbox fit (the silhouette fills the frame). ink is white idle, conversationCoral in-call.
- Home-Screen app icon (Assets.xcassets/AppIcon.appiconset/icon-1024.png): rendered by RiffMark(drawBackground: true), with an opaque #18160f background and the padded authoring-box framing (the full 512 author box maps into the 1024 square, so the V keeps its natural margin and does not touch the edges, matching the original icon).
Stroke and dot are sized in 512 authoring units (RiffMark.strokeAuthor ≈ 13.4,
FlyingVDot r = 14) and multiplied by FlyingVShape.fitScale(in:), so the SAME
author weight reproduces ~1.2pt over the 30×36 glyph and ~26.8px over the 1024 icon.
The two fins (vertices 5/7) are rounded via FlyingVShape.finRound = 26; the
headstock tip and inner notch stay sharp.
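The stroke arithmetic can be checked in a few lines. This is a worked example, not the Swift implementation: the aspect-fit formula for fit_scale is an assumption about what FlyingVShape.fitScale(in:) computes, and the glyph's bounding box is not given in the text, so the glyph scale is only back-derived from the stated ~1.2pt weight.

```python
AUTHOR_BOX = 512.0    # authoring units, from the text
STROKE_AUTHOR = 13.4  # RiffMark.strokeAuthor (approximate, per the text)


def fit_scale(content_w: float, content_h: float, frame_w: float, frame_h: float) -> float:
    """Assumed behavior: uniform aspect-fit scale of the content into the frame."""
    return min(frame_w / content_w, frame_h / content_h)


def stroke_width(scale: float) -> float:
    """Author-unit stroke multiplied by the fit scale."""
    return STROKE_AUTHOR * scale


# Icon: padded framing maps the full 512 author box into the 1024 square.
icon_scale = fit_scale(AUTHOR_BOX, AUTHOR_BOX, 1024.0, 1024.0)  # 2.0
icon_stroke = stroke_width(icon_scale)                          # about 26.8 px

# Glyph: tight bbox fit into 30x36. The bbox is not stated, so derive the
# scale implied by the ~1.2pt figure instead of guessing the bbox.
implied_glyph_scale = 1.2 / STROKE_AUTHOR                       # about 0.09

if __name__ == "__main__":
    print(round(icon_stroke, 1), round(implied_glyph_scale, 4))
```

The point of sizing in author units is visible here: one constant (STROKE_AUTHOR) produces both weights, so the two surfaces differ only by their fit scale.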
The icon PNG is NOT hand-maintained — it is rendered from the Swift source:
cd ios && ./render-icon.sh # rebuild + run RiffIconGen → overwrites icon-1024.png
git diff --stat # should show only icon-1024.png changed
# review the PNG, then commit it together with any FlyingVShape/RiffMark change
render-icon.sh runs xcodegen generate, builds the RiffIconGen macOS
type: tool target (a CLI tool — no code-signing/provisioning, so it runs
headless over SSH; CODE_SIGNING_ALLOWED=NO), and runs it. The tool compiles the
SAME FlyingVShape.swift + RiffMark.swift as the app, rasterizes via
ImageRenderer at 1024 px (AppKit NSHostingView fallback if cgImage is nil),
and writes an opaque 8-bit RGB (no-alpha) PNG to match the asset's format. The
RiffIconGen scheme is separate from Riff, so ./test.sh (iOS) never builds it.
Rule: after editing FlyingVShape.swift or RiffMark.swift, re-run
render-icon.sh and commit the regenerated icon-1024.png in the same change.
The ~/www/riff-icon-*.html / riff-fin-rounding.html pages are previews only
(annotated as such) — they re-encode the geometry by hand and ship nothing.
Testing
A simulator-only Swift Testing harness (ios/RiffTests, target RiffTests)
— no physical device, no real SSH, no Mac tmux, no network, no ElevenLabs. Built
with Swift Testing (import Testing, @Test/#expect), not XCTest; the
toolchain (Xcode 26.4.1 / Swift 6.3.1) ships Testing.framework for the iOS
simulator and xcodebuild test auto-discovers @Test in the bundle.unit-test
target. The bundle is hosted in Riff.app (TEST_HOST/BUNDLE_LOADER) so
@testable import Riff resolves and the geometry test gets a real UIWindow.
Policy (per global CLAUDE.md ▸ Tests & the ship gate): every change adds a test
here — new features and bug fixes alike. A bug-fix test is red without the fix,
green with it; a feature test exercises the new behavior. Added in the same change.
And ./test.sh must pass before shipping any build (/riff-ota, /riff-update,
/riff-publish) — a green suite is a deploy precondition.
What's covered
- Logic (SessionManagerTests, SessionControllerTests, LinkNormalizationTests): session naming/ordering (base first, deduped, stable), nextSessionName reuse of a freed middle name, the soft cap, page clamp + active-input-dirty sync, the setGeometry degenerate guard + fan-out to every session, closeCurrent navigation (left / clamp-to-zero / recreate base on last close), input-dirty tracking (printable → dirty, CR/LF → clean, escape sequences ignored), the scroll-wheel SGR byte sequences (ESC[<64;1;1M up / ESC[<65;1;1M down, capped at 8 ticks), the net scroll-depth / copy-mode-state model (scrollState: wheel-up arms, wheel-down floored at 0, depth==0 ⇒ back at the live bottom so no stray q; the scroll-up-then-down case), the stray-q leak guard (scrolledUpCancelIsTheSafeKeyNeverBareQ, desyncedScrollDepthStillEmitsSafeCancelNeverBareQ: the emitted copy-mode cancel is the tmux-bound safe key F12 = \u{1b}[24~, never a bare q/0x71, even when the scroll counter has desynced past the live bottom), and link normalization. Driven through the public surface with an injected MockSessionManagement (canned, synchronous, no SSH) plus a RecordingTransport (captures emitted bytes).
- Geometry G0 (PagerGeometryTests): hosts PagerHostVC in a UIWindow with 2–3 unconnected mock sessions and asserts the Auto-Layout-pin invariant (every page fills the pager bounds and every terminal fills its page) at BOTH a keyboard-down (393×852) and a keyboard-up (393×516) window height.
- Input routing (PagerGeometryTests.firstResponderFollowsActivePageOnCommit): with the keyboard up, committing a swipe must hand FIRST RESPONDER to the incoming session (SwiftTerm routes typed bytes there); guards the build-129 "typing always goes to the first session" bug. Drives commitPageChange directly, so no pan synthesis is needed.
- Guitar mark single source of truth (FlyingVShapeTests, RiffMarkRenderTests; see Assets — the guitar mark): the two fins round while the rest stays sharp (finsAreRoundedButTheRestStaysSharp), fitScale is the bbox uniform scale, the shared strokeAuthor reproduces the 1.2pt in-app glyph weight, the stroke scales with the mark (icon ≈ 26.8px), the icon framing keeps the authoring margin (no bbox-fill), and the icon consumes the SAME shape + dot as the glyph. The render smoke (RiffMarkRenderTests) rasterizes RiffMark via ImageRenderer on the sim and asserts a 1024×1024 NON-blank image (a real mix of #18160f bg + white ink), the in-sim twin of what the RiffIconGen tool ships, with no repo write/signing.
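The scroll-depth model that the Logic tests exercise can be sketched as below. The class and method names are hypothetical and the real model lives in Swift; the byte sequences, the 8-tick cap, the floor at 0, and the F12 safe-cancel rule are the ones listed above.

```python
ESC = "\x1b"
WHEEL_UP = f"{ESC}[<64;1;1M"    # SGR mouse report: wheel up
WHEEL_DOWN = f"{ESC}[<65;1;1M"  # SGR mouse report: wheel down
SAFE_CANCEL = f"{ESC}[24~"      # F12, bound in tmux to cancel copy-mode
MAX_TICKS = 8


class ScrollModel:
    """Tracks net scroll depth; depth 0 means the live bottom."""

    def __init__(self) -> None:
        self.depth = 0

    def wheel(self, ticks: int) -> str:
        """Positive ticks scroll up, negative scroll down. Returns the bytes to emit."""
        n = min(abs(ticks), MAX_TICKS)
        if ticks > 0:
            self.depth += n
            return WHEEL_UP * n
        self.depth = max(0, self.depth - n)  # floored at the live bottom
        return WHEEL_DOWN * n

    def cancel(self) -> str:
        """Leave copy-mode if the model thinks we are scrolled up; never emits a bare 'q'."""
        if self.depth == 0:
            return ""
        self.depth = 0
        return SAFE_CANCEL
```

The reason for a dedicated key rather than q: if the counter has desynced and tmux is already at the live bottom, F12 is harmless, whereas a bare q would be typed into the REPL.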
How to run
cd ios && ./test.sh # all tests, default sim
SIM='iPhone 17' ./test.sh # override the simulator
./test.sh -only-testing:RiffTests/PagerGeometryTests # forward extra args
make test # Makefile parity
CI / copy-paste (the raw command test.sh runs):
xcodebuild test -project ios/Riff.xcodeproj -scheme Riff \
-destination 'platform=iOS Simulator,name=iPhone 17 Pro'
test.sh runs xcodegen generate first (so the target exists on a fresh
checkout — Riff.xcodeproj is gitignored) and depends on no output
formatter (xcbeautify/xcpretty are not installed). A clean run includes a
SwiftTerm + swift-nio-ssh build and a simulator boot — it is multi-minute.
The honest limit (load-bearing). The original builds-114→117 bug was a
SwiftUI-hosting timing failure — the UIViewControllerRepresentable
embedding missed the keyboard-driven layout pass that re-ran the manual framing
code, collapsing the terminal to a ~12-row sliver. PagerGeometryTests guards
the framing math + the Auto-Layout-pin invariant (given a layout pass, the
terminal fills its page at any bounds) — it cannot reproduce the timing bug,
because a unit test that calls layoutIfNeeded() itself forces the very pass
that was being missed. The complementary on-device guard is a #if DEBUG
assert in PagerHostVC.layoutPages (deferred one runloop tick so it reads
settled bounds, gated on !settling && axis == .undecided) that trips if the
visible terminal doesn't fill the pager — stripped in Release, so it only ever
fires in a DEBUG/dev build on device, exactly where a regression would surface.
Future: XCUITest end-to-end (NOT built). True coverage of the keyboard-up
render needs an XCUITest launching the app with SSHTransport swapped for a
scripted in-process MockTransport (feeds canned bytes, never touches the
network), selected via a launch arg / env (e.g. RIFF_TEST_MODE=1) read in
RiffApp/SessionController. TerminalTransport is already the seam; what's
missing is the launch-arg selection + a deterministic mock + XCUITest
flake-management. It would raise the keyboard, screenshot, and assert the
terminal still fills the area above the keyboard. Materially larger than this
harness — deferred until the geometry bug recurs in a way the DEBUG assert
doesn't catch. (XCUITest is the only reason XCTest would re-enter this repo, and
only for that one tier.)
SourceKit noise. In-editor diagnostics here lie — "No such module
UIKit"/"No such module SwiftTerm"/"cannot find type" are all false positives. A
phase is green only when xcodebuild test exits 0 and prints
** TEST SUCCEEDED **, never on the absence of editor squiggles.
Repo layout
riff/
├── README.md # this file (spec + install + workflow)
├── install.sh # Mac mini server install + ios bootstrap
├── ios/
│ ├── project.yml # XcodeGen manifest (Riff + RiffShare + RiffTests iOS targets + RiffIconGen macOS tool; Riff + RiffIconGen schemes)
│ ├── Makefile # `make project`, `make sim`, `make sim-run`, `make test`
│ ├── test.sh # run the RiffTests suite on the simulator (SIM= override; forwards args)
│ ├── render-icon.sh # render icon-1024.png from the Swift source via the RiffIconGen tool (headless, no signing) — see Assets
│ ├── Riff.xcconfig.example # committed; copy to Riff.xcconfig
│ ├── Riff.xcconfig # generated by --bootstrap-ios; gitignored
│ └── Riff/
│ ├── RiffApp.swift # @main; injects MessageStore + SessionStore; defines Notification.Name.riffToggle; riff://toggle fallback handler
│ ├── RiffIntents.swift # Action Button App Intent (RiffToggleIntent) + AppShortcutsProvider (RiffShortcuts); RiffToggleBus posts .riffToggle in-process
│ ├── AppDelegate.swift # APNs token capture + register-device POST
│ ├── ContentView.swift # TabView: Chat / Settings (History removed)
│ ├── ChatView.swift # iMessage-style thread + compose bar (mic/▲ swap)
│ ├── ChatViewModel.swift # send orchestration (text + voice), optimistic insert, sync
│ ├── MessageStore.swift # ordered [ChatMessage], container-JSON persistence (atomic, ~500 cap)
│ ├── RecordingViewModel.swift# AVAudioSession + AVAudioEngine tap → AAC/m4a file (AudioFileWriter); stopAndSend returns (transcript, reply)
│ ├── NarrationController.swift# output voice: long-polls /riff/narrate-poll, AVAudioPlayer + ducking session; owned by TerminalScreen
│ ├── Conversation/ # hands-free CallKit conversation mode
│ │ ├── CallController.swift # CXProvider/CXCallController wrapper; owns the call + the call-owned AVAudioSession (callOwnsAudioSession)
│ │ ├── ConversationController.swift # orchestrator: ties the call to narration + voice + speech; single funnel for both entry points; half-duplex gate
│ │ └── SpeechTurnController.swift # Phase 2: on-device SFSpeechRecognizer + AVAudioEngine + NLTagger endpointing; per-turn lifecycle, turn-gated
│ ├── RiffClient.swift # HMAC-signed POST + actor wrapper (postMessage / fetchConversation / postAudio / pollNarration)
│ ├── SessionStore.swift # legacy; unlinked from ContentView (kept one release)
│ ├── HistoryView.swift # legacy; unlinked from ContentView (kept one release)
│ ├── SettingsView.swift # host/secret/version + push state + test
│ ├── Settings.swift # static config from Bundle.main
│ ├── Riff.entitlements # aps-environment: development
│ ├── Info.plist # UIBackgroundModes: audio, remote-notification, voip (voip required for CallKit conversation mode)
│ ├── FlyingVShape.swift # the Flying-V geometry (FlyingVShape + FlyingVDot + fitScale) — single source for BOTH the glyph and the icon
│ ├── RiffMark.swift # shared style view: stroke (author units × fitScale), ink, dot, optional dark bg; tight glyph fit vs padded icon framing
│ ├── Terminal/ # SSH+tmux terminal surface: SessionManager (+ attachSharedImage), SessionController, SessionManagement (+ SessionManaging), SessionPager (PagerHostVC/SessionPageVC), SSHTransport, TerminalTransport
│ └── Util/Hex.swift # Data <-> hex helpers
│ ├── Shared/
│ │ └── SharedImageInbox.swift # App Group share-inbox contract; compiled into BOTH Riff and RiffShare
│ └── RiffShare/ # Share Extension target (mark.riff.share): image/video → App Group inbox → host drains
│ ├── ShareViewController.swift # programmatic principal class; deposits the shared JPEG/movie, completeRequest fast
│ ├── Info.plist # NSExtension (com.apple.share-services, image-only activation rule)
│ └── RiffShare.entitlements # App Group only (group.mark.riff)
│ ├── RiffIconGen/ # macOS CLI tool (type: tool, no signing): renders icon-1024.png from FlyingVShape + RiffMark
│ │ └── RiffIconGen.swift # @main ImageRenderer @ 1024px → opaque RGB PNG (AppKit NSHostingView fallback); NOT named main.swift (would force top-level-code mode)
│ └── RiffTests/ # Swift Testing unit suite (simulator-only)
│ ├── SessionManagerTests.swift # naming/paging/geometry-guard/close
│ ├── SessionControllerTests.swift # input-dirty / scroll-wheel SGR / link-norm
│ ├── PagerGeometryTests.swift # G0 Auto-Layout-pin invariant in a UIWindow
│ ├── FlyingVShapeTests.swift # fin-rounding + single-source guards (fitScale, stroke proportionality, icon framing)
│ ├── RiffMarkRenderTests.swift # ImageRenderer smoke: RiffMark → 1024² non-blank (bg + ink) on the sim
│ └── Mocks/ # MockSessionManagement (canned, no SSH) + RecordingTransport (captures bytes)
├── server/
│ ├── riff_server.py # aiohttp endpoint on :8902 (chat store + multi-turn replay)
│ ├── poll-instructions.md # riff poll session contract (multi-turn responder)
│ ├── apns.py # HTTP/2 + .p8 token auth
│ └── tests/
│ ├── test_riff_server.py # HMAC, claude stub, APNs mock, conversation store + windowing
│ ├── test_narrate.py # strip_to_prose, transcript tail, /riff/narrate-poll (stubbed synth)
│ ├── test_apns.py # JWT signing, header shape, truncation
│ ├── fixtures/sample_transcript.jsonl # representative end_turn / interim / sidechain lines
│ └── manual_apns_smoke.py # real-network smoke (--yes-real-apns)
└── LaunchAgents/
└── com.mark.riff-server.plist # KeepAlive, RunAtLoad
Dependencies
iOS
- iOS 17+ (Action Button requires iPhone 15 Pro / 16 / 17 line; iOS 17 ships the modern Action Button API).
- Xcode 15+.
- Frameworks: AVFAudio / AVFoundation (mic capture + AAC encode via AVAudioFile + AVAudioConverter), CallKit (conversation mode: outgoing call + the call-owned audio session), Speech (SFSpeechRecognizer on-device capture, conversation mode only; see below), NaturalLanguage (NLTagger function-word endpointing in conversation mode), UserNotifications, Network (Tailscale resolution), Crypto + Security (on-device ed25519 keygen + Keychain storage for the SSH identity), WatchConnectivity (Phase 2).
- Speech scope: normal tap-dictation transcribes server-side via ElevenLabs Scribe v2 (moved off-device 2026-05-22, because on-device STT had a ~60s continuous ceiling). Speech is back only for hands-free conversation mode, where a fresh SFSpeechRecognizer request per turn dodges that ceiling.
- Phase 3 (Siri "call Riff") will add SiriKit Calling surface (com.apple.developer.siri + an Intents extension + NSSiriUsageDescription); not yet built. PushKit is NOT used (deferred VoIP-wake).
- SwiftPM dependencies (added 2026-05-22 for the terminal):
  - SwiftTerm (github.com/migueldeicaza/SwiftTerm, MIT, pinned from: 1.13.0): terminal emulator + UIKit TerminalView (feed(byteArray:) to paint, TerminalViewDelegate.send to capture keystrokes; auto-provides the Esc/Ctrl/Tab/arrows keyboard accessory row).
  - SwiftNIO SSH (github.com/apple/swift-nio-ssh, Apache-2.0, pinned from: 0.13.0): pure-Swift SSH client driving the interactive PTY into tmux. No C dependency, no OpenSSL.
  - Both are permissively licensed by design (Riff may be sold; see Why mosh is deferred). mosh is NOT a dependency: it is GPLv3+ and was dropped.
  - Build note (Xcode 26): SwiftTerm bundles Metal shaders, so a device build needs the Metal Toolchain component (xcodebuild -downloadComponent MetalToolchain, one-time). Without it the build fails at CompileMetalFile Shaders.metal (all Swift still compiles).
Mac mini
- Python 3.11+ (matches ~/agents/standard).
- aiohttp for the server endpoint and the outbound multipart POST to ElevenLabs Scribe v2 (already a dep; no new package, no elevenlabs Python SDK).
- httpx[http2] for APNs HTTP/2 push.
- pyjwt[crypto] for the APNs JWT (the .p8 key flow).
- A running riff poll session (from poll-bringup; transcripts are dropped as events and the server waits for the reply file, per Mark's "never call Anthropic API" rule; rides the Claude Code subscription).
- ElevenLabs Scribe v2 for STT: reached over the public internet via TLS, billed to the same account that funds newsfeed TTS (same ELEVENLABS_API_KEY). See Cost & latency below.
Apple Developer Program
- Membership active (the dev email Mark received).
- APNs auth key (.p8): generated once in Apple Developer console, stored at ~/.ssh/apns_riff.p8 mode 0600. Not committed.
- Bundle ID registered under Mark's team.
- Push Notifications capability + Background Modes: audio on the Riff target.
Secrets
Per the global CLAUDE.md, all secrets in ~/.env:
RIFF_SHARED_SECRET=<32-byte hex>
APNS_TEAM_ID=<10-char alphanumeric>
APNS_KEY_ID=<10-char alphanumeric>
APNS_BUNDLE_ID=mark.riff
APNS_PRIVATE_KEY_PATH=/Users/mark/.ssh/apns_riff.p8
ELEVENLABS_API_KEY=<elevenlabs key> # SHARED with newsfeed TTS — same var
# ELEVENLABS_STT_MODEL=scribe_v2 # optional override (default scribe_v2)
# RIFF_NARRATE_TTS_MODEL=eleven_multilingual_v2 # optional: narration TTS model
# RIFF_NARRATE_VOICE_ID=21m00Tcm4TlvDq8ikWAM # optional: narration voice (default Rachel)
Server reads these on boot.
RIFF_SHARED_SECRET in distributed (TestFlight) builds is USER-ENTERED, not
baked. Baking Mark's secret into a distributed .ipa would hand every tester
a working credential to his riff_server. So:
- Mark's local dev build (./install.sh --bootstrap-ios) bakes the real secret + host into ios/Riff.xcconfig (gitignored) for one-tap use.
- A distributed build ships an EMPTY RIFF_SHARED_SECRET / MAC_MINI_HOST (see /riff-publish, which refuses to publish if Mark's real values are baked). A voice-button user pastes the secret their own riff_server printed (./install.sh --voice-only) into Settings → Voice server, where it lives in the Keychain (Terminal/HMACSecretStore.swift, service mark.riff.hmac); it IS a secret, unlike the public host key.
- Settings.sharedSecret reads the Keychain value first, falling back to the baked xcconfig value, so both paths work without code changes. The terminal needs no secret at all; only the voice button does.
ELEVENLABS_API_KEY is the same var newsfeed's TTS already uses —
riff_server.py's load_dotenv() injects ~/.env, so the server picks it
up with no new plumbing. Reused verbatim (do not invent a new key name). If
it's absent, /riff/audio returns 503 (STT not configured) but the
rest of the server (health, text path) stays up — unlike
RIFF_SHARED_SECRET, which is fatal at boot. The key never leaves the Mac;
only the Mac→ElevenLabs leg sends it (as the xi-api-key header, over TLS).
The same ELEVENLABS_API_KEY powers output narration (TTS) — no extra key.
Two optional overrides tune the narration voice, both with safe defaults so the
feature works out of the box:
- RIFF_NARRATE_TTS_MODEL — the ElevenLabs model_id (default
eleven_multilingual_v2, a currently-shipping model; eleven_v3 works on
this account too but access varies by account, so the committed default stays
on multilingual_v2). A model the account can't use returns a non-200 → the
poll surfaces 502 quietly, no crash.
- RIFF_NARRATE_VOICE_ID — the TTS voice_id (default 21m00Tcm4TlvDq8ikWAM,
the ElevenLabs stock "Rachel" voice). STT needs no voice, so there was no
pre-existing Riff voice; swap this to a preferred clone via ~/.env.
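The boot-time rules for these variables can be sketched as one function. This is an illustration, not riff_server.py: load_config and the returned key names are hypothetical. The variable names, the defaults, and the fatal-versus-degraded split come from the text.

```python
import os


def load_config(env=None) -> dict:
    """Resolve Riff's settings from an environment mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    secret = env.get("RIFF_SHARED_SECRET")
    if not secret:
        # Fatal at boot: nothing can be authenticated without it.
        raise RuntimeError("RIFF_SHARED_SECRET is required")
    api_key = env.get("ELEVENLABS_API_KEY") or None
    return {
        "secret": secret,
        "elevenlabs_api_key": api_key,
        # Not fatal: when False, the voice endpoints answer 503 and the rest stays up.
        "voice_configured": api_key is not None,
        "stt_model": env.get("ELEVENLABS_STT_MODEL", "scribe_v2"),
        "tts_model": env.get("RIFF_NARRATE_TTS_MODEL", "eleven_multilingual_v2"),
        "voice_id": env.get("RIFF_NARRATE_VOICE_ID", "21m00Tcm4TlvDq8ikWAM"),
    }


if __name__ == "__main__":
    cfg = load_config({"RIFF_SHARED_SECRET": "00" * 32})
    print(cfg["voice_configured"], cfg["tts_model"])
```

Taking the environment as a parameter keeps the function testable without touching the real ~/.env.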
Narration (output voice)
The input half of voice (Scribe STT → terminal) has a mirror: output
narration speaks each completed Claude turn aloud via ElevenLabs TTS. It's
opt-in, OFF by default (riff.voice.narrate in iOS Settings ▸ Voice) —
it's loud and costs credits.
Why tail the transcript, never scrape the terminal. Riff renders the live
claude CLI through SwiftTerm — a repainting TUI of ANSI escapes, box-drawing,
a spinner, and a token counter. That byte stream is unspeakable, and there's no
reply-text channel from the terminal back to the app. So the server reads the
session transcript JSONL that claude writes to
~/.claude/projects/<cwd-slug>/<session>.jsonl — structured, one line per
content block — and extracts the clean prose of the latest completed
(stop_reason=="end_turn") assistant turn. This sidesteps ANSI parsing
entirely and gets the exact text Claude emitted.
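A minimal sketch of that extraction, under stated assumptions: each JSONL line is taken to be an object with a timestamp field and a message object carrying role, stop_reason, and a content list of typed blocks. That layout is an assumption for illustration, not a documented schema, and latest_completed_turn is a hypothetical helper name. For simplicity the sketch reads text only from the line that carries end_turn; a real reader may need to join blocks spread over several lines of the same turn.

```python
import json
from typing import Iterable, Optional, Tuple


def latest_completed_turn(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (timestamp, text) of the last assistant line with end_turn, else None."""
    latest = None
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. a half-written final line while claude is still appending
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        if not isinstance(message, dict):
            continue
        if message.get("role") != "assistant" or message.get("stop_reason") != "end_turn":
            continue  # tool calls and unfinished turns are never spoken
        content = message.get("content")
        if not isinstance(content, list):
            continue
        text = "\n".join(
            block.get("text", "")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        ).strip()
        if text:
            latest = (record.get("timestamp", ""), text)
    return latest
```

Because only structured text blocks are read, nothing in this path ever sees an ANSI escape or a spinner frame.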
Flow. NarrationController (@MainActor, owned by TerminalScreen beside
VoiceInjectController) runs a long-poll loop against GET /riff/narrate-poll,
seeding its cursor to "now" at start() so it never speaks history already on
screen. On a hit it advances the cursor past the turn's timestamp before
playing (so the next poll can't replay it; two fast turns collapse to
latest-wins), then plays the MP3 with AVAudioPlayer. The audio session uses
.playAndRecord + .duckOthers (no .allowBluetooth, same rationale as
RecordingViewModel) so a show on another device ducks rather than stops,
and deactivates with .notifyOthersOnDeactivation afterward to restore it.
Coordination with the mic. Both narration and recording use
.playAndRecord, so TerminalScreen drives narration.setRecording(_:) off
voice.status: while recording/transcribing, narration pauses its loop and
releases the session so RecordingViewModel can claim .measurement mode
cleanly. Narration is also interrupted (skip()) the moment Mark acts —
submitting a prompt or starting to dictate — since he's moved on. On
background the loop stops; on foreground it restarts if the toggle is on. The
toggle is read live each iteration, so flipping it takes effect on the next
poll without an app relaunch.
Phase 1 ships whole-message (non-streaming) REST synthesis — simple, and a ~1.5 KB turn returns in a couple seconds. Sentence-chunked WS streaming (to cut start latency) is a deferred Phase 4, built only if the latency annoys.
Conversation mode (CallKit, hands-free) — drive with the phone locked
Hands-free, in-car use: conversation mode presents as a real outgoing system phone call (CallKit) held open for the whole session. A live call is the legitimate iOS mechanism that grants locked-screen background audio, a live mic while locked, CarPlay/Bluetooth routing, and native call controls (mute/end). It is user-initiated, not a keep-alive daemon — the app suspends normally when no call is active.
Two entry points → one identical loop. (1) Long-press the center Flying-V guitar button (tap is still one-shot dictation; long-press toggles the call). (2) "Hey Siri, call Riff" (Phase 3, via the SiriKit Calling domain — the route that can start the call from the lock screen without an unlock; not yet built). While in a call the center button shows a slow coral pulse.
The loop is turn-gated and half-duplex (no barge-in in v1): 1. Call starts → it's your turn → the mic opens. 2. You speak; an on-device endpoint detector finalizes the utterance (below). 3. The transcript is injected into the active session AND auto-submitted (conversation mode always submits — you can't edit while driving). 4. Claude's reply narrates (the call's output). The mic is CLOSED while narration plays. 5. Narration ends → the mic auto-reopens → repeat.
Capture = Apple on-device Speech (SpeechTurnController): SFSpeechRecognizer
+ SFSpeechAudioBufferRecognitionRequest with requiresOnDeviceRecognition = true,
fed by an AVAudioEngine input tap, streaming partial results. A fresh
recognition request per turn (started on mic-open, stopped on endpoint) sidesteps
the on-device recognizer's continuous-duration ceiling — the same ceiling that
sank the chat-era on-device STT (which is why normal tap-dictation uses the
ElevenLabs Scribe batch path; conversation mode is the one place Apple Speech is
used, precisely because per-turn requests dodge the limit). Speech needs only
NSSpeechRecognitionUsageDescription + a runtime requestAuthorization — no portal
capability.
Endpointing is on-device, NO LLM, zero API cost. While listening, track time
since the last newly-recognized words. On a ~3s pause, run
NLTagger(.lexicalClass) on the final token: if it's a
conjunction/preposition/determiner/filler ("and", "to", "the", "because", "um", …)
you're mid-thought → keep listening; otherwise finalize. A ~6s total-silence
backstop finalizes regardless (a complete clause you paused on). No wake word, no
cue. (These thresholds measure time-since-last-recognized-WORD, not acoustic silence
— the on-device recognizer reports partials with lag — so they run generous; a
too-eager cutoff means bump pauseSeconds. Tunable via a Settings dial if 3s isn't
the sweet spot.)
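The endpoint rule can be sketched as a pure function. Note the swap: the app uses NLTagger's lexical classes, while this illustration substitutes a small hand-written function-word set, so FUNCTION_WORDS and should_finalize are hypothetical stand-ins rather than the shipped code. Treating "no words yet" as "keep listening" is also an assumption; the text above does not specify that case.

```python
from typing import Sequence

# Stand-in for NLTagger's conjunction/preposition/determiner/filler classes.
FUNCTION_WORDS = frozenset({
    "and", "or", "but", "to", "of", "the", "a", "an",
    "because", "so", "with", "for", "um", "uh",
})


def should_finalize(
    words: Sequence[str],
    seconds_since_last_word: float,
    pause_seconds: float = 3.0,
    backstop_seconds: float = 6.0,
) -> bool:
    """Decide whether the current utterance is complete."""
    if not words:
        return False  # assumption: nothing recognized yet, keep listening
    if seconds_since_last_word >= backstop_seconds:
        return True  # backstop: finalize regardless of the last word
    if seconds_since_last_word < pause_seconds:
        return False  # still inside the normal pause window
    last = words[-1].lower().strip(".,!?;:\"'")
    # A trailing function word suggests the speaker is mid-thought.
    return last not in FUNCTION_WORDS
```

Both thresholds count time since the last newly recognized word, which is why they are passed in as one elapsed value rather than measured from audio levels.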
The single call-owned audio session (the load-bearing rule). Under an active
call the audio session is owned by the call — CallKit sets the category
(.playAndRecord / .voiceChat / .allowBluetooth) and activates it in
provider(_:didActivate:); the app never calls setActive/setCategory while
the call owns it. ConversationController pushes a callOwnsSession flag into
NarrationController, RecordingViewModel, and SpeechTurnController; all three
short-circuit their own session setup/teardown while it's true. This is the
build-96/97 dictation regression (guard !recording) generalized — "the call
owns the session, nobody else deactivates it." SpeechTurnController has zero
setActive/setCategory calls; it only runs its engine on the already-active
call-owned session. Narration is forced on for the call (ignores the global
riff.voice.narrate toggle — a silent "call" makes no sense).
voip background mode is required. On-device, CallKit only activated the call's
audio session (fired didActivate) once voip was added to UIBackgroundModes
(build 100); with audio alone the call brought up no audio. voip is present and
load-bearing. It does not mean Riff rings you unprompted — we register for no
VoIP pushes (PushKit wake is deferred, not built).
App Store note. CallKit-for-an-AI-assistant carries real, irreducible review risk (Apple may decide it isn't a "genuine" call). Defenses: real two-way audio, user-initiated only, honest call metadata, clean lifecycle. Fallback is to gate it behind a disclosure or ship it TestFlight/sideload-only. Unknowable in advance.
Requirements (regression checklist)
Conversation mode shares the bottom-bar gesture surface with dictation, sessions,
and the photo button, and it coordinates three audio controllers — so changes
regress each other easily (e.g. the long-press gesture once froze cluster dragging).
Check every change to TerminalScreen's bottom bar or the Conversation/
controllers against this list before shipping. Each item is verified on-device
(CallKit / Speech / locked behavior can't be tested in the simulator).
Bottom-bar gestures (one shared cluster):
- G1 — tap the guitar = one-shot dictation (Scribe path), exactly as before conversation mode existed.
- G2 — long-press the guitar (~0.5s, near-stationary) = toggle conversation mode on/off.
- G3 — the cluster stays draggable left/center/right, INCLUDING by grabbing the guitar. A horizontal drag repositions and must NOT be misread as a tap/long-press, and must NOT freeze. (Build-103 regression: the guitar's press handling must not set the cluster-freeze flag on touch-down.)
- G4 — vertical swipe on the bar hides (down) / raises (up) the keyboard.
- G5 — the session button, ✓, and photo are unchanged by conversation mode. On the + glyph: tap = new session (global defaults); hold, then release ON the button = Close Session (preserved exactly); hold, then slide ≥44pt OFF the button = open the New Session… menu (New Session… → cwd+harness sheet; Close Session, destructive). On the ✓ glyph: tap = send, hold = clear. Photo = attach. growWork only floods the bar red at the 0.5s threshold; it no longer pops the menu. Classification happens on RELEASE (sessionButtonRelease): an on-button hold (moved < 44) → .closeSession (defers past the red-flood fade, as before); a slid-off hold (moved ≥ 44) → .presentMenu (no manager mutation, fires immediately). The dirty-✓ hold (.clear) still defers past the fade.
- Which glyph (+ vs ✓) — the dirty signal. inputDirty flips to ✓
(send) when input lands on the line — a printable byte through write(), or a
non-empty dictation transcript — and back to + (new session) on submit
(CR/LF). An empty / whitespace-only dictation does NOT flip it (build 174-era
fix): VoiceInjectController.inject trims and writes nothing on empty input, so
the dictation-end no longer marks dirty unconditionally — no ✓ on an empty box.
Known limitation (unchanged): manually backspacing a line empty still leaves it
✓, since the heuristic watches bytes sent, not Claude's input widget.
- G6 — tapping a link is keyboard-neutral. Tap a URL / markdown link in the terminal: it opens in Safari and the keyboard ends in the same state it started (down stays down, up stays up). SwiftTerm's simultaneous focus-tap would otherwise raise it; the bridge snapshots keyboard state at touch-down (shouldReceive) and resigns first responder iff it wasn't already up (dismissKeyboardAfterLinkTap).
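The release-time classification described in G5 can be modeled as a small decision function. This is an illustrative sketch, not the Swift in sessionButtonRelease: the enum and function names are hypothetical, and ignoring finger movement while the ✓ glyph is showing is an assumption (the checklist only specifies tap and hold for ✓). Horizontal cluster drags (G3) are out of scope here.

```python
from enum import Enum

HOLD_SECONDS = 0.5       # threshold at which the bar floods red
SLIDE_OFF_POINTS = 44.0  # movement that counts as sliding off the button


class ReleaseAction(Enum):
    NEW_SESSION = "newSession"
    SEND = "send"
    CLOSE_SESSION = "closeSession"
    PRESENT_MENU = "presentMenu"
    CLEAR = "clear"


def classify_release(input_dirty: bool, held_seconds: float, moved_points: float) -> ReleaseAction:
    """Classify a finished press; nothing is decided before the finger lifts."""
    is_hold = held_seconds >= HOLD_SECONDS
    if input_dirty:
        # The glyph is a checkmark: tap sends, hold clears.
        return ReleaseAction.CLEAR if is_hold else ReleaseAction.SEND
    if not is_hold:
        return ReleaseAction.NEW_SESSION
    if moved_points >= SLIDE_OFF_POINTS:
        return ReleaseAction.PRESENT_MENU  # slid off the button
    return ReleaseAction.CLOSE_SESSION     # held and released on the button
```

Deciding on release is what keeps the hold threshold from popping the menu on its own: reaching 0.5s changes only the visual state.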
Conversation call (CallKit):
- C1 — entering places a real CallKit call (native call UI); the guitar shows a coral pulse while mode == .active.
- C2 — narration plays while the phone is LOCKED, routed to CarPlay/Bluetooth.
- C3 — native end + mute, and a second long-press, end/mute the call; on end the app suspends, mic released, narration stopped.
- C4 — voip is in UIBackgroundModes (required — without it CXCallController.request fails Code 1 "unentitled" and nothing happens).
- C5 — no setActive/setCategory while the call owns the session (Narration/Recording/SpeechTurn all gate on callOwnsSession). Static-grep on every audio change.
- C6 — no crash on the mic-reopen after narration (validate the input format before installTap; clear any stale tap first — build 102).
Hands-free turn loop (Phase 2):
- T1 — turn-gated, half-duplex: mic open only while narration is NOT speaking; auto-opens after narration ends and at call start. No barge-in in v1.
- T2 — capture is on-device SFSpeechRecognizer (conversation mode only; tap-dictation stays on Scribe), a fresh request per turn.
- T3 — endpoint = NLTagger function-word check after a ~3s pause + ~6s silence backstop. No LLM / claude -p, no stop word, no wake word.
- T4 — finalize injects AND submits. The Return is a SEPARATE, ~0.3s-delayed keystroke — an inline \r in the same burst as the text is swallowed by Claude Code's input as a literal newline (lands the text without submitting).
- T5 — a manual tap-dictation mid-call releases the hands-free mic (the two AVAudioEngines must never contend for the hardware input).
- T6 — the first conversation prompts for Speech Recognition permission.
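T4's two-step submit is easy to get wrong, so here is a hedged sketch of the ordering. The write callable stands in for whatever sends bytes to the terminal, and inject_and_submit is a hypothetical name; the app's real implementation is Swift. The empty-input guard mirrors the trimming behavior described for VoiceInjectController.inject.

```python
import time
from typing import Callable


def inject_and_submit(
    write: Callable[[bytes], None],
    transcript: str,
    submit_delay_s: float = 0.3,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Type the transcript, then send Return as a separate, delayed keystroke."""
    text = transcript.strip()
    if not text:
        return False  # empty or whitespace-only: write nothing at all
    write(text.encode("utf-8"))
    # The gap matters: a "\r" in the same burst is taken as a literal newline.
    sleep(submit_delay_s)
    write(b"\r")
    return True
```

Passing sleep in as a parameter keeps the delay testable without waiting in real time.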
Cost & latency (Scribe v2)
- Cost: Scribe v2 batch ≈ $0.22–0.28 per hour of audio (the realtime variant scribe_v2_realtime, not used, is ≈ $0.39/hr). For Mark's usage (short voice commands, seconds to a couple of minutes) this is fractions of a cent per request; a 30-second command ≈ $0.002. Billed to the existing ElevenLabs account that funds newsfeed TTS.
- Latency (per request): upload + STT + Claude. The upload is a small AAC clip over the tailnet (sub-second for typical clips); Scribe v2 batch returns a short clip in a few seconds; Claude is the existing wait (bounded by CLAUDE_TIMEOUT_S = 60). Net: the perceived wait grows by the upload + Scribe time (a few seconds for normal commands) on top of today's Claude-only wait. That is the explicit trade for ≈2.2% WER.
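The per-request cost figure follows from simple proportional arithmetic, sketched below. The helper name is hypothetical; the hourly rates are the approximate ones quoted above.

```python
def scribe_cost_usd(clip_seconds: float, usd_per_hour: float) -> float:
    """Cost of one clip when billing is proportional to audio duration."""
    return clip_seconds / 3600.0 * usd_per_hour


# A 30-second command at both ends of the quoted batch price range.
low = scribe_cost_usd(30, 0.22)
high = scribe_cost_usd(30, 0.28)
print(f"30 s clip: ${low:.4f} to ${high:.4f}")
```

Both ends of the range round to about $0.002, matching the estimate in the Cost bullet.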
Install
Two install entry points, both run from ~/riff/:
# Mac mini server: deps, ~/bin symlinks, launchd job, env validation.
./install.sh
# iOS bootstrap: writes ios/Riff.xcconfig from ~/.env, then xcodegen.
./install.sh --bootstrap-ios
# Health check (HMAC-signed curl to /riff/health).
./install.sh --health
# Tear down launchd + ~/bin symlinks (keeps Application Support state).
./install.sh --uninstall
The default install prints a final status table (env keys present, .p8 permissions, launchd job loaded, server reachable, ios xcconfig + Xcode project present). Re-running is idempotent.
State lives under ~/Library/Application Support/riff/:
~/Library/Application Support/riff/
├── sessions/ # one JSON per voice command
│ ├── _index.jsonl # append-only history index
│ └── <session_id>.json
└── devices.json # {device_id: {push_token_hex, env, ts}}
The launchd plist lives at ~/Library/LaunchAgents/com.mark.riff-server.plist
(copied, not symlinked, because launchd distrusts symlinked plists). Logs
land at ~/Library/Logs/riff-server.log.
Distributing Riff (TestFlight)
Riff ships to the OpenClaw / self-host crowd via TestFlight (external testers
+ a public link). The build pipeline is automated by the /riff-publish
skill (skills/riff-publish/, the distribution counterpart to /riff-update).
Publishing is deliberate — each upload burns a build number and may trigger Beta
App Review — so it's invoked by hand, never on every commit.
/riff-publish (the automated build)
skills/riff-publish/riff-publish.sh runs: guardrail → bump
CFBundleVersion (host + widget in lockstep) → xcodegen → archive (Release) →
export via ios/ExportOptions.plist (method = app-store-connect, automatic
signing, upload symbols) → upload via xcrun altool (App Store Connect API key
from ~/.env: ASC_KEY_ID + ASC_ISSUER_ID; the .p8 at
~/.appstoreconnect/private_keys/AuthKey_<ASC_KEY_ID>.p8). --no-upload
produces the .ipa only (upload via Xcode Organizer instead).
GUARDRAIL (load-bearing):
/riff-publish REFUSES to publish if ios/Riff.xcconfig bakes a real RIFF_SHARED_SECRET (non-empty, non-zeros) or Mark's MAC_MINI_HOST. A baked secret in a distributed .ipa is a committed credential handed to every tester. Distributed builds ship an empty secret/host; a voice-button tester enters the secret in-app (Settings → Voice server). To produce a clean build: printf 'RIFF_SHARED_SECRET =\nMAC_MINI_HOST =\n' > ios/Riff.xcconfig && (cd ios && xcodegen generate).
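As an illustration of the guardrail's logic (the real check lives in riff-publish.sh and may differ), the sketch below parses an xcconfig and reports violations. All function names are hypothetical. It is also slightly stricter than the rule stated above: it flags any non-empty MAC_MINI_HOST rather than comparing against Mark's specific host, which is an assumption made to keep the example self-contained.

```python
from typing import Dict, List


def parse_xcconfig(text: str) -> Dict[str, str]:
    """Parse KEY = VALUE lines; '//' starts a comment, as in xcconfig files."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("//", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def is_real_secret(value: str) -> bool:
    """Real means non-empty and not an all-zeros placeholder."""
    value = value.strip()
    return bool(value) and set(value) != {"0"}


def publish_violations(xcconfig_text: str) -> List[str]:
    settings = parse_xcconfig(xcconfig_text)
    problems: List[str] = []
    if is_real_secret(settings.get("RIFF_SHARED_SECRET", "")):
        problems.append("RIFF_SHARED_SECRET is baked with a real value")
    if settings.get("MAC_MINI_HOST", ""):
        problems.append("MAC_MINI_HOST is baked")  # stricter than the stated rule
    return problems
```

An empty list means the config is safe to distribute; anything else should abort before the archive step.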
App Store Connect (one-time, web console — Mark)
- App record for bundle id mark.riff (Apps → +). Even a TestFlight-only app needs the record.
- TestFlight test info + a reviewer note explaining the self-host model (you drive your own Mac over SSH; provide demo Mac creds if feasible, else explain why a reviewer needs their own box). These are for Beta App Review, which the first external build must pass. Internal testers (up to 100 on the team) do NOT require it, so that's the fast initial loop.
- An external testing group ("OpenClaw / self-host beta") + a public TestFlight link (up to 10,000 testers) for self-enrollment: the distribution surface the landing page points at.
- An App Store Connect API key (Users and Access → Integrations) → store ASC_KEY_ID + ASC_ISSUER_ID in ~/.env, .p8 in ~/.appstoreconnect/….
Versioning, privacy, ATS
- CFBundleVersion must be unique + monotonic per upload; /riff-publish bumps it (host + widget together). CFBundleShortVersionString is the marketing version (0.1.0).
- Privacy questionnaire: Riff collects ~nothing. The terminal sends keystrokes only to the user's own Mac; audio leaves the device only if the user opts into voice → their own server → ElevenLabs. Answer "no data collected by us."
- ATS: NSAllowsArbitraryLoads: true is needed for the plaintext SSH/HTTP-over-tailnet transport (Tailscale provides the encryption). TestFlight generally accepts it; the App Store (Tier-B) may demand a justification.
- APNs: aps-environment: development (project.yml). TestFlight runs against production APNs. The terminal doesn't push (chat/APNs is shelved), so this is irrelevant to the distributed app, but a development-only entitlement uploaded to App Store Connect can trip validation. Verify the archive validates; if it complains, flip Release to production or drop the entitlement from the distributed build. (Verify-then-decide: don't blindly flip Mark's working dev build.)
Cadence
- TestFlight builds expire 90 days after upload and testers lose access; re-run /riff-publish to refresh so the beta doesn't silently go dark.
- A new external build may need Beta App Review re-approval if it adds capabilities; metadata-only changes usually don't.
- Internal loop first: upload → install from the TestFlight app on Mark's device → confirm onboarding + terminal on a Release build (catches Release-only issues: empty xcconfig secret, production APNs, signing) → then external Beta App Review.
Distributing to Mark's own phone (OTA — the primary dev-deploy)
/riff-update installs to Mark's device over the CoreDevice tunnel
(devicectl), which requires the phone on the same LAN and keeps dropping
off-LAN ("device unavailable", error 1011). The /riff-ota skill
(skills/riff-ota/) replaces it as the primary dev-deploy: it builds an
ad-hoc-signed .ipa and hosts it + an itms-services manifest + a one-tap
install page on the Tailscale Funnel (public HTTPS, Let's Encrypt cert).
Mark installs by opening the install page in Safari and tapping Install
Riff — from anywhere, off-LAN, over cellular, no cable. devicectl /
/riff-update is now the cabled fallback.
~/.claude/skills/riff-ota/riff-ota.sh # Debug build → publish → print the URL (no iMessage)
~/.claude/skills/riff-ota/riff-ota.sh --release # Release build (TestFlight-equivalent smoke)
~/.claude/skills/riff-ota/riff-ota.sh --send # ALSO iMessage the URL (default OFF — Riff reads it inline)
No-real-secret requirement (load-bearing)
A Funnel-hosted .ipa is publicly downloadable by anyone with the URL, so
the OTA build must never bake a real RIFF_SHARED_SECRET or Mark's host.
Two facts make this work with zero setup:
- The secret is an all-zeros placeholder. Mark's RIFF_SHARED_SECRET (in ~/.env and ios/Riff.xcconfig) is 64 zeros: harmless to publish, and it's what the tailnet-only riff_server authenticates against. So /riff-ota ships it as-is (does NOT empty it); that's what lets voice/narration authenticate on an OTA build with no in-app entry step. The build's verify REFUSES to publish if the baked secret is ever a REAL (non-zero) value, so a public .ipa still can never leak a real secret.
- The host is emptied. /riff-ota passes only MAC_MINI_HOST="" as a build setting on archive (empty host → onboarding, the "clean" choice), leaving ios/Riff.xcconfig byte-for-byte untouched (checksummed before/after) so /riff-update's cabled flow keeps its baked config.
It then unzips the exported .ipa and verifies the embedded Info.plist
carries no real secret (empty or all-zeros only) and no host before publishing.
If Mark ever sets a REAL (non-zero) secret server-side, the OTA path breaks: the build correctly refuses to bake it publicly, AND the terminal-primary build has no in-app secret-entry UI (Settings moved to the iOS Settings.app — Host + Voice toggles, but no secret field). A real secret would require adding a secret-entry UI first. Today's all-zeros placeholder sidesteps this entirely.
The flow
skills/riff-ota/riff-ota.sh: secret-less build-setting override → bump
CFBundleVersion (host + widget lockstep) → xcodegen → archive (Debug
default; --release for Release) → export via the committed
ios/ExportOptions-adhoc.plist (method = development — signs with Mark's
Apple Development cert + the development profile that embeds his registered
device UDID, the same provisioning devicectl uses; itms-services installs
it like an ad-hoc build. release-testing/ad-hoc need a Distribution cert
this keychain lacks, and minting one over SSH hits "No Accounts" — so
development is the working, lower-friction path) → verify the .ipa is
secret-less → publish Riff.ipa + manifest.plist + install.html to
~/www/riff-ota/ → print/bb-send the install-page URL
(https://marks-mac-mini.tail20af9f.ts.net/riff-ota/install.html).
The webpage-server serves the routes via dedicated handlers — the manifest as
text/xml (itms-services refuses octet-stream) and the .ipa with
Range/206 (resumable on cellular). See webpage-server/README.md.
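The Range/206 behavior for the .ipa can be sketched as a single-range resolver. This is not the webpage-server's code: resolve_range and content_range are hypothetical helpers that follow the standard HTTP byte-range rules for the simple single-range case, and multi-range requests are simply ignored here (served as a full 200), which is a simplification.

```python
import re
from typing import Optional, Tuple

_RANGE = re.compile(r"^bytes=(\d*)-(\d*)$")


def resolve_range(header: Optional[str], size: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) for a 206, or None to send the full body.

    Raises ValueError when the range cannot be satisfied (a 416 response).
    """
    if not header:
        return None
    match = _RANGE.match(header.strip())
    if not match:
        return None  # malformed or multi-range: fall back to a plain 200
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the file.
        count = int(last)
        if count == 0 or size == 0:
            raise ValueError("unsatisfiable range")
        return (max(size - count, 0), size - 1)
    start = int(first)
    if start >= size:
        raise ValueError("unsatisfiable range")
    end = min(int(last), size - 1) if last else size - 1
    if end < start:
        return None  # invalid range spec: ignore it
    return (start, end)


def content_range(start: int, end: int, size: int) -> str:
    """Value for the Content-Range header of a 206 response."""
    return f"bytes {start}-{end}/{size}"
```

A resumed cellular download typically sends the open-ended form (bytes=N-), which this resolves to the remainder of the file.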
One-time setup (Mark)
- Onboarding (one tap-through). The OTA build ships an empty baked host, so the first launch shows onboarding (seedDevDefaultsIfNeeded no-ops without a baked host). Enter host marks-mac-mini.tail20af9f.ts.net, user mark. This writes UserDefaults that persist across install-over, so subsequent OTA builds open straight to the terminal.
- No secret entry needed. Voice/narration work out of the box: the all-zeros placeholder secret ships baked and matches the server (see No-real-secret requirement). There's no in-app secret-entry field in the terminal-primary build anyway; the terminal needs no secret regardless.
- Ad-hoc provisioning profile (only if export fails). If automatic signing can't mint one, do the one-time portal step: developer.apple.com → Profiles → + → Distribution ▸ Ad Hoc → App ID mark.riff → device 00008140-001C308A2101401C → generate → download, then re-run.
Gotchas
- Open the install page in Safari. itms-services:// links are intercepted by SpringBoard; the raw link in Messages/Mail may do nothing, hence a landing page, not the bare URL.
- Provisioning-profile expiry (~1 year): a build signed with an expired development profile won't launch (or install). Re-running /riff-ota re-signs, so routine use self-heals; re-run before debugging a months-old OTA build that won't launch.
- iCloud Private Relay can break Funnel pages (TLS error on both cellular and Wi-Fi); toggle it off for the install.
Workflows
First-time setup
1. Apple Developer account: create App ID mark.riff, enable Push Notifications + Background Audio capabilities, generate an APNs auth key (.p8), download it to ~/.ssh/apns_riff.p8, and chmod 600 it.
2. Add RIFF_SHARED_SECRET, APNS_TEAM_ID, APNS_KEY_ID, APNS_BUNDLE_ID, APNS_PRIVATE_KEY_PATH to ~/.env.
3. From ~/riff/: ./install.sh (validates env + .p8, installs pip deps if missing, symlinks server/riff_server.py and server/apns.py into ~/bin/, copies the launchd plist into ~/Library/LaunchAgents/, bootstraps the job).
4. ./install.sh --bootstrap-ios: writes ios/Riff.xcconfig from ~/.env and runs xcodegen to (re)generate Riff.xcodeproj.
5. Sign Xcode into the Apple Developer team. Open Xcode → Settings (Cmd-,) → Accounts → "+" → Apple ID. Sign in with the Apple ID that owns team 6C63UU27YB. Without this, xcodebuild for a device target fails with "No Accounts" / "No profiles for 'mark.riff'".
6. Enable Developer Mode on the iPhone. Settings → Privacy & Security → Developer Mode → on. Phone reboots.
7. Register App ID mark.riff in the Apple Developer console (Identifiers → "+" → App IDs → App; bundle ID Explicit mark.riff; capability: Push Notifications). The auto-signing flow in step 8 needs this entry to exist.
8. Plug iPhone into the Mac mini via USB-C, unlock, "Trust This Computer" prompt → trust. From ~/riff/ios/:

       xcodebuild -project Riff.xcodeproj -scheme Riff \
         -destination 'id=<your-iphone-udid>' \
         -configuration Debug \
         -allowProvisioningUpdates build install

   On the iPhone: Settings → General → VPN & Device Management → trust the developer profile.
9. Open the Riff app once on the phone. Grant mic + speech + push permissions when prompted. Confirm the Settings tab shows "Push token:" followed by a value; that means the device-token POST succeeded and ~/Library/Application Support/riff/devices.json now lists the phone.
10. iOS Settings → Action Button → swipe to "Shortcut" → Choose a Shortcut → pick the auto-registered Riff shortcut ("Toggle Riff" / "Riff Toggle Recording"). It appears automatically after install (the App Intent's AppShortcutsProvider registers it), so there is no manual Shortcut to author. See Action Button configuration for the rationale and the locked-screen Face ID limit.
11. Smoke-test: press Action Button, say "ping", expect a reply notification within ~3s.
Typical use (after setup)
- Press Action Button.
- Riff app launches (instant) and starts recording.
- Speak the command — take as long as you like; pauses are fine.
- Tap Send (middle-left), or press the Action Button again, to end recording and upload the audio. (Cancel discards.)
- The Mac transcribes via Scribe v2, runs Claude, and a notification arrives with the reply, readable on the lock screen.
Apple Watch (Phase 2, when Mark gets a Watch)
- Smart Stack widget or complication tap → recording starts on Watch.
- Audio goes through paired iPhone if non-cellular, direct over Tailscale (Tailscale supports cellular Watches) if cellular.
- Reply: haptic tap + notification on the Watch face.
Constraints / gotchas
- Action Button is a one-shot launch trigger, not a held-down switch. iOS does not surface Action Button press/release events to apps. The Action Button opens a Shortcut → riff://toggle, which launches the app (or, if already recording, sends). The audio session starts in onAppear and ends only on a manual Send / Cancel / toggle. The press-and-hold walkie-talkie ergonomic is genuinely unavailable on iOS; see the "Why no PushToTalk" section in Scope.
- Lock-screen recording requires Background Audio + an active audio session that started while the device was unlocked. When the Action Button launches Riff on a locked phone, iOS shows the app and lets it run, but interaction (tap Send) requires Face ID. The Action-Button toggle (a second press = send) sidesteps the need to tap the on-screen button, pending the locked-screen verification noted in Action Button configuration.
- No on-device transcription ceiling anymore. The old SFSpeechRecognizer ~1-minute one-shot cap is gone: recording is unbounded (cap is the server's MAX_AUDIO_BODY = 25 MB ≈ 30 min of AAC). Transcription is server-side via Scribe v2; the phone only records audio.
- Wispr Flow conflict: Wispr Flow holds the system audio session for system-wide dictation. Riff explicitly opens its own audio session with .playAndRecord category and .allowBluetooth. Wispr Flow should yield the session when Riff activates; verify on first install.
- Tailnet name resolution from a freshly-launched app: the first call after a cold launch occasionally times out while the device's tailnet routes warm up. Build a 1.5s connection timeout + one retry into RiffClient.
- APNs push delivery is not real-time-guaranteed. For a voice reply to feel "instant" the response should come back via the same HTTP request that delivered the transcript (synchronous reply in the response body), with APNs as a redundant notification path for when the user has already locked the phone. The app handles both.
- App Store distribution is out of scope. A 7-day free signing certificate works for personal use; for a longer window, sign with the paid developer membership.
Weather widget (feels-like temp + clothing + icon)
EXTRACTED & REMOVED (2026-05-28). This widget is no longer part of Riff. It was split into a standalone app at ~/tops/ (own project, bundle IDs mark.tops / mark.tops.TopsWidget, App Group group.mark.tops). The design notes below are retained as history; the live code now lives in ~/tops/. Riff kept the App Group group.mark.riff only for HostKeyStore + SharedImageInbox.
Riff bundles a Home Screen + lock-screen widget that shows the apparent (feels-like) temperature for Mark's current location plus an at-a-glance clothing recommendation and a weather icon. Apple's stock weather widget gives true temperature + wind separately and forces him to mental-math the wind-chill; this widget shows the answer directly, and one glance at the SHIRT / SWEATER / JACKET / PARKA label answers "what do I throw on before I walk out the door."
Why bundle it inside Riff instead of a separate app
Riff already has paid Apple Developer signing, the App ID mark.riff
registered, the iPhone UDID added to team 6C63UU27YB, and an
install.sh that knows how to rebuild + deploy via xcodebuild. A
WidgetKit extension is a sibling target to the iOS app, sharing the
parent App ID for signing. Adding a separate "Weather" app would mean
registering another App ID, registering the device again (the dance
from phase 3), and a second project tree to maintain. Bundling is
strictly less work for an additive feature.
The widget extension has its own bundle ID mark.riff.WeatherWidget
(Apple convention is <parent-bundle-id>.<extension-name>). No second
App ID registration in the Developer console — extension bundle IDs
automatically inherit the parent App ID's entitlements + provisioning.
Widget families
| family | size | content |
|---|---|---|
| .accessoryRectangular | lock-screen rectangle (~160×40pt) | Headline display, lock screen. Two-line layout, whole stack centered via VStack(alignment:.center) + outer .frame(alignment:.center) (an HStack maxWidth:.infinity does not center reliably in the accessory frame; never use a greedy Spacer, which flings temp/icon to opposite edges). Top row: large feels-like temp + large worst-of-day weather icon as a tight group (fixed 10pt gap), centered w.r.t. the bottom line. Bottom line: <wind> MPH · <actual>° · <CLOTHING>, U+00B7 middle-dot separated, single line with minimumScaleFactor(0.6) + lineLimit(1). Starting fonts 34/30/13pt, tuned by eye. The icon reflects the worst weather expected across today's local calendar day, not current conditions (see Weather icon mapping). |
| .systemMedium | Home Screen, 4×2 (~330×155pt) | Unchanged: the dense three-column layout (WIND left, big feels-like center with actual XX° beneath, precip-% + current-conditions icon + clothing right) at a roomier scale (70pt center). Optional: add it to Home Screen if you want the richer current-conditions display. |
| .accessoryInline | single line near the clock | 73°F feels like (appends · 30% rain when probability ≥ 20%). |
| .accessoryCircular | circle widget | Just 73°. |
User picks the Home Screen medium widget from the Home Screen widget picker (long-press an empty area → "+" top-left → search "Riff Weather" → pick the medium / 4×2 variant → Add Widget). The accessory variants appear in the lock-screen widget picker (long-press lock screen → Customize → tap widget row → "Riff"). Tap-to-open routes to the Riff host app's Recording tab from any family — same as launching Riff from anywhere else. No new in-app surface for the widget.
Clothing recommendation
The right column of the medium widget shows one of four labels picked from feels-like temperature plus an actual-rate "raining now" boolean:
feels-like F raining now -> label
< 50 any -> PARKA
50–59 true -> JACKET
50–59 false -> SWEATER
>= 60 true -> JACKET (rain trumps shirt-weather)
>= 60 false -> SHIRT
"Raining now" is current.precipitation > 0 mm/h, not the hourly
probability — chosen to fix the misty-day false-negative case where
probability reads 0% but it's actively spitting. The threshold is
tunable; raise to e.g. > 0.1 if trace amounts trip JACKET too often.
Edge: when feels_like_f == nil (placeholder previews or fully
degraded states), the label defaults to SHIRT — least alarming.
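The decision table reduces to a few ordered checks, sketched here in Python for brevity (the widget itself is Swift, and clothing_label is a hypothetical name). One assumption: the table lists 50–59 and >= 60 as bands, so fractional values such as 59.5 are treated as still inside the sweater band.

```python
from typing import Optional


def clothing_label(
    feels_like_f: Optional[float],
    precipitation_mm_h: float = 0.0,
    rain_threshold_mm_h: float = 0.0,
) -> str:
    """Map feels-like temperature and current precipitation to a clothing label."""
    if feels_like_f is None:
        return "SHIRT"  # placeholder / degraded state: least alarming
    if feels_like_f < 50:
        return "PARKA"  # cold wins regardless of rain
    raining_now = precipitation_mm_h > rain_threshold_mm_h
    if raining_now:
        return "JACKET"  # rain trumps both sweater and shirt weather
    return "SWEATER" if feels_like_f < 60 else "SHIRT"
```

Raising rain_threshold_mm_h to 0.1 is the tuning knob mentioned above for trace precipitation.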
Weather icon mapping
The medium widget's icon is an SF Symbol picked from Open-Meteo's WMO
current.weather_code plus current.is_day:
weather_code icon
{0, 1} sun.max.fill (is_day == 1)
{0, 1} moon.fill (is_day == 0)
{2, 3} cloud.fill
{45, 48} cloud.fog.fill
{51..67, 80..82} umbrella.fill
{71..77, 85..86} snowflake
{95..99} cloud.bolt.fill
anything else / nil cloud.fill
Thunderstorm uses cloud.bolt.fill rather than umbrella.fill —
thunderstorm is a distinct hazard from generic rain and benefits from
its own glyph. One-line change in iconName(...) to flip later.
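For reference, the mapping table can be written as a straight chain of range checks. This Python rendering is only an illustration of the table; the shipped iconName(...) is Swift and icon_name is a hypothetical name.

```python
from typing import Optional


def icon_name(weather_code: Optional[int], is_day: bool = True) -> str:
    """Pick an SF Symbol name from a WMO weather code."""
    if weather_code is None:
        return "cloud.fill"
    code = weather_code
    if code in (0, 1):
        return "sun.max.fill" if is_day else "moon.fill"
    if code in (2, 3):
        return "cloud.fill"
    if code in (45, 48):
        return "cloud.fog.fill"
    if 51 <= code <= 67 or 80 <= code <= 82:
        return "umbrella.fill"
    if 71 <= code <= 77 or 85 <= code <= 86:
        return "snowflake"
    if 95 <= code <= 99:
        return "cloud.bolt.fill"  # thunderstorm keeps its own glyph
    return "cloud.fill"  # anything unrecognized
```

Only the clear codes branch on is_day; every other code ignores it.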
Lock-screen rectangular widget — worst-of-day icon. The
.accessoryRectangular icon does not use current
weather_code. Instead it shows the worst (most weather-severe) WMO
code across today's local calendar day, so a glance at the lock
screen at 8am warns of an afternoon thunderstorm even when the sky is
currently clear. Today's per-hour codes (hourly.weather_code[], all
24 local slots — see Data source) are reduced to one code by the pure
worstWeatherCode(of:) severity rank in OpenMeteoClient.swift:
severity (highest first → surfaced) WMO codes
thunderstorm 95, 96, 99 (also 97, 98)
snow / snow showers 71, 73, 75, 77, 85, 86
rain / rain showers 61, 63, 65, 66, 67, 80, 81, 82
drizzle / freezing drizzle 51, 53, 55, 56, 57
fog 45, 48
cloud (partly / overcast) 2, 3
clear / mainly clear 0, 1
unknown code ranked just above clear (mild)
The tier ranges in worstWeatherCode(of:) deliberately match the
iconName(...) range groupings, so the surfaced icon and the severity
ranking never disagree (rain vs drizzle is split into two severity
tiers within the single shared umbrella.fill icon group — a stricter
split that cannot change which icon shows). On a severity tie the
first-seen code at the max tier is returned (cosmetically irrelevant —
any member of a tier maps to the same icon).
If the hourly forecast fetch fails or returns an empty array,
worst_weather_code is nil and the rectangular icon gracefully
falls back to the current weather_code (rectIconCode =
entry.worst_weather_code ?? entry.weather_code). The medium widget's
icon is always current weather_code + is_day and is unaffected by
this. The rectangular icon is rendered with the day variant (it is a
daily summary; only codes 0/1 branch on is_day, so storm/rain/snow
codes are unaffected).
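A minimal sketch of the worst-of-day reduction and its fallback, assuming the hourly codes arrive as `[Int]` and the result is an optional `Int`. The names `severityTierSketch` and `worstWeatherCodeSketch` and the numeric tier values are illustrative; only the ordering of the tiers and the first-seen tie rule come from the text above.

```swift
/// Illustrative tier numbers; only their relative order matters.
func severityTierSketch(_ code: Int) -> Int {
    switch code {
    case 95...99:                        return 7  // thunderstorm
    case 71, 73, 75, 77, 85, 86:         return 6  // snow / snow showers
    case 61, 63, 65, 66, 67, 80, 81, 82: return 5  // rain / rain showers
    case 51, 53, 55, 56, 57:             return 4  // drizzle / freezing drizzle
    case 45, 48:                         return 3  // fog
    case 2, 3:                           return 2  // partly cloudy / overcast
    case 0, 1:                           return 0  // clear / mainly clear
    default:                             return 1  // unknown: just above clear
    }
}

/// Returns the most severe code, or nil for an empty array.
func worstWeatherCodeSketch(of codes: [Int]) -> Int? {
    guard var worst = codes.first else { return nil }
    // Strict ">" keeps the first-seen code when two codes share the top tier.
    for code in codes.dropFirst() where severityTierSketch(code) > severityTierSketch(worst) {
        worst = code
    }
    return worst
}

/// Mirrors the documented fallback: worst-of-day if present, else the current code.
func rectIconCodeSketch(hourlyCodes: [Int], currentCode: Int?) -> Int? {
    worstWeatherCodeSketch(of: hourlyCodes) ?? currentCode
}
```

With an empty hourly array `rectIconCodeSketch` returns the current code, which is the graceful degradation described above.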
Data source — Open-Meteo
https://api.open-meteo.com/v1/forecast?latitude=<lat>&longitude=<lng>&current=temperature_2m,relative_humidity_2m,wind_speed_10m,precipitation,weather_code,is_day&hourly=precipitation_probability,weather_code&forecast_days=1&timezone=auto&temperature_unit=fahrenheit&wind_speed_unit=mph
- Free, no auth, no quota worries for personal use.
- `forecast_hours=1` was removed. Verified live (2026-05-16): `forecast_hours=1` + `forecast_days=1` together collapse both hourly arrays to a single element (`precipitation_probability` len 1, `weather_code` len 1), so the worst-of-day reduction would only see one hour. With `forecast_days=1` + `timezone=auto` only, both hourly arrays are the full 24-slot local calendar day (`precipitation_probability` len 24, `weather_code` len 24), index 0 = location-local midnight, last slot = local 23:00. `timezone=auto` localizes the ISO-8601 timestamps in `hourly.time` and `current.time` but does not change any numeric value the widget reads.
- Response:
  - `current.temperature_2m` (true temp, °F; also the feels-like input, see Feels-like model below)
  - `current.relative_humidity_2m` (0–100; feels-like humidity input)
  - `current.wind_speed_10m` (mph; feels-like wind input)
  - `current.time` (local ISO-8601; used to index the current hour)
  - `hourly.precipitation_probability[]` (24 local-day slots; the widget reads the slot whose `hourly.time` equals `current.time`, falling back to `[0]` if the timestamps don't align; % chance of precip in the current hour)
  - `hourly.weather_code[]` (24 local-day per-hour WMO codes, reduced to the single worst code by severity via `worstWeatherCode(of:)`; drives the lock-screen rectangular icon)
  - `current.precipitation` (mm/h; drives the "raining now" boolean for clothing logic; documented as mm regardless of `temperature_unit`)
  - `current.weather_code` (WMO integer code; drives the medium icon and is the rectangular icon's fallback when the hourly array is absent)
  - `current.is_day` (0/1; selects sun vs moon for clear codes)
- ~120ms typical response time over the tailnet's egress.
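For illustration, the request and the current-hour lookup could look roughly like this. Both function names are hypothetical and not taken from `OpenMeteoClient.swift`; the query parameters are the ones in the URL above, and the lookup follows the documented rule (exact timestamp match, else slot 0).

```swift
import Foundation

/// Builds the forecast URL with the documented query parameters (no forecast_hours).
func forecastURLSketch(latitude: Double, longitude: Double) -> URL? {
    var components = URLComponents(string: "https://api.open-meteo.com/v1/forecast")
    components?.queryItems = [
        URLQueryItem(name: "latitude", value: String(latitude)),
        URLQueryItem(name: "longitude", value: String(longitude)),
        URLQueryItem(name: "current",
                     value: "temperature_2m,relative_humidity_2m,wind_speed_10m,precipitation,weather_code,is_day"),
        URLQueryItem(name: "hourly", value: "precipitation_probability,weather_code"),
        URLQueryItem(name: "forecast_days", value: "1"),
        URLQueryItem(name: "timezone", value: "auto"),
        URLQueryItem(name: "temperature_unit", value: "fahrenheit"),
        URLQueryItem(name: "wind_speed_unit", value: "mph"),
    ]
    return components?.url
}

/// Index of the hourly slot whose timestamp equals current.time; slot 0 if none matches.
func currentHourIndexSketch(hourlyTimes: [String], currentTime: String) -> Int {
    hourlyTimes.firstIndex(of: currentTime) ?? 0
}
```

Comparing the timestamps as strings works here only because both come from the same response and share one format; a sketch that parsed them into dates would be more tolerant of misaligned values.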
Apple's WeatherKit is the alternative and was considered. Rejected because (a) it requires enabling the WeatherKit capability on the App ID in the Developer console (extra step), (b) it requires WeatherKit token issuance per request (extra JWT signing), and (c) it's a Mac-side thing not relevant to a widget extension. Open-Meteo's only downside is that location metadata travels to a third-party server, which is documented under Risks.
Feels-like model
The big number is feels-like, computed locally by
apparentTempF(tempF:windMph:humidityPct:) in OpenMeteoClient.swift
using the Steadman shade apparent-temperature model (Australian
BOM):
AT = Ta + 0.33·e − 0.70·ws − 4.00 (Ta °C, ws m/s)
e = (RH/100)·6.105·exp(17.27·Ta / (237.7+Ta)) (vapour pressure, hPa)
History of this decision (don't undo it without reading):
- Started as Open-Meteo's `apparent_temperature`: ran ~6 °F colder than Apple at ~47 °F. That field is this Steadman model plus a solar-radiation term; the radiation term is what over-cooled it.
- Switched to the US NWS convention (wind chill < 50 °F, heat index ≥ 80 °F, air temp between). Fatal flaw for this use: the 50–80 °F band has no adjustment at all, so feels-like == air temp and wind is ignored. That band covers most of Mark's weather, so it showed feels==actual even at 67 °F/11 mph. Rejected.
- Now: plain Steadman, no radiation term. Continuous across all temperatures, wind- and humidity-sensitive (no dead band), which is how Apple's "feels like" behaves directionally.
Not Apple-identical — Apple's formula is proprietary. Steadman reads
a few °F below actual whenever it's breezy (that is apparent
temperature; ~62–63 °F at 67 °F/11 mph). If it lands consistently off
Apple in one direction, add a flat calibration offset in
apparentTempF (noted in its doc comment). Inputs all come from the
same current= block; units already match the function's parameters
(°F, mph, RH 0–100), while the formula itself is stated in °C and m/s.
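A sketch of the calculation, assuming the function converts its °F and mph inputs to the °C and m/s the formula is written in and converts the result back. `apparentTempFSketch` is a hypothetical name that borrows the documented parameter labels; the shipped `apparentTempF` may differ, for example by carrying a calibration offset.

```swift
import Foundation

/// Steadman shade apparent temperature, without a radiation term.
/// Inputs: air temperature in °F, wind speed in mph, relative humidity in percent (0–100).
func apparentTempFSketch(tempF: Double, windMph: Double, humidityPct: Double) -> Double {
    let ta = (tempF - 32.0) * 5.0 / 9.0   // °F → °C
    let ws = windMph * 0.44704            // mph → m/s

    // Water vapour pressure in hPa.
    let e = (humidityPct / 100.0) * 6.105 * exp(17.27 * ta / (237.7 + ta))

    // AT = Ta + 0.33·e − 0.70·ws − 4.00, in °C.
    let atC = ta + 0.33 * e - 0.70 * ws - 4.00

    return atC * 9.0 / 5.0 + 32.0         // °C → °F
}
```

At 67 °F and 11 mph with an assumed 70% relative humidity this lands at roughly 63 °F, in line with the range quoted above; lower humidity pulls the result down further.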
Location
CoreLocation with When-In-Use permission. Widgets re-acquire
location on each timeline reload (Apple permits this for widgets).
Fallback chain:
1. `CLLocationManager.requestLocation()`: fresh one-shot fix.
2. `CLLocationManager.location`: last cached coord from CoreLocation.
3. `UserDefaults` key `lastKnownCoord`: written by the host Riff app on every foreground launch, read by the widget when 1 and 2 fail.
4. If all three fail, the widget renders a "Tap to grant location" prompt that opens the host app, which deep-links to Settings.
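The fallback chain reduces to "try each source in order, stop at the first hit". The sketch below models each source as a closure so it stays independent of CoreLocation; `Coord` and `resolveCoordSketch` are hypothetical names, and in the real provider the three closures would wrap the fresh fix, the cached location, and the `lastKnownCoord` read.

```swift
/// Hypothetical coordinate type used only for this sketch.
struct Coord: Equatable {
    let latitude: Double
    let longitude: Double
}

/// Returns the first coordinate any source yields, trying sources in order.
/// Later sources are never called once an earlier one succeeds.
func resolveCoordSketch(sources: [() -> Coord?]) -> Coord? {
    for source in sources {
        if let coord = source() {
            return coord
        }
    }
    // All sources failed: the caller would render the "Tap to grant location" prompt.
    return nil
}
```

Ordering the array from freshest to stalest keeps the policy in one place, so adding or dropping a source does not touch the resolution loop.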
The NSLocationWhenInUseUsageDescription plist string is shared
between host and widget: "Riff shows the feels-like temperature for
your current location on the lock screen." Both plists need it; the
widget's plist is auto-generated by XcodeGen.
Refresh cadence
TimelineProvider returns 4 entries spaced 30 minutes apart on every
reload, so the widget shows fresh-looking numbers for 2 hours without
a re-fetch. Apple's documented widget budget is ~70 timeline reloads
per day; 48 reloads/day (one every 30 min) sits comfortably inside.
If the phone has been locked overnight and the widget's last reload was
8h ago, the displayed number may be stale — the widget shows a tiny
relative-time stamp (updated 14m ago) so Mark can tell at a glance.
On unlock, the widget re-renders within ~5s under normal operation.
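As a sketch of the arithmetic, the entry dates and the relative stamp can be computed with plain `Date` math. Both helper names are hypothetical, and the hour-granularity branch of the stamp is an assumption; the README only shows the minutes form.

```swift
import Foundation

/// Dates for the timeline entries: `count` entries, `spacingMinutes` apart, starting at `start`.
func entryDatesSketch(from start: Date, count: Int = 4, spacingMinutes: Int = 30) -> [Date] {
    (0..<count).map { start.addingTimeInterval(Double($0 * spacingMinutes * 60)) }
}

/// Short relative stamp such as "updated 14m ago".
func relativeStampSketch(fetchedAt: Date, now: Date) -> String {
    // Clamp at zero so a slightly skewed clock never prints a negative age.
    let minutes = max(0, Int(now.timeIntervalSince(fetchedAt) / 60))
    if minutes < 60 {
        return "updated \(minutes)m ago"
    }
    return "updated \(minutes / 60)h ago"   // assumed format for ages of an hour or more
}
```

Four entries at 30-minute spacing put the last entry 90 minutes after the first, so the set covers two hours once the final entry's own 30-minute window is counted.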
Code structure (additive to Riff)
ios/
├── Riff/ # existing host app (unchanged)
└── RiffWidget/ # widget extension target
├── RiffWidgetBundle.swift # @main bundle, lists WeatherWidget
├── WeatherWidget.swift # Widget definition + family list
├── WeatherProvider.swift # TimelineProvider: CoreLocation + Open-Meteo
├── WeatherEntry.swift # TimelineEntry: temp_f, feels_like_f, wind_mph, precip_prob_pct, precip_mm_h, weather_code, worst_weather_code, is_day, fetched_at
├── WeatherView.swift # SwiftUI views per family (medium/rect/inline/circular); pure helpers clothingLabel(...) and iconName(...) live at the bottom of this file (file-private; testable in isolation if a test target is ever added)
├── OpenMeteoClient.swift # Stdlib URLSession wrapper, decodes JSON
└── Info.plist # generated by XcodeGen
ios/project.yml gains a RiffWidget target with type:
app-extension, extensionType: WidgetKit, the standard widget
entitlements (com.apple.security.application-groups shared with the
host so the lastKnownCoord UserDefaults handoff works), and the
WhenInUse location plist string.
Phase 3 of Riff (widget add)
| phase | scope | depends on | gating |
|---|---|---|---|
| 3 | Lock-screen feels-like-temp widget (Open-Meteo + CoreLocation + WidgetKit) | Phase 1 shipped + device install completed (already done) | none past current state |
No new Apple-console steps. No new env vars or secrets. The widget
ships in the same xcodebuild build install pass as the host app.
Verification
- Install build, launch Riff once, grant location permission.
- Long-press an empty area of the Home Screen → "+" top-left → search "Riff Weather" → pick the medium (4×2) variant → Add Widget. (Lock-screen accessory variants use the lock-screen widget picker: Customize → tap widget row → "Riff".)
- Within a minute the widget should populate with a number; if it sticks on "—" check Settings → Privacy → Location Services → Riff is set to "While Using".
- Walk around the block — refresh should pick up location change at next 30-min reload (or sooner on lock-state change).
- Sanity-check the clothing label against the on-screen feels-like number and the truth table above (`< 50` → PARKA, `50–59` and dry → SWEATER, `>= 60` and dry → SHIRT, any temp with rain → JACKET/PARKA).
Risks (widget-specific)
| risk | mitigation |
|---|---|
| CoreLocation flakiness in widget extensions — Apple permits but advises caching | Three-step fallback chain ending in app-shared UserDefaults; "Tap to grant" prompt when all sources fail. |
| Open-Meteo egress: device lat/lng → third-party server | Documented. Mark accepts this for free, no-auth, no-quota access. Alternative is WeatherKit (Apple-first-party) at the cost of more setup; logged but not chosen for v1. |
| Stale data after long lock | Relative timestamp visible on the widget. On unlock the widget re-renders, which usually triggers a refresh within Apple's budget. |
| Widget exhausts daily timeline budget | 30-min cadence yields 48 reloads/day vs Apple's ~70/day budget. Plenty of headroom. |
| User denies location permission | Widget renders "Tap to grant" prompt; tap opens host app's Recording tab, which surfaces a one-shot Location permission ask. If still denied, widget shows "—". |
Phases
| phase | scope | depends on | gating |
|---|---|---|---|
| 1 | Action Button → app → record → on-device transcribe → POST → riff poll event → synchronous reply + APNs push | Apple Developer membership, APNs key, Tailscale on iPhone | none past membership |
| 2 | watchOS target, Smart Stack widget, complication, voice + haptic reply | Phase 1 shipped, Mark has an Apple Watch | Watch hardware |
| 3 | Lock-screen feels-like-temp widget | Phase 1 shipped | none |
Roadmap / backlog (added 2026-05-22)
Direction set with Mark on 2026-05-22 — Riff is evolving from a one-shot voice Q&A into a full native chat client for his Claude assistant (mirroring how he already talks to it over iMessage), with best-in-class frictionless voice input.
- Manual unlimited dictation + Scribe v2 STT: built 2026-05-22 (commit `fa1f8bf`); server deployed + verified end-to-end; iOS client build pending device install. No auto-send (pause as long as you want), middle-left Send, Action Button `riff://toggle` (press = start, press = send), audio uploaded to the Mac → ElevenLabs Scribe v2 → Claude. Replaced the on-device SFSpeechRecognizer path (~60s cap + first-word VAD cutoff). The mic phases in/out: it is held only while the record screen is active and released the instant the app backgrounds or the user switches tabs (`scenePhase` → `.background` / `.onDisappear` → `setActive(false, .notifyOthersOnDeactivation)`), so it never hogs the mic or keeps another app's audio (a podcast) interrupted; returning to the record screen re-acquires.
- Full iMessage-style chat interface: building incrementally across three phases.
  - Phase A: chat thread + persistence + text input + multi-turn server memory. Built 2026-05-22. Scrolling bubble thread (`ChatView`), text and voice input into one rolling conversation, on-device persistence (`MessageStore`), and a server that keeps each conversation's history (`conversations/<id>.jsonl`) and replays the last 30 turns to the poll session every turn (`_converse` + `render_window`). New endpoints `POST /riff/message` + `GET /riff/conversation`; `/riff/audio` gained an optional `X-Riff-Conversation-Id` (one-shot stays back-compat without it). The History tab was removed (the thread is the history). See Chat client + conversation store. Deploy needs a riff-server reload and a `riff` poll-session restart (`poll-instructions.md` changed).
  - Phase B: file / image attachments. Not built. Compose-bar attachment button (PhotosPicker + document picker); phone uploads files with a message (multipart); the server saves them under `attachments/` and references the paths so the poll session can `Read` them (images → vision like diet-log, docs → read). The Phase A compose bar already renders a disabled attachment button as the placeholder.
  - Phase C: Action Button App Intent. Built 2026-05-22 (client-only, no server / poll change). Replaced the `riff://toggle` URL re-fire (broken second-press-to-send when foregrounded) with an in-process `AppIntent` (`RiffToggleIntent`) auto-registered as an App Shortcut (`RiffShortcuts` in `RiffIntents.swift`). `perform()` runs in-process on every press and posts the existing `.riffToggle` notification (via a tiny `RiffToggleBus`), so the Phase A wiring (`ChatView` → `toggleVoice()`, `ContentView` → Chat tab) is reused verbatim. `openAppWhenRun = true` foregrounds the app. The `riff://toggle` URL scheme + `.onOpenURL` are kept as a zero-cost documented fallback (same notification). The locked-screen Face ID requirement is an OS limit (confirmed on-device 2026-05-22), documented not engineered around. Bind via Settings → Action Button → Shortcut → the auto-registered Riff shortcut. See Action Button configuration.
- TTS read-aloud toggle: ElevenLabs voice-out; when on, Riff speaks the reply (in addition to / instead of showing it). Enables hands-free. `ELEVENLABS_API_KEY` already present (shared with newsfeed TTS).
- "Hey Siri, Riff" invocation: a Siri Shortcut / App Intent that starts Riff listening. Combined with the read-aloud toggle this is the full hands-free + in-car loop (speak → answer spoken back), using only supported APIs.
- Apple Watch: see Phase 2 (watchOS target), when Mark has a Watch.
- CarPlay: investigated, not viable. Apple gates CarPlay to fixed app categories (audio / nav / comms / EV / …) with locked templates; a custom AI-assistant chat UI doesn't qualify and the entitlement isn't grantable for it. The in-car experience is delivered via "Hey Siri, Riff" + read-aloud instead, with no CarPlay entitlement needed.
Risks
| risk | mitigation |
|---|---|
| ~~On-device Speech accuracy poor for technical jargon~~ (resolved 2026-05-22) | Resolved by the cloud swap. Transcription moved off-device to ElevenLabs Scribe v2 (~2.2% WER) precisely to fix jargon accuracy ("Kalshi", "Hyperliquid", "git rebase"). The audio is uploaded to the Mac, which transcribes it. |
| Scribe model id (`scribe_v2`) drift | Module constant `ELEVENLABS_STT_MODEL` (one-line change), overridable via `~/.env`; fallback id `scribe_v1`. Confirmed live 2026-05-22: `scribe_v2` accepted, `text` field present. |
| Large-upload retry double-billing | A blind retry after a timeout could re-run Scribe + double-post to the poll session. RiffClient.postAudio retries only on connection-never-established errors (cannotConnectToHost/notConnectedToInternet), never on timedOut. |
| Audio leaves the device (privacy) | Deliberate, accepted trade for accuracy. /riff/audio is tailnet-only (no Funnel), so audio→Mac never hits the public internet; only Mac→ElevenLabs does (TLS, same as newsfeed TTS). See Privacy delta. |
| AAC encode / `AVAudioFile` format friction on device | Tap buffers (hardware format) are converted to the file's mono-16kHz processing format via `AVAudioConverter` in `AudioFileWriter` before writing. Documented fallback if it ever fights the format on a specific route: a parallel `AVAudioRecorder` (Option B). |
| Tailscale on iPhone disconnects (reset, OS update) | App detects connection failure, surfaces "Tailscale offline" in the recording UI; airplane-mode mid-record then Send fails with the normal offline error (no transcript fallback, by design). Mark re-enables Tailscale and retries. No iMessage fallback in v1 (would re-introduce the original friction). |
| APNs auth key compromise | Stored at ~/.ssh/apns_riff.p8 mode 600, not committed; rotate by generating a new key in Apple Developer console and updating ~/.env. |
| Wispr Flow doesn't yield the audio session | Document the Settings flag to disable Wispr Flow temporarily; surface an "audio session unavailable" error in the app if recording fails to start. |
| Personal-use signing expires every 7 days for free certificates | Mark has paid membership — use his team's certificate for 1-year expiry. Document re-signing cadence in install.sh. |
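The retry rule from the "Large-upload retry double-billing" row can be sketched as a small predicate. `shouldRetryUploadSketch` is a hypothetical helper, not the body of `RiffClient.postAudio`; it only encodes the stated policy that connection-never-established errors are retried and timeouts are not.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// True only when the request provably never reached the server.
func shouldRetryUploadSketch(after error: Error) -> Bool {
    guard let urlError = error as? URLError else { return false }
    switch urlError.code {
    case .cannotConnectToHost, .notConnectedToInternet:
        // No connection was established, so the server cannot have started work.
        return true
    default:
        // Includes .timedOut: the server may already be transcribing, and a
        // blind retry could re-run Scribe and double-post to the poll session.
        return false
    }
}
```

Defaulting to "no retry" means any error code added later stays on the safe side unless it is explicitly listed.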
Divergences from the original spec
The spec was written before the build; these are the deltas the build introduced and that future readers should know about.
- Pivot to an SSH terminal + voice-inject (2026-05-22): the biggest direction change since the chat build. The primary surface is no longer the iMessage-style chat thread; it's a SwiftTerm terminal attached over SSH to a persistent `tmux` session running `claude`, with the Action Button dictating transcribed text straight into the live REPL. The chat UI (`ChatView` / `ChatViewModel` / `MessageStore`) and the chat server endpoints are shelved (kept, unlinked); see Terminal architecture + Kept / shelved. New iOS code under `Riff/Terminal/`: `TerminalTransport` (the swap seam), `SSHTransport` (SwiftNIO SSH → PTY → tmux), `TerminalSurface` (SwiftUI wrap of SwiftTerm `TerminalView`), `TerminalController` (transport + auto-reconnect), `TerminalScreen` (the view), `VoiceInjectController` (record → transcribe-only → inject), `SSHKeyStore` (on-device ed25519 → Keychain). New server endpoint `POST /riff/transcribe-only`. New Mac-side `scripts/riff-tmux-up.sh` + `LaunchAgents/com.mark.riff-tmux.plist`.
- mosh deliberately NOT built (2026-05-22): an earlier plan staged mosh as the eventual transport (instant local echo + roaming). Dropped because mosh is GPLv3+ and Riff is being built so it could be sold; a GPL transport imposes distribution obligations and is App-Store-incompatible. SwiftTerm (MIT) + SwiftNIO SSH (Apache-2.0) keep the stack permissive. Network changes are handled by SSH auto-reconnect (re-attach to the persistent tmux session), not roaming. The `TerminalTransport` seam is where a `MoshTransport` could land if the licensing decision ever changes.
- SSH client = SwiftNIO SSH (not libssh2): pure-Swift, no C build, no OpenSSL. The interactive PTY is an `exec` of the tmux attach-or-create line under a requested PTY (not a plain login `shell` + injected keystrokes; the `exec`-under-PTY path is more robust and avoids shell-prompt timing races).
- Host key: TOFU pinning (hardened 2026-05-24): the original `AcceptAllHostKeysDelegate` (trust any host key, lean on the tailnet ACL) was replaced by `PinnedHostKeyDelegate` + `HostKeyStore`: trust-on-first-use, pin thereafter, hard-fail a changed key. See Host-key trust (TOFU pinning). This was the security gate for distributing Riff beyond Mark.
- State path: an earlier draft of this README placed sessions under `$HOME/riff/sessions/`. Implementation moved every Riff artifact (`sessions/`, `_index.jsonl`, `devices.json`) under `~/Library/Application Support/riff/` to match the global CLAUDE.md convention for per-project artifacts (newsfeed, trade, webpage-server all live there too).
- Simulator target: spec listed iPhone 16 Pro as the dev sim. The Mac mini only has iPhone 17 series simulators installed (17, 17e, 17 Pro, 17 Pro Max). Phase 2 verification used iPhone 17 Pro. The hardware target is Mark's iPhone 16 Pro Max (`RogersNet`, iPhone17,2; note that "17,2" is the Apple model identifier for the 16 Pro Max, not the iPhone 17 line).
- Wispr Flow yield: still unverified on real device as of this README. The audio-session yield assumption rests on Wispr Flow being a normal foreground-mic app that yields `.playAndRecord` to whichever app activates it most recently. The app surfaces "audio session unavailable" on failure; if that happens in practice, the user-action workaround is to force-quit Wispr Flow before pressing the Action Button. Will update this paragraph after first device smoke.
- Action Button: still requires the user to set "Open App: Riff" in iOS Settings; there is no programmatic surface for that. Documented in the install steps above.
- Push registration UI: the Settings tab in the iOS app shows the current notification authorization status, a hex-truncated push token preview, the timestamp of the most recent successful registration, and any registration error. This is for diagnostic visibility; `~/Library/Application Support/riff/devices.json` is the authoritative record on the server.
- HMAC keying (2026-05-15): the server originally keyed `hmac.new(secret.encode("utf-8"), …)`, which used the 64-char hex string as a 64-byte ASCII key. The iOS client builds a CryptoKit `SymmetricKey` from `Data.fromHex(hex)`, a 32-byte raw key, so every request from iOS 401'd. The server now keys with `bytes.fromhex(secret)` to match the iOS interpretation, and `main()` fails fast at boot if the secret isn't valid hex.
- Recording permission gate (2026-05-15): the recording flow now has a `.permissionRequired(missing:)` phase that surfaces a Settings deep-link when mic permission is denied, instead of silently entering `.recording` against a dead engine ("(listening…)" with a flat waveform). Mic permission uses the iOS 17 `AVAudioApplication.requestRecordPermission` API. The Recording tab now also shows a manual "Start Recording" / "Try Again" button when in `.idle` / `.error` so the app is recoverable without quitting.
- Capture rework: manual send + server-side Scribe v2 STT (2026-05-22): the biggest change since Phase 1. (1) Auto-send removed: the 1.2s silence/stability timer is gone; recording ends only on a manual Send, Cancel, or Action-Button toggle, so Mark can pause to think without being cut off. (2) Send moved to the middle-left (a circular thumb-reach button) with Cancel demoted to a slim bottom bar. (3) Action Button now opens `riff://toggle` (a custom URL scheme + `.onOpenURL` → `.riffToggle` notification → `vm.toggle()` / Record-tab select) instead of "Open App: Riff". (Superseded by the Phase C App Intent, 2026-05-22: the URL scheme is now a fallback; the Action Button binds to the Riff App Shortcut. See Action Button configuration + the Phase C roadmap entry.) (4) `SFSpeechRecognizer` ripped out entirely (`import Speech`, the speech-auth, and `NSSpeechRecognitionUsageDescription` all gone); the phone records audio to an AAC/m4a file (mono 16kHz ~32kbps via `AVAudioFile` + `AVAudioConverter`, encapsulated in an off-actor `AudioFileWriter` so the audio render thread writes without crossing `@MainActor` isolation) and uploads the raw bytes to the new `POST /riff/audio`. (5) `riff_server` transcribes server-side via ElevenLabs Scribe v2 (`transcribe_elevenlabs`, shared `_run_and_respond` tail with the text path) and feeds the transcript into the unchanged poll/Claude pipeline. (6) The live transcript UI is gone (waveform + "Recording…" stay; `.sending` reads "Transcribing…"). Client timeouts raised to 95s request / 150s resource to cover the two-stage Scribe+Claude budget plus the upload. Batch only; streaming (`scribe_v2_realtime`) is deferred.
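The HMAC keying mismatch in the list above comes down to two different readings of the same 64-character secret. The sketch below shows only the key-material step, without CryptoKit so it stays self-contained; `bytesFromHexSketch` is a hypothetical stand-in for the app's `Data.fromHex`, whose implementation is not shown in this README.

```swift
/// Decodes a hex string into raw bytes; nil if the input is not valid hex.
func bytesFromHexSketch(_ hex: String) -> [UInt8]? {
    let chars = Array(hex)
    // Require an even number of hex digits and nothing else (no signs, no whitespace).
    guard chars.count % 2 == 0, chars.allSatisfy({ $0.isHexDigit }) else { return nil }

    var bytes: [UInt8] = []
    bytes.reserveCapacity(chars.count / 2)
    for i in stride(from: 0, to: chars.count, by: 2) {
        // Each pair of hex digits becomes one byte.
        guard let byte = UInt8(String(chars[i...i + 1]), radix: 16) else { return nil }
        bytes.append(byte)
    }
    return bytes
}

/// The two interpretations of one secret: ASCII text versus decoded hex.
func keyLengthsSketch(secretHex: String) -> (asciiBytes: Int, rawBytes: Int?) {
    (Array(secretHex.utf8).count, bytesFromHexSketch(secretHex)?.count)
}
```

For a 64-character hex secret the ASCII reading yields a 64-byte key and the decoded reading a 32-byte key, so signatures computed from the two never match. Returning nil on invalid hex mirrors the server's fail-fast check at boot.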
See also
- `~/agents/imessage-dispatcher/`: the iMessage path this app is designed to bypass. The voice → claude-CLI plumbing on the Mac side is conceptually similar.
- `~/agents/scripts/bb-send.sh`: iMessage relay; used today for replies in the iMessage flow. Riff replies use APNs instead.
- `~/agents/webpage-server/`: HTTP server pattern; `riff_server.py` follows the same launchd-managed shape.
- Apple docs: Speech framework on-device transcription, APNs HTTP/2 token auth, Background Modes (audio).