Clicky
One-line summary: Open-source macOS menu-bar AI assistant that captures screenshots, responds by voice, and animates a cursor overlay that points at UI elements — a reference implementation of the screen-aware "point and ask" pattern.
What it is
A Swift/macOS menu-bar app (farzaa/clicky, MIT license, ~4,400 stars as of ingest). The user holds a global hotkey (Control + Option) and speaks a question. Clicky records the audio and takes a screenshot, transcribes the speech with AssemblyAI, and sends the question plus screenshot to Claude for reasoning. It then plays the response through ElevenLabs TTS and can animate a floating cursor overlay that points at specific UI elements while explaining them.
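One push-to-talk turn from that description can be sketched in TypeScript, the kind of language a web or PWA port might use (Clicky itself is Swift). This is a minimal sketch, not Clicky's code. SpeechToText, VisionModel, TextToSpeech, and runTurn are hypothetical names standing in for AssemblyAI, Claude, and ElevenLabs. The transcribe-then-ask ordering is an assumption based on the speech-to-text entry under Key facts, and the pointer overlay is left out.

```typescript
// Hypothetical service interfaces; none of these names come from Clicky
// or from any vendor SDK.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface VisionModel {
  // Returns the model's reply to a question about a screenshot.
  ask(question: string, screenshotJpeg: Uint8Array): Promise<string>;
}
interface TextToSpeech {
  speak(text: string): Promise<void>;
}

// Data captured while the hotkey is held down.
interface TurnInput {
  audio: Uint8Array;
  screenshotJpeg: Uint8Array;
}

// One turn: voice in, reasoning over the screenshot, voice out.
async function runTurn(
  input: TurnInput,
  stt: SpeechToText,
  model: VisionModel,
  tts: TextToSpeech,
): Promise<string> {
  const question = await stt.transcribe(input.audio);
  const reply = await model.ask(question, input.screenshotJpeg);
  await tts.speak(reply);
  return reply;
}
```

Passing the services in as interfaces keeps the turn logic testable with stubs. It also lets a port swap vendors without touching the orchestration.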
The self-description: "an AI teacher that lives as a buddy next to your cursor."
Why it matters to patia
Clicky is not a competitor — it is a reference implementation of a pattern patia could eventually adopt for a v2+ feature. The failure mode it addresses is one SMS and voice chat cannot easily fix: the user cannot describe what they are looking at. A senior who cannot find a setting, identify which button to press, or articulate what a dialog box is saying is stuck inside a conversational-only helper. A screen-aware assistant closes that gap.
Whether this pattern meaningfully helps seniors specifically is an open question tracked at screen-aware-ai-for-seniors.
Key facts
- License: MIT
- Platform: macOS 14.2+ only; no Windows, Linux, or web version
- Frontend: Swift (two transparent NSPanel windows: one control, one fullscreen cursor overlay)
- Screen capture: ScreenCaptureKit; requires Screen Recording and Accessibility permissions
- Speech-to-text: AssemblyAI (streaming)
- Reasoning / vision: Claude (Anthropic); the model can emit directional tags like [POINT:x,y:label:screenN] that the client animates
- Text-to-speech: elevenlabs
- Backend: Cloudflare Workers (free tier) as an API proxy
- Interaction: push-to-talk hotkey; voice in, voice out, with visual pointer overlay
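A client needs to separate these tags from the text it speaks aloud. The TypeScript sketch below parses them into structured points and strips them from the reply before TTS. It is not Clicky's actual parser, and PointTag and parsePointTags are hypothetical names. The grammar is an assumption inferred from the two tag forms in this note ([POINT:x,y:label:screenN] and the shorter [POINT:x,y]): numeric coordinates, a label containing no colon or closing bracket, and a screen index. Other variants are ignored.

```typescript
// Assumed shape of one pointing instruction emitted by the model.
interface PointTag {
  x: number;
  y: number;
  label: string | null;  // null for the short [POINT:x,y] form
  screen: number | null; // N from "screenN", null for the short form
}

// Extracts pointing tags and returns the reply with the tags removed.
function parsePointTags(reply: string): { points: PointTag[]; spokenText: string } {
  // Matches [POINT:x,y] and [POINT:x,y:label:screenN] (assumed grammar).
  const re = /\[POINT:(\d+(?:\.\d+)?),(\d+(?:\.\d+)?)(?::([^\]:]*):screen(\d+))?\]/g;
  const points: PointTag[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(reply)) !== null) {
    points.push({
      x: Number(m[1]),
      y: Number(m[2]),
      label: m[3] !== undefined ? m[3].trim() : null,
      screen: m[4] !== undefined ? Number(m[4]) : null,
    });
  }
  // Drop the tags so TTS does not read them, then collapse leftover spaces.
  const spokenText = reply.replace(re, "").replace(/\s{2,}/g, " ").trim();
  return { points, spokenText };
}
```

For example, "Click the blue Save button [POINT:640,412:Save:screen1] to finish." yields one point (640, 412, label "Save", screen 1). The spoken text becomes "Click the blue Save button to finish."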
Strengths (from patia's perspective)
- Demonstrates the pattern end-to-end — we can point at a working implementation, not a concept slide
- Stack is portable — screenshot + STT + vision-capable LLM + TTS is reimplementable on web/PWA or native mobile
- Open source: the [POINT:x,y:label:screenN] protocol for directing overlay gestures is a reusable idea
- Integrates with tools patia already uses (Claude) or plans to evaluate (elevenlabs)
Weaknesses (from patia's perspective)
- macOS-only — most patia pilot seniors will be on iOS, Android, or Windows; the reference implementation does not run on the target platforms
- Four API keys to wire (Anthropic, AssemblyAI, ElevenLabs, Cloudflare) — even with a proxy, this is infeasible for a non-technical user
- Permission-gated by screen recording and accessibility access — dialogs that fraud-aware seniors are already primed to refuse (see senior-fraud-susceptibility)
- Always-on menu-bar monitoring is the wrong privacy posture for a shame-aware product; a senior-facing version would likely need per-session explicit screen-share instead
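The per-session alternative can be made concrete as a small consent gate. This is a minimal sketch of one possible design, in which capture is allowed only after an explicit user action and only for a bounded window. HelpSession, its methods, and the injected clock are hypothetical. A real client would also stop the underlying screen-share stream when the session ends.

```typescript
// Hypothetical consent gate for a bounded "help me now" session. Screen
// capture is allowed only after an explicit start, and only until the user
// ends the session or the time window runs out.
class HelpSession {
  private readonly maxDurationMs: number;
  private readonly now: () => number;
  private expiresAtMs: number | null = null;

  // The clock is injected so expiry can be tested without waiting.
  constructor(maxDurationMs: number, now: () => number = () => Date.now()) {
    this.maxDurationMs = maxDurationMs;
    this.now = now;
  }

  // Connect this only to an explicit user action, never a background trigger.
  start(): void {
    this.expiresAtMs = this.now() + this.maxDurationMs;
  }

  // Ends the session immediately.
  end(): void {
    this.expiresAtMs = null;
  }

  // Every screenshot request should check this first.
  canCapture(): boolean {
    if (this.expiresAtMs === null) return false;
    if (this.now() >= this.expiresAtMs) {
      this.expiresAtMs = null; // expired: a fresh explicit start is required
      return false;
    }
    return true;
  }
}
```

Defaulting to "no capture" and expiring automatically means a forgotten session fails closed rather than leaving screen access open.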
Open questions
- Does the [POINT:x,y] overlay reduce or amplify cognitive load for users with selective-attention deficits? (See risks summarized at screen-aware-ai-for-seniors.)
- Is the right product form for patia seniors a Clicky-style always-on companion, or a bounded "help me now" mode the user explicitly opts into?
- Could the same protocol work over a web session (a browser extension plus the getDisplayMedia screen-capture API, which Chrome and other current browsers implement) rather than requiring a native app?
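The capture half of that question can be sketched with the standard browser Screen Capture API. This is a browser-only sketch, and captureScreenFrame is a hypothetical helper name. It grabs one JPEG frame after the user picks a screen, window, or tab in the browser's own picker. Browsers generally require the call to come from a user gesture and show their own sharing indicator. Drawing a pointer over content outside the page would still need something more (an extension or native helper), and this sketch does not attempt it.

```typescript
// Capture one still frame from a user-approved screen share, then end the share.
async function captureScreenFrame(quality = 0.8): Promise<Blob> {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: false });
  try {
    // Play the stream in an offscreen video element so a frame can be drawn.
    const video = document.createElement("video");
    video.srcObject = stream;
    video.muted = true;
    await video.play();

    const canvas = document.createElement("canvas");
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    const ctx = canvas.getContext("2d");
    if (ctx === null) throw new Error("2D canvas context unavailable");
    ctx.drawImage(video, 0, 0);

    // Encode the frame as JPEG for upload to a vision-capable model.
    return await new Promise<Blob>((resolve, reject) =>
      canvas.toBlob(
        (blob) => (blob ? resolve(blob) : reject(new Error("JPEG encoding failed"))),
        "image/jpeg",
        quality,
      ),
    );
  } finally {
    // Stop every track so screen sharing ends as soon as the frame is taken.
    stream.getTracks().forEach((track) => track.stop());
  }
}
```

Stopping the tracks right after one frame fits the per-session posture described under Weaknesses: the browser's sharing indicator disappears as soon as the screenshot is taken.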
Related
- elevenlabs — Clicky's TTS dependency; also a Phase 2 voice candidate for patia
- screen-aware-ai-for-seniors — the open question this pattern raises
- senior-tech-competitive-landscape — screen-aware AI is an adjacent emerging pattern, not a current category
- senior-fraud-susceptibility — permission-dialog friction and honeypot risk