Clicky

One-line summary: Open-source macOS menu-bar AI assistant that captures screenshots, responds by voice, and animates a cursor overlay that points at UI elements — a reference implementation of the screen-aware "point and ask" pattern.

What it is

A Swift/macOS menu-bar app (farzaa/clicky, MIT license, ~4,400 stars as of ingest). The user holds a global hotkey (Control + Option) and speaks a question. Clicky captures the audio plus a screenshot, transcribes the audio with AssemblyAI, sends the transcript and screenshot to Claude for reasoning, plays the response via ElevenLabs TTS, and can animate a floating cursor overlay that points at specific UI elements while explaining them.

The self-description: "an AI teacher that lives as a buddy next to your cursor."
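
The capture-to-speech turn described above can be sketched as a small function with the three external services passed in as callables. This is an illustrative sketch only: Clicky itself is written in Swift, and the names here (run_turn, Turn, transcribe, reason, speak) are hypothetical stand-ins, not Clicky's actual API. The overlay step is left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    transcript: str  # what the user said, as text
    reply: str       # what the model answered


def run_turn(
    audio: bytes,
    screenshot: bytes,
    transcribe: Callable[[bytes], str],     # stand-in for the STT service
    reason: Callable[[str, bytes], str],    # stand-in for the vision-capable LLM
    speak: Callable[[str], None],           # stand-in for the TTS service
) -> Turn:
    """Run one push-to-talk turn: speech to text, reasoning, then voice out."""
    transcript = transcribe(audio)
    reply = reason(transcript, screenshot)
    speak(reply)
    return Turn(transcript, reply)


# Usage with fake services, so the flow can be exercised without any API keys.
spoken: list[str] = []
turn = run_turn(
    audio=b"fake-audio",
    screenshot=b"fake-png",
    transcribe=lambda audio: "where is the wifi setting?",
    reason=lambda question, image: f"You asked: {question}",
    speak=spoken.append,
)
```

Passing the services in as callables keeps the flow testable offline and makes the portability point concrete: any STT, vision LLM, or TTS provider that fits these shapes could be swapped in.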

Why it matters to patia

Clicky is not a competitor — it is a reference implementation of a pattern patia could eventually adopt for a v2+ feature. The failure mode it addresses is one SMS and voice chat cannot easily fix: the user cannot describe what they are looking at. A senior who cannot find a setting, identify which button to press, or articulate what a dialog box is saying is stuck inside a conversational-only helper. A screen-aware assistant closes that gap.

Whether this pattern meaningfully helps seniors specifically is an open question tracked at screen-aware-ai-for-seniors.

Key facts

  • License: MIT
  • Platform: macOS 14.2+ only; no Windows, Linux, or web version
  • Frontend: Swift (two transparent NSPanel windows — one control, one fullscreen cursor overlay)
  • Screen capture: ScreenCaptureKit; requires Screen Recording and Accessibility permissions
  • Speech-to-text: AssemblyAI (streaming)
  • Reasoning / vision: Claude (Anthropic); the model can emit directional tags like [POINT:x,y:label:screenN] that the client animates
  • Text-to-speech: ElevenLabs
  • Backend: Cloudflare Workers (free tier) as an API proxy
  • Interaction: push-to-talk hotkey; voice in, voice out, with visual pointer overlay
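
A minimal sketch of how a client might separate the [POINT:x,y:label:screenN] tags from the text to be spoken. The tag shape comes from the entry above, but the exact grammar is an assumption: integer x and y, a label containing no colon or closing bracket, and a literal "screen" followed by an integer index. The names split_reply and PointTarget are hypothetical and are not taken from Clicky's source.

```python
import re
from dataclasses import dataclass

# Assumed grammar: [POINT:<x>,<y>:<label>:screen<N>] with integer x, y, and N.
POINT_TAG = re.compile(r"\[POINT:(\d+),(\d+):([^:\]]+):screen(\d+)\]")


@dataclass(frozen=True)
class PointTarget:
    x: int
    y: int
    label: str
    screen: int


def split_reply(reply: str) -> tuple[str, list[PointTarget]]:
    """Return (text to speak, pointer targets) for one model reply."""
    targets = [
        PointTarget(int(x), int(y), label.strip(), int(screen))
        for x, y, label, screen in POINT_TAG.findall(reply)
    ]
    spoken = POINT_TAG.sub("", reply)
    spoken = " ".join(spoken.split())  # collapse whitespace left by removed tags
    return spoken, targets


reply = "Open Settings [POINT:412,88:Settings gear:screen0] then choose Wi-Fi."
spoken, targets = split_reply(reply)
```

Stripping the tags before text-to-speech keeps the voice output clean, while the returned targets can drive whatever overlay or highlight the client supports.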

Strengths (from patia's perspective)

  • Demonstrates the pattern end-to-end — we can point at a working implementation, not a concept slide
  • Stack is portable — screenshot + STT + vision-capable LLM + TTS is reimplementable on web/PWA or native mobile
  • Open source — the [POINT:x,y:label:screenN] protocol for directing overlay gestures is a reusable idea
  • Integrates with tools patia already uses (Claude) or plans to evaluate (ElevenLabs)

Weaknesses (from patia's perspective)

  • macOS-only — most patia pilot seniors will be on iOS, Android, or Windows; the reference implementation does not run on the target platforms
  • Four API keys to wire (Anthropic, AssemblyAI, ElevenLabs, Cloudflare) — even with a proxy, this is infeasible for a non-technical user
  • Permission-gated by screen recording and accessibility access — dialogs that fraud-aware seniors are already primed to refuse (see senior-fraud-susceptibility)
  • A menu-bar app that stays resident with standing screen-recording permission is the wrong privacy posture for a shame-aware product, even though capture itself is push-to-talk; a senior-facing version would likely need per-session explicit screen-share instead

Open questions

  • Does the [POINT:x,y] overlay reduce or amplify cognitive load for users with selective-attention deficits? (See risks summarized at screen-aware-ai-for-seniors.)
  • Is the right product form for patia seniors a Clicky-style always-on companion, or a bounded "help me now" mode the user explicitly opts into?
  • Could the same protocol work over a web session (browser extension plus the standard getDisplayMedia screen-capture API) rather than requiring a native app?
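
One way to picture the bounded "help me now" mode raised above is a session object that permits screen capture only between an explicit opt-in and a timeout. This is a sketch under stated assumptions, not a design decision: the HelpSession name, the capture_allowed method, and the 300-second default are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HelpSession:
    max_seconds: float = 300.0          # hypothetical default session length
    started_at: Optional[float] = None  # None means no active session

    def start(self, now: float) -> None:
        """Record the user's explicit opt-in."""
        self.started_at = now

    def end(self) -> None:
        """End the session; capture requires a fresh opt-in afterwards."""
        self.started_at = None

    def capture_allowed(self, now: float) -> bool:
        """Allow capture only inside an active, unexpired session."""
        if self.started_at is None:
            return False
        if now - self.started_at >= self.max_seconds:
            self.end()  # session lapsed; require a fresh opt-in
            return False
        return True
```

The clock value is passed in rather than read inside the class so the behavior can be tested deterministically; a real client would pass something like time.monotonic().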
