Note: the Vietnamese version of this article is available at the link below.

https://duongnt.com/voice-to-code-vie

voice-to-code

Virtually all AI Agents today support only text input. But when I’m deep in a coding session with hands on keyboard and eyes on code, switching context to type out a long explanation breaks my flow. Voice input lets me talk through ideas naturally: brainstorming architecture, explaining a bug I’m seeing, or planning my next steps without losing momentum. That’s why my colleague Genki Sugimoto and I developed voice-to-code, which lets you control Amp, Claude Code, or pretty much any AI Agent using only your voice.

You can download the app from the link below.

https://github.com/duongntbk/voice-to-code

Quick demo

Here is a 1-minute demo of this app.

Key features

  • Uses whisper-mic to perform speech-to-text conversion. The model runs locally, doesn’t require a network connection (except when downloading it the first time), and is 100% free.
  • Integrates with AI Agents via tmux. This means the tool supports any AI Agent that has a CLI mode, including but not limited to Amp and Claude Code.
  • Runs on both macOS and Linux. Windows support is a work in progress, and contributions are welcome.
  • Supports controlling multiple agent sessions at once, allowing hot-swapping between sessions.

General flow

At a high level, the app captures your voice from the microphone, transcribes it locally with whisper-mic, and sends the resulting text to the tmux session of your choice, where your AI Agent receives it as if you had typed it.

Why whisper-mic and tmux?

whisper-mic: We wanted speech recognition that runs locally, works offline, and costs nothing. whisper-mic wraps OpenAI’s open-source Whisper model, giving us high-quality transcription without sending audio to external servers. Your voice data stays on your machine.
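
If you want a feel for the transcription quality before wiring everything up, you can exercise the Whisper model yourself. The sketch below uses the CLI that ships with the open-source openai-whisper package (the same model whisper-mic wraps); the file name is just an illustration, and base is one of several model sizes you can choose.

# Transcribe a local recording with the base Whisper model.
# The model is downloaded on first use, then everything runs offline.
whisper recording.wav --model base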

tmux: Rather than building custom integrations for each AI Agent, voice-to-code sends transcribed text to a tmux session—the same way you would type. This means the tool works with any CLI-based AI Agent, present or future, without requiring updates to voice-to-code itself.
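
You can reproduce this mechanism from any shell: tmux’s send-keys subcommand types text into a named session, which is most likely what delivers the transcription to your agent under the hood. The session name below is the app’s default, and the prompt is just an example.

# Type a prompt into the agent running in session ai-voice-input,
# then press Enter to submit it
tmux send-keys -t ai-voice-input "Explain this stack trace" Enter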

How to use

Prerequisites

  • macOS and Homebrew, or Linux
  • tmux, portaudio, and ffmpeg (see the install command after this list)
  • Python 3.x
  • AI Agent with CLI installed and configured
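
On macOS, for instance, the tmux, portaudio, and ffmpeg dependencies are one Homebrew command away; on Linux, use your distribution’s package manager instead.

brew install tmux portaudio ffmpeg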

Start the voice-to-code app

You can download the pre-built version for your system below.

Alternatively, you can clone the repo and start the app from source.

git clone git@github.com:duongntbk/voice-to-code.git
cd voice-to-code

# Create a virtual environment and install the dependencies
python -m venv venv
./venv/bin/pip install -r requirements.txt

# Activate the environment and start the app
source venv/bin/activate && python main.py

Note: the pre-built file for Linux defaults to the CPU build of PyTorch. If you want the GPU version, you have to start the app from source.
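
When starting from source, using the GPU means installing a CUDA-enabled PyTorch wheel into the same virtual environment. The exact index URL depends on your CUDA version, so check pytorch.org for the right one; the command generally looks like this.

# Example for CUDA 12.1 wheels; adjust the URL to match your CUDA version
./venv/bin/pip install torch --index-url https://download.pytorch.org/whl/cu121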

The app should look like this.

App

And here is the settings page. Please check out the README for details on each setting.

Settings page

Control multiple AI Agents with your voice

Start one tmux session for each AI Agent you want to control, and give each one a name. Inside each session, start your AI Agent as normal.

tmux new-session -s <your session name>
amp

# Or to resume a thread
amp threads continue <thread ID>

The default session name is ai-voice-input (case sensitive), but you can add or remove as many session names as you like.

Add new session name
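
For example, to drive a second agent in parallel, start another uniquely named session; this matches the setup used in the examples below. Here claude stands in for whichever agent CLI you prefer.

# A second agent in its own named tmux session
tmux new-session -s ai-voice-input-2
claude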

And you can use this drop-down to switch between sessions on the fly, while the app is running. Whichever session is selected when you speak will receive the transcribed text.

Switch between sessions

Send voice input to AI Agents

Below are some examples. I told the AI Agent to update this app’s help page to say “Report all issues” instead of “Report issues”. The speech-to-text output was not 100% correct, but the AI Agent still got the gist and started working. We can also see that the transcribed text was sent to the tmux session ai-voice-input.

First example

First example result

Then I switched to session ai-voice-input-2 and told another AI Agent to proofread my README file. We can see that the transcribed text was sent to ai-voice-input-2, and that agent started working as well.

Second example

Second example result

The second agent reported back that there were a few points in the README it could improve, and asked me for confirmation. I replied, still in the same session, allowing it to make the changes.

Third example

Third example result

Conclusion

I had a lot of fun developing this app, and I use it almost every day—especially when I want to have a back-and-forth conversation with AI Agents to flesh out my ideas without breaking my coding flow.

Current limitations:

  • Speech recognition accuracy varies with accents and technical jargon
  • Windows support is still in progress

If you find this useful or want to contribute (especially Windows support!), feel free to open an issue or a PR on the GitHub repo. I hope it comes in handy for you too.
