A GitHub Copilot CLI plugin adding image drag-and-drop, screenshot paste (Phase 1), and audio recording with transcription (Phase 2)

dotnetspark/copilot-multimodal

copilot-multimodal

A GitHub Copilot CLI plugin that adds image and audio input to your Copilot conversations.

  • Phase 1 (Image): Drag image files into the CLI or paste screenshots from your clipboard
  • Phase 2 (Audio): Record audio with /record and have it transcribed and answered

Requirements

  • GitHub Copilot CLI installed
  • Node.js 20+
  • For audio (Phase 2): sox on macOS/Windows, arecord on Linux
  • For transcription (Phase 2): OpenAI API key
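
The Node.js requirement above can be checked before building. The following is a minimal sketch; `node_version_ok` is a hypothetical helper name and is not part of the plugin.

```bash
# Sketch: succeed if a Node.js version string such as "v20.11.1" is 20 or newer.
# node_version_ok is a hypothetical helper, not something the plugin ships.
node_version_ok() {
  major=${1#v}          # drop the leading "v"
  major=${major%%.*}    # keep only the major version
  case "$major" in
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as unsupported
  esac
  [ "$major" -ge 20 ]
}

if command -v node >/dev/null 2>&1 && node_version_ok "$(node --version)"; then
  echo "Node.js 20+ found"
else
  echo "Node.js 20+ not found"
fi
```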

Installation

1. Build the MCP server

cd mcp-server
npm install
npm run build
cd ..

2. Install the plugin

copilot plugin install ./plugin

3. Verify installation

copilot plugin list

You should see copilot-multimodal in the list.
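
If you want to script this check, a sketch along these lines may work. It assumes the output of `copilot plugin list` contains the literal name `copilot-multimodal`; the exact output format is not documented here, and `plugin_listed` is a hypothetical helper.

```bash
# Sketch: read a plugin listing on stdin and succeed if it mentions this plugin.
# Assumption: the listing contains the literal text "copilot-multimodal".
plugin_listed() {
  grep -qF 'copilot-multimodal'
}

if command -v copilot >/dev/null 2>&1 && copilot plugin list | plugin_listed; then
  echo "copilot-multimodal is installed"
else
  echo "copilot-multimodal not found in plugin list"
fi
```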

4. (Phase 2 only) Set up transcription

export OPENAI_API_KEY=your-openai-api-key
# Add to ~/.bashrc or ~/.zshrc to make permanent

Usage

Image Input — Drag & Drop

  1. Start a Copilot CLI session
  2. Drag an image file into the terminal window — the file path appears in the input
  3. Type your question after the path and press Enter

/Users/me/screenshots/error.png What is causing this error?

The multimodal agent automatically detects the image path and includes it in the analysis.
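
To give a feel for what path detection involves, here is a rough sketch. It is not the agent's actual logic: `find_image_path` is a hypothetical helper, the extension list is an assumption, and paths containing spaces are not handled.

```bash
# Sketch: print the first whitespace-separated word that looks like an image path.
# The function body runs in a subshell so that `set -f` does not leak to the caller.
find_image_path() (
  set -f   # disable globbing so "?" or "*" in the question is not expanded
  for word in $1; do
    case "$word" in
      *.png|*.jpg|*.jpeg|*.gif|*.webp)   # assumed list of extensions
        printf '%s\n' "$word"
        return 0
        ;;
    esac
  done
  return 1   # no image path found
)

find_image_path "/Users/me/screenshots/error.png What is causing this error?"
# prints: /Users/me/screenshots/error.png
```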

Image Input — Screenshot Paste (Ctrl+V)

Use the clipboard watcher to make Ctrl+V work just like drag-and-drop:

1. Start the clipboard watcher (once per session)

Open a separate PowerShell window and run:

.\clipboard-watcher\start.ps1

It prints a ready message and stays running in the background.

2. Take a screenshot and paste it

  1. Press Win+Shift+S and select a region — screenshot goes to clipboard
  2. Switch back to Copilot CLI
  3. Press Ctrl+V — pastes [📷 copilot-image-xxxx.png] into the input
  4. Type your question and press Enter

[📷 copilot-image-cf6510.png] what does this UI error mean?

The agent reads the image token exactly like drag-and-drop.
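
As an illustration of the token format, the sketch below pulls the file name out of a pasted line. It assumes tokens always contain `copilot-image-<id>.png`, as in the example above; `image_token_file` is a hypothetical helper, and where the watcher stores the file is not covered here.

```bash
# Sketch: extract "copilot-image-<id>.png" from an input line containing an image token.
image_token_file() {
  case "$1" in
    *copilot-image-*.png*) ;;   # token present, continue
    *) return 1 ;;              # no token in this line
  esac
  rest=${1#*copilot-image-}     # text after the first "copilot-image-"
  id=${rest%%.png*}             # id between the prefix and ".png"
  printf 'copilot-image-%s.png\n' "$id"
}

image_token_file "[📷 copilot-image-cf6510.png] what does this UI error mean?"
# prints: copilot-image-cf6510.png
```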

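macOS / Linux: Press Cmd+Ctrl+Shift+4 (macOS) or use your distro's screenshot tool to copy a region to the clipboard, then press Ctrl+V. The watcher script works on any platform that supports PowerShell 5.1+; on macOS and Linux that means PowerShell 7, since Windows PowerShell 5.1 is Windows-only.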

Without the watcher

If you don't run the watcher, you can still ask about a screenshot on your clipboard by referring to it in your message:

Here's the screenshot I just took — what does this error mean?

The agent will call read_clipboard_image when it detects visual-intent language.

Audio Input — Recording (Phase 2)

Start a recording session:

/record

Copilot responds: 🎙️ Recording... type /stop when done

Speak your question, then type:

/stop

Copilot transcribes the audio and answers your spoken question.

To cancel a recording:

/cancel

Power Users: Create shell aliases for faster invocations:

# .bashrc / .zshrc
alias rec='/record'
alias srec='/stop'
alias crec='/cancel'

Note: Due to GitHub Copilot CLI Plugin API limitations, audio recording is triggered via slash commands rather than a keyboard shortcut. See ARCHITECTURE.md for details.

Choosing an Agent

The plugin provides one agent:

Agent        Description
multimodal   Full image + audio support: handles drag-and-drop, clipboard paste, and voice recording

Switch agents in your Copilot CLI session:

/agent multimodal

Troubleshooting

"No image found in clipboard"

Make sure you've copied an image (not just text). Use your OS screenshot tool to capture to clipboard, not to a file.

"Image too large — exceeds 5MB"

Resize or crop the image before pasting. Most screenshots are well under 5MB; this typically happens with RAW photos or very large exports.
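
You can check a file against the limit before pasting. This sketch assumes "5MB" means 5 × 1024 × 1024 bytes, which may not match the plugin's exact threshold; `image_within_limit` is a hypothetical helper.

```bash
MAX_BYTES=$((5 * 1024 * 1024))   # assumption: "5MB" is 5 MiB

# Sketch: succeed if the file is at or under MAX_BYTES; return 2 if it cannot be read.
image_within_limit() {
  [ -r "$1" ] || return 2
  size=$(wc -c < "$1" | tr -d ' ')   # strip the padding some wc implementations add
  [ "$size" -le "$MAX_BYTES" ]
}

# Usage (hypothetical path):
#   image_within_limit ./screenshot.png || echo "Resize or crop before pasting"
```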

"sox not found" / "arecord not found"

Install the audio recording tool for your platform:

  • macOS: brew install sox
  • Windows: choco install sox (or download from https://sox.sourceforge.net)
  • Linux: sudo apt install sox or use built-in arecord
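
To see which recorder is available on your PATH, a quick check like the following may help. `pick_recorder` is a hypothetical helper; how the plugin itself chooses a tool may differ.

```bash
# Sketch: print the first available recording tool, trying sox and then arecord.
pick_recorder() {
  for tool in sox arecord; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      return 0
    fi
  done
  return 1   # neither tool is installed
}

pick_recorder || echo "No recorder found: install sox or arecord"
```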

"Transcription requires OPENAI_API_KEY"

Set your OpenAI API key:

export OPENAI_API_KEY=sk-...
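
To confirm the variable is visible in your current shell without printing the key itself, you could use a check like this sketch (`have_openai_key` is a hypothetical helper):

```bash
# Sketch: succeed if OPENAI_API_KEY is set to a non-empty value.
have_openai_key() {
  [ -n "${OPENAI_API_KEY:-}" ]
}

if have_openai_key; then
  echo "OPENAI_API_KEY is set"
else
  echo "OPENAI_API_KEY is missing"
fi
```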

Plugin not loading after changes

Re-install to pick up changes:

npm run build -C mcp-server
copilot plugin install ./plugin

Development

Running tests

cd mcp-server
npm test

Tests run on Windows, macOS, and Linux via GitHub Actions CI.

Project structure

See ARCHITECTURE.md for full architecture documentation and data flow diagrams.
