Gemini Speech-to-Text Skill

Transcribe audio files using Google's Gemini API or Vertex AI. Default model is gemini-2.0-flash-lite for fastest transcription.

Authentication (choose one)

Option 1: Vertex AI with Application Default Credentials (Recommended)

CODEBLOCK0

The script will automatically detect and use ADC when available.

Option 2: Direct Gemini API Key

Set GEMINI_API_KEY in environment (e.g., ~/.env or ~/.clawdbot/.env)

Requirements

- Python 3.10+ (no external dependencies)
Either GEMINIAPIKEY or gcloud CLI with ADC configured

Supported Formats

- .ogg / .opus (Telegram voice messages)
INLINECODE6
INLINECODE7
INLINECODE8

Usage

CODEBLOCK1

Options

Option	Description
INLINECODE9	Path to the audio file (required)
INLINECODE10, INLINECODE11

Supported Models

Any Gemini model that supports audio input can be used. Recommended models:

Model	Notes
INLINECODE20	Default. Fastest transcription speed.
INLINECODE21

See Gemini API Models for the latest list.

How It Works

1. Reads the audio file and base64 encodes it
Auto-detects authentication:

- If ADC is available (gcloud), uses Vertex AI endpoint - Otherwise, uses GEMINIAPIKEY with direct Gemini API

3. Sends to the selected Gemini model with transcription prompt
Returns the transcribed text

Example Integration

For Clawdbot voice message handling:

CODEBLOCK2

Error Handling

The script exits with code 1 and prints to stderr on:

- No authentication available (neither ADC nor GEMINIAPIKEY)
File not found
API errors
Missing GCP project (when using Vertex)

Notes

- Uses Gemini 2.0 Flash Lite by default for fastest transcription
No external Python dependencies (uses stdlib only)
Automatically detects MIME type from file extension
Prefers Vertex AI with ADC when available (no API key management needed)

Gemini 语音转文本技能

使用 Google 的 Gemini API 或 Vertex AI 转录音频文件。默认模型为 gemini-2.0-flash-lite，提供最快转录速度。

身份验证（二选一）

选项 1：使用应用默认凭据的 Vertex AI（推荐）

bash
gcloud auth application-default login
gcloud config set project YOURPROJECTID

脚本在可用时会自动检测并使用 ADC。

选项 2：直接使用 Gemini API 密钥

在环境变量中设置 GEMINIAPIKEY（例如 ~/.env 或 ~/.clawdbot/.env）

系统要求

- Python 3.10+（无外部依赖）
配置了 GEMINIAPIKEY 或已配置 ADC 的 gcloud CLI

支持的格式

- .ogg / .opus（Telegram 语音消息）
.mp3
.wav
.m4a

使用方法

bash

自动检测身份验证（优先尝试 ADC，然后尝试 GEMINIAPIKEY）

python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg

强制使用 Vertex AI

python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex

使用特定模型

python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --model gemini-2.5-pro

使用特定项目和区域的 Vertex AI

python ~/.claude/skills/gemini-stt/transcribe.py /path/to/audio.ogg --vertex --project my-project --region us-central1

使用 Clawdbot 媒体文件

python ~/.claude/skills/gemini-stt/transcribe.py ~/.clawdbot/media/inbound/voice-message.ogg

选项

选项	描述
<audio_file>	音频文件路径（必填）
--model, -m

支持的模型

任何支持音频输入的 Gemini 模型均可使用。推荐模型：

模型	说明
gemini-2.0-flash-lite	默认。最快转录速度。
gemini-2.0-flash

查看 Gemini API 模型获取最新列表。

工作原理

1. 读取音频文件并进行 base64 编码
自动检测身份验证：

- 如果 ADC 可用（gcloud），则使用 Vertex AI 端点 - 否则，使用 GEMINIAPIKEY 直接调用 Gemini API

3. 将转录提示发送到选定的 Gemini 模型
返回转录文本

集成示例

用于 Clawdbot 语音消息处理：

bash

转录传入的语音消息

TRANSCRIPT=$(python ~/.claude/skills/gemini-stt/transcribe.py $AUDIO_PATH)
echo 用户说：$TRANSCRIPT

错误处理

脚本在以下情况下退出并返回代码 1，并将错误信息输出到 stderr：

- 没有可用的身份验证（既没有 ADC 也没有 GEMINIAPIKEY）
文件未找到
API 错误
缺少 GCP 项目（使用 Vertex 时）

说明

- 默认使用 Gemini 2.0 Flash Lite 实现最快转录速度
无外部 Python 依赖（仅使用标准库）
根据文件扩展名自动检测 MIME 类型
在可用时优先使用带 ADC 的 Vertex AI（无需管理 API 密钥）

gemini-sttGemini语音转录

gemini-stt

Gemini Speech-to-Text Skill

Authentication (choose one)

Option 1: Vertex AI with Application Default Credentials (Recommended)

Option 2: Direct Gemini API Key

Requirements

Supported Formats

Usage

Options

Supported Models

How It Works

Example Integration

Error Handling

Notes

Gemini 语音转文本技能

身份验证（二选一）

选项 1：使用应用默认凭据的 Vertex AI（推荐）

选项 2：直接使用 Gemini API 密钥

系统要求

支持的格式

使用方法

自动检测身份验证（优先尝试 ADC，然后尝试 GEMINIAPIKEY）

强制使用 Vertex AI

使用特定模型

使用特定项目和区域的 Vertex AI

使用 Clawdbot 媒体文件

选项

支持的模型

工作原理

集成示例

转录传入的语音消息

错误处理

说明

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

gemini-sttGemini语音转录

gemini-stt

Gemini Speech-to-Text Skill

Authentication (choose one)

Option 1: Vertex AI with Application Default Credentials (Recommended)

Option 2: Direct Gemini API Key

Requirements

Supported Formats

Usage

Options

Supported Models

How It Works

Example Integration

Error Handling

Notes

Gemini 语音转文本技能

身份验证（二选一）

选项 1：使用应用默认凭据的 Vertex AI（推荐）

选项 2：直接使用 Gemini API 密钥

系统要求

支持的格式

使用方法

自动检测身份验证（优先尝试 ADC，然后尝试 GEMINIAPIKEY）

强制使用 Vertex AI

使用特定模型

使用特定项目和区域的 Vertex AI

使用 Clawdbot 媒体文件

选项

支持的模型

工作原理

集成示例

转录传入的语音消息

错误处理

说明

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement