Sheetsmith
Overview
Sheetsmith is a lightweight pandas wrapper that keeps the focus on working with CSV/Excel files: previewing, describing, filtering, transforming, and converting them in one place. The CLI lives at
skills/sheetsmith/scripts/sheetsmith.py, and it automatically loads any CSV/TSV/Excel file, reports structural metadata, runs pandas expressions, and writes the results back safely.
Quick start
- 1. Place the spreadsheet (CSV, TSV, or XLS/XLSX) inside the workspace or reference it via a full path.
- Run
python3 skills/sheetsmith/scripts/sheetsmith.py <command> <path> with the command described below. - When you modify data, either provide
--output new-file to save a copy or pass --inplace to overwrite the source file. - Check
references/usage.md for extra sample commands and tips.
Commands
summary
Prints row/column counts, dtype breakdowns, columns with missing data, and head/tail previews. Use
--rows to control how many rows are shown after the summary and
--tail to preview the tail instead of the head.
describe
Runs
pandas.DataFrame.describe(include='all') (customizable with
--include) so you instantly see numeric statistics, cardinality, and frequency information. Supply
--percentiles to add additional percentile lines.
preview
Shows a quick tabulated peek at the first (
--rows) or last (
--tail) rows so you can sanity-check column order or formatting before taking actions.
filter
Enter a pandas query string via
--query (e.g.,
state == 'CA' and population > 1e6). The command can either print the filtered rows or, when you also pass
--output, write the filtered table to a new CSV/TSV/XLSX file. Add
--sample to inspect a random subset instead of the entire result.
transform
Compose new columns, rename or drop existing ones, and immediately inspect the resulting table. Provide one or more
--expr expressions such as
total = quantity * price. Use
--rename old:new and
--drop column to reshape the table, and persist changes via
--output or
--inplace. The preview version (without writing) reuses the same
--rows/
--tail flags as the other commands.
convert
Convert between supported formats (CSV/TSV/Excel). Always specify
--output with the desired extension, and the helper will detect the proper writer (Excel uses
openpyxl, CSV preserves the comma separator by default, TSV uses tabs). This is the simplest way to normalize data before running other commands.
Workflow rules
- - Always keep a copy of the raw file or write to a new path; the script will only overwrite the original when you explicitly demand
--inplace. - Use the same CLI for both exploration (
summary, preview, describe) and editing (filter, transform). The --output flag works for filter/transform so you can easily branch results. - Behind the scenes, the script relies on pandas +
tabulate for Markdown previews and supports Excel/CSV/TSV, so ensure those dependencies are present (pandas, openpyxl, xlrd, tabulate are installed via apt on this system). - Use
references/usage.md for extended examples (multi-step cleaning, dataset comparison, expression tips) when the basic command descriptions above are not enough.
References
- - Usage guidelines:
references/usage.md (contains ready-to-copy commands, expression patterns, and dataset cleanup recipes).
Resources
- - GitHub: https://github.com/CrimsonDevil333333/sheetsmith
- ClawHub: https://www.clawhub.ai/skills/sheetsmith
Sheetsmith
概述
Sheetsmith 是一个轻量级的 pandas 封装工具,专注于 CSV/Excel 文件的处理:预览、描述、筛选、转换和格式转换一站式完成。CLI 位于 skills/sheetsmith/scripts/sheetsmith.py,可自动加载任何 CSV/TSV/Excel 文件,报告结构元数据,运行 pandas 表达式,并安全地写回结果。
快速开始
- 1. 将电子表格(CSV、TSV 或 XLS/XLSX)放入工作区,或通过完整路径引用。
- 运行 python3 skills/sheetsmith/scripts/sheetsmith.py <命令> <路径>,命令如下所述。
- 修改数据时,提供 --output 新文件 保存副本,或传递 --inplace 覆盖源文件。
- 查看 references/usage.md 获取更多示例命令和技巧。
命令
summary
打印行/列数、数据类型分布、缺失数据列以及头部/尾部预览。使用 --rows 控制摘要后显示的行数,使用 --tail 预览尾部而非头部。
describe
运行 pandas.DataFrame.describe(include=all)(可通过 --include 自定义),立即查看数值统计、基数和频率信息。提供 --percentiles 可添加额外的百分位行。
describe
显示快速表格化的前(--rows)或后(--tail)行预览,以便在操作前检查列顺序或格式。
filter
通过 --query 输入 pandas 查询字符串(例如 state == CA and population > 1e6)。该命令可打印筛选后的行,或同时传递 --output 将筛选后的表格写入新的 CSV/TSV/XLSX 文件。添加 --sample 可检查随机子集而非全部结果。
transform
创建新列、重命名或删除现有列,并立即检查结果表。提供一个或多个 --expr 表达式,如 total = quantity * price。使用 --rename 旧名:新名 和 --drop 列名 重塑表格,通过 --output 或 --inplace 持久化更改。预览版本(不写入)复用与其他命令相同的 --rows/--tail 标志。
convert
在支持的格式(CSV/TSV/Excel)之间转换。始终使用所需扩展名指定 --output,辅助工具将自动检测合适的写入器(Excel 使用 openpyxl,CSV 默认保留逗号分隔符,TSV 使用制表符)。这是在运行其他命令前规范化数据的最简单方法。
工作流规则
- - 始终保留原始文件的副本或写入新路径;只有在明确要求 --inplace 时,脚本才会覆盖原始文件。
- 使用同一个 CLI 进行探索(summary、preview、describe)和编辑(filter、transform)。--output 标志适用于 filter/transform,方便分支结果。
- 在后台,脚本依赖 pandas + tabulate 进行 Markdown 预览,并支持 Excel/CSV/TSV,因此请确保这些依赖项存在(pandas、openpyxl、xlrd、tabulate 已通过 apt 安装在此系统上)。
- 当上述基本命令描述不够用时,使用 references/usage.md 获取扩展示例(多步清洗、数据集比较、表达式技巧)。
参考
- - 使用指南: references/usage.md(包含可直接复制的命令、表达式模式和数据清洗配方)。
资源
- - GitHub: https://github.com/CrimsonDevil333333/sheetsmith
- ClawHub: https://www.clawhub.ai/skills/sheetsmith