Setup

On first use, create ~/pandas/ and read setup.md for initialization. User preferences are stored in ~/pandas/memory.md — users can view or edit this file anytime.

When to Use

User needs to work with tabular data in Python. Agent handles DataFrame operations, data cleaning, aggregations, merges, pivots, and exports.

Architecture

Memory lives in ~/pandas/. See memory-template.md for structure.

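```
~/pandas/
├── memory.md    # User preferences and common patterns
└── snippets/    # Saved code snippets (optional)
```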

Quick Reference

Topic	File
Setup process	setup.md
Memory template	memory-template.md

Core Rules

1. Use Vectorized Operations

- NEVER iterate with for loops over DataFrame rows
- Use .apply() only when vectorized alternatives don't exist
- Prefer df['col'].str.method() over apply(lambda x: x.method())
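
A minimal sketch of the rule above, assuming a small made-up DataFrame (the column names and values are illustrative, not part of the skill). It contrasts column-wise arithmetic and the .str accessor with the element-wise apply() form they replace.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "name": ["  alice ", "BOB", "Carol  "],
    "price": [10.0, 20.0, 30.0],
    "qty": [1, 2, 3],
})

# Vectorized arithmetic: one operation over whole columns, no Python-level loop.
df["total"] = df["price"] * df["qty"]

# Vectorized string handling through the .str accessor.
df["name_clean"] = df["name"].str.strip().str.title()

# Same result via apply(); shown only for comparison with the .str form.
name_via_apply = df["name"].apply(lambda s: s.strip().title())
```

Both string approaches give the same values here; the .str form is generally easier to read and handles missing values without extra code.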

2. Chain Methods for Readability

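```python
# Assumes df is an existing DataFrame with age, city, and salary columns.

# Good: method chaining
result = (
    df
    .query("age > 30")
    .groupby("city")
    .agg({"salary": "mean"})
    .reset_index()
)

# Avoid: many intermediate variables
filtered = df[df["age"] > 30]
grouped = filtered.groupby("city")
result = grouped.agg({"salary": "mean"}).reset_index()
```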

3. Handle Missing Data Explicitly

- Always check df.isna().sum() before analysis
- Choose strategy: dropna(), fillna(), or interpolation
- Document WHY missing values exist before removing them
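
The three strategies above can be sketched as follows, assuming a small made-up DataFrame with gaps in a numeric and a text column. Which strategy is appropriate depends on why the values are missing; this only shows the mechanics.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps, for illustration only.
df = pd.DataFrame({
    "temp": [20.0, np.nan, 24.0, np.nan, 28.0],
    "city": ["NYC", None, "LA", "LA", "NYC"],
})

# Step 1: count missing values per column before doing anything else.
missing_counts = df.isna().sum()

# Strategy A: drop rows where a required column is missing.
dropped = df.dropna(subset=["city"])

# Strategy B: fill with an explicit placeholder value.
filled = df.assign(city=df["city"].fillna("unknown"))

# Strategy C: linear interpolation for an ordered numeric column.
interpolated = df.assign(temp=df["temp"].interpolate())
```

Each strategy returns a new DataFrame, so the original df stays available for comparison.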

4. Use Categorical for Repeated Strings

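```python
# Saves memory on columns with few unique values
df["status"] = df["status"].astype("category")
df["country"] = df["country"].astype("category")
```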
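
One way to check whether the conversion pays off is to compare memory usage before and after. The sketch below assumes a made-up column of three repeated status strings; actual savings depend on the data and the pandas version.

```python
import pandas as pd

# Hypothetical column: 30,000 rows but only three distinct values.
statuses = pd.Series(["active", "inactive", "pending"] * 10_000, name="status")
as_category = statuses.astype("category")

# deep=True includes the string contents, not just the pointers.
string_bytes = statuses.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"strings: {string_bytes} bytes, category: {category_bytes} bytes")
```

The categorical column stores each distinct string once plus a small integer code per row, which is why it tends to be smaller for low-cardinality data.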

5. Merge with Validation

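```python
# Always specify how and validate
result = pd.merge(
    df1,
    df2,
    on="id",
    how="left",
    validate="m:1",  # many-to-one: catches unexpected duplicates in df2
)
```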
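
To illustrate what validation catches, here is a small sketch with made-up orders and customers tables (all names and values are hypothetical). When the right-hand table contains a duplicated key, validate="m:1" raises pandas.errors.MergeError instead of silently multiplying rows.

```python
import pandas as pd

# Hypothetical tables, for illustration only.
orders = pd.DataFrame({"id": [1, 2, 2], "amount": [50, 75, 20]})
customers = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Ben"]})
bad_customers = pd.DataFrame({"id": [1, 2, 2], "name": ["Ann", "Ben", "Ben2"]})

# Keys are unique on the right side, so the many-to-one check passes.
ok = pd.merge(orders, customers, on="id", how="left", validate="m:1")

# id=2 appears twice on the right side, so the check fails loudly.
try:
    pd.merge(orders, bad_customers, on="id", how="left", validate="m:1")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
```

Without validate, the second merge would return five rows instead of three, which is easy to miss in a larger pipeline.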

6. Prefer query() for Complex Filters

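```python
# Readable
df.query("age > 30 and city == 'NYC' and salary < 100000")

# Hard to read
df[(df["age"] > 30) & (df["city"] == "NYC") & (df["salary"] < 100000)]
```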

7. Set Index When Appropriate

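```python
# Faster lookups, cleaner merges
df = df.set_index("user_id")
user_data = df.loc[12345]  # hash-based lookup, roughly O(1) when the index is unique
```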
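
A short sketch of indexed lookup, assuming a made-up table keyed by a user_id column. Passing verify_integrity=True to set_index() is an optional extra, not something the skill requires; it raises an error if the key column contains duplicates.

```python
import pandas as pd

# Hypothetical user table, for illustration only.
df = pd.DataFrame({"user_id": [12345, 67890], "plan": ["pro", "free"]})

# verify_integrity=True fails early if user_id is not unique.
indexed = df.set_index("user_id", verify_integrity=True)

# Label-based row lookup returns a Series of that row's values.
user_data = indexed.loc[12345]

# .at is a fast accessor for a single scalar value.
plan = indexed.at[67890, "plan"]
```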

Common Traps

- SettingWithCopyWarning → Use .loc[] for assignment: df.loc[mask, 'col'] = value
- Slow loops → Replace iterrows() with vectorized ops or apply()
- Memory explosion → Use dtype in read_csv(): pd.read_csv(f, dtype={'id': 'int32'})
- Silent data loss → Check row counts before/after merge: print(f"Before: {len(df1)}, After: {len(result)}")
- Index confusion → Use reset_index() after groupby() to get a clean DataFrame
- Chained indexing → Assigning through df['a']['b'] may silently modify a copy; use a single indexer: df.loc['b', 'a'] = value
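
Several of the traps above can be avoided together, as in this sketch. It assumes a tiny in-memory CSV (the data and column names are made up) so that it runs without any files on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content, for illustration only.
csv_text = "id,score\n1,40\n2,85\n3,60\n"

# Declare dtypes up front instead of letting pandas default to 64-bit types.
df = pd.read_csv(io.StringIO(csv_text), dtype={"id": "int32"})

# Assign through a single .loc call rather than chained indexing.
mask = df["score"] < 50
df.loc[mask, "score"] = 50

# reset_index() turns the group keys back into an ordinary column.
totals = df.groupby("id")["score"].sum().reset_index()
```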

Security & Privacy

Data storage:

- User preferences stored in ~/pandas/memory.md
- All DataFrame operations run locally
- No data is sent externally

This skill does NOT:

- Upload data to any service
- Access files outside ~/pandas/ and the working directory
- Modify source data files without explicit instruction

User control:

- View stored preferences: cat ~/pandas/memory.md
- Clear all data: rm -rf ~/pandas/

Related Skills

Install with clawhub install <slug> if user confirms:

- data-analysis: general data analysis patterns
- INLINECODE32: CSV file handling
- INLINECODE33: database queries
- INLINECODE34: Excel file operations

Feedback

- If useful: clawhub star pandas
- Stay updated: clawhub sync

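Setup

On first use, create the ~/pandas/ directory and read setup.md for initialization. User preferences are stored in ~/pandas/memory.md; users can view or edit this file at any time.

When to Use

The user needs to work with tabular data in Python. The agent handles DataFrame operations, data cleaning, aggregations, merges, pivots, and exports.

Architecture

Memory files live in the ~/pandas/ directory. See memory-template.md for the structure.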

```
~/pandas/
├── memory.md    # User preferences and common patterns
└── snippets/    # Saved code snippets (optional)
```

Quick Reference

Topic	File
Setup process	setup.md
Memory template	memory-template.md

Core Rules

1. Use Vectorized Operations

- Never iterate over DataFrame rows with for loops
- Use .apply() only when no vectorized alternative exists
- Prefer df['col'].str.method() over apply(lambda x: x.method())

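2. Chain Methods for Readability

```python
# Assumes df is an existing DataFrame with age, city, and salary columns.

# Recommended: method chaining
result = (
    df
    .query("age > 30")
    .groupby("city")
    .agg({"salary": "mean"})
    .reset_index()
)

# Not recommended: many intermediate variables
filtered = df[df["age"] > 30]
grouped = filtered.groupby("city")
result = grouped.agg({"salary": "mean"}).reset_index()
```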

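3. Handle Missing Data Explicitly

- Always check df.isna().sum() before analysis
- Choose a strategy: dropna(), fillna(), or interpolation
- Document why missing values exist before removing them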

4. Use Categorical for Repeated Strings

```python
# Saves memory on columns with few unique values
df["status"] = df["status"].astype("category")
df["country"] = df["country"].astype("category")
```

5. Merge with Validation

```python
# Always specify how and validate
result = pd.merge(
    df1,
    df2,
    on="id",
    how="left",
    validate="m:1",  # many-to-one: catches unexpected duplicates in df2
)
```

6. Prefer query() for Complex Filters

```python
# Readable
df.query("age > 30 and city == 'NYC' and salary < 100000")

# Hard to read
df[(df["age"] > 30) & (df["city"] == "NYC") & (df["salary"] < 100000)]
```

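7. Set Index When Appropriate

```python
# Faster lookups, cleaner merges
df = df.set_index("user_id")
user_data = df.loc[12345]  # hash-based lookup, roughly O(1) when the index is unique
```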

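Common Traps

- SettingWithCopyWarning → Use .loc[] for assignment: df.loc[mask, 'col'] = value
- Slow loops → Replace iterrows() with vectorized operations or apply()
- Memory explosion → Use dtype in read_csv(): pd.read_csv(f, dtype={'id': 'int32'})
- Silent data loss → Check row counts before and after a merge: print(f"Before: {len(df1)}, After: {len(result)}")
- Index confusion → Use reset_index() after groupby() to get a clean DataFrame
- Chained indexing → Assigning through df['a']['b'] may silently modify a copy; use a single indexer: df.loc['b', 'a'] = value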

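Security & Privacy

Data storage:

- User preferences are stored in ~/pandas/memory.md
- All DataFrame operations run locally
- No data is sent externally

This skill does NOT:

- Upload data to any service
- Access files outside ~/pandas/ and the working directory
- Modify source data files without explicit instruction

User control:

- View stored preferences: cat ~/pandas/memory.md
- Clear all data: rm -rf ~/pandas/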

Feedback

- If useful: clawhub star pandas
- Stay updated: clawhub sync
