officecli: v1.0.23
OfficeCLI DOCX Skill
BEFORE YOU START (CRITICAL)
Every time before using officecli, run this check:
CODEBLOCK0
officecli: v1.0.23
Quick Reference
| Task | Action |
|---|
| Read / analyze content | Use view and get commands below |
| Edit existing document |
Read
editing.md |
| Create from scratch | Read
creating.md |
officecli: v1.0.23
Reading & Analyzing
Text Extraction
CODEBLOCK1
INLINECODE2 mode shows [/body/p[N]] text content, tables as [Table: N rows], equations as readable math. Use --max-lines or --start/--end for large documents to avoid dumping entire content.
Structure Overview
CODEBLOCK2
Output shows file stats (paragraph count, tables, images, equations), watermark presence, headers/footers, and heading hierarchy tree.
Detailed Inspection
CODEBLOCK3
Shows style, font, size, bold/italic per run; equations as LaTeX; images with alt text. Empty paragraphs shown as [] <-- empty paragraph.
Statistics
CODEBLOCK4
Paragraph count, style distribution, font usage, font size usage, empty paragraph count.
Element Inspection
CODEBLOCK5
CSS-like Queries
CODEBLOCK6
officecli: v1.0.23
Design Principles
Professional documents need clear structure and consistent formatting.
Document Structure
Every document needs clear hierarchy -- title, headings, subheadings, body text. Don't create a wall of unstyled Normal paragraphs.
Typography
Choose a readable body font (Calibri, Cambria, Georgia, Times New Roman). Keep body at 11-12pt. Headings should step up: H1=16-18pt bold, H2=14pt bold, H3=12pt bold.
Spacing
Use paragraph spacing (spaceBefore/spaceAfter) instead of empty paragraphs. Line spacing of 1.15x-1.5x for body text.
Page Setup
Always set margins explicitly. US Letter default: pageWidth=12240, pageHeight=15840, margins=1440 (1 inch).
Headers & Footers
Professional documents include page numbers at minimum. Consider company name in header, page X of Y in footer.
Table Design
Alternate row shading for readability. Header row with contrasting background. Consistent cell padding.
Color Usage
Use color sparingly in documents -- accent color for headings or table headers, not rainbow formatting.
Content-to-Element Mapping
| Content Type | Recommended Element(s) | Why |
|---|
| Sequential items | Bulleted list (listStyle=bullet) | Scanning is faster than inline commas |
| Step-by-step process |
Numbered list (
listStyle=numbered) | Numbers communicate order |
| Comparative data | Table with header row | Columns enable side-by-side comparison |
| Trend data | Embedded chart (
chartType=line/column) | Visual pattern recognition |
| Key definition | Hanging indent paragraph | Offset term from definition |
| Legal/contract clause | Numbered list with bookmarks | Cross-referencing via bookmarks |
| Mathematical content | Equation element (
formula=LaTeX) | Proper OMML rendering |
| Citation/reference | Footnote or endnote | Keeps body text clean |
| Pull quote / callout | Paragraph with border + shading | Visual distinction from body |
| Multi-section layout | Section breaks with columns | Column control per section |
officecli: v1.0.23
QA (Required)
Assume there are problems. Your job is to find them.
Issue Detection
CODEBLOCK7
Content QA
CODEBLOCK8
When editing templates, check for leftover placeholder text:
CODEBLOCK9
Validation
CODEBLOCK10
Pre-Delivery Checklist
- - [ ] Metadata set (title, author)
- [ ] Page numbers present (field in header or footer)
- [ ] Heading hierarchy is correct (no skipped levels, e.g., H1 -> H3)
- [ ] No empty paragraphs used as spacing (use spaceBefore/spaceAfter instead)
- [ ] All images have alt text
- [ ] Tables have header rows
- [ ] TOC present if document has 3+ headings
- [ ] Document validates with INLINECODE17
- [ ] No placeholder text remaining
Verification Loop
- 1. Generate document
- Run
view issues + view outline + view text + INLINECODE21 - List issues found (if none found, look again more critically)
- Fix issues
- Re-verify -- one fix often creates another problem
- Repeat until a full pass reveals no new issues
Do not declare success until you've completed at least one fix-and-verify cycle.
NOTE: Unlike pptx, there is no visual preview mode (view svg/view html) for docx. Content verification relies on view text, view annotated, view outline, view issues, and validate. For visual verification, the user must open the file in Word.
QA display notes:
- -
view text shows "1." for ALL numbered list items regardless of their actual rendered number. This is a display limitation -- the actual document renders correct auto-incrementing numbers (1, 2, 3...) in Word and LibreOffice. Do not treat this as a defect. - INLINECODE30 flags "body paragraph missing first-line indent" on centered paragraphs, list items, bibliography entries, and other intentionally non-indented content. These are expected for block-style formatting and can be ignored when the paragraph has explicit
spaceAfter, listStyle, alignment=center, or hangingIndent.
officecli: v1.0.23
Common Pitfalls
| Pitfall | Correct Approach |
|---|
| INLINECODE35 | Use --prop name="foo" -- all attributes go through INLINECODE37 |
| Guessing property names |
Run
officecli docx set paragraph to see exact names |
|
\n in shell strings | Use
\\n for newlines in
--prop text="line1\\nline2" |
| Modifying an open file | Close the file in Word first |
| Hex colors with
# | Use
FF0000 not
#FF0000 -- no hash prefix |
| Paths are 1-based |
/body/p[1],
/body/tbl[1] -- XPath convention |
|
--index is 0-based |
--index 0 = first position -- array convention |
| Unquoted
[N] in zsh/bash | Shell glob-expands
/body/p[1] -- always quote paths:
"/body/p[1]" |
| Spacing in raw numbers | Use unit-qualified values:
'12pt',
'0.5cm',
'1.5x' not raw twips |
| Empty paragraphs for spacing | Use
spaceBefore/
spaceAfter properties on paragraphs |
|
$ and
' in batch JSON | Use heredoc:
cat <<'EOF' \| officecli batch -- single-quoted delimiter prevents shell expansion |
| Wrong border format | Use
style;size;color;space format:
single;4;FF0000;1 |
| listStyle on run instead of paragraph |
listStyle is a paragraph property, not a run property |
| Row-level bold/color/shd | Row
set only supports
height,
header, and
c1/c2/c3 text shortcuts. Use cell-level
set for formatting (bold, shd, color, font) |
| Section vs root property names | Section uses
pagewidth/
pageheight (lowercase). Document root uses
pageWidth/
pageHeight (camelCase) |
officecli: v1.0.23
Performance: Resident Mode
For multi-step workflows (3+ commands on the same file), use open/close:
CODEBLOCK11
Performance: Batch Mode
Execute multiple operations in a single open/save cycle:
CODEBLOCK12
Batch supports: add, set, get, query, remove, move, view, raw, raw-set, validate.
Batch fields: command, path, parent, type, from, to, index, props (dict), selector, mode, depth, part, xpath, action, xml.
INLINECODE99 = container to add into (for add). path = element to modify (for set, get, remove, move).
officecli: v1.0.23
Known Issues
| Issue | Workaround |
|---|
| No visual preview | Unlike pptx (SVG/HTML), docx has no built-in rendering. Use view text/view outline/view annotated/view issues for verification. Users must open in Word for visual check. |
| Track changes creation requires raw XML |
OfficeCLI can accept/reject tracked changes (
set / --prop accept-changes=all) but cannot create tracked changes (insertions/deletions with author markup) via high-level commands. Use
raw-set with XML for tracked change creation. |
|
Tab stops may require raw XML | Tab stop creation is not exposed in officecli docx high-level commands. Use
raw-set to add tab stop definitions in paragraph properties. |
|
Chart series cannot be added after creation | Same as pptx:
set --prop data= can only update existing series, not add new ones. Delete and recreate the chart with all series in the
add command. |
|
Complex numbering definitions |
listStyle=bullet/numbered covers simple cases. For multi-level lists with custom formatting, use
numId/
numLevel properties. Creating new numbering definitions may require understanding the numbering part. |
|
Shell quoting in batch with echo |
echo '...' \| officecli batch fails when JSON values contain apostrophes or
$. Use heredoc:
cat <<'EOF' \| officecli batch doc.docx. |
|
Batch intermittent failure | Approximately 1-in-15 batch operations may fail with "Failed to send to resident" when using batch+resident mode. Retry the command, or close/reopen the file. Split large batch arrays into 10-15 operation chunks. |
|
Table-level padding produces invalid XML | Do not use
set tbl[N] --prop padding=N. It creates invalid
tblCellMar. Use cell-level
padding.top/
padding.bottom instead. If already applied, remove with
raw-set --xpath "//w:tbl[N]/w:tblPr/w:tblCellMar" --action remove. |
|
Internal hyperlinks not supported | The
hyperlink command only accepts absolute URIs (
https://...). Fragment URLs (
#bookmark) are rejected. For internal cross-references, use descriptive text or
raw-set with
<w:hyperlink w:anchor="bookmarkName">. |
|
Table --index positioning unreliable |
--index N on
add /body --type table may be ignored (table appends to end).
move also may not work for tables. Workaround: add content in the desired order, or remove/re-add surrounding elements. |
|
\mathcal in equations causes validation errors | The
\mathcal LaTeX command generates invalid
m:scr XML. Use
\mathit or plain letters instead. |
|
view text shows "1." for all numbered items | Display-only limitation. Rendered output in Word/LibreOffice shows correct auto-incrementing numbers. |
officecli: v1.0.23
Help System
When unsure about property names, value formats, or command syntax, run help instead of guessing. One help query is faster than guess-fail-retry loops.
CODEBLOCK13
officecli: v1.0.23
OfficeCLI DOCX 技能
开始之前(关键)
每次使用 officecli 之前,请运行此检查:
bash
if ! command -v officecli &> /dev/null; then
echo 正在安装 officecli...
curl -fsSL https://raw.githubusercontent.com/iOfficeAI/OfficeCli/main/install.sh -o /tmp/officecliinstall.sh && bash /tmp/officecliinstall.sh && rm -f /tmp/officecli_install.sh
# Windows: irm https://raw.githubusercontent.com/iOfficeAI/OfficeCli/main/install.ps1 -OutFile $env:TEMP\officecliinstall.ps1; & $env:TEMP\officecliinstall.ps1
else
CURRENT=$(officecli --version 2>&1 | grep -oE [0-9]+\.[0-9]+\.[0-9]+ | head -1)
LATEST=$(curl -fsSL https://api.github.com/repos/iOfficeAI/OfficeCLI/releases/latest | grep tag_name | sed -E s/.v?([0-9.]+)./\1/)
if [ $CURRENT != $LATEST ]; then
echo 正在升级 officecli $CURRENT → $LATEST...
curl -fsSL https://raw.githubusercontent.com/iOfficeAI/OfficeCli/main/install.sh -o /tmp/officecliinstall.sh && bash /tmp/officecliinstall.sh && rm -f /tmp/officecli_install.sh
else
echo officecli $CURRENT 已是最新版本
fi
fi
officecli --version
officecli: v1.0.23
快速参考
| 任务 | 操作 |
|---|
| 读取/分析内容 | 使用下面的 view 和 get 命令 |
| 编辑现有文档 |
阅读
editing.md |
| 从头创建 | 阅读
creating.md |
officecli: v1.0.23
读取与分析
文本提取
bash
officecli view doc.docx text
officecli view doc.docx text --max-lines 200
officecli view doc.docx text --start 1 --end 50
text 模式显示 [/body/p[N]] 文本内容,表格显示为 [Table: N rows],公式显示为可读的数学表达式。对于大型文档,使用 --max-lines 或 --start/--end 以避免转储全部内容。
结构概览
bash
officecli view doc.docx outline
输出显示文件统计信息(段落数、表格、图片、公式)、水印存在情况、页眉/页脚以及标题层级树。
详细检查
bash
officecli view doc.docx annotated
显示每个运行(run)的样式、字体、大小、粗体/斜体;公式显示为 LaTeX;图片显示替代文本。空段落显示为 [] <-- empty paragraph。
统计信息
bash
officecli view doc.docx stats
段落数、样式分布、字体使用情况、字号使用情况、空段落数。
元素检查
bash
文档根节点(元数据、页面设置)
officecli get doc.docx /
列出深度为1的 body 子元素
officecli get doc.docx /body --depth 1
特定段落
officecli get doc.docx /body/p[1]
特定运行
officecli get doc.docx /body/p[1]/r[1]
表格结构
officecli get doc.docx /body/tbl[1] --depth 3
样式定义
officecli get doc.docx /styles
特定样式
officecli get doc.docx /styles/Heading1
页眉/页脚
officecli get doc.docx /header[1]
officecli get doc.docx /footer[1]
编号定义
officecli get doc.docx /numbering
用于脚本的 JSON 输出
officecli get doc.docx /body/p[1] --json
CSS 风格查询
bash
按样式查找段落
officecli query doc.docx paragraph[style=Heading1]
查找包含文本的段落
officecli query doc.docx p:contains(quarterly)
查找空段落
officecli query doc.docx p:empty
查找没有替代文本的图片
officecli query doc.docx image:no-alt
查找居中段落中的粗体运行
officecli query doc.docx p[alignment=center] > r[bold=true]
查找大号文本
officecli query doc.docx paragraph[size>=24pt]
按类型查找域
officecli query doc.docx field[fieldType!=page]
officecli: v1.0.23
设计原则
专业文档需要清晰的结构和一致的格式。
文档结构
每个文档都需要清晰的层级——标题、一级标题、二级标题、正文。不要创建一堵未设置样式的普通段落墙。
排版
选择可读性强的正文字体(Calibri、Cambria、Georgia、Times New Roman)。正文保持 11-12pt。标题应逐步增大:H1=16-18pt 粗体,H2=14pt 粗体,H3=12pt 粗体。
间距
使用段落间距(spaceBefore/spaceAfter)而不是空段落。正文行距为 1.15x-1.5x。
页面设置
始终显式设置页边距。美国信纸默认值:pageWidth=12240,pageHeight=15840,页边距=1440(1 英寸)。
页眉与页脚
专业文档至少包含页码。考虑在页眉中添加公司名称,在页脚中添加第 X 页,共 Y 页。
表格设计
交替行底纹以提高可读性。标题行使用对比背景色。一致的单元格内边距。
颜色使用
在文档中谨慎使用颜色——用于标题或表格表头的强调色,而不是彩虹格式。
内容到元素映射
| 内容类型 | 推荐元素 | 原因 |
|---|
| 顺序项目 | 项目符号列表(listStyle=bullet) | 扫描比内联逗号更快 |
| 分步过程 |
编号列表(listStyle=numbered) | 数字传达顺序 |
| 比较数据 | 带标题行的表格 | 列支持并排比较 |
| 趋势数据 | 嵌入式图表(chartType=line/column) | 视觉模式识别 |
| 关键定义 | 悬挂缩进段落 | 将术语与定义分开 |
| 法律/合同条款 | 带书签的编号列表 | 通过书签进行交叉引用 |
| 数学内容 | 公式元素(formula=LaTeX) | 正确的 OMML 渲染 |
| 引用/参考文献 | 脚注或尾注 | 保持正文整洁 |
| 引文/标注 | 带边框和底纹的段落 | 与正文视觉区分 |
| 多节布局 | 带分栏的分节符 | 每节独立控制分栏 |
officecli: v1.0.23
质量保证(必需)
假设存在问题。你的任务是找到它们。
问题检测
bash
自动检查格式、内容和结构问题
officecli view doc.docx issues
按问题类型筛选
officecli view doc.docx issues --type format
officecli view doc.docx issues --type content
officecli view doc.docx issues --type structure
内容质量保证
bash
提取所有文本,检查缺失内容、拼写错误、顺序错误
officecli view doc.docx text
检查结构
officecli view doc.docx outline
检查空段落(常见杂乱内容)
officecli query doc.docx p:empty
检查没有替代文本的图片
officecli query doc.docx image:no-alt
编辑模板时,检查残留的占位符文本:
bash
officecli query doc.docx p:contains(lorem)
officecli query doc.docx p:contains(xxxx)
officecli query doc.docx p:contains(placeholder)
验证
bash
officecli validate doc.docx
交付前检查清单
- - [ ] 元数据已设置(标题、作者)
- [ ] 页码已存在(页眉或页脚中的域)
- [ ] 标题层级正确(没有跳级,例如 H1 -> H3)
- [ ] 没有使用空段落作为间距(改用 spaceBefore/spaceAfter)
- [ ] 所有图片都有替代文本
- [ ] 表格有标题行
- [ ] 如果文档有 3 个以上标题,则存在目录
- [ ]