When to Use
Use when the main artifact is a Microsoft Word document or .docx file, especially when tracked changes, comments, headers, numbering, fields, tables, templates, or compatibility matter.
Core Rules
1. Treat DOCX as OOXML, not plain text
- A .docx file is a ZIP of XML parts, so structure matters as much as visible text.
- The critical parts are usually word/document.xml, styles.xml, numbering.xml, headers, footers, and relationship files.
- Text may be split across multiple runs; never assume one word or sentence lives in one XML node.
- Use different workflows on purpose: structured extraction for quick reading, style-driven generation for new files, and OOXML-aware editing for fragile existing documents.
- If the job is mainly reading, extracting, or reviewing, prefer a structure-preserving read path before touching OOXML.
- For deep edits, inspect the package layout instead of relying only on rendered output.
- Reading, generating, and preserving an existing reviewed document are different jobs even when the format is the same.
- Legacy .doc inputs usually need conversion before you can trust modern .docx assumptions.
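As a minimal sketch of the package and run points above, the following Python uses only the standard library to list the parts of a package and to join text that has been split across runs. The in-memory sample and the helper names (build_sample_docx, list_parts, paragraph_texts) are illustrative assumptions, and the sample is deliberately incomplete: it omits [Content_Types].xml and the relationship files, so Word would not open it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

def build_sample_docx() -> bytes:
    """Hypothetical stand-in for a real file: one paragraph split into three runs."""
    document_xml = (
        f'<w:document xmlns:w="{W}"><w:body><w:p>'
        "<w:r><w:t>Hel</w:t></w:r>"
        "<w:r><w:rPr><w:b/></w:rPr><w:t>lo wor</w:t></w:r>"
        "<w:r><w:t>ld</w:t></w:r>"
        "</w:p></w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", document_xml)
        zf.writestr("word/styles.xml", f'<w:styles xmlns:w="{W}"/>')
    return buf.getvalue()

def list_parts(docx_bytes: bytes) -> list[str]:
    """Return the part names stored in the ZIP package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        return zf.namelist()

def paragraph_texts(docx_bytes: bytes) -> list[str]:
    """Join every w:t inside each paragraph, regardless of run boundaries."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        for p in root.iterfind(".//w:p", NS)
    ]

data = build_sample_docx()
print(list_parts(data))
print(paragraph_texts(data))
```

Searching the raw XML for "Hello world" would fail here even though the phrase is visible on screen, which is why text matching should work on joined run text rather than on single nodes.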
2. Preserve styles and direct formatting deliberately
- Prefer named styles over direct formatting so the document stays editable.
- Styles layer: paragraph styles, character styles, and direct formatting do not behave the same.
- Removing direct formatting is often safer than stacking more inline formatting on top.
- When editing an existing file, extend the current style system instead of inventing a parallel one.
- Copying content between documents can silently import foreign styles, theme settings, and numbering definitions.
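One way to see the difference between named styles and direct formatting is to report both per paragraph. The sketch below is illustrative only: the sample XML and the function name formatting_report are assumptions, it looks only at runs that are direct children of a paragraph, and it treats every run property other than w:rStyle as direct formatting.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical body: a real heading, a "fake" heading built from bold + size,
# and a run that uses a character style.
DOCUMENT_XML = f"""<w:document xmlns:w="{W}"><w:body>
  <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Title</w:t></w:r></w:p>
  <w:p><w:r><w:rPr><w:b/><w:sz w:val="32"/></w:rPr><w:t>Fake heading</w:t></w:r></w:p>
  <w:p><w:r><w:rPr><w:rStyle w:val="Emphasis"/></w:rPr><w:t>Styled run</w:t></w:r></w:p>
</w:body></w:document>"""

def formatting_report(document_xml):
    """Return (text, paragraph style ID or None, direct run properties) per paragraph."""
    root = ET.fromstring(document_xml)
    report = []
    for p in root.iterfind(".//w:p", NS):
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        style = p.find("w:pPr/w:pStyle", NS)
        style_id = style.get(f"{{{W}}}val") if style is not None else None
        direct = set()
        for prop in p.iterfind("w:r/w:rPr/*", NS):
            local = prop.tag.split("}", 1)[1]  # strip the namespace
            if local != "rStyle":              # rStyle is a named style, not direct formatting
                direct.add(local)
        report.append((text, style_id, sorted(direct)))
    return report

for row in formatting_report(DOCUMENT_XML):
    print(row)
```

A paragraph with no style ID but with b and sz set on its runs is a typical sign of a heading that was formatted by hand and will not appear in a table of contents.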
3. Lists and numbering are their own system
- Bullets and numbering belong to Word's numbering definitions, not pasted Unicode characters.
- abstractNum, num, and paragraph numbering properties all matter, so restart behavior is rarely "visual only".
- Indentation and numbering are related but not identical; a list can have broken numbering even if the indent looks right.
- A list that looks correct in one editor can restart, flatten, or renumber itself later if the underlying numbering state is wrong.
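The relationship between abstractNum, num, and paragraph numbering properties can be sketched as a two-step lookup: each paragraph points at a numId, and each num points at an abstractNum and may carry a start override. The sample XML below is a simplified assumption (a real abstractNum contains level definitions), the helper names are hypothetical, and the sketch ignores numbering inherited through paragraph styles.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

def w_attr(el, name):
    """Read a w:-namespaced attribute."""
    return el.get(f"{{{W}}}{name}")

# Simplified, hypothetical numbering.xml: two list instances share one definition,
# and the second restarts level 0.
NUMBERING_XML = f"""<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="0"/>
  <w:num w:numId="1"><w:abstractNumId w:val="0"/></w:num>
  <w:num w:numId="2"><w:abstractNumId w:val="0"/>
    <w:lvlOverride w:ilvl="0"><w:startOverride w:val="1"/></w:lvlOverride>
  </w:num>
</w:numbering>"""

DOCUMENT_XML = f"""<w:document xmlns:w="{W}"><w:body>
  <w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr>
    <w:r><w:t>First</w:t></w:r></w:p>
  <w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="2"/></w:numPr></w:pPr>
    <w:r><w:t>Restarted</w:t></w:r></w:p>
  <w:p><w:r><w:t>Plain</w:t></w:r></w:p>
</w:body></w:document>"""

def num_instances(numbering_xml):
    """Map each numId to (abstractNumId, has_start_override)."""
    root = ET.fromstring(numbering_xml)
    result = {}
    for num in root.iterfind("w:num", NS):
        abstract = num.find("w:abstractNumId", NS)
        restarts = num.find("w:lvlOverride/w:startOverride", NS) is not None
        result[w_attr(num, "numId")] = (w_attr(abstract, "val"), restarts)
    return result

def paragraph_numbering(document_xml):
    """Return (text, numId, ilvl) per paragraph; both are None when unnumbered."""
    root = ET.fromstring(document_xml)
    rows = []
    for p in root.iterfind(".//w:p", NS):
        text = "".join(t.text or "" for t in p.iterfind(".//w:t", NS))
        num_id = p.find("w:pPr/w:numPr/w:numId", NS)
        ilvl = p.find("w:pPr/w:numPr/w:ilvl", NS)
        rows.append((
            text,
            w_attr(num_id, "val") if num_id is not None else None,
            w_attr(ilvl, "val") if ilvl is not None else None,
        ))
    return rows

print(num_instances(NUMBERING_XML))
print(paragraph_numbering(DOCUMENT_XML))
```

In this sample the restart lives in numbering.xml, not in the paragraph text or its indent, which is why a list can look right and still renumber later.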
4. Page layout lives in sections
- Margins, orientation, headers, footers, and page numbering are section-level behavior.
- First-page and odd/even headers can differ inside the same document, so one header fix may not fix the document.
- Set page size explicitly because A4 and US Letter defaults change pagination and table widths.
- Use section breaks for layout changes; manual spacing and stray page breaks usually create drift.
- Header and footer media use part-specific relationships, so copied IDs often break images or links.
- Tables, page breaks, and headers often drift together, so treat layout fixes as document-wide, not local cosmetic edits.
- Table geometry depends on page width, margins, and fixed widths, so "close enough" table edits often break later in Google Docs or LibreOffice.
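Because page size is stored per section, a quick check is to read w:pgSz from every w:sectPr. The sketch below assumes page dimensions in twentieths of a point (A4 is 11906 x 16838, US Letter is 12240 x 15840); the sample XML and the name section_pages are illustrative, and the sketch does not look at margins, headers, or footers.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Page sizes in twips (1/1440 inch), keyed as (short side, long side).
KNOWN_SIZES = {(11906, 16838): "A4", (12240, 15840): "US Letter"}

# Hypothetical document: the first section ends inside a paragraph's pPr,
# the final section's sectPr is the last child of w:body.
DOCUMENT_XML = f"""<w:document xmlns:w="{W}"><w:body>
  <w:p><w:pPr><w:sectPr><w:pgSz w:w="11906" w:h="16838"/></w:sectPr></w:pPr></w:p>
  <w:p><w:r><w:t>Wide table goes here</w:t></w:r></w:p>
  <w:sectPr><w:pgSz w:w="15840" w:h="12240" w:orient="landscape"/></w:sectPr>
</w:body></w:document>"""

def section_pages(document_xml):
    """Return (paper name, orientation) for each section, in document order."""
    root = ET.fromstring(document_xml)
    pages = []
    for sect in root.iterfind(".//w:sectPr", NS):
        size = sect.find("w:pgSz", NS)
        if size is None:
            pages.append(("unspecified", None))
            continue
        width = int(size.get(f"{{{W}}}w"))
        height = int(size.get(f"{{{W}}}h"))
        orient = size.get(f"{{{W}}}orient") or "portrait"
        name = KNOWN_SIZES.get((min(width, height), max(width, height)), "custom")
        pages.append((name, orient))
    return pages

print(section_pages(DOCUMENT_XML))
```

A document that mixes A4 and US Letter sections, as this sample does, is a common source of table widths that fit one section and overflow another.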
5. Track changes, comments, and fields need precise edits
- Visible text is not the full document when tracked changes are enabled.
- Insertions, deletions, and comments carry metadata that can survive careless edits.
- Deleted text may still exist in the XML even when it no longer appears on screen.
- Comment anchors and review ranges can break if edits move text without preserving the surrounding structure.
- Comment markers and review wrappers do not behave like inline formatting, so moving text carelessly can orphan or misplace them.
- Comments, footnotes, bookmarks, and linked media may live in separate parts, not only in the main document body.
- Tables of contents, page numbers, dates, cross-references, and mail merge placeholders are fields.
- Edit the field source carefully and expect cached display values to lag until refresh.
- Hyperlinks, bookmarks, and references can break if IDs or relationships stop matching.
- Bookmarks, footnotes, comment ranges, and cross-references depend on stable anchors even when the visible text seems untouched.
- A document can look correct while still containing stale field output that refreshes later into something different.
- For review workflows, make minimal replacements instead of rewriting whole paragraphs.
- In tracked-change workflows, only the changed span should look changed; broad rewrites create noisy reviews and can destroy the original formatting context.
- For legal, academic, or business review documents, default to review-style edits over wholesale paragraph rewrites unless the user explicitly wants a rewrite.
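To illustrate why visible text is not the full document, the sketch below reads one paragraph two ways: as it would read if all revisions were accepted, and as it would read if they were rejected. The sample paragraph and the name review_views are assumptions, and the sketch covers only w:ins and w:del; it ignores moves, formatting revisions, and paragraph-mark revisions.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph: only the amount was changed, so only that span is wrapped.
PARAGRAPH_XML = f"""<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">The fee is </w:t></w:r>
  <w:del w:id="1" w:author="Reviewer"><w:r><w:delText>100</w:delText></w:r></w:del>
  <w:ins w:id="2" w:author="Reviewer"><w:r><w:t>120</w:t></w:r></w:ins>
  <w:r><w:t xml:space="preserve"> dollars.</w:t></w:r>
</w:p>"""

def review_views(paragraph):
    """Return (text if accepted, text if rejected) for one w:p element."""
    accepted, rejected = [], []

    def walk(el, inside_ins):
        if el.tag == f"{{{W}}}t":
            accepted.append(el.text or "")      # normal and inserted text
            if not inside_ins:
                rejected.append(el.text or "")  # inserted text vanishes on reject
        elif el.tag == f"{{{W}}}delText":
            rejected.append(el.text or "")      # deleted text returns on reject
        for child in el:
            walk(child, inside_ins or el.tag == f"{{{W}}}ins")

    walk(paragraph, False)
    return "".join(accepted), "".join(rejected)

accepted, rejected = review_views(ET.fromstring(PARAGRAPH_XML))
print(accepted)
print(rejected)
```

The sample also shows the minimal-replacement pattern: the unchanged runs on either side stay outside the revision wrappers, so a reviewer sees only the amount as changed.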
6. Verify round-trip compatibility before delivery
- Complex documents can shift between Word, LibreOffice, Google Docs, and conversion tools.
- Tables, headers, embedded fonts, and copied styles are common sources of layout drift.
- Treat .docm as macro-bearing and higher risk; treat .doc as legacy input that may need conversion first.
- When layout matters, explicit table widths are safer than auto-fit or percentage-style behavior that different editors reinterpret.
- A document that passes a text check can still fail on pagination, table widths, or reference refresh after the recipient opens it.
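Part of a pre-delivery check can be automated at the package level. The sketch below is a partial check under stated assumptions: it looks for a VBA project at its conventional location (word/vbaProject.bin), for a comments part at word/comments.xml, and for w:ins and w:del elements in the main document. The function name preflight and the sample package are hypothetical, and the check says nothing about pagination or table widths, which still need a rendered review.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def preflight(docx_bytes: bytes) -> dict:
    """Report macro, comment, and pending-revision signals for one package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        names = set(zf.namelist())
        root = ET.fromstring(zf.read("word/document.xml"))
    revision_tags = (f"{{{W}}}ins", f"{{{W}}}del")
    return {
        "has_macros": "word/vbaProject.bin" in names,
        "has_comments_part": "word/comments.xml" in names,
        "pending_revisions": sum(1 for el in root.iter() if el.tag in revision_tags),
    }

def build_sample(with_macros: bool) -> bytes:
    """Hypothetical in-memory package with one unresolved insertion."""
    body = (
        f'<w:document xmlns:w="{W}"><w:body><w:p>'
        '<w:ins w:id="1" w:author="Reviewer"><w:r><w:t>new</w:t></w:r></w:ins>'
        "</w:p></w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", body)
        if with_macros:
            zf.writestr("word/vbaProject.bin", b"placeholder")
    return buf.getvalue()

print(preflight(build_sample(with_macros=True)))
print(preflight(build_sample(with_macros=False)))
```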
Common Traps
- Copy-paste can import unwanted styles and numbering definitions.
- Header or footer images use part-specific relationships, so reusing IDs blindly breaks them.
- Empty paragraphs used as spacing make templates fragile; spacing belongs in paragraph settings.
- A clean-looking export can still hide unresolved revisions, comments, or stale field values.
- Restarting lists "by eye" usually fails because numbering state lives outside the paragraph text.
- One visible phrase can be split across several runs, bookmarks, revision tags, or field boundaries.
- Replacing a whole paragraph to change one clause often breaks review quality, bookmarks, comments, or nearby inline formatting.
- Deleting all visible text from a paragraph or list item can still leave behind an empty paragraph mark, empty bullet, or unstable numbering.
- Table auto-fit and percentage-like width behavior can look acceptable in Word and still drift in Google Docs or LibreOffice.
- LibreOffice and Google Docs can shift complex tables, section behavior, and embedded fonts even when Word looks perfect.
- Compatibility mode can silently cap newer features or change pagination behavior.
- A single change in page size or margin defaults can ripple through tables, headers, TOC, and cross-references.
- A revision workflow can look accepted on screen while leftover metadata, comments, or field caches still make the file unstable later.
- TOC entries, footnotes, and cross-references can look correct until the recipient updates fields and exposes broken anchors.
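Stale field values are easier to spot when the field instruction and its cached display text are listed side by side. The sketch below handles both simple fields (w:fldSimple) and complex fields (w:fldChar with begin, separate, and end markers). The sample XML and the name list_fields are assumptions, and nested fields are not handled.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical body: a complex PAGE field with cached "3",
# and a simple REF field with cached "Section 2".
DOCUMENT_XML = f"""<w:document xmlns:w="{W}"><w:body>
  <w:p>
    <w:r><w:fldChar w:fldCharType="begin"/></w:r>
    <w:r><w:instrText xml:space="preserve"> PAGE </w:instrText></w:r>
    <w:r><w:fldChar w:fldCharType="separate"/></w:r>
    <w:r><w:t>3</w:t></w:r>
    <w:r><w:fldChar w:fldCharType="end"/></w:r>
  </w:p>
  <w:p><w:fldSimple w:instr=" REF _Ref123 "><w:r><w:t>Section 2</w:t></w:r></w:fldSimple></w:p>
</w:body></w:document>"""

def list_fields(document_xml):
    """Return (instruction, cached display text) for each field, in document order."""
    root = ET.fromstring(document_xml)
    fields = []
    instr, cached, state = [], [], None  # state is None, "instr", or "result"
    for el in root.iter():
        if el.tag == f"{{{W}}}fldSimple":
            text = "".join(t.text or "" for t in el.iterfind(".//w:t", NS))
            fields.append((el.get(f"{{{W}}}instr", "").strip(), text))
        elif el.tag == f"{{{W}}}fldChar":
            kind = el.get(f"{{{W}}}fldCharType")
            if kind == "begin":
                instr, cached, state = [], [], "instr"
            elif kind == "separate":
                state = "result"
            elif kind == "end":
                fields.append(("".join(instr).strip(), "".join(cached)))
                state = None
        elif el.tag == f"{{{W}}}instrText" and state == "instr":
            instr.append(el.text or "")
        elif el.tag == f"{{{W}}}t" and state == "result":
            cached.append(el.text or "")
    return fields

print(list_fields(DOCUMENT_XML))
```

The cached text is whatever the last editor computed, so a REF field can still display an old heading until the recipient refreshes fields.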
Related Skills
Install with clawhub install <slug> if the user confirms:
- documents: General document handling and format conversion.
- brief: Concise business writing and structured summaries.
- article: Long-form drafting and editorial structure.
Feedback
- If useful: clawhub star word-docx
- Stay updated: clawhub sync