Essay Humanizer (corpus-informed)
Rewrites AI-generated argumentative and academic essays toward a human baseline style informed by CAWSE (M/D bands), LOCNESS, and contrasts with DeepSeek-generated counterparts. Ships with a fine-tuned LoRA adapter (9.3 MB) and an inference script.
Skill contract
| Component | Path | Notes |
|---|---|---|
| Inference script | `scripts/inference.py` | Entry point: `humanize()` function or CLI |
| LoRA adapters | `assets/adapters/adapters.safetensors.json` | 12.3 MB base64 JSON; auto-decoded to binary on first run |
| Pattern weights | `data/analysis/weights.json` | Corpus-derived, loaded by inference at runtime |
| Decoder | `scripts/decode_adapters.py` | Reconstructs the `.safetensors` binary from JSON (auto or manual) |
| Installer | `scripts/install_deps.sh` | One-time: `pip install mlx mlx-lm transformers` + decode |
| Base model | `Qwen/Qwen3-8B-MLX-4bit` | Downloaded from HuggingFace on first run (~4.5 GB, cached) |
Requirements: Apple Silicon macOS with Python 3.9+.
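The decode step listed in the table can be pictured roughly as follows. This is a minimal sketch, not the contents of `scripts/decode_adapters.py`: the JSON layout (a single `data` field holding the base64 string) and the function name `decode_adapter_json` are assumptions made for illustration.

```python
import base64
import json
from pathlib import Path


def decode_adapter_json(json_path: str, out_path: str) -> int:
    """Decode a base64 payload stored in JSON into a binary file.

    Assumes a hypothetical layout {"data": "<base64 string>"}; the real
    adapters.safetensors.json may be structured differently.
    Returns the number of bytes in the decoded payload.
    """
    payload = json.loads(Path(json_path).read_text(encoding="utf-8"))
    raw = base64.b64decode(payload["data"])
    out = Path(out_path)
    # Skip the write when an identical file is already in place.
    if not (out.exists() and out.read_bytes() == raw):
        out.write_bytes(raw)
    return len(raw)
```

Base64 encodes every 3 bytes as 4 characters, which is consistent with the 9.3 MB adapter appearing as a roughly 12.3 MB JSON file.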
Quick Start
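```bash
bash scripts/install_deps.sh                  # one-time: install deps + decode adapters
python scripts/inference.py --file draft.txt  # auto-decodes adapters if not yet decoded
```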
Or from Python:
```python
from scripts.inference import humanize
print(humanize("Your AI-drafted essay text here..."))
```
Weighted pattern table (descending priority)
When humanizing, address higher-weight rows first. Weights are data-driven from corpus analysis (Mann-Whitney); zero-weight rows were not statistically significant.
| ID | Weight | Category | Pattern |
|---|---|---|---|
| `P06_CLICHE_METAPHORS` | 0.1358 | vocabulary | Cliche metaphors |
| `P15_EM_DASH_OVERKILL` | 0.1358 | punctuation | Em dash overkill |
| `P21_MARKDOWN_ARTIFACTS` | 0.1358 | formatting | Markdown artifacts |
| `P23_TEXTBOOK_BOLDING` | 0.1358 | formatting | Textbook bolding |
| `P12_PRESENT_PARTICIPLE_TAIL` | 0.1133 | rhetorical | Present participle tailing |
| `P10_RULE_OF_THREES` | 0.0806 | rhetorical | Rule of threes |
| `P04_AI_VOCABULARY` | 0.0621 | vocabulary | AI vocabulary |
| `P14_COMPULSIVE_SUMMARIES` | 0.0598 | rhetorical | Compulsive summaries |
| `P05_EXCESSIVE_ADVERBS` | 0.0540 | vocabulary | Excessive adverbs |
| `P13_OVER_ATTRIBUTION` | 0.0529 | rhetorical | Over-attribution |
| `P11_FALSE_RANGES` | 0.0341 | rhetorical | False ranges |
| `P17_TRANSITION_OVERUSE` | 0.0001 | punctuation | Overuse of transition words |
| `P01_UNDUE_EMPHASIS` | 0.0000 | content | Undue emphasis |
| `P02_SUPERFICIAL_ANALYSIS` | 0.0000 | content | Superficial analysis |
| `P03_REGRESSION_TO_MEAN` | 0.0000 | content | Regression to the mean |
| `P07_REDUNDANT_MODIFIERS` | 0.0000 | vocabulary | Redundant modifiers |
| `P08_FILLER_HEDGING` | 0.0000 | vocabulary | Filler hedging |
| `P09_NEGATIVE_PARALLELISM` | 0.0000 | rhetorical | Negative parallelisms |
| `P16_EN_DASH_AVOIDANCE` | 0.0000 | punctuation | En dash / hyphen misuse for ranges |
| `P18_COLLABORATIVE_REGISTER` | 0.0000 | register | Collaborative register |
| `P19_LETTER_FORMALITY` | 0.0000 | register | Letter-style formality |
| `P20_INSTRUCTIONAL_CONDESCENSION` | 0.0000 | register | Instructional condescension |
| `P22_EXCESSIVE_LISTS` | 0.0000 | formatting | Excessive bulleted/numbered lists |
| `P24_EMOJI_SYMBOL` | 0.0000 | formatting | Emoji/symbol injection |
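The "higher-weight rows first" rule could be applied programmatically along these lines. This is only a sketch: it assumes `weights.json` maps pattern IDs to floats, which the file's actual schema may not match, and both the helper name `prioritize` and the `min_weight` cutoff are hypothetical.

```python
import json


def prioritize(weights: dict, min_weight: float = 0.001) -> list:
    """Return pattern IDs sorted by descending weight, dropping near-zero rows."""
    kept = [(pid, w) for pid, w in weights.items() if w >= min_weight]
    # Sort by weight descending, then by ID so ties come out in a stable order.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [pid for pid, _ in kept]


# Illustrative subset of the table; the ID spelling in weights.json is assumed.
sample = json.loads(
    '{"P12_PRESENT_PARTICIPLE_TAIL": 0.1133, "P06_CLICHE_METAPHORS": 0.1358,'
    ' "P17_TRANSITION_OVERUSE": 0.0001, "P01_UNDUE_EMPHASIS": 0.0}'
)
print(prioritize(sample))
```

With the default cutoff, the two non-significant rows in the sample are dropped and the remaining IDs come back in table order.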
Syntactic complexity (MDD / ADD advisory)
Human writing in the Merit/Distinction range of CAWSE tends to show more variable mean dependency distance (MDD), while AI prose may cluster more tightly. When humanizing:
- Reference MDD means from the analysis: human ~2.334, AI ~2.455.
- Variance ratio (human/AI) ~1.715: prefer a natural mix of shorter and longer dependency links, not uniformly smoothed sentences.
- Avoid flattening every sentence to minimal dependency length; that can read as a different kind of machine polish.
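For readers unfamiliar with the metric, MDD can be computed from a dependency parse as the mean absolute distance between each token and its head. The sketch below assumes the parse is already available as a list of 1-based head indices (0 for the root); whether the original analysis used population or sample variance, or per-sentence or per-essay values, is not stated, so `variance_ratio` is one plausible reading rather than the analysis itself.

```python
from statistics import mean, pvariance


def sentence_mdd(heads: list) -> float:
    """Mean dependency distance for one sentence.

    heads[i] is the 1-based position of the head of token i+1;
    0 marks the root, which is excluded from the average.
    """
    distances = [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]
    return mean(distances) if distances else 0.0


def variance_ratio(human_mdds: list, ai_mdds: list) -> float:
    """Population variance of human MDD values divided by that of AI values."""
    return pvariance(human_mdds) / pvariance(ai_mdds)


# "The cat sat": The -> cat (2), cat -> sat (3), sat is the root (0).
print(sentence_mdd([2, 3, 0]))
```

A ratio above 1 from `variance_ratio` would indicate that the human sample spreads more widely than the AI sample, which is the direction the advisory describes.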
Mandatory rules (orchestrator)
1. Output continuous prose suitable for submission (no chat sign-offs, no "hope this helps").
2. Plain text only for math, if any: no raw `$$` LaTeX unless the user explicitly requests LaTeX.
3. Preserve author stance and citations if present; do not fabricate references.
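Parts of these rules lend themselves to an automated post-check. The following is a rough sketch, not part of the shipped scripts: the sign-off phrases, the citation pattern (parenthetical author-year only), and the function name `check_output` are all assumptions, and preserving author stance is not something a regex can verify.

```python
import re

# Hypothetical sign-off phrases; extend the list to match real model output.
SIGNOFF_RE = re.compile(r"hope this helps|let me know if|feel free to ask", re.IGNORECASE)
# Loose pattern for citations such as (Smith, 2020) or (Lee et al., 2019).
CITATION_RE = re.compile(r"\([A-Z][A-Za-z]+(?: et al\.)?,? \d{4}\)")


def check_output(original: str, rewritten: str, allow_latex: bool = False) -> list:
    """Return a list of rule violations found in the rewritten text."""
    problems = []
    if SIGNOFF_RE.search(rewritten):
        problems.append("chat sign-off")
    if not allow_latex and "$$" in rewritten:
        problems.append("raw $$ LaTeX")
    # Citations present in the draft must survive the rewrite.
    for cite in CITATION_RE.findall(original):
        if cite not in rewritten:
            problems.append(f"dropped citation {cite}")
    # Citations that appear only in the rewrite may be fabricated.
    for cite in CITATION_RE.findall(rewritten):
        if cite not in original:
            problems.append(f"new citation {cite}")
    return problems
```

An empty list means none of the checked patterns fired; it does not guarantee that the rewrite is otherwise compliant.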
Hosted HTTP API (optional, for non-Mac or remote use)
For non-Apple-Silicon machines or multi-user deployments, run the optional FastAPI server on a Mac host and connect via HTTP/OpenAPI:
1. Install: `pip install fastapi uvicorn[standard]`
2. Run: `uvicorn api.main:app --host 0.0.0.0 --port 8765` (set the `HUMANIZE_API_KEY` environment variable for auth)
3. Point MCP / OpenAPI tools at `https:///openapi.json`
4. Call `POST /v1/humanize` with JSON `{"text":"..."}` (plus an `Authorization: Bearer …` header)
See references/hosted_api.md for details.
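A client call against the hosted endpoint might be assembled as in the sketch below, using only the standard library. The helper name `build_humanize_request` is hypothetical, the base URL and key are placeholders, and the response format is not documented here, so the example stops at building the request and leaves sending it as a commented-out step.

```python
import json
import urllib.request


def build_humanize_request(base_url: str, text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/humanize request."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/humanize",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


# Sending requires a running server; the response shape is an unknown here.
# req = build_humanize_request("http://localhost:8765", "Draft text...", "secret")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```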
References
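Essay Humanization Rewriter (corpus-based)
Rewrites AI-generated argumentative and academic essays toward a human baseline style, drawing on the CAWSE (M/D bands) and LOCNESS corpora and contrasting them with DeepSeek-generated text. Includes a fine-tuned LoRA adapter (9.3 MB) and an inference script.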
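Skill contract
| Component | Path | Notes |
|---|---|---|
| Inference script | `scripts/inference.py` | Entry point: `humanize()` function or command-line interface |
| LoRA adapters | `assets/adapters/adapters.safetensors.json` | 12.3 MB base64 JSON; automatically decoded to binary on first run |
| Pattern weights | `data/analysis/weights.json` | Generated from the corpus; loaded by the inference module at runtime |
| Decoder | `scripts/decode_adapters.py` | Rebuilds the `.safetensors` binary from JSON (automatic or manual) |
| Installer | `scripts/install_deps.sh` | One-time step: `pip install mlx mlx-lm transformers` + decode |
| Base model | `Qwen/Qwen3-8B-MLX-4bit` | Downloaded from HuggingFace on first run (about 4.5 GB, cached) |
System requirements: Apple Silicon macOS with Python 3.9+.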
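Quick start
```bash
bash scripts/install_deps.sh                  # one-time step: install dependencies + decode adapters
python scripts/inference.py --file draft.txt  # adapters are decoded automatically if not yet decoded
```
Or call it from Python:
```python
from scripts.inference import humanize
print(humanize("Enter your AI-drafted essay text here..."))
```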
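Weighted pattern table (in descending priority)
When humanizing, handle the higher-weight rows first. Weights are data-driven results of corpus analysis (Mann-Whitney test); rows with zero weight were not statistically significant.
| ID | Weight | Category | Pattern |
|---|---|---|---|
| `P06_CLICHE_METAPHORS` | 0.1358 | vocabulary | Cliche metaphors |
| `P15_EM_DASH_OVERKILL` | 0.1358 | punctuation | Em dash overuse |
| `P21_MARKDOWN_ARTIFACTS` | 0.1358 | formatting | Markdown artifacts |
| `P23_TEXTBOOK_BOLDING` | 0.1358 | formatting | Textbook-style bolding |
| `P12_PRESENT_PARTICIPLE_TAIL` | 0.1133 | rhetorical | Present participle endings |
| `P10_RULE_OF_THREES` | 0.0806 | rhetorical | Three-part structures |
| `P04_AI_VOCABULARY` | 0.0621 | vocabulary | AI vocabulary |
| `P14_COMPULSIVE_SUMMARIES` | 0.0598 | rhetorical | Compulsive summaries |
| `P05_EXCESSIVE_ADVERBS` | 0.0540 | vocabulary | Overuse of adverbs |
| `P13_OVER_ATTRIBUTION` | 0.0529 | rhetorical | Over-attribution |
| `P11_FALSE_RANGES` | 0.0341 | rhetorical | False ranges |
| `P17_TRANSITION_OVERUSE` | 0.0001 | punctuation | Overuse of transition words |
| `P01_UNDUE_EMPHASIS` | 0.0000 | content | Undue emphasis |
| `P02_SUPERFICIAL_ANALYSIS` | 0.0000 | content | Superficial analysis |
| `P03_REGRESSION_TO_MEAN` | 0.0000 | content | Regression to the mean |
| `P07_REDUNDANT_MODIFIERS` | 0.0000 | vocabulary | Redundant modifiers |
| `P08_FILLER_HEDGING` | 0.0000 | vocabulary | Filler hedging |
| `P09_NEGATIVE_PARALLELISM` | 0.0000 | rhetorical | Negative parallelism |
| `P16_EN_DASH_AVOIDANCE` | 0.0000 | punctuation | Misuse of en dashes / hyphens in ranges |
| `P18_COLLABORATIVE_REGISTER` | 0.0000 | register | Collaborative register |
| `P19_LETTER_FORMALITY` | 0.0000 | register | Letter-style formality |
| `P20_INSTRUCTIONAL_CONDESCENSION` | 0.0000 | register | Instructional condescension |
| `P22_EXCESSIVE_LISTS` | 0.0000 | formatting | Overuse of bulleted/numbered lists |
| `P24_EMOJI_SYMBOL` | 0.0000 | formatting | Emoji/symbol insertion |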
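Syntactic complexity (MDD / ADD advisory)
In the CAWSE corpus, the mean dependency distance (MDD) of human writing in the Merit/Distinction range usually shows variability; AI text may cluster more tightly. When humanizing:
- Reference MDD means from the analysis: human about 2.334, AI about 2.455.
- Variance ratio (human/AI) about 1.715: prefer a natural mix of short and long dependency links over uniformly smoothed sentences.
- Avoid flattening every sentence to the minimum dependency length; that can read like another kind of machine polish.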
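Mandatory rules (orchestrator)
1. Output continuous prose suitable for submission (no chat sign-offs, no "hope this helps").
2. Use plain text only for any math: no raw `$$` LaTeX unless the user explicitly requests LaTeX.
3. Preserve the author's stance and citations (if any); do not fabricate references.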
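Hosted HTTP API (optional, for non-Mac or remote use)
For machines without Apple Silicon or for multi-user deployments, the optional FastAPI server can be run on a Mac host and reached over HTTP/OpenAPI:
1. Install: `pip install fastapi uvicorn[standard]`
2. Run: `uvicorn api.main:app --host 0.0.0.0 --port 8765` (set the `HUMANIZE_API_KEY` environment variable for authentication)
3. Point MCP/OpenAPI tools at `https:///openapi.json`
4. Call `POST /v1/humanize`, passing JSON `{"text":"..."}` (plus an `Authorization: Bearer …` header)
See references/hosted_api.md for details.
References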