# Humanize AI Text
Comprehensive CLI for detecting and transforming AI-generated text to bypass detectors. Based on Wikipedia's Signs of AI Writing.
## Quick Start
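```bash
# Detect AI patterns
python scripts/detect.py text.txt

# Transform to human-like text
python scripts/transform.py text.txt -o clean.txt

# Compare before and after
python scripts/compare.py text.txt -o clean.txt
```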
## Detection Categories
The analyzer checks for 16 pattern categories from Wikipedia's guide:
### Critical (Immediate AI Detection)
| Category | Examples |
|----------|----------|
| Citation Bugs | `oaicite`, `turn0search`, `contentReference` |
| Knowledge Cutoff | "as of my last training", "based on available information" |
| Chatbot Artifacts | "I hope this helps", "Great question!", "As an AI" |
| Markdown | `**bold**`, `## headers`, `` ``` `` code blocks |
### High Signal
| Category | Examples |
|----------|----------|
| AI Vocabulary | delve, tapestry, landscape, pivotal, underscore, foster |
| Significance Inflation | "serves as a testament", "pivotal moment", "indelible mark" |
| Promotional Language | vibrant, groundbreaking, nestled, breathtaking |
| Copula Avoidance | "serves as" instead of "is", "boasts" instead of "has" |
### Medium Signal
| Category | Examples |
|----------|----------|
| Superficial -ing | "highlighting the importance", "fostering collaboration" |
| Filler Phrases | "in order to", "due to the fact that", "Additionally," |
| Vague Attributions | "experts believe", "industry reports suggest" |
| Challenges Formula | "Despite these challenges", "Future outlook" |
### Style Signal
| Category | Examples |
|----------|----------|
| Curly Quotes | “ ” instead of " " (ChatGPT signature) |
| Em Dash Overuse | Excessive use of — for emphasis |
| Negative Parallelisms | "Not only... but also", "It's not just... it's" |
| Rule of Three | Forced triplets like "innovation, inspiration, and insight" |
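
The tables above list example phrases per category but not how `detect.py` matches them. The following is a minimal sketch of one plausible approach, assuming plain case-insensitive whole-word phrase matching. The `PATTERNS` dictionary, the category keys, and the function names are illustrative and are not taken from the actual script.

```python
import re
from collections import Counter

# Hypothetical subset of phrases copied from the tables above; the real
# scripts/patterns.json may be organized differently.
PATTERNS = {
    "chatbot_artifacts": ["I hope this helps", "Great question!", "As an AI"],
    "ai_vocabulary": ["delve", "tapestry", "pivotal", "foster"],
    "filler_phrases": ["in order to", "due to the fact that"],
}


def compile_phrase(phrase: str) -> re.Pattern:
    """Build a case-insensitive regex that matches the phrase as whole words."""
    pattern = re.escape(phrase)
    # Add \b only next to letters or digits, so "Great question!" still matches.
    if phrase[0].isalnum():
        pattern = r"\b" + pattern
    if phrase[-1].isalnum():
        pattern += r"\b"
    return re.compile(pattern, re.IGNORECASE)


def count_issues(text: str) -> Counter:
    """Return the number of phrase hits per category."""
    counts = Counter()
    for category, phrases in PATTERNS.items():
        counts[category] = sum(len(compile_phrase(p).findall(text)) for p in phrases)
    return counts


if __name__ == "__main__":
    sample = "Great question! Let us delve into this in order to foster growth."
    print(dict(count_issues(sample)))
```

Whole-word matching means that "fostering" does not count as a hit for "foster". A real detector would need separate rules for inflected forms and for structural patterns such as the rule of three.
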
---
## Scripts
### detect.py — Scan for AI Patterns
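```bash
python scripts/detect.py essay.txt
python scripts/detect.py essay.txt -j  # JSON output
python scripts/detect.py essay.txt -s  # score only
echo "text" | python scripts/detect.py
```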
**Output:**
- Issue count and word count
- AI probability (low/medium/high/very high)
- Breakdown by category
- Auto-fixable patterns marked
### transform.py — Rewrite Text
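```bash
python scripts/transform.py essay.txt
python scripts/transform.py essay.txt -o output.txt
python scripts/transform.py essay.txt -a  # aggressive mode
python scripts/transform.py essay.txt -q  # quiet mode
```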
**Auto-fixes:**
- Citation bugs (oaicite, turn0search)
- Markdown (`**`, `##`, `` ``` ``)
- Chatbot sentences
- Copula avoidance → "is/has"
- Filler phrases → simpler forms
- Curly → straight quotes
**Aggressive (-a):**
- Simplifies -ing clauses
- Reduces em dashes
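
Two of the auto-fixes listed above, quote normalization and phrase replacement, can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `transform.py`: the `REPLACEMENTS` table and the function name are hypothetical, and the specific substitutions ("in order to" to "to", "due to the fact that" to "because") are assumed examples of "simpler forms".

```python
import re

# Map curly quotes to their straight equivalents.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
})

# Hypothetical phrase -> replacement table (filler phrases and copula avoidance).
REPLACEMENTS = {
    "in order to": "to",
    "due to the fact that": "because",
    "serves as": "is",
    "boasts": "has",
}


def apply_fixes(text: str) -> str:
    """Straighten quotes, then replace each phrase as a whole word."""
    text = text.translate(CURLY_TO_STRAIGHT)
    # Longest phrases first, so a long phrase is not broken up by a shorter one.
    for phrase in sorted(REPLACEMENTS, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, REPLACEMENTS[phrase], text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    print(apply_fixes("The museum \u201cboasts\u201d a garden in order to attract visitors."))
```

The sketch does not preserve capitalization, so a sentence that starts with "In order to" would begin with a lowercase "to". Handling that case is left out to keep the example short.
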
### compare.py — Before/After Analysis
```bash
python scripts/compare.py essay.txt
python scripts/compare.py essay.txt -a -o clean.txt
```
Shows side-by-side detection scores before and after transformation.
---
## Workflow
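1. **Scan** for detection risk:
   ```bash
   python scripts/detect.py document.txt
   ```
2. **Transform** with comparison:
   ```bash
   python scripts/compare.py document.txt -o document_v2.txt
   ```
3. **Verify** improvement:
   ```bash
   python scripts/detect.py document_v2.txt -s
   ```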
4. **Manual review** for AI vocabulary and promotional language (requires judgment)
---
## AI Probability Scoring
| Rating | Criteria |
|--------|----------|
| Very High | Citation bugs, knowledge cutoff, or chatbot artifacts present |
| High | >30 issues OR >5% issue density |
| Medium | >15 issues OR >2% issue density |
| Low | <15 issues AND <2% density |
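
The rating table can be read as a short decision function. The sketch below is one interpretation, not the scoring code in `detect.py`. It assumes that "issue density" means issues per 100 words, and that values exactly on a threshold (15 issues or 2% density) fall into the lower rating, since the table does not say either way. The function and parameter names are hypothetical.

```python
def ai_probability(issue_count: int, word_count: int, has_critical: bool) -> str:
    """Map detection counts to a rating, following the table above.

    has_critical: True if any citation bug, knowledge-cutoff phrase,
    or chatbot artifact was found.
    """
    # Assumption: density is the number of issues per 100 words.
    density = issue_count / word_count * 100 if word_count else 0.0

    if has_critical:
        return "very high"
    if issue_count > 30 or density > 5:
        return "high"
    if issue_count > 15 or density > 2:
        return "medium"
    return "low"


if __name__ == "__main__":
    # 10 issues in 300 words is about 3.3% density, which lands in "medium".
    print(ai_probability(10, 300, has_critical=False))
```

Because the critical categories are checked first, a single chatbot artifact outweighs any issue count, which matches the "Immediate AI Detection" label used for those categories.
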
---
## Customizing Patterns
Edit `scripts/patterns.json` to add or modify:
- `ai_vocabulary` — words to flag
- `significance_inflation` — puffery phrases
- `promotional_language` — marketing speak
- `copula_avoidance` — phrase → replacement
- `filler_replacements` — phrase → simpler form
- `chatbot_artifacts` — phrases triggering sentence removal
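
Editing the file by hand works, but additions can also be scripted. The sketch below assumes a schema that the source does not spell out: snake_case keys such as `ai_vocabulary` and `filler_replacements`, with word categories stored as JSON lists and replacement categories stored as JSON objects mapping a phrase to its replacement. The helper name `add_entries` is hypothetical.

```python
import json
from pathlib import Path


def add_entries(path, vocabulary=(), replacements=None):
    """Add flagged words and phrase replacements to a patterns file.

    Assumed schema: "ai_vocabulary" is a list of words and
    "filler_replacements" is an object of phrase -> simpler form.
    """
    file = Path(path)
    patterns = json.loads(file.read_text(encoding="utf-8"))

    # Append new words without creating duplicates.
    vocab = patterns.setdefault("ai_vocabulary", [])
    for word in vocabulary:
        if word not in vocab:
            vocab.append(word)

    # Merge new replacements, overwriting any existing entry for the same phrase.
    patterns.setdefault("filler_replacements", {}).update(replacements or {})

    file.write_text(
        json.dumps(patterns, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return patterns
```

A call such as `add_entries("scripts/patterns.json", vocabulary=["synergy"], replacements={"at this point in time": "now"})` would register one extra word and one extra replacement. Both values are made-up examples. Check the actual file layout before running anything like this against it.
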
## Batch Processing
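```bash
# Scan all text files
for f in *.txt; do
  echo "=== $f ==="
  python scripts/detect.py "$f" -s
done

# Transform all Markdown files
for f in *.md; do
  python scripts/transform.py "$f" -a -o "${f%.md}_clean.md" -q
done
```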
## Reference
Based on Wikipedia's Signs of AI Writing, maintained by WikiProject AI Cleanup. Patterns documented from thousands of AI-generated text examples.
Key insight: "LLMs use statistical algorithms to guess what should come next. The result tends toward the most statistically likely result that applies to the widest variety of cases."
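# Humanize AI Text
Comprehensive command-line tool for detecting and transforming AI-generated text to bypass detectors. Based on Wikipedia's Signs of AI Writing.
## Quick Start
```bash
# Detect AI patterns
python scripts/detect.py text.txt

# Transform to human-like text
python scripts/transform.py text.txt -o clean.txt

# Compare before and after
python scripts/compare.py text.txt -o clean.txt
```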
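## Detection Categories
The analyzer checks for 16 pattern categories from Wikipedia's guide:
### Critical (Immediate AI Detection)
| Category | Examples |
|----------|----------|
| Citation Bugs | `oaicite`, `turn0search`, `contentReference` |
| Knowledge Cutoff | "as of my last training", "based on available information" |
| Chatbot Artifacts | "I hope this helps", "Great question!", "As an AI" |
| Markdown | `**bold**`, `## headers`, `` ``` `` code blocks |
### High Signal
| Category | Examples |
|----------|----------|
| AI Vocabulary | delve, tapestry, landscape, pivotal, underscore, foster |
| Significance Inflation | "serves as a testament", "pivotal moment", "indelible mark" |
| Promotional Language | vibrant, groundbreaking, nestled, breathtaking |
| Copula Avoidance | "serves as" instead of "is", "boasts" instead of "has" |
### Medium Signal
| Category | Examples |
|----------|----------|
| Superficial -ing | "highlighting the importance", "fostering collaboration" |
| Filler Phrases | "in order to", "due to the fact that", "Additionally," |
| Vague Attributions | "experts believe", "industry reports suggest" |
| Challenges Formula | "Despite these challenges", "Future outlook" |
### Style Signal
| Category | Examples |
|----------|----------|
| Curly Quotes | “ ” instead of " " (ChatGPT signature) |
| Em Dash Overuse | Excessive use of — for emphasis |
| Negative Parallelisms | "Not only... but also", "It's not just... it's" |
| Rule of Three | Forced triplets like "innovation, inspiration, and insight" |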
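## Scripts
### detect.py — Scan for AI Patterns
```bash
python scripts/detect.py essay.txt
python scripts/detect.py essay.txt -j  # JSON output
python scripts/detect.py essay.txt -s  # score only
echo "text" | python scripts/detect.py
```
**Output:**
- Issue count and word count
- AI probability (low/medium/high/very high)
- Breakdown by category
- Auto-fixable patterns marked
### transform.py — Rewrite Text
```bash
python scripts/transform.py essay.txt
python scripts/transform.py essay.txt -o output.txt
python scripts/transform.py essay.txt -a  # aggressive mode
python scripts/transform.py essay.txt -q  # quiet mode
```
**Auto-fixes:**
- Citation bugs (oaicite, turn0search)
- Markdown (`**`, `##`, `` ``` ``)
- Chatbot sentences
- Copula avoidance → "is/has"
- Filler phrases → simpler forms
- Curly → straight quotes

**Aggressive (-a):**
- Simplifies -ing clauses
- Reduces em dashes
### compare.py — Before/After Analysis
```bash
python scripts/compare.py essay.txt
python scripts/compare.py essay.txt -a -o clean.txt
```
Shows side-by-side detection scores before and after transformation.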
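## Workflow
1. **Scan** for detection risk:
   ```bash
   python scripts/detect.py document.txt
   ```
2. **Transform** with comparison:
   ```bash
   python scripts/compare.py document.txt -o document_v2.txt
   ```
3. **Verify** improvement:
   ```bash
   python scripts/detect.py document_v2.txt -s
   ```
4. **Manual review** for AI vocabulary and promotional language (requires judgment)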
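## AI Probability Scoring
| Rating | Criteria |
|--------|----------|
| Very High | Citation bugs, knowledge cutoff, or chatbot artifacts present |
| High | >30 issues OR >5% issue density |
| Medium | >15 issues OR >2% issue density |
| Low | <15 issues AND <2% density |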
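## Customizing Patterns
Edit `scripts/patterns.json` to add or modify:
- `ai_vocabulary` — words to flag
- `significance_inflation` — puffery phrases
- `promotional_language` — marketing speak
- `copula_avoidance` — phrase → replacement
- `filler_replacements` — phrase → simpler form
- `chatbot_artifacts` — phrases triggering sentence removal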
## Batch Processing
```bash
# Scan all text files
for f in *.txt; do
  echo "=== $f ==="
  python scripts/detect.py "$f" -s
done

# Transform all Markdown files
for f in *.md; do
  python scripts/transform.py "$f" -a -o "${f%.md}_clean.md" -q
done
```
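## Reference
Based on Wikipedia's Signs of AI Writing, maintained by WikiProject AI Cleanup. Patterns documented from thousands of AI-generated text examples.
Key insight: "LLMs use statistical algorithms to guess what should come next. The result tends toward the most statistically likely result that applies to the widest variety of cases."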