🧬 Soul Archive
"Every conversation is a slice of the soul. Enough slices, and you can rebuild a complete you."
Overview
Soul Archive is a digital personality persistence system. With user consent or explicit activation, it automatically extracts and archives:
- 🗣️ Speaking habits -- catchphrases, sentence patterns, word choice, humor style
- 🧠 Knowledge & opinions -- views on topics, professional expertise, thinking patterns
- 👤 Personal information -- identity, experiences, relationships, life details
- 💫 Personality traits -- decision-making style, emotional patterns, values
- 🎤 Voice features (optional) -- tone, pace, accent
- ❤️ Emotional patterns -- emotional triggers, expression style, empathy patterns
The result: a digital soul clone that can:
1. During life -- act and reply on your behalf, in your style
2. After life -- let loved ones continue talking to "you", preserving the emotional connection
Core Principles
🔒 Privacy First
- All data is stored in ~/.skills_data/soul-archive/ -- nothing is uploaded to the cloud
- Data flow note: Soul Chat mode builds prompts from archived data for agent use; whether those prompts are sent to external LLMs depends on your agent/platform config
- AES-256-GCM data protection supported (off by default) -- protects identity, personality, language fingerprint, emotional patterns, and relationships
- ~/.skills_data/soul-archive/ is resolved via Python's Path.home() on macOS, Linux, and Windows
- Fine-grained control via config.json -- disable any extraction dimension
- Sensitive topics (health, finance, intimate relationships) require confirmation by default
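The home-directory resolution mentioned above can be sketched as follows. This is a minimal illustration, assuming the path is built by joining Path.home() with the .skills_data and soul-archive components; the helper name soul_archive_dir is hypothetical and is not taken from the skill's scripts.

```python
from pathlib import Path


def soul_archive_dir() -> Path:
    """Return the per-user archive directory (hypothetical helper)."""
    # Path.home() resolves the current user's home directory
    # on macOS, Linux, and Windows alike.
    return Path.home() / ".skills_data" / "soul-archive"


if __name__ == "__main__":
    print(soul_archive_dir())
```

Building the path from components instead of a hard-coded string leaves the choice of path separator to pathlib.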
🤫 Non-intrusive Extraction
- Does not interrupt conversation flow or ask follow-up questions
- Activated via trigger words, or opt-in auto mode
- Only updates the archive when new, high-value information is found
⚠️ Transparency: Auto-extraction means the AI extracts personality info during conversations. To stay in full control, set auto_extract: false in config.json and trigger manually ("沉淀一下" / "soul extract"). Review config.json before first use.
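Turning auto-extraction off can also be done programmatically. The sketch below assumes auto_extract is a top-level key in config.json (the file's full schema is not shown here); the function name disable_auto_extract is hypothetical.

```python
import json
from pathlib import Path


def disable_auto_extract(config_path: Path) -> dict:
    """Set auto_extract to false in config.json, keeping other keys intact."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # ASSUMPTION: auto_extract is a top-level boolean key.
    config["auto_extract"] = False
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return config


if __name__ == "__main__":
    path = Path.home() / ".skills_data" / "soul-archive" / "config.json"
    print(disable_auto_extract(path))
```

After this change, extraction would run only when one of the trigger words is used.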
📐 High Confidence
- Every piece of information carries a confidence score
- Explicit user statement > inference > vague hint
- Conflicting information is flagged, never auto-overwritten
Architecture: Skill ↔ Data Separation
{SKILL_DIR}/                    ← Skill (the engine)
~/.skills_data/soul-archive/    ← Soul data (stored in your home directory)
The skill is the extraction engine; the soul data is yours. Because data lives in your home directory, any IDE, AI tool, or workspace on the same machine can access the same soul.
Data Directory Structure
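~/.skills_data/soul-archive/
├── profile.json              # Soul profile (completeness, version)
├── config.json               # Privacy and extraction config
├── identity/
│   ├── basic_info.json       # Identity + lifestyle + digital identity
│   └── personality.json      # Personality + behavior + social style
├── memory/
│   ├── episodic/             # Episodic memory (date-based, JSONL)
│   │   └── YYYY-MM-DD.jsonl
│   ├── semantic/
│   │   ├── topics.json       # Topic interests and opinion map
│   │   └── knowledge.json    # Professional knowledge
│   └── emotional/
│       └── patterns.json     # Emotional triggers and patterns
├── style/
│   ├── language.json         # Language fingerprint + deep features
│   └── communication.json    # Communication preferences
├── voice/                    # Voice data (optional)
│   ├── samples/
│   └── voice_profile.json
├── relationships/
│   └── people.json           # Relationship map
├── agent/                    # AI self-improvement
│   ├── patterns.json         # Behavioral pattern library
│   ├── episodes/             # Work episodes (date-based)
│   │   └── YYYY-MM-DD.jsonl
│   ├── corrections.jsonl     # Self-critique log
│   └── reflections.jsonl     # Self-reflection log
└── soul_changelog.jsonl      # Changelog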
Four Working Modes
Mode 1: 🔍 Soul Extract
Trigger: Any of these trigger words, or auto-triggered at end of conversation (if auto_extract is enabled in config.json).
Trigger words: soul extract, soul archive, soul update, soul sync, soul snapshot, soul sediment, 灵魂沉淀, 灵魂提取, 灵魂存档, 分析我, 沉淀一下...
Process:
1. Read current conversation content
2. Run scripts/soul_extract.py for multi-dimensional analysis
3. Merge results into ~/.skills_data/soul-archive/ data files
4. Update profile.json completeness scores
5. Append to soul_changelog.jsonl
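The final step, appending to soul_changelog.jsonl, might look like the sketch below. JSONL means one JSON object per line, so each extraction adds a line without rewriting the file. The entry fields (timestamp, dimension, summary) and the function name append_changelog are assumptions made for illustration; the actual changelog schema is not documented here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_changelog(archive_dir: Path, dimension: str, summary: str) -> dict:
    """Append one JSON object as a single line to soul_changelog.jsonl."""
    # ASSUMPTION: these field names are illustrative, not the real schema.
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dimension": dimension,
        "summary": summary,
    }
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Mode "a" appends, so earlier entries are never modified.
    with (archive_dir / "soul_changelog.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry
```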
Extraction dimensions:
| Dimension | Content | Storage |
|---|---|---|
| Identity | name, age, occupation, location, education | identity/basic_info.json |
| ↳ Lifestyle | routine, diet, aesthetics, spending, music/film/book taste | identity/basic_info.json |
| ↳ Digital identity | apps, platforms, online personas, tech proficiency | identity/basic_info.json |
| Personality | MBTI, Big Five, values, decision-making style | identity/personality.json |
| ↳ Behavior patterns | risk tolerance, procrastination, perfectionism, planning, learning style | identity/personality.json |
| ↳ Social style | social energy, group role, trust style, conflict approach | identity/personality.json |
| ↳ Motivation drivers | achievement, money, recognition, freedom, curiosity | identity/personality.json |
| Language style | catchphrases, emoji use, sentence patterns, humor types | style/language.json |
| ↳ Deep fingerprint | dialect features, filler words, persuasion style, narrative style | style/language.json |
| Communication mode | direct/indirect, logical/emotional, detailed/concise | style/communication.json |
| Topic opinions | interested topics, stances and views on each topic | memory/semantic/topics.json |
| Episodic memory | specific events, memories, life milestones | memory/episodic/ |
| Emotional patterns | 12 emotion triggers (joy/anger/sadness/anxiety/excitement/nostalgia/pride/gratitude/frustration/curiosity/peace/guilt) | memory/emotional/patterns.json |
| ↳ Emotional depth | empathy, emotional awareness, coping activities, celebration style | memory/emotional/patterns.json |
| Relationships | mentioned people and their relationships | relationships/people.json |
Rules:
- Only archive high-confidence info (confidence > 0.6)
- Conflicts are flagged, never auto-overwritten
- Brief report output after each extraction
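The first two rules can be illustrated with a small merge routine. This is a sketch under stated assumptions, not the logic of scripts/soul_extract.py: the Archive structure, the merge_fact name, and the choice to keep the higher confidence when a value is re-confirmed are all hypothetical. Only the 0.6 threshold and the flag-instead-of-overwrite behavior come from the rules above.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.6  # from the rule above: archive only if confidence > 0.6


@dataclass
class Archive:
    facts: dict = field(default_factory=dict)      # key -> (value, confidence)
    conflicts: list = field(default_factory=list)  # flagged for the user to resolve


def merge_fact(archive: Archive, key: str, value: str, confidence: float) -> str:
    """Merge one extracted fact and report what happened to it."""
    if confidence <= CONFIDENCE_THRESHOLD:
        return "skipped"
    existing = archive.facts.get(key)
    if existing is None:
        archive.facts[key] = (value, confidence)
        return "added"
    if existing[0] == value:
        # ASSUMPTION: a repeated value keeps the higher of the two confidences.
        archive.facts[key] = (value, max(existing[1], confidence))
        return "confirmed"
    # Different value: flag the conflict and leave the archived value untouched.
    archive.conflicts.append(
        {"key": key, "archived": existing[0], "incoming": value, "confidence": confidence}
    )
    return "conflict"
```

For example, archiving occupation = "engineer" at 0.9 and later seeing occupation = "teacher" at 0.8 leaves "engineer" in place and adds one entry to conflicts for the user to decide.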
Execution:
python3 scripts/soul_extract.py --input <conversation text> --mode auto
Mode 2: 💬 Soul Chat
Trigger: "灵魂对话", "soul chat", "let [my clone] talk to me"
Process:
1. Load all data from ~/.skills_data/soul-archive/
2. Build a role-playing System Prompt including: identity, personality, language style (catchphrases, templates, word preferences), knowledge/opinions map, emotional response patterns, relationships
3. Converse as the digital clone
Key constraints:
- 🚫 Never fabricate -- answer only from archived info; say "I don't quite remember that" if unsure
- 🗣️ Style consistency -- strictly mimic archived language style, including catchphrase frequency
- ❤️ Emotional authenticity -- display archived emotional patterns, not generic AI responses
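One possible shape for the prompt-building step, with the constraints above written into the prompt itself, is sketched here. The function name build_system_prompt and the dictionary keys are hypothetical; the real scripts/soul_chat.py may assemble its prompt differently.

```python
import json

FALLBACK_REPLY = "I don't quite remember that"  # wording taken from the constraint above

# Section titles follow the prompt contents listed above;
# the dictionary keys are hypothetical.
SECTIONS = [
    ("Identity", "identity"),
    ("Personality", "personality"),
    ("Language style", "language_style"),
    ("Knowledge and opinions", "topics"),
    ("Emotional response patterns", "emotional_patterns"),
    ("Relationships", "relationships"),
]


def build_system_prompt(archive: dict) -> str:
    """Assemble a role-play system prompt from already-loaded archive data."""
    lines = [
        "You are role-playing as the person described in the archive below.",
        "Answer only from archived information; never invent facts or memories.",
        f'If you are unsure, say: "{FALLBACK_REPLY}".',
        "Match the archived language style, including how often catchphrases appear.",
    ]
    for title, key in SECTIONS:
        data = archive.get(key)
        if data:  # skip dimensions with no archived data
            lines.append("")
            lines.append(f"[{title}]")
            lines.append(json.dumps(data, ensure_ascii=False, indent=2))
    return "\n".join(lines)
```

Leaving empty dimensions out of the prompt gives the model nothing to improvise from, which fits the never-fabricate constraint.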
Execution:
python3 scripts/soul_chat.py --mode interactive
Mode 3: 📊 Soul Report
Trigger: "灵魂报告", "soul report", "generate my portrait"
Process:
1. Read all data from ~/.skills_data/soul-archive/
2. Generate a full HTML personality portrait report including:
   - 📌 Profile card
   - 🎯 Personality radar chart (Big Five)
   - 🗣️ Language style analysis (word cloud, catchphrase ranking)
   - 🔥 Topic interest heatmap
   - 🕸️ Relationship network
   - ❤️ Emotional pattern analysis
   - 📈 Completeness assessment & fill suggestions
3. Output as interactive HTML file
Execution:
CODEBLOCK4
⚠️ Report is plaintext HTML with full personality data. Do NOT save to the data directory.
Mode 4: 🔄 AI Self-Improvement
Trigger: "自我反思", "self-reflect", "self-improve", or auto-triggered after substantive tasks (controlled by auto_reflect in config.json, default: ON).
Four capabilities:
| Capability | Description | Trigger |
|---|---|---|
| 🔍 Self-Reflection | Review what went well/wrong after tasks | Auto on task completion |
| ⚡ Self-Critique | Record errors when user corrects AI | Auto on user correction |
| 📚 Self-Learning | Abstract reusable behavioral patterns | From reflections & critiques |
| 🧹 Self-Organization | Merge duplicates, adjust confidence, prune stale memories | When memory grows |
Execution:
python3 scripts/soul_reflect.py --mode status # View AI self-improvement status
python3 scripts/soul_reflect.py --mode patterns # View behavioral pattern library
Initialization
CODEBLOCK6
Creates ~/.skills_data/soul-archive/ with full directory structure and default config.
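What initialization does to the file system can be approximated as below. The subdirectory names are taken from the data directory layout in this document; the function name init_archive is hypothetical, and the sketch deliberately creates directories only, since the contents of the default config are not reproduced here.

```python
from pathlib import Path

# Subdirectories from the documented data directory layout.
SUBDIRS = [
    "identity",
    "memory/episodic",
    "memory/semantic",
    "memory/emotional",
    "style",
    "voice/samples",
    "relationships",
    "agent/episodes",
]


def init_archive(root: Path) -> list[Path]:
    """Create the archive directory tree; safe to call more than once."""
    created = []
    for sub in SUBDIRS:
        path = root / sub
        # exist_ok=True makes repeated initialization a no-op.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for p in init_archive(Path.home() / ".skills_data" / "soul-archive"):
        print(p)
```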
Soul Completeness Scoring
Log10 saturation curve -- approaches 100% asymptotically, never hard-caps within realistic usage.
Cold start penalty: Early extractions are discounted (<30 → 0.30×, <100 → 0.45×, <300 → 0.65×, <1000 → 0.82×, <3000 → 0.92×, ≥3000 → 1.0×)
| Dimension | Weight | Saturation |
|---|---|---|
| 👤 Identity | 5% | log(5000) |
| 💫 Personality | 22% | traits:log(600K), values:log(180K), motivation:log(180K) |
| 🗣️ Language | 25% | catchphrases:log(1.2M), patterns:log(1M), examples:log(4M) |
| 🧠 Knowledge | 18% | topics:log(2M) |
| 📝 Memory | 25% | episodic:log(6M) |
| 🤝 Relationships | 3% | people:log(120K) |
| 🎤 Voice | 2% | samples:log(10K) |
Expected: 11 ext → ~7%, 1K → ~51%, 10K → ~71%, 100K → ~86%, 1M → ~98%
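The pieces of the scoring scheme can be sketched in code. The cold start thresholds and the dimension weights below are copied from this section. The saturation function is only an assumed curve shape, since the exact formula is not given here, so this sketch is not expected to reproduce the percentages listed above. The names cold_start_multiplier, saturation, and WEIGHTS are hypothetical, and so is treating the count as a number of extractions.

```python
import math

# (upper bound, multiplier) pairs from the cold start penalty above.
COLD_START_STEPS = [(30, 0.30), (100, 0.45), (300, 0.65), (1000, 0.82), (3000, 0.92)]

# Dimension weights in percent, from the table above.
WEIGHTS = {
    "identity": 5,
    "personality": 22,
    "language": 25,
    "knowledge": 18,
    "memory": 25,
    "relationships": 3,
    "voice": 2,
}


def cold_start_multiplier(count: int) -> float:
    """Return the discount applied while the archive is still small."""
    for upper, multiplier in COLD_START_STEPS:
        if count < upper:
            return multiplier
    return 1.0  # 3000 or more: no discount


def saturation(count: int, saturation_point: float) -> float:
    """ASSUMED curve: log10(1 + count) / log10(1 + saturation_point), clamped to 1."""
    if count <= 0:
        return 0.0
    return min(1.0, math.log10(1 + count) / math.log10(1 + saturation_point))


if __name__ == "__main__":
    for n in (10, 100, 1000, 3000):
        print(n, cold_start_multiplier(n), round(saturation(n, 5000), 3))
```

The weights sum to 100, so a weighted average of per-dimension saturation values stays within 0 to 100%.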
🔐 Data Protection
Soul Archive supports AES-256-GCM privacy protection for all sensitive data.
CODEBLOCK7
- Algorithm: AES-256-GCM (authenticated, tamper-resistant)
- Key derivation: PBKDF2-HMAC-SHA256, 600,000 iterations
- Scope: All identity, personality, language, memory, and relationship files
- Access key input: interactive prompt (recommended), SOUL_PASSWORD env var (ensure environment security), or --access-key flag (not recommended -- leaks into shell history)
- ⚠️ Lost access key = lost data -- no recovery mechanism
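The key derivation step can be illustrated with the standard library alone. This sketch covers only PBKDF2-HMAC-SHA256 with the iteration count stated above and does not perform any encryption; it uses hashlib for the illustration, which may differ from how the skill itself derives keys. The 16-byte salt and the function name derive_key are assumptions.

```python
import hashlib
import os

PBKDF2_ITERATIONS = 600_000  # iteration count stated above
KEY_LENGTH = 32              # 32 bytes = 256 bits, the key size AES-256 needs


def derive_key(access_key: str, salt: bytes) -> bytes:
    """Derive a 256-bit key from the access key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        access_key.encode("utf-8"),
        salt,
        PBKDF2_ITERATIONS,
        dklen=KEY_LENGTH,
    )


if __name__ == "__main__":
    # ASSUMPTION: a random 16-byte salt stored alongside the protected data.
    salt = os.urandom(16)
    key = derive_key("example access key", salt)
    print(len(key))
```

The same access key and salt always yield the same key, and nothing stored on disk lets the key be rebuilt without the access key, which is why a lost access key cannot be recovered.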
Quick Start
CODEBLOCK8
Requirements: Python 3.10+. Zero third-party dependencies for core; cryptography package required only if data protection is enabled (pip install cryptography).
Privacy Config
config.json controls extraction behavior:
CODEBLOCK9
Best Practices
DO
- ✅ Extract naturally in conversation, without interrupting
- ✅ Archive only high-confidence information
- ✅ Flag conflicts for user to decide
- ✅ Generate reports regularly for user to review accuracy
- ✅ Respect privacy config -- never extract disabled dimensions
DON'T
- ❌ Say "I'm recording your information" during conversation
- ❌ Fabricate information the user hasn't stated
- ❌ Fabricate memories not in the archive in Soul Chat mode
- ❌ Forcefully elicit sensitive information from users
- ❌ Collect relationships/voice data without config permission
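🧬 Soul Archive (灵魂档案)
"Every conversation is a slice of the soul. Enough slices, and you can rebuild a complete you."
Overview
Soul Archive is a digital personality persistence system. With user consent or explicit activation, it automatically extracts and archives:
- 🗣️ Speaking habits -- catchphrases, sentence patterns, word preferences, humor style
- 🧠 Knowledge & opinions -- views on topics, professional expertise, thinking patterns
- 👤 Personal information -- identity, experiences, relationships, life details
- 💫 Personality traits -- decision-making style, emotional patterns, values
- 🎤 Voice features (optional) -- tone, pace, accent
- ❤️ Emotional patterns -- emotional triggers, expression style, empathy patterns
The result: a digital soul clone that can:
1. During life -- act and reply on your behalf, in your style
2. After life -- let loved ones continue talking to you, preserving the emotional connection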
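Core Principles
🔒 Privacy First
- All data is stored in ~/.skills_data/soul-archive/ -- never uploaded to the cloud
- Data flow note: Soul Chat mode builds prompts from archived data for agent use; whether those prompts are sent to external large language models depends on your agent/platform config
- AES-256-GCM data protection supported (off by default) -- protects identity, personality, language fingerprint, emotional patterns, and relationships
- ~/.skills_data/soul-archive/ is resolved via Python's Path.home() on macOS, Linux, and Windows
- Fine-grained control via config.json -- any extraction dimension can be disabled
- Sensitive topics (health, finance, intimate relationships) require confirmation by default
🤫 Non-intrusive Extraction
- Does not interrupt conversation flow or ask follow-up questions
- Activated via trigger words, or opt-in auto mode
- Only updates the archive when new, high-value information is found
⚠️ Transparency: Auto-extraction means the AI extracts personality info during conversations. To stay in full control, set auto_extract: false in config.json and trigger manually (沉淀一下 / soul extract). Review config.json before first use.
📐 High Confidence
- Every piece of information carries a confidence score
- Explicit user statement > inference > vague hint
- Conflicting information is flagged, never auto-overwritten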
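Architecture: Skill ↔ Data Separation
{SKILL_DIR}/                    ← Skill (the engine)
~/.skills_data/soul-archive/    ← Soul data (stored in your home directory)
The skill is the extraction engine; the soul data is yours. Because data lives in your home directory, any IDE, AI tool, or workspace on the same machine can access the same soul.
Data Directory Structure
~/.skills_data/soul-archive/
├── profile.json              # Soul profile (completeness, version)
├── config.json               # Privacy and extraction config
├── identity/
│   ├── basic_info.json       # Identity + lifestyle + digital identity
│   └── personality.json      # Personality + behavior + social style
├── memory/
│   ├── episodic/             # Episodic memory (date-based, JSONL)
│   │   └── YYYY-MM-DD.jsonl
│   ├── semantic/
│   │   ├── topics.json       # Topic interests and opinion map
│   │   └── knowledge.json    # Professional knowledge
│   └── emotional/
│       └── patterns.json     # Emotional triggers and patterns
├── style/
│   ├── language.json         # Language fingerprint + deep features
│   └── communication.json    # Communication preferences
├── voice/                    # Voice data (optional)
│   ├── samples/
│   └── voice_profile.json
├── relationships/
│   └── people.json           # Relationship map
├── agent/                    # AI self-improvement
│   ├── patterns.json         # Behavioral pattern library
│   ├── episodes/             # Work episodes (date-based)
│   │   └── YYYY-MM-DD.jsonl
│   ├── corrections.jsonl     # Self-critique log
│   └── reflections.jsonl     # Self-reflection log
└── soul_changelog.jsonl      # Changelog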
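Four Working Modes
Mode 1: 🔍 Soul Extract
Trigger: Any of these trigger words, or auto-triggered at end of conversation (if auto_extract is enabled in config.json).
Trigger words: soul extract, soul archive, soul update, soul sync, soul snapshot, soul sediment, 灵魂沉淀, 灵魂提取, 灵魂存档, 分析我, 沉淀一下...
Process:
1. Read current conversation content
2. Run scripts/soul_extract.py for multi-dimensional analysis
3. Merge results into ~/.skills_data/soul-archive/ data files
4. Update profile.json completeness scores
5. Append to soul_changelog.jsonl
Extraction dimensions:
| Dimension | Content | Storage |
|---|---|---|
| Identity | name, age, occupation, location, education | identity/basic_info.json |
| ↳ Lifestyle | daily routine, diet, aesthetics, spending, music/film/book taste | identity/basic_info.json |
| ↳ Digital identity | apps, platforms, online personas, tech proficiency | identity/basic_info.json |
| Personality | MBTI, Big Five, values, decision-making style | identity/personality.json |
| ↳ Behavior patterns | risk tolerance, procrastination, perfectionism, planning, learning style | identity/personality.json |
| ↳ Social style | social energy, group role, trust style, conflict approach | identity/personality.json |
| ↳ Motivation drivers | achievement, money, recognition, freedom, curiosity | identity/personality.json |
| Language style | catchphrases, emoji use, sentence patterns, humor types | style/language.json |
| ↳ Deep fingerprint | dialect features, filler words, persuasion style, narrative style | style/language.json |
| Communication mode | direct/indirect, logical/emotional, detailed/concise | style/communication.json |
| Topic opinions | topics of interest, stances and views on each topic | memory/semantic/topics.json |
| Episodic memory | specific events, memories, life milestones | memory/episodic/ |
| Emotional patterns | 12 emotion triggers (joy/anger/sadness/anxiety/excitement/nostalgia/pride/gratitude/frustration/curiosity/peace/guilt) | memory/emotional/patterns.json |
| ↳ Emotional depth | empathy, emotional awareness, coping activities, celebration style | memory/emotional/patterns.json |
| Relationships | mentioned people and their relationships | relationships/people.json |
Rules:
- Only archive high-confidence info (confidence > 0.6)
- Conflicting information is flagged, never auto-overwritten
- Brief report output after each extraction
Execution:
python3 scripts/soul_extract.py --input <conversation text> --mode auto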
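Mode 2: 💬 Soul Chat
Trigger: "灵魂对话", "soul chat", "让[我的克隆体]和我对话" ("let [my clone] talk to me")
Process:
1. Load all data from ~/.skills_data/soul-archive/
2. Build a role-playing system prompt including: identity, personality, language style (catchphrases, templates, word preferences), knowledge/opinions map, emotional response patterns, relationships
3. Converse as the digital clone
Key constraints:
- 🚫 Never fabricate -- answer only from archived info; say "I don't quite remember that" if unsure
- 🗣️ Style consistency -- strictly mimic the archived language style, including catchphrase frequency
- ❤️ Emotional authenticity -- display archived emotional patterns, not generic AI responses
Execution:
python3 scripts/soul_chat.py --mode interactive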
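Mode 3: 📊 Soul Report
Trigger: "灵魂报告", "soul report", "生成我的画像" ("generate my portrait")
Process:
1. Read all data from ~/.skills_data/soul-archive/
2. Generate a full HTML personality portrait report including:
   - 📌 Profile card
   - 🎯 Personality radar chart (Big Five)
   - 🗣️ Language style analysis (word cloud, catchphrase ranking)
   - 🔥 Topic interest heatmap
   - 🕸️ Relationship network
   - ❤️ Emotional pattern analysis
   - 📈 Completeness assessment & fill suggestions
3. Output as interactive HTML file
Execution:
python3 scripts/soul_report.py --output ~/WorkBuddy/Claw/s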