🧬 Soul Archive

"Every conversation is a slice of the soul. Enough slices, and you can rebuild a complete you."

Overview

Soul Archive is a digital personality persistence system. With user consent or explicit activation, it automatically extracts and archives:

- 🗣️ Speaking habits -- catchphrases, sentence patterns, word choice, humor style
- 🧠 Knowledge & opinions -- views on topics, professional expertise, thinking patterns
- 👤 Personal information -- identity, experiences, relationships, life details
- 💫 Personality traits -- decision-making style, emotional patterns, values
- 🎤 Voice features (optional) -- tone, pace, accent
- ❤️ Emotional patterns -- emotional triggers, expression style, empathy patterns

The result: a digital soul clone that can:

1. During life -- act and reply on your behalf, in your style
2. After life -- let loved ones continue talking to "you", preserving the emotional connection

Core Principles

🔒 Privacy First

- All data stored in ~/.skills_data/soul-archive/ -- nothing uploaded to the cloud
- Data flow note: Soul Chat mode builds prompts from archived data for agent use; whether those prompts are sent to external LLMs depends on your agent/platform config
- AES-256-GCM data protection supported (off by default) -- protects identity, personality, language fingerprint, emotional patterns, and relationships
- ~/.skills_data/soul-archive/ is resolved via Python's Path.home() on macOS, Linux, and Windows
- Fine-grained control via config.json -- disable any extraction dimension
- Sensitive topics (health, finance, intimate relationships) require confirmation by default
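As an illustration of the Path.home() resolution mentioned above, here is a minimal sketch. The path segments come from this README; the helper name `soul_data_dir` is hypothetical and is not taken from the skill's scripts.

```python
from pathlib import Path


def soul_data_dir() -> Path:
    """Return the soul data directory under the current user's home.

    Hypothetical helper: the path segments come from this README,
    the function name does not.
    """
    return Path.home() / ".skills_data" / "soul-archive"


if __name__ == "__main__":
    # Prints a platform-specific absolute path.
    print(soul_data_dir())
```

Because Path.home() is evaluated at call time, the same code should yield the right location on macOS, Linux, and Windows without hard-coding a separator.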

🤫 Non-intrusive Extraction

- Does not interrupt conversation flow or ask follow-up questions
- Activated via trigger words, or opt-in auto mode
- Only updates the archive when new, high-value information is found

⚠️ Transparency: Auto-extraction means the AI extracts personality info during conversations. To stay in full control, set auto_extract: false in config.json and trigger manually ("沉淀一下" / "soul extract"). Review config.json before first use.

📐 High Confidence

- Every piece of information carries a confidence score
- Explicit user statement > inference > vague hint
- Conflicting information is flagged, never auto-overwritten

Architecture: Skill ↔ Data Separation

{SKILL_DIR}/                   ← Skill (engine)
~/.skills_data/soul-archive/   ← Soul data (stored in your home directory)

The skill is the extraction engine; the soul data is yours. Because data lives in your home directory, any IDE, AI tool, or workspace on the same machine can access the same soul.

Data Directory Structure

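~/.skills_data/soul-archive/
├── profile.json              # Soul profile (completeness, version)
├── config.json               # Privacy & extraction config
├── identity/
│   ├── basic_info.json       # Identity + lifestyle + digital identity
│   └── personality.json      # Personality + behavior + social style
├── memory/
│   ├── episodic/             # Episodic memory (date-based, JSONL)
│   │   └── YYYY-MM-DD.jsonl
│   ├── semantic/
│   │   ├── topics.json       # Topic interests & opinion map
│   │   └── knowledge.json    # Professional knowledge
│   └── emotional/
│       └── patterns.json     # Emotional triggers & patterns
├── style/
│   ├── language.json         # Language fingerprint + deep features
│   └── communication.json    # Communication preferences
├── voice/                    # Voice data (optional)
│   ├── samples/
│   └── voice_profile.json
├── relationships/
│   └── people.json           # Relationship graph
├── agent/                    # AI self-improvement
│   ├── patterns.json         # Behavioral pattern library
│   ├── episodes/             # Work episodes (date-based)
│   │   └── YYYY-MM-DD.jsonl
│   ├── corrections.jsonl     # Self-critique log
│   └── reflections.jsonl     # Self-reflection log
└── soul_changelog.jsonl      # Changelog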

Four Working Modes

Mode 1: 🔍 Soul Extract

Trigger: Any of these trigger words, or auto-triggered at end of conversation (if auto_extract is enabled in config.json).

Trigger words: soul extract, soul archive, soul update, soul sync, soul snapshot, soul sediment, 灵魂沉淀, 灵魂提取, 灵魂存档, 分析我, 沉淀一下...
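The trigger rule above can be sketched as follows. This is an illustration under assumptions, not the skill's actual matching logic: the helper name `should_extract`, the case-insensitive substring match, and treating a missing `auto_extract` key as disabled are all hypothetical choices.

```python
# Trigger words copied from the list above; the source list ends with "...",
# so this tuple is not exhaustive.
TRIGGER_WORDS = (
    "soul extract", "soul archive", "soul update", "soul sync",
    "soul snapshot", "soul sediment",
    "灵魂沉淀", "灵魂提取", "灵魂存档", "分析我", "沉淀一下",
)


def should_extract(message: str, config: dict, conversation_ended: bool = False) -> bool:
    """Decide whether to run Soul Extract (hypothetical helper).

    Runs when the message contains a trigger word, or when the conversation
    has ended and auto_extract is enabled in the config.
    """
    text = message.lower()
    if any(word in text for word in TRIGGER_WORDS):
        return True
    # Assumption: a missing auto_extract key is treated as disabled.
    return conversation_ended and bool(config.get("auto_extract", False))
```

Plain substring matching also fires when a message merely mentions the product name "Soul Archive", so a real implementation may want stricter matching.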

Process:

1. Read current conversation content
2. Run scripts/soul_extract.py for multi-dimensional analysis
3. Merge results into ~/.skills_data/soul-archive/ data files
4. Update profile.json completeness scores
5. Append to soul_changelog.jsonl
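The last step, appending to soul_changelog.jsonl, might look like the sketch below. Only the file name comes from this README; the function name `append_changelog` and the record fields (`timestamp`, `action`, `fields`) are assumptions for illustration, since the real changelog schema is not shown here.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_changelog(data_dir: Path, entry: dict) -> None:
    """Append one JSON object per line to soul_changelog.jsonl.

    Hypothetical helper: the record layout is an assumption.
    """
    record = {"timestamp": datetime.now(timezone.utc).isoformat(), **entry}
    data_dir.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier entries intact; one line per change.
    with (data_dir / "soul_changelog.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

JSONL suits an append-only log because each write is a single line and earlier records never need to be re-parsed or rewritten.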

Extraction dimensions:

Dimension	Content	Storage
Identity	name, age, occupation, location, education	identity/basic_info.json
↳ Lifestyle

Rules:

- Only archive high-confidence info (confidence > 0.6)
- Conflicts are flagged, never auto-overwritten
- Brief report output after each extraction
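The first two rules can be sketched as a small merge function. The 0.6 threshold and the "flag, never overwrite" behavior come from the rules above; the function name `merge_fact`, the record layout, and the choice to keep the higher confidence when the same value is seen again are assumptions.

```python
CONFIDENCE_THRESHOLD = 0.6  # from the rule above: archive only confidence > 0.6


def merge_fact(archive: dict, key: str, value, confidence: float) -> str:
    """Merge one extracted fact into the archive (hypothetical helper).

    Returns "skipped", "added", "unchanged", or "conflict".
    Assumed record layout: {"value": ..., "confidence": ..., "conflicts": [...]}.
    """
    if confidence <= CONFIDENCE_THRESHOLD:
        return "skipped"
    existing = archive.get(key)
    if existing is None:
        archive[key] = {"value": value, "confidence": confidence, "conflicts": []}
        return "added"
    if existing["value"] == value:
        # Assumption: a repeated value keeps the higher of the two confidences.
        existing["confidence"] = max(existing["confidence"], confidence)
        return "unchanged"
    # Different value: flag it for the user instead of overwriting.
    existing["conflicts"].append({"value": value, "confidence": confidence})
    return "conflict"
```

Note that a conflicting value is recorded even when its confidence is higher than the stored one, so the decision stays with the user.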

Execution:
python3 scripts/soul_extract.py --input <conversation text> --mode auto

Mode 2: 💬 Soul Chat

Trigger: "灵魂对话", "soul chat", "let [my clone] talk to me"

Process:

1. Load all data from ~/.skills_data/soul-archive/
2. Build a role-playing System Prompt including: identity, personality, language style (catchphrases, templates, word preferences), knowledge/opinions map, emotional response patterns, relationships
3. Converse as the digital clone
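The prompt-building step could be sketched roughly as below. The section titles follow the list above; the function name `build_system_prompt`, the dictionary keys, and the prompt wording are assumptions, and scripts/soul_chat.py may assemble its prompt differently.

```python
def build_system_prompt(soul: dict) -> str:
    """Assemble a role-play system prompt from archived sections.

    Hypothetical helper: the dict layout is an assumption. Sections with
    no archived data are omitted rather than filled in.
    """
    sections = [
        ("Identity", soul.get("identity")),
        ("Personality", soul.get("personality")),
        ("Language style", soul.get("language_style")),
        ("Knowledge and opinions", soul.get("knowledge")),
        ("Emotional response patterns", soul.get("emotional_patterns")),
        ("Relationships", soul.get("relationships")),
    ]
    lines = [
        "You are speaking as the person described below.",
        "Answer only from the archived information; if unsure, say "
        '"I don\'t quite remember that".',
    ]
    for title, content in sections:
        if content:  # skip missing or empty sections
            lines.append(f"\n## {title}\n{content}")
    return "\n".join(lines)
```

Leaving empty sections out of the prompt, instead of inserting placeholders, gives the model nothing to embellish for dimensions that were never archived.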

Key constraints:

- 🚫 Never fabricate -- answer only from archived info; say "I don't quite remember that" if unsure
- 🗣️ Style consistency -- strictly mimic archived language style, including catchphrase frequency
- ❤️ Emotional authenticity -- display archived emotional patterns, not generic AI responses

Execution:
python3 scripts/soul_chat.py --mode interactive

Mode 3: 📊 Soul Report

Trigger: "灵魂报告", "soul report", "generate my portrait"

Process:

1. Read all data from ~/.skills_data/soul-archive/
2. Generate a full HTML personality portrait report including:

- 📌 Profile card
- 🎯 Personality radar chart (Big Five)
- 🗣️ Language style analysis (word cloud, catchphrase ranking)
- 🔥 Topic interest heatmap
- 🕸️ Relationship network
- ❤️ Emotional pattern analysis
- 📈 Completeness assessment & fill suggestions

3. Output as interactive HTML file

Execution:
python3 scripts/soul_report.py --output ~/WorkBuddy/Claw/s...

⚠️ Report is plaintext HTML with full personality data. Do NOT save to the data directory.

Mode 4: 🔄 AI Self-Improvement

Trigger: "自我反思", "self-reflect", "self-improve", or auto-triggered after substantive tasks (controlled by auto_reflect in config.json, default: ON).

Four capabilities:

Capability	Description	Trigger
🔍 Self-Reflection	Review what went well/wrong after tasks	Auto on task completion
⚡ Self-Critique

Execution:

python3 scripts/soul_reflect.py --mode status    # View AI self-improvement status
python3 scripts/soul_reflect.py --mode patterns  # View behavioral pattern library

Initialization

CODEBLOCK6

Creates ~/.skills_data/soul-archive/ with full directory structure and default config.

Soul Completeness Scoring

Log10 saturation curve -- approaches 100% asymptotically, never hard-caps within realistic usage.

Cold start penalty: Early extractions are discounted (<30 → 0.30×, <100 → 0.45×, <300 → 0.65×, <1000 → 0.82×, <3000 → 0.92×, ≥3000 → 1.0×)
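The cold start tiers above map directly onto a lookup. In this sketch the thresholds and multipliers are taken from the line above; the function name `cold_start_multiplier` is hypothetical, and reading the thresholds as a count of extractions is an assumption.

```python
# (exclusive upper bound on extraction count, multiplier), from the tiers above.
COLD_START_TIERS = [
    (30, 0.30),
    (100, 0.45),
    (300, 0.65),
    (1000, 0.82),
    (3000, 0.92),
]


def cold_start_multiplier(extraction_count: int) -> float:
    """Return the discount applied at a given extraction count."""
    for upper_bound, multiplier in COLD_START_TIERS:
        if extraction_count < upper_bound:
            return multiplier
    return 1.0  # 3000 or more extractions: no discount
```

For example, `cold_start_multiplier(50)` falls in the second tier, so a raw score would be scaled by 0.45.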

Dimension	Weight	Saturation
👤 Identity	5%	log(5000)
💫 Personality

Expected completeness: 11 extractions → ~7%, 1K → ~51%, 10K → ~71%, 100K → ~86%, 1M → ~98%

🔐 Data Protection

Soul Archive supports AES-256-GCM privacy protection for all sensitive data.

CODEBLOCK7

- Algorithm: AES-256-GCM (authenticated, tamper-resistant)
- Key derivation: PBKDF2-HMAC-SHA256, 600,000 iterations
- Scope: All identity, personality, language, memory, and relationship files
- Access key input: Interactive prompt (recommended), SOUL_PASSWORD env var (ensure environment security), --access-key flag (not recommended -- leaks into shell history)
- ⚠️ Lost access key = lost data -- no recovery mechanism
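The key derivation step can be illustrated with the standard library alone. The algorithm and iteration count come from the list above; the function name `derive_key`, the UTF-8 encoding of the access key, and the 16-byte random salt are assumptions. The AES-256-GCM encryption itself needs the cryptography package and is not shown.

```python
import hashlib
import os

PBKDF2_ITERATIONS = 600_000  # from the list above
KEY_LENGTH = 32              # 32 bytes = 256-bit key for AES-256


def derive_key(access_key: str, salt: bytes) -> bytes:
    """Derive a 256-bit key with PBKDF2-HMAC-SHA256 (hypothetical helper)."""
    return hashlib.pbkdf2_hmac(
        "sha256", access_key.encode("utf-8"), salt, PBKDF2_ITERATIONS, dklen=KEY_LENGTH
    )


if __name__ == "__main__":
    # Assumption: a random 16-byte salt is generated once and stored
    # alongside the protected data so the key can be re-derived later.
    salt = os.urandom(16)
    key = derive_key("example access key", salt)
    print(len(key))
```

The same access key and salt always produce the same key, which is why losing the access key makes the data unrecoverable.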

Quick Start

CODEBLOCK8

Requirements: Python 3.10+. Zero third-party dependencies for core; cryptography package required only if data protection is enabled (pip install cryptography).

Privacy Config

config.json controls extraction behavior:

CODEBLOCK9

Best Practices

DO

- ✅ Extract naturally in conversation, without interrupting
- ✅ Archive only high-confidence information
- ✅ Flag conflicts for user to decide
- ✅ Generate reports regularly for user to review accuracy
- ✅ Respect privacy config -- never extract disabled dimensions

DON'T

- ❌ Say "I'm recording your information" during conversation
- ❌ Fabricate information the user hasn't stated
- ❌ Fabricate memories not in the archive in Soul Chat mode
- ❌ Forcefully elicit sensitive information from users
- ❌ Collect relationships/voice data without config permission
