Data Vault

Installation

```bash
uv pip install pylance pandas
```

A persistent data store using the Lance columnar format for fast ML data access.

Quick Start

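```bash
# List all datasets and their metadata
python3 scripts/command.py list-datasets-info

# Create a dataset
python3 scripts/command.py create-dataset <name> <field1> <field2> ...

# Append data
python3 scripts/command.py append-to-dataset <name> <value1> <value2> ...

# Read all records in a dataset
python3 scripts/command.py read-dataset <name>
```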

Note: list-datasets-info shows dataset metadata (schema, field types, record count) — it does not return the actual data rows. Use read-dataset to retrieve records.

Storage Location

Datasets are created and stored in the current working directory (`.`).

Critical Behavior: Data Type Strictness

⚠️ Lance is strict about data types — they CANNOT change after the first record

When you append the first record to a dataset, Lance infers the data type for each field. All subsequent records MUST use the same types.

Example — this FAILS:

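```bash
# First record: age as a string
append-to-dataset users "John" "25" "john@test.com"

# Second record: age as an integer (will fail!)
append-to-dataset users "Jane" 30 "jane@test.com"

# Error: age should be large_string, but got int64
```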

Correct approach — maintain consistent types:

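```bash
# First record: age as a string
append-to-dataset users "John" "25" "john@test.com"

# Second record: age as a string
append-to-dataset users "Jane" "30" "jane@test.com"
```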

Why This Matters

Unlike traditional databases that may coerce types, Lance rejects type mismatches. If you store numbers as strings initially, you must always pass strings. Plan your schema carefully.
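The locking rule can be sketched in plain Python as a pre-flight check before an append. This is only an illustration, not the skill's own code: `LANCE_TYPE_NAMES` and `find_type_mismatches` are hypothetical names, and the mapping assumes only the two types this document mentions (`str` to `large_string`, `int` to `int64`).

```python
# Hypothetical pre-flight check: compare a record against locked field types.
# The mapping covers only the two types named in this document.
LANCE_TYPE_NAMES = {str: "large_string", int: "int64"}


def find_type_mismatches(field_types, record):
    """Return (field, expected, actual) for each value whose type differs."""
    mismatches = []
    for field, value in record.items():
        expected = field_types.get(field)
        actual = LANCE_TYPE_NAMES.get(type(value), type(value).__name__)
        if expected is not None and expected != actual:
            mismatches.append((field, expected, actual))
    return mismatches


locked = {"name": "large_string", "age": "large_string", "email": "large_string"}
bad = find_type_mismatches(locked, {"name": "Jane", "age": 30, "email": "jane@test.com"})
ok = find_type_mismatches(locked, {"name": "Jane", "age": "30", "email": "jane@test.com"})
print(bad)  # one mismatch, for the age field
print(ok)   # no mismatches
```

Running a check like this before appending surfaces a mismatch without touching the dataset.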

Initialization Workflow

When starting a session, always initialize by listing existing datasets first:

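```bash
# This command returns all datasets and their structure
python3 scripts/command.py list-datasets-info
```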

Example output:

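```json
{
  "skill": "data-vault",
  "operation": "list_datasets_info",
  "status": "success",
  "data": [
    {
      "dataset_name": "users",
      "path": "/data/users",
      "fields": ["name", "age", "email"],
      "field_types": {
        "_id": "large_string",
        "_updated_at": "timestamp[us]",
        "name": "large_string",
        "age": "large_string",
        "email": "large_string"
      },
      "record_count": 2,
      "columns": ["_id", "_updated_at", "name", "age", "email"],
      "last_updated": "2026-03-21T17:57:44.595628"
    }
  ],
  "error": null
}
```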

Understanding `field_types`

| State | Meaning |
|---|---|
| `{}` (empty) | Dataset exists but has no records yet; types are not yet defined |
| populated | Types are locked; appends must match |

Important: If field_types is empty, the first append will define types. Be deliberate about the first record's types.
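The two states can be told apart programmatically. The sketch below assumes the response shape described in this document (a `data` list whose entries carry `dataset_name` and `field_types`); the sample payload is abridged, the `events` dataset is invented for illustration, and `split_by_type_state` is a hypothetical helper name.

```python
import json

# Abridged sample payload; the "events" dataset is made up for this example.
SAMPLE = """
{
  "skill": "data-vault",
  "status": "success",
  "data": [
    {"dataset_name": "users", "field_types": {"name": "large_string", "age": "large_string"}},
    {"dataset_name": "events", "field_types": {}}
  ],
  "error": null
}
"""


def split_by_type_state(payload):
    """Return (locked, unlocked) dataset names based on field_types."""
    locked, unlocked = [], []
    for info in payload.get("data") or []:
        # A non-empty field_types mapping means the types are already locked.
        target = locked if info.get("field_types") else unlocked
        target.append(info["dataset_name"])
    return locked, unlocked


locked, unlocked = split_by_type_state(json.loads(SAMPLE))
print("types locked:", locked)
print("types not yet defined:", unlocked)
```

Datasets in the second list are the ones where the next append decides the types.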

Commands Reference

Create Dataset

```bash
python3 scripts/command.py create-dataset <name> <field1> <field2> ...
```

Creates a metadata entry. Fields have no types until the first append.

Append Record

```bash
python3 scripts/command.py append-to-dataset <name> <value1> <value2> ...
```

Appends one record. Types are inferred from the first record.

Batch Append

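```bash
python3 scripts/command.py batch-append-to-dataset <name>
```

Example: `batch-append-to-dataset users '[["Alice", "22", "alice@test.com"], ["Bob", "35", "bob@test.com"]]'`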

Update Record

```bash
python3 scripts/command.py update-dataset-record <name> <record_id> <value1> <value2> ...
```

Updates fields for a specific record by ID.

Delete Record

```bash
python3 scripts/command.py delete-dataset-record <name> <record_id>
```

List All Datasets

```bash
python3 scripts/command.py list-datasets
```

Get Dataset Info

```bash
python3 scripts/command.py get-dataset-info <name>
```

Returns schema, field types (if data exists), and record count.

List All Datasets with Full Info

```bash
python3 scripts/command.py list-datasets-info
```

Recommended for initialization. Returns all datasets with complete metadata.

Get Dataset Path

```bash
python3 scripts/command.py get-dataset-path-info <name>
```

Backup Dataset

```bash
python3 scripts/command.py backup-dataset <name> <backup_path>
```

Count Records

```bash
python3 scripts/command.py count-records <name>
```

Read All Records

Returns all records from the dataset as a list of objects.

```bash
python3 scripts/command.py read-dataset <name>
```

Drop Dataset

Deletes the entire dataset and its metadata. Requires confirmation if you have not created a backup beforehand.

```bash
python3 scripts/command.py drop-dataset <name>
```

Internal fields available in every dataset:

| Field | Type | Description |
|---|---|---|
| `_id` | string | UUID: unique record identifier |
| `_updated_at` | timestamp | When the record was last inserted or updated |

List Records (Paginated)

```bash
python3 scripts/command.py list-records <name> --limit 10 --offset 0
```

Returns records with optional pagination.
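Walking a whole dataset with `--limit` and `--offset` amounts to the loop sketched below. `iter_all_records` and `fake_fetch` are hypothetical names; a real `fetch_page` would run `list-records` and return the list found under `data` in its JSON response, which is an assumption about the response shape.

```python
def iter_all_records(fetch_page, page_size=10):
    """Yield every record by calling fetch_page(limit, offset) until a short page."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        # A page shorter than page_size (including an empty one) is the last.
        if len(page) < page_size:
            break
        offset += page_size


# In-memory stand-in for the CLI call, used only to exercise the loop.
FAKE_ROWS = [{"n": i} for i in range(25)]


def fake_fetch(limit, offset):
    return FAKE_ROWS[offset:offset + limit]


records = list(iter_all_records(fake_fetch, page_size=10))
print(len(records))
```

Stopping on the first short page avoids a separate `count-records` call, at the cost of one extra request when the total is an exact multiple of the page size.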

Get Single Record

```bash
python3 scripts/command.py get-record <name> <record_id>
```

Retrieves a specific record by its UUID.

Response Format

All commands return JSON:

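```json
{
  "skill": "data-vault",
  "operation": "<operation_name>",
  "status": "success|error",
  "data": <result data or null>,
  "error": <error message or null>
}
```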
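Because every command returns the same JSON envelope, a caller can unwrap it in one place. A minimal sketch, assuming the `status`, `data`, and `error` keys described in this section: `VaultError` and `unwrap_response` are hypothetical names, and the `operation` values, the count, and the error message in the two sample strings are invented for illustration.

```python
import json


class VaultError(RuntimeError):
    """Raised when a response reports a status other than "success"."""


def unwrap_response(raw):
    """Parse one JSON response and return its data, or raise VaultError."""
    response = json.loads(raw)
    if response.get("status") != "success":
        raise VaultError(response.get("error") or "unknown error")
    return response.get("data")


# Sample responses; the operation names and values are made up.
ok_raw = '{"skill": "data-vault", "operation": "count-records", "status": "success", "data": 2, "error": null}'
err_raw = '{"skill": "data-vault", "operation": "read-dataset", "status": "error", "data": null, "error": "dataset not found"}'

print(unwrap_response(ok_raw))
try:
    unwrap_response(err_raw)
except VaultError as exc:
    print("failed:", exc)
```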

Internal Fields

Every dataset automatically includes:

- `_id`: UUID for each record
- `_updated_at`: timestamp of last insert/update

These are managed automatically — when appending, only provide your defined fields.

Data Type Inference

Lance infers types from the first record:

| Python Type | Lance Type |
|---|---|
| `"string"` | `large_string` |
| `25` (int) | `int64` |

CLI caveat: When passing via command line, all values are strings. To ensure integer types, initialize with actual integers in a script rather than CLI.
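The caveat comes down to a Python-level difference, which the sketch below illustrates. How the skill's script parses its arguments is not shown in this document, so this only demonstrates what a script receives from the command line compared with values built in code or parsed from JSON.

```python
import json

# What a script sees from the command line: every argument is a str.
argv_values = ["John", "25", "john@test.com"]
argv_types = [type(v).__name__ for v in argv_values]
print(argv_types)

# The same record built in Python (or parsed from JSON) keeps the int.
record = json.loads('["John", 25, "john@test.com"]')
record_types = [type(v).__name__ for v in record]
print(record_types)
```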

Tips

1. Initialize at session start: Run list-datasets-info to understand what data already exists
2. Plan your schema: First record determines types for the entire dataset
3. Use batch append when adding multiple records: More efficient than individual appends
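For the batch-append tip, the rows can be serialized once and passed as a single argument. This sketch assumes that `batch-append-to-dataset` takes the dataset name followed by one JSON array of rows, as the example in this document suggests; the exact argument format is not confirmed here.

```python
import json
import shlex

rows = [["Alice", "22", "alice@test.com"], ["Bob", "35", "bob@test.com"]]

# Assumption: the command takes the dataset name, then one JSON array of rows.
argv = [
    "python3", "scripts/command.py",
    "batch-append-to-dataset", "users",
    json.dumps(rows),
]

# shlex.join quotes the JSON so the line can be pasted into a shell safely.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` would skip the shell entirely, so no quoting is needed in that case.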

Requirements

Dependencies are declared in frontmatter (metadata.openclaw.install) and handled by the OpenClaw install system via uv. The Python packages required are:

- pylance — The Lance columnar format library.

⚠️ Naming note: Despite the PyPI package being named pylance, the library is imported as import lance in Python code. This is the official Lance project naming convention — it is NOT the VS Code "pylance" language server. See lance.org for details.

- pandas — Data manipulation

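Data Vault

Installation

```bash
uv pip install pylance pandas
```

A persistent data store using the Lance columnar format for fast ML data access.

Quick Start

```bash
# List all datasets and their metadata
python3 scripts/command.py list-datasets-info

# Create a dataset
python3 scripts/command.py create-dataset <name> <field1> <field2> ...

# Append data
python3 scripts/command.py append-to-dataset <name> <value1> <value2> ...

# Read all records in a dataset
python3 scripts/command.py read-dataset <name>
```

Note: list-datasets-info shows dataset metadata (schema, field types, record count); it does not return the actual data rows. Use read-dataset to retrieve records.

Storage Location

Datasets are created and stored in the current working directory (`.`).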

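Critical Behavior: Data Type Strictness

⚠️ Lance is strict about data types: they CANNOT change after the first record.

When you append the first record to a dataset, Lance infers the data type for each field. All subsequent records MUST use the same types.

Example that FAILS:

```bash
# First record: age as a string
append-to-dataset users "John" "25" "john@test.com"

# Second record: age as an integer (will fail!)
append-to-dataset users "Jane" 30 "jane@test.com"

# Error: age should be large_string, but got int64
```

Correct approach, with consistent types:

```bash
# First record: age as a string
append-to-dataset users "John" "25" "john@test.com"

# Second record: age as a string
append-to-dataset users "Jane" "30" "jane@test.com"
```

Why This Matters

Unlike traditional databases that may coerce types, Lance rejects type mismatches. If you store numbers as strings initially, you must always pass strings. Plan your schema carefully.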

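Initialization Workflow

When starting a session, always initialize by listing existing datasets first:

```bash
# This command returns all datasets and their structure
python3 scripts/command.py list-datasets-info
```

Example output:

```json
{
  "skill": "data-vault",
  "operation": "list_datasets_info",
  "status": "success",
  "data": [
    {
      "dataset_name": "users",
      "path": "/data/users",
      "fields": ["name", "age", "email"],
      "field_types": {
        "_id": "large_string",
        "_updated_at": "timestamp[us]",
        "name": "large_string",
        "age": "large_string",
        "email": "large_string"
      },
      "record_count": 2,
      "columns": ["_id", "_updated_at", "name", "age", "email"],
      "last_updated": "2026-03-21T17:57:44.595628"
    }
  ],
  "error": null
}
```

Understanding `field_types`

| State | Meaning |
|---|---|
| `{}` (empty) | Dataset exists but has no records yet; types are not yet defined |
| populated | Types are locked; appends must match |

Important: If field_types is empty, the first append will define types. Be deliberate about the first record's types.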

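Commands Reference

Create Dataset

```bash
python3 scripts/command.py create-dataset <name> <field1> <field2> ...
```

Creates a metadata entry. Fields have no types until the first append.

Append Record

```bash
python3 scripts/command.py append-to-dataset <name> <value1> <value2> ...
```

Appends one record. Types are inferred from the first record.

Batch Append

```bash
python3 scripts/command.py batch-append-to-dataset <name>
```

Example: `batch-append-to-dataset users '[["Alice", "22", "alice@test.com"], ["Bob", "35", "bob@test.com"]]'`

Update Record

```bash
python3 scripts/command.py update-dataset-record <name> <record_id> <value1> <value2> ...
```

Updates fields for a specific record by ID.

Delete Record

```bash
python3 scripts/command.py delete-dataset-record <name> <record_id>
```

List All Datasets

```bash
python3 scripts/command.py list-datasets
```

Get Dataset Info

```bash
python3 scripts/command.py get-dataset-info <name>
```

Returns schema, field types (if data exists), and record count.

List All Datasets with Full Info

```bash
python3 scripts/command.py list-datasets-info
```

Recommended for initialization. Returns all datasets with complete metadata.

Get Dataset Path

```bash
python3 scripts/command.py get-dataset-path-info <name>
```

Backup Dataset

```bash
python3 scripts/command.py backup-dataset <name> <backup_path>
```

Count Records

```bash
python3 scripts/command.py count-records <name>
```

Read All Records

Returns all records from the dataset as a list of objects.

```bash
python3 scripts/command.py read-dataset <name>
```

Drop Dataset

Deletes the entire dataset and its metadata. Requires confirmation if you have not created a backup beforehand.

```bash
python3 scripts/command.py drop-dataset <name>
```

Internal fields available in every dataset:

| Field | Type | Description |
|---|---|---|
| `_id` | string | UUID: unique record identifier |
| `_updated_at` | timestamp | When the record was last inserted or updated |

List Records (Paginated)

```bash
python3 scripts/command.py list-records <name> --limit 10 --offset 0
```

Returns records with optional pagination.

Get Single Record

```bash
python3 scripts/command.py get-record <name> <record_id>
```

Retrieves a specific record by its UUID.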

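Response Format

All commands return JSON:

```json
{
  "skill": "data-vault",
  "operation": "<operation_name>",
  "status": "success|error",
  "data": <result data or null>,
  "error": <error message or null>
}
```

Internal Fields

Every dataset automatically includes:

- `_id`: UUID for each record
- `_updated_at`: timestamp of last insert/update

These are managed automatically. When appending, only provide your defined fields.

Data Type Inference

Lance infers types from the first record:

| Python Type | Lance Type |
|---|---|
| `"string"` | `large_string` |
| `25` (int) | `int64` |

CLI caveat: When passing via command line, all values are strings. To ensure integer types, initialize with actual integers in a script rather than CLI.

Tips

1. Initialize at session start: Run list-datasets-info to understand what data already exists
2. Plan your schema: First record determines types for the entire dataset
3. Use batch append when adding multiple records: More efficient than individual appends

Requirements

Dependencies are declared in frontmatter (metadata.openclaw.install) and handled by the OpenClaw install system via uv. The Python packages required are:

- pylance: The Lance columnar format library.

⚠️ Naming note: Despite the PyPI package being named pylance, the library is imported as import lance in Python code. This is the official Lance project naming convention; it is NOT the VS Code "pylance" language server. See lance.org for details.

- pandas: Data manipulation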
