ML Visualizer

A data toolkit for ingesting, transforming, querying, and visualizing machine learning datasets. Manage your entire data pipeline — from raw ingestion through profiling and validation — all from the command line.

Commands

Command	Description
ml-visualizer ingest <input>	Ingest raw data or record a data source entry
ml-visualizer transform <input>

Each data command (ingest, transform, query, etc.) works in two modes:

- Without arguments: displays the 20 most recent entries of that type
- With arguments: saves the input as a new timestamped entry
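
The two modes can be pictured with a small Bash function. This is only a sketch of the behavior described above, not the tool's actual source: the function name record_or_list, the LOG_DIR variable, and the timestamp format are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical stand-in for the real data directory.
LOG_DIR="$(mktemp -d)"

# record_or_list KIND [TEXT...]
# With no TEXT: print the 20 most recent entries of KIND.
# With TEXT: append it as a new timestamped entry.
record_or_list() {
  local kind="$1"
  shift
  local log="$LOG_DIR/$kind.log"
  if [ "$#" -eq 0 ]; then
    # Listing mode: print nothing if no entry has been saved yet.
    if [ -f "$log" ]; then
      tail -n 20 "$log"
    fi
  else
    # Recording mode: the timestamp format here is an assumption.
    printf '%s|%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$log"
  fi
}

record_or_list ingest "Loaded train.csv"
record_or_list ingest "Loaded test.csv"
record_or_list ingest   # lists both entries, oldest first
```

Keeping one append-only file per command type means listing needs nothing beyond tail, and saving an entry never rewrites earlier ones.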

Data Storage

All data is stored as plain-text log files in ~/.local/share/ml-visualizer/:

- Each command type gets its own log file (e.g., ingest.log, transform.log, visualize.log)
- Entries are stored in timestamp|value format for easy parsing
- A unified history.log tracks all activity across command types
- Export to JSON, CSV, or TXT at any time with the export command
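
Because each entry is a single timestamp|value line, a log can be split with nothing more than the shell's read builtin. The following is a minimal sketch, assuming the first | on a line separates the timestamp from the value. The sample entries, their timestamp format, and the helper name to_tsv are invented for illustration.

```bash
#!/usr/bin/env bash
set -eu

# Sample log with invented entries; real timestamps may be formatted differently.
log="$(mktemp)"
cat > "$log" <<'EOF'
2024-01-15 09:30:00|Loaded training set from s3://ml-data/train.csv
2024-01-15 09:45:12|Applied StandardScaler to numeric columns
EOF

# Convert timestamp|value lines to tab-separated output.
# With IFS='|', read puts everything after the first separator into the
# last variable, so a '|' inside the value is kept.
to_tsv() {
  while IFS='|' read -r ts value; do
    printf '%s\t%s\n' "$ts" "$value"
  done < "$1"
}

to_tsv "$log"
```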

Set the ML_VISUALIZER_DIR environment variable to override the default data directory.
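
An override of this kind is commonly implemented with the shell's default-value expansion. The sketch below shows that pattern. The function name data_dir is hypothetical, and the tool's own lookup logic may differ.

```bash
#!/usr/bin/env bash
set -eu

# Print ML_VISUALIZER_DIR if it is set and non-empty,
# otherwise the default location under the home directory.
data_dir() {
  printf '%s\n' "${ML_VISUALIZER_DIR:-$HOME/.local/share/ml-visualizer}"
}

# Default location (unless the variable is already set in the environment).
data_dir

# Overridden inside a subshell, so the change does not leak out.
(ML_VISUALIZER_DIR=/tmp/ml-viz-demo; data_dir)
```

Exporting the variable in a shell profile would apply the override to every later invocation, which can help keep separate logs per project.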

Requirements

- Bash 4.0+ (uses set -euo pipefail)
- Standard Unix utilities: date, wc, du, tail, grep, sed, cat
- No external dependencies or API keys required
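
One way to confirm these requirements before first use is sketched below. The helper name missing_tools is hypothetical and not part of ML Visualizer. The check relies only on command -v and on Bash's built-in BASH_VERSINFO variable.

```bash
#!/usr/bin/env bash
set -eu

# Print the name of every tool given as an argument that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# BASH_VERSINFO holds the major version in its first element;
# fall back to 0 when the script is not running under Bash.
if [ "${BASH_VERSINFO:-0}" -lt 4 ]; then
  echo "Bash 4.0+ is required" >&2
fi

missing="$(missing_tools date wc du tail grep sed cat)"
if [ -n "$missing" ]; then
  echo "Missing utilities: $missing" >&2
fi
```

On a system that meets the requirements the script prints nothing.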

When to Use

1. Building a data pipeline journal: use ingest, transform, and pipeline to document each step of your ML data preparation workflow
2. Tracking data quality: use validate and profile to log validation checks and profiling runs, ensuring data integrity before model training
3. Logging visualization requests: use visualize to record what charts and plots you've generated for model diagnostics (confusion matrices, ROC curves, feature importance)
4. Managing dataset schemas: use schema to document the structure of your datasets, track schema changes over time, and share definitions with your team
5. Auditing data operations: use search, recent, and stats to review your complete data processing history and find specific operations

Examples

```bash
# Ingest a new data source
ml-visualizer ingest "Loaded training set from s3://ml-data/train.csv - 50,000 rows, 24 features"

# Record a transformation step
ml-visualizer transform "Applied StandardScaler to numeric columns, one-hot encoded categorical variables"

# Log a visualization
ml-visualizer visualize "Generated confusion matrix for RandomForest classifier - 94% accuracy"

# Define a schema entry
ml-visualizer schema "users table: id(int), age(int), income(float), segment(str), churn(bool)"

# Search past operations
ml-visualizer search StandardScaler
```

Output

All commands print results to stdout. Redirect to a file if needed:

```bash
ml-visualizer stats > pipeline-report.txt
ml-visualizer export json
```

Skill name: Yellowbrick
Powered by BytesAgain | bytesagain.com | hello@bytesagain.com
