Employee Skills Importer

This skill automates the process of importing employee skills from CSV files into a Supabase database. It parses the CSV, checks what already exists in the database, and generates idempotent SQL scripts to insert missing data.

Overview

The skill performs a 3-step process:

1. Identify and insert missing skill categories - Extract categories from CSV headers, check database, generate INSERT script
Identify and insert missing skills - Extract skills with their categories, check database, generate INSERT script
Generate employee_skills INSERT script - Map employees by name, link skills, create final INSERT statements

CSV Format Requirements

The CSV must have:

- Row 1: Empty or metadata (ignored)
Row 2: Skill category names spanning multiple columns
Row 3+: Individual skill names (column headers, may span multiple rows due to line breaks)
Employee data rows: Employee data with First Name, Last Name in first two columns, followed by skill experience values

Example structure:
CODEBLOCK0

Workflow

Step 1: Skill Categories

1. Parse row 2 to extract unique category names
Query the database to check existing categories:

   SELECT name FROM skill_categories

3. Generate idempotent INSERT for missing categories:

CODEBLOCK2

Step 2: Skills

1. Parse skill name rows and map to categories from row 2
Query database for existing skills:

   SELECT s.name, sc.name as category_name 
   FROM skills s 
   LEFT JOIN skill_categories sc ON s.category_id = sc.id

3. For each skill to insert:

- Find the category_id using a subquery - Generate idempotent INSERT: CODEBLOCK4

Step 3: Employee Skills

1. Parse employee rows (firstname, lastname, skill values)
Query employees table to get employee IDs:

   SELECT id, first_name, last_name FROM employees

3. For each employee, for each skill with non-zero experience:

- Look up employeeid by matching firstname + last_name - Look up skill_id using subquery - CRITICAL: Use TRIM() in WHERE clause to handle whitespace variations in database - Generate INSERT: CODEBLOCK6

Important Notes

Database Schema

- skill_categories table: id (uuid), name (text, unique)
INLINECODE1 table: id (uuid), name (text, unique), categoryid (uuid FK to skillcategories)
INLINECODE2 table: id (uuid), firstname (text), lastname (text)
INLINECODE3 table: id (uuid), employeeid (uuid FK), skillid (uuid FK), yearsofexperience (real)

Idempotency

All generated SQL scripts use ON CONFLICT clauses to ensure they can be run multiple times without errors:

- For categories and skills: INLINECODE5
For employee_skills: INLINECODE6

Data Handling

- Skip employees with zero or empty experience values for a skill
Handle numeric experience values (can be integers or decimals like 0.5, 1.7, etc.)
Clean up skill names by trimming whitespace and removing line breaks
Skip rows where employee lookup fails (employee not found in database)
Handle multi-line CSV cells properly
CRITICAL: Deduplicate employee-skill pairs before generating SQL - Keep the highest years value when duplicates exist
CRITICAL: Automatically correct employee name spellings - Use fuzzy matching to find and correct minor spelling differences (e.g., "Victoriia" → "Viktoriia")
CRITICAL: Trim all employee names - Remove leading/trailing whitespace from all names
CRITICAL: Use TRIM() in SQL WHERE clauses - Database may have extra spaces (e.g., "Yurii Solokha" with 3 spaces)
CRITICAL: Skip employees with no match - If no close match found in database, exclude those records and report them

Error Prevention

- Always use subqueries for foreign key lookups rather than hardcoding UUIDs
Validate that category names match between row 2 and skill lookups
Report any employees from CSV not found in the database
Report any skills that couldn't be mapped to categories

CRITICAL - Prevent Duplicate Key Violations:

1. Before generating the employeeskills INSERT, deduplicate all records by (firstname, lastname, skill)
When duplicates exist, keep the record with the highest yearsof_experience value
This prevents: INLINECODE7

CRITICAL - Automatic Name Correction:

1. Before generating SQL, validate ALL employees exist in the database
For employees not found by exact match:

- Use fuzzy matching (Levenshtein distance or similar) to find close matches in database
- If a close match is found (e.g., "Victoriia" → "Viktoriia"), automatically use the database spelling
- If no close match is found, skip the employee entirely

3. Generate a report showing:

- Employees with automatic corrections applied: "CSV name → Database name"
- Employees skipped (no match found): List with number of skills skipped

4. This prevents: INLINECODE8

Output Format

The skill produces three SQL scripts plus one report file:

1insertcategories.sql
CODEBLOCK7

2insertskills.sql
CODEBLOCK8

3insertemployee_skills.sql
CODEBLOCK9

Execution Steps

When the user provides a CSV file:

1. Parse the CSV structure

- Read the file and validate format - Extract category names from row 2 - Extract skill names from subsequent rows (handling multi-line cells) - Map each skill to its category based on column positions

2. Query existing data from Supabase

- Fetch all existing skill_categories - Fetch all existing skills with their categories - Fetch all employees (id, firstname, lastname)

3. Generate Script 1: Categories

- Compare CSV categories against database - Create INSERT statement for missing categories - Save to file and present to user

4. Generate Script 2: Skills

- Compare CSV skills against database - For missing skills, include category lookup subquery - Create INSERT statement - Save to file and present to user

5. Generate Script 3: Employee Skills

- Parse employee rows - VALIDATE: Compare all CSV employees against database using exact matching - FUZZY MATCH: For non-exact matches, find closest database employee using similarity algorithm - Calculate similarity score for firstname and lastname separately - If combined similarity is above threshold (e.g., 85%), automatically use database name - Track all automatic corrections for reporting - CORRECT: Replace CSV names with database names for matched employees - FILTER: Skip employees with no close match found - DEDUPLICATE: Remove duplicates by (employee, skill), keeping highest years value - Generate INSERT statements using corrected employee names - Generate report showing corrections and skipped employees - Save SQL file and report to outputs directory - Present both files to user

6. Present all files to the user

- Three SQL scripts (1insertcategories.sql, 2insertskills.sql, 3insertemployee_skills.sql) - One report file (skippedemployeesreport.txt) if any employees were skipped - User can execute SQL scripts in order: 1 → 2 → 3 - User should review report to fix name mismatches if needed

Usage Example

User uploads CSV file and says:
"Parse this employee skills CSV and generate SQL insert scripts"

Skill responds:

1. Analyzes the CSV structure
Connects to Supabase SkillsSystem project
Checks existing data in all three tables
Generates three SQL files
Reports summary (e.g., "Found 5 new categories, 23 new skills, generating inserts for 47 employees")
Presents the three SQL files for download

Project Configuration

This skill is configured to work with the Supabase project:

- Project Name: SkillsSystem
Project ID: ypibfhbklinkvybgotef
Region: eu-central-1

The skill automatically connects to this project when executing queries.

Common Errors and Solutions

Error 1: "ON CONFLICT DO UPDATE command cannot affect row a second time"

Cause: Duplicate employee-skill pairs in the generated INSERT statement Solution: The skill now deduplicates all records before generating SQL. If you see this error, it means deduplication was not performed. Prevention: Always deduplicate by (firstname, lastname, skill) and keep the highest years value

Error 2: "null value in column 'employee_id' violates not-null constraint"

Cause: Employee from CSV not found in database (usually due to name spelling differences or whitespace issues) Solution: The skill now:

1. Automatically corrects spelling differences using fuzzy matching
Trims all whitespace from names
Uses TRIM() in SQL WHERE clauses to match database records with extra spaces

Common issues:

- Spelling variations: "Victoriia"↔"Viktoriia", "Karasyov"↔"Karasov"
Extra whitespace in database: "Yurii Solokha" (3 spaces)
Leading/trailing spaces

How it works:

- Compares CSV names against database using similarity algorithm
If close match found (>83% similarity), automatically uses database spelling
Trims all names before comparison
Uses TRIM(first_name) and TRIM(last_name) in SQL to handle database whitespace
If no close match found, skips the employee and reports it

Result: This error should no longer occur as names are automatically corrected and whitespace is handled

Name Matching Algorithm

The skill uses the following approach:

1. Try exact match first (firstname AND lastname)
If no exact match, calculate similarity score using:

- Levenshtein distance or similar algorithm - Handles common variations: "Victoriia"↔"Viktoriia", "Karasyov"↔"Karasov"

3. If similarity > 83% threshold, accept as match
If multiple close matches found, pick the closest one
If no close match, skip the employee
Always trim whitespace from both CSV and database names
Use TRIM() in SQL queries to match records with extra spaces in database

员工技能导入器

该技能自动化了将员工技能从CSV文件导入Supabase数据库的过程。它解析CSV，检查数据库中已存在的内容，并生成幂等SQL脚本以插入缺失数据。

概述

该技能执行3步流程：

1. 识别并插入缺失的技能类别 - 从CSV表头提取类别，检查数据库，生成INSERT脚本
识别并插入缺失的技能 - 提取技能及其类别，检查数据库，生成INSERT脚本
生成employee_skills INSERT脚本 - 按姓名映射员工，关联技能，创建最终INSERT语句

CSV格式要求

CSV必须包含：

- 第1行：空行或元数据（忽略）
第2行：跨多列的技能类别名称
第3行及以上：单个技能名称（列标题，可能因换行跨多行）
员工数据行：员工数据，前两列为名字和姓氏，后跟技能经验值

示例结构：

,,,,,,.NET,,,,,前端,,,Java,,,
名字,姓氏,全名,部门,...,C#,ASP.net,MVC,...,JavaScript,HTML,CSS,...,Java,Spring,...
张三,张三,张三,部门1,...,5,4,3,...,6,6,5,...,0,0,...

工作流程

第1步：技能类别

1. 解析第2行提取唯一的类别名称
查询数据库检查现有类别：

sql SELECT name FROM skill_categories

3. 为缺失类别生成幂等INSERT：

sql INSERT INTO skill_categories (name) VALUES (类别1), (类别2), (类别3) ON CONFLICT (name) DO NOTHING;

第2步：技能

1. 解析技能名称行并映射到第2行的类别
查询数据库现有技能：

sql SELECT s.name, sc.name as category_name FROM skills s LEFT JOIN skillcategories sc ON s.categoryid = sc.id

3. 对每个要插入的技能：

- 使用子查询查找category_id - 生成幂等INSERT： sql INSERT INTO skills (name, category_id) VALUES (C#, (SELECT id FROM skill_categories WHERE name = .NET)), (JavaScript, (SELECT id FROM skill_categories WHERE name = 前端)) ON CONFLICT (name) DO NOTHING;

第3步：员工技能

1. 解析员工行（名字、姓氏、技能值）
查询员工表获取员工ID：

sql SELECT id, firstname, lastname FROM employees

3. 对每个员工，对每个非零经验值的技能：

- 通过匹配firstname + lastname查找employee_id - 使用子查询查找skill_id - 关键：在WHERE子句中使用TRIM()处理数据库中的空白字符差异 - 生成INSERT： sql INSERT INTO employeeskills (employeeid, skillid, yearsof_experience) VALUES ( (SELECT id FROM employees WHERE TRIM(firstname) = John AND TRIM(lastname) = Doe), (SELECT id FROM skills WHERE name = C#), 5 ) ON CONFLICT (employeeid, skillid) DO UPDATE SET yearsofexperience = EXCLUDED.yearsofexperience;

重要说明

数据库架构

- skillcategories表：id (uuid), name (text, unique)
skills表：id (uuid), name (text, unique), categoryid (uuid FK to skillcategories)
employees表：id (uuid), firstname (text), lastname (text)
employeeskills表：id (uuid), employeeid (uuid FK), skillid (uuid FK), yearsofexperience (real)

幂等性

所有生成的SQL脚本使用ON CONFLICT子句确保可多次运行而不出错：

- 对于类别和技能：ON CONFLICT (name) DO NOTHING
对于员工技能：ON CONFLICT (employeeid, skillid) DO UPDATE SET yearsofexperience = EXCLUDED.yearsofexperience

数据处理

- 跳过技能经验值为零或空的员工
处理数值型经验值（可以是整数或小数，如0.5、1.7等）
通过修剪空白和移除换行符清理技能名称
跳过员工查找失败的行（数据库中未找到的员工）
正确处理多行CSV单元格
关键：在生成SQL前对员工-技能对去重 - 存在重复时保留最高年限值
关键：自动纠正员工姓名拼写 - 使用模糊匹配查找并纠正细微拼写差异（例如，Victoriia → Viktoriia）
关键：修剪所有员工姓名 - 移除所有姓名前后的空白字符
关键：在SQL WHERE子句中使用TRIM() - 数据库可能有多余空格（例如，Yurii Solokha有3个空格）
关键：跳过无匹配的员工 - 如果在数据库中未找到接近匹配，排除这些记录并报告

错误预防

- 始终使用子查询进行外键查找，而非硬编码UUID
验证类别名称在第2行和技能查找之间匹配
报告CSV中未在数据库中找到的员工
报告无法映射到类别的技能

关键 - 防止重复键冲突：

1. 在生成employeeskills INSERT之前，按(firstname, lastname, skill)对所有记录去重
存在重复时，保留yearsof_experience值最高的记录
这防止了：ON CONFLICT DO UPDATE command cannot affect row a second time

关键 - 自动名称纠正：

1. 在生成SQL之前，验证所有员工在数据库中存在
对于未通过精确匹配找到的员工：

- 使用模糊匹配（Levenshtein距离或类似算法）在数据库中查找接近匹配
- 如果找到接近匹配（例如，Victoriia → Viktoriia），自动使用数据库中的拼写
- 如果未找到接近匹配，完全跳过该员工

3. 生成报告显示：

- 自动纠正的员工：CSV名称 → 数据库名称
- 跳过的员工（未找到匹配）：列出跳过的技能数量

4. 这防止了：null value in column employee_id violates not-null constraint

输出格式

该技能生成三个SQL脚本加一个报告文件：

1insertcategories.sql
sql
-- 插入缺失的技能类别
INSERT INTO skill_categories (name)
VALUES (.NET), (前端), (Java)
ON CONFLICT (name) DO NOTHING;

2insertskills.sql
sql
-- 插入缺失的技能及类别映射
INSERT INTO skills (name, category_id)
VALUES
(C#, (SELECT id FROM skill_categories WHERE name = .NET)),
(ASP.net, (SELECT id FROM skill_categories WHERE name = .NET)),
(JavaScript, (SELECT id FROM skill_categories WHERE name = 前端))
ON CONFLICT (name) DO NOTHING;

3insertemployee_skills.sql
sql
-- 插入员工技能
-- 记录已去重并仅保留有效员工
-- 在WHERE子句中使用TRIM()处理数据库中的空白字符
INSERT INTO employeeskills (employeeid, skillid, yearsof_experience)
VALUES
(
(SELECT id FROM employees WHERE TRIM(firstname) = John AND TRIM(lastname) = Doe),
(SELECT id FROM skills WHERE name = C#),
5
),
(
(SELECT id FROM employees WHERE TRIM(firstname) = John AND TRIM(lastname) = Doe),
(SELECT id FROM skills WHERE name = JavaScript),
6
)
ON CONFLICT (employeeid, skillid) DO UPDATE
SET yearsofexperience = EXCLUDED.yearsofexperience;

执行步骤

当用户提供CSV文件时：

1. 解析CSV结构

- 读取文件并验证格式 - 从第2行提取类别名称 - 从后续行提取技能名称（处理多行单元格） - 根据列位置将每个技能映射到其类别

2. 从Supabase查询现有数据

- 获取所有现有技能类别 - 获取所有现有技能及其类别 - 获取所有员工（id, firstname, lastname）

3. 生成脚本1：类别

- 比较CSV类别与数据库 - 为缺失类别创建INSERT语句 - 保存到文件并呈现给用户

4. 生成脚本2：技能

- 比较CSV技能与数据库

employee-skills-importer员工技能导入