Employee Skills Importer
This skill automates the process of importing employee skills from CSV files into a Supabase database. It parses the CSV, checks what already exists in the database, and generates idempotent SQL scripts to insert missing data.
Overview
The skill performs a 3-step process:
- 1. Identify and insert missing skill categories - Extract categories from CSV headers, check database, generate INSERT script
- Identify and insert missing skills - Extract skills with their categories, check database, generate INSERT script
- Generate employee_skills INSERT script - Map employees by name, link skills, create final INSERT statements
CSV Format Requirements
The CSV must have:
- - Row 1: Empty or metadata (ignored)
- Row 2: Skill category names spanning multiple columns
- Row 3+: Individual skill names (column headers, may span multiple rows due to line breaks)
- Employee data rows: Employee data with First Name, Last Name in first two columns, followed by skill experience values
Example structure:
CODEBLOCK0
Workflow
Step 1: Skill Categories
- 1. Parse row 2 to extract unique category names
- Query the database to check existing categories:
SELECT name FROM skill_categories
- 3. Generate idempotent INSERT for missing categories:
CODEBLOCK2
Step 2: Skills
- 1. Parse skill name rows and map to categories from row 2
- Query database for existing skills:
SELECT s.name, sc.name as category_name
FROM skills s
LEFT JOIN skill_categories sc ON s.category_id = sc.id
- 3. For each skill to insert:
- Find the category_id using a subquery
- Generate idempotent INSERT:
CODEBLOCK4
Step 3: Employee Skills
- 1. Parse employee rows (firstname, lastname, skill values)
- Query employees table to get employee IDs:
SELECT id, first_name, last_name FROM employees
- 3. For each employee, for each skill with non-zero experience:
- Look up employee
id by matching firstname + last_name
- Look up skill_id using subquery
-
CRITICAL: Use TRIM() in WHERE clause to handle whitespace variations in database
- Generate INSERT:
CODEBLOCK6
Important Notes
Database Schema
- -
skill_categories table: id (uuid), name (text, unique) - INLINECODE1 table: id (uuid), name (text, unique), categoryid (uuid FK to skillcategories)
- INLINECODE2 table: id (uuid), firstname (text), lastname (text)
- INLINECODE3 table: id (uuid), employeeid (uuid FK), skillid (uuid FK), yearsofexperience (real)
Idempotency
All generated SQL scripts use
ON CONFLICT clauses to ensure they can be run multiple times without errors:
- - For categories and skills: INLINECODE5
- For employee_skills: INLINECODE6
Data Handling
- - Skip employees with zero or empty experience values for a skill
- Handle numeric experience values (can be integers or decimals like 0.5, 1.7, etc.)
- Clean up skill names by trimming whitespace and removing line breaks
- Skip rows where employee lookup fails (employee not found in database)
- Handle multi-line CSV cells properly
- CRITICAL: Deduplicate employee-skill pairs before generating SQL - Keep the highest years value when duplicates exist
- CRITICAL: Automatically correct employee name spellings - Use fuzzy matching to find and correct minor spelling differences (e.g., "Victoriia" → "Viktoriia")
- CRITICAL: Trim all employee names - Remove leading/trailing whitespace from all names
- CRITICAL: Use TRIM() in SQL WHERE clauses - Database may have extra spaces (e.g., "Yurii Solokha" with 3 spaces)
- CRITICAL: Skip employees with no match - If no close match found in database, exclude those records and report them
Error Prevention
- - Always use subqueries for foreign key lookups rather than hardcoding UUIDs
- Validate that category names match between row 2 and skill lookups
- Report any employees from CSV not found in the database
- Report any skills that couldn't be mapped to categories
CRITICAL - Prevent Duplicate Key Violations:
- 1. Before generating the employeeskills INSERT, deduplicate all records by (firstname, lastname, skill)
- When duplicates exist, keep the record with the highest yearsof_experience value
- This prevents: INLINECODE7
CRITICAL - Automatic Name Correction:
- 1. Before generating SQL, validate ALL employees exist in the database
- For employees not found by exact match:
- Use fuzzy matching (Levenshtein distance or similar) to find close matches in database
- If a close match is found (e.g., "Victoriia" → "Viktoriia"), automatically use the database spelling
- If no close match is found, skip the employee entirely
- 3. Generate a report showing:
- Employees with automatic corrections applied: "CSV name → Database name"
- Employees skipped (no match found): List with number of skills skipped
- 4. This prevents: INLINECODE8
Output Format
The skill produces three SQL scripts plus one report file:
1insertcategories.sql
CODEBLOCK7
2insertskills.sql
CODEBLOCK8
3insertemployee_skills.sql
CODEBLOCK9
Execution Steps
When the user provides a CSV file:
- 1. Parse the CSV structure
- Read the file and validate format
- Extract category names from row 2
- Extract skill names from subsequent rows (handling multi-line cells)
- Map each skill to its category based on column positions
- 2. Query existing data from Supabase
- Fetch all existing skill_categories
- Fetch all existing skills with their categories
- Fetch all employees (id, first
name, lastname)
- 3. Generate Script 1: Categories
- Compare CSV categories against database
- Create INSERT statement for missing categories
- Save to file and present to user
- 4. Generate Script 2: Skills
- Compare CSV skills against database
- For missing skills, include category lookup subquery
- Create INSERT statement
- Save to file and present to user
- 5. Generate Script 3: Employee Skills
- Parse employee rows
-
VALIDATE: Compare all CSV employees against database using exact matching
-
FUZZY MATCH: For non-exact matches, find closest database employee using similarity algorithm
- Calculate similarity score for first
name and lastname separately
- If combined similarity is above threshold (e.g., 85%), automatically use database name
- Track all automatic corrections for reporting
-
CORRECT: Replace CSV names with database names for matched employees
-
FILTER: Skip employees with no close match found
-
DEDUPLICATE: Remove duplicates by (employee, skill), keeping highest years value
- Generate INSERT statements using corrected employee names
-
Generate report showing corrections and skipped employees
- Save SQL file and report to outputs directory
- Present both files to user
- 6. Present all files to the user
- Three SQL scripts (1
insertcategories.sql, 2
insertskills.sql, 3
insertemployee_skills.sql)
- One report file (skipped
employeesreport.txt) if any employees were skipped
- User can execute SQL scripts in order: 1 → 2 → 3
- User should review report to fix name mismatches if needed
Usage Example
User uploads CSV file and says:
"Parse this employee skills CSV and generate SQL insert scripts"
Skill responds:
- 1. Analyzes the CSV structure
- Connects to Supabase SkillsSystem project
- Checks existing data in all three tables
- Generates three SQL files
- Reports summary (e.g., "Found 5 new categories, 23 new skills, generating inserts for 47 employees")
- Presents the three SQL files for download
Project Configuration
This skill is configured to work with the Supabase project:
- - Project Name: SkillsSystem
- Project ID: ypibfhbklinkvybgotef
- Region: eu-central-1
The skill automatically connects to this project when executing queries.
Common Errors and Solutions
Error 1: "ON CONFLICT DO UPDATE command cannot affect row a second time"
Cause: Duplicate employee-skill pairs in the generated INSERT statement
Solution: The skill now deduplicates all records before generating SQL. If you see this error, it means deduplication was not performed.
Prevention: Always deduplicate by (first
name, lastname, skill) and keep the highest years value
Error 2: "null value in column 'employee_id' violates not-null constraint"
Cause: Employee from CSV not found in database (usually due to name spelling differences or whitespace issues)
Solution: The skill now:
- 1. Automatically corrects spelling differences using fuzzy matching
- Trims all whitespace from names
- Uses TRIM() in SQL WHERE clauses to match database records with extra spaces
Common issues:
- - Spelling variations: "Victoriia"↔"Viktoriia", "Karasyov"↔"Karasov"
- Extra whitespace in database: "Yurii Solokha" (3 spaces)
- Leading/trailing spaces
How it works:
- - Compares CSV names against database using similarity algorithm
- If close match found (>83% similarity), automatically uses database spelling
- Trims all names before comparison
- Uses
TRIM(first_name) and TRIM(last_name) in SQL to handle database whitespace - If no close match found, skips the employee and reports it
Result: This error should no longer occur as names are automatically corrected and whitespace is handled
Name Matching Algorithm
The skill uses the following approach:
- 1. Try exact match first (firstname AND lastname)
- If no exact match, calculate similarity score using:
- Levenshtein distance or similar algorithm
- Handles common variations: "Victoriia"↔"Viktoriia", "Karasyov"↔"Karasov"
- 3. If similarity > 83% threshold, accept as match
- If multiple close matches found, pick the closest one
- If no close match, skip the employee
- Always trim whitespace from both CSV and database names
- Use TRIM() in SQL queries to match records with extra spaces in database
员工技能导入器
该技能自动化了将员工技能从CSV文件导入Supabase数据库的过程。它解析CSV,检查数据库中已存在的内容,并生成幂等SQL脚本以插入缺失数据。
概述
该技能执行3步流程:
- 1. 识别并插入缺失的技能类别 - 从CSV表头提取类别,检查数据库,生成INSERT脚本
- 识别并插入缺失的技能 - 提取技能及其类别,检查数据库,生成INSERT脚本
- 生成employee_skills INSERT脚本 - 按姓名映射员工,关联技能,创建最终INSERT语句
CSV格式要求
CSV必须包含:
- - 第1行:空行或元数据(忽略)
- 第2行:跨多列的技能类别名称
- 第3行及以上:单个技能名称(列标题,可能因换行跨多行)
- 员工数据行:员工数据,前两列为名字和姓氏,后跟技能经验值
示例结构:
,,,,,,.NET,,,,,前端,,,Java,,,
名字,姓氏,全名,部门,...,C#,ASP.net,MVC,...,JavaScript,HTML,CSS,...,Java,Spring,...
张三,张三,张三,部门1,...,5,4,3,...,6,6,5,...,0,0,...
工作流程
第1步:技能类别
- 1. 解析第2行提取唯一的类别名称
- 查询数据库检查现有类别:
sql
SELECT name FROM skill_categories
- 3. 为缺失类别生成幂等INSERT:
sql
INSERT INTO skill_categories (name)
VALUES (类别1), (类别2), (类别3)
ON CONFLICT (name) DO NOTHING;
第2步:技能
- 1. 解析技能名称行并映射到第2行的类别
- 查询数据库现有技能:
sql
SELECT s.name, sc.name as category_name
FROM skills s
LEFT JOIN skill
categories sc ON s.categoryid = sc.id
- 3. 对每个要插入的技能:
- 使用子查询查找category_id
- 生成幂等INSERT:
sql
INSERT INTO skills (name, category_id)
VALUES
(C#, (SELECT id FROM skill_categories WHERE name = .NET)),
(JavaScript, (SELECT id FROM skill_categories WHERE name = 前端))
ON CONFLICT (name) DO NOTHING;
第3步:员工技能
- 1. 解析员工行(名字、姓氏、技能值)
- 查询员工表获取员工ID:
sql
SELECT id, first
name, lastname FROM employees
- 3. 对每个员工,对每个非零经验值的技能:
- 通过匹配first
name + lastname查找employee_id
- 使用子查询查找skill_id
-
关键:在WHERE子句中使用TRIM()处理数据库中的空白字符差异
- 生成INSERT:
sql
INSERT INTO employee
skills (employeeid, skill
id, yearsof_experience)
VALUES
(
(SELECT id FROM employees WHERE TRIM(first
name) = John AND TRIM(lastname) = Doe),
(SELECT id FROM skills WHERE name = C#),
5
)
ON CONFLICT (employee
id, skillid) DO UPDATE
SET years
ofexperience = EXCLUDED.years
ofexperience;
重要说明
数据库架构
- - skillcategories表:id (uuid), name (text, unique)
- skills表:id (uuid), name (text, unique), categoryid (uuid FK to skillcategories)
- employees表:id (uuid), firstname (text), lastname (text)
- employeeskills表:id (uuid), employeeid (uuid FK), skillid (uuid FK), yearsofexperience (real)
幂等性
所有生成的SQL脚本使用ON CONFLICT子句确保可多次运行而不出错:
- - 对于类别和技能:ON CONFLICT (name) DO NOTHING
- 对于员工技能:ON CONFLICT (employeeid, skillid) DO UPDATE SET yearsofexperience = EXCLUDED.yearsofexperience
数据处理
- - 跳过技能经验值为零或空的员工
- 处理数值型经验值(可以是整数或小数,如0.5、1.7等)
- 通过修剪空白和移除换行符清理技能名称
- 跳过员工查找失败的行(数据库中未找到的员工)
- 正确处理多行CSV单元格
- 关键:在生成SQL前对员工-技能对去重 - 存在重复时保留最高年限值
- 关键:自动纠正员工姓名拼写 - 使用模糊匹配查找并纠正细微拼写差异(例如,Victoriia → Viktoriia)
- 关键:修剪所有员工姓名 - 移除所有姓名前后的空白字符
- 关键:在SQL WHERE子句中使用TRIM() - 数据库可能有多余空格(例如,Yurii Solokha有3个空格)
- 关键:跳过无匹配的员工 - 如果在数据库中未找到接近匹配,排除这些记录并报告
错误预防
- - 始终使用子查询进行外键查找,而非硬编码UUID
- 验证类别名称在第2行和技能查找之间匹配
- 报告CSV中未在数据库中找到的员工
- 报告无法映射到类别的技能
关键 - 防止重复键冲突:
- 1. 在生成employeeskills INSERT之前,按(firstname, lastname, skill)对所有记录去重
- 存在重复时,保留yearsof_experience值最高的记录
- 这防止了:ON CONFLICT DO UPDATE command cannot affect row a second time
关键 - 自动名称纠正:
- 1. 在生成SQL之前,验证所有员工在数据库中存在
- 对于未通过精确匹配找到的员工:
- 使用模糊匹配(Levenshtein距离或类似算法)在数据库中查找接近匹配
- 如果找到接近匹配(例如,Victoriia → Viktoriia),自动使用数据库中的拼写
- 如果未找到接近匹配,完全跳过该员工
- 3. 生成报告显示:
- 自动纠正的员工:CSV名称 → 数据库名称
- 跳过的员工(未找到匹配):列出跳过的技能数量
- 4. 这防止了:null value in column employee_id violates not-null constraint
输出格式
该技能生成三个SQL脚本加一个报告文件:
1insertcategories.sql
sql
-- 插入缺失的技能类别
INSERT INTO skill_categories (name)
VALUES (.NET), (前端), (Java)
ON CONFLICT (name) DO NOTHING;
2insertskills.sql
sql
-- 插入缺失的技能及类别映射
INSERT INTO skills (name, category_id)
VALUES
(C#, (SELECT id FROM skill_categories WHERE name = .NET)),
(ASP.net, (SELECT id FROM skill_categories WHERE name = .NET)),
(JavaScript, (SELECT id FROM skill_categories WHERE name = 前端))
ON CONFLICT (name) DO NOTHING;
3insertemployee_skills.sql
sql
-- 插入员工技能
-- 记录已去重并仅保留有效员工
-- 在WHERE子句中使用TRIM()处理数据库中的空白字符
INSERT INTO employeeskills (employeeid, skillid, yearsof_experience)
VALUES
(
(SELECT id FROM employees WHERE TRIM(firstname) = John AND TRIM(lastname) = Doe),
(SELECT id FROM skills WHERE name = C#),
5
),
(
(SELECT id FROM employees WHERE TRIM(firstname) = John AND TRIM(lastname) = Doe),
(SELECT id FROM skills WHERE name = JavaScript),
6
)
ON CONFLICT (employeeid, skillid) DO UPDATE
SET yearsofexperience = EXCLUDED.yearsofexperience;
执行步骤
当用户提供CSV文件时:
- 1. 解析CSV结构
- 读取文件并验证格式
- 从第2行提取类别名称
- 从后续行提取技能名称(处理多行单元格)
- 根据列位置将每个技能映射到其类别
- 2. 从Supabase查询现有数据
- 获取所有现有技能类别
- 获取所有现有技能及其类别
- 获取所有员工(id, first
name, lastname)
- 3. 生成脚本1:类别
- 比较CSV类别与数据库
- 为缺失类别创建INSERT语句
- 保存到文件并呈现给用户
- 4. 生成脚本2:技能
- 比较CSV技能与数据库