Skill: Low-Resource AI Researcher

ID: 215
Category: AI/ML Research
Language: Python
Framework: PyTorch + PEFT (LoRA/QLoRA) + Transformers

Overview

This skill uses Parameter-Efficient Fine-Tuning (PEFT) to train high-performance medical-domain large language models on consumer-grade GPUs or a single A100. It supports fine-tuning methods such as LoRA and QLoRA and is optimized for medical text understanding and generation tasks.

Features

- 🚀 Parameter-Efficient Fine-Tuning: LoRA, QLoRA, DoRA support
- 🏥 Medical Domain Optimized: Pre-configured for medical QA, diagnosis, clinical notes
- 💻 Low-Resource Ready: Optimized for consumer GPUs (RTX 3090/4090) and single A100
- 📊 Quantization: 4-bit/8-bit quantization with bitsandbytes
- 🔄 Multi-Task: Supports SFT, DPO, and medical instruction tuning
- 📝 Medical Datasets: Built-in support for PubMedQA, MedQA, MIMIC-III
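
To make the quantization feature concrete, the sketch below shows plain symmetric absmax quantization to 4-bit integer codes in pure Python. It is an illustration of the idea only: bitsandbytes uses its own blockwise formats (QLoRA, for example, uses the NF4 data type), and the function names `quantize_absmax` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def quantize_absmax(values: List[float], bits: int = 4) -> Tuple[List[int], float]:
    """Map floats to signed integer codes in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit codes
    largest = max(abs(v) for v in values)
    scale = largest / qmax if largest > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Illustrative weights (made up for this sketch, not taken from any model).
weights = [0.42, -0.17, 0.05, -0.31]
codes, scale = quantize_absmax(weights)
restored = dequantize(codes, scale)
print("codes:", codes)
print("restored:", [round(x, 3) for x in restored])
```

Each restored value differs from the original by at most half a quantization step, which is the accuracy given up in exchange for storing 4 bits per weight instead of 16.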

Installation

CODEBLOCK0

Quick Start

CODEBLOCK1

Configuration

Hardware Profiles

| Profile | GPU Memory | Quantization | Max Model Size | Batch Size |
| --- | --- | --- | --- | --- |
| consumer-24g | 24GB (RTX 3090/4090) | QLoRA 4-bit | 70B | 1-2 |
| a100-40g | 40GB (A100) | LoRA 8-bit | 70B | 4-8 |
| a100-80g | 80GB (A100) | LoRA 16-bit | 70B | 8-16 |
| multi-gpu | 2x A100 | LoRA 16-bit | 70B+ | 16+ |
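
As a rough sanity check on the profiles above, weight memory can be estimated as parameter count times bits per parameter, divided by 8. The sketch below does only that arithmetic. The helper name `weight_memory_gb` is hypothetical, and the estimate ignores activations, gradients, optimizer state, adapter weights, and quantization metadata, so real usage is higher.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Print a small grid of model sizes against precisions.
for size in (7, 13, 70):
    row = ", ".join(
        f"{bits}-bit: {weight_memory_gb(size, bits):.1f} GB" for bits in (4, 8, 16)
    )
    print(f"{size}B -> {row}")
```

By this estimate a 7B model needs about 3.5 GB for weights at 4 bits and a 70B model about 35 GB, so fitting a 70B model on a smaller card would need something beyond weight quantization alone, such as offloading, which the profile table does not specify.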

LoRA Config

CODEBLOCK2

CLI Usage

CODEBLOCK3

API Reference

MedicalPEFTTrainer

CODEBLOCK4

Methods

| Method | Description |
| --- | --- |
| `train()` | Start fine-tuning with configured parameters |
| `evaluate()` | |

Supported Models

- LLaMA 2/3 (7B, 13B, 70B)
- Mistral (7B, 8x7B)
- Yi (6B, 34B)
- Qwen (7B, 14B, 72B)
- Baichuan (7B, 13B)
- ChatGLM (6B)

Medical Datasets

| Dataset | Description | Size |
| --- | --- | --- |
| PubMedQA | Biomedical QA | 1k QA pairs |
| MedQA | | |

Performance Benchmarks

| Model | Method | GPU | Training Time | MedQA Acc |
| --- | --- | --- | --- | --- |
| LLaMA-2-7B | LoRA | A100-40G | 2h | 58.2% |
| LLaMA-2-7B | QLoRA | RTX 4090 | 3h | 57.8% |
| LLaMA-2-13B | QLoRA | A100-40G | 4h | 62.5% |
| Mistral-7B | LoRA | A100-40G | 2.5h | 61.3% |

Best Practices

1. Gradient Accumulation: Use for effective larger batch sizes
2. Learning Rate: Start with 2e-4 for LoRA, 1e-4 for full fine-tuning
3. Warmup Steps: 100 steps for medical domain adaptation
4. Max Length: 2048-4096 for clinical notes, 512-1024 for QA
5. Data Quality: Filter out low-quality medical data carefully
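
The first three practices reduce to simple arithmetic. The sketch below computes the effective batch size under gradient accumulation and a warmup to the LoRA learning rate of 2e-4 over 100 steps. The function names are hypothetical, and the linear shape of the warmup is an assumption, since the list does not name a schedule.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer step."""
    return per_device * accumulation_steps * num_devices


def warmup_lr(step: int, peak_lr: float = 2e-4, warmup_steps: int = 100) -> float:
    """Linear warmup from near zero to peak_lr, then constant (assumed shape)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr


# A per-device batch of 4 with 8 accumulation steps acts like a batch of 32.
print("effective batch:", effective_batch_size(per_device=4, accumulation_steps=8))
for step in (0, 49, 99, 500):
    print(f"step {step}: lr = {warmup_lr(step):.2e}")
```

Accumulation trades wall-clock time for memory: the gradient statistics match the larger batch, but each optimizer step takes `accumulation_steps` forward and backward passes.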

Troubleshooting

Out of Memory

CODEBLOCK5

Slow Training

CODEBLOCK6

License

This skill follows the license of the underlying models used. Medical applications require compliance with HIPAA/GDPR regulations.

References

1. Hu et al. (2021) - LoRA: Low-Rank Adaptation of Large Language Models
2. Dettmers et al. (2023) - QLoRA: Efficient Finetuning of Quantized LLMs
3. Singhal et al. (2023) - Large Language Models Encode Clinical Knowledge

Risk Assessment

| Risk Indicator | Assessment | Level |
| --- | --- | --- |
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | | |

Security Checklist

- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited
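
Two of the checklist items (validated input paths and an output directory restricted to the workspace) can be checked with one helper. This is a minimal sketch using only `pathlib`; the name `resolve_inside` and the `workspace` directory are hypothetical, and the code is not taken from the skill itself.

```python
from pathlib import Path


def resolve_inside(workspace: Path, user_path: str) -> Path:
    """Resolve user_path relative to workspace and reject anything outside it."""
    root = workspace.resolve()
    candidate = (root / user_path).resolve()  # collapses ".." and follows symlinks
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes workspace: {user_path!r}")
    return candidate


workspace = Path("workspace")  # hypothetical workspace directory
print(resolve_inside(workspace, "output/checkpoint-1000"))
try:
    resolve_inside(workspace, "../outside.txt")
except ValueError as err:
    print("rejected:", err)
```

Resolving before comparing matters: a plain string check for `../` misses absolute paths and symlinks that point outside the workspace.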

Prerequisites

CODEBLOCK7

Evaluation Criteria

Success Metrics

- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable

Test Cases

1. Basic Functionality: Standard input → Expected output
2. Edge Case: Invalid input → Graceful error handling
3. Performance: Large dataset → Acceptable processing time
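
The first two test cases can be sketched against a small input validator. The task names follow the API reference (medical_qa, diagnosis, clinical_note); the `validate_task` helper is hypothetical and only stands in for whatever validation the skill performs.

```python
SUPPORTED_TASKS = ("medical_qa", "diagnosis", "clinical_note")


def validate_task(task: str) -> str:
    """Return the normalized task name or raise ValueError for unknown input."""
    normalized = task.strip().lower()
    if normalized not in SUPPORTED_TASKS:
        expected = ", ".join(SUPPORTED_TASKS)
        raise ValueError(f"unsupported task {task!r}; expected one of: {expected}")
    return normalized


def test_basic_functionality() -> None:
    # Standard input -> expected output
    assert validate_task("medical_qa") == "medical_qa"


def test_invalid_input_is_rejected() -> None:
    # Invalid input -> a clear error instead of a crash deeper in training
    try:
        validate_task("radiology")
    except ValueError as err:
        assert "unsupported task" in str(err)
    else:
        raise AssertionError("expected ValueError")


test_basic_functionality()
test_invalid_input_is_rejected()
print("all checks passed")
```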

Lifecycle Status

- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
  - Performance optimization
  - Additional feature support

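Skill: Low-Resource AI Researcher

ID: 215
Category: AI/ML Research
Language: Python
Framework: PyTorch + PEFT (LoRA/QLoRA) + Transformers

Overview

Built on Parameter-Efficient Fine-Tuning (PEFT), this skill trains high-performance medical-domain large language models on consumer-grade GPUs or a single A100. It supports advanced fine-tuning methods such as LoRA and QLoRA and is optimized for medical text understanding and generation tasks.

Features

- 🚀 Parameter-Efficient Fine-Tuning: LoRA, QLoRA, DoRA support
- 🏥 Medical Domain Optimized: Pre-configured for medical QA, diagnosis, clinical notes
- 💻 Low-Resource Ready: Optimized for consumer GPUs (RTX 3090/4090) and a single A100
- 📊 Quantization: 4-bit/8-bit quantization with bitsandbytes
- 🔄 Multi-Task: Supports SFT, DPO, and medical instruction tuning
- 📝 Medical Datasets: Built-in support for PubMedQA, MedQA, MIMIC-III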

Installation

```bash
# Core dependencies
pip install torch transformers datasets accelerate peft bitsandbytes

# Optional training optimizations
pip install flash-attn --no-build-isolation
pip install wandb tensorboard

# Medical NLP tools
pip install scispacy scikit-learn
```

Quick Start

```python
from skills.low_resource_ai_researcher.scripts.main import MedicalPEFTTrainer

# Initialize the trainer
trainer = MedicalPEFTTrainer(
    model_name="meta-llama/Llama-2-7b-hf",
    task="medical_qa",
)

# Train with LoRA
trainer.train(
    output_dir="./medical_lora_model",
    num_epochs=3,
    batch_size=4,
    use_qlora=True,  # 4-bit quantization
)
```

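Configuration

Hardware Profiles

| Profile | GPU Memory | Quantization | Max Model Size | Batch Size |
| --- | --- | --- | --- | --- |
| consumer-24g | 24GB (RTX 3090/4090) | QLoRA 4-bit | 70B | 1-2 |
| a100-40g | 40GB (A100) | LoRA 8-bit | 70B | 4-8 |
| a100-80g | 80GB (A100) | LoRA 16-bit | 70B | 8-16 |
| multi-gpu | 2x A100 | LoRA 16-bit | 70B+ | 16+ |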

LoRA Config

```yaml
lora:
  r: 64                 # LoRA rank
  lora_alpha: 128       # scaling factor
  target_modules:       # modules LoRA is applied to
    - q_proj
    - v_proj
    - k_proj
    - o_proj
    - gate_proj
    - up_proj
    - down_proj
  lora_dropout: 0.05
  bias: none
  task_type: CAUSAL_LM
```
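
To see what this configuration costs in trainable parameters: each adapted weight matrix of shape d_in x d_out gains r * (d_in + d_out) LoRA parameters, and the low-rank update is scaled by lora_alpha / r. The sketch below applies that to LLaMA-2-7B. The layer dimensions (hidden size 4096, MLP size 11008, 32 layers) are filled in as assumptions about that model rather than values stated here, and the helper names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Assumed LLaMA-2-7B dimensions (not stated in this document).
HIDDEN = 4096
INTERMEDIATE = 11008
NUM_LAYERS = 32

# (d_in, d_out) of each linear layer named in target_modules.
MODULE_SHAPES: Dict[str, Tuple[int, int]] = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params_per_matrix(d_in: int, d_out: int, r: int) -> int:
    """A is r x d_in and B is d_out x r, so the pair adds r * (d_in + d_out)."""
    return r * (d_in + d_out)


def total_lora_params(target_modules: Iterable[str], r: int) -> int:
    """Sum LoRA parameters over the chosen modules in every layer."""
    per_layer = sum(lora_params_per_matrix(*MODULE_SHAPES[name], r) for name in target_modules)
    return per_layer * NUM_LAYERS


r, lora_alpha = 64, 128
trainable = total_lora_params(MODULE_SHAPES.keys(), r)
print(f"trainable LoRA parameters: {trainable:,}")
print(f"scaling factor (lora_alpha / r): {lora_alpha / r}")
```

Under these assumptions the configuration trains roughly 160 million parameters, a small fraction of the 7B base model, with the adapter output scaled by 2.0.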

CLI Usage

```bash
# Basic training
python scripts/main.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --dataset medical_qa \
    --output_dir ./output \
    --use_qlora \
    --per_device_train_batch_size 4

# Use a custom config
python scripts/main.py --config configs/medical_qlora.yaml

# Resume training
python scripts/main.py --resume_from_checkpoint ./output/checkpoint-1000
```
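
The flags above could be declared with `argparse` as sketched below. How `scripts/main.py` actually parses its arguments is not shown, so the parser, its defaults, and the `build_parser` name are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI flags used in the examples (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Medical PEFT fine-tuning (sketch)")
    parser.add_argument("--model_name_or_path", default=None, help="base model name or path")
    parser.add_argument("--dataset", default=None, help="dataset or task name")
    parser.add_argument("--output_dir", default="./output", help="where checkpoints are written")
    parser.add_argument("--use_qlora", action="store_true", help="enable 4-bit quantization")
    parser.add_argument("--per_device_train_batch_size", type=int, default=4)
    parser.add_argument("--config", default=None, help="path to a YAML config file")
    parser.add_argument("--resume_from_checkpoint", default=None, help="checkpoint directory")
    return parser


# Parse the "basic training" example from an explicit argument list.
example_args = build_parser().parse_args(
    [
        "--model_name_or_path", "meta-llama/Llama-2-7b-hf",
        "--dataset", "medical_qa",
        "--use_qlora",
        "--per_device_train_batch_size", "4",
    ]
)
print(example_args)
```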

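API Reference

MedicalPEFTTrainer

```python
from typing import List, Optional

# Constructor signature (stub for reference)
class MedicalPEFTTrainer:
    def __init__(
        self,
        model_name: str,  # base model name or path
        task: str,  # task type: medical_qa, diagnosis, clinical_note
        lora_r: int = 64,  # LoRA rank
        lora_alpha: int = 128,  # LoRA alpha
        use_qlora: bool = False,  # use 4-bit quantization
        target_modules: Optional[List[str]] = None,
        device_map: str = "auto",
        trust_remote_code: bool = True,
    ) -> None: ...
```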

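Methods

| Method | Description |
| --- | --- |
| `train()` | Start fine-tuning with configured parameters |
| `evaluate()` | |

Supported Models

- LLaMA 2/3 (7B, 13B, 70B)
- Mistral (7B, 8x7B)
- Yi (6B, 34B)
- Qwen (7B, 14B, 72B)
- Baichuan (7B, 13B)
- ChatGLM (6B)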

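Medical Datasets

| Dataset | Description | Size |
| --- | --- | --- |
| PubMedQA | Biomedical QA | 1k QA pairs |
| MedQA | | |

Performance Benchmarks

| Model | Method | GPU | Training Time | MedQA Acc |
| --- | --- | --- | --- | --- |
| LLaMA-2-7B | LoRA | A100-40G | 2h | 58.2% |
| LLaMA-2-7B | QLoRA | RTX 4090 | 3h | 57.8% |
| LLaMA-2-13B | QLoRA | A100-40G | 4h | 62.5% |
| Mistral-7B | LoRA | A100-40G | 2.5h | 61.3% |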

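Best Practices

1. Gradient Accumulation: Use for effective larger batch sizes
2. Learning Rate: Start with 2e-4 for LoRA, 1e-4 for full fine-tuning
3. Warmup Steps: 100 steps for medical domain adaptation
4. Max Length: 2048-4096 for clinical notes, 512-1024 for QA
5. Data Quality: Filter out low-quality medical data carefully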

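Troubleshooting

Out of Memory

```python
# Enable gradient checkpointing
trainer.train(gradient_checkpointing=True)

# Reduce the sequence length
trainer.train(max_seq_length=1024)

# Use DeepSpeed ZeRO-3 for large models
```

Slow Training

```python
# Enable Flash Attention
trainer.train(use_flash_attention=True)

# Use bf16 on Ampere GPUs
trainer.train(bf16=True)
```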

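License

This skill follows the license of the underlying models used. Medical applications require compliance with HIPAA/GDPR regulations.

References

1. Hu et al. (2021) - LoRA: Low-Rank Adaptation of Large Language Models
2. Dettmers et al. (2023) - QLoRA: Efficient Finetuning of Quantized LLMs
3. Singhal et al. (2023) - Large Language Models Encode Clinical Knowledge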

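Risk Assessment

| Risk Indicator | Assessment | Level |
| --- | --- | --- |
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | | |

Security Checklist

- [ ] No hardcoded credentials or API keys
- [ ] No unauthorized file system access (../)
- [ ] Output does not expose sensitive information
- [ ] Prompt injection protections in place
- [ ] Input file paths validated (no ../ traversal)
- [ ] Output directory restricted to workspace
- [ ] Script execution in sandboxed environment
- [ ] Error messages sanitized (no stack traces exposed)
- [ ] Dependencies audited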

Prerequisites

```bash
# Python dependencies
pip install -r requirements.txt
```

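Evaluation Criteria

Success Metrics

- [ ] Successfully executes main functionality
- [ ] Output meets quality standards
- [ ] Handles edge cases gracefully
- [ ] Performance is acceptable

Test Cases

1. Basic Functionality: Standard input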
