LLM Data Automation for Construction
Overview
Based on DDC methodology (Chapter 2.3), this skill enables automation of construction data processing using Large Language Models (LLM). Instead of manually coding data transformations, you describe what you need in natural language, and the LLM generates the necessary Python/Pandas code.
Book Reference: "Pandas DataFrame и LLM ChatGPT" / "Pandas DataFrame and LLM ChatGPT"
"LLM-модели, такие как ChatGPT и LLaMA, позволяют специалистам без глубоких знаний программирования внести свой вклад в автоматизацию и улучшение бизнес-процессов компании."
— DDC Book, Chapter 2.3
Quick Start
Option 1: Use ChatGPT/Claude Online
Simply describe your data processing task in natural language:
CODEBLOCK0
Option 2: Run Local LLM (Ollama)
CODEBLOCK1
Option 3: Use LM Studio (GUI)
- 1. Download from lmstudio.ai
- Install and select a model (e.g., Mistral, LLaMA)
- Start chatting with your local AI
Core Concepts
DataFrame as Universal Format
CODEBLOCK2
LLM Prompts for Construction Tasks
Data Import:
CODEBLOCK3
Data Filtering:
CODEBLOCK4
Data Aggregation:
CODEBLOCK5
Report Generation:
CODEBLOCK6
Common Use Cases
1. Extract Data from PDF Documents
CODEBLOCK7
2. Process BIM Element Data
CODEBLOCK8
3. Cost Estimation Pipeline
CODEBLOCK9
4. Schedule Data Processing
CODEBLOCK10
Local LLM Setup (No Internet Required)
Using Ollama
CODEBLOCK11
Using LlamaIndex for Company Documents
CODEBLOCK12
IDE Recommendations
| IDE | Best For | Features |
|---|
| Jupyter Notebook | Learning, experiments | Interactive cells, visualizations |
| Google Colab |
Free GPU, quick start | Cloud-based, pre-installed libs |
|
VS Code | Professional development | Extensions, GitHub Copilot |
|
PyCharm | Large projects | Advanced debugging, refactoring |
Quick Setup with Jupyter
CODEBLOCK13
Best Practices
- 1. Start Simple: Begin with clear, specific prompts
- Iterate: Refine prompts based on results
- Validate: Always check generated code before running
- Document: Save working prompts for reuse
- Secure: Use local LLM for sensitive company data
Common Prompts Library
Data Import
- - "Read Excel file and show first 10 rows"
- "Import CSV with custom delimiter and encoding"
- "Load multiple Excel sheets into dictionary of DataFrames"
Data Cleaning
- - "Remove duplicate rows based on element_id"
- "Fill missing values with column mean"
- "Convert column to numeric, handling errors"
Data Analysis
- - "Calculate descriptive statistics for numeric columns"
- "Find correlation between cost and duration"
- "Identify outliers using IQR method"
Data Export
- - "Export to Excel with multiple sheets"
- "Save to CSV with specific encoding"
- "Generate formatted PDF report"
Resources
- - Book: "Data-Driven Construction" by Artem Boiko, Chapter 2.3
- Website: https://datadrivenconstruction.io
- Pandas Documentation: https://pandas.pydata.org/docs/
- Ollama: https://ollama.com
- LM Studio: https://lmstudio.ai
- Google Colab: https://colab.research.google.com
Next Steps
- - See
pandas-construction-analysis for advanced Pandas operations - See
pdf-to-structured for document processing - See
etl-pipeline for automated data pipelines - See
rag-construction for RAG implementation with construction documents
技能名称: llm-data-automation
建筑行业LLM数据自动化
概述
基于DDC方法论(第2.3章),本技能利用大型语言模型(LLM)实现建筑数据处理自动化。无需手动编写数据转换代码,只需用自然语言描述需求,LLM即可生成所需的Python/Pandas代码。
书籍参考: Pandas DataFrame и LLM ChatGPT / Pandas DataFrame and LLM ChatGPT
像ChatGPT和LLaMA这样的LLM模型,使没有深厚编程知识的专业人士也能为公司的业务流程自动化和改进做出贡献。
— DDC书籍,第2.3章
快速入门
选项1:使用在线ChatGPT/Claude
直接用自然语言描述数据处理任务:
提示词:编写Python代码,读取包含建筑材料的Excel文件,
筛选数量大于100的行,并保存为CSV文件。
选项2:运行本地LLM(Ollama)
bash
从ollama.com安装Ollama
ollama pull mistral
运行查询
ollama run mistral 编写Pandas代码,根据数量*单价计算总成本
选项3:使用LM Studio(图形界面)
- 1. 从lmstudio.ai下载
- 安装并选择模型(例如Mistral、LLaMA)
- 开始与本地AI对话
核心概念
DataFrame作为通用格式
python
import pandas as pd
建筑项目作为DataFrame
行=元素,列=属性
df = pd.DataFrame({
element_id: [W001, W002, C001],
category: [墙, 墙, 柱],
material: [混凝土, 砖, 钢],
volume_m3: [45.5, 32.0, 8.2],
cost
perm3: [150, 80, 450]
})
计算总成本
df[total
cost] = df[volumem3] * df[cost
perm3]
print(df)
建筑任务的LLM提示词
数据导入:
编写代码导入包含施工进度的Excel文件,
解析日期,并创建Pandas DataFrame
数据筛选:
筛选类别为结构且成本超过预算限额50000的建筑元素
数据聚合:
按楼层对建筑数据进行分组,
计算每层的总体积和总成本
报告生成:
创建按类别分组的材料数量汇总报告,
导出为带格式的Excel文件
常见用例
1. 从PDF文档提取数据
python
向ChatGPT的提示词:
编写代码从PDF中提取表格并转换为DataFrame
import pdfplumber
import pandas as pd
def pdftodataframe(pdf_path):
从PDF文件中提取表格
all_tables = []
with pdfplumber.open(pdf_path) as pdf:
for page in pdf.pages:
tables = page.extract_tables()
for table in tables:
if table:
df = pd.DataFrame(table[1:], columns=table[0])
all_tables.append(df)
if all_tables:
return pd.concat(alltables, ignoreindex=True)
return pd.DataFrame()
使用示例
df = pdf
todataframe(construction_spec.pdf)
df.to
excel(extracteddata.xlsx, index=False)
2. 处理BIM元素数据
python
提示词:分析BIM元素,按类别分组,计算体积
import pandas as pd
def analyzebimelements(csv_path):
分析来自CSV导出的BIM元素数据
df = pd.readcsv(csvpath)
# 按类别分组
summary = df.groupby(Category).agg({
Volume: sum,
Area: sum,
ElementId: count
}).rename(columns={ElementId: Count})
return summary
使用示例
summary = analyze
bimelements(revit_export.csv)
print(summary)
3. 成本估算流程
python
提示词:根据工程量和单价创建成本估算
import pandas as pd
def calculatecostestimate(quantitiesdf, pricesdf):
计算项目成本估算
参数:
quantitiesdf: 包含[itemcode, quantity]列的DataFrame
pricesdf: 包含[itemcode, unit_price, unit]列的DataFrame
返回:
包含成本计算的DataFrame
# 合并工程量和价格
result = quantitiesdf.merge(pricesdf, on=item_code, how=left)
# 计算成本
result[totalcost] = result[quantity] * result[unitprice]
# 添加百分比
result[costpercentage] = (result[totalcost] /
result[total_cost].sum() * 100).round(2)
return result
使用示例
quantities = pd.DataFrame({
item_code: [C001, S001, W001],
quantity: [150, 2000, 500]
})
prices = pd.DataFrame({
item_code: [C001, S001, W001],
unit_price: [120, 45, 85],
unit: [m3, kg, m2]
})
estimate = calculatecostestimate(quantities, prices)
print(estimate)
4. 进度数据处理
python
提示词:解析施工进度,计算工期,识别延误
import pandas as pd
from datetime import datetime
def analyzeschedule(schedulepath):
分析施工进度中的延误
df = pd.readexcel(schedulepath)
# 解析日期
df[startdate] = pd.todatetime(df[start_date])
df[enddate] = pd.todatetime(df[end_date])
df[actualend] = pd.todatetime(df[actual_end])
# 计算工期
df[plannedduration] = (df[enddate] - df[start_date]).dt.days
df[actualduration] = (df[actualend] - df[start_date]).dt.days
# 识别延误
df[delaydays] = df[actualduration] - df[planned_duration]
df[isdelayed] = df[delaydays] > 0
return df
使用示例
schedule = analyze
schedule(projectschedule.xlsx)
delayed
tasks = schedule[schedule[isdelayed]]
print(f延误任务数:{len(delayed_tasks)})
本地LLM设置(无需互联网)
使用Ollama
bash
安装
curl -fsSL https://ollama.com/install.sh | sh
下载模型
ollama pull mistral # 通用,70亿参数
ollama pull codellama # 代码专用
ollama pull deepseek-coder # 编码任务最佳
运行
ollama run mistral 编写Pandas代码,根据project_id合并两个DataFrame
使用LlamaIndex处理公司文档
python
将公司文档加载到本地LLM
from llama_index import SimpleDirectoryReader, VectorStoreIndex
读取文件夹中的所有PDF
reader = SimpleDirectoryReader(company_documents/)
documents = reader.load_data()
创建可搜索索引
index = VectorStoreIndex.from_documents(documents)
查询文档
query
engine = index.asquery_engine()
response = query_engine.query(
标准的混凝土配合比规格是什么?
)
print(response)
IDE推荐
| IDE | 最佳用途 | 特点 |
|---|
| Jupyter Notebook | 学习、实验 | 交互式单元格、可视化 |
| Google Colab |
免费GPU、快速启动 | 基于云端、预装库 |
|
VS Code | 专业开发 | 扩展、GitHub Copilot |
|
PyCharm | 大型项目 | 高级调试、重构 |
Jupyter快速设置
bash
pip install jupyter pandas openpyxl pdfplumber
jupyter notebook
最佳实践
- 1. 从简单开始:从清晰、具体的提示词入手
- 迭代优化:根据结果改进提示词
- 验证代码:运行前始终检查生成的代码
- 记录保存:保存有效的提示词以备重复使用
- 确保安全:对敏感公司数据使用本地LLM
常用提示词库
数据导入
- - 读取Excel文件并显示前10行
- 使用自定义分隔符和编码导入CSV
- 将多个Excel工作表加载到DataFrame字典中
数据清洗
- - 根据element_id删除重复行
- 用列均值填充缺失值
- 将列转换为数值类型,处理错误
数据分析