LLM Data Automation for Construction

Overview

Based on DDC methodology (Chapter 2.3), this skill enables automation of construction data processing using Large Language Models (LLM). Instead of manually coding data transformations, you describe what you need in natural language, and the LLM generates the necessary Python/Pandas code.

Book Reference: "Pandas DataFrame и LLM ChatGPT" / "Pandas DataFrame and LLM ChatGPT"

"LLM-модели, такие как ChatGPT и LLaMA, позволяют специалистам без глубоких знаний программирования внести свой вклад в автоматизацию и улучшение бизнес-процессов компании."
— DDC Book, Chapter 2.3

Quick Start

Option 1: Use ChatGPT/Claude Online

Simply describe your data processing task in natural language:

CODEBLOCK0

Option 2: Run Local LLM (Ollama)

CODEBLOCK1

Option 3: Use LM Studio (GUI)

1. Download from lmstudio.ai
Install and select a model (e.g., Mistral, LLaMA)
Start chatting with your local AI

Core Concepts

DataFrame as Universal Format

CODEBLOCK2

LLM Prompts for Construction Tasks

Data Import:
CODEBLOCK3

Data Filtering:
CODEBLOCK4

Data Aggregation:
CODEBLOCK5

Report Generation:
CODEBLOCK6

Common Use Cases

1. Extract Data from PDF Documents

CODEBLOCK7

2. Process BIM Element Data

CODEBLOCK8

3. Cost Estimation Pipeline

CODEBLOCK9

4. Schedule Data Processing

CODEBLOCK10

Local LLM Setup (No Internet Required)

Using Ollama

CODEBLOCK11

Using LlamaIndex for Company Documents

CODEBLOCK12

IDE Recommendations

IDE	Best For	Features
Jupyter Notebook	Learning, experiments	Interactive cells, visualizations
Google Colab

Quick Setup with Jupyter

CODEBLOCK13

Best Practices

1. Start Simple: Begin with clear, specific prompts
Iterate: Refine prompts based on results
Validate: Always check generated code before running
Document: Save working prompts for reuse
Secure: Use local LLM for sensitive company data

Common Prompts Library

Data Import

- "Read Excel file and show first 10 rows"
"Import CSV with custom delimiter and encoding"
"Load multiple Excel sheets into dictionary of DataFrames"

Data Cleaning

- "Remove duplicate rows based on element_id"
"Fill missing values with column mean"
"Convert column to numeric, handling errors"

Data Analysis

- "Calculate descriptive statistics for numeric columns"
"Find correlation between cost and duration"
"Identify outliers using IQR method"

Data Export

- "Export to Excel with multiple sheets"
"Save to CSV with specific encoding"
"Generate formatted PDF report"

Resources

- Book: "Data-Driven Construction" by Artem Boiko, Chapter 2.3
Website: https://datadrivenconstruction.io
Pandas Documentation: https://pandas.pydata.org/docs/
Ollama: https://ollama.com
LM Studio: https://lmstudio.ai
Google Colab: https://colab.research.google.com

Next Steps

- See pandas-construction-analysis for advanced Pandas operations
See pdf-to-structured for document processing
See etl-pipeline for automated data pipelines
See rag-construction for RAG implementation with construction documents

技能名称: llm-data-automation

建筑行业LLM数据自动化

概述

基于DDC方法论（第2.3章），本技能利用大型语言模型（LLM）实现建筑数据处理自动化。无需手动编写数据转换代码，只需用自然语言描述需求，LLM即可生成所需的Python/Pandas代码。

书籍参考： Pandas DataFrame и LLM ChatGPT / Pandas DataFrame and LLM ChatGPT

像ChatGPT和LLaMA这样的LLM模型，使没有深厚编程知识的专业人士也能为公司的业务流程自动化和改进做出贡献。
— DDC书籍，第2.3章

快速入门

选项1：使用在线ChatGPT/Claude

直接用自然语言描述数据处理任务：

提示词：编写Python代码，读取包含建筑材料的Excel文件，
筛选数量大于100的行，并保存为CSV文件。

选项2：运行本地LLM（Ollama）

bash

从ollama.com安装Ollama

ollama pull mistral

运行查询

ollama run mistral 编写Pandas代码，根据数量*单价计算总成本

选项3：使用LM Studio（图形界面）

1. 从lmstudio.ai下载
安装并选择模型（例如Mistral、LLaMA）
开始与本地AI对话

核心概念

DataFrame作为通用格式

python import pandas as pd

建筑项目作为DataFrame

行=元素，列=属性

df = pd.DataFrame({ element_id: [W001, W002, C001], category: [墙, 墙, 柱], material: [混凝土, 砖, 钢], volume_m3: [45.5, 32.0, 8.2], costperm3: [150, 80, 450] })

计算总成本

df[totalcost] = df[volumem3] * df[costperm3] print(df)

建筑任务的LLM提示词

数据导入：

编写代码导入包含施工进度的Excel文件，
解析日期，并创建Pandas DataFrame

数据筛选：

筛选类别为结构且成本超过预算限额50000的建筑元素

数据聚合：

按楼层对建筑数据进行分组，
计算每层的总体积和总成本

报告生成：

创建按类别分组的材料数量汇总报告，
导出为带格式的Excel文件

常见用例

1. 从PDF文档提取数据

python

向ChatGPT的提示词：

编写代码从PDF中提取表格并转换为DataFrame

import pdfplumber
import pandas as pd

def pdftodataframe(pdf_path):
从PDF文件中提取表格
all_tables = []
with pdfplumber.open(pdf_path) as pdf:
for page in pdf.pages:
tables = page.extract_tables()
for table in tables:
if table:
df = pd.DataFrame(table[1:], columns=table[0])
all_tables.append(df)

if all_tables:
return pd.concat(alltables, ignoreindex=True)
return pd.DataFrame()

使用示例

df = pdftodataframe(construction_spec.pdf) df.toexcel(extracteddata.xlsx, index=False)

2. 处理BIM元素数据

python

提示词：分析BIM元素，按类别分组，计算体积

import pandas as pd

def analyzebimelements(csv_path):
分析来自CSV导出的BIM元素数据
df = pd.readcsv(csvpath)

# 按类别分组
summary = df.groupby(Category).agg({
Volume: sum,
Area: sum,
ElementId: count
}).rename(columns={ElementId: Count})

return summary

使用示例

summary = analyzebimelements(revit_export.csv) print(summary)

3. 成本估算流程

python

提示词：根据工程量和单价创建成本估算

import pandas as pd

def calculatecostestimate(quantitiesdf, pricesdf):

计算项目成本估算

参数：
quantitiesdf: 包含[itemcode, quantity]列的DataFrame
pricesdf: 包含[itemcode, unit_price, unit]列的DataFrame

返回：
包含成本计算的DataFrame

# 合并工程量和价格
result = quantitiesdf.merge(pricesdf, on=item_code, how=left)

# 计算成本
result[totalcost] = result[quantity] * result[unitprice]

# 添加百分比
result[costpercentage] = (result[totalcost] /
result[total_cost].sum() * 100).round(2)

return result

使用示例

quantities = pd.DataFrame({ item_code: [C001, S001, W001], quantity: [150, 2000, 500] })

prices = pd.DataFrame({
item_code: [C001, S001, W001],
unit_price: [120, 45, 85],
unit: [m3, kg, m2]
})

estimate = calculatecostestimate(quantities, prices)
print(estimate)

4. 进度数据处理

python

提示词：解析施工进度，计算工期，识别延误

import pandas as pd
from datetime import datetime

def analyzeschedule(schedulepath):
分析施工进度中的延误
df = pd.readexcel(schedulepath)

# 解析日期
df[startdate] = pd.todatetime(df[start_date])
df[enddate] = pd.todatetime(df[end_date])
df[actualend] = pd.todatetime(df[actual_end])

# 计算工期
df[plannedduration] = (df[enddate] - df[start_date]).dt.days
df[actualduration] = (df[actualend] - df[start_date]).dt.days

# 识别延误
df[delaydays] = df[actualduration] - df[planned_duration]
df[isdelayed] = df[delaydays] > 0

return df

使用示例

schedule = analyzeschedule(projectschedule.xlsx) delayedtasks = schedule[schedule[isdelayed]] print(f延误任务数：{len(delayed_tasks)})

本地LLM设置（无需互联网）

使用Ollama

bash

安装

curl -fsSL https://ollama.com/install.sh | sh

下载模型

ollama pull mistral # 通用，70亿参数 ollama pull codellama # 代码专用 ollama pull deepseek-coder # 编码任务最佳

运行

ollama run mistral 编写Pandas代码，根据project_id合并两个DataFrame

使用LlamaIndex处理公司文档

python

将公司文档加载到本地LLM

from llama_index import SimpleDirectoryReader, VectorStoreIndex

读取文件夹中的所有PDF

reader = SimpleDirectoryReader(company_documents/) documents = reader.load_data()

创建可搜索索引

index = VectorStoreIndex.from_documents(documents)

查询文档

queryengine = index.asquery_engine() response = query_engine.query( 标准的混凝土配合比规格是什么？ ) print(response)

IDE推荐

IDE	最佳用途	特点
Jupyter Notebook	学习、实验	交互式单元格、可视化
Google Colab

Jupyter快速设置

bash pip install jupyter pandas openpyxl pdfplumber jupyter notebook

最佳实践

1. 从简单开始：从清晰、具体的提示词入手
迭代优化：根据结果改进提示词
验证代码：运行前始终检查生成的代码
记录保存：保存有效的提示词以备重复使用
确保安全：对敏感公司数据使用本地LLM

llm-data-automation" LLM数据自动化