UPLO Data Analytics — Metadata That Remembers
Data teams have a documentation problem that compounds over time. The warehouse has 3,000 tables but only 200 have descriptions. The Looker instance has dashboards built by people who left two years ago. The data governance policy exists but nobody can find the version that was actually approved. UPLO Data Analytics turns this scattered tribal knowledge into a searchable, structured corpus: pipeline documentation, schema definitions, data quality rules, dashboard specs, and governance policies all in one place.
Session Start
get_identity_context
This establishes your analytics role (data engineer, analyst, governance lead, etc.) and surfaces which data domains you have access to. Some datasets are restricted due to PII governance or competitive sensitivity.
Check current directives — the data team often has active mandates around migration timelines, deprecation notices, or data quality SLA targets:
get_directives
When to Use
- A stakeholder asks what a specific metric means and you need to find the canonical definition, including the SQL logic, source tables, and business rules
- You are building a new pipeline and want to know if similar data already exists in the warehouse to avoid duplication
- You are investigating a data quality incident and need to trace the lineage from source system through transformations to the impacted dashboard
- You are preparing for a data governance review and need to compile documentation on data classification, retention policies, and access controls
- A new analyst joins and needs to understand the warehouse schema naming conventions, dbt project structure, and how to request access
- You are evaluating whether a proposed schema change will break downstream dependencies and need to search for references to the affected table
- You are looking for the data dictionary entry for a column that has an ambiguous name like status_cd or type_flag
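The schema-change case in the list above comes down to finding every artifact that mentions the affected table. The following is a minimal Python sketch of that idea, assuming model SQL is available as plain strings. The model names, the SQL text, and the `find_references` helper are invented for illustration and are not part of UPLO.

```python
import re

def find_references(table: str, models: dict[str, str]) -> list[str]:
    """Return names of models whose SQL mentions `table` as a whole identifier."""
    # The lookarounds stop 'dim_customers' from matching inside 'dim_customers_v2'.
    pattern = re.compile(rf"(?<!\w){re.escape(table)}(?!\w)", re.IGNORECASE)
    return sorted(name for name, sql in models.items() if pattern.search(sql))

# Hypothetical dbt-style models keyed by model name.
MODELS = {
    "fct_orders": "select * from customer_orders o join dim_customers c on o.customer_id = c.id",
    "rpt_revenue": "select sum(amount) from fct_orders",
    "stg_customers_v2": "select * from raw.dim_customers_v2",
}

print(find_references("dim_customers", MODELS))  # ['fct_orders']
```

A plain text match like this misses references built dynamically (for example through macros), so treat the result as a starting list rather than a complete dependency set.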
Example Workflows
Metric Definition Dispute
The finance team and product team report different DAU (Daily Active Users) numbers. The analytics lead needs to find and reconcile the definitions.
search_with_context query=daily active users DAU metric definition SQL logic business rules
Search for the specific dashboard implementations:
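search_knowledge query=product analytics dashboard DAU calculation Looker explore
search_knowledge query=finance reporting DAU user count methodology monthly report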
If the definitions genuinely differ and need reconciliation:
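propose_update target_table=entries target_id= changes={data:{note:DAU definitions diverge between product (event-based) and finance (login-based); needs governance review}} rationale=Metric inconsistency discovered between product and finance DAU reporting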
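One way two DAU figures can diverge is when one team counts any product event and the other counts only logins. The toy Python sketch below shows how the same activity data yields two different numbers. The sample rows and the `dau` helper are invented for illustration and do not reflect either team's real SQL.

```python
from datetime import date

# Invented sample rows: (user_id, activity_date, activity_type).
ACTIVITY = [
    ("u1", date(2024, 1, 1), "login"),
    ("u1", date(2024, 1, 1), "page_view"),
    ("u2", date(2024, 1, 1), "page_view"),  # active through a persisted session, no login
    ("u3", date(2024, 1, 1), "login"),
    ("u4", date(2024, 1, 2), "login"),
]

def dau(rows, day, allowed_types=None):
    """Count distinct users active on `day`, optionally restricted to some activity types."""
    return len({
        user for user, d, kind in rows
        if d == day and (allowed_types is None or kind in allowed_types)
    })

day = date(2024, 1, 1)
event_based = dau(ACTIVITY, day)             # any event counts
login_based = dau(ACTIVITY, day, {"login"})  # only logins count
print(event_based, login_based)  # 3 2
```

Writing both definitions against the same rows makes the gap explicit (user u2 here), which is the kind of evidence worth attaching to a proposed update.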
Data Lineage Investigation
A dashboard is showing NULL values that were not there last week. The data engineer needs to trace the problem.
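search_with_context query=customer_orders table pipeline transformations source systems dependencies
search_knowledge query=customer_orders ETL job schedule dbt model upstream sources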
Check if there is a known data quality incident:
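search_knowledge query=data quality incident customer data source system outage recent
log_conversation summary=Traced NULL values in orders dashboard to upstream source system schema change; customer_orders dbt model needs migration topics=[data-quality,lineage,pipeline-break] tools_used=[search_with_context,search_knowledge]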
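Tracing a symptom back to its source is a graph walk from the dashboard toward the source systems. The following Python sketch shows that walk, assuming the lineage has already been collected into a simple mapping. The node names and the `trace_upstream` helper are hypothetical and are not UPLO output.

```python
from collections import deque

# Hypothetical lineage edges: each node maps to the nodes it reads from.
UPSTREAM = {
    "orders_dashboard": ["customer_orders"],
    "customer_orders": ["stg_orders", "stg_customers"],
    "stg_orders": ["source.orders_db"],
    "stg_customers": ["source.crm"],
}

def trace_upstream(start, upstream):
    """Breadth-first walk from a symptom back toward sources; returns nodes in visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for parent in upstream.get(node, []):
            if parent not in seen:  # guard against revisiting shared parents
                seen.add(parent)
                queue.append(parent)
    return order

print(trace_upstream("orders_dashboard", UPSTREAM))
# ['orders_dashboard', 'customer_orders', 'stg_orders', 'stg_customers', 'source.orders_db', 'source.crm']
```

Breadth-first order checks the nearest transformations first, which matches the habit of inspecting the closest model before going all the way back to the source system.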
Key Tools for Data Analytics
search_with_context: Data questions are inherently about relationships: tables connect to pipelines, pipelines connect to source systems, dashboards depend on models. Graph traversal follows these connections. Example: search_with_context query=revenue_summary table lineage source transformations consumers
search_knowledge: Direct lookup for specific technical artifacts: a dbt model definition, a data dictionary entry, a governance policy version. Example: search_knowledge query=dbt model dim_customers grain deduplication logic
flag_outdated: Data documentation rots faster than most content types. Table descriptions written during initial warehouse build may reference deprecated source systems. Schema diagrams from before a migration may show phantom tables. Flag aggressively.
report_knowledge_gap: Undocumented tables and undefined metrics are the norm in most warehouses. When you encounter a table with no data dictionary entry or a metric with no canonical definition, report the gap. The governance team uses these signals to prioritize documentation sprints.
propose_update: When you discover that a data dictionary entry is wrong (e.g., a column description says "customer creation date" but it actually stores "first order date"), propose the correction.
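A quick way to find candidates for flag_outdated and report_knowledge_gap is to compare the tables that exist with the tables that are documented. The Python sketch below assumes both inventories are available as lists of names. The table lists and the `doc_coverage` helper are invented for illustration.

```python
def doc_coverage(warehouse_tables, documented_tables):
    """Split table names into undocumented (gap) and phantom (documented but gone)."""
    warehouse, documented = set(warehouse_tables), set(documented_tables)
    return {
        "undocumented": sorted(warehouse - documented),  # candidates for report_knowledge_gap
        "phantom": sorted(documented - warehouse),       # candidates for flag_outdated
    }

# Invented inventories for illustration.
warehouse = ["dim_customers", "customer_orders", "revenue_summary"]
documented = ["dim_customers", "legacy_orders"]

print(doc_coverage(warehouse, documented))
# {'undocumented': ['customer_orders', 'revenue_summary'], 'phantom': ['legacy_orders']}
```

The set differences only say that a name is missing on one side; a renamed table shows up in both lists, so each result still needs a human check before it is flagged.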
Tips
- Technical identifiers are your best search terms. Use exact table names (dim_customers), column names (order_status_cd), dbt model names, and Looker explore names. The extraction engine indexes these precisely.
- When investigating data quality issues, start with search_with_context to get the lineage graph, then use search_knowledge for specific transformation logic. Working backwards from the symptom to the source is more efficient than searching forward.
- Data governance policies often exist in multiple versions (draft, approved, superseded). Include "approved" or "current" in your query to filter toward the authoritative version.
- The most valuable documentation to contribute back is metric definitions with SQL. When you resolve a metric dispute, log the session and propose an update with the canonical SQL so the next person does not have to repeat the investigation.
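When a search still returns several versions of a policy, picking the authoritative one can be done mechanically. The Python sketch below assumes each result carries a status and an effective date. These field names, the sample records, and the `authoritative` helper are assumptions for illustration and are not UPLO's actual schema.

```python
from datetime import date

# Invented search results; the field names are assumptions.
VERSIONS = [
    {"title": "Data Retention Policy", "status": "superseded", "effective": date(2022, 3, 1)},
    {"title": "Data Retention Policy", "status": "approved", "effective": date(2023, 6, 15)},
    {"title": "Data Retention Policy", "status": "draft", "effective": date(2024, 1, 10)},
]

def authoritative(versions):
    """Return the most recent approved version, or None if nothing is approved."""
    approved = [v for v in versions if v["status"] == "approved"]
    return max(approved, key=lambda v: v["effective"], default=None)

current = authoritative(VERSIONS)
print(current["status"], current["effective"])  # approved 2023-06-15
```

Note that the newest record here is a draft, so sorting by date alone would pick the wrong version; filtering by status first is the point of the tip above.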
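UPLO Data Analytics: Metadata That Remembers
Data teams have a documentation problem that compounds over time. The warehouse has 3,000 tables but only 200 have descriptions. The Looker instance has dashboards built by people who left two years ago. The data governance policy exists but nobody can find the version that was actually approved. UPLO Data Analytics turns this scattered tribal knowledge into a searchable, structured corpus: pipeline documentation, schema definitions, data quality rules, dashboard specs, and governance policies all in one place.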
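Session Start
get_identity_context
This establishes your analytics role (data engineer, analyst, governance lead, etc.) and surfaces which data domains you have access to. Some datasets are restricted due to PII governance or competitive sensitivity.
Check current directives. The data team often has active mandates around migration timelines, deprecation notices, or data quality SLA targets:
get_directives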
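When to Use
- A stakeholder asks what a specific metric means and you need to find the canonical definition, including the SQL logic, source tables, and business rules
- You are building a new pipeline and want to know if similar data already exists in the warehouse to avoid duplication
- You are investigating a data quality incident and need to trace the full lineage from source system through transformations to the impacted dashboard
- You are preparing for a data governance review and need to compile documentation on data classification, retention policies, and access controls
- A new analyst joins and needs to understand the warehouse schema naming conventions, dbt project structure, and how to request access
- You are evaluating whether a proposed schema change will break downstream dependencies and need to search for references to the affected table
- You are looking for the data dictionary entry for a column that has an ambiguous name like status_cd or type_flag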
Example Workflows
Metric Definition Dispute
The finance team and product team report different DAU (Daily Active Users) numbers. The analytics lead needs to find and reconcile the definitions.
search_with_context query=daily active users DAU metric definition SQL logic business rules
Search for the specific dashboard implementations:
search_knowledge query=product analytics dashboard DAU calculation Looker explore
search_knowledge query=finance reporting DAU user count methodology monthly report
If the definitions genuinely differ and need reconciliation:
propose_update target_table=entries target_id= changes={data:{note:DAU definitions diverge between product (event-based) and finance (login-based); needs governance review}} rationale=Metric inconsistency discovered between product and finance DAU reporting
Data Lineage Investigation
A dashboard is showing NULL values that were not there last week. The data engineer needs to trace the problem.
search_with_context query=customer_orders table pipeline transformations source systems dependencies
search_knowledge query=customer_orders ETL job schedule dbt model upstream sources
Check if there is a known data quality incident:
search_knowledge query=data quality incident customer data source system outage recent
log_conversation summary=Traced NULL values in orders dashboard to upstream source system schema change; customer_orders dbt model needs migration topics=[data-quality,lineage,pipeline-break] tools_used=[search_with_context,search_knowledge]
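Key Tools for Data Analytics
search_with_context: Data questions are inherently about relationships: tables connect to pipelines, pipelines connect to source systems, dashboards depend on models. Graph traversal follows these connections. Example: search_with_context query=revenue_summary table lineage source transformations consumers
search_knowledge: Direct lookup for specific technical artifacts: a dbt model definition, a data dictionary entry, a governance policy version. Example: search_knowledge query=dbt model dim_customers grain deduplication logic
flag_outdated: Data documentation rots faster than most content types. Table descriptions written during initial warehouse build may reference deprecated source systems. Schema diagrams from before a migration may show phantom tables. Flag aggressively.
report_knowledge_gap: Undocumented tables and undefined metrics are the norm in most warehouses. When you encounter a table with no data dictionary entry or a metric with no canonical definition, report the gap. The governance team uses these signals to prioritize documentation sprints.
propose_update: When you discover that a data dictionary entry is wrong (e.g., a column description says "customer creation date" but it actually stores "first order date"), propose the correction.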
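Tips
- Technical identifiers are your best search terms. Use exact table names (dim_customers), column names (order_status_cd), dbt model names, and Looker explore names. The extraction engine indexes these precisely.
- When investigating data quality issues, start with search_with_context to get the lineage graph, then use search_knowledge for specific transformation logic. Working backwards from the symptom to the source is more efficient than searching forward.
- Data governance policies often exist in multiple versions (draft, approved, superseded). Include "approved" or "current" in your query to filter toward the authoritative version.
- The most valuable documentation to contribute back is metric definitions with SQL. When you resolve a metric dispute, log the session and propose an update with the canonical SQL so the next person does not have to repeat the investigation.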