Kirk Content Pipeline

Create Twitter content from analyst research PDFs, validated against KSVC holdings.

Pipeline Steps (MANDATORY)

CODEBLOCK0

Never skip steps 4a-4d. Use 1a for multi-PDF screening, 1b for deep extraction, 1c for cross-doc synthesis, 4a for verification, 4a.5 for web cross-validation, 4b for final holdings check, 4c for character voice, 4d for AI pattern removal.

⚠️ CRITICAL: Step 1b extracts data. Step 1c synthesizes across docs. Step 4a VERIFIES the written content. Step 4a.5 CROSS-VALIDATES inferences.

- 1b: "What does each PDF say?" (per-doc extraction)
- 1c: "What patterns emerge across PDFs?" (cross-doc synthesis)
- 4a: "Does my draft accurately reflect the sources?" (source-locked verification)
- 4a.5: "Are the flagged inferences valid per public sources?" (web cross-validation)
- 4c: "Which Kirk mode fits this situation?" (character voice)

Subagent Permissions (CRITICAL)

Subagents CANNOT Read files outside the project directory. PDFs in /Users/Shared/ksvc/pdfs/ are blocked. The fix: symlink PDFs into the project directory before spawning subagents.

The main agent MUST create a symlink before Step 1a:
CODEBLOCK1

Then subagents Read from .claude/pdfs-scan/filename.pdf — this works because the path resolves inside the project.

Access Method	/Users/Shared/ path	Symlinked project path
Subagent Read tool (PDF)	❌ Auto-denied	✅ Works
Subagent Read tool (images)

Discovered 2026-02-07: Subagents fail with "Permission to use Read has been auto-denied (prompts unavailable)" on /Users/Shared/ paths. Symlink into project dir = full Read access. Tested: 19 PDFs, medium thoroughness, 125k tokens, zero errors.
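
The symlink step can also be scripted. Below is a minimal Python sketch, assuming the link lives at `.claude/pdfs-scan` as described above; the helper names `link_pdf_folder` and `demo_link` are hypothetical, and the demo uses throwaway temp folders rather than the real `/Users/Shared/` path.

```python
import os
import tempfile
from pathlib import Path


def link_pdf_folder(pdf_dir: Path, project_dir: Path) -> Path:
    """Symlink pdf_dir to <project_dir>/.claude/pdfs-scan, replacing an old link."""
    link = project_dir / ".claude" / "pdfs-scan"
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # drop the stale link from a previous run
    os.symlink(pdf_dir, link, target_is_directory=True)
    return link


def demo_link() -> list:
    """Build a fake PDF folder and project, link them, list files via the link."""
    with tempfile.TemporaryDirectory() as tmp:
        pdf_dir = Path(tmp) / "pdfs" / "20260207"
        pdf_dir.mkdir(parents=True)
        (pdf_dir / "report.pdf").write_bytes(b"%PDF-1.4")
        project = Path(tmp) / "project"
        project.mkdir()
        link = link_pdf_folder(pdf_dir, project)
        # Listing through the link is what a subagent Read would rely on.
        return sorted(p.name for p in link.iterdir())
```

Replacing the old link first means each pipeline run points at exactly one dated folder.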

Content Types & Voice Blends

Full guide: references/kirk-voice.md — Read this for templates and examples.

Kirk voice = Serenity's data + Citrini7's wit + Jukan's skepticism + Zephyr's energy.

Type	When	Blend	Key Element
Long Thread	Deep dive, multi-source	Serenity + Jukan	TLDR + skepticism
Quick Take

Quick Formulas

Long Thread: Hook → TLDR → Numbers → Skepticism → Position

Quick Take: Headline number → Context → "If you're looking now..."

Breaking News: "Huge." / "Well well well..." → Key number → Source

Victory Lap: "$TICKER up X% since KSVC added it" → Entry/Now → Thesis validated

Step 1a: Scan PDFs with Explore Agents

Use Explore agents for broad screening when you have many PDFs to review. This is faster than RLM for initial discovery.

Step 1a.0: Check Published Threads (MANDATORY - DO FIRST)

⚠️ Before scanning any PDFs, check what Kirk has already posted.

CODEBLOCK2

For each published thread, note:

- Topic (what was the thesis?)
- Source PDFs used (check _metadata.md)
- Date (how recent?)

Then when selecting a topic after scanning, REJECT any topic that:

- Uses the same primary source PDF as a published thread
- Covers the same thesis/angle (even if from different sources)
- Would read as a repeat to Kirk's followers

Acceptable overlap:

- A follow-up/update to a previous thread with NEW data (e.g., earnings confirm the thesis)
- A different angle on the same sector (e.g., posted about ABF shortage, now posting about specific company earnings)
- Explicitly framed as "update: here's what changed since my last post on X"

Why this exists (Case Study — ABF Substrate, 2026-02-07):

Kirk published a 10-tweet thread on Feb 5 covering Goldman's ABF shortage report (10%→21%→42%, Kinsus/NYPCB/Unimicron). On Feb 7, the pipeline picked the same Goldman report and produced a 3-tweet quick take with the same numbers, same companies, same angle. We didn't check published threads first, so we wasted a pipeline run on duplicate content when 10 other fresh topic angles were available.
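
The rejection rules can be expressed as a small check. This is a rough sketch only: the `Thread` record, the exact-match thesis comparison, and the file names in the usage note are all hypothetical. A real check would need fuzzier thesis matching and human judgment on the "reads as a repeat" rule.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    """One published thread, as noted in Step 1a.0 (hypothetical shape)."""
    thesis: str
    source_pdfs: set = field(default_factory=set)


def should_reject(candidate_pdf: str, candidate_thesis: str,
                  published: list, is_update_with_new_data: bool = False) -> bool:
    """True if the candidate topic duplicates an already published thread."""
    if is_update_with_new_data:
        return False  # acceptable overlap: explicit follow-up with new data
    wanted = candidate_thesis.strip().lower()
    for thread in published:
        if candidate_pdf in thread.source_pdfs:
            return True  # same primary source PDF
        if wanted == thread.thesis.strip().lower():
            return True  # same thesis, even from a different source
    return False
```

For example, with `published = [Thread("ABF substrate shortage", {"goldman-abf.pdf"})]`, a quick take built on `goldman-abf.pdf` is rejected unless it is flagged as an update with new data.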

When to Use

- Screening 10+ PDFs to find relevant ones
- Finding cross-document connections
- Building a thesis from multiple sources
- Don't know which PDFs matter yet

How to Scan

CODEBLOCK3

Agent Sizing

PDFs	Agents	PDFs/Agent	Expected Time
≤5	1	all	~25s
6-10	2	~5 each	~25s
11-15	3	~5 each	~25s
16-20	4	~5 each	~25s
21-30	5-6	~5 each	~30s

Why ~5 PDFs per agent? Sweet spot for speed. Each PDF takes ~4-8s to Read + summarize. 5 PDFs ≈ 25s per agent. Adding more PDFs per agent saves nothing (same total tokens) but makes wall-clock time worse.

Cost: Haiku is cheap. 4 agents × 5 PDFs × ~4k tokens = ~80k input tokens total — same as 1 agent doing all 20. Parallelism is free.

Cross-doc synthesis trade-off: Each agent only sees its batch, so cross-batch themes are the main agent's job. This is fine — the main agent merges all results anyway.
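
The sizing rule amounts to "one agent per ~5 PDFs, spread evenly". A minimal sketch, assuming a plain list of filenames; `plan_batches` is a hypothetical helper, not part of any existing script.

```python
import math


def plan_batches(pdf_names: list, per_agent: int = 5) -> list:
    """Split filenames into near-equal batches of at most `per_agent` each."""
    if per_agent < 1:
        raise ValueError("per_agent must be at least 1")
    if not pdf_names:
        return []
    n_agents = math.ceil(len(pdf_names) / per_agent)
    base, extra = divmod(len(pdf_names), n_agents)
    batches, start = [], 0
    for i in range(n_agents):
        size = base + (1 if i < extra else 0)  # first `extra` agents take one more
        batches.append(pdf_names[start:start + size])
        start += size
    return batches
```

Each returned batch becomes the filename list for one Explore agent; spreading the remainder evenly keeps the slowest agent from holding up the whole scan.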

Example: Spawn Explore Agents

Step 1: Main agent creates symlink and lists PDFs:
CODEBLOCK4

Step 2: Split filenames into groups and spawn agents in parallel (single message, multiple Task calls):
CODEBLOCK5

Step 3: Main agent synthesizes results from all agents:
After all agents return, the main agent:

1. Merges per-PDF summaries
2. Identifies cross-agent themes (patterns Agent 1 found + patterns Agent 2 found)
3. Picks top 3 content angles across all PDFs
4. Selects 2-5 PDFs for Step 1b deep extraction

Output: Identify Which PDFs Matter

After scanning, you'll know:

- Which reports have the best data
- Cross-document connections (e.g., "3 reports confirm memory shortage")
- Thesis recommendations (2-3 angles to explore)
- Which to deep-extract with RLM

⚠️ WARNING: Explore agents can hallucinate specific numbers. Treat all numbers from Explore summaries as "unverified claims" until RLM grep confirms them. Component counts, percentages, and market sizing are especially prone to errors.

Capacity (tested 2026-02-07): Single Explore agent (haiku) handled 19 PDFs at medium thoroughness in 83 seconds, using 125k tokens (~4k tokens/PDF for pages 1-5). 3 agents in parallel = ~30-40s for the same batch.

Step 1b: Deep Extract with RLM

Use RLM for deep extraction from specific PDFs you've identified in Step 1a.

MANDATORY for any number you'll publish. Explore agents summarize; RLM verifies.

When to Use

- You know which 2-5 PDFs matter most
- Need specific numbers, charts, tables
- Building cross-document verification tables
- Extracting technical details (fabs, yields, WPM)

Single PDF

CODEBLOCK6

Multiple PDFs (synthesis)

CODEBLOCK7

View Extracted Charts/Images

CODEBLOCK8

Charts often contain key data (P/B trends, margin history, capacity timelines) that text extraction misses.

Extraction Validation (MANDATORY)

⚠️ After EVERY rlm_repl.py init, validate the extraction actually worked.

RLM reports chars_extracted after init. A multi-page analyst report should yield thousands of chars. If you get suspiciously few, the PDF is likely image-based and RLM only extracted metadata/headers.

Validation rule:

Chars Extracted	Expected Report Type	Action
> 5,000	Multi-page report	✅ Proceed with grep
1,000 - 5,000

The threshold is context-dependent. A 20-page Goldman Sachs report yielding 666 chars is obviously broken. A 1-page pricing table yielding 800 chars might be fine. Use judgment, but when in doubt, fallback.
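
One way to make "suspiciously low for page count" concrete is a chars-per-page floor. The sketch below is an assumption-heavy illustration: the 300 chars/page floor and the `"review"` outcome for the middle band are hypothetical choices, not values from the RLM tooling, and should be tuned against real reports.

```python
MIN_CHARS_PER_PAGE = 300  # hypothetical floor; image-based PDFs fall far below it


def extraction_action(chars_extracted: int, pages: int) -> str:
    """Return 'proceed', 'review', or 'fallback' for one RLM init result."""
    if pages < 1:
        raise ValueError("pages must be at least 1")
    if chars_extracted / pages < MIN_CHARS_PER_PAGE:
        return "fallback"  # likely image-based: use the Read tool fallback
    if chars_extracted > 5000:
        return "proceed"   # enough text for a multi-page report
    return "review"        # short but plausible, e.g. a 1-page pricing table
```

Under these assumptions the 666-char Goldman extraction is sent to fallback, while an 800-char single-page table is only flagged for a judgment call.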

Mandatory Fallback when RLM extraction is low:

CODEBLOCK9

⚠️ Path rule: Subagents must Read PDFs via the symlinked project path (.claude/pdfs-scan/), NOT from /Users/Shared/. See "Subagent Permissions" section above.

Why this exists (Case Study — ABF Substrate Shortage, 2026-02-07):

Goldman Sachs published two reports: a main ABF upcycle report (71K chars, extracted fine) and a Kinsus upgrade report (15 pages, but only 666 chars extracted). We skipped the Kinsus PDF because "the main report had everything we needed." It didn't. The Kinsus report had unique data (company-specific capacity plans, margin guidance, order book details) that would have strengthened the thread. Skipping it was lazy — the Read tool fallback takes 30 seconds and would have recovered the data.

Rules:

1. Never skip a relevant PDF just because RLM extraction was low. Use the fallback.
2. Check extracted images. RLM with --extract-images often saves chart/table images even when text extraction fails. View them with Read tool.
3. Log the fallback. In the extraction cache, note "extraction_method": "read_fallback" so audit knows the data source.
4. If fallback also fails (corrupted PDF, DRM), document it and move on. But you must TRY.

RLM Cache: Include Visual Data

When extracting, capture all data types for potential chart generation later:

Source Type	What to Extract	Cache Format
Text numbers	Exact quotes with page refs	INLINECODE13
Tables	Full table as structured JSON	{"columns": [...], "rows": [...], "source": "p.20"}
Charts	Data points + source image path	{"data": {...}, "source_image": "pdf-3-1.png", "page": 3}

Why cache visual data? Step 6 (chart generation) needs this. If you only cache text, you'll lose table structures and chart data points that make great visualizations.
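
A small sketch of building table and chart entries in the two JSON shapes listed above. The helper names and the placeholder values in the usage note are hypothetical; only the field names (`columns`, `rows`, `source`, `data`, `source_image`, `page`) come from the cache formats in this section.

```python
import json


def table_entry(columns: list, rows: list, source: str) -> dict:
    """Cache a full table; reject ragged rows so structure survives."""
    if any(len(row) != len(columns) for row in rows):
        raise ValueError("every row must have one value per column")
    return {"columns": list(columns), "rows": [list(r) for r in rows], "source": source}


def chart_entry(data: dict, source_image: str, page: int) -> dict:
    """Cache chart data points together with the extracted source image."""
    return {"data": dict(data), "source_image": source_image, "page": page}


def to_json(entry: dict) -> str:
    """Serialize one entry; ensure_ascii=False keeps Chinese names readable."""
    return json.dumps(entry, ensure_ascii=False, indent=2)
```

Usage with placeholder values: `to_json(table_entry(["component", "share_pct"], [["A", 40], ["B", 60]], "p.20"))`.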

Cross-Document Reasoning

Build thesis by triangulating claims across multiple reports:
CODEBLOCK10

Use cross-doc to verify:

- Do multiple sources agree on price forecasts?
- Are supply constraint timelines consistent?
- Any contradictions between reports?

Step 1b.5: Build Extraction Cache (MANDATORY)

⚠️ Why this step exists: RLM creates state.pkl during extraction, but the writing phase (Step 3) doesn't access it. Without a persistent cache, writers rely on memory, leading to errors like wrong product types, missing time periods, or source attribution mistakes.

What this does: Extracts from state.pkl (RLM's internal format) into structured JSON with context labels that the writing phase can reference.

When to Run

After Step 1b (RLM extraction) and before Step 3 (writing).

Workflow	When to Cache
Single PDF (rlm-repl)	After `rlm_repl.py init` completes
Multiple PDFs (rlm-repl-multi)	After all init commands complete

How to Build Cache

New in v2: Auto-generates source tags and attribution map from PDF filenames!

Single PDF (rlm-repl):
CODEBLOCK11

Multiple PDFs (rlm-repl-multi):
CODEBLOCK12

With Cross-Doc Synthesis (Optional):
CODEBLOCK13

Synthesis format (optional, for complex multi-source threads):
CODEBLOCK14

What auto-generates:

- ✅ Source tags from PDF filenames ("GFHK - Memory.pdf" → tag: "GFHK")
- ✅ Topics with primary_source, key_metrics, source_context
- ✅ Extraction entries with full context labels (product_type, time_period, units, scope)

Cache Format

The cache includes context labels and attribution map to prevent common errors:

CODEBLOCK15

Key fields that prevent errors:

- product_type: Prevents "GB300 rack" when source says "HGX B300 server"
- INLINECODE21: Prevents missing "3Q25 → 1Q26E" context
- INLINECODE22: Prevents "Goldman's BOM" when data is from GFHK
- INLINECODE23: Auto-extracted from PDF filename for quick attribution
- INLINECODE24: Prevents "22.5B racks" when source means "22.5bn dollars"
- INLINECODE25: Prevents "per rack" when source means "per server"

Attribution map benefits:

- topics: Topic-level mapping showing which source is primary authority
- INLINECODE27: Quick lookup of what each source covers
- INLINECODE28: Summary of figures, time periods covered
- INLINECODE29: Manual insights connecting multiple sources

Integration with Step 3 (Writing)

MANDATORY: Reference the cache when writing.

Step 3a: Load cache and attribution map:
CODEBLOCK16

Step 3b: Write using cache labels and attribution:
CODEBLOCK17

Source: rlm-extraction-cache.json, entry mem_001, mem_002, INLINECODE33

Context labels from cache:

- Product type: HGX B300 8-GPU server (not GB300 rack)
- Time period: 3Q25 → 1Q26E (quarterly change)
- Source: GFHK Figure 2 (via attribution map tag)

Attribution map usage:

- Used topics["Memory Pricing"]["tag"] → "GFHK"
- Verified metrics against key_metrics list
- Cross-doc synthesis: See dual_squeeze_thesis for memory + ABF connection

Enforcement

Before saving draft (Step 5), verify:

- [ ] Every published number has a cache entry
- [ ] Product types match cache labels
- [ ] Time periods included from cache
- [ ] Source attributions match cache source_id and attribution map INLINECODE38
- [ ] Units match cache (dollars vs racks, per server vs per datacenter)
- [ ] Cross-doc claims reference cross_doc_synthesis if applicable

Red flags - stop if you notice:

- Writing numbers from memory instead of cache
- Product type differs from cache (product_type field)
- Missing time period when cache has INLINECODE41
- Attributing to wrong source vs cache INLINECODE42
- Using wrong tag (e.g., "Goldman" for GFHK data)
- Missing cross-doc synthesis when connecting multiple sources
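
The first checklist item ("every published number has a cache entry") lends itself to a rough automated pre-check. The sketch below assumes each cache entry is a dict with a `value` string, which is a hypothetical simplification of the real cache format. It is a tripwire, not a substitute for the audit: it will also flag thread numbers such as `2/` and the digits inside labels like `3Q25`.

```python
import re

NUMBER = re.compile(r"\d+(?:[.,]\d+)*")  # 80, 22.5, 1,234.5


def numbers_in(text: str) -> set:
    """Collect numeric tokens, with thousands separators removed."""
    return {m.group().replace(",", "") for m in NUMBER.finditer(text)}


def uncached_numbers(draft: str, cache_entries: list) -> set:
    """Numbers that appear in the draft but in no cache entry's value."""
    cached = set()
    for entry in cache_entries:
        cached |= numbers_in(str(entry.get("value", "")))
    return numbers_in(draft) - cached
```

Any number it returns was either written from memory or needs a new cache entry before the draft is saved.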

Manual Cache Building

If automatic extraction fails, manually create cache entries:

CODEBLOCK18

See: ~/.claude/skills/kirk-content-pipeline/scripts/README-extraction-cache.md for full documentation.

Step 1c: Cross-Doc Synthesis (RECOMMENDED)

Why this step exists: Steps 1a and 1b produce per-document facts. Without explicit synthesis, the pipeline gravitates toward single-source claims ("KHGEARS P/E is 20x") rather than cross-doc insights ("Taiwan brokers are more bullish than Western analysts on humanoid robotics").

When to Use

Scenario	Use 1c?
Multiple PDFs on same topic	Yes
Comparing broker views

What 1c Produces

Output Type	Example	Audit Requirement
Consensus claim	"3 of 4 brokers see DRAM ASP rising in 2H26"	Cross-doc (rlm-multi)
Comparative insight

How to Run Cross-Doc Synthesis

CODEBLOCK19

Synthesis Questions to Ask

Category	Questions
Consensus	Do sources agree on [market size / timeline / key risk]?
Comparison

Output Format: Synthesis Cache

After running 1c, document synthesized insights for Step 3 (writing):

CODEBLOCK20

Integration with Audit (Step 4a)

⚠️ CRITICAL: Synthesized claims from Step 1c MUST be flagged for cross-doc audit in Step 4a.

In the audit manifest, mark these claims with cross-doc: true:

CODEBLOCK21

Cross-doc claims use rlm-repl-multi for verification, not parallel single-doc agents.

Extract with Technical Specificity

Go beyond surface numbers. Extract:

- Wafer capacity (WPM)
- Fab names (M15X, P4L, X2)
- Yield percentages
- Process nodes (1b, 1c)
- Component counts per unit

Question	Extract
What	One-sentence summary
Why

Step 2: Check KSVC Holdings (Initial)

⚠️ CRITICAL: This is a preliminary check. You MUST run Step 4b (Final Holdings Verification) after writing content to catch any tickers discovered during extraction.

All Models (7 Total)

- US Models: usa-model1 ~ usa-model5 (5 models)
- Taiwan (TWSE) Models: twse-model1 ~ twse-model2 (2 models)

Step 2a: Identify All Possible Tickers

Before querying the API, identify ALL possible identifiers for the company:

CODEBLOCK22

Rules:

1. US stocks: Search by ticker only (e.g., "MU", "AMD", "NVDA")
2. Taiwan stocks: Search by stock code (e.g., "3443") - may appear as "3443 創意" in API
3. If unsure: Check both US and TWSE models

Step 2b: Query All 7 Models

NEVER assume a stock isn't held without checking ALL 7 models.
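
A sketch of an "all 7 or nothing" lookup. It works on tradebook rows that have already been fetched, so it makes no network calls. Assumptions: the US models follow the same URL pattern as the `twse-model2` curl example in this guide, and each row carries a `ticker` field that may be a number or a string such as "3443 創意". The function names are hypothetical.

```python
BASE_URL = "https://kicksvc.online/api"  # host from the curl example in this guide

MODEL_NAMES = (
    [f"usa-model{i}" for i in range(1, 6)]
    + [f"twse-model{i}" for i in range(1, 3)]
)


def model_urls() -> list:
    """One endpoint per model (URL pattern for US models is an assumption)."""
    return [f"{BASE_URL}/{name}" for name in MODEL_NAMES]


def find_holdings(ticker: str, tradebooks: dict) -> list:
    """Return models holding `ticker`; refuse to answer unless all 7 were fetched."""
    missing = [m for m in MODEL_NAMES if m not in tradebooks]
    if missing:
        raise ValueError(f"models not checked: {missing}")
    held = []
    for model in MODEL_NAMES:
        for row in tradebooks[model]:
            # Compare only the leading code so "3443 創意" matches "3443".
            parts = str(row.get("ticker", "")).split()
            if parts and parts[0] == str(ticker):
                held.append(model)
                break
    return held
```

Raising on a missing model turns "I assumed it wasn't held" into an error instead of a silent wrong answer.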

RECOMMENDED: Use tradebook for accurate entry prices and current status

CODEBLOCK23

⚠️ CRITICAL: API's todayPrice and profitPercent can be STALE (hours or days old). Always verify current price with Yahoo Finance API (Step 2d).

FALLBACK: Check equitySeries (slower, less data)

CODEBLOCK24

Why still use equitySeries?

1. Historical tracking: Shows return % evolution over time (.data[] array)
2. Verification: Confirms position is still active
3. Fallback: If tradebook is unavailable or empty
4. Entry date discovery: First data point (return ≈ 0) indicates entry date

Example: Finding entry date from equitySeries
CODEBLOCK25

Step 2c: Verification and Fallback Strategy

Use all three data sources for robustness:

Data Source	When to Use	What It Shows	Limitation
tradebook	Primary	Entry date, entry price, exit status	INLINECODE49 may be stale
equitySeries

Recommended workflow:

CODEBLOCK26

Cross-verification example:

CODEBLOCK27

Fallback: Check filledOrders (if tradebook empty)

If equitySeries is empty OR tradebook is empty (rare, but possible after model reset):

CODEBLOCK28

When data sources disagree:

Scenario	Action
tradebook shows position, equitySeries doesn't	Trust tradebook (equitySeries may lag)
equitySeries shows position, tradebook doesn't

Step 2e: Document Holdings with Accurate Returns

CRITICAL: Always calculate actual returns using:

1. Entry price from INLINECODE50
2. Current price from Yahoo Finance API (NOT KSVC API's stale todayPrice)

Output format (with accurate data):
CODEBLOCK29

If NOT held in any model:
CODEBLOCK30

Integration Strategies

Situation	Approach	Example
Held (US)	Call out position	"KSVC Model1 holds $MU at $412 entry"
Held (TW)

Step 2d: Current Price Check (Yahoo Finance API - REQUIRED)

⚠️ CRITICAL: ALWAYS use Yahoo Finance for current prices. KSVC API's todayPrice can be stale.

US stocks:
CODEBLOCK31

Taiwan stocks (use .TW or .TWO suffix):
CODEBLOCK32

Taiwan ticker suffixes:

- .TW - Listed on Taiwan Stock Exchange (TWSE)
- .TWO - Listed on Taipei Exchange (TPEx/OTC)
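
A small sketch for building the Yahoo Finance symbol and chart URL from a stock code and its exchange. The URL shape is copied from the curl example in this guide; the `market` labels ("US", "TWSE", "TPEX") and the function names are hypothetical. The exchange must be looked up, not guessed from the code.

```python
from urllib.parse import quote

SUFFIX = {"US": "", "TWSE": ".TW", "TPEX": ".TWO"}


def yahoo_symbol(code: str, market: str) -> str:
    """Append the exchange suffix; unknown markets raise KeyError on purpose."""
    return f"{code}{SUFFIX[market]}"


def yahoo_chart_url(symbol: str) -> str:
    """Chart endpoint for a one-day range, as used in this guide's curl call."""
    return (
        "https://query1.finance.yahoo.com/v8/finance/chart/"
        f"{quote(symbol)}?interval=1d&range=1d"
    )
```

Failing loudly on an unknown market avoids silently querying the wrong listing.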

Calculate actual gain (not API's stale profit%):
CODEBLOCK33

Complete workflow (tradebook + Yahoo Finance):

# 1. Get entry price from tradebook
ENTRY=$(curl -s "https://kicksvc.online/api/twse-model2" | jq '.tradebook[] | select(.ticker == 6285) | .enterPrice')

# 2. Get current price from Yahoo Finance
CURRENT=$(curl -s -A "Mozilla/5.0" "https://query1.finance.yahoo.com/v8/finance/chart/6285.TW?interval=1d&range=1d" | jq '.chart.result[0].meta.regularMarketPrice')

# 3. Calculate actual gain
echo "Entry: NT\$$ENTRY | Current: NT\$$CURRENT | Gain: $(awk "BEGIN {printf \"%.1f\", ($CURRENT - $ENTRY) / $ENTRY * 100}")%"
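
The same gain calculation in Python, plus a staleness check against the API's reported figure. This is a sketch under two assumptions: that `profitPercent` is expressed in percentage points, and that a gap of more than 0.5 points (a hypothetical tolerance) is worth flagging.

```python
def gain_pct(entry_price: float, current_price: float) -> float:
    """Actual gain in percent: (current - entry) / entry * 100."""
    if entry_price <= 0:
        raise ValueError("entry price must be positive")
    return (current_price - entry_price) / entry_price * 100.0


def api_looks_stale(api_profit_pct: float, entry_price: float,
                    current_price: float, tolerance_pts: float = 0.5) -> bool:
    """True if the API figure differs from the recomputed gain by > tolerance."""
    actual = gain_pct(entry_price, current_price)
    return abs(api_profit_pct - actual) > tolerance_pts
```

Publish the recomputed figure either way; the staleness flag is only a reminder of why the API number was not used.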

Step 3: Write Content

See references/kirk-voice.md for full templates and examples.

Thread Numbering Convention

Format	When to Use
No number on Tweet 1	Recommended - cleaner hook, stands alone if quoted/shared
INLINECODE56, `3/`, etc.	Standard thread format - signals "2 of N"
1/ on first tweet	Optional - explicit "thread incoming" signal

Why skip number on first tweet:

- Hook tweet often gets shared standalone
- "1/" makes it look incomplete out of context
- Cleaner visual presentation

Format preference: Use / not ) - it's the established Twitter thread convention.

CODEBLOCK35
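
The recommended convention (no number on the hook, `N/` from the second tweet on) is easy to apply mechanically. A minimal sketch; putting the number and a space before the tweet text is an assumption about layout, and `number_thread` is a hypothetical helper.

```python
def number_thread(tweets: list) -> list:
    """Leave tweet 1 unnumbered; prefix later tweets with '2/', '3/', ..."""
    numbered = []
    for position, text in enumerate(tweets, start=1):
        numbered.append(text if position == 1 else f"{position}/ {text}")
    return numbered
```

Numbering at the end of the pipeline, rather than while drafting, keeps the numbers correct when tweets are added or cut during audit.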

Pick Content Type

1. What kind of content? (Thread / Quick Take / Breaking / Shitpost / Commentary / Victory Lap)
2. Look up the formula in kirk-voice.md
3. Apply the blend

Technical Specificity

❌ Vague: "NAND supply is tight"

✅ Specific: "YMTC adding 135k WPM at Wuhan Fab 3. Still won't close the gap - Samsung X2 conversion delayed to Q2."

❌ Vague: "HBM margins are good"

✅ Specific: "SK Hynix HBM yields at 80-90%. Samsung stuck at 60% on 1c DRAM."

Always include: specific numbers, time frames, fab names, comparisons.

Referential Clarity (Learned 2026-02-08)

Never use vague pronouns or shorthand when the referent hasn't been introduced.

In thread format, each tweet may be read semi-independently. If earlier tweets discuss a concept as a category (e.g., "ASIC revenue"), don't suddenly refer to it as "the project" in a later tweet — the reader has no antecedent for "the project."

❌ Vague: "MS thinks the project is the 3nm Google TPU"
(What project? The thread never introduced "a project.")

✅ Clear: "MS thinks the main client/program is the 3nm Google TPU"
(Names what MS is identifying — who's buying and what they're building.)

Rule: When a shorthand ("the project", "this deal", "the play") saves words but costs clarity, it's not saving anything. Name the thing directly. A few extra words that prevent the reader from pausing to re-read are always worth it.

When shifting from category to specific: If the thread discusses an abstract category (ASIC revenue, memory supply) and then pivots to a specific entity (Google TPU, Samsung fab), bridge the transition. Don't assume the reader already knows which specific thing drives the category.

Step 4a: Audit (MANDATORY — MUST USE SUBAGENTS)

⚠️ WHY THIS STEP EXISTS: We learned that RLM extraction (Step 1b) is not the same as verification. Explore agents hallucinate numbers. Writers make inferences. This step catches errors BEFORE publishing.

⚠️ STRUCTURAL GATE: You (the main agent) are the WRITER. You cannot also be the AUDITOR. You MUST delegate audit to fresh-context subagents. See the "WARM STATE TRAP" section in the audit-content skill for why.

Step 4a Process (3 actions, in order)

Action 1: Generate audit manifest
CODEBLOCK36

Action 2: Spawn Explore agents (MANDATORY — do NOT skip this)
CODEBLOCK37

⚠️ WARM STATE TRAP: If RLM is already loaded from Step 1b, you WILL be tempted to "just grep it yourself." DO NOT. The audit-content skill explains why: you wrote the draft, so you already "know" the answers. Self-auditing is confirmation bias, not verification.

Self-check: If you are about to type rlm_repl.py exec during Step 4a, STOP. You are skipping the gate.

Action 3: Collect results and write audit report
CODEBLOCK38

Invoke the audit-content skill for full process details:

CODEBLOCK39

What Gets Verified

Claim Type	Example	How to Verify
Company names	"KHGEARS"	RLM grep + TWSE API
Ticker formats

When to Proceed

- All PASS: Save draft (Step 5)
- Any FAIL: Fix the claim, re-audit
- UNSOURCED: Either remove, add caveat ("reportedly"), or find source

Do NOT save draft with FAIL status. UNSOURCED claims need explicit decision.
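
The gate above reduces to a three-way decision over the audit statuses. A minimal sketch, assuming each claim ends up with exactly one of PASS, FAIL, or UNSOURCED; the return labels are hypothetical.

```python
VALID = {"PASS", "FAIL", "UNSOURCED"}


def audit_gate(statuses: list) -> str:
    """Decide the next step from per-claim audit statuses."""
    seen = {s.upper() for s in statuses}
    if not seen:
        raise ValueError("no claims audited")
    unknown = seen - VALID
    if unknown:
        raise ValueError(f"unknown statuses: {sorted(unknown)}")
    if "FAIL" in seen:
        return "fix_and_reaudit"   # never save a draft with a FAIL
    if "UNSOURCED" in seen:
        return "decide_unsourced"  # remove, caveat, or find a source
    return "save_draft"
```

FAIL is checked first so that an UNSOURCED decision is never made on a draft that still contains a wrong claim.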

Step 4a.5: Gemini Web Cross-Validation (RECOMMENDED)

⚠️ WHY THIS STEP EXISTS: RLM audit (Step 4a) is source-locked — it only checks claims against the cited PDF. This over-flags reasonable inferences that go beyond one report but are well-documented publicly. Step 4a.5 gives flagged claims a second chance via web-grounded search.

Case Study (Old Memory Squeeze, 2026-02-07):

- Draft said "capacity getting cannibalized for HBM and DDR5"
- RLM audit FAIL: MS report says "exiting DDR4" / "cannibalization" but doesn't name HBM/DDR5 as destination
- Gemini confirmed: TrendForce, DigiTimes, The Elec all document the DDR4→HBM/DDR5 shift
- Result: Claim restored with dual attribution (MS + public sources)

Same thread: "Samsung, Kioxia, Micron all reducing MLC NAND" — MS only confirmed Samsung, said Kioxia/Micron "could" reduce. Gemini confirmed all three are actively reducing per TrendForce (41.7% YoY MLC NAND capacity decrease).

When to Use

RLM Audit Result	Use Gemini?	Why
FAIL — wrong number	No	Number errors need source correction, not web search
FAIL — inference beyond source

How to Run

CODEBLOCK40

Key: Ask Gemini to search the web explicitly. Without "search the web", Gemini may read local files instead.

Decision Matrix

Gemini Result	Action
Confirmed with public sources	Restore claim, add dual attribution (`source + Gemini web`)
Partially confirmed

Audit Report Format

Update claims restored via Gemini:

CODEBLOCK41

Include Gemini's sources in the audit resolution log:
CODEBLOCK42

Guidelines

- Only use for inferences and industry knowledge claims, NOT for number verification
- Gemini's web search is a second opinion, not the final word; present both perspectives
- If Gemini contradicts both the source and common sense, flag for human review
- Keep Gemini queries specific and focused: one claim per query

Step 4b: Final Holdings Verification (MANDATORY)

⚠️ CRITICAL: This is the FINAL holdings check. You MUST run this after writing content because:

1. Tickers/stock codes may be discovered during extraction (Step 1b)
2. Company names may be clarified during audit (Step 4a)
3. Step 2 was a preliminary check with limited information

Why This Step Exists

Problem: You might learn the correct ticker late in the pipeline.

Example - GUC Case:

- Step 1b: Learn company is "Global Unichip (GUC)"
- Step 1b: Extract ticker "3443 TW" from report
- Step 2: ❌ Assumed "not held" without actually checking TWSE models
- Step 3: Wrote "I don't have a position here"
- Step 4b: ✅ Discovered GUC IS held in TWSE Model 1 (+2.22%)
- Result: Had to rewrite content to reflect actual position

Step 4b Process

1. Extract ALL tickers/identifiers from the draft:
CODEBLOCK43

2. For EACH ticker, check ALL 7 models:

CODEBLOCK44

3. Compare Step 2 vs Step 4b results:

CODEBLOCK45

4. If holdings status changed, update draft:

CODEBLOCK46

Decision Matrix

Step 2	Step 4b	Action
Not held	Not held	✅ No change needed
Not held

Output Format

CODEBLOCK47

Step 4c: Stylize (MANDATORY)

Why this step exists: The data backbone (Step 3) is Serenity-heavy - precise, comprehensive, verified facts. Step 4c transforms it into Kirk's authentic voice with emotional range and character.

Invoke the kirk-mode skill:
CODEBLOCK48

What Kirk Mode Does

Transforms verified data into Kirk's voice by:

1. Mode selection - Matches Kirk's emotional mode to situation (Analytical, Sarcastic, Emo, Shitpost, Degen, GIF Master)
2. Voice elements - Adds discovery moments ("Wait though"), reactions ("wayyy bigger"), first-person thesis
3. Meme culture - Integrates fintwit slang (ngmi, wagmi, brother, probably nothing) strategically
4. Anti-formula - Rotates structure to prevent templating (varies TLDR → "ok so" → question)
5. Credibility balance - Online enough to relate, credible enough to trust

When to Use Each Mode

Situation	Kirk Mode	Example
Deep fundamental dive	Analytical	"ok so", "Wait though", data-heavy with reactions
Market absurdity

Most natural: Mix modes in single post (Analytical + Sarcastic + maybe GIF)

Workflow

1. Assess situation - What's happening? (Deep dive, absurd market, position down, quick reaction)
2. Select mode(s) - Use kirk-mode decision tree or mix modes naturally
3. Apply voice toolkit - Discovery moments, strategic "wayyy", emphasis markers
4. Check meme integration - Would slang/GIF enhance or distract from analysis?
5. Verify authenticity - Read aloud: does it sound like an intern at a bar or a ChatGPT report?

Output: Transformed content with Kirk's character voice - ready for humanizer pass.

See kirk-mode skill for:

- Complete mode descriptions with examples
- Meme vocabulary and format templates
- Anti-formula principles
- Credibility boundaries

Step 4d: Humanize (MANDATORY)

Note: Humanizer runs AFTER stylize to remove any AI patterns that slipped through during transformation.

Invoke the humanizer skill:
CODEBLOCK49

Patterns to Remove

Pattern	Fix
"Full stop."	"Simple as." or just delete
Em-dashes (—)

AI Words to Remove

Additionally, crucial, delve, emphasize, testament, enhance, foster, landscape, showcase, tapestry, underscore, vibrant, pivotal, key (adj), interplay
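
A quick pre-check for the word list above can run before the humanizer pass. This is a sketch, not the humanizer skill itself: it matches each word plus simple suffixes ("underscores", "crucially"), skips "key" because telling the adjective from the noun needs part-of-speech tagging, and also reports em-dashes.

```python
import re

AI_WORDS = [
    "additionally", "crucial", "delve", "emphasize", "testament", "enhance",
    "foster", "landscape", "showcase", "tapestry", "underscore", "vibrant",
    "pivotal", "interplay",
]
PATTERN = re.compile(r"\b(" + "|".join(AI_WORDS) + r")\w*", re.IGNORECASE)


def flag_ai_words(text: str) -> list:
    """Return the listed base words found in text, sorted and de-duplicated."""
    return sorted({m.group(1).lower() for m in PATTERN.finditer(text)})


def has_em_dash(text: str) -> bool:
    """True if the text contains an em-dash (U+2014)."""
    return "\u2014" in text
```

A non-empty result means the draft still needs another humanizer pass; an empty result does not prove the draft sounds human.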

Soul to Add

- Skepticism: "I might be wrong" / "Not sure about this"
- Reactions: "That number is wild" / "Interesting"
- First person: "I keep thinking about..."
- Mixed feelings: "Impressive but also kind of unsettling"
- Questions: Ask the audience

Step 5: Save Draft

File Organization Convention

CRITICAL: Use assets folder structure for all drafts.

CODEBLOCK50

Example: INLINECODE63

Draft Content Format

Save main content as: INLINECODE64

CODEBLOCK51

README.md Template

Create README.md in the assets folder to document the work:

CODEBLOCK52
[Claim]: [Value]

- Source: [PDF name, page]
- RLM verified: [grep results or calculation]


### [Claim Category 2]

[Claim]: [Value]

- Source: [PDF name, page]
- Verified: [evidence]


---

## Audit Reports

| File | Purpose |
|------|---------|
| YYYY-MM-DD-topic-audit-manifest.md | Claims to verify |
| YYYY-MM-DD-topic-audit-report.md | Initial audit results |
| YYYY-MM-DD-topic-audit-final.md | Final audit with corrections |

**Audit result:** X/Y claims verified

---

## KSVC Holdings

bash
# Verification command
curl -s "https://kicksvc.online/api/[model]" | jq '...'

Result: [Holdings status]


---

## Source Documents

| Source | Path | Used For |
|--------|------|----------|
| [Report name] | /Users/Shared/ksvc/pdfs/YYYYMMDD/file.pdf | [What data] |

---

## Corrections Made

1. [Correction 1]
2. [Correction 2]

---

## Lessons Learned

1. [Lesson 1]
2. [Lesson 2]

Step 6: Chart Decision & Generation

Timing: After draft is complete. The draft crystallizes the thesis - then you see which claims benefit from visualization.

When to Make Charts

Content Type	Chart Likely?	Why
Long Thread	Yes	Multiple data points, trends
Quick Take

Chart-Tweet Pairing

Principle: Put the most eye-catching visual early (Tweet 1-3) to hook engagement.

Chart Type	Best Tweet Position	Why
Market size / growth bar	Tweet 2 (TLDR)	Pairs with market numbers, shows scale
Component breakdown pie

Pairing logic:

1. Match chart to the tweet that contains the same data
2. Hook tweet (Tweet 1) can go either way:
   - Text-only: Clean, curiosity-driven, lets words land first
   - With chart: Visual stop, data-forward, shows you have receipts
3. Visuals work best on data-heavy tweets, not opinion tweets
4. Final tweet (watchlist/conclusion) usually doesn't need a chart

Example pairing (humanoid robotics thread):
CODEBLOCK56

Decision Process

1. Review draft - identify "chartable moments"

- Time series data (market growth, price trends)
- Component breakdowns (pie charts)
- Company comparisons (tables)

2. Check RLM cache - do we have the data?

- Text numbers → bar/line charts
- Tables → comparison tables
- Source charts → reference or recreate

3. DECLARE SOURCE (MANDATORY) - before any chart generation

CODEBLOCK57

4. Generate with chart-factory

CODEBLOCK58

Chart Generation Workflow

CODEBLOCK59

Source Declaration (LEARNED FROM MISTAKE)

⚠️ Why this exists: We once created a "component count" chart but saved a "cost %" source image. The metrics didn't match, making the source invalid for verification.

Before generating ANY chart, you MUST:

Step	Action	Example
1. State	"I am charting [METRIC] from [SOURCE]"	"I am charting hardware cost % from 永豐 p.20"
2. Show

Red flags - STOP if you notice:

- Source shows % but you're charting counts (metric mismatch)
- Source has 15 items but chart has 5 (cherry-picking)
- Source image doesn't contain your chart's numbers (wrong source)
- Company name romanized/guessed from Chinese (fabricated data)
- Ticker suffix assumed without checking (TT vs TW)

Company & Ticker Verification

⚠️ LEARNED FROM MISTAKE: We fabricated "Chuing" for 祺驊 (4571). Official name is "KHGEARS".

CODEBLOCK60

- Never romanize Chinese names (祺驊 ≠ "Chuing")
- Use TW suffix for general audience (TT = Bloomberg only)

Using chart-factory

CODEBLOCK61

Verification (MANDATORY)

After generating, spawn Explore agent with thoroughness: quick for focused verification:

CODEBLOCK62

Verification checks data → chart integrity. Source accuracy is RLM's responsibility (Step 4a).

Thoroughness = quick: Single-pass verification, focused on specific data points. Fast visual-to-data check.

Save Charts

Save to: INLINECODE67

Include:

- Generated charts (chart1.png, chart2.png)
- Source images from PDF (for traceability)
- generate_charts.py script (reproducibility)

Step 7: Publish to Final Folder

After approval, publish clean version to /Users/Shared/ksvc/threads/.

File Organization Convention

CRITICAL: Flat folder structure, one folder per post.

CODEBLOCK63

Rules:

- ✅ Flat structure: YYYY-MM-DD-topic/ at root level (not nested in 2026-02/)
- ✅ Charts directly in folder (not in charts/ subfolder)
- ✅ thread.md = clean content only (no metadata header)
- ✅ _metadata.md = internal reference (sources, audit, not for posting)
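
These rules can be checked before publishing. A minimal sketch, assuming topic slugs are lowercase letters, digits, and hyphens (an assumption; the rules above only fix the date prefix). `publish_problems` and `demo_publish` are hypothetical names, and the demo runs against temp folders only.

```python
import re
import tempfile
from pathlib import Path

FOLDER_NAME = re.compile(r"\d{4}-\d{2}-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*")
MONTH_FOLDER = re.compile(r"\d{4}-\d{2}")


def publish_problems(folder: Path) -> list:
    """List every rule the post folder breaks; an empty list means ready."""
    problems = []
    if not FOLDER_NAME.fullmatch(folder.name):
        problems.append("folder name is not YYYY-MM-DD-topic")
    if MONTH_FOLDER.fullmatch(folder.parent.name):
        problems.append("folder is nested inside a month folder")
    for required in ("thread.md", "_metadata.md"):
        if not (folder / required).is_file():
            problems.append(f"missing {required}")
    if (folder / "charts").is_dir():
        problems.append("charts/ subfolder found; charts belong directly in the folder")
    return problems


def demo_publish() -> tuple:
    """Check one well-formed and one badly-formed folder in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp) / "2026-02-07-abf-substrate"
        good.mkdir()
        (good / "thread.md").write_text("tweet text")
        (good / "_metadata.md").write_text("sources")
        bad = Path(tmp) / "2026-02" / "draft"
        (bad / "charts").mkdir(parents=True)
        return publish_problems(good), publish_problems(bad)
```

The check covers layout only; whether thread.md is free of a metadata header still needs a read-through.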

thread.md Format

Clean version with just the tweets - no metadata header:

CODEBLOCK64

_metadata.md Format

Internal reference file (prefixed with _ to indicate not for posting):

CODEBLOCK65

Example: See INLINECODE75

Publish Workflow

CODEBLOCK66

Result:
CODEBLOCK67

When to Publish

Status	Action
Draft approved	Publish to /Users/Shared/ksvc/threads/
Needs revision	Stay in content-pipeline/draft/
Posted to X	Move to /Users/Shared/ksvc/threads/archive/ (optional)

Quality Checklist

Extraction (Step 1a/1b):

- [ ] ⚠️ Checked published threads (/Users/Shared/ksvc/threads/) before topic selection
- [ ] Topic does NOT duplicate a recently published thread (same source + same angle = reject)
- [ ] Scanned recent PDF folders (at least 3) with Explore agents
- [ ] Identified cross-document connections
- [ ] Deep extracted key reports with RLM
- [ ] Charts/images extracted and reviewed (use --extract-images)
- [ ] ⚠️ Extraction validation: Every PDF's chars_extracted checked against expected size
- [ ] ⚠️ Read tool fallback used for any PDF with < 1000 chars (or suspiciously low for page count)
- [ ] Key numbers verified via RLM grep (not just Explore summary)

Cross-Doc Synthesis (Step 1c):

- [ ] Used rlm-repl-multi to compare across sources (if multiple PDFs)
- [ ] Asked synthesis questions (consensus, comparison, disagreement)
- [ ] Documented synthesized insights in cache
- [ ] Flagged cross-doc claims for audit in Step 4a
- [ ] Identified unique insights that single-source extraction would miss

Content:

- [ ] All published numbers have RLM grep confirmation
- [ ] Technical specifics included (fabs, yields, WPM)
- [ ] Time frames clear (Q1 2026, 2027e)
- [ ] Sources cited (multiple reports for cross-do

Kirk 内容管线

从分析师研究PDF中创建Twitter内容，并针对KSVC持仓进行验证。

管线步骤（强制）

1a. 扫描PDF（使用Explore代理进行广泛筛选）
1b. 提取洞察（使用RLM进行深度提取——文本、表格和图表）
1c. 跨文档综合（使用rlm-multi进行跨来源洞察整合）

2. 检查KSVC持仓（初步——使用已知代码）
撰写内容（数据骨架，以Serenity为主）

4a. 审计（使用RLM对照源PDF验证草稿中的主张）
4a.5. Gemini交叉验证（对未通过/无来源的推论进行网络验证）
4b. 最终持仓验证（使用发现的代码检查全部7个模型）
4c. 风格化（调用kirk-mode技能进行语气/角色塑造）
4d. 人性化（去除AI模式）

5. 保存草稿以待审批
图表决策与生成（在草稿明确论点之后）
发布至最终文件夹（用于发布的纯净版本）

切勿跳过步骤4a-4d。使用1a进行多PDF筛选，1b进行深度提取，1c进行跨文档综合，4a进行验证，4a.5进行网络交叉验证，4b进行最终持仓检查，4c进行角色语气塑造，4d进行AI模式去除。

⚠️ 关键：步骤1b提取数据。步骤1c跨文档综合。步骤4a验证已撰写的内容。步骤4a.5交叉验证推论。

- 1b：每个PDF说了什么？（逐文档提取）
1c：跨PDF出现了哪些模式？（跨文档综合）
4a：我的草稿是否准确反映了来源？（来源锁定验证）
4a.5：被标记的推论是否根据公开来源有效？（网络交叉验证）
4c：哪种Kirk模式适合这种情况？（角色语气）

子代理权限（关键）

子代理无法读取项目目录之外的文件。 /Users/Shared/ksvc/pdfs/ 中的PDF被阻止。解决方法：在生成子代理之前，将PDF符号链接到项目目录中。

主代理必须在步骤1a之前创建符号链接：
bash
ln -sf /Users/Shared/ksvc/pdfs/YYYYMMDD .claude/pdfs-scan

然后子代理从 .claude/pdfs-scan/filename.pdf 读取——这可行，因为路径在项目内解析。

访问方式	/Users/Shared/ 路径	符号链接后的项目路径
子代理读取工具（PDF）	❌ 自动拒绝	✅ 可行
子代理读取工具（图片）

发现于2026-02-07： 子代理在 /Users/Shared/ 路径上会失败，提示读取权限已被自动拒绝（提示不可用）。符号链接到项目目录 = 完全读取权限。已测试：19个PDF，中等详尽度，125k tokens，零错误。

内容类型与语气混合

完整指南： references/kirk-voice.md —— 阅读以获取模板和示例。

Kirk语气 = Serenity的数据 + Citrini7的机智 + Jukan的怀疑 + Zephyr的能量。

类型	时机	混合	关键元素
长推文串	深度挖掘，多来源	Serenity + Jukan	TLDR + 怀疑
快速点评

快速公式

长推文串： 钩子 → TLDR → 数字 → 怀疑 → 立场

快速点评： 标题数字 → 背景 → 如果你现在在看...

突发新闻： 重磅。 / 好好好... → 关键数字 → 来源

胜利巡礼： $TICKER自KSVC加入以来上涨X% → 入场价/现价 → 论点得到验证

步骤1a：使用Explore代理扫描PDF

当你有许多PDF需要审阅时，使用Explore代理进行广泛筛选。这比RLM用于初步发现更快。

步骤1a.0：检查已发布的推文串（强制——先做）

⚠️ 在扫描任何PDF之前，检查Kirk已经发布了什么。

bash

列出所有已发布的推文串

ls /Users/Shared/ksvc/threads/

阅读最近的thread.md文件以了解涵盖的主题

对于每个已发布的推文串，注意：

- 主题（论点是什么？）
使用的源PDF（检查_metadata.md）
日期（多近？）

然后在扫描后选择主题时，拒绝任何：

- 使用与已发布推文串相同的主要源PDF的主题
涵盖相同论点/角度（即使来自不同来源）
对Kirk的粉丝来说会读起来像重复的内容

可接受的重叠：

- 对先前推文串的跟进/更新，包含新数据（例如，财报确认了论点）
同一行业的不同角度（例如，发布了关于ABF短缺的内容，现在发布关于特定公司财报的内容）
明确表述为更新：自从我上次发布关于X的内容以来，情况发生了变化

为什么存在这个规则（案例研究——ABF基板，2026-02-07）：

Kirk在2月5日发布了一个10条推文的推文串，涵盖高盛的ABF短缺报告（10%→21%→42%，Kinsus/NYPCB/Unimicron）。在2月7日，管线选择了同一份高盛报告，并生成了一个3条推文的快速点评，包含相同的数字、相同的公司、相同的角度。我们没有先检查已发布的推文串，所以当有10个其他新鲜主题角度可用时，我们在重复内容上浪费了一次管线运行。

何时使用

- 筛选10+个PDF以找到相关的
寻找跨文档连接
从多个来源构建论点
还不知道哪些PDF重要

如何扫描

1. 检查已发布的推文串（上面的步骤1a.0）

2. 列出最近的PDF文件夹并统计PDF数量

ls /Users/Shared/ksvc/pdfs/ | tail -5 ls /Users/Shared/ksvc/pdfs/YYYYMMDD/ | wc -l

3. 将PDF符号链接到项目目录（子代理访问必需）

ln -sf /Users/Shared/ksvc/pdfs/YYYYMMDD .claude/pdfs-scan

4. 将PDF分组并并行生成Explore代理

目标：每个代理约5个PDF。在单条消息中生成所有代理。 - 每个代理获得一个特定的文件名列表进行扫描 - 所有代理同时运行 → 总时间 = 最慢的代理 - Haiku很便宜——更多代理 = 更快，且没有显著的成本增加

代理规模

PDF数量	代理数量	PDF/代理	预期时间
≤5	1	全部	~25秒
6-10

2 | 各约5个 | ~25秒 | | 11-15 | 3 | 各约5个 | ~25秒 | | 16-20 | 4 | 各约5个 | ~25秒 | | 21-30 | 5-6 | 各约5个 | ~30秒 |

为什么每个代理约5个PDF？ 速度的最佳点。每个PDF需要约4-8秒来读取+总结。5个PDF ≈ 每个代理25秒。每个代理增加更多PDF不会节省任何东西（相同的总tokens），但会使实际时间更差。

成本： Haiku很便宜。4个代理 × 5个PDF × 约4k tokens = 总共约80k输入tokens——与1个代理做全部20个相同。并行是免费的。

跨文档综合权衡： 每个代理只看到其批次，所以跨批次主题是主代理的工作。这没问题——主代理无论如何都会合并所有结果。

示例：生成Explore代理

步骤1：主代理创建符号链接并列出PDF：
bash
ln -sf /Users/Shared/

kirk-content-pipeline内容流水线