Media News Digest
Automated media & entertainment industry news digest system. Covers Hollywood trades, box office, streaming platforms, awards season, film festivals, production news, and industry deals.
Quick Start
1. Generate Digest (unified pipeline, runs all 4 sources in parallel): run `scripts/run-pipeline.py` with the flags listed under Unified Pipeline.
2. Use Templates: apply the Discord or email template to the merged output.
Data Sources (65 total, 64 enabled)
- RSS Feeds (36, 35 enabled): THR, Deadline, Variety, IndieWire, The Wrap, Collider, Vulture, Awards Daily, Gold Derby, Screen Rant, Empire, The Playlist, /Film, Entertainment Weekly, Roger Ebert, CinemaBlend, Den of Geek, The Direct, MovieWeb, CBR, What's on Netflix, Decider, Anime News Network, and more
- Twitter/X KOLs (18): @THR, @DEADLINE, @Variety, @FilmUpdates, @DiscussingFilm, @BoxOfficeMojo, @MattBelloni, @BorysKit, @TheAcademy, @letterboxd, @A24, and more
- Reddit (11): r/movies, r/boxoffice, r/television, r/Oscars, r/TrueFilm, r/entertainment, r/netflix, r/marvelstudios, r/DCCinematic, r/anime, r/flicks
- Web Search (9 topics): Brave Search / Tavily with freshness filters
Topics (9 sections)
- 🇨🇳 China / 中国影视: Mainland China box office, Chinese films, Chinese streaming
- 🎬 Production / 制作动态: New projects, casting, filming updates
- 💰 Deals & Business / 行业交易: M&A, rights, talent deals
- 🎞️ Upcoming Releases / 北美近期上映: Theatrical openings, release dates, trailers
- 🎟️ Box Office / 票房: North American and global box office, opening weekends
- 📺 Streaming / 流媒体: Netflix, Disney+, Apple TV+, HBO, viewership
- 🏆 Awards / 颁奖季: Oscars, Golden Globes, Emmys, BAFTAs
- 🎪 Film Festivals / 电影节: Cannes, Venice, TIFF, Sundance, Berlin
- ⭐ Reviews & Buzz / 影评口碑: Critical reception, RT/Metacritic scores
Scripts Pipeline
Unified Pipeline
```bash
python3 scripts/run-pipeline.py \
  --defaults config/defaults --config workspace/config \
  --hours 48 --freshness pd \
  --archive-dir workspace/archive/media-news-digest/ \
  --output /tmp/md-merged.json --verbose --force
```
- Features: runs all 4 fetch steps in parallel, then merges, deduplicates, and scores the results
- Output: final merged JSON, ready for report generation (about 30s total)
- Flags: `--skip rss,twitter` to skip steps, `--enrich` for full-text enrichment
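The fan-out-then-merge shape described above can be sketched with the standard library. This is an illustration only, not the code in `run-pipeline.py`: the four fetch functions are hypothetical stubs, `run_steps` is an invented name, and treating a failed source as an empty result is an assumed policy.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the four fetch steps; real fetchers would do network I/O.
def fetch_rss():
    return [{"url": "https://example.com/a", "title": "A"}]

def fetch_twitter():
    return [{"url": "https://example.com/b", "title": "B"}]

def fetch_web():
    raise RuntimeError("search backend unavailable")

def fetch_reddit():
    return []

def run_steps(steps):
    """Run every fetch step in its own thread and collect results by step name."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = {name: pool.submit(fn) for name, fn in steps.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # Assumed policy: a failing source yields no items instead of aborting the run.
                results[name] = []
    return results

results = run_steps({
    "rss": fetch_rss,
    "twitter": fetch_twitter,
    "web": fetch_web,
    "reddit": fetch_reddit,
})
# Flatten the per-source lists into one list for the later dedup and scoring stage.
merged = [item for items in results.values() for item in items]
```

Threads suit this workload because each step spends most of its time waiting on the network, so the total runtime tends toward that of the slowest source.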
Individual Scripts
- `fetch-rss.py`: Parallel RSS fetcher (10 workers, 30s timeout, caching)
- `fetch-twitter.py`: Dual backend: official X API v2 + twitterapi.io (auto fallback, 3-worker concurrency)
- `fetch-web.py`: Web search via Brave (multi-key rotation) or Tavily
- `fetch-reddit.py`: Reddit public JSON API (4 workers, no auth)
- `merge-sources.py`: Quality scoring, URL dedup, multi-source merging
- `summarize-merged.py`: Structured overview sorted by quality_score
- `enrich-articles.py`: Full-text enrichment for top articles
- `generate-pdf.py`: PDF generation with Chinese typography + emoji
- `send-email.py`: MIME email with HTML body + PDF attachment
- `sanitize-html.py`: XSS-safe markdown to HTML conversion
- `validate-config.py`: Configuration validator
- `source-health.py`: Source health tracking
- `configloader.py`: Config overlay loader (defaults + user overrides)
- `test-pipeline.sh`: Pipeline testing with `--only`/`--skip`/`--twitter-backend` filters
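The merge step is summarized above as quality scoring, URL dedup, and multi-source merging. A minimal sketch of the dedup and merge part follows, assuming each article is a dict with `url` and `quality_score` keys. The normalization rule (drop query string, fragment, and trailing slash) is a simplification chosen for illustration and may collapse pages that differ only by query parameters; the function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url):
    """Lowercase the scheme and host, and drop query string, fragment, and trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))

def merge_articles(*sources):
    """Merge article lists, keeping the highest-scoring entry per normalized URL."""
    best = {}
    for articles in sources:
        for article in articles:
            key = normalize_url(article["url"])
            current = best.get(key)
            if current is None or article["quality_score"] > current["quality_score"]:
                best[key] = article
    # Highest quality first, matching an overview sorted by quality_score.
    return sorted(best.values(), key=lambda a: a["quality_score"], reverse=True)

rss_items = [
    {"url": "https://Example.com/story/?utm_source=rss", "quality_score": 5, "source": "rss"},
]
reddit_items = [
    {"url": "https://example.com/story", "quality_score": 8, "source": "reddit"},
    {"url": "https://example.com/other", "quality_score": 3, "source": "reddit"},
]
merged = merge_articles(rss_items, reddit_items)
```

Here the RSS and Reddit entries for `/story` collapse into one record, and the higher-scoring Reddit entry is the one kept.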
Cron Integration
Reference references/digest-prompt.md in cron prompts.
Daily Digest
MODE = daily, FRESHNESS = pd, RSS_HOURS = 48
Weekly Digest
MODE = weekly, FRESHNESS = pw, RSS_HOURS = 168
Dependencies
All scripts run on the Python 3.8+ standard library alone. feedparser is optional but recommended.
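One common way to treat feedparser as optional is to try the import and fall back to the standard library when it is missing. The sketch below shows that pattern under stated assumptions: `parse_feed` is a hypothetical helper, the fallback handles only plain RSS 2.0 `<item>` elements (no Atom, no namespaces), and the repository's own fallback logic may differ.

```python
import xml.etree.ElementTree as ET

try:
    import feedparser  # optional third-party dependency
except ImportError:
    feedparser = None

def parse_feed(xml_text):
    """Return a list of {"title", "link"} dicts from RSS 2.0 text."""
    if feedparser is not None:
        parsed = feedparser.parse(xml_text)
        return [
            {"title": entry.get("title", ""), "link": entry.get("link", "")}
            for entry in parsed.entries
        ]
    # Fallback: minimal standard-library parsing of <item> elements.
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]

SAMPLE = (
    '<rss version="2.0"><channel><title>Demo</title>'
    "<item><title>First story</title><link>https://example.com/1</link></item>"
    "</channel></rss>"
)
items = parse_feed(SAMPLE)
```

feedparser is the more forgiving path for malformed or Atom feeds, which is a plausible reason to recommend it even when the standard library suffices.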
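Media News Digest
An automated news digest system for the media and entertainment industry. It covers Hollywood trade news, box office data, streaming platforms, awards season, film festivals, production news, and industry deals.
Quick Start
1. Generate Digest (unified pipeline, runs all 4 sources in parallel):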
```bash
python3 scripts/run-pipeline.py \
  --defaults /config/defaults \
  --config /config \
  --hours 48 --freshness pd \
  --archive-dir /archive/media-news-digest/ \
  --output /tmp/md-merged.json --verbose --force
```
2. Use Templates: apply the Discord or email template to the merged output.
Data Sources (65 total, 64 enabled)
- RSS Feeds (36, 35 enabled): THR, Deadline, Variety, IndieWire, The Wrap, Collider, Vulture, Awards Daily, Gold Derby, Screen Rant, Empire, The Playlist, /Film, Entertainment Weekly, Roger Ebert, CinemaBlend, Den of Geek, The Direct, MovieWeb, CBR, What's on Netflix, Decider, Anime News Network, and more
- Twitter/X KOLs (18): @THR, @DEADLINE, @Variety, @FilmUpdates, @DiscussingFilm, @BoxOfficeMojo, @MattBelloni, @BorysKit, @TheAcademy, @letterboxd, @A24, and more
- Reddit (11): r/movies, r/boxoffice, r/television, r/Oscars, r/TrueFilm, r/entertainment, r/netflix, r/marvelstudios, r/DCCinematic, r/anime, r/flicks
- Web Search (9 topics): Brave Search / Tavily with freshness filters
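Topics (9 sections)
- 🇨🇳 China / 中国影视: Mainland China box office, Chinese-language films, Chinese streaming
- 🎬 Production / 制作动态: New projects, casting, filming progress
- 💰 Deals & Business / 行业交易: M&A, rights, talent deals
- 🎞️ Upcoming Releases / 北美近期上映: Theatrical openings, release dates, trailers
- 🎟️ Box Office / 票房: North American and global box office, opening weekends
- 📺 Streaming / 流媒体: Netflix, Disney+, Apple TV+, HBO, viewership
- 🏆 Awards / 颁奖季: Oscars, Golden Globes, Emmys, BAFTAs
- 🎪 Film Festivals / 电影节: Cannes, Venice, Toronto, Sundance, Berlin
- ⭐ Reviews & Buzz / 影评口碑: Critical reception, Rotten Tomatoes/Metacritic scores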
Scripts Pipeline
Unified Pipeline
```bash
python3 scripts/run-pipeline.py \
  --defaults config/defaults --config workspace/config \
  --hours 48 --freshness pd \
  --archive-dir workspace/archive/media-news-digest/ \
  --output /tmp/md-merged.json --verbose --force
```
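- Features: runs all 4 fetch steps in parallel, then merges, deduplicates, and scores the results
- Output: final merged JSON file, ready for report generation (about 30s total)
- Flags: `--skip rss,twitter` to skip steps, `--enrich` for full-text enrichment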
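To illustrate how a comma-separated `--skip` value and a boolean `--enrich` switch might be parsed, here is a small argparse sketch. Only the flag names and the `rss,twitter` example come from this document; the step names `web` and `reddit`, the validation behavior, and the helper names are assumptions, and the real script's parser may be organized differently.

```python
import argparse

# Assumed step names; only "rss" and "twitter" appear in the documented example.
ALL_STEPS = ("rss", "twitter", "web", "reddit")

def parse_skip(value):
    """Turn 'rss,twitter' into {'rss', 'twitter'}, rejecting unknown step names."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in ALL_STEPS]
    if unknown:
        raise argparse.ArgumentTypeError("unknown step(s): " + ", ".join(unknown))
    return set(names)

def build_parser():
    parser = argparse.ArgumentParser(description="Illustrative pipeline flag parser")
    parser.add_argument("--skip", type=parse_skip, default=set(),
                        help="comma-separated steps to skip, e.g. rss,twitter")
    parser.add_argument("--enrich", action="store_true",
                        help="enable full-text enrichment")
    return parser

args = build_parser().parse_args(["--skip", "rss,twitter", "--enrich"])
steps_to_run = [step for step in ALL_STEPS if step not in args.skip]
```

Validating step names at parse time means a typo such as `--skip rs` fails immediately rather than silently running every step.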
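Individual Scripts
- `fetch-rss.py`: Parallel RSS fetcher (10 workers, 30s timeout, caching)
- `fetch-twitter.py`: Dual backend: official X API v2 + twitterapi.io (auto fallback, 3-worker concurrency)
- `fetch-web.py`: Web search via Brave (multi-key rotation) or Tavily
- `fetch-reddit.py`: Reddit public JSON API (4 workers, no auth)
- `merge-sources.py`: Quality scoring, URL dedup, multi-source merging
- `summarize-merged.py`: Structured overview sorted by quality_score
- `enrich-articles.py`: Full-text enrichment for top articles
- `generate-pdf.py`: PDF generation with Chinese typography + emoji
- `send-email.py`: MIME email with HTML body + PDF attachment
- `sanitize-html.py`: XSS-safe Markdown to HTML conversion
- `validate-config.py`: Configuration validator
- `source-health.py`: Source health tracking
- `configloader.py`: Config overlay loader (defaults + user overrides)
- `test-pipeline.sh`: Pipeline testing with `--only`/`--skip`/`--twitter-backend` filters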
Cron Integration
Reference references/digest-prompt.md in cron prompts.
Daily Digest
MODE = daily, FRESHNESS = pd, RSS_HOURS = 48
Weekly Digest
MODE = weekly, FRESHNESS = pw, RSS_HOURS = 168
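The daily and weekly settings above could be captured in a small lookup table that a scheduler wrapper turns into pipeline arguments. The sketch assumes that RSS_HOURS corresponds to the `--hours` flag and FRESHNESS to `--freshness`, which matches the daily values in the pipeline command but is not stated explicitly; `DIGEST_MODES` and `pipeline_args` are hypothetical names.

```python
# Parameter values taken from the daily and weekly settings in this document.
DIGEST_MODES = {
    "daily": {"freshness": "pd", "rss_hours": 48},
    "weekly": {"freshness": "pw", "rss_hours": 168},
}

def pipeline_args(mode):
    """Build the argument list for one digest mode (assumed flag mapping)."""
    params = DIGEST_MODES[mode]
    return [
        "python3", "scripts/run-pipeline.py",
        "--hours", str(params["rss_hours"]),
        "--freshness", params["freshness"],
    ]

weekly_cmd = pipeline_args("weekly")
```

The resulting list can be passed to `subprocess.run` without shell quoting concerns.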
Dependencies
All scripts run on the Python 3.8+ standard library alone. feedparser is optional but recommended.