Commit d7f5d39

chore: update skill
1 parent 5897fa2 commit d7f5d39

2 files changed

Lines changed: 71 additions & 16 deletions

.cursor/skills/pgweekly-blog-generation/SKILL.md

Lines changed: 20 additions & 12 deletions
@@ -9,26 +9,29 @@ Generates English and Chinese technical blog posts from PostgreSQL mailing list

 ## Quick Workflow

-1. **Fetch** thread data (required; do not skip): run the fetch script so that the thread HTML, Markdown, and **all patch attachments** are downloaded and saved under `data/threads/`:
+1. **Fetch** thread data (required; do not skip):
    ```bash
    python3 tools/fetch_data.py --thread-id "{THREAD_ID_OR_URL}"
    ```
-   This creates `data/threads/YYYY-MM-DD/<sanitized-thread-id>/` and downloads every `.patch` (and other allowed attachments) into `data/threads/YYYY-MM-DD/<sanitized-thread-id>/attachments/`. Always run this step before writing the blog.
+   - **Wait for the command to finish** (check exit code is 0). Do not proceed if fetch failed.
+   - This creates `data/threads/YYYY-MM-DD/<sanitized-thread-id>/` and downloads attachments into `attachments/`.
+   - The `YYYY-MM-DD` in the path is the **fetch date** (when you ran the script), NOT the thread date—do not use it for year/week.

 2. **Locate** fetched content in `data/threads/YYYY-MM-DD/<thread-id>/`:
    - `thread.html` - Original HTML
    - `thread.md` - Converted Markdown
-   - `metadata.txt` - Thread info (use for year/week)
+   - `metadata.txt` - Thread info
    - `attachments/` - **Downloaded patches** (e.g. `.patch` files from the mailing list)
    - `attachments.txt` - List of downloaded attachment filenames

-3. **Verify** all patch set versions are downloaded (required before analyze):
-   - Read `thread.md` and `thread.html` to identify all patch versions referenced in the thread (e.g. v1, v2, v3, v4, v5…; also patterns like `0001-`, `0002-` in patch series)
-   - List files in `attachments/` and compare: every referenced version must have a corresponding downloaded file
-   - If any referenced version is missing:
-     - Run `python3 tools/fetch_data.py --thread-dir "data/threads/YYYY-MM-DD/<thread-id>"` to retry downloading missing attachments
-     - If still missing, do not proceed with analysis; report the missing versions and ask the user to verify the thread or manually add the patches
-   - Only proceed to analyze/generate once all referenced patch versions are present in `attachments/`
+3. **Verify** all patch set versions are downloaded — **MANDATORY GATE; do not skip**:
+   - Read `thread.md` and `thread.html` to identify **all** patch versions referenced (v1, v2, v3…; or `0001-`, `0002-` in patch series)
+   - Run `ls data/threads/YYYY-MM-DD/<thread-id>/attachments/` and compare with the list of referenced versions
+   - **If any referenced version is missing:**
+     - Run `python3 tools/fetch_data.py --thread-dir "data/threads/YYYY-MM-DD/<thread-id>"` to retry
+     - Re-verify; if still missing, **STOP** — report missing versions to the user and do not write the blog
+   - **If the thread has no patches**, verification passes (nothing to check).
+   - **CRITICAL:** Do not proceed to step 4 (Analyze) until you have explicitly confirmed: "Referenced versions: [list] ✓ All present in attachments/". Only then may you write the blog.

 4. **Analyze** content:
    - If multiple patch versions (v1, v2, v3...), run `diff -u` between versions to explain evolution
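
The verification gate in step 3 of the hunk above can be sketched in Python. This is only an illustration under stated assumptions, not code from the commit. The helper names (`referenced_versions`, `missing_versions`, `gate`) and the version regex are hypothetical, and the sketch covers only `vN` labels, not the `0001-`/`0002-` series numbering that the skill also mentions.

```python
import re

# Assumption: versions are written as "v1", "V2", ... in both the thread text
# and the attachment filenames. The lookarounds skip words such as "IPv6".
_VERSION = re.compile(r"(?<![A-Za-z0-9])[vV](\d+)(?![A-Za-z0-9])")


def referenced_versions(thread_text: str) -> set[int]:
    """Collect the patch version numbers mentioned in the thread text."""
    return {int(n) for n in _VERSION.findall(thread_text)}


def missing_versions(thread_text: str, attachment_names: list[str]) -> list[int]:
    """Return referenced versions that no attachment filename carries."""
    present: set[int] = set()
    for name in attachment_names:
        present.update(int(n) for n in _VERSION.findall(name))
    return sorted(referenced_versions(thread_text) - present)


def gate(thread_text: str, attachment_names: list[str]) -> str:
    """Raise if a referenced version is missing, else return the confirmation line."""
    missing = missing_versions(thread_text, attachment_names)
    if missing:
        raise RuntimeError(
            "Missing patch versions: " + ", ".join(f"v{n}" for n in missing)
        )
    found = sorted(referenced_versions(thread_text))
    listed = ", ".join(f"v{n}" for n in found) or "none"
    return f"Referenced versions: [{listed}] ✓ All present in attachments/"
```

A thread with no version labels yields an empty set, so the gate passes, matching the "no patches" rule in step 3. A raised `RuntimeError` corresponds to the **STOP** branch.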
@@ -48,7 +51,7 @@ Generates English and Chinese technical blog posts from PostgreSQL mailing list
    - Chinese: `src/cn/{year}/{week}/{descriptive-filename}.md`
    - Filename: kebab-case from main topic (e.g. `planner-count-optimization`)

-7. **Update** SUMMARY.md and year READMEs:
+7. **Update** `src/SUMMARY.md` and year READMEs:
    - Add entries under both `# 🇬🇧 English` and `# 🇨🇳 中文`
    - Follow existing hierarchy: year → week → link to article
    - **Put the new week/article at the top** (newest first): insert the new week immediately after the year line, so the latest week appears first in the list.
@@ -57,7 +60,12 @@ Generates English and Chinese technical blog posts from PostgreSQL mailing list

 ## Year/Week

-Determine from `metadata.txt` (thread date) or use current date. Use ISO week number (e.g. 06 for week 6).
+**Use the blog writing date (the day you write the blog) as the source of truth.** This determines which week the article is filed under.
+
+**Rules:**
+- Compute ISO year and ISO week from **today's date** (the date when the blog is being written).
+- Example: if writing on 2026-03-20, use year=2026, week=12 (from `datetime(2026, 3, 20).isocalendar()`).
+- **Do NOT use** the thread date, `metadata.txt`, the directory name `YYYY-MM-DD` (fetch date), or "Downloaded:" for year/week.

 ## Writing Guidelines

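
The Year/Week rule added in this hunk can be sketched as a small helper. It assumes the week is zero-padded to two digits, as in the "06 for week 6" wording of the removed line. The function names `blog_year_week` and `chinese_article_path` are hypothetical. The path template is the Chinese one shown in the previous hunk.

```python
from datetime import date


def blog_year_week(writing_date: date) -> tuple[str, str]:
    """Return (ISO year, zero-padded ISO week) for the day the blog is written."""
    iso_year, iso_week, _weekday = writing_date.isocalendar()
    return str(iso_year), f"{iso_week:02d}"


def chinese_article_path(writing_date: date, slug: str) -> str:
    """Fill in src/cn/{year}/{week}/{descriptive-filename}.md for a kebab-case slug."""
    year, week = blog_year_week(writing_date)
    return f"src/cn/{year}/{week}/{slug}.md"


print(blog_year_week(date(2026, 3, 20)))   # ('2026', '12')
print(blog_year_week(date(2027, 1, 1)))    # ('2026', '53')
print(chinese_article_path(date(2026, 3, 20), "planner-count-optimization"))
```

The second call shows why the ISO year should come from `isocalendar()` and not from `writing_date.year`: 1 January 2027 falls in ISO week 53 of 2026. In real use, `date.today()` would replace the fixed dates.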
tools/fetch_data.py

Lines changed: 51 additions & 4 deletions
@@ -5,6 +5,7 @@
 import re
 from datetime import datetime
 from pathlib import Path
+from email.utils import parsedate_to_datetime
 import urllib.request
 from html.parser import HTMLParser

@@ -51,6 +52,37 @@ def extract_title(html: str) -> str:
     return "PostgreSQL Thread Summary"


+def extract_thread_date(html: str) -> str | None:
+    """Extract the first/original message date from thread HTML for year/week determination.
+    Returns YYYY-MM-DD or None if not found.
+    """
+    # RFC 2822 style: "Mon, 20 Jan 2026 12:00:00 +0000" or "Date: Mon, 20 Jan 2026..."
+    rfc2822 = re.findall(
+        r'(?:Date:\s*)?([A-Za-z]{3},\s*\d{1,2}\s+[A-Za-z]{3}\s+\d{4}\s+\d{1,2}:\d{2}(?::\d{2})?\s*[+-]\d{4})',
+        html
+    )
+    for s in rfc2822:
+        try:
+            dt = parsedate_to_datetime(s.strip())
+            return dt.strftime("%Y-%m-%d")
+        except (ValueError, TypeError):
+            continue
+    # "On Mon, Jan 20, 2026 at 12:00 PM" style
+    on_wrote = re.findall(
+        r'On\s+([A-Za-z]{3}),\s*([A-Za-z]{3})\s+(\d{1,2}),?\s+(\d{4})',
+        html
+    )
+    if on_wrote:
+        try:
+            # Use first (original) message date
+            _, month_str, day, year = on_wrote[0]
+            dt = datetime.strptime(f"{month_str} {day} {year}", "%b %d %Y")
+            return dt.strftime("%Y-%m-%d")
+        except ValueError:
+            pass
+    return None
+
+
 def html_to_markdown(html: str) -> str:
     """Convert HTML to Markdown using html2text if available."""
     if HAS_HTML2TEXT:
@@ -297,16 +329,31 @@ def main() -> None:
         print(" No attachments found")

     # Step 6: Create metadata file
-    metadata_path = thread_dir / "metadata.txt"
-    metadata_content = "\n".join([
+    thread_date_str = extract_thread_date(html)
+    iso_year, iso_week = "", ""
+    if thread_date_str:
+        try:
+            dt = datetime.strptime(thread_date_str, "%Y-%m-%d")
+            iso_year = str(dt.isocalendar()[0])
+            iso_week = f"{dt.isocalendar()[1]:02d}"
+        except ValueError:
+            pass
+
+    metadata_lines = [
         f"Thread ID: {thread_id}",
         f"Title: {title}",
         f"Downloaded: {datetime.now().isoformat()}",
         f"HTML Size: {len(html)} bytes",
         f"Markdown Size: {len(markdown_content)} chars",
         f"Attachments: {len(attachments) if attachments else 0}",
-    ])
-    metadata_path.write_text(metadata_content, encoding="utf-8")
+    ]
+    if thread_date_str:
+        metadata_lines.insert(2, f"Thread date: {thread_date_str}")
+    if iso_year and iso_week:
+        metadata_lines.insert(3, f"ISO year: {iso_year}, ISO week: {iso_week}")
+
+    metadata_path = thread_dir / "metadata.txt"
+    metadata_path.write_text("\n".join(metadata_lines), encoding="utf-8")

     print(f"\n✅ Done! All files saved to: {thread_dir.resolve()}")
     print(f"\nContents:")
