# Cron Job: reddit-daily-telegram

**Job ID:** 93a2c5150a6f
**Run Time:** 2026-06-07 18:02:41
**Schedule:** 0 18 * * *

## Prompt

[IMPORTANT: The user has invoked the "content-monitoring-briefings" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]

---
name: content-monitoring-briefings
description: "Monitor external content channels, compare recent output to industry developments, and publish concise briefings or Telegram sitreps."
platforms: [linux, macos, windows]
---

# Content Monitoring & Briefings

Use this skill when the user wants to:
- scrape or inspect a YouTube channel, creator feed, or other content stream
- compare recent uploads/posts against what happened in the broader industry
- identify gaps, missed stories, or under-covered themes
- turn the result into a daily briefing, sitrep, or delivery to Telegram

This is an umbrella workflow skill. Keep it reusable: channel-specific details belong in `references/`.

## Core workflow

1. **Verify the target first**
   - Confirm the exact channel/page/account URL.
   - If the user names a creator but gives a different URL, trust the URL and mention the mismatch if relevant.
   - Inspect the page identity before analyzing topics.

2. **Collect the latest items**
   - Prefer the channel's videos/posts page.
   - Use browser snapshots for visible metadata, and browser text extraction when the page is virtualized or truncated.
   - Capture title, age, view count, and any recurring topic labels.
   - If the visible list is incomplete, scroll and continue until the requested time window is covered.

3. **Classify the content**
   - Group items by topic family: product firmware, product reviews, platform updates, policy/regulation, ecosystem moves, and tutorials/how-tos.
   - Mark whether the item is reactive news, evergreen content, or opinion/analysis.

4. **Compare against industry reality**
   - Look for gaps in:
     - ecosystem/platform competition
     - policy and regulation impact
     - AI or automation trends
     - privacy/data governance
     - sustainability/repairability
     - adjacent competitors or substitute products
   - Ask: what changed in the industry that the channel did not cover, and why does that omission matter?

5. **Write the briefing**
   - Start with the direct answer.
   - Keep it structured: covered / missed / why it matters / next story ideas.
   - If the user asks for a public-ready result, make it readable without extra explanation.

6. **Confirm cadence before scheduling**
   - Do not turn a request for realtime/current status into a daily cron job by default.
   - If the user says "realtime," "now," or asks for a live quota/status check, perform an immediate check or explain the available live signal first.
   - Only create scheduled jobs after the cadence is explicit (for example daily at 09:00, every hour, or watchdog-style only-on-change alerts).

7. **Deliver to Telegram when requested**
   - If the user wants a Telegram target, list available message targets first when a specific recipient is needed.
   - Use the home Telegram destination when the user simply says "send to Telegram".
   - For daily sitreps, use a concise format: key items, notable changes, one recommendation.

## Long-output delivery pattern (cron → file + email)

Use this when a recurring monitoring job produces output too long for chat (more than 50 lines or 3000 characters), or when the user prefers email over Telegram for daily digests.

### Prompt structure for the cron job
The cron job prompt must be self-contained (no chat context available). Include:

- The exact source URL or API endpoint
- Time window (e.g. "past 24h", "since last run")
- Formatting rules, stated inline rather than by reference to an earlier conversation
- Two delivery actions in this order:

```
1. Write output to /opt/data/cron/output/<channel-slug>/YYYY-MM-DD.md
   Create directories if needed.
2. EMAIL to <user-email> with subject "<type> - <channel> - YYYY-MM-DD"
   Via GAPI script at /opt/data/skills/productivity/google-workspace/scripts/google_api.py
   Only skip email if the Gmail API fails — still write the local file.
```
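
The file-writing step and the subject line can be sketched in Python as follows. This is a minimal sketch, not the skill's actual implementation: the helper names `write_digest` and `email_subject` and the `base_dir` parameter are hypothetical; only the path layout and subject pattern come from the rules above.

```python
from datetime import date
from pathlib import Path


def write_digest(text: str, channel_slug: str, run_date: date,
                 base_dir: str = "/opt/data/cron/output") -> Path:
    """Write the digest to <base_dir>/<channel_slug>/YYYY-MM-DD.md."""
    out_dir = Path(base_dir) / channel_slug
    # "Create directories if needed."
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{run_date.isoformat()}.md"
    out_path.write_text(text, encoding="utf-8")
    return out_path


def email_subject(digest_type: str, channel: str, run_date: date) -> str:
    """Build "<type> - <channel> - YYYY-MM-DD"."""
    return f"{digest_type} - {channel} - {run_date.isoformat()}"
```

Writing the file before attempting the email keeps the local copy available even if the Gmail API call fails.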

### Conditional email (skip when no new content)
When the channel has no new items since the last run:
- Write "No new videos" or "Nothing notable" to the local file
- Skip the email entirely — don't send empty digests

This keeps the user's inbox clean and only alerts them to actual content.
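
The skip rule reduces to one decision, sketched below. The function name `plan_delivery` and its parameters are hypothetical; the behavior (always write the file, email only when there is new content) follows the rule above.

```python
from typing import List, Tuple


def plan_delivery(items: List[str], digest_text: str,
                  empty_note: str = "No new videos") -> Tuple[str, bool]:
    """Return (text_for_local_file, send_email)."""
    if not items:
        # Nothing new: record the empty run locally, send no email.
        return empty_note, False
    return digest_text, True
```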

### Reddit-specific: prefer JSON API for cron scraping
For Reddit cron jobs, append `.json` to subreddit URLs (e.g. `https://www.reddit.com/r/bellingcat/hot.json`) instead of using the browser. The JSON API is faster, more reliable, and avoids browser overhead in cron context. Default to `hot` sort; only use `top` if the user explicitly asks for top posts.
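
A minimal Python sketch of the URL pattern and of filtering a listing once it has been fetched. The function names and the 24-hour window parameter are assumptions for illustration; the field names (`stickied`, `created_utc`, `title`, `ups`) are those found in Reddit listing JSON. Fetching is deliberately left out, since the HTTP client and any required headers depend on the cron environment.

```python
from typing import Any, Dict, List


def subreddit_json_url(subreddit: str, sort: str = "hot") -> str:
    """Build the .json listing URL; default to the `hot` sort."""
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json"


def top_threads(listing: Dict[str, Any], now_utc: float,
                limit: int = 3, window_hours: float = 24.0) -> List[Dict[str, Any]]:
    """Keep non-stickied posts inside the time window, in listing order."""
    cutoff = now_utc - window_hours * 3600
    threads = []
    for child in listing.get("data", {}).get("children", []):
        post = child.get("data", {})
        if post.get("stickied"):
            continue  # skip pinned/moderator posts
        if post.get("created_utc", 0) < cutoff:
            continue  # older than the requested window
        threads.append({"title": post.get("title", ""), "ups": post.get("ups", 0)})
    return threads[:limit]
```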

### Purge old output
Set up a daily cron at 03:00 that deletes `.md` files older than 30 days from `/opt/data/cron/output/`:
```
find /opt/data/cron/output/ -type f -name "*.md" -mtime +30 -delete
find /opt/data/cron/output/ -type d -empty -delete
```

## Cron prompt template (YT sitrep)

```
Check [CHANNEL NAME] YouTube channel at [URL] for new videos since the last run.
Output in nieuws24 format: thematic groups with short title+explanation per item,
up to 5 videos max, concluding takeaway.
After creating the digest:
1. Write to /opt/data/cron/output/yt-<slug>/YYYY-MM-DD.md (create dirs if needed)
2. EMAIL to [email] with subject "YT Sitrep - [Name] - YYYY-MM-DD"
   using /opt/data/skills/productivity/google-workspace/scripts/google_api.py
   Only skip email if Gmail API fails — still write the file.
If NO new videos since last run: write "No new videos" and skip email.
```

## Cron prompt template (Reddit digest)

```
Create a concise digest from Reddit for subreddits: [list].
Use JSON API: https://www.reddit.com/r/<sub>/hot.json
Skip stickied posts. Top 1-3 threads per sub, each with:
title, upvotes, 1-line BLUF.
Format: lurker-friendly — no reply suggestions, no engagement prompts.
1. Write to /opt/data/cron/output/reddit-daily/YYYY-MM-DD.md
2. EMAIL to [email] with subject "Reddit Daily Digest - YYYY-MM-DD"
   via GAPI script.
```
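
For illustration, the full-digest layout the template asks for could be rendered like this. A sketch only: `render_digest` and the `title`/`ups`/`bluf` dictionary keys are hypothetical names, and the Markdown layout is one reasonable reading of "title, upvotes, 1-line BLUF".

```python
from typing import Dict, List


def render_digest(threads_by_sub: Dict[str, List[dict]], max_per_sub: int = 3) -> str:
    """Render one section per subreddit, skipping subreddits with no threads."""
    lines: List[str] = []
    for sub, threads in threads_by_sub.items():
        if not threads:
            continue
        lines.append(f"## r/{sub}")
        for t in threads[:max_per_sub]:
            lines.append(f"- {t['title']} ({t['ups']} upvotes): {t['bluf']}")
        lines.append("")  # blank line between sections
    return ("\n".join(lines).rstrip() + "\n") if lines else ""
```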

## News24 briefing pattern

Use this when the user says `nieuws24` or asks for a last-24-hours news briefing.

1. **Use the configured source set**
   - Default sources for this user: GVA, De Standaard, VRT NWS, De Morgen, and Tweakers.
   - Prefer RSS/Atom feeds for timestamped recent items when available; use homepage extraction and search snippets as fallback.
   - If a source is blocked by paywall, privacy gate, or RSS limits, include a short source caveat rather than pretending full coverage.

2. **Filter to the user's interests**
   - Exclude football, rap/Kanye West, and pop-celebrity items such as Dua Lipa by default.
   - Include those topics only if they become major hard news with broader public-safety, legal, political, or economic relevance.
   - Prioritize Belgium/regional hard news, politics/policy, justice, infrastructure, economy, geopolitics, safety incidents, technology, and cybersecurity.

3. **Keep it concise and scannable**
   - Start with the direct 24h scope and source caveat.
   - Group by section; use short bullets with only the actionable/important fact.
   - Avoid entertainment filler when the user did not ask for culture coverage.

## Reddit-to-Telegram digest pattern

Use this when the user wants recurring Reddit monitoring pushed to Telegram.

1. **Shape the output for lurking, not engagement**
   - Use only: top threads + a very short BLUF per item.
   - Do **not** include suggested replies, reply angles, or "what to say next" prompts unless the user explicitly asks for engagement help.
   - Keep each item compact enough to scan in chat.

2. **Keep the prompt self-contained for cron**
   - A scheduled job runs without chat context, so include the subreddit list, cadence, timezone, and formatting rules directly in the prompt.
   - Prefer explicit timezone wording when scheduling daily jobs.

3. **Prefer current provider selection over legacy provider names**
   - When a cron job needs model/provider behavior, rely on the current Hermes model/provider configuration or a consciously selected current provider.
   - Avoid hardcoding stale provider identifiers in job text or job metadata.
   - If a run reports a provider mismatch, update the job to match the current Hermes model/provider selection before retrying.

4. **Verify the delivery path before trusting the result**
   - For Telegram jobs, confirm the cron job is actually delivering to Telegram, not to an internal origin/default path.
   - Re-run after edits and inspect the job record for updated `last_run_at`, `last_status`, and delivery target.

5. **Style defaults for Reddit digests**
   - Title: short.
   - Body: `Top threads` with 1-line BLUF bullets.
   - Tone: factual, low-noise, no fluff.
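
The check in step 4 can be sketched as a small function over the job record. This assumes the record is available as a dictionary; the `last_run_at`, `last_status`, and `deliver` field names come from this skill, but the values `"telegram"` and `"ok"` are assumptions and should be replaced with whatever the scheduler actually reports.

```python
from typing import Dict, List, Optional


def delivery_problems(job: Dict[str, Optional[str]],
                      previous_run_at: Optional[str],
                      expected_target: str = "telegram",   # assumed value
                      ok_status: str = "ok") -> List[str]:  # assumed value
    """Return human-readable problems; an empty list means the re-run looks good."""
    problems = []
    if job.get("deliver") != expected_target:
        problems.append(f"delivery target is {job.get('deliver')!r}, expected {expected_target!r}")
    if job.get("last_run_at") == previous_run_at:
        problems.append("last_run_at did not change; the job may not have re-run")
    if job.get("last_status") != ok_status:
        problems.append(f"last_status is {job.get('last_status')!r}")
    return problems
```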

## References

- News24 Dutch/Belgian source notes and user-interest filter: `references/news24-dutch-sources.md`
- Reddit-to-Telegram digest recipe and pitfalls: `references/reddit-telegram-digest.md`
- Reddit Telegram format and anti-patterns: `references/reddit-digest-format.md`
- YT channel sitrep and gap analysis: `references/youtube-channel-gap-and-sitrep.md`

## Pitfalls

- YouTube pages can be truncated or virtualized; do not assume the first snapshot contains the full recent list.
- Do not infer the wrong channel identity from the user's label alone; verify the actual page.
- Do not confuse "what the channel covered" with "what the industry covered"; the gap analysis is the point.
- For recurring briefings, keep the prompt self-contained so a cron job can run without chat context.
- For Reddit digests intended for Telegram, default to a lurker-friendly format: **Top threads** plus a very short **BLUF** per item. Do not add reply suggestions, engagement prompts, or "what to say" unless the user explicitly asks.
- **Cron delivery to email**: set `deliver: "local"` on the cron job so the output is saved server-side, and let the prompt handle the email send. The `deliver` field controls where the agent's final message goes; `local` avoids posting to the chat every day.
- **Reddit JSON API in cron**: browser tooling is flaky and slow in cron context. Always use `.json` suffix URLs for Reddit cron scraping. The `web` toolset (web_search/web_extract) is sufficient — no need for `browser` toolset in Reddit cron jobs.
- **Conditional email**: if "no new content", write the file but skip email. This requires the prompt to explicitly say when to skip.

## Output patterns

### Gap analysis
- Covered
- Missed
- Why it matters
- Candidate stories

### Sitrep
- Today's key items
- Overnight changes
- Recommendation / focus

### Telegram delivery
- Keep messages short enough for chat readability.
- Prefer one clean block over multiple fragmented updates.

### Email delivery (long output)
- Subject: `<Type> - <Channel/Source> - YYYY-MM-DD`
- Body: the full sitrep/digest text
- Attachments: none (the Google API script sends via body only)

[IMPORTANT: The user has invoked the "messaging-document-ingestion" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]

---
name: messaging-document-ingestion
description: "Build and maintain workflows that ingest PDFs or URLs from messaging platforms and convert them into Markdown or note-ready text."
version: 0.1.0
author: Hermes Agent
license: MIT
platforms: [windows, linux, macos]
metadata:
  hermes:
    tags: [telegram, markdown, pdf, url, document-ingestion, markitdown, messaging, automation]
---

# Messaging Document Ingestion

Use this skill when the user wants a bot or automation that receives files or URLs from a messaging platform and turns them into Markdown, notes, or structured text.

Typical triggers:
- "Telegram bot that converts PDF/URL to md"
- "send a PDF and get Markdown back"
- "ingest links or documents from chat"
- "use markitdown for PDFs"
- "build a Telegram workflow for Obsidian"

## Goal
Create a small, reliable pipeline with these stages:
1. Receive input from chat
2. Detect input type (PDF, URL, other)
3. Convert the source into Markdown
4. Return the `.md` file and a concise status message
5. Optionally persist the output locally or in a notes vault
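
Stage 2 can be sketched in a few lines of Python. The function name and return labels are hypothetical; URL detection uses `urllib.parse.urlparse` from the standard library and accepts only http/https.

```python
from typing import Optional
from urllib.parse import urlparse


def detect_input_type(text: Optional[str] = None,
                      filename: Optional[str] = None) -> str:
    """Classify a chat input as "pdf", "url", or "other"."""
    if filename and filename.lower().endswith(".pdf"):
        return "pdf"
    if text:
        parsed = urlparse(text.strip())
        # Require both a web scheme and a host, so bare words are not URLs.
        if parsed.scheme in ("http", "https") and parsed.netloc:
            return "url"
    return "other"
```

Checking the file extension is only a first pass; a stricter version might also inspect the MIME type reported by the messaging platform.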

## Recommended approach

### 1) Reuse the existing messaging gateway when possible
If Hermes already has a Telegram/gateway bot configured, prefer extending or reusing that path instead of creating a separate bot token/process. Users often mean “make this work in my current bot” rather than “create another bot.” Verify the current gateway/platform status and target before proposing new infrastructure.

### 2) Start with a narrow MVP
Do not overbuild. The first version should support:
- one messaging platform
- PDF files
- plain http/https URLs
- a Markdown file response

Add OCR, browser rendering, vault sync, and metadata later.

### 3) Use a clear module split
A practical layout is:
- `app.py` / `bot.py`: platform handlers and routing
- `converters.py`: PDF and URL conversion logic
- `utils.py`: URL detection, filename cleanup, text helpers
- `storage.py`: optional persistence to disk or vault

### 4) Prefer real converters over placeholder text
For PDFs, use a real converter such as `markitdown` when available.
For websites, use a fetch + extract pipeline first; only add browser rendering when needed.

### 5) Make the output predictable
Return a Markdown file with a small metadata header when useful:
- source type
- source URL or original filename
- processed timestamp
- title when available

### 6) Treat errors as first-class
Useful failure messages:
- unsupported file type
- URL fetch failed
- empty conversion output
- conversion dependency missing

## Conversion strategy

### PDF
Preferred order:
1. Try `markitdown`
2. If output is empty or unreadable, inspect whether the PDF is scanned
3. Add OCR only for scanned PDFs

### URL
Preferred order:
1. Fetch HTML with a normal HTTP client
2. Remove script/style boilerplate
3. Extract title + readable text
4. If the page is JS-heavy or thin, fall back to a browser renderer later
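
Steps 2 and 3 can be sketched with the standard library alone. This is an illustrative baseline, not the skill's converter: the class and function names are hypothetical, and the HTML is assumed to have been fetched already. It drops `<script>` and `<style>` content and emits the title as a heading followed by the visible text.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect the <title> and visible text, ignoring script/style content."""

    def __init__(self) -> None:
        super().__init__()
        self.title_parts: List[str] = []
        self.text_parts: List[str] = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title:
            self.title_parts.append(data)
        elif data.strip():
            self.text_parts.append(data.strip())


def html_to_markdown(html: str) -> str:
    """Return "# <title>" plus paragraphs of readable text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title_parts).strip()
    body = "\n\n".join(parser.text_parts)
    return f"# {title}\n\n{body}\n" if title else f"{body}\n"
```

An empty or very short result from a function like this is a reasonable signal for the "JS-heavy or thin" fallback in step 4.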

## Implementation notes
- Use asynchronous handling for bot responsiveness, but run heavy conversion work in a thread or worker.
- Close file handles explicitly when sending files back to chat.
- Use stable filename slugs derived from title, host, or original filename.
- Keep the bot reply short; the `.md` file carries the content.
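
One way to derive a stable filename slug, as a sketch. The function name, the 60-character limit, and the `document` fallback are assumptions chosen for illustration.

```python
import re
import unicodedata


def filename_slug(source: str, max_len: int = 60) -> str:
    """Turn a title, host, or filename into a lowercase ASCII .md filename."""
    # Decompose accented characters, then drop anything that is not ASCII.
    normalized = unicodedata.normalize("NFKD", source)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumerics into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
    return (slug[:max_len].rstrip("-") or "document") + ".md"
```

The same input always yields the same name, so re-processing a source overwrites its earlier output instead of creating a duplicate.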

### Converter availability
When the user asks whether a specific converter such as `markitdown` is available, check three levels and answer precisely:
1. Is there a first-class Hermes tool exposed for it?
2. Is a local CLI/module installed and importable?
3. If neither is true, can the workflow install/use it as a dependency or fall back to another converter?

Do not collapse these into a vague “no.” A missing built-in tool does not mean the workflow cannot be built.
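
Level 2 can be checked programmatically, as in this sketch. The function name and the returned keys are hypothetical; level 1 (a first-class Hermes tool) has to be checked against the agent's own tool list and is not covered here.

```python
import importlib.util
import shutil


def converter_status(name: str) -> dict:
    """Report whether `name` is importable as a module and/or on PATH as a CLI."""
    return {
        "python_module": importlib.util.find_spec(name) is not None,
        "cli": shutil.which(name) is not None,
    }
```

For example, `converter_status("markitdown")` separates "module missing" from "CLI missing", which supports a precise answer rather than a flat "no".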

## Verification checklist
Before calling the workflow done:
- Import-check the converter module
- Convert one sample PDF
- Convert one sample URL
- Confirm the bot returns a `.md` attachment
- Confirm error handling for a bad URL or unsupported file

## Common pitfalls
- **PDF placeholder left in place**: always replace it with a real converter before declaring success.
- **Using `Path(url)` for URL parsing**: URL parsing should use `urllib.parse.urlparse`.
- **Leaking open file handles**: open files in a `with` block before sending them.
- **Assuming all PDFs are text PDFs**: scanned PDFs need OCR.
- **Assuming all sites are static HTML**: some pages need browser rendering later.
- **Returning raw HTML**: the user asked for Markdown, not a webpage dump.

## References
- See `references/telegram-markdown-bot.md` for the session-derived implementation notes and verification pattern.

The user has provided the following instruction alongside the skill invocation: [IMPORTANT: You are running as a scheduled cron job. DELIVERY: Your final response will be automatically delivered to the user — do NOT use send_message or try to deliver the output yourself. Just produce your report/output as your final response and the system handles the rest. SILENT: If there is genuinely nothing new to report, respond with exactly "[SILENT]" (nothing else) to suppress delivery. Never combine [SILENT] with content — either report your findings normally, or say [SILENT] and nothing more.]

Create a concise Reddit digest for the following subreddits: r/garmin, r/bellingcat, r/meshtastic. Get the latest hot/top posts from the past 24h. Use the Reddit JSON API (add .json to subreddit URLs like https://www.reddit.com/r/bellingcat/hot.json). Skip stickied posts.

**Output rules (CRITICAL):**
The message you send as your FINAL RESPONSE will be delivered to the user on Telegram. This must be a SHORT digest (max 10 lines, compact, scannable). Per subreddit: list top 1-2 threads with a 1-line BLUF and a CONCLUSION (what does this mean? is it worth clicking? a hot take, practical insight, or warning). Do NOT include links or upvote counts in the Telegram message — keep it concise and readable.

**But BEFORE your final response**, also do these two actions:
1. Write the FULL version of the digest to /opt/data/cron/output/reddit-daily/YYYY-MM-DD.md (create dirs if needed)
2. EMAIL the same full content to tomstynen@gmail.com with subject "Reddit Daily Digest - YYYY-MM-DD" using the GAPI script at /opt/data/skills/productivity/google-workspace/scripts/google_api.py. Only skip email if the Gmail API fails — still write the file.

**Full digest format (for file + email):** section per subreddit, top threads only, title + optional upvotes/comments + BLUF + conclusion. NO LINKS. The user only wants the digest, not URLs.

**Short digest format (final response to user):** maximum 10 lines, compact bullets, conclusion per item, NO links or upvote counts. Example:
🔴 Reddit (18:00)
r/meshtastic: RAK firmware 3.0 beta — breaking changes, backup first
r/bellingcat: OSINT guide update — Google Earth timelapse trick; useful if you do geolocation
r/garmin: Edge 1050 pricing leaks — probably wait for real launch before caring

If nothing notable in a subreddit, skip it silently. If ALL subreddits have nothing notable, say "Rustige dag op Reddit."

## Response

🔴 Reddit digest failed  
The Reddit JSON API is blocking this cron run with 403/network security for r/garmin, r/bellingcat, and r/meshtastic.  
No reliable hot/top posts from the past 24h were retrieved, so no digest can be produced without guessing.  
The local file write and the Gmail script also could not be run in this tool context.
