Content Collector
Autonomous local pipeline that turns noisy tech streams into a daily, decision-ready brief.

Overview
Content Collector is an automated pipeline that pulls tech content from multiple sources, including GitHub, Reddit, and Threads, and processes it with local AI models. Each item is embedded, classified, clustered, and summarized, turning raw content into a daily editorial digest. The pipeline runs on local hardware, serving models through Ollama and calling free-tier APIs where needed. Structured data is stored in PostgreSQL with pgvector, which also provides built-in semantic search over the collected content.
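The overview names clustering as a stage without specifying the method. As a minimal sketch, assuming each item already has an embedding vector, a single-pass threshold clustering on cosine similarity can group stories about the same topic. The function names and the 0.8 threshold are illustrative choices, not the project's actual settings.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_cluster(embeddings: List[Sequence[float]], threshold: float = 0.8) -> List[List[int]]:
    """Single-pass clustering (hypothetical helper; threshold is illustrative).

    Each item joins the first cluster whose seed (first member) is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns clusters as lists of item indices, in input order.
    """
    clusters: List[List[int]] = []
    for i, vec in enumerate(embeddings):
        for cluster in clusters:
            seed = embeddings[cluster[0]]
            if cosine_similarity(vec, seed) >= threshold:
                cluster.append(i)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([i])
    return clusters
```

Comparing against each cluster's first member keeps the pass simple and order-dependent; a centroid-based variant would be more stable at the cost of maintaining running averages.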
Features
Multi-Source Ingestion
RSS, Reddit, Threads — pulled and normalized into one pipeline
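One way to picture the normalization step is a shared record type with one converter per source. This is a sketch under assumptions: the `Item` fields, the converter names, and the raw field names (loosely modeled on Reddit's listing JSON, RSS feed entries, and Threads posts) are hypothetical, and timestamps are assumed to be Unix seconds for Reddit and ISO 8601 strings with a colon in the UTC offset elsewhere.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Item:
    """Source-independent record that downstream stages consume (hypothetical fields)."""
    source: str
    external_id: str
    title: str
    url: str
    body: str
    published_at: datetime


def from_reddit(post: Dict[str, Any]) -> Item:
    # Assumed input: the "data" dict of one post in a Reddit listing.
    return Item(
        source="reddit",
        external_id=post["id"],
        title=post["title"],
        url="https://www.reddit.com" + post["permalink"],
        body=post.get("selftext", ""),
        published_at=datetime.fromtimestamp(post["created_utc"], tz=timezone.utc),
    )


def from_rss(entry: Dict[str, Any]) -> Item:
    # Assumed input: a feed entry whose "published" is an ISO 8601 string.
    return Item(
        source="rss",
        external_id=entry.get("id", entry["link"]),
        title=entry["title"],
        url=entry["link"],
        body=entry.get("summary", ""),
        published_at=datetime.fromisoformat(entry["published"]),
    )


def from_threads(post: Dict[str, Any]) -> Item:
    # Assumed input: a post dict with "id", "text", "permalink", "timestamp".
    text = post.get("text", "")
    return Item(
        source="threads",
        external_id=post["id"],
        title=text[:80],  # posts have no title, so reuse the opening text
        url=post["permalink"],
        body=text,
        published_at=datetime.fromisoformat(post["timestamp"]),
    )


NORMALIZERS: Dict[str, Callable[[Dict[str, Any]], Item]] = {
    "reddit": from_reddit,
    "rss": from_rss,
    "threads": from_threads,
}


def normalize(source: str, raw_items: List[Dict[str, Any]]) -> List[Item]:
    """Convert raw payloads from one source into Items."""
    convert = NORMALIZERS[source]
    return [convert(raw) for raw in raw_items]


def dedupe_by_url(items: List[Item]) -> List[Item]:
    """Keep the first Item seen for each URL, preserving order."""
    seen = set()
    unique = []
    for item in items:
        if item.url not in seen:
            seen.add(item.url)
            unique.append(item)
    return unique
```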
Local AI Triage
Qwen3 14B on-device — relevance scoring, topic labeling, priority ranking
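The triage step could be wired to a local Ollama server through its `/api/generate` endpoint. The sketch below assumes the `qwen3:14b` model tag, a made-up JSON output schema (`relevance`, `topic`, `priority`), and hypothetical helper names. It keeps request building and response parsing separate from the HTTP call so both can be checked without a running server.

```python
import json
import re
import urllib.request
from typing import Any, Dict

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen3:14b"  # assumed model tag

# Hypothetical prompt and output schema.
PROMPT_TEMPLATE = """You are triaging tech news for a daily brief.
Return JSON with keys: "relevance" (integer 0-10), "topic" (short label),
"priority" ("high", "medium" or "low").

Title: {title}
Text: {body}
"""


def build_triage_request(title: str, body: str) -> Dict[str, Any]:
    return {
        "model": MODEL,
        "prompt": PROMPT_TEMPLATE.format(title=title, body=body[:2000]),
        "stream": False,   # return one JSON object instead of a token stream
        "format": "json",  # ask Ollama to constrain the output to valid JSON
    }


def parse_triage_response(ollama_reply: Dict[str, Any]) -> Dict[str, Any]:
    """Extract and sanitize the model's JSON from Ollama's "response" field."""
    text = ollama_reply["response"]
    # Defensive: drop a <think>...</think> block if the model emits one.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    result = json.loads(text)
    relevance = max(0, min(10, int(result.get("relevance", 0))))  # clamp to 0-10
    priority = result.get("priority", "low")
    if priority not in ("high", "medium", "low"):
        priority = "low"  # treat unexpected labels as low priority
    return {"relevance": relevance, "topic": str(result.get("topic", "")), "priority": priority}


def triage(title: str, body: str) -> Dict[str, Any]:
    """Send one item to the local model and return the parsed triage result."""
    payload = json.dumps(build_triage_request(title, body)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return parse_triage_response(json.load(resp))
```

Clamping and label validation keep a malformed model answer from pushing an item into the high-priority path by accident.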
Semantic Vector Index
BGE-M3 embeddings in pgvector — fast similarity search and topic-level recall
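A possible table layout and similarity query for the vector index, assuming the `psycopg` driver's `%s` placeholders and a hypothetical `items` table. BGE-M3's dense embeddings have 1024 dimensions, and pgvector's `<=>` operator returns cosine distance, so ordering by it ascending puts the most similar items first. The Python helpers are illustrative.

```python
import math
from typing import Sequence, Tuple

EMBEDDING_DIM = 1024  # size of BGE-M3 dense embeddings

# Hypothetical schema: table, column, and index names are illustrative.
SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS items (
    id        bigserial PRIMARY KEY,
    title     text NOT NULL,
    embedding vector({EMBEDDING_DIM})
);
CREATE INDEX IF NOT EXISTS items_embedding_idx
    ON items USING hnsw (embedding vector_cosine_ops);
"""

# <=> is pgvector's cosine distance operator; smaller means more similar.
SEARCH_SQL = """
SELECT id, title, embedding <=> %s::vector AS distance
FROM items
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Render a vector in pgvector's text input format, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def search_params(query_vec: Sequence[float], k: int = 10) -> Tuple[str, str, int]:
    """Parameters for SEARCH_SQL: the query vector twice, then the row limit."""
    literal = to_pgvector_literal(query_vec)
    return (literal, literal, k)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Pure-Python equivalent of <=> (1 - cosine similarity); assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm
```

With a psycopg cursor, a top-5 lookup might read `cur.execute(SEARCH_SQL, search_params(query_vec, k=5))`. The `vector_cosine_ops` index matches the `<=>` operator used in the query; HNSW indexes require pgvector 0.5.0 or later.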
Daily Digest
Gemini-generated editorial summaries — LINE message + clean HTML report
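Both digest outputs could be produced from the same list of summarized entries. As a sketch, assuming the Gemini summaries are already available as dicts with hypothetical `title`, `summary`, and `url` keys, the code below builds a LINE Messaging API push payload and a minimal HTML report. The Gemini call itself is omitted, and the message layout and helper names are illustrative.

```python
import html
import json
import urllib.request
from typing import Dict, List

LINE_PUSH_URL = "https://api.line.me/v2/bot/message/push"
LINE_TEXT_LIMIT = 5000  # LINE's maximum length for a text message


def build_line_payload(user_id: str, entries: List[Dict[str, str]]) -> Dict:
    """Numbered plain-text digest, truncated to fit one LINE text message."""
    lines = [f"{i}. {e['title']}: {e['summary']}" for i, e in enumerate(entries, 1)]
    text = "Daily brief\n" + "\n".join(lines)
    if len(text) > LINE_TEXT_LIMIT:
        text = text[: LINE_TEXT_LIMIT - 1] + "…"
    return {"to": user_id, "messages": [{"type": "text", "text": text}]}


def send_line_push(channel_token: str, payload: Dict) -> int:
    """POST the payload to the Messaging API; returns the HTTP status code."""
    req = urllib.request.Request(
        LINE_PUSH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {channel_token}",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


def render_html(date: str, entries: List[Dict[str, str]]) -> str:
    """Minimal standalone HTML report; all entry text is HTML-escaped."""
    items = "\n".join(
        f'<li><a href="{html.escape(e["url"])}">{html.escape(e["title"])}</a>'
        f"<p>{html.escape(e['summary'])}</p></li>"
        for e in entries
    )
    title = f"Daily brief {html.escape(date)}"
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        f"<body><h1>{title}</h1>\n<ol>\n{items}\n</ol>\n</body></html>\n"
    )
```

Escaping every field matters here because titles and summaries originate from scraped, untrusted content.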
Zero Monthly Cost
Local Mac Mini, open models, free-tier APIs — no paid infrastructure
Real-Time Alerts
Immediate notifications on high-priority signals — not hours later
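Immediate alerts need a gate between triage and delivery so that only strong signals interrupt the reader, and each one only once. A minimal sketch, assuming triage results shaped like `{"priority": ..., "relevance": ...}`, a hypothetical `min_relevance` cutoff of 8, and an injected `send` callback (for example, a function that pushes a LINE message):

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Set


@dataclass
class AlertGate:
    """Forwards high-priority items immediately, at most once per item id."""
    send: Callable[[str], None]  # delivery callback, e.g. a LINE push wrapper
    min_relevance: int = 8       # hypothetical cutoff on a 0-10 scale
    sent_ids: Set[str] = field(default_factory=set)

    def consider(self, item_id: str, title: str, triage: Dict[str, Any]) -> bool:
        """Return True if an alert was sent for this item."""
        if item_id in self.sent_ids:
            return False  # already alerted in this process
        if triage.get("priority") != "high":
            return False
        if triage.get("relevance", 0) < self.min_relevance:
            return False
        self.send(f"[High priority] {title}")
        self.sent_ids.add(item_id)
        return True
```

Calling the gate inside the ingestion loop, rather than in the daily batch, is what lets an alert go out as soon as an item is triaged. Since `sent_ids` lives in memory, suppressing duplicates across restarts would require persisting it, for example in PostgreSQL.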