Content Collector

An autonomous local pipeline that turns noisy tech streams into a daily, decision-ready brief.

Python · PostgreSQL · Ollama · Gemini · LINE

Overview

An automated content pipeline with local AI processing. It collects tech content from multiple sources (GitHub, Reddit, Threads) and runs it through a local AI pipeline: embedding, classification, clustering, and summarization. Raw content goes in; a daily editorial digest comes out. Execution is local, built on Ollama plus free APIs. Structured data lives in PostgreSQL with pgvector, so semantic search is built in.

Features

📡

Multi-Source Ingestion

RSS, Reddit, Threads — pulled and normalized into one pipeline
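
As a rough illustration of "normalized into one pipeline", the sketch below maps two source payloads onto one shared record. The `Item` schema and the `from_reddit` / `from_rss` helpers are hypothetical names, not the project's actual code. The Reddit field names (`permalink`, `title`, `created_utc`) follow Reddit's listing JSON; the RSS entry is assumed to be pre-parsed into a dict with an ISO 8601 `published` string.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Item:
    """Common shape every source is normalized into (hypothetical schema)."""
    source: str
    url: str
    title: str
    published_at: datetime


def from_reddit(post: dict) -> Item:
    # Reddit exposes `created_utc` as a Unix timestamp and `permalink`
    # as a site-relative path.
    return Item(
        source="reddit",
        url="https://www.reddit.com" + post["permalink"],
        title=post["title"].strip(),
        published_at=datetime.fromtimestamp(post["created_utc"], tz=timezone.utc),
    )


def from_rss(entry: dict) -> Item:
    # Assumes the feed entry was already parsed into a dict with
    # `link`, `title`, and an ISO 8601 `published` string.
    return Item(
        source="rss",
        url=entry["link"],
        title=entry["title"].strip(),
        published_at=datetime.fromisoformat(entry["published"]),
    )
```

Downstream stages then only need to understand `Item`, so adding a source means writing one more adapter function.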

🤖

Local AI Triage

Qwen3 14B on-device — relevance scoring, topic labeling, priority ranking
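
One way the triage step could talk to the local model is through Ollama's HTTP `generate` endpoint. This is a minimal sketch, not the project's implementation: the prompt wording, the JSON keys (`relevance`, `topic`, `priority`), their ranges, and the `qwen3:14b` model tag are all assumptions.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen3:14b"  # assumed model tag


def build_request(title: str, body: str) -> dict:
    """Build a non-streaming generate request that asks for JSON output."""
    prompt = (
        "Rate this item for a tech digest. Reply with JSON containing "
        '"relevance" (0 to 1), "topic" (string), "priority" (1 to 5).\n\n'
        f"Title: {title}\n\n{body}"
    )
    return {"model": MODEL, "prompt": prompt, "stream": False, "format": "json"}


def parse_triage(response_text: str) -> dict:
    """Parse the model's JSON reply and clamp values into the assumed ranges."""
    data = json.loads(response_text)
    return {
        "relevance": min(1.0, max(0.0, float(data["relevance"]))),
        "topic": str(data["topic"]),
        "priority": min(5, max(1, int(data["priority"]))),
    }


def triage(title: str, body: str) -> dict:
    """Send one item to the local model (requires a running Ollama server)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(title, body)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        # The generated text is returned in the "response" field.
        return parse_triage(json.load(resp)["response"])
```

Keeping `build_request` and `parse_triage` separate from the network call means the prompt and the validation logic can be checked without a running model.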

🧲

Semantic Vector Index

BGE-M3 embeddings in pgvector — fast similarity search and topic-level recall
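
To make "similarity search" concrete, here is a hedged sketch of a nearest-neighbor lookup. The `items` table and its columns are hypothetical; `<=>` is pgvector's cosine-distance operator, and 1024 is the dense embedding size of BGE-M3. The pure-Python `nearest` function mirrors what the SQL computes, so the idea can be tried without a database.

```python
import math

# Hypothetical table: items(id, title, embedding vector(1024)).
# `<=>` is pgvector's cosine-distance operator; smaller means more similar.
NEAREST_SQL = """
SELECT id, title
FROM items
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_pgvector(vec: list[float]) -> str:
    """Format a Python list as a pgvector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 minus cosine similarity, the quantity `<=>` orders by."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def nearest(query: list[float], rows: list[tuple[str, list[float]]], k: int) -> list[str]:
    """In-memory equivalent of NEAREST_SQL: ids of the k closest rows."""
    ranked = sorted(rows, key=lambda row: cosine_distance(query, row[1]))
    return [row_id for row_id, _ in ranked[:k]]
```

With a PostgreSQL driver such as psycopg, the query would be run by passing `{"query": to_pgvector(vec), "k": 5}` as parameters to `NEAREST_SQL`.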

📊

Daily Digest

Gemini-generated editorial summaries — LINE message + clean HTML report

Zero Monthly Cost

Local Mac Mini, open models, free-tier APIs — no paid infrastructure

🔔

Real-Time Alerts

Immediate notifications on high-priority signals — not hours later

Architecture

Stack

🐍Python 3.12
🐘PostgreSQL 17
🧮pgvector
🦙Ollama
🤖Qwen3 14B
🔤BGE-M3
Gemini 2.5 Flash
📱LINE API
🤖Telegram Bot API
🐳Docker Compose
LaunchAgent
🎭Playwright