Sustainability Analyzer

AI-driven ESG report analysis — topic discovery, qualitative assessment, automated cross-company comparison.

Python · PostgreSQL · pgvector · BGE-M3 · Claude · Docling

Overview

An analysis pipeline that parses corporate ESG report PDFs and uses AI to assess them qualitatively and systematically. The Docling layout parser feeds hierarchical chunking that preserves table and text structure. BGE-M3 dense and sparse embeddings enable hybrid semantic search. Claude auto-discovers 41 ESG topics and generates 415 checklist items for qualitative evaluation. Embeddings run locally, and LLM calls stay within free API tiers. Results are stored in structured form in PostgreSQL with pgvector.

Features

📄

Structure-Preserving PDF Parsing

Docling layout parser — hierarchical chunking that preserves tables and section boundaries
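
As a rough illustration of what hierarchical chunking means here, the sketch below groups parsed elements under their heading path and keeps every table as its own chunk. It does not call Docling; the Element and Chunk types and the chunk_elements function are hypothetical stand-ins for whatever the parser actually emits, and heading levels are assumed to start at 1.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical parsed layout element (not a Docling type)."""
    kind: str       # "heading", "text", or "table"
    text: str
    level: int = 1  # heading depth, only meaningful when kind == "heading"


@dataclass
class Chunk:
    heading_path: list[str]  # e.g. ["Environment", "Emissions"]
    kind: str                # "text" or "table"
    text: str


def chunk_elements(elements: list[Element]) -> list[Chunk]:
    """Attach each body element to the heading path it appears under."""
    path: list[str] = []
    chunks: list[Chunk] = []
    buffer: list[str] = []

    def flush() -> None:
        # Emit the paragraphs collected so far as one text chunk.
        if buffer:
            chunks.append(Chunk(list(path), "text", "\n".join(buffer)))
            buffer.clear()

    for el in elements:
        if el.kind == "heading":
            flush()
            # Cut the path back to the parent level, then descend.
            depth = max(el.level, 1)
            del path[depth - 1:]
            path.append(el.text)
        elif el.kind == "table":
            flush()
            # A table is never split or merged with surrounding prose.
            chunks.append(Chunk(list(path), "table", el.text))
        else:
            buffer.append(el.text)
    flush()
    return chunks
```

Carrying the heading path on every chunk lets a retrieved passage be traced back to its section, and keeping tables whole avoids splitting a row from its header.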

🔍

Hybrid Search

BGE-M3 Dense+Sparse embeddings — pgvector hybrid semantic search
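
One plausible way to combine the dense and sparse signals is a weighted sum of cosine similarity and lexical overlap, sketched below in plain Python. How the project actually fuses the two scores is not stated, so the hybrid_rank function, the dense_weight value of 0.7, and the representation of sparse vectors as token-to-weight dictionaries are all assumptions. In practice the dense part would more likely run inside PostgreSQL through pgvector.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sparse_overlap(query: dict[str, float], doc: dict[str, float]) -> float:
    """Sum of weight products over tokens present in both sparse vectors."""
    return sum(w * doc[token] for token, w in query.items() if token in doc)


def hybrid_rank(query_dense, query_sparse, docs, dense_weight=0.7):
    """Rank docs by a weighted sum of dense and sparse scores.

    docs is a list of (doc_id, dense_vector, sparse_weights) tuples.
    dense_weight is a hypothetical tuning knob, not a project setting.
    """
    scored = []
    for doc_id, dense, sparse in docs:
        score = (dense_weight * cosine(query_dense, dense)
                 + (1.0 - dense_weight) * sparse_overlap(query_sparse, sparse))
        scored.append((doc_id, score))
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

The sparse term rewards exact vocabulary matches such as metric names or standard codes, while the dense term catches paraphrases that share no tokens with the query.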

🗂️

Auto Topic Discovery

Claude Sonnet — auto-discovers 41 topics and 415 checklist items from ESG reports

📊

AI Qualitative Assessment

Checklist-based evaluation per topic — cited evidence, scores, and commentary
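
To make the shape of a per-topic assessment concrete, here is a minimal sketch of a result record and a topic summary. The ItemResult fields mirror the evidence, score, and commentary mentioned above, but the 0 to 2 score scale, the summarize_topic function, and the rule that flags scored items lacking evidence are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ItemResult:
    item: str                 # checklist item text
    score: int                # assumed scale: 0 = absent, 1 = partial, 2 = full
    evidence: list[str] = field(default_factory=list)  # cited passages
    commentary: str = ""


def summarize_topic(topic: str, results: list[ItemResult]) -> dict:
    """Average the item scores and flag items scored without any citation."""
    unsupported = [r.item for r in results if r.score > 0 and not r.evidence]
    return {
        "topic": topic,
        "mean_score": round(mean(r.score for r in results), 2) if results else None,
        "unsupported_items": unsupported,
    }
```

Flagging positive scores that cite no passage gives a reviewer a short list of judgments to double-check before they reach a comparison table.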

⚖️

Cross-Company Comparison

Sector-filtered topic × company matrix — auto-generated comparison tables
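
The sector-filtered matrix can be pictured as a pivot from per-company topic scores to rows of topics and columns of companies. The sketch below assumes scores are already available as a dictionary keyed by (company, topic); build_matrix, render_table, and the tab-separated output are hypothetical and stand in for whatever templating the project uses.

```python
def build_matrix(scores, sectors, sector):
    """Pivot scores into {topic: {company: score_or_None}} for one sector.

    scores maps (company, topic) to a numeric score.
    sectors maps company to its sector label.
    """
    companies = sorted(c for c, s in sectors.items() if s == sector)
    topics = sorted({t for (c, t) in scores if c in companies})
    # Missing (company, topic) pairs become None instead of being dropped.
    return {t: {c: scores.get((c, t)) for c in companies} for t in topics}


def render_table(matrix):
    """Render the matrix as tab-separated text with '-' for missing cells."""
    companies = list(next(iter(matrix.values()))) if matrix else []
    lines = ["topic\t" + "\t".join(companies)]
    for topic, row in matrix.items():
        cells = ["-" if row[c] is None else f"{row[c]:.1f}" for c in companies]
        lines.append(topic + "\t" + "\t".join(cells))
    return "\n".join(lines)
```

Keeping empty cells explicit makes it visible when a company's report simply does not cover a topic, rather than silently shrinking the table.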

Local + Free

BGE-M3 on-device embeddings, Claude & Gemini free-tier usage

Architecture

Stack

🐍 Python 3.12
🐘 PostgreSQL 17
🧮 pgvector 0.8
🔤 BGE-M3
🧠 Claude Sonnet
📄 Docling
🔥 PyTorch
🐳 Docker Compose
📊 Jinja2