pdf-to-kcf
A Python CLI tool that uses AI agents to parse PDF documents and extract structured insights. Built with pydantic-ai, this tool creates an intelligent agent that autonomously analyzes documents, requesting additional pages as needed to form complete insights.
Features
- Autonomous Document Analysis: AI agent decides how much of the document to read
- Structured Insight Extraction: Classifies content as facts, opinions, or comments
- Rich Metadata: Adds attributes like source, confidence, dates, and more
- Multiple AI Models: Works with models served through OpenRouter, as well as OpenAI and other compatible providers
- JSON Output: Exports insights in a structured, machine-readable format
Installation
This project uses uv for dependency management:
# Install dependencies
uv sync
Setup
- Copy the environment template:
cp .env.example .env
- Add your OpenRouter API key to .env:
OPENROUTER_API_KEY=your_openrouter_api_key_here
- Get your API key from OpenRouter (free tier available)
Usage
# Basic usage (uses OpenRouter with Claude 3.5 Sonnet by default)
uv run pdf-to-kcf document.pdf
# Specify custom output file
uv run pdf-to-kcf document.pdf -o insights.json
# Start from a specific page (0-indexed)
uv run pdf-to-kcf document.pdf -s 3
# Use a different AI model from OpenRouter
uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5
Options
- --output, -o: Output JSON file path (default: <pdf_name>_insights.json)
- --start-page, -s: Starting page number, 0-indexed (default: 0)
- --model, -m: AI model to use via OpenRouter (default: anthropic/claude-3.5-sonnet)
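To make the defaults concrete, here is a minimal sketch of how these options could be parsed. It uses the standard-library argparse, which is an assumption, since the real cli.py may use a different framework. The helper names build_parser and resolve_output are hypothetical, and the sketch assumes <pdf_name> means the PDF file name without its extension.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="pdf-to-kcf")
    parser.add_argument("pdf", type=Path, help="PDF document to analyze")
    parser.add_argument("-o", "--output", type=Path, default=None,
                        help="Output JSON file path")
    parser.add_argument("-s", "--start-page", type=int, default=0,
                        help="Starting page number, 0-indexed")
    parser.add_argument("-m", "--model", default="anthropic/claude-3.5-sonnet",
                        help="AI model to use via OpenRouter")
    return parser


def resolve_output(pdf: Path, output: Optional[Path]) -> Path:
    """Fall back to <pdf_name>_insights.json when no output path is given."""
    if output is not None:
        return output
    # Assumption: <pdf_name> is the file name without the .pdf suffix.
    return Path(f"{pdf.stem}_insights.json")


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve_output(args.pdf, args.output), args.start_page, args.model)
```

With these assumptions, running the sketch on document.pdf with no other flags would resolve the output path to document_insights.json.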
Available Models
When using OpenRouter, you can specify any model using the format <provider>/<model-name>:
- anthropic/claude-3.5-sonnet (default, recommended)
- anthropic/claude-3-opus
- openai/gpt-4o
- meta-llama/llama-3.1-70b-instruct
- google/gemini-pro-1.5
- See OpenRouter models for full list
Output Format
The tool generates JSON files with structured insights:
{
  "insights": [
    {
      "type": "fact",
      "insight": "Global temperatures have risen 1.1°C since pre-industrial times",
      "content": "According to the IPCC, global temperatures have risen approximately 1.1°C...",
      "attributes": [
        {"attribute": "source", "value": "IPCC Report"},
        {"attribute": "confidence", "value": "high"},
        {"attribute": "year", "value": "2023"}
      ]
    },
    {
      "type": "opinion",
      "insight": "The author believes immediate action is required",
      "content": "We must act now to prevent catastrophic consequences...",
      "attributes": [
        {"attribute": "sentiment", "value": "urgent"},
        {"attribute": "section", "value": "conclusion"}
      ]
    }
  ]
}
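Because the output is plain JSON, it can be consumed with the standard library alone. The sketch below groups insights by type and flattens each attribute list into a dictionary. The function name group_by_type is hypothetical, and the sample data is a shortened version of the example above rather than real tool output.

```python
import json
from collections import defaultdict


def group_by_type(document: dict) -> dict:
    """Group insights by their "type" and flatten attributes into a dict."""
    grouped = defaultdict(list)
    for item in document["insights"]:
        # Turn [{"attribute": k, "value": v}, ...] into {k: v, ...}
        attrs = {a["attribute"]: a["value"] for a in item.get("attributes", [])}
        grouped[item["type"]].append(
            {"insight": item["insight"], "attributes": attrs}
        )
    return dict(grouped)


# Shortened sample in the documented format (illustrative data only).
sample = json.loads("""
{
  "insights": [
    {"type": "fact",
     "insight": "Global temperatures have risen since pre-industrial times",
     "content": "According to the IPCC...",
     "attributes": [{"attribute": "source", "value": "IPCC Report"}]},
    {"type": "opinion",
     "insight": "The author believes immediate action is required",
     "content": "We must act now...",
     "attributes": [{"attribute": "sentiment", "value": "urgent"}]}
  ]
}
""")

grouped = group_by_type(sample)
for insight_type, items in grouped.items():
    print(insight_type, len(items))
```

To process a real result, replace the inline sample with json.load on the file the tool wrote, for example insights.json.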
How It Works
- PDF Loading: Extracts text content from PDF using pypdf
- Agent Initialization: Creates a pydantic-ai agent with the specified model
- Autonomous Analysis: Agent analyzes content and can request additional pages
- Insight Extraction: Classifies and structures insights with metadata
- JSON Export: Saves all insights to a JSON file
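The steps above can be sketched as a simple loop. This is an illustration under stated assumptions, not the tool's actual agent code: page text is passed in as a list of strings (in the real tool it comes from pypdf), and the model call is replaced by a plain callable. The names PageResult, analyze_document, and stub_analyzer are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PageResult:
    """Hypothetical result of analyzing one page."""
    insights: List[str] = field(default_factory=list)
    needs_next_page: bool = False  # the agent asks for more context


def analyze_document(
    pages: List[str],
    analyze_page: Callable[[str], PageResult],
    start_page: int = 0,
) -> List[str]:
    """Read pages from start_page until the analyzer stops asking for more."""
    insights: List[str] = []
    index = start_page
    while index < len(pages):
        result = analyze_page(pages[index])
        insights.extend(result.insights)
        if not result.needs_next_page:
            break  # the analyzer decided it has read enough
        index += 1
    return insights


def stub_analyzer(text: str) -> PageResult:
    """Stand-in for the model: request another page if the text trails off."""
    return PageResult(insights=[text], needs_next_page=text.endswith("..."))


pages = ["Intro continues...", "Key finding.", "Unread appendix."]
print(analyze_document(pages, stub_analyzer))
```

In this sketch the stub stops after the second page, so the third page is never read, which mirrors the idea that the agent decides how much of the document it needs.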
Requirements
- Python 3.12 or higher
- OpenRouter API key (set as the OPENROUTER_API_KEY environment variable)
  - Get your free API key at OpenRouter
  - Supports all major AI models (Claude, GPT-4, Gemini, Llama, etc.)
  - Alternatively, use OPENAI_API_KEY, ANTHROPIC_API_KEY, or other provider keys
Architecture
The tool follows the agentic document parsing format with these core components:
- models.py: Data structures (ContentInsight, PageContentAnalysis, etc.)
- pdf_reader.py: PDF text extraction (PDFDocument class)
- agent.py: AI agent with autonomous page reading capability
- cli.py: Command-line interface
See CLAUDE.md for detailed architecture documentation.
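As a rough picture of what models.py might hold, the sketch below models a single insight so that it serializes to the documented JSON shape. The field names are inferred from the output example, InsightAttribute and to_json are hypothetical names, and standard-library dataclasses are used only to keep the example self-contained. The actual project, being built on pydantic-ai, may well define these as pydantic models with different fields.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class InsightAttribute:
    """One metadata entry; hypothetical name, shape taken from the JSON example."""
    attribute: str
    value: str


@dataclass
class ContentInsight:
    """One extracted insight; fields inferred from the documented output."""
    type: str      # "fact", "opinion", or "comment"
    insight: str   # short summary of the insight
    content: str   # supporting text from the document
    attributes: List[InsightAttribute] = field(default_factory=list)


def to_json(insights: List[ContentInsight]) -> str:
    """Serialize insights into the {"insights": [...]} envelope."""
    payload = {"insights": [asdict(item) for item in insights]}
    return json.dumps(payload, indent=2, ensure_ascii=False)


example = ContentInsight(
    type="opinion",
    insight="The author believes immediate action is required",
    content="We must act now to prevent catastrophic consequences...",
    attributes=[InsightAttribute("sentiment", "urgent")],
)
print(to_json([example]))
```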
License
MIT