neutrino2211 b847133df2 Init
2025-12-19 20:41:08 +01:00

# pdf-to-kcf
A Python CLI tool that uses AI agents to parse PDF documents and extract structured insights. Built with `pydantic-ai`, this tool creates an intelligent agent that autonomously analyzes documents, requesting additional pages as needed to form complete insights.
## Features
- **Autonomous Document Analysis**: AI agent decides how much of the document to read
- **Structured Insight Extraction**: Classifies content as facts, opinions, or comments
- **Rich Metadata**: Adds attributes like source, confidence, dates, and more
- **Multiple AI Models**: Works with any model available through OpenRouter (Claude, GPT-4o, Gemini, Llama, and others)
- **JSON Output**: Exports insights in a structured, machine-readable format
## Installation
This project uses [uv](https://github.com/astral-sh/uv) for dependency management:
```bash
# Install dependencies
uv sync
```
## Setup
1. Copy the environment template:
```bash
cp .env.example .env
```
2. Get an API key from [OpenRouter](https://openrouter.ai/) (free tier available).
3. Add the key to `.env`:
```bash
OPENROUTER_API_KEY=your_openrouter_api_key_here
```
## Usage
```bash
# Basic usage (uses OpenRouter with Claude 3.5 Sonnet by default)
uv run pdf-to-kcf document.pdf
# Specify custom output file
uv run pdf-to-kcf document.pdf -o insights.json
# Start from a specific page (0-indexed)
uv run pdf-to-kcf document.pdf -s 3
# Use a different AI model from OpenRouter
uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5
```
### Options
- `--output, -o`: Output JSON file path (default: `<pdf_name>_insights.json`)
- `--start-page, -s`: Starting page number, 0-indexed (default: 0)
- `--model, -m`: AI model to use via OpenRouter (default: `anthropic/claude-3.5-sonnet`)
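
To make the defaults concrete, here is a minimal sketch of a parser that accepts the same options. It uses the standard-library `argparse` as an assumption; the project's `cli.py` may be built on a different library, and `build_parser` and `resolve_output` are hypothetical names. Placing the default output file next to the input PDF is also an assumption.

```python
import argparse
from pathlib import Path
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented options; the real cli.py may differ.
    parser = argparse.ArgumentParser(prog="pdf-to-kcf")
    parser.add_argument("pdf", type=Path, help="PDF document to analyze")
    parser.add_argument("-o", "--output", type=Path, default=None,
                        help="Output JSON file path")
    parser.add_argument("-s", "--start-page", type=int, default=0,
                        help="Starting page number, 0-indexed")
    parser.add_argument("-m", "--model", default="anthropic/claude-3.5-sonnet",
                        help="AI model to use via OpenRouter")
    return parser


def resolve_output(pdf: Path, output: Optional[Path]) -> Path:
    # Fall back to <pdf_name>_insights.json when -o is not given.
    # Assumption: the default file is written next to the input PDF.
    if output is not None:
        return output
    return pdf.with_name(f"{pdf.stem}_insights.json")


args = build_parser().parse_args(["document.pdf", "-s", "3"])
print(resolve_output(args.pdf, args.output), args.start_page, args.model)
```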
### Available Models
When using OpenRouter, you can specify any model using the format `<provider>/<model-name>`:
- `anthropic/claude-3.5-sonnet` (default, recommended)
- `anthropic/claude-3-opus`
- `openai/gpt-4o`
- `meta-llama/llama-3.1-70b-instruct`
- `google/gemini-pro-1.5`
- See [OpenRouter models](https://openrouter.ai/models) for full list
## Output Format
The tool generates JSON files with structured insights:
```json
{
  "insights": [
    {
      "type": "fact",
      "insight": "Global temperatures have risen 1.1°C since pre-industrial times",
      "content": "According to the IPCC, global temperatures have risen approximately 1.1°C...",
      "attributes": [
        {"attribute": "source", "value": "IPCC Report"},
        {"attribute": "confidence", "value": "high"},
        {"attribute": "year", "value": "2023"}
      ]
    },
    {
      "type": "opinion",
      "insight": "The author believes immediate action is required",
      "content": "We must act now to prevent catastrophic consequences...",
      "attributes": [
        {"attribute": "sentiment", "value": "urgent"},
        {"attribute": "section", "value": "conclusion"}
      ]
    }
  ]
}
```
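
Because the output is plain JSON, downstream scripts can consume it with the standard library alone. The sketch below tallies insights by type and flattens each `attributes` list into a dict. `count_by_type` and `attributes_of` are illustrative helper names, not part of the tool, and the inline `SAMPLE` is a trimmed stand-in for a real output file.

```python
import json
from collections import Counter

# Trimmed stand-in for a real output file; in practice, read the file with json.load().
SAMPLE = """
{
  "insights": [
    {"type": "fact",
     "insight": "Global temperatures have risen 1.1°C since pre-industrial times",
     "content": "According to the IPCC...",
     "attributes": [{"attribute": "source", "value": "IPCC Report"},
                    {"attribute": "confidence", "value": "high"}]},
    {"type": "opinion",
     "insight": "The author believes immediate action is required",
     "content": "We must act now...",
     "attributes": [{"attribute": "sentiment", "value": "urgent"}]}
  ]
}
"""


def count_by_type(data: dict) -> Counter:
    """Tally insights by their "type" field."""
    return Counter(item["type"] for item in data["insights"])


def attributes_of(insight: dict) -> dict:
    """Flatten the list of {"attribute", "value"} pairs into a plain dict."""
    return {pair["attribute"]: pair["value"] for pair in insight["attributes"]}


data = json.loads(SAMPLE)
print(count_by_type(data))
for insight in data["insights"]:
    print(insight["type"], "|", insight["insight"], "|", attributes_of(insight))
```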
## How It Works
1. **PDF Loading**: Extracts text content from PDF using pypdf
2. **Agent Initialization**: Creates a pydantic-ai agent with the specified model
3. **Autonomous Analysis**: Agent analyzes content and can request additional pages
4. **Insight Extraction**: Classifies and structures insights with metadata
5. **JSON Export**: Saves all insights to a JSON file
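
The control flow behind step 3 can be illustrated without any AI dependency. The sketch below is not the tool's implementation: it replaces the pydantic-ai agent with a plain callable (`fake_model`), and `StepResult` and `analyze` are hypothetical names. It only shows the idea of starting at a given page and pulling in further pages while the model reports that it needs more context. How the real agent decides when to stop is not covered here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    # Hypothetical return shape for one analysis step (not the tool's real model).
    insights: list[str] = field(default_factory=list)
    needs_next_page: bool = False


def analyze(pages: list[str], step: Callable[[str], StepResult],
            start_page: int = 0) -> list[str]:
    """Feed pages to `step` one at a time until it stops asking for more."""
    collected: list[str] = []
    context = ""
    index = start_page
    while index < len(pages):
        context += pages[index]          # grow the context with the requested page
        result = step(context)
        collected.extend(result.insights)
        if not result.needs_next_page:   # the "agent" is satisfied; stop reading
            break
        index += 1
    return collected


def fake_model(text: str) -> StepResult:
    # Stand-in for the LLM call: ask for another page while the text ends mid-sentence.
    if not text.rstrip().endswith("."):
        return StepResult(needs_next_page=True)
    return StepResult(insights=[text.strip()])


pages = ["Global temperatures have risen ", "since pre-industrial times.", "Unread page."]
print(analyze(pages, fake_model))
```

In this toy run the first page ends mid-sentence, so the stand-in requests the second page, emits one insight, and never reads the third.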
## Requirements
- Python 3.12 or higher
- OpenRouter API key (set as the `OPENROUTER_API_KEY` environment variable)
  - Get a free API key at [OpenRouter](https://openrouter.ai/)
  - Gives access to all major AI models (Claude, GPT-4, Gemini, Llama, etc.)
- Alternatively, use `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or other provider keys
## Architecture
The tool follows the agentic document parsing format with these core components:
- **models.py**: Data structures (ContentInsight, PageContentAnalysis, etc.)
- **pdf_reader.py**: PDF text extraction (PDFDocument class)
- **agent.py**: AI agent with autonomous page reading capability
- **cli.py**: Command-line interface
See `CLAUDE.md` for detailed architecture documentation.
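
As a rough picture of what `models.py` holds, the sketch below models one insight so that it serializes to the documented output shape. Only the class name `ContentInsight` comes from this README. The field names are inferred from the JSON output, `InsightAttribute` and `to_json` are hypothetical names, and standard-library dataclasses are used here to stay dependency-free, whereas the project itself likely defines these as Pydantic models.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class InsightAttribute:
    # Hypothetical name; mirrors one {"attribute", "value"} pair in the output.
    attribute: str
    value: str


@dataclass
class ContentInsight:
    # Class name from models.py; the fields are inferred from the JSON output.
    type: str  # "fact", "opinion", or "comment"
    insight: str
    content: str
    attributes: list[InsightAttribute] = field(default_factory=list)


def to_json(insights: list[ContentInsight]) -> str:
    """Serialize insights into the {"insights": [...]} shape the tool writes."""
    return json.dumps({"insights": [asdict(i) for i in insights]}, indent=2)


example = ContentInsight(
    type="opinion",
    insight="The author believes immediate action is required",
    content="We must act now to prevent catastrophic consequences...",
    attributes=[InsightAttribute("sentiment", "urgent")],
)
print(to_json([example]))
```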
## License
MIT