# pdf-to-kcf

A Python CLI tool that uses AI agents to parse PDF documents and extract structured insights. Built with `pydantic-ai`, this tool creates an intelligent agent that autonomously analyzes documents, requesting additional pages as needed to form complete insights.

## Features

- **Autonomous Document Analysis**: The AI agent decides how much of the document to read
- **Structured Insight Extraction**: Classifies content as facts, opinions, or comments
- **Rich Metadata**: Adds attributes such as source, confidence, and dates
- **Multiple AI Models**: Works with any model available through OpenRouter (Claude, GPT-4o, Gemini, Llama, etc.), or with direct provider keys such as OpenAI's or Anthropic's
- **JSON Output**: Exports insights in a structured, machine-readable format

## Installation

This project uses [uv](https://github.com/astral-sh/uv) for dependency management:

```bash
# Install dependencies
uv sync
```

## Setup

1. Copy the environment template:

   ```bash
   cp .env.example .env
   ```

2. Get an API key from [OpenRouter](https://openrouter.ai/) (free tier available).

3. Add your OpenRouter API key to `.env`:

   ```bash
   OPENROUTER_API_KEY=your_openrouter_api_key_here
   ```

## Usage

```bash
# Basic usage (uses OpenRouter with Claude 3.5 Sonnet by default)
uv run pdf-to-kcf document.pdf

# Specify custom output file
uv run pdf-to-kcf document.pdf -o insights.json

# Start from a specific page (0-indexed)
uv run pdf-to-kcf document.pdf -s 3

# Use a different AI model from OpenRouter
uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5
```

### Options

- `--output, -o`: Output JSON file path (default: `<pdf_name>_insights.json`)
- `--start-page, -s`: Starting page number, 0-indexed (default: 0)
- `--model, -m`: AI model to use via OpenRouter (default: `anthropic/claude-3.5-sonnet`)

### Available Models

When using OpenRouter, you can specify any model in the format `<provider>/<model-name>`:

- `anthropic/claude-3.5-sonnet` (default, recommended)
- `anthropic/claude-3-opus`
- `openai/gpt-4o`
- `meta-llama/llama-3.1-70b-instruct`
- `google/gemini-pro-1.5`
- See [OpenRouter models](https://openrouter.ai/models) for the full list

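Because model ids follow the `<provider>/<model-name>` pattern, a CLI could reject obvious typos before making any network call. A minimal sketch of such a check, using a hypothetical `split_model_id` helper that is not part of the documented tool, splits on the first slash:

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Hypothetical check: split '<provider>/<model-name>' into its two parts.

    Raises ValueError when either part is missing.
    """
    provider, sep, name = model_id.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"Expected '<provider>/<model-name>', got {model_id!r}")
    return provider, name
```

This only validates the shape of the id; whether a given model actually exists is for OpenRouter to decide.
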
## Output Format

The tool generates JSON files with structured insights:

```json
{
  "insights": [
    {
      "type": "fact",
      "insight": "Global temperatures have risen 1.1°C since pre-industrial times",
      "content": "According to the IPCC, global temperatures have risen approximately 1.1°C...",
      "attributes": [
        {"attribute": "source", "value": "IPCC Report"},
        {"attribute": "confidence", "value": "high"},
        {"attribute": "year", "value": "2023"}
      ]
    },
    {
      "type": "opinion",
      "insight": "The author believes immediate action is required",
      "content": "We must act now to prevent catastrophic consequences...",
      "attributes": [
        {"attribute": "sentiment", "value": "urgent"},
        {"attribute": "section", "value": "conclusion"}
      ]
    }
  ]
}
```

## How It Works

1. **PDF Loading**: Extracts text content from the PDF using pypdf
2. **Agent Initialization**: Creates a pydantic-ai agent with the specified model
3. **Autonomous Analysis**: The agent analyzes the content and can request additional pages
4. **Insight Extraction**: Classifies insights and structures them with metadata
5. **JSON Export**: Saves all insights to a JSON file

## Requirements

- Python 3.12 or higher
- OpenRouter API key (set as the `OPENROUTER_API_KEY` environment variable)
  - Get your free API key at [OpenRouter](https://openrouter.ai/)
  - Supports all major AI models (Claude, GPT-4, Gemini, Llama, etc.)
  - Alternatively, use `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, or other provider keys

## Architecture

The tool follows an agentic document-parsing design built from these core components:

- **models.py**: Data structures (`ContentInsight`, `PageContentAnalysis`, etc.)
- **pdf_reader.py**: PDF text extraction (`PDFDocument` class)
- **agent.py**: AI agent with autonomous page-reading capability
- **cli.py**: Command-line interface

See `CLAUDE.md` for detailed architecture documentation.

## License

MIT