# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

`pdf-to-kcf` is a Python CLI tool that parses PDF documents and extracts structured insights with an AI agent. The agent is built with `pydantic-ai` and decides on its own how much of a document to analyze, requesting additional pages as needed.

## Commands

### Development Setup
```bash
# Install dependencies
uv sync

# Set up OpenRouter API key
cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY

# Run the CLI tool
uv run pdf-to-kcf <pdf-path>

# Run with options
uv run pdf-to-kcf <pdf-path> --output custom_output.json --start-page 2 --model anthropic/claude-3-opus
```

### Running the Tool
```bash
# Basic usage (uses Claude 3.5 Sonnet via OpenRouter by default)
uv run pdf-to-kcf document.pdf

# Specify custom output file
uv run pdf-to-kcf document.pdf -o insights.json

# Start from a specific page (0-indexed)
uv run pdf-to-kcf document.pdf -s 3

# Use a different AI model from OpenRouter
uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5
uv run pdf-to-kcf document.pdf -m openai/gpt-4o
```

## Architecture

### Core Components

**models.py** - Data structures following the agentic document parsing format specification:
- `ContentInsightType`: Enum for insight classification (FACT, OPINION, COMMENT)
- `ContentInsightAttribute`: Key-value metadata for insights
- `ContentInsight`: A single extracted insight with a type, a summary (`insight`), the source text (`content`), and attributes
- `PageContentAnalysis`: Agent output containing all insights
- `PageContent`: Context passed to the agent (page number, content, total pages)
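A rough sketch of how these structures relate to each other, written with standard-library dataclasses so it runs without dependencies. The real `models.py` presumably uses pydantic models, and the field names on `PageContent` are assumptions; the other field names and the lowercase enum values follow the JSON example in the Output Format section.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum


class ContentInsightType(str, Enum):
    """Insight classification; values mirror the lowercase strings in the JSON output."""
    FACT = "fact"
    OPINION = "opinion"
    COMMENT = "comment"


@dataclass
class ContentInsightAttribute:
    attribute: str  # metadata key, e.g. "source"
    value: str      # metadata value, e.g. "Page 1"


@dataclass
class ContentInsight:
    type: ContentInsightType
    insight: str   # summary of the insight
    content: str   # original text that was analyzed
    attributes: list[ContentInsightAttribute] = field(default_factory=list)


@dataclass
class PageContentAnalysis:
    """Agent output: every insight extracted during one run."""
    insights: list[ContentInsight] = field(default_factory=list)


@dataclass
class PageContent:
    """Context handed to the agent (field names are assumed, not from the source)."""
    page_number: int
    content: str
    total_pages: int


analysis = PageContentAnalysis(
    insights=[
        ContentInsight(
            type=ContentInsightType.FACT,
            insight="Summary of the insight",
            content="Original text that was analyzed",
            attributes=[ContentInsightAttribute("source", "Page 1")],
        )
    ]
)
as_plain_dict = asdict(analysis)  # nested dicts and lists, ready for json.dumps
```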

**pdf_reader.py** - PDF document handling:
- `PDFDocument`: Wrapper class for reading PDF files using pypdf
- Provides `get_page_text()` for single page extraction
- Provides `get_all_pages()` for full document extraction

**agent.py** - AI agent implementation:
- `DocumentAnalyzer`: Main analyzer using pydantic-ai Agent
- Configures the AI model and system prompt
- Implements `read_page` tool that allows the agent to request additional pages autonomously
- The agent decides when to fetch more pages based on context needs
- Agent is instructed to classify insights as facts, opinions, or comments with relevant attributes
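The core of the `read_page` tool can be illustrated in plain Python. This is only a sketch of the logic: in `agent.py` the function is registered as a pydantic-ai tool, and the `DocumentDeps` container, the parameter names, and the choice to return a message for an invalid page (rather than raise) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DocumentDeps:
    """Hypothetical dependency container: the extracted text of each page."""
    pages: list[str]


def read_page(deps: DocumentDeps, page_number: int) -> str:
    """Return the text of one page (0-indexed), or a message the agent can act on."""
    total = len(deps.pages)
    if total == 0:
        return "The document has no pages."
    if not 0 <= page_number < total:
        # A readable message lets the model correct itself and pick a valid page.
        return f"Page {page_number} is out of range; valid pages are 0 to {total - 1}."
    return deps.pages[page_number]
```

Returning a message instead of raising keeps a bad page request from aborting the whole run; whether the project handles it this way is not stated in this file.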

**cli.py** - Command-line interface:
- Built with Click framework
- Handles PDF loading, analysis orchestration, and JSON output
- Provides user feedback during processing
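The option handling shown in the Commands section can be sketched as follows. The example uses `argparse` from the standard library as a stand-in so that it runs without extra dependencies; the project itself uses Click. The `build_parser` name and the default values (`0` for the start page, `anthropic/claude-3.5-sonnet` for the model, no default output path) are assumptions, not taken from `cli.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented flags: -o/--output, -s/--start-page, -m/--model."""
    parser = argparse.ArgumentParser(prog="pdf-to-kcf")
    parser.add_argument("pdf_path", help="Path to the PDF to analyze")
    parser.add_argument("-o", "--output", default=None,
                        help="Where to write the JSON insights (default is assumed)")
    parser.add_argument("-s", "--start-page", type=int, default=0,
                        help="0-indexed page to start from (default is assumed)")
    parser.add_argument("-m", "--model", default="anthropic/claude-3.5-sonnet",
                        help="OpenRouter model ID in <provider>/<model-name> form")
    return parser


args = build_parser().parse_args(["document.pdf", "-s", "3", "-m", "openai/gpt-4o"])
```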

### Agentic Behavior

The AI agent is autonomous and can:
1. Start analyzing from an initial page
2. Determine if more context is needed from other pages
3. Use the `read_page` tool to fetch additional pages
4. Extract structured insights with proper classification
5. Return all insights in the specified JSON format

### Output Format

The tool outputs JSON files with the following structure:
```json
{
  "insights": [
    {
      "type": "fact",
      "insight": "Summary of the insight",
      "content": "Original text that was analyzed",
      "attributes": [
        {"attribute": "source", "value": "Page 1"},
        {"attribute": "confidence", "value": "high"}
      ]
    }
  ]
}
```
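A downstream script might consume this output as shown below. This is a minimal sketch using only the standard library; the helper names `count_by_type` and `attributes_as_dict` are hypothetical and not part of the tool.

```python
import json
from collections import Counter

SAMPLE = """
{
  "insights": [
    {
      "type": "fact",
      "insight": "Summary of the insight",
      "content": "Original text that was analyzed",
      "attributes": [
        {"attribute": "source", "value": "Page 1"},
        {"attribute": "confidence", "value": "high"}
      ]
    }
  ]
}
"""


def count_by_type(raw_json: str) -> Counter:
    """Count how many insights of each type the output contains."""
    data = json.loads(raw_json)
    return Counter(item["type"] for item in data["insights"])


def attributes_as_dict(insight: dict) -> dict:
    """Flatten the attribute list into a plain key-to-value mapping."""
    return {a["attribute"]: a["value"] for a in insight.get("attributes", [])}


counts = count_by_type(SAMPLE)
first = json.loads(SAMPLE)["insights"][0]
attrs = attributes_as_dict(first)
```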

## Requirements

- Python 3.12+
- OpenRouter API key set as `OPENROUTER_API_KEY` environment variable
  - Provides access to all major AI models (Claude, GPT-4, Gemini, Llama, etc.)
  - Get API key at https://openrouter.ai/
- Alternatively, direct provider keys (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.) are supported
- Dependencies managed via uv

## Model Configuration

The tool is configured to use OpenRouter by default, which provides:
- Access to multiple AI providers through a single API
- Automatic fallback and load balancing
- Competitive pricing
- Support for the latest models

When `OPENROUTER_API_KEY` is set, the agent automatically configures the OpenAI-compatible interface with OpenRouter's base URL. Specify models in the format `<provider>/<model-name>` (e.g., `anthropic/claude-3.5-sonnet`, `openai/gpt-4o`).
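The selection logic described here might look roughly like this. It is a sketch only: `resolve_base_url` and `split_model_id` are hypothetical helpers, and the actual wiring into pydantic-ai in `agent.py` is not shown. The base URL is OpenRouter's public OpenAI-compatible endpoint.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def resolve_base_url(env: Mapping[str, str] | None = None) -> str | None:
    """Return OpenRouter's base URL when OPENROUTER_API_KEY is set, else None."""
    env = os.environ if env is None else env
    return OPENROUTER_BASE_URL if env.get("OPENROUTER_API_KEY") else None


def split_model_id(model: str) -> tuple[str, str]:
    """Split '<provider>/<model-name>' and reject IDs that do not match the format."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"Expected '<provider>/<model-name>', got {model!r}")
    return provider, name
```

Validating the model ID before the first API call gives a clearer error than a rejected request would.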

## Format Specification

The project follows the format defined in `../docs/AGENTIC_DOCUMENT_PARSING_FORMAT.md`, which specifies:
- How agents interact with documents
- The structure of insights and their attributes
- The `read_page` tool interface for autonomous page navigation
- Classification system for different insight types