CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

pdf-to-kcf is a Python CLI tool that uses AI agents to parse PDF documents and extract structured insights. It uses pydantic-ai to create an intelligent agent that can autonomously decide how much of a document to analyze, requesting additional pages as needed.

Commands

Development Setup

# Install dependencies
uv sync

# Set up OpenRouter API key
cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY

# Run the CLI tool
uv run pdf-to-kcf <pdf-path>

# Run with options
uv run pdf-to-kcf <pdf-path> --output custom_output.json --start-page 2 --model anthropic/claude-3-opus

Running the Tool

# Basic usage (uses Claude 3.5 Sonnet via OpenRouter by default)
uv run pdf-to-kcf document.pdf

# Specify custom output file
uv run pdf-to-kcf document.pdf -o insights.json

# Start from a specific page (0-indexed)
uv run pdf-to-kcf document.pdf -s 3

# Use a different AI model from OpenRouter
uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5
uv run pdf-to-kcf document.pdf -m openai/gpt-4o

Architecture

Core Components

models.py - Data structures following the agentic document parsing format specification:

ContentInsightType: Enum for insight classification (FACT, OPINION, COMMENT)
ContentInsightAttribute: Key-value metadata for insights
ContentInsight: A single extracted insight with type, insight summary, original content, and attributes
PageContentAnalysis: Agent output containing all insights
PageContent: Context passed to the agent (page number, content, total pages)
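As a rough illustration of how these structures fit together, here is a minimal sketch using only standard-library dataclasses. The real models.py most likely defines pydantic models instead (pydantic-ai expects them for structured output), so treat this as an approximation. The JSON-facing field names (type, insight, content, attributes, attribute, value) come from the Output Format section; page_number and total_pages on PageContent are assumed names.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum


class ContentInsightType(str, Enum):
    """Insight classification; values match the lowercase strings in the JSON output."""
    FACT = "fact"
    OPINION = "opinion"
    COMMENT = "comment"


@dataclass
class ContentInsightAttribute:
    """Key-value metadata attached to an insight."""
    attribute: str
    value: str


@dataclass
class ContentInsight:
    """A single extracted insight."""
    type: ContentInsightType
    insight: str   # summary of the insight
    content: str   # original text that was analyzed
    attributes: list[ContentInsightAttribute] = field(default_factory=list)


@dataclass
class PageContentAnalysis:
    """Agent output containing all insights."""
    insights: list[ContentInsight] = field(default_factory=list)


@dataclass
class PageContent:
    """Context passed to the agent (field names are assumptions)."""
    page_number: int
    content: str
    total_pages: int


analysis = PageContentAnalysis(
    insights=[
        ContentInsight(
            type=ContentInsightType.FACT,
            insight="Summary of the insight",
            content="Original text that was analyzed",
            attributes=[ContentInsightAttribute("source", "Page 1")],
        )
    ]
)
as_plain_dict = asdict(analysis)  # nested dicts and lists, ready for json.dumps
```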

pdf_reader.py - PDF document handling:

PDFDocument: Wrapper class for reading PDF files using pypdf
Provides get_page_text() for single page extraction
Provides get_all_pages() for full document extraction
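A wrapper along these lines could look like the sketch below. It is not the project's actual implementation: the constructor signature, the from_path helper, the total_pages property, and the 0-indexed bounds check are all assumptions. Only PdfReader, its pages attribute, and extract_text() are real pypdf calls.

```python
from typing import Protocol, Sequence


class _Page(Protocol):
    """Anything with extract_text(), which pypdf page objects provide."""
    def extract_text(self) -> str: ...


class PDFDocument:
    """Hypothetical wrapper exposing page text by 0-indexed page number."""

    def __init__(self, pages: Sequence[_Page]) -> None:
        self._pages = list(pages)

    @classmethod
    def from_path(cls, path: str) -> "PDFDocument":
        # Imported lazily so the class can be exercised without pypdf installed.
        from pypdf import PdfReader
        return cls(PdfReader(path).pages)

    @property
    def total_pages(self) -> int:
        return len(self._pages)

    def get_page_text(self, page_number: int) -> str:
        """Return the text of one page, rejecting out-of-range numbers."""
        if not 0 <= page_number < self.total_pages:
            raise IndexError(f"page {page_number} is out of range")
        return self._pages[page_number].extract_text() or ""

    def get_all_pages(self) -> list[str]:
        """Return the text of every page, in order."""
        return [self.get_page_text(i) for i in range(self.total_pages)]


# Usage with stand-in pages; with a real file, use PDFDocument.from_path("document.pdf").
class _StubPage:
    def __init__(self, text: str) -> None:
        self._text = text

    def extract_text(self) -> str:
        return self._text


doc = PDFDocument([_StubPage("first page"), _StubPage("second page")])
```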

agent.py - AI agent implementation:

DocumentAnalyzer: Main analyzer using pydantic-ai Agent
Configures the AI model and system prompt
Implements read_page tool that allows the agent to request additional pages autonomously
The agent decides when to fetch more pages based on context needs
Agent is instructed to classify insights as facts, opinions, or comments with relevant attributes
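The core of a read_page tool can be sketched as a plain function, independent of pydantic-ai. The factory name make_read_page_tool, the page header format, and the choice to return an error message instead of raising are assumptions made for illustration. Registering the function with the agent is omitted because it depends on the pydantic-ai version in use.

```python
from typing import Callable


def make_read_page_tool(pages: list[str]) -> Callable[[int], str]:
    """Build a hypothetical read_page callable over already-extracted page text."""
    total = len(pages)

    def read_page(page_number: int) -> str:
        # Return an explanatory message rather than raising, so the model can
        # recover from an out-of-range request on its own.
        if not 0 <= page_number < total:
            return f"Page {page_number} does not exist; valid pages are 0 to {total - 1}."
        # Prefix the text with its position so the model knows how much remains.
        return f"[Page {page_number + 1} of {total}]\n{pages[page_number]}"

    return read_page


read_page = make_read_page_tool(["Intro text", "Details text"])
```

Returning a message for invalid page numbers keeps the agent loop running; whether the real tool does this or raises an error is not stated here.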

cli.py - Command-line interface:

Built with Click framework
Handles PDF loading, analysis orchestration, and JSON output
Provides user feedback during processing

Agentic Behavior

The AI agent is autonomous and can:

Start analyzing from an initial page
Determine if more context is needed from other pages
Use the read_page tool to fetch additional pages
Extract structured insights with proper classification
Return all insights in the specified JSON format

Output Format

The tool outputs JSON files with the following structure:

{
  "insights": [
    {
      "type": "fact",
      "insight": "Summary of the insight",
      "content": "Original text that was analyzed",
      "attributes": [
        {"attribute": "source", "value": "Page 1"},
        {"attribute": "confidence", "value": "high"}
      ]
    }
  ]
}
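For downstream consumers, the structure above can be read with the standard library alone. The following sketch counts insights per type and flattens the attribute list into a dictionary; the helper names summarize_insights and attributes_as_dict are hypothetical and not part of the tool.

```python
import json
from collections import Counter

REQUIRED_KEYS = {"type", "insight", "content", "attributes"}

sample_output = """
{
  "insights": [
    {
      "type": "fact",
      "insight": "Summary of the insight",
      "content": "Original text that was analyzed",
      "attributes": [
        {"attribute": "source", "value": "Page 1"},
        {"attribute": "confidence", "value": "high"}
      ]
    }
  ]
}
"""


def summarize_insights(document: dict) -> Counter:
    """Count insights per type, checking the expected keys along the way."""
    counts: Counter = Counter()
    for item in document["insights"]:
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"insight is missing keys: {sorted(missing)}")
        counts[item["type"]] += 1
    return counts


def attributes_as_dict(insight: dict) -> dict[str, str]:
    """Flatten the attribute/value pairs into a plain mapping."""
    return {pair["attribute"]: pair["value"] for pair in insight["attributes"]}


document = json.loads(sample_output)
counts = summarize_insights(document)
first_attributes = attributes_as_dict(document["insights"][0])
```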

Requirements

Python 3.12+
OpenRouter API key set as OPENROUTER_API_KEY environment variable
- Provides access to all major AI models (Claude, GPT-4, Gemini, Llama, etc.)
- Get API key at https://openrouter.ai/
Alternatively, direct provider keys are supported (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
Dependencies managed via uv

Model Configuration

The tool is configured to use OpenRouter by default, which provides:

Access to multiple AI providers through a single API
Automatic fallback and load balancing
Competitive pricing
Support for the latest models

When OPENROUTER_API_KEY is set, the agent automatically configures the OpenAI-compatible interface with OpenRouter's base URL. Specify models in the format <provider>/<model-name> (e.g., anthropic/claude-3.5-sonnet, openai/gpt-4o).
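The selection logic described above might be sketched as follows. The function resolve_model_settings, its return shape, and the model ID pattern are assumptions for illustration; the actual agent.py may wire this up differently. The base URL is OpenRouter's OpenAI-compatible endpoint.

```python
import os
import re
from typing import Mapping, Optional

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"

# <provider>/<model-name>, e.g. anthropic/claude-3.5-sonnet (pattern is an assumption)
_MODEL_ID = re.compile(r"^[\w.-]+/[\w.:-]+$")


def resolve_model_settings(
    model: str, env: Mapping[str, str] = os.environ
) -> dict[str, Optional[str]]:
    """Pick connection settings for a model ID based on available API keys."""
    if not _MODEL_ID.match(model):
        raise ValueError(f"expected <provider>/<model-name>, got {model!r}")

    api_key = env.get("OPENROUTER_API_KEY")
    if api_key:
        # Route through OpenRouter's OpenAI-compatible interface.
        return {"model": model, "base_url": OPENROUTER_BASE_URL, "api_key": api_key}

    # No OpenRouter key: leave base_url unset so a direct provider key
    # (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.) can be used instead.
    return {"model": model, "base_url": None, "api_key": None}


via_openrouter = resolve_model_settings(
    "anthropic/claude-3.5-sonnet", {"OPENROUTER_API_KEY": "sk-or-example"}
)
direct = resolve_model_settings("openai/gpt-4o", {})
```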

Format Specification

The project follows the format defined in ../docs/AGENTIC_DOCUMENT_PARSING_FORMAT.md, which specifies:

How agents interact with documents
The structure of insights and their attributes
The read_page tool interface for autonomous page navigation
Classification system for different insight types
