pdf-to-kcf/CLAUDE.md
neutrino2211 b847133df2 Init
2025-12-19 20:41:08 +01:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

pdf-to-kcf is a Python CLI tool that uses AI agents to parse PDF documents and extract structured insights. It uses pydantic-ai to create an intelligent agent that can autonomously decide how much of a document to analyze, requesting additional pages as needed.

Commands

Development Setup

# Install dependencies
uv sync

# Set up OpenRouter API key
cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY

# Run the CLI tool
uv run pdf-to-kcf <pdf-path>

# Run with options
uv run pdf-to-kcf <pdf-path> --output custom_output.json --start-page 2 --model anthropic/claude-3-opus

Running the Tool

# Basic usage (uses Claude 3.5 Sonnet via OpenRouter by default)
uv run pdf-to-kcf document.pdf

# Specify custom output file
uv run pdf-to-kcf document.pdf -o insights.json

# Start from a specific page (0-indexed, so -s 3 starts at the fourth page)
uv run pdf-to-kcf document.pdf -s 3

# Use a different AI model from OpenRouter
uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5
uv run pdf-to-kcf document.pdf -m openai/gpt-4o

Architecture

Core Components

models.py - Data structures following the agentic document parsing format specification:

  • ContentInsightType: Enum for insight classification (FACT, OPINION, COMMENT)
  • ContentInsightAttribute: Key-value metadata for insights
  • ContentInsight: A single extracted insight with type, content, and attributes
  • PageContentAnalysis: Agent output containing all insights
  • PageContent: Context passed to the agent (page number, content, total pages)
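
As a rough illustration, the data structures listed above could be shaped like this. It is a sketch that uses standard-library dataclasses so it runs without dependencies; the real models.py presumably defines pydantic models instead. The insight field names are inferred from the Output Format section, the lowercase enum values for OPINION and COMMENT are assumed by analogy with "fact", and the PageContent field names are hypothetical.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum


class ContentInsightType(str, Enum):
    # "fact" appears in the documented output; the other two values are assumed.
    FACT = "fact"
    OPINION = "opinion"
    COMMENT = "comment"


@dataclass
class ContentInsightAttribute:
    attribute: str
    value: str


@dataclass
class ContentInsight:
    type: ContentInsightType
    insight: str   # short summary of the insight
    content: str   # original text the insight was drawn from
    attributes: list[ContentInsightAttribute] = field(default_factory=list)


@dataclass
class PageContentAnalysis:
    insights: list[ContentInsight] = field(default_factory=list)


@dataclass
class PageContent:
    # Hypothetical field names for "page number, content, total pages".
    page_number: int
    content: str
    total_pages: int


analysis = PageContentAnalysis(
    insights=[
        ContentInsight(
            type=ContentInsightType.FACT,
            insight="Summary of the insight",
            content="Original text that was analyzed",
            attributes=[ContentInsightAttribute("source", "Page 1")],
        )
    ]
)
as_plain_dict = asdict(analysis)  # nested dicts and lists, ready to serialize
```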

pdf_reader.py - PDF document handling:

  • PDFDocument: Wrapper class for reading PDF files using pypdf
  • Provides get_page_text() for single page extraction
  • Provides get_all_pages() for full document extraction
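
A minimal sketch of what such a wrapper might look like. Only PDFDocument, get_page_text() and get_all_pages() come from the description above; the from_path constructor, the total_pages property, and the choice to raise IndexError for a bad page number are assumptions made for illustration. The reader is injected so the class can be exercised without a real PDF.

```python
class PDFDocument:
    """Thin wrapper around a pypdf-style reader (anything exposing `.pages`)."""

    def __init__(self, reader):
        self._reader = reader

    @classmethod
    def from_path(cls, path: str) -> "PDFDocument":
        # Hypothetical constructor. pypdf is imported lazily so the rest of
        # the class works without it installed.
        from pypdf import PdfReader

        return cls(PdfReader(path))

    @property
    def total_pages(self) -> int:
        return len(self._reader.pages)

    def get_page_text(self, page_number: int) -> str:
        """Return the text of a single 0-indexed page."""
        if not 0 <= page_number < self.total_pages:
            raise IndexError(
                f"page {page_number} out of range (0..{self.total_pages - 1})"
            )
        # Fall back to an empty string if extraction yields nothing.
        return self._reader.pages[page_number].extract_text() or ""

    def get_all_pages(self) -> list[str]:
        """Return the text of every page, in order."""
        return [self.get_page_text(i) for i in range(self.total_pages)]
```

With pypdf installed, PDFDocument.from_path("document.pdf").get_page_text(0) would return the text of the first page.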

agent.py - AI agent implementation:

  • DocumentAnalyzer: Main analyzer using pydantic-ai Agent
  • Configures the AI model and system prompt
  • Implements read_page tool that allows the agent to request additional pages autonomously
  • The agent decides when to fetch more pages based on context needs
  • Agent is instructed to classify insights as facts, opinions, or comments with relevant attributes
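
The body of a read_page tool can be sketched in plain Python as below. How the function is registered with pydantic-ai is deliberately left out, and make_read_page_tool is a hypothetical helper name. Returning an explanatory string for an invalid page, rather than raising, is an assumption: it lets the model see the mistake and pick a valid page.

```python
from typing import Callable


def make_read_page_tool(pages: list[str]) -> Callable[[int], str]:
    """Build a read_page function bound to one document's page texts."""
    total = len(pages)

    def read_page(page_number: int) -> str:
        """Return the text of a 0-indexed page, or a message explaining the error."""
        if not 0 <= page_number < total:
            return (
                f"Page {page_number} does not exist; "
                f"valid page numbers are 0 to {total - 1}."
            )
        # Prefix the text with its position so the model keeps track of context.
        return f"[page index {page_number}, total pages {total}]\n{pages[page_number]}"

    return read_page


read_page = make_read_page_tool(["Title page", "Chapter 1", "Chapter 2"])
```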

cli.py - Command-line interface:

  • Built with Click framework
  • Handles PDF loading, analysis orchestration, and JSON output
  • Provides user feedback during processing
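
For orientation, a Click command exposing the options shown under Running the Tool might be declared roughly as follows. This is a sketch, not the project's cli.py: the default output filename and the default model ID are assumptions, and the body only echoes what it would do instead of running the analysis.

```python
import click

DEFAULT_MODEL = "anthropic/claude-3.5-sonnet"  # assumed default model ID


@click.command()
@click.argument("pdf_path", type=click.Path(exists=True, dir_okay=False))
@click.option("--output", "-o", default="output.json", show_default=True,
              help="Where to write the JSON insights (default name is assumed).")
@click.option("--start-page", "-s", default=0, show_default=True, type=int,
              help="0-indexed page to start from.")
@click.option("--model", "-m", default=DEFAULT_MODEL, show_default=True,
              help="Model ID in <provider>/<model-name> form.")
def main(pdf_path: str, output: str, start_page: int, model: str) -> None:
    """Analyze a PDF and write extracted insights as JSON."""
    click.echo(f"Analyzing {pdf_path} from page {start_page} with {model}")
    # The real command would load the PDF, run the analyzer, and write JSON here.
    click.echo(f"Would write insights to {output}")
```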

Agentic Behavior

The AI agent is autonomous and can:

  1. Start analyzing from an initial page
  2. Determine if more context is needed from other pages
  3. Use the read_page tool to fetch additional pages
  4. Extract structured insights with proper classification
  5. Return all insights in the specified JSON format
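
The control flow of steps 1 to 3 can be sketched with a scripted stand-in for the model. In the real tool the loop is driven by the agent framework and the model's own tool calls; here a hypothetical policy function plays that role so the flow is easy to follow. The names run_analysis, follow_continuations, and max_reads are illustrative, and insight extraction (steps 4 and 5) is omitted.

```python
from typing import Callable, Optional

# A policy stands in for the model: given the pages read so far, it returns
# the next page index to fetch, or None once it has enough context.
Policy = Callable[[dict[int, str]], Optional[int]]


def run_analysis(pages: list[str], start_page: int, policy: Policy,
                 max_reads: int = 10) -> dict[int, str]:
    """Collect the pages the policy asks for, starting from start_page."""
    seen = {start_page: pages[start_page]}           # step 1: initial page
    for _ in range(max_reads):                       # cap the number of fetches
        wanted = policy(seen)                        # step 2: need more context?
        if wanted is None or wanted in seen or not 0 <= wanted < len(pages):
            break
        seen[wanted] = pages[wanted]                 # step 3: fetch another page
    return seen


def follow_continuations(seen: dict[int, str]) -> Optional[int]:
    """Toy policy: keep reading while the latest page ends with '(continued)'."""
    last = max(seen)
    if seen[last].rstrip().endswith("(continued)"):
        return last + 1
    return None


pages = ["Intro (continued)", "More detail (continued)", "End.", "Appendix"]
context = run_analysis(pages, start_page=0, policy=follow_continuations)
```

In this toy run the policy stops at the third page because it no longer ends with the continuation marker, so the appendix is never read.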

Output Format

The tool outputs JSON files with the following structure:

{
  "insights": [
    {
      "type": "fact",
      "insight": "Summary of the insight",
      "content": "Original text that was analyzed",
      "attributes": [
        {"attribute": "source", "value": "Page 1"},
        {"attribute": "confidence", "value": "high"}
      ]
    }
  ]
}
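
A consumer of this output might load and check it along these lines. This is a sketch under assumptions: the helper names are hypothetical, and the allowed type values other than "fact" are inferred from the FACT, OPINION, COMMENT enum rather than taken from real output.

```python
import json

# "fact" is documented; the other two lowercase values are assumed.
ALLOWED_TYPES = {"fact", "opinion", "comment"}

SAMPLE = """
{
  "insights": [
    {
      "type": "fact",
      "insight": "Summary of the insight",
      "content": "Original text that was analyzed",
      "attributes": [
        {"attribute": "source", "value": "Page 1"},
        {"attribute": "confidence", "value": "high"}
      ]
    }
  ]
}
"""


def load_insights(raw: str) -> list[dict]:
    """Parse the JSON text and reject insights with an unknown type."""
    insights = json.loads(raw)["insights"]
    for item in insights:
        if item["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unknown insight type: {item['type']!r}")
    return insights


def attributes_as_dict(insight: dict) -> dict[str, str]:
    """Flatten the attribute list into a plain name-to-value mapping."""
    return {a["attribute"]: a["value"] for a in insight.get("attributes", [])}


insights = load_insights(SAMPLE)
```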

Requirements

  • Python 3.12+
  • OpenRouter API key set as OPENROUTER_API_KEY environment variable
    • Provides access to all major AI models (Claude, GPT-4, Gemini, Llama, etc.)
    • Get API key at https://openrouter.ai/
  • Alternatively, direct provider keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.) are supported
  • Dependencies managed via uv

Model Configuration

The tool is configured to use OpenRouter by default, which provides:

  • Access to multiple AI providers through a single API
  • Automatic fallback and load balancing
  • Competitive pricing
  • Support for the latest models

When OPENROUTER_API_KEY is set, the agent automatically configures the OpenAI-compatible interface with OpenRouter's base URL. Models should be specified in the format <provider>/<model-name> (e.g., anthropic/claude-3.5-sonnet or openai/gpt-4o).
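
The selection logic described here could look something like the sketch below. It is not the project's code: resolve_model_config and PROVIDER_KEYS are hypothetical names, and the fallback to a direct provider key when OPENROUTER_API_KEY is absent is an assumption about how the two kinds of keys interact. The environment is passed in as a mapping so the function is easy to test.

```python
from typing import Mapping

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"

# Hypothetical mapping from provider prefix to its direct API key variable.
PROVIDER_KEYS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def resolve_model_config(model: str, env: Mapping[str, str]) -> dict[str, str]:
    """Pick credentials and endpoint for a <provider>/<model-name> string."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected <provider>/<model-name>, got {model!r}")

    if env.get("OPENROUTER_API_KEY"):
        # OpenRouter takes the full provider-prefixed ID over its
        # OpenAI-compatible endpoint.
        return {
            "model": model,
            "base_url": OPENROUTER_BASE_URL,
            "api_key": env["OPENROUTER_API_KEY"],
        }

    # Assumed fallback: talk to the provider directly with its own key.
    key_var = PROVIDER_KEYS.get(provider)
    if key_var and env.get(key_var):
        return {"model": name, "api_key": env[key_var]}

    raise RuntimeError(f"no API key available for {model!r}")
```

In practice the function would be called with os.environ, for example resolve_model_config("openai/gpt-4o", os.environ).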

Format Specification

The project follows the format defined in ../docs/AGENTIC_DOCUMENT_PARSING_FORMAT.md, which specifies:

  • How agents interact with documents
  • The structure of insights and their attributes
  • The read_page tool interface for autonomous page navigation
  • Classification system for different insight types