# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

`pdf-to-kcf` is a Python CLI tool that uses AI agents to parse PDF documents and extract structured insights. It uses `pydantic-ai` to create an intelligent agent that can autonomously decide how much of a document to analyze, requesting additional pages as needed.

## Commands

### Development Setup

```bash
# Install dependencies
uv sync

# Set up OpenRouter API key
cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY

# Run the CLI tool
uv run pdf-to-kcf

# Run with options
uv run pdf-to-kcf --output custom_output.json --start-page 2 --model anthropic/claude-3-opus
```

### Running the Tool

```bash
# Basic usage (uses Claude 3.5 Sonnet via OpenRouter by default)
uv run pdf-to-kcf document.pdf

# Specify custom output file
uv run pdf-to-kcf document.pdf -o insights.json

# Start from a specific page (0-indexed)
uv run pdf-to-kcf document.pdf -s 3

# Use a different AI model from OpenRouter
uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5
uv run pdf-to-kcf document.pdf -m openai/gpt-4o
```

## Architecture

### Core Components

**models.py** - Data structures following the agentic document parsing format specification:
- `ContentInsightType`: Enum for insight classification (FACT, OPINION, COMMENT)
- `ContentInsightAttribute`: Key-value metadata for insights
- `ContentInsight`: A single extracted insight with type, content, and attributes
- `PageContentAnalysis`: Agent output containing all insights
- `PageContent`: Context passed to the agent (page number, content, total pages)

**pdf_reader.py** - PDF document handling:
- `PDFDocument`: Wrapper class for reading PDF files using pypdf
- Provides `get_page_text()` for single page extraction
- Provides `get_all_pages()` for full document extraction

**agent.py** - AI agent implementation:
- `DocumentAnalyzer`:
Main analyzer using pydantic-ai Agent
- Configures the AI model and system prompt
- Implements `read_page` tool that allows the agent to request additional pages autonomously
- The agent decides when to fetch more pages based on context needs
- Agent is instructed to classify insights as facts, opinions, or comments with relevant attributes

**cli.py** - Command-line interface:
- Built with Click framework
- Handles PDF loading, analysis orchestration, and JSON output
- Provides user feedback during processing

### Agentic Behavior

The AI agent is autonomous and can:
1. Start analyzing from an initial page
2. Determine if more context is needed from other pages
3. Use the `read_page` tool to fetch additional pages
4. Extract structured insights with proper classification
5. Return all insights in the specified JSON format

### Output Format

The tool outputs JSON files with the following structure:

```json
{
  "insights": [
    {
      "type": "fact",
      "insight": "Summary of the insight",
      "content": "Original text that was analyzed",
      "attributes": [
        {"attribute": "source", "value": "Page 1"},
        {"attribute": "confidence", "value": "high"}
      ]
    }
  ]
}
```

## Requirements

- Python 3.12+
- OpenRouter API key set as `OPENROUTER_API_KEY` environment variable
  - Provides access to all major AI models (Claude, GPT-4, Gemini, Llama, etc.)
  - Get API key at https://openrouter.ai/
  - Alternatively supports direct provider keys (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.)
- Dependencies managed via uv

## Model Configuration

The tool is configured to use OpenRouter by default, which provides:
- Access to multiple AI providers through a single API
- Automatic fallback and load balancing
- Competitive pricing
- Support for the latest models

When `OPENROUTER_API_KEY` is set, the agent automatically configures the OpenAI-compatible interface with OpenRouter's base URL.
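
The key-selection behavior described here can be sketched roughly as follows. This is an illustration only, not the code in `agent.py`: the helper name `resolve_provider_settings` and the shape of the returned dictionary are hypothetical, and the sketch assumes OpenRouter's OpenAI-compatible endpoint is `https://openrouter.ai/api/v1`.

```python
import os
from typing import Mapping, Optional

# Assumed OpenRouter endpoint for its OpenAI-compatible API.
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def resolve_provider_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: pick API settings from environment variables."""
    env = os.environ if env is None else env

    # Prefer OpenRouter when its key is present.
    openrouter_key = env.get("OPENROUTER_API_KEY")
    if openrouter_key:
        return {
            "provider": "openrouter",
            "base_url": OPENROUTER_BASE_URL,
            "api_key": openrouter_key,
        }

    # Otherwise fall back to a direct provider key, if one is set.
    for name in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
        if env.get(name):
            return {
                "provider": name.removesuffix("_API_KEY").lower(),
                "base_url": None,  # use the provider's own default endpoint
                "api_key": env[name],
            }

    raise RuntimeError("No API key found; set OPENROUTER_API_KEY in .env")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to exercise in tests without touching the real environment.
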
Models should be specified in the format: `<provider>/<model-name>` (e.g., `anthropic/claude-3.5-sonnet`, `openai/gpt-4o`)

## Format Specification

The project follows the format defined in `../docs/AGENTIC_DOCUMENT_PARSING_FORMAT.md`, which specifies:
- How agents interact with documents
- The structure of insights and their attributes
- The `read_page` tool interface for autonomous page navigation
- Classification system for different insight types
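
To make the insight structure concrete, here is a minimal sketch that mirrors the sample JSON under Output Format. It uses standard-library dataclasses so that it runs without extra dependencies; it is not the contents of `models.py`. The class names come from the component list above, while the field names are inferred from the sample JSON and the lowercase values `"opinion"` and `"comment"` are assumptions by analogy with `"fact"`.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List


class ContentInsightType(str, Enum):
    # Lowercase values are assumed from the "fact" example in the sample JSON.
    FACT = "fact"
    OPINION = "opinion"
    COMMENT = "comment"


@dataclass
class ContentInsightAttribute:
    attribute: str  # metadata key, e.g. "source"
    value: str      # metadata value, e.g. "Page 1"


@dataclass
class ContentInsight:
    type: ContentInsightType
    insight: str   # summary of the insight
    content: str   # original text that was analyzed
    attributes: List[ContentInsightAttribute] = field(default_factory=list)


@dataclass
class PageContentAnalysis:
    insights: List[ContentInsight] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses; the str-based enum
        # serializes as its plain string value.
        return json.dumps(asdict(self), indent=2)


analysis = PageContentAnalysis(
    insights=[
        ContentInsight(
            type=ContentInsightType.FACT,
            insight="Summary of the insight",
            content="Original text that was analyzed",
            attributes=[ContentInsightAttribute("source", "Page 1")],
        )
    ]
)
print(analysis.to_json())
```

Deriving the enum from `str` is a design choice that lets `json.dumps` emit the type directly, without a custom encoder.
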