CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
pdf-to-kcf is a Python CLI tool that uses AI agents to parse PDF documents and extract structured insights. It uses pydantic-ai to create an intelligent agent that can autonomously decide how much of a document to analyze, requesting additional pages as needed.
Commands
Development Setup
# Install dependencies
uv sync
# Set up OpenRouter API key
cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY
# Run the CLI tool
uv run pdf-to-kcf <pdf-path>
# Run with options
uv run pdf-to-kcf <pdf-path> --output custom_output.json --start-page 2 --model anthropic/claude-3-opus
Running the Tool
# Basic usage (uses Claude 3.5 Sonnet via OpenRouter by default)
uv run pdf-to-kcf document.pdf
# Specify custom output file
uv run pdf-to-kcf document.pdf -o insights.json
# Start from a specific page (0-indexed)
uv run pdf-to-kcf document.pdf -s 3
# Use a different AI model from OpenRouter
uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5
uv run pdf-to-kcf document.pdf -m openai/gpt-4o
Architecture
Core Components
models.py - Data structures following the agentic document parsing format specification:
- ContentInsightType: Enum for insight classification (FACT, OPINION, COMMENT)
- ContentInsightAttribute: Key-value metadata for insights
- ContentInsight: A single extracted insight with type, content, and attributes
- PageContentAnalysis: Agent output containing all insights
- PageContent: Context passed to the agent (page number, content, total pages)
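As a rough illustration of how the data structures listed above fit together, here is a minimal sketch. It uses standard-library dataclasses as a stand-in so it runs without third-party packages; the real models.py may define these differently. The class names come from the list above and the field names `type`, `insight`, `content`, `attributes`, `attribute`, and `value` from the Output Format section; the `PageContent` field names and the lowercase enum values are assumptions.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum


class ContentInsightType(str, Enum):
    # Lowercase values are assumed from the "fact" example in the output JSON.
    FACT = "fact"
    OPINION = "opinion"
    COMMENT = "comment"


@dataclass
class ContentInsightAttribute:
    attribute: str
    value: str


@dataclass
class ContentInsight:
    type: ContentInsightType
    insight: str   # short summary of the insight
    content: str   # original text the insight was drawn from
    attributes: list[ContentInsightAttribute] = field(default_factory=list)


@dataclass
class PageContentAnalysis:
    insights: list[ContentInsight] = field(default_factory=list)


@dataclass
class PageContent:
    # Field names here are hypothetical; the source only lists what is carried.
    page_number: int
    content: str
    total_pages: int


analysis = PageContentAnalysis(
    insights=[
        ContentInsight(
            type=ContentInsightType.FACT,
            insight="Summary of the insight",
            content="Original text that was analyzed",
            attributes=[ContentInsightAttribute("source", "Page 1")],
        )
    ]
)
# asdict() recursively converts nested dataclasses into plain dicts and lists.
as_plain = asdict(analysis)
```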
pdf_reader.py - PDF document handling:
- PDFDocument: Wrapper class for reading PDF files using pypdf
- Provides get_page_text() for single page extraction
- Provides get_all_pages() for full document extraction
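A wrapper of this shape could look roughly like the sketch below. It is not the project's actual pdf_reader.py: the constructor that takes a list of page texts, the `from_path` classmethod, and the `total_pages` property are hypothetical, and pages are assumed to be 0-indexed to match the `-s` option. Only `PdfReader`, `reader.pages`, and `page.extract_text()` are real pypdf calls.

```python
class PDFDocument:
    """Hold the extracted text of each page, addressed by 0-indexed page number."""

    def __init__(self, pages: list[str]) -> None:
        self._pages = list(pages)

    @classmethod
    def from_path(cls, path: str) -> "PDFDocument":
        # Imported lazily so the rest of the class works without pypdf installed.
        from pypdf import PdfReader

        reader = PdfReader(path)
        # extract_text() can yield nothing useful for image-only pages,
        # so fall back to an empty string.
        return cls([page.extract_text() or "" for page in reader.pages])

    @property
    def total_pages(self) -> int:
        return len(self._pages)

    def get_page_text(self, page_number: int) -> str:
        if not 0 <= page_number < self.total_pages:
            raise IndexError(
                f"page {page_number} is out of range (0 to {self.total_pages - 1})"
            )
        return self._pages[page_number]

    def get_all_pages(self) -> list[str]:
        # Return a copy so callers cannot mutate the stored pages.
        return list(self._pages)
```

Keeping the pypdf call in one classmethod means the page-access logic can be exercised with plain strings, for example `PDFDocument(["page one", "page two"]).get_page_text(1)`.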
agent.py - AI agent implementation:
- DocumentAnalyzer: Main analyzer using pydantic-ai Agent
- Configures the AI model and system prompt
- Implements read_page tool that allows the agent to request additional pages autonomously
- The agent decides when to fetch more pages based on context needs
- Agent is instructed to classify insights as facts, opinions, or comments with relevant attributes
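To make the read_page idea concrete, the sketch below shows only what the body of such a tool might do, in plain Python. Registration with the pydantic-ai Agent is deliberately left out, and the factory function `make_read_page_tool`, the request log, and the wording of the out-of-range message are all hypothetical rather than taken from agent.py.

```python
from typing import Callable


def make_read_page_tool(pages: list[str]) -> tuple[Callable[[int], str], list[int]]:
    """Return a read_page callable plus a log of the pages it has served."""
    requested: list[int] = []

    def read_page(page_number: int) -> str:
        """Return the text of a 0-indexed page, or a message the model can act on."""
        if not 0 <= page_number < len(pages):
            # Returning a message instead of raising lets the model correct itself.
            return (
                f"Page {page_number} does not exist; "
                f"valid pages are 0 to {len(pages) - 1}."
            )
        requested.append(page_number)
        return pages[page_number]

    return read_page, requested


read_page, requested_pages = make_read_page_tool(["intro text", "details text"])
first = read_page(1)      # the model asks for more context
missing = read_page(5)    # an out-of-range request is answered, not raised
```

The log is one way to report afterwards which pages the agent actually chose to read.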
cli.py - Command-line interface:
- Built with Click framework
- Handles PDF loading, analysis orchestration, and JSON output
- Provides user feedback during processing
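The option surface shown under Running the Tool can be mirrored in a few lines. The real CLI is built with Click; the sketch below uses the standard library's argparse purely as a stand-in so it runs without extra dependencies. The default start page of 0, the default model ID, and the `None` default for the output path are assumptions, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pdf-to-kcf")
    parser.add_argument("pdf_path", help="PDF document to analyze")
    # The real default output path is not documented here, so None is a placeholder.
    parser.add_argument("-o", "--output", default=None, help="JSON output file")
    parser.add_argument(
        "-s", "--start-page", type=int, default=0,
        help="0-indexed page to start analyzing from",
    )
    parser.add_argument(
        "-m", "--model", default="anthropic/claude-3.5-sonnet",
        help="OpenRouter model in <provider>/<model-name> format",
    )
    return parser


args = build_parser().parse_args(["document.pdf", "-s", "3", "-m", "openai/gpt-4o"])
```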
Agentic Behavior
The AI agent is autonomous and can:
- Start analyzing from an initial page
- Determine if more context is needed from other pages
- Use the read_page tool to fetch additional pages
- Extract structured insights with proper classification
- Return all insights in the specified JSON format
Output Format
The tool outputs JSON files with the following structure:
{
  "insights": [
    {
      "type": "fact",
      "insight": "Summary of the insight",
      "content": "Original text that was analyzed",
      "attributes": [
        {"attribute": "source", "value": "Page 1"},
        {"attribute": "confidence", "value": "high"}
      ]
    }
  ]
}
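A consumer of this output might load and sanity-check it along the following lines. This is a minimal sketch using only the standard library; the helper names `load_insights` and `attributes_as_dict` are hypothetical, and the lowercase type values `opinion` and `comment` are assumed by analogy with `fact`.

```python
import json

# "fact" appears in the example above; the other two values are assumed.
ALLOWED_TYPES = {"fact", "opinion", "comment"}


def load_insights(raw: str) -> list[dict]:
    """Parse the tool's JSON output and check the basic shape of each insight."""
    insights = json.loads(raw)["insights"]
    for item in insights:
        if item["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unexpected insight type: {item['type']!r}")
        for attr in item.get("attributes", []):
            if set(attr) != {"attribute", "value"}:
                raise ValueError(f"malformed attribute entry: {attr!r}")
    return insights


def attributes_as_dict(insight: dict) -> dict[str, str]:
    """Flatten the attribute list into a plain mapping for easier lookup."""
    return {a["attribute"]: a["value"] for a in insight.get("attributes", [])}


sample = """
{"insights": [{"type": "fact",
               "insight": "Summary of the insight",
               "content": "Original text that was analyzed",
               "attributes": [{"attribute": "source", "value": "Page 1"},
                              {"attribute": "confidence", "value": "high"}]}]}
"""
first_insight = load_insights(sample)[0]
lookup = attributes_as_dict(first_insight)
```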
Requirements
- Python 3.12+
- OpenRouter API key set as OPENROUTER_API_KEY environment variable
  - Provides access to all major AI models (Claude, GPT-4, Gemini, Llama, etc.)
  - Get API key at https://openrouter.ai/
- Alternatively supports direct provider keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
- Dependencies managed via uv
Model Configuration
The tool is configured to use OpenRouter by default, which provides:
- Access to multiple AI providers through a single API
- Automatic fallback and load balancing
- Competitive pricing
- Support for the latest models
When OPENROUTER_API_KEY is set, the agent automatically configures the OpenAI-compatible interface with OpenRouter's base URL. Specify models in the format <provider>/<model-name> (e.g., anthropic/claude-3.5-sonnet, openai/gpt-4o).
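The selection logic described above might be sketched as follows. This is not the project's actual configuration code: `resolve_model_settings` and the shape of the returned dictionary are hypothetical, and handling of direct provider keys is left to the caller. The base URL is OpenRouter's public OpenAI-compatible endpoint.

```python
import os
from typing import Mapping, Optional

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def resolve_model_settings(
    model: str, env: Optional[Mapping[str, str]] = None
) -> dict:
    """Validate the model ID and pick the OpenRouter endpoint when a key is set."""
    env = os.environ if env is None else env

    # Expect "<provider>/<model-name>", e.g. "anthropic/claude-3.5-sonnet".
    provider, sep, name = model.partition("/")
    if not (provider and sep and name):
        raise ValueError(f"expected <provider>/<model-name>, got {model!r}")

    api_key = env.get("OPENROUTER_API_KEY")
    if api_key:
        return {"model": model, "base_url": OPENROUTER_BASE_URL, "api_key": api_key}

    # No OpenRouter key: signal that the caller must fall back to a direct provider key.
    return {"model": model, "base_url": None, "api_key": None}


settings = resolve_model_settings(
    "anthropic/claude-3.5-sonnet", {"OPENROUTER_API_KEY": "sk-example"}
)
```

Passing the environment as a parameter keeps the function easy to test without touching real credentials.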
Format Specification
The project follows the format defined in ../docs/AGENTIC_DOCUMENT_PARSING_FORMAT.md, which specifies:
- How agents interact with documents
- The structure of insights and their attributes
- The read_page tool interface for autonomous page navigation
- Classification system for different insight types