pdf-to-kcf/CLAUDE.md
neutrino2211 b847133df2 Init
2025-12-19 20:41:08 +01:00

# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
`pdf-to-kcf` is a Python CLI tool that uses AI agents to parse PDF documents and extract structured insights. It uses `pydantic-ai` to create an intelligent agent that can autonomously decide how much of a document to analyze, requesting additional pages as needed.
## Commands
### Development Setup
```bash
# Install dependencies
uv sync
# Set up OpenRouter API key
cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY
# Run the CLI tool
uv run pdf-to-kcf <pdf-path>
# Run with options
uv run pdf-to-kcf <pdf-path> --output custom_output.json --start-page 2 --model anthropic/claude-3-opus
```
### Running the Tool
```bash
# Basic usage (uses Claude 3.5 Sonnet via OpenRouter by default)
uv run pdf-to-kcf document.pdf
# Specify custom output file
uv run pdf-to-kcf document.pdf -o insights.json
# Start from a specific page (0-indexed)
uv run pdf-to-kcf document.pdf -s 3
# Use a different AI model from OpenRouter
uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5
uv run pdf-to-kcf document.pdf -m openai/gpt-4o
```
## Architecture
### Core Components
**models.py** - Data structures following the agentic document parsing format specification:
- `ContentInsightType`: Enum for insight classification (FACT, OPINION, COMMENT)
- `ContentInsightAttribute`: Key-value metadata for insights
- `ContentInsight`: A single extracted insight with type, insight summary, original content, and attributes
- `PageContentAnalysis`: Agent output containing all insights
- `PageContent`: Context passed to the agent (page number, content, total pages)
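
As a rough illustration of how these structures relate, the sketch below mirrors them with standard-library dataclasses rather than the project's actual model classes. The insight field names follow the JSON shown under Output Format. The lowercase enum values other than `fact`, the `PageContent` field names, and the `to_dict` helper are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum


class ContentInsightType(str, Enum):
    # Lowercase values are assumed from the "fact" example in the output JSON.
    FACT = "fact"
    OPINION = "opinion"
    COMMENT = "comment"


@dataclass
class ContentInsightAttribute:
    attribute: str
    value: str


@dataclass
class ContentInsight:
    type: ContentInsightType
    insight: str   # short summary of the insight
    content: str   # original text that was analyzed
    attributes: list[ContentInsightAttribute] = field(default_factory=list)


@dataclass
class PageContent:
    # Field names are assumed; the source only lists what the context contains.
    page_number: int
    content: str
    total_pages: int


@dataclass
class PageContentAnalysis:
    insights: list[ContentInsight] = field(default_factory=list)

    def to_dict(self) -> dict:
        # Hypothetical helper: produce a plain dict ready for json.dumps.
        data = asdict(self)
        for item in data["insights"]:
            item["type"] = item["type"].value
        return data


example = PageContentAnalysis(
    insights=[
        ContentInsight(
            type=ContentInsightType.FACT,
            insight="Summary of the insight",
            content="Original text that was analyzed",
            attributes=[ContentInsightAttribute("source", "Page 1")],
        )
    ]
)
```

Passing `example.to_dict()` to `json.dumps` yields a document with the same shape as the tool's output file.
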
**pdf_reader.py** - PDF document handling:
- `PDFDocument`: Wrapper class for reading PDF files using pypdf
  - Provides `get_page_text()` for single page extraction
  - Provides `get_all_pages()` for full document extraction
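
A minimal sketch of such a wrapper is shown below. It relies on pypdf's `PdfReader`, its `pages` sequence, and `extract_text()`. The constructor signature, the optional `reader` parameter (added here so the class can be exercised without a real PDF), the `total_pages` property, and the bounds check are assumptions, not the repository's actual code.

```python
class PDFDocument:
    """Thin wrapper around a pypdf reader; pages are 0-indexed."""

    def __init__(self, path, reader=None):
        if reader is None:
            # Imported lazily so the class can be defined without pypdf installed.
            from pypdf import PdfReader

            reader = PdfReader(path)
        self.path = path
        self._reader = reader

    @property
    def total_pages(self) -> int:
        return len(self._reader.pages)

    def get_page_text(self, page_number: int) -> str:
        # Reject negative and past-the-end indices explicitly.
        if not 0 <= page_number < self.total_pages:
            raise IndexError(f"page {page_number} is out of range")
        # Treat a page with no extractable text as an empty string.
        return self._reader.pages[page_number].extract_text() or ""

    def get_all_pages(self) -> list[str]:
        return [self.get_page_text(i) for i in range(self.total_pages)]
```

With pypdf installed, `PDFDocument("document.pdf").get_page_text(0)` would return the text of the first page.
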
**agent.py** - AI agent implementation:
- `DocumentAnalyzer`: Main analyzer using pydantic-ai Agent
  - Configures the AI model and system prompt
  - Implements a `read_page` tool that lets the agent request additional pages autonomously
  - The agent decides when to fetch more pages based on how much context it needs
  - The agent is instructed to classify insights as facts, opinions, or comments, with relevant attributes
**cli.py** - Command-line interface:
- Built with Click framework
- Handles PDF loading, analysis orchestration, and JSON output
- Provides user feedback during processing
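
The option surface described in the Commands section can be sketched as follows. The project itself uses Click; this sketch uses the standard library's `argparse` purely so that it has no third-party dependency. The flag names come from the usage examples, while the default values (start page `0`, no default output path, and the default model identifier) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pdf-to-kcf")
    parser.add_argument("pdf_path", help="path to the PDF to analyze")
    # Default output path is unknown from the source, so None is used here.
    parser.add_argument("-o", "--output", default=None, help="output JSON file")
    parser.add_argument("-s", "--start-page", type=int, default=0,
                        help="0-indexed page to start from")
    # Assumed default, based on the "Claude 3.5 Sonnet via OpenRouter" note.
    parser.add_argument("-m", "--model", default="anthropic/claude-3.5-sonnet",
                        help="model in <provider>/<model-name> format")
    return parser


args = build_parser().parse_args(["document.pdf", "-s", "3", "-o", "insights.json"])
```

Here `args.start_page` is `3` and `args.output` is `"insights.json"`, matching the `-s` and `-o` examples above.
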
### Agentic Behavior
The AI agent is autonomous and can:
1. Start analyzing from an initial page
2. Determine if more context is needed from other pages
3. Use the `read_page` tool to fetch additional pages
4. Extract structured insights with proper classification
5. Return all insights in the specified JSON format
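
The steps above can be illustrated without any model at all. In the sketch below, `read_page` is a plain function over a list of page texts, and a caller-supplied predicate `needs_more` stands in for the model's decision to request more context. Both helper names, the error message, and the `max_pages` cap are hypothetical; in the real tool the decision is made by the agent through pydantic-ai.

```python
from typing import Callable


def make_read_page_tool(pages: list[str]) -> Callable[[int], str]:
    """Return a read_page callable bound to a list of page texts (0-indexed)."""

    def read_page(page_number: int) -> str:
        # Return an error string rather than raising, so a model could recover.
        if not 0 <= page_number < len(pages):
            return f"Error: page {page_number} is out of range."
        return pages[page_number]

    return read_page


def collect_context(pages, start_page, needs_more, max_pages=5):
    """Read start_page, then following pages while needs_more(text) is true."""
    read_page = make_read_page_tool(pages)
    seen = [start_page]
    text = read_page(start_page)
    next_page = start_page + 1
    while needs_more(text) and next_page < len(pages) and len(seen) < max_pages:
        text += "\n" + read_page(next_page)
        seen.append(next_page)
        next_page += 1
    return text, seen


pages = ["Intro (continued on next page)", "Details of the method.", "Appendix"]
text, seen = collect_context(
    pages,
    start_page=0,
    needs_more=lambda t: t.rstrip().endswith("(continued on next page)"),
)
```

In this run the stand-in predicate asks for one extra page, so `seen` is `[0, 1]` and the appendix is never read.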
### Output Format
The tool outputs JSON files with the following structure:
```json
{
  "insights": [
    {
      "type": "fact",
      "insight": "Summary of the insight",
      "content": "Original text that was analyzed",
      "attributes": [
        {"attribute": "source", "value": "Page 1"},
        {"attribute": "confidence", "value": "high"}
      ]
    }
  ]
}
```
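
A downstream script might consume this structure along the following lines. The helper names `summarize_insights` and `attributes_as_dict` are hypothetical; only the JSON keys are taken from the format above.

```python
import json
from collections import Counter

sample = """
{
  "insights": [
    {
      "type": "fact",
      "insight": "Summary of the insight",
      "content": "Original text that was analyzed",
      "attributes": [
        {"attribute": "source", "value": "Page 1"},
        {"attribute": "confidence", "value": "high"}
      ]
    }
  ]
}
"""


def summarize_insights(raw_json: str) -> Counter:
    """Count insights per type (fact, opinion, comment)."""
    data = json.loads(raw_json)
    return Counter(item["type"] for item in data["insights"])


def attributes_as_dict(insight: dict) -> dict:
    """Flatten the attribute/value pairs of one insight into a dict."""
    return {a["attribute"]: a["value"] for a in insight.get("attributes", [])}


counts = summarize_insights(sample)
first = attributes_as_dict(json.loads(sample)["insights"][0])
```

For the sample above, `counts["fact"]` is `1` and `first["confidence"]` is `"high"`.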
## Requirements
- Python 3.12+
- OpenRouter API key set as the `OPENROUTER_API_KEY` environment variable
  - Provides access to models from the major providers (Claude, GPT-4, Gemini, Llama, etc.)
  - Get an API key at https://openrouter.ai/
  - Alternatively, direct provider keys (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.) are supported
- Dependencies managed via uv
## Model Configuration
The tool is configured to use OpenRouter by default, which provides:
- Access to multiple AI providers through a single API
- Automatic fallback and load balancing
- Competitive pricing
- Support for the latest models
When `OPENROUTER_API_KEY` is set, the agent automatically configures the OpenAI-compatible interface with OpenRouter's base URL. Specify models in the format `<provider>/<model-name>` (e.g., `anthropic/claude-3.5-sonnet`, `openai/gpt-4o`).
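
The selection logic described here might look roughly like the sketch below. The function name `resolve_model_config`, the shape of the returned dict, and the model-identifier check are assumptions. The base URL is OpenRouter's public OpenAI-compatible endpoint.

```python
import os
import re

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
# Loose check for "<provider>/<model-name>"; an illustrative assumption.
MODEL_ID_PATTERN = re.compile(r"^[^/\s]+/\S+$")


def resolve_model_config(model: str, env=None) -> dict:
    """Pick OpenRouter when its key is present, else defer to a direct provider."""
    env = os.environ if env is None else env
    api_key = env.get("OPENROUTER_API_KEY")
    if api_key:
        if not MODEL_ID_PATTERN.match(model):
            raise ValueError(f"expected <provider>/<model-name>, got {model!r}")
        return {
            "provider": "openrouter",
            "base_url": OPENROUTER_BASE_URL,
            "api_key": api_key,
            "model": model,
        }
    # No OpenRouter key: leave key lookup to the provider's own client.
    return {"provider": "direct", "base_url": None, "api_key": None, "model": model}


config = resolve_model_config(
    "anthropic/claude-3.5-sonnet", env={"OPENROUTER_API_KEY": "sk-example"}
)
```

Passing `env` explicitly keeps the function easy to test; when it is omitted, the process environment is used.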
## Format Specification
The project follows the format defined in `../docs/AGENTIC_DOCUMENT_PARSING_FORMAT.md`, which specifies:
- How agents interact with documents
- The structure of insights and their attributes
- The `read_page` tool interface for autonomous page navigation
- Classification system for different insight types