# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

`pdf-to-kcf` is a Python CLI tool that uses AI agents to parse PDF documents and extract structured insights. It uses `pydantic-ai` to create an intelligent agent that can autonomously decide how much of a document to analyze, requesting additional pages as needed.

## Commands

### Development Setup
```bash
# Install dependencies
uv sync

# Set up OpenRouter API key
cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY

# Run the CLI tool
uv run pdf-to-kcf <pdf-path>

# Run with options
uv run pdf-to-kcf <pdf-path> --output custom_output.json --start-page 2 --model anthropic/claude-3-opus
```

### Running the Tool
```bash
# Basic usage (uses Claude 3.5 Sonnet via OpenRouter by default)
uv run pdf-to-kcf document.pdf

# Specify custom output file
uv run pdf-to-kcf document.pdf -o insights.json

# Start from a specific page (0-indexed)
uv run pdf-to-kcf document.pdf -s 3

# Use a different AI model from OpenRouter
uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5
uv run pdf-to-kcf document.pdf -m openai/gpt-4o
```

## Architecture

### Core Components

**models.py** - Data structures following the agentic document parsing format specification:
- `ContentInsightType`: Enum for insight classification (FACT, OPINION, COMMENT)
- `ContentInsightAttribute`: Key-value metadata for insights
- `ContentInsight`: A single extracted insight with type, content, and attributes
- `PageContentAnalysis`: Agent output containing all insights
- `PageContent`: Context passed to the agent (page number, content, total pages)

**pdf_reader.py** - PDF document handling:
- `PDFDocument`: Wrapper class for reading PDF files using pypdf
- Provides `get_page_text()` for single page extraction
- Provides `get_all_pages()` for full document extraction

**agent.py** - AI agent implementation:
- `DocumentAnalyzer`: Main analyzer using pydantic-ai Agent
- Configures the AI model and system prompt
- Implements `read_page` tool that allows the agent to request additional pages autonomously
- The agent decides when to fetch more pages based on context needs
- Agent is instructed to classify insights as facts, opinions, or comments with relevant attributes

**cli.py** - Command-line interface:
- Built with Click framework
- Handles PDF loading, analysis orchestration, and JSON output
- Provides user feedback during processing
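
To make the data structures listed under **models.py** concrete, here is a minimal sketch of how they could fit together. It uses standard-library dataclasses as a stand-in so it runs without extra dependencies; the project itself may well define these as Pydantic models. The class names come from the list above, while the field names `page_number` and `total_pages` and the `to_json()` helper are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum


class ContentInsightType(str, Enum):
    """Insight classification; the lowercase values mirror the JSON output."""
    FACT = "fact"
    OPINION = "opinion"
    COMMENT = "comment"


@dataclass
class ContentInsightAttribute:
    """One key-value piece of metadata attached to an insight."""
    attribute: str
    value: str


@dataclass
class ContentInsight:
    """A single extracted insight."""
    type: ContentInsightType
    insight: str   # summary of the insight
    content: str   # original text that was analyzed
    attributes: list[ContentInsightAttribute] = field(default_factory=list)


@dataclass
class PageContentAnalysis:
    """Agent output: every insight found during one analysis run."""
    insights: list[ContentInsight] = field(default_factory=list)

    def to_json(self) -> str:
        # Hypothetical helper: convert enum members to their plain string
        # values before serializing.
        data = asdict(self)
        for item in data["insights"]:
            item["type"] = item["type"].value
        return json.dumps(data, indent=2)


@dataclass
class PageContent:
    """Context passed to the agent (field names are assumed)."""
    page_number: int
    content: str
    total_pages: int
```

Calling `to_json()` on a `PageContentAnalysis` produces an object with a top-level `insights` array, which is the shape described in the Output Format section.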

### Agentic Behavior

The AI agent is autonomous and can:
1. Start analyzing from an initial page
2. Determine if more context is needed from other pages
3. Use the `read_page` tool to fetch additional pages
4. Extract structured insights with proper classification
5. Return all insights in the specified JSON format

### Output Format

The tool outputs JSON files with the following structure:
```json
{
  "insights": [
    {
      "type": "fact",
      "insight": "Summary of the insight",
      "content": "Original text that was analyzed",
      "attributes": [
        {"attribute": "source", "value": "Page 1"},
        {"attribute": "confidence", "value": "high"}
      ]
    }
  ]
}
```
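
As an illustration of consuming this structure downstream, the following sketch groups insight summaries by type and flattens an insight's attribute list into a dictionary. The helper names `group_insights_by_type` and `attributes_as_dict` are hypothetical and not part of the tool; only the JSON keys are taken from the format above.

```python
import json
from collections import defaultdict


def group_insights_by_type(raw_json: str) -> dict[str, list[str]]:
    """Map each insight type ("fact", "opinion", ...) to its summaries."""
    data = json.loads(raw_json)
    grouped: dict[str, list[str]] = defaultdict(list)
    for item in data.get("insights", []):
        grouped[item["type"]].append(item["insight"])
    return dict(grouped)


def attributes_as_dict(insight: dict) -> dict[str, str]:
    """Flatten [{"attribute": k, "value": v}, ...] into {k: v}."""
    return {a["attribute"]: a["value"] for a in insight.get("attributes", [])}
```

If the same attribute name appears twice on one insight, `attributes_as_dict` keeps the last value, which may or may not be the behavior you want.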

## Requirements

- Python 3.12+
- OpenRouter API key set as `OPENROUTER_API_KEY` environment variable
  - Provides access to all major AI models (Claude, GPT-4, Gemini, Llama, etc.)
  - Get API key at https://openrouter.ai/
- Alternatively supports direct provider keys (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.)
- Dependencies managed via uv

## Model Configuration

The tool is configured to use OpenRouter by default, which provides:
- Access to multiple AI providers through a single API
- Automatic fallback and load balancing
- Competitive pricing
- Support for the latest models

When `OPENROUTER_API_KEY` is set, the agent automatically configures the OpenAI-compatible interface with OpenRouter's base URL. Models should be specified in the format: `<provider>/<model-name>` (e.g., `anthropic/claude-3.5-sonnet`, `openai/gpt-4o`).

## Format Specification

The project follows the format defined in `../docs/AGENTIC_DOCUMENT_PARSING_FORMAT.md`, which specifies:
- How agents interact with documents
- The structure of insights and their attributes
- The `read_page` tool interface for autonomous page navigation
- Classification system for different insight types