Init
125
CLAUDE.md
Normal file
@@ -0,0 +1,125 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

`pdf-to-kcf` is a Python CLI tool that uses AI agents to parse PDF documents and extract structured insights. It uses `pydantic-ai` to create an intelligent agent that can autonomously decide how much of a document to analyze, requesting additional pages as needed.

## Commands

### Development Setup
```bash
# Install dependencies
uv sync

# Set up OpenRouter API key
cp .env.example .env
# Edit .env and add your OPENROUTER_API_KEY

# Run the CLI tool
uv run pdf-to-kcf <pdf-path>

# Run with options
uv run pdf-to-kcf <pdf-path> --output custom_output.json --start-page 2 --model anthropic/claude-3-opus
```

### Running the Tool
```bash
# Basic usage (uses Claude 3.5 Sonnet via OpenRouter by default)
uv run pdf-to-kcf document.pdf

# Specify custom output file
uv run pdf-to-kcf document.pdf -o insights.json

# Start from a specific page (0-indexed)
uv run pdf-to-kcf document.pdf -s 3

# Use a different AI model from OpenRouter
uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5
uv run pdf-to-kcf document.pdf -m openai/gpt-4o
```

## Architecture

### Core Components

**models.py** - Data structures following the agentic document parsing format specification:
- `ContentInsightType`: Enum for insight classification (FACT, OPINION, COMMENT)
- `ContentInsightAttribute`: Key-value metadata for insights
- `ContentInsight`: A single extracted insight with type, summary, original content, and attributes
- `PageContentAnalysis`: Agent output containing all insights
- `PageContent`: Context passed to the agent (page number, content, total pages)

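As a rough illustration of how these structures fit together, the sketch below mirrors them with standard-library dataclasses rather than the pydantic models the project actually uses. The field names on `PageContent` and the lowercase values for `OPINION` and `COMMENT` are assumptions inferred from the descriptions above and the output format, not taken from `models.py`.

```python
from dataclasses import asdict, dataclass, field
from enum import Enum


class ContentInsightType(str, Enum):
    """Insight classification; string values are assumed to be lowercase."""
    FACT = "fact"
    OPINION = "opinion"   # assumed value
    COMMENT = "comment"   # assumed value


@dataclass
class ContentInsightAttribute:
    """Key-value metadata attached to an insight."""
    attribute: str
    value: str


@dataclass
class ContentInsight:
    """A single extracted insight."""
    type: ContentInsightType
    insight: str   # short summary of the insight
    content: str   # original text that was analyzed
    attributes: list[ContentInsightAttribute] = field(default_factory=list)


@dataclass
class PageContentAnalysis:
    """Agent output: all insights found."""
    insights: list[ContentInsight] = field(default_factory=list)


@dataclass
class PageContent:
    """Context handed to the agent (field names are hypothetical)."""
    page_number: int
    content: str
    total_pages: int


analysis = PageContentAnalysis(
    insights=[
        ContentInsight(
            type=ContentInsightType.FACT,
            insight="Summary of the insight",
            content="Original text that was analyzed",
            attributes=[ContentInsightAttribute("source", "Page 1")],
        )
    ]
)
as_dict = asdict(analysis)  # nested plain dicts and lists, ready for json.dumps
```

Because the enum subclasses `str`, `json.dumps(as_dict)` serializes the type as a plain string such as `"fact"`.
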
**pdf_reader.py** - PDF document handling:
- `PDFDocument`: Wrapper class for reading PDF files using pypdf
- Provides `get_page_text()` for single page extraction
- Provides `get_all_pages()` for full document extraction

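A wrapper of this kind might look roughly like the sketch below. It assumes pypdf's `PdfReader` with its `pages` sequence and `extract_text()` method; the `reader` parameter, the `total_pages` property, and the return types are hypothetical additions for illustration and may not match the real `pdf_reader.py`.

```python
from pathlib import Path


class PDFDocument:
    """Thin wrapper exposing page text from a PDF (pages are 0-indexed)."""

    def __init__(self, path, reader=None):
        # `reader` is a hypothetical injection point so the class can be
        # exercised without a real file; by default pypdf opens the path.
        if reader is None:
            from pypdf import PdfReader  # lazy import; requires pypdf installed
            reader = PdfReader(str(path))
        self.path = Path(path)
        self._reader = reader

    @property
    def total_pages(self) -> int:
        return len(self._reader.pages)

    def get_page_text(self, page_number: int) -> str:
        """Return the extracted text of one page."""
        if not 0 <= page_number < self.total_pages:
            raise IndexError(
                f"page {page_number} out of range (0-{self.total_pages - 1})"
            )
        # Image-only pages may yield no text; normalize that to "".
        return self._reader.pages[page_number].extract_text() or ""

    def get_all_pages(self) -> list[str]:
        """Return the extracted text of every page, in order."""
        return [self.get_page_text(i) for i in range(self.total_pages)]
```

Raising `IndexError` on a bad page number is one possible choice; returning an error message instead may suit an agent-facing tool better.
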
**agent.py** - AI agent implementation:
- `DocumentAnalyzer`: Main analyzer using pydantic-ai Agent
- Configures the AI model and system prompt
- Implements `read_page` tool that allows the agent to request additional pages autonomously
- The agent decides when to fetch more pages based on context needs
- The agent is instructed to classify insights as facts, opinions, or comments with relevant attributes

**cli.py** - Command-line interface:
- Built with the Click framework
- Handles PDF loading, analysis orchestration, and JSON output
- Provides user feedback during processing

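The option surface shown under Running the Tool can be sketched as follows. This uses the standard-library `argparse` as a stand-in for Click so the example has no dependencies; the default output file name, the default start page, and the exact default model string are assumptions, not values read from `cli.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented pdf-to-kcf options."""
    parser = argparse.ArgumentParser(prog="pdf-to-kcf")
    parser.add_argument("pdf_path", help="PDF file to analyze")
    parser.add_argument(
        "-o", "--output",
        default="output.json",  # hypothetical default
        help="where to write the JSON insights",
    )
    parser.add_argument(
        "-s", "--start-page",
        type=int,
        default=0,  # assumed default; pages are 0-indexed
        help="page to start analysis from",
    )
    parser.add_argument(
        "-m", "--model",
        default="anthropic/claude-3.5-sonnet",  # assumed default model id
        help="model in <provider>/<model-name> format",
    )
    return parser


args = build_parser().parse_args(["document.pdf", "-o", "insights.json", "-s", "3"])
```

Here `args.start_page` is the integer `3`, and `args.model` falls back to the assumed default because `-m` was not passed.
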
### Agentic Behavior

The AI agent is autonomous and can:
1. Start analyzing from an initial page
2. Determine if more context is needed from other pages
3. Use the `read_page` tool to fetch additional pages
4. Extract structured insights with proper classification
5. Return all insights in the specified JSON format

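The control flow in steps 1 to 3 can be mimicked without any model at all. In the sketch below a trivial heuristic (a page that does not end in sentence punctuation) stands in for the model's judgement about needing more context; `make_read_page`, `analyze_from`, and `max_extra_pages` are hypothetical names, and the real tool is registered with the pydantic-ai agent rather than called in a hand-written loop.

```python
def make_read_page(pages: list[str], requested: list[int]):
    """Create a read_page tool bound to a document's pages."""
    def read_page(page_number: int) -> str:
        # Return a message rather than raising, so a model could recover.
        if not 0 <= page_number < len(pages):
            return f"Page {page_number} is out of range (0-{len(pages) - 1})."
        requested.append(page_number)
        return pages[page_number]
    return read_page


def analyze_from(pages: list[str], start_page: int = 0, max_extra_pages: int = 3):
    """Read the start page, then pull following pages while context looks incomplete."""
    requested: list[int] = []
    read_page = make_read_page(pages, requested)
    context = [read_page(start_page)]
    current = start_page
    extra = 0
    # Stand-in for the model's decision: keep reading while the last page
    # appears to stop mid-sentence.
    while (
        extra < max_extra_pages
        and current + 1 < len(pages)
        and not context[-1].rstrip().endswith((".", "!", "?"))
    ):
        current += 1
        extra += 1
        context.append(read_page(current))
    return requested, " ".join(context)


pages = ["Revenue grew by", "12% in 2023.", "Unrelated appendix."]
requested, text = analyze_from(pages, start_page=0)
```

With these three pages, the loop fetches page 1 to complete the sentence and stops before the appendix. The cap on extra pages is a guard that a real agent setup would likely need in some form as well.
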
### Output Format

The tool outputs JSON files with the following structure:
```json
{
  "insights": [
    {
      "type": "fact",
      "insight": "Summary of the insight",
      "content": "Original text that was analyzed",
      "attributes": [
        {"attribute": "source", "value": "Page 1"},
        {"attribute": "confidence", "value": "high"}
      ]
    }
  ]
}
```

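A consumer of this file may want to check the structure before using it. The following is a minimal sketch of such a check using only the standard library; `validate_output` is a hypothetical helper, and the lowercase `"opinion"` and `"comment"` values are assumed by analogy with `"fact"`.

```python
import json

REQUIRED_KEYS = {"type", "insight", "content", "attributes"}
ALLOWED_TYPES = {"fact", "opinion", "comment"}  # last two values are assumed


def validate_output(raw: str) -> list[dict]:
    """Parse the tool's JSON output and check it against the documented shape."""
    data = json.loads(raw)
    insights = data["insights"]
    for index, item in enumerate(insights):
        missing = REQUIRED_KEYS.difference(item)
        if missing:
            raise ValueError(f"insight {index} is missing keys: {sorted(missing)}")
        if item["type"] not in ALLOWED_TYPES:
            raise ValueError(f"insight {index} has unknown type {item['type']!r}")
        for attr in item["attributes"]:
            if set(attr) != {"attribute", "value"}:
                raise ValueError(f"insight {index} has a malformed attribute: {attr}")
    return insights


sample = """
{
  "insights": [
    {
      "type": "fact",
      "insight": "Summary of the insight",
      "content": "Original text that was analyzed",
      "attributes": [
        {"attribute": "source", "value": "Page 1"},
        {"attribute": "confidence", "value": "high"}
      ]
    }
  ]
}
"""
insights = validate_output(sample)
```
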
## Requirements

- Python 3.12+
- OpenRouter API key set as the `OPENROUTER_API_KEY` environment variable
  - Provides access to models from many AI providers (Claude, GPT-4, Gemini, Llama, etc.)
  - Get an API key at https://openrouter.ai/
- Alternatively supports direct provider keys (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.)
- Dependencies managed via uv

## Model Configuration

The tool is configured to use OpenRouter by default, which provides:
- Access to multiple AI providers through a single API
- Automatic fallback and load balancing
- Competitive pricing
- Support for the latest models

When `OPENROUTER_API_KEY` is set, the agent automatically configures the OpenAI-compatible interface with OpenRouter's base URL. Models should be specified in the format `<provider>/<model-name>` (e.g., `anthropic/claude-3.5-sonnet`, `openai/gpt-4o`).

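The selection logic described here could be sketched roughly as below. The function name, the returned dictionary, and the rule that maps a provider prefix to a `<PROVIDER>_API_KEY` variable are assumptions made for illustration; only the `OPENROUTER_API_KEY` behavior and the `<provider>/<model-name>` format come from this document.

```python
import os

OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def resolve_model_config(model: str, env=None) -> dict:
    """Pick connection settings for a `<provider>/<model-name>` string."""
    env = os.environ if env is None else env
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected <provider>/<model-name>, got {model!r}")

    # Preferred path: route everything through OpenRouter's
    # OpenAI-compatible endpoint, keeping the full model id.
    if env.get("OPENROUTER_API_KEY"):
        return {
            "base_url": OPENROUTER_BASE_URL,
            "api_key": env["OPENROUTER_API_KEY"],
            "model": model,
        }

    # Assumed fallback: a direct provider key such as OPENAI_API_KEY.
    key_name = f"{provider.upper().replace('-', '_')}_API_KEY"
    if env.get(key_name):
        return {"base_url": None, "api_key": env[key_name], "model": name}

    raise RuntimeError(f"set OPENROUTER_API_KEY or {key_name}")


config = resolve_model_config(
    "anthropic/claude-3.5-sonnet", env={"OPENROUTER_API_KEY": "sk-example"}
)
```

Passing `env` explicitly keeps the function easy to test; in normal use it reads from `os.environ`.
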
## Format Specification

The project follows the format defined in `../docs/AGENTIC_DOCUMENT_PARSING_FORMAT.md`, which specifies:
- How agents interact with documents
- The structure of insights and their attributes
- The `read_page` tool interface for autonomous page navigation
- Classification system for different insight types