Init

2025-12-19 20:41:08 +01:00
commit b847133df2
15 changed files with 4307 additions and 0 deletions
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,125 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+`pdf-to-kcf` is a Python CLI tool that uses AI agents to parse PDF documents and extract structured insights. It uses `pydantic-ai` to create an intelligent agent that can autonomously decide how much of a document to analyze, requesting additional pages as needed.
+
+## Commands
+
+### Development Setup
+```bash
+# Install dependencies
+uv sync
+
+# Set up OpenRouter API key
+cp .env.example .env
+# Edit .env and add your OPENROUTER_API_KEY
+
+# Run the CLI tool
+uv run pdf-to-kcf <pdf-path>
+
+# Run with options
+uv run pdf-to-kcf <pdf-path> --output custom_output.json --start-page 2 --model anthropic/claude-3-opus
+```
+
+### Running the Tool
+```bash
+# Basic usage (uses Claude 3.5 Sonnet via OpenRouter by default)
+uv run pdf-to-kcf document.pdf
+
+# Specify custom output file
+uv run pdf-to-kcf document.pdf -o insights.json
+
+# Start from a specific page (0-indexed)
+uv run pdf-to-kcf document.pdf -s 3
+
+# Use a different AI model from OpenRouter
+uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
+uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5
+uv run pdf-to-kcf document.pdf -m openai/gpt-4o
+```
+
+## Architecture
+
+### Core Components
+
+**models.py** - Data structures following the agentic document parsing format specification:
+- `ContentInsightType`: Enum for insight classification (FACT, OPINION, COMMENT)
+- `ContentInsightAttribute`: Key-value metadata for insights
+- `ContentInsight`: A single extracted insight with a type, a summary (`insight`), the source text (`content`), and attributes
+- `PageContentAnalysis`: Agent output containing all insights
+- `PageContent`: Context passed to the agent (page number, content, total pages)
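+
+As an illustration of how these structures could fit together, here is a minimal sketch using standard-library dataclasses as a stand-in (the actual `models.py` may well use Pydantic models instead). The JSON field names come from the Output Format section; the `opinion` and `comment` enum values and the `PageContent` field names are assumptions.
+
+```python
+from dataclasses import asdict, dataclass, field
+from enum import Enum
+
+
+class ContentInsightType(str, Enum):
+    FACT = "fact"
+    OPINION = "opinion"  # assumed lowercase, mirroring "fact"
+    COMMENT = "comment"  # assumed lowercase, mirroring "fact"
+
+
+@dataclass
+class ContentInsightAttribute:
+    attribute: str
+    value: str
+
+
+@dataclass
+class ContentInsight:
+    type: ContentInsightType
+    insight: str  # summary of the insight
+    content: str  # original text that was analyzed
+    attributes: list[ContentInsightAttribute] = field(default_factory=list)
+
+
+@dataclass
+class PageContentAnalysis:
+    insights: list[ContentInsight] = field(default_factory=list)
+
+    def to_dict(self) -> dict:
+        """Convert to plain dicts, replacing enum members with their string values."""
+        data = asdict(self)
+        for item in data["insights"]:
+            item["type"] = item["type"].value
+        return data
+
+
+@dataclass
+class PageContent:
+    page_number: int  # hypothetical field name
+    content: str      # hypothetical field name
+    total_pages: int  # hypothetical field name
+```
+
+Calling `json.dumps(analysis.to_dict(), indent=2)` on a populated `PageContentAnalysis` would then produce a document shaped like the one in the Output Format section.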
+
+**pdf_reader.py** - PDF document handling:
+- `PDFDocument`: Wrapper class for reading PDF files using pypdf
+- Provides `get_page_text()` for single page extraction
+- Provides `get_all_pages()` for full document extraction
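+
+A rough sketch of what such a wrapper might look like is shown below. It is not the project's actual code: the constructor signature, the `from_path` helper, the `total_pages` property, and the 0-indexed bounds check are assumptions. Only `PdfReader`, `.pages`, and `extract_text()` are real pypdf calls.
+
+```python
+from typing import Protocol, Sequence
+
+
+class _Page(Protocol):
+    def extract_text(self) -> str: ...
+
+
+class PDFDocument:
+    """Thin wrapper over a sequence of page objects exposing extract_text()."""
+
+    def __init__(self, pages: Sequence[_Page]) -> None:
+        self._pages = pages
+
+    @classmethod
+    def from_path(cls, path: str) -> "PDFDocument":
+        # Imported lazily so the class can be exercised without pypdf installed.
+        from pypdf import PdfReader
+
+        return cls(PdfReader(path).pages)
+
+    @property
+    def total_pages(self) -> int:
+        return len(self._pages)
+
+    def get_page_text(self, page_number: int) -> str:
+        """Return the text of one page (0-indexed)."""
+        if not 0 <= page_number < self.total_pages:
+            raise IndexError(f"page {page_number} out of range")
+        return self._pages[page_number].extract_text() or ""
+
+    def get_all_pages(self) -> list[str]:
+        """Return the text of every page, in order."""
+        return [self.get_page_text(i) for i in range(self.total_pages)]
+```
+
+Accepting any sequence of page-like objects keeps the wrapper easy to test with fakes, while `PDFDocument.from_path("document.pdf")` covers the normal case.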
+
+**agent.py** - AI agent implementation:
+- `DocumentAnalyzer`: Main analyzer using pydantic-ai Agent
+- Configures the AI model and system prompt
+- Implements `read_page` tool that allows the agent to request additional pages autonomously
+- The agent decides when to fetch more pages based on context needs
+- Agent is instructed to classify insights as facts, opinions, or comments with relevant attributes
+
+**cli.py** - Command-line interface:
+- Built with Click framework
+- Handles PDF loading, analysis orchestration, and JSON output
+- Provides user feedback during processing
+
+### Agentic Behavior
+
+The AI agent is autonomous and can:
+1. Start analyzing from an initial page
+2. Determine if more context is needed from other pages
+3. Use the `read_page` tool to fetch additional pages
+4. Extract structured insights with proper classification
+5. Return all insights in the specified JSON format
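+
+The steps above can be sketched without any model at all. In the hypothetical example below, `decide_next` stands in for the model's decision about which page to request next, and `make_read_page_tool` and `collect_context` are illustrative names rather than functions from this repository. Returning a message instead of raising on an out-of-range page is one possible design, chosen here so the agent can recover from a bad request.
+
+```python
+from typing import Callable, Optional
+
+
+def make_read_page_tool(pages: list[str]) -> Callable[[int], str]:
+    """Build a read_page function bound to the given page texts (0-indexed)."""
+
+    def read_page(page_number: int) -> str:
+        if not 0 <= page_number < len(pages):
+            # Report the problem as text so the caller can pick another page.
+            return f"Page {page_number} is out of range; document has {len(pages)} pages."
+        return pages[page_number]
+
+    return read_page
+
+
+def collect_context(
+    pages: list[str],
+    start_page: int,
+    decide_next: Callable[[dict[int, str]], Optional[int]],
+) -> dict[int, str]:
+    """Read the start page, then keep fetching pages until decide_next returns None."""
+    read_page = make_read_page_tool(pages)
+    seen = {start_page: read_page(start_page)}
+    # Stop when no page is requested or when an already-read page is requested again.
+    while (next_page := decide_next(seen)) is not None and next_page not in seen:
+        seen[next_page] = read_page(next_page)
+    return seen
+```
+
+In the real tool the page requests come from the agent through the `read_page` tool; the loop here only mimics that control flow.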
+
+### Output Format
+
+The tool outputs JSON files with the following structure:
+```json
+{
+  "insights": [
+    {
+      "type": "fact",
+      "insight": "Summary of the insight",
+      "content": "Original text that was analyzed",
+      "attributes": [
+        {"attribute": "source", "value": "Page 1"},
+        {"attribute": "confidence", "value": "high"}
+      ]
+    }
+  ]
+}
+```
+
+## Requirements
+
+- Python 3.12+
+- OpenRouter API key set as the `OPENROUTER_API_KEY` environment variable
+  - Provides access to all major AI models (Claude, GPT-4, Gemini, Llama, etc.)
+  - Get API key at https://openrouter.ai/
+- Alternatively, direct provider keys (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.) are supported
+- Dependencies managed via uv
+
+## Model Configuration
+
+The tool is configured to use OpenRouter by default, which provides:
+- Access to multiple AI providers through a single API
+- Automatic fallback and load balancing
+- Competitive pricing
+- Support for the latest models
+
+When `OPENROUTER_API_KEY` is set, the agent automatically configures the OpenAI-compatible interface with OpenRouter's base URL. Models should be specified in the format `<provider>/<model-name>` (e.g., `anthropic/claude-3.5-sonnet`, `openai/gpt-4o`).
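+
+A minimal sketch of that configuration step might look like the following. The function names and the returned dictionary are hypothetical, and only the OpenRouter path is covered (direct provider keys are left out); the base URL is OpenRouter's public OpenAI-compatible endpoint.
+
+```python
+import os
+import re
+from typing import Mapping, Optional
+
+OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
+
+# Exactly one "/" separating a non-empty provider and model name.
+_MODEL_ID = re.compile(r"^[^/\s]+/[^/\s]+$")
+
+
+def is_valid_model_id(model: str) -> bool:
+    """Check that a model name follows the <provider>/<model-name> format."""
+    return bool(_MODEL_ID.match(model))
+
+
+def resolve_client_settings(
+    model: str, env: Optional[Mapping[str, str]] = None
+) -> dict[str, str]:
+    """Return settings for an OpenAI-compatible client pointed at OpenRouter."""
+    env = os.environ if env is None else env
+    if not is_valid_model_id(model):
+        raise ValueError(f"expected <provider>/<model-name>, got {model!r}")
+    api_key = env.get("OPENROUTER_API_KEY")
+    if not api_key:
+        raise RuntimeError("OPENROUTER_API_KEY is not set")
+    return {"base_url": OPENROUTER_BASE_URL, "api_key": api_key, "model": model}
+```
+
+Passing the environment as a parameter keeps the function testable without touching real credentials.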
+
+## Format Specification
+
+The project follows the format defined in `../docs/AGENTIC_DOCUMENT_PARSING_FORMAT.md`, which specifies:
+- How agents interact with documents
+- The structure of insights and their attributes
+- The `read_page` tool interface for autonomous page navigation
+- Classification system for different insight types