neutrino2211 b847133df2 Init
2025-12-19 20:41:08 +01:00

pdf-to-kcf

A Python CLI tool that uses AI agents to parse PDF documents and extract structured insights. Built with pydantic-ai, this tool creates an intelligent agent that autonomously analyzes documents, requesting additional pages as needed to form complete insights.

Features

  • Autonomous Document Analysis: AI agent decides how much of the document to read
  • Structured Insight Extraction: Classifies content as facts, opinions, or comments
  • Rich Metadata: Adds attributes like source, confidence, dates, and more
  • Multiple AI Models: Supports any model available through OpenRouter, including Claude, GPT-4, Gemini, and Llama
  • JSON Output: Exports insights in a structured, machine-readable format

Installation

This project uses uv for dependency management:

# Install dependencies
uv sync

Setup

  1. Copy the environment template:
cp .env.example .env
  2. Add your OpenRouter API key to .env:
OPENROUTER_API_KEY=your_openrouter_api_key_here
  3. Get your API key from OpenRouter (free tier available)

Usage

# Basic usage (uses OpenRouter with Claude 3.5 Sonnet by default)
uv run pdf-to-kcf document.pdf

# Specify custom output file
uv run pdf-to-kcf document.pdf -o insights.json

# Start from a specific page (0-indexed)
uv run pdf-to-kcf document.pdf -s 3

# Use a different AI model from OpenRouter
uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5

Options

  • --output, -o: Output JSON file path (default: <pdf_name>_insights.json)
  • --start-page, -s: Starting page number, 0-indexed (default: 0)
  • --model, -m: AI model to use via OpenRouter (default: anthropic/claude-3.5-sonnet)
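
The options above could be wired up roughly as follows. This is a minimal sketch using the standard library's argparse, not the project's actual cli.py; the function names `build_parser` and `parse_args` are hypothetical, and placing the default output file next to the input PDF is an assumption.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="pdf-to-kcf")
    parser.add_argument("pdf", type=Path, help="PDF document to analyze")
    parser.add_argument("-o", "--output", type=Path, default=None,
                        help="Output JSON file path")
    parser.add_argument("-s", "--start-page", type=int, default=0,
                        help="Starting page number, 0-indexed")
    parser.add_argument("-m", "--model", default="anthropic/claude-3.5-sonnet",
                        help="AI model to use via OpenRouter")
    return parser


def parse_args(argv: list[str]) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    if args.output is None:
        # Assumption: the default <pdf_name>_insights.json sits beside the PDF.
        args.output = args.pdf.with_name(f"{args.pdf.stem}_insights.json")
    return args


args = parse_args(["document.pdf", "-s", "3"])
print(args.output, args.start_page, args.model)
```

Resolving the default output path after parsing keeps the `<pdf_name>` placeholder dependent on the positional argument, which argparse defaults alone cannot express.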

Available Models

When using OpenRouter, you can specify any model using the format <provider>/<model-name>:

  • anthropic/claude-3.5-sonnet (default, recommended)
  • anthropic/claude-3-opus
  • openai/gpt-4o
  • meta-llama/llama-3.1-70b-instruct
  • google/gemini-pro-1.5
  • See OpenRouter models for the full list

Output Format

The tool generates JSON files with structured insights:

{
  "insights": [
    {
      "type": "fact",
      "insight": "Global temperatures have risen 1.1°C since pre-industrial times",
      "content": "According to the IPCC, global temperatures have risen approximately 1.1°C...",
      "attributes": [
        {"attribute": "source", "value": "IPCC Report"},
        {"attribute": "confidence", "value": "high"},
        {"attribute": "year", "value": "2023"}
      ]
    },
    {
      "type": "opinion",
      "insight": "The author believes immediate action is required",
      "content": "We must act now to prevent catastrophic consequences...",
      "attributes": [
        {"attribute": "sentiment", "value": "urgent"},
        {"attribute": "section", "value": "conclusion"}
      ]
    }
  ]
}
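
A downstream script might consume this output as sketched below. The snippet uses only the field names shown in the example above (`insights`, `type`, `insight`, `attributes`, `attribute`, `value`); the helper name `group_by_type` and the sample data are illustrative and not part of the tool.

```python
import json
from collections import defaultdict


def group_by_type(raw_json: str) -> dict[str, list[dict]]:
    """Group insights by their "type" and flatten each attribute list into a dict."""
    grouped = defaultdict(list)
    for item in json.loads(raw_json)["insights"]:
        attrs = {a["attribute"]: a["value"] for a in item.get("attributes", [])}
        grouped[item["type"]].append({"insight": item["insight"], "attributes": attrs})
    return dict(grouped)


# Illustrative sample in the documented shape (not real tool output).
sample = """
{"insights": [
  {"type": "fact", "insight": "Example fact", "content": "...",
   "attributes": [{"attribute": "source", "value": "IPCC Report"}]},
  {"type": "opinion", "insight": "Example opinion", "content": "...",
   "attributes": []}
]}
"""

grouped = group_by_type(sample)
for kind, items in grouped.items():
    print(kind, [i["insight"] for i in items])
```

Flattening the attribute list into a dictionary makes lookups such as `attributes["source"]` direct, at the cost of keeping only the last value if an attribute name repeats.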

How It Works

  1. PDF Loading: Extracts text content from PDF using pypdf
  2. Agent Initialization: Creates a pydantic-ai agent with the specified model
  3. Autonomous Analysis: Agent analyzes content and can request additional pages
  4. Insight Extraction: Classifies and structures insights with metadata
  5. JSON Export: Saves all insights to a JSON file
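
The page-requesting behavior in step 3 can be illustrated with a small loop. This is a sketch under stated assumptions, not the project's agent.py: the `AnalysisStep` type, `run_analysis`, and `stub_analyze` are hypothetical, and a simple stub stands in for the pydantic-ai agent so the example runs without an API key.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AnalysisStep:
    """Hypothetical result of analyzing the text accumulated so far."""
    insights: list[str] = field(default_factory=list)
    needs_more_pages: bool = False


def run_analysis(pages: list[str],
                 analyze: Callable[[str], AnalysisStep],
                 start_page: int = 0) -> list[str]:
    """Feed pages to `analyze`, accumulating text while it asks for more."""
    insights: list[str] = []
    buffer = ""
    pending: Optional[AnalysisStep] = None
    for page_text in pages[start_page:]:
        buffer += page_text + "\n"
        step = analyze(buffer)
        if step.needs_more_pages:
            pending = step  # keep the context and read the next page
            continue
        insights.extend(step.insights)
        buffer, pending = "", None
    if pending is not None:
        # Ran out of pages: keep whatever the last step produced.
        insights.extend(pending.insights)
    return insights


def stub_analyze(text: str) -> AnalysisStep:
    """Stand-in for the model: asks for more pages if the text ends mid-sentence."""
    complete = text.rstrip().endswith(".")
    return AnalysisStep(insights=[text.strip().replace("\n", " ")],
                        needs_more_pages=not complete)


pages = ["Temperatures have risen", "by 1.1 degrees.", "Action is required."]
print(run_analysis(pages, stub_analyze))
```

In the real tool the decision to read further is made by the model; the stub's end-of-sentence check only demonstrates the control flow.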

Requirements

  • Python 3.12 or higher
  • OpenRouter API key (set as OPENROUTER_API_KEY environment variable)
    • Get your free API key at OpenRouter
    • Supports all major AI models (Claude, GPT-4, Gemini, Llama, etc.)
  • Alternatively, use OPENAI_API_KEY, ANTHROPIC_API_KEY, or other provider keys
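
A startup check for these keys might look like the sketch below. The helper `resolve_api_key` and the precedence order are assumptions made for illustration; the README does not say which key the tool prefers when several are set.

```python
import os
from typing import Mapping, Optional

# Assumed precedence: OpenRouter first, then the provider-specific keys.
KEY_PRECEDENCE = ("OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def resolve_api_key(env: Mapping[str, str] = os.environ) -> Optional[tuple[str, str]]:
    """Return (variable name, value) for the first non-empty key, or None."""
    for name in KEY_PRECEDENCE:
        value = env.get(name, "").strip()
        if value:
            return name, value
    return None


found = resolve_api_key()
print("Using", found[0] if found else "no API key found")
```

Passing the environment as a parameter keeps the function easy to test with a plain dictionary.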

Architecture

The tool follows the agentic document parsing format with these core components:

  • models.py: Data structures (ContentInsight, PageContentAnalysis, etc.)
  • pdf_reader.py: PDF text extraction (PDFDocument class)
  • agent.py: AI agent with autonomous page reading capability
  • cli.py: Command-line interface

See CLAUDE.md for detailed architecture documentation.

License

MIT
