qrk/pdf-to-kcf


Latest commit b847133df2 ("Init") by neutrino2211, 2025-12-19 20:41:08 +01:00


pdf-to-kcf

A Python CLI tool that uses AI agents to parse PDF documents and extract structured insights. Built with pydantic-ai, this tool creates an intelligent agent that autonomously analyzes documents, requesting additional pages as needed to form complete insights.

Features

Autonomous Document Analysis: AI agent decides how much of the document to read
Structured Insight Extraction: Classifies content as facts, opinions, or comments
Rich Metadata: Adds attributes like source, confidence, dates, and more
Multiple AI Models: Works with any model available through OpenRouter, including Anthropic, OpenAI, Google, and Meta models
JSON Output: Exports insights in a structured, machine-readable format

Installation

This project uses uv for dependency management:

# Install dependencies
uv sync

Setup

Copy the environment template:

cp .env.example .env

Add your OpenRouter API key to .env:

OPENROUTER_API_KEY=your_openrouter_api_key_here

Get your API key from OpenRouter (free tier available)

Usage

# Basic usage (uses OpenRouter with Claude 3.5 Sonnet by default)
uv run pdf-to-kcf document.pdf

# Specify custom output file
uv run pdf-to-kcf document.pdf -o insights.json

# Start from a specific page (0-indexed)
uv run pdf-to-kcf document.pdf -s 3

# Use a different AI model from OpenRouter
uv run pdf-to-kcf document.pdf -m meta-llama/llama-3.1-70b-instruct
uv run pdf-to-kcf document.pdf -m google/gemini-pro-1.5

Options

--output, -o: Output JSON file path (default: <pdf_name>_insights.json)
--start-page, -s: Starting page number, 0-indexed (default: 0)
--model, -m: AI model to use via OpenRouter (default: anthropic/claude-3.5-sonnet)

Available Models

When using OpenRouter, you can specify any model using the format <provider>/<model-name>:

anthropic/claude-3.5-sonnet (default, recommended)
anthropic/claude-3-opus
openai/gpt-4o
meta-llama/llama-3.1-70b-instruct
google/gemini-pro-1.5
See the OpenRouter models page for the full list
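A model id can be checked against the `<provider>/<model-name>` format before any API call is made. `split_model_id` is a hypothetical helper, not part of the documented CLI; it only checks the shape of the id, not whether OpenRouter actually serves that model.

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Hypothetical helper: split an id such as 'anthropic/claude-3.5-sonnet'."""
    provider, sep, model_name = model_id.partition("/")
    # Both halves must be present and separated by a slash.
    if not sep or not provider or not model_name:
        raise ValueError(f"expected <provider>/<model-name>, got {model_id!r}")
    return provider, model_name
```

For example, `split_model_id("meta-llama/llama-3.1-70b-instruct")` returns `("meta-llama", "llama-3.1-70b-instruct")`.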

Output Format

The tool generates JSON files with structured insights:

{
  "insights": [
    {
      "type": "fact",
      "insight": "Global temperatures have risen 1.1°C since pre-industrial times",
      "content": "According to the IPCC, global temperatures have risen approximately 1.1°C...",
      "attributes": [
        {"attribute": "source", "value": "IPCC Report"},
        {"attribute": "confidence", "value": "high"},
        {"attribute": "year", "value": "2023"}
      ]
    },
    {
      "type": "opinion",
      "insight": "The author believes immediate action is required",
      "content": "We must act now to prevent catastrophic consequences...",
      "attributes": [
        {"attribute": "sentiment", "value": "urgent"},
        {"attribute": "section", "value": "conclusion"}
      ]
    }
  ]
}
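The sample above maps onto a small set of typed records. This sketch uses standard-library dataclasses so it runs without extra dependencies; the class names are hypothetical and are not the ContentInsight/PageContentAnalysis models from models.py, whose definitions are not shown here. The lowercase `"comment"` type value is an assumption based on the Features list; the sample only shows `"fact"` and `"opinion"`.

```python
import json
from dataclasses import dataclass, field
from typing import Literal, get_args

# Assumed set of type values: "fact" and "opinion" appear in the sample, "comment" is inferred.
InsightType = Literal["fact", "opinion", "comment"]


@dataclass
class Attribute:
    attribute: str  # e.g. "source", "confidence", "year"
    value: str


@dataclass
class Insight:
    type: InsightType
    insight: str  # one-sentence summary of the point
    content: str  # supporting excerpt from the document
    attributes: list[Attribute] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject anything outside the three documented categories.
        if self.type not in get_args(InsightType):
            raise ValueError(f"unknown insight type: {self.type!r}")


def load_insights(text: str) -> list[Insight]:
    """Parse the tool's JSON output into typed objects."""
    data = json.loads(text)
    return [
        Insight(
            type=item["type"],
            insight=item["insight"],
            content=item["content"],
            attributes=[Attribute(**attr) for attr in item.get("attributes", [])],
        )
        for item in data["insights"]
    ]
```

Calling `load_insights(Path("document_insights.json").read_text())` on a generated file would give a list of `Insight` objects you can filter by `type`.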

How It Works

PDF Loading: Extracts text content from the PDF using pypdf
Agent Initialization: Creates a pydantic-ai agent with the specified model
Autonomous Analysis: Agent analyzes content and can request additional pages
Insight Extraction: Classifies and structures insights with metadata
JSON Export: Saves all insights to a JSON file

Requirements

Python 3.12 or higher
OpenRouter API key (set as OPENROUTER_API_KEY environment variable)
- Get your free API key at OpenRouter
- Supports all major AI models (Claude, GPT-4, Gemini, Llama, etc.)
Alternatively, use OPENAI_API_KEY, ANTHROPIC_API_KEY, or other provider keys

Architecture

The tool follows an agentic document-parsing design built from these core components:

models.py: Data structures (ContentInsight, PageContentAnalysis, etc.)
pdf_reader.py: PDF text extraction (PDFDocument class)
agent.py: AI agent with autonomous page reading capability
cli.py: Command-line interface

See CLAUDE.md for detailed architecture documentation.
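A minimal sketch of a page store in the spirit of pdf_reader.py's PDFDocument. The real class's interface is not documented in this README, so the method names here are assumptions; `from_file` uses pypdf's `PdfReader` and `extract_text()`, imported lazily so the rest of the class can be exercised without pypdf installed.

```python
from dataclasses import dataclass


@dataclass
class PDFDocument:
    """Hypothetical sketch: holds one extracted text string per page."""
    pages: list[str]

    @classmethod
    def from_file(cls, path: str) -> "PDFDocument":
        from pypdf import PdfReader  # lazy import: only needed when reading a real file
        reader = PdfReader(path)
        # Guard against pages with no extractable text (e.g. scanned images).
        return cls([page.extract_text() or "" for page in reader.pages])

    @property
    def page_count(self) -> int:
        return len(self.pages)

    def get_page(self, index: int) -> str:
        """Return the text of a 0-indexed page, matching the --start-page convention."""
        if not 0 <= index < self.page_count:
            raise IndexError(f"page {index} out of range (0..{self.page_count - 1})")
        return self.pages[index]
```

Keeping extraction separate from page access means the agent side can be tested with plain strings, as in `PDFDocument(["first page", "second page"])`.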

License

MIT
