Clean up that MetaDataMess
.gitignore Initial commit 2025-02-19 21:35:10 +00:00
LICENSE Initial commit 2025-02-19 21:35:10 +00:00
pdf_processor.py Debugging & Log Handling added 2025-02-19 21:49:18 +00:00
README.md README.md updated 2025-02-19 21:51:34 +00:00

pdf-mass-cleanuptools v5


Requirements:

  • Python packages: pip install pdf2image anthropic
  • Poppler (used by pdf2image to render PDF pages), on Debian/Ubuntu: sudo apt-get install poppler-utils
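A quick way to confirm both requirements are in place before a long batch run is sketched below. It is not part of pdf_processor.py: the function name `missing_dependencies` is hypothetical, and the check assumes pdf2image needs Poppler's `pdftoppm` and `pdfinfo` executables on the PATH.

```python
import importlib.util
import shutil


def missing_dependencies(modules=("pdf2image", "anthropic"),
                         executables=("pdftoppm", "pdfinfo")):
    """Return a list describing every required module or executable that is missing."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported
        if importlib.util.find_spec(name) is None:
            missing.append(f"python module: {name}")
    for exe in executables:
        # shutil.which returns None when the program is not on the PATH
        if shutil.which(exe) is None:
            missing.append(f"executable: {exe}")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("All requirements found." if not problems else "\n".join(problems))
```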

Before running, set your Anthropic API key:

export ANTHROPIC_API_KEY='your-api-key-here'
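The Anthropic Python SDK reads `ANTHROPIC_API_KEY` from the environment when a client is created without an explicit key. A script can fail fast with a clear message if the variable is missing or still holds the placeholder. The helper below (`get_api_key`, hypothetical) is a minimal sketch of that check, not code taken from pdf_processor.py.

```python
import os

PLACEHOLDER = "your-api-key-here"  # the example value shown in this README


def get_api_key(env=None):
    """Return the API key from the environment, or exit with a hint if it is unusable."""
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        # SystemExit with a string prints the message and exits with status 1
        raise SystemExit("ANTHROPIC_API_KEY is not set; run: export ANTHROPIC_API_KEY='...'")
    return key
```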

Using the tool

Basic usage

python pdf_processor.py -i /path/to/pdfs -o /path/to/output
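The options used in the commands of this section (`-i`, `-o`, `--test`, `--pattern`, `--no-cleanup`) could be declared with argparse roughly as follows. This is a sketch under assumptions: the destination names (`input_dir`, `output_dir`, `cleanup`) and the `*.pdf` default for `--pattern` are guesses, not read from pdf_processor.py.

```python
import argparse


def build_parser():
    """Declare the command-line options shown in the usage examples (assumed layout)."""
    parser = argparse.ArgumentParser(description="Bulk PDF metadata cleanup (sketch).")
    parser.add_argument("-i", dest="input_dir", required=True,
                        help="directory containing the input PDFs")
    parser.add_argument("-o", dest="output_dir", required=True,
                        help="directory for the processed output")
    parser.add_argument("--test", action="store_true",
                        help="process only a single file")
    parser.add_argument("--pattern", default="*.pdf",  # assumed default
                        help="glob pattern selecting which files to process")
    # --no-cleanup flips cleanup from True to False
    parser.add_argument("--no-cleanup", dest="cleanup", action="store_false",
                        help="keep temporary files for inspection")
    return parser


args = build_parser().parse_args(["-i", "/path/to/pdfs", "-o", "/path/to/output", "--test"])
print(args)
```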

Test with a single file

python pdf_processor.py -i /path/to/pdfs -o /path/to/output --test
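One plausible reading of `--test` is "sort the matching files and process only the first one". A minimal sketch of that selection with pathlib, assuming a hypothetical `select_files` helper and a default pattern of `*.pdf`:

```python
from pathlib import Path


def select_files(input_dir, pattern="*.pdf", test=False):
    """List matching files in sorted order; in test mode keep only the first one."""
    files = sorted(p for p in Path(input_dir).glob(pattern) if p.is_file())
    return files[:1] if test else files
```

Sorting makes the test run pick the same file every time, which keeps repeated test runs comparable.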

Process only files matching a pattern

python pdf_processor.py -i /path/to/pdfs -o /path/to/output --pattern "magazine_*.pdf"
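The pattern is quoted so the shell passes `magazine_*.pdf` through literally instead of expanding it against the current directory. Assuming the script applies it as a shell-style glob, the sketch below shows which file names such a pattern selects. `filter_names` is a hypothetical helper; `fnmatchcase` is case-sensitive, so `Magazine_02.pdf` and `magazine_03.PDF` do not match.

```python
from fnmatch import fnmatchcase


def filter_names(names, pattern):
    """Keep only the names matching a shell-style glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


names = ["magazine_01.pdf", "Magazine_02.pdf", "magazine_03.PDF", "report.pdf"]
print(filter_names(names, "magazine_*.pdf"))  # ['magazine_01.pdf']
```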

Keep temporary files for inspection

python pdf_processor.py -i /path/to/pdfs -o /path/to/output --no-cleanup
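Assuming the intermediate files are page images rendered by pdf2image into a scratch directory, `--no-cleanup` amounts to skipping the final delete. A minimal sketch with a hypothetical `work_dir` context manager:

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def work_dir(cleanup=True):
    """Yield a fresh scratch directory; delete it afterwards unless cleanup is False."""
    path = Path(tempfile.mkdtemp(prefix="pdf_processor_"))
    try:
        yield path
    finally:
        if cleanup:
            shutil.rmtree(path, ignore_errors=True)
        else:
            print(f"Keeping temporary files in {path}")
```

Because the delete sits in `finally`, the scratch directory is also removed when processing raises an error, unless cleanup was disabled.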