Clean up that MetaDataMess
Latest commit 09e0f8e39b by sebastian: "V6: New instructions / functions" (2025-02-19 21:57:03 +00:00)

Files:
  • .gitignore (Initial commit, 2025-02-19 21:35:10 +00:00)
  • LICENSE (Initial commit, 2025-02-19 21:35:10 +00:00)
  • metadata_writer.py (metadata_writer.py added, 2025-02-19 21:53:01 +00:00)
  • pdf_processor.py (V6 - with metadata connector, 2025-02-19 21:55:20 +00:00)
  • README.md (V6, 2025-02-19 21:57:03 +00:00)

pdf-mass-cleanuptools v6


Requirements:

  • Python packages: pip install pdf2image anthropic tqdm PyPDF2
  • Poppler utilities (used by pdf2image), e.g. on Debian/Ubuntu: sudo apt-get install poppler-utils
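To check the requirements before a long run, a small script along these lines can report what is missing. This is a sketch, not part of the repository: the helper name `missing_dependencies` is hypothetical, and it assumes the Poppler tools that pdf2image relies on are `pdftoppm` and `pdfinfo`.

```python
import importlib.util
import shutil

# Import names of the Python packages from the pip line above.
REQUIRED_MODULES = ["pdf2image", "anthropic", "tqdm", "PyPDF2"]
# Poppler command-line tools that pdf2image calls under the hood.
REQUIRED_BINARIES = ["pdftoppm", "pdfinfo"]


def missing_dependencies(modules=REQUIRED_MODULES, binaries=REQUIRED_BINARIES):
    """Return a list describing every dependency that cannot be found."""
    missing = []
    for name in modules:
        # find_spec returns None when the module is not importable.
        if importlib.util.find_spec(name) is None:
            missing.append(f"python module: {name}")
    for exe in binaries:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(exe) is None:
            missing.append(f"executable: {exe}")
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    if problems:
        print("Missing dependencies:")
        for item in problems:
            print("  -", item)
    else:
        print("All dependencies found.")
```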

Before running, set your Anthropic API key:

export ANTHROPIC_API_KEY='your-api-key-here'
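The official `anthropic` Python client reads `ANTHROPIC_API_KEY` from the environment by default. A fail-fast check like the one below (a hypothetical helper, `require_api_key`, not taken from pdf_processor.py) stops the run with a clear message instead of failing on the first API call.

```python
import os
import sys


def require_api_key(var_name="ANTHROPIC_API_KEY"):
    """Return the API key from the environment, or exit with a helpful message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"{var_name} is not set. Run: export {var_name}='your-api-key-here'")
    return key
```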

Using the tool

Basic usage

python pdf_processor.py -i /path/to/pdfs -o /path/to/output
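The options used throughout this README can be summarized as an `argparse` parser. This is a hypothetical reconstruction from the README alone, not the actual code in pdf_processor.py: the `dest` names (`input_dir`, `output_dir`) and the `*.pdf` default for `--pattern` are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of the documented CLI; the real
    # pdf_processor.py may define these options differently.
    parser = argparse.ArgumentParser(description="Clean up PDF metadata in bulk.")
    parser.add_argument("-i", dest="input_dir", required=True, help="directory containing the PDFs")
    parser.add_argument("-o", dest="output_dir", required=True, help="directory for the results")
    parser.add_argument("--test", action="store_true", help="process a single file only")
    parser.add_argument("--pattern", default="*.pdf", help="glob pattern for input files (assumed default)")
    parser.add_argument("--no-cleanup", action="store_true", help="keep temporary files")
    parser.add_argument("--write-metadata", action="store_true", help="write metadata into the PDFs")
    parser.add_argument("--no-backup", action="store_true", help="skip backups when writing metadata")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```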

Test with a single file

python pdf_processor.py -i /path/to/pdfs -o /path/to/output --test
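Note that `--test` still takes the whole input directory. One plausible way to implement such a test mode is to select the matching files in a stable order and keep only the first one; the function `select_pdfs` below is a hypothetical sketch of that idea, not the tool's actual selection logic.

```python
from pathlib import Path


def select_pdfs(input_dir, pattern="*.pdf", test=False):
    """Return the sorted file paths to process; only the first one in test mode."""
    # Sorting makes the choice of the single test file deterministic.
    files = sorted(p for p in Path(input_dir).glob(pattern) if p.is_file())
    return files[:1] if test else files
```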

Process only files matching a pattern

python pdf_processor.py -i /path/to/pdfs -o /path/to/output --pattern "magazine_*.pdf"
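The quotes around "magazine_*.pdf" matter: they stop the shell from expanding the wildcard itself, so the pattern reaches the program unchanged. The snippet below shows how Python's shell-style matching treats such a pattern. How pdf_processor.py matches files internally is not documented here, so treat this as an illustration of `fnmatch` semantics, not of the tool; `matching_names` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def matching_names(names, pattern):
    """Return the names that match a shell-style pattern (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]


names = ["magazine_01.pdf", "magazine_02.pdf", "Magazine_03.PDF", "report.pdf"]
print(matching_names(names, "magazine_*.pdf"))
# ['magazine_01.pdf', 'magazine_02.pdf']
```

With case-sensitive matching, "Magazine_03.PDF" is skipped; on Linux, `pathlib` globbing behaves the same way.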

Keep temporary files for inspection

python pdf_processor.py -i /path/to/pdfs -o /path/to/output --no-cleanup
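A common way to support a flag like `--no-cleanup` is a temporary working directory that is deleted on exit unless the user asks to keep it. The context manager below is a hypothetical sketch (`work_dir` and the `pdf_cleanup_` prefix are made up); it assumes the tool writes intermediate files such as rendered page images, e.g. via pdf2image's `convert_from_path(..., output_folder=tmp)`.

```python
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def work_dir(keep=False, prefix="pdf_cleanup_"):
    """Yield a fresh temporary directory; remove it afterwards unless keep=True."""
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    finally:
        if keep:
            print(f"Keeping temporary files in {path}")
        else:
            shutil.rmtree(path, ignore_errors=True)
```

Usage would look like `with work_dir(keep=args.no_cleanup) as tmp: ...`, so the same code path serves both modes.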

Write metadata into the PDFs

python pdf_processor.py -i /path/to/pdfs -o /path/to/output --write-metadata
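For reference, writing document metadata with PyPDF2 (3.x) looks roughly like this. It is a sketch, not the code in metadata_writer.py: the field names and the helpers `to_info_dict` and `write_metadata` are hypothetical, while the `/Title`, `/Author`, `/Subject` and `/Keywords` keys are standard entries of the PDF document information dictionary. This sketch writes to a new file and does not carry over existing Info entries.

```python
# Standard keys of the PDF document information dictionary.
INFO_KEYS = {
    "title": "/Title",
    "author": "/Author",
    "subject": "/Subject",
    "keywords": "/Keywords",
}


def to_info_dict(fields):
    """Map plain field names to PDF Info keys, skipping unknown or empty values."""
    info = {}
    for name, value in fields.items():
        if name not in INFO_KEYS or not value:
            continue
        # Join list values (e.g. keywords) into one comma-separated string.
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        info[INFO_KEYS[name]] = str(value)
    return info


def write_metadata(src, dst, fields):
    """Copy the pages of src into dst and set new metadata (PyPDF2 3.x)."""
    # Imported lazily so to_info_dict works without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    writer.add_metadata(to_info_dict(fields))
    with open(dst, "wb") as fh:
        writer.write(fh)
```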

Write metadata and skip backups (if you dare)

python pdf_processor.py -i /path/to/pdfs -o /path/to/output --write-metadata --no-backup
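What skipping backups gives up can be seen in a typical backup-before-write step. The naming scheme (`.bak` next to the original) and both helpers are assumptions for illustration; the README does not say where or how the tool stores its backups.

```python
import shutil
from pathlib import Path


def backup_file(path, suffix=".bak"):
    """Copy path to path + suffix, preserving timestamps, and return the copy's path."""
    src = Path(path)
    dst = src.with_name(src.name + suffix)
    shutil.copy2(src, dst)
    return dst


def prepare_for_write(path, no_backup=False):
    # Hypothetical guard: back up the file unless --no-backup was given.
    return None if no_backup else backup_file(path)
```

With `no_backup=True` nothing is copied, so a faulty metadata write cannot be undone.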