
pdf-mass-cleanuptools v6

Clean up that MetaDataMess

Needs:

  • pip install pdf2image anthropic tqdm PyPDF2 rich
  • sudo apt-get install poppler-utils

Before running, set your API key: export ANTHROPIC_API_KEY='your-api-key-here'
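The prerequisites above can be checked before a long batch run. The following is a minimal sketch, not part of the repository: the `preflight` helper and its messages are hypothetical. It assumes the import names match the pip package names and relies on the fact that pdf2image calls poppler's `pdftoppm` binary.

```python
import importlib.util
import os
import shutil

# Import names of the packages from the pip line above (assumed to match).
REQUIRED_MODULES = ["pdf2image", "anthropic", "tqdm", "PyPDF2", "rich"]


def preflight(env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_MODULES:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python package: {name}")
    # pdf2image shells out to pdftoppm, which poppler-utils provides.
    if shutil.which("pdftoppm") is None:
        problems.append("pdftoppm not found on PATH (install poppler-utils)")
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("-", issue)
    print("ready" if not issues else f"{len(issues)} problem(s) found")
```

Returning a list instead of exiting on the first failure shows every missing piece in one pass.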

Using the main tool

Basic usage

python pdf_processor.py -i /path/to/pdfs -o /path/to/output

Test with a single file

python pdf_processor.py -i /path/to/pdfs -o /path/to/output --test

Process only files matching a pattern

python pdf_processor.py -i /path/to/pdfs -o /path/to/output --pattern "magazine_*.pdf"
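To see which files a pattern such as "magazine_*.pdf" would select, here is a small sketch using shell-style wildcard matching from the standard library. It assumes that `--pattern` uses glob-style matching, which this README does not state; `select_files` is a hypothetical helper, not a function from `pdf_processor.py`.

```python
from fnmatch import fnmatchcase


def select_files(names, pattern="*.pdf"):
    """Return the names that match a shell-style wildcard pattern, sorted."""
    # fnmatchcase is case-sensitive on every platform, so results are stable.
    return sorted(name for name in names if fnmatchcase(name, pattern))


names = ["magazine_01.pdf", "magazine_02.pdf", "report.pdf", "magazine_notes.txt"]
print(select_files(names, "magazine_*.pdf"))
# prints ['magazine_01.pdf', 'magazine_02.pdf']
```

Quoting the pattern on the command line, as in the example above, keeps the shell from expanding the wildcard before the script receives it.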

Keep temporary files for inspection

python pdf_processor.py -i /path/to/pdfs -o /path/to/output --no-cleanup

Write metadata to the PDFs

python pdf_processor.py -i /path/to/pdfs -o /path/to/output --write-metadata

Write metadata and skip backups, if you dare

python pdf_processor.py -i /path/to/pdfs -o /path/to/output --write-metadata --no-backup
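If you do run with --no-backup, making your own copies first is cheap insurance. The sketch below is one possible way to do that with the standard library. It does not reproduce the tool's own backup scheme, which this README does not describe; `backup_pdf` and the `.bak` suffix are hypothetical.

```python
import shutil
from pathlib import Path


def backup_pdf(pdf_path, suffix=".bak"):
    """Copy a file next to itself (e.g. a.pdf -> a.pdf.bak) and return the copy."""
    src = Path(pdf_path)
    dst = src.with_name(src.name + suffix)
    # Refuse to overwrite: the first backup holds the untouched original.
    if dst.exists():
        raise FileExistsError(f"backup already exists: {dst}")
    # copy2 copies the contents and preserves timestamps where possible.
    shutil.copy2(src, dst)
    return dst
```

Refusing to overwrite an existing backup means a second run cannot replace the pristine copy with an already modified file.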

Reviewing the metadata

Review only and save changes to a new JSON file

python metadata_reviewer.py results/processing_results.json
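Before starting a review, it can help to glance at what the results file contains. The layout of `processing_results.json` is not documented here, so this sketch makes no assumptions about its schema and only reports the top-level shape; `summarize_json` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Describe the top-level shape of a JSON file without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        # JSON object keys are always strings, so they sort safely.
        return f"object with {len(data)} keys: {sorted(data)[:10]}"
    if isinstance(data, list):
        return f"array with {len(data)} entries"
    return f"scalar of type {type(data).__name__}"


# Example call (path taken from the command above):
# print(summarize_json("results/processing_results.json"))
```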

Review and write changes back to PDFs

python metadata_reviewer.py results/processing_results.json --write

Enable debug logging

python metadata_reviewer.py results/processing_results.json --debug
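For readers who have not seen how a flag like --debug is usually wired up, here is a generic sketch with `argparse` and `logging`. This is not the source of `metadata_reviewer.py`: only the flag names and the positional argument come from the commands above, and the rest is an assumed, typical arrangement.

```python
import argparse
import logging


def parse_args(argv=None):
    """Parse a reviewer-style command line (illustrative, not the real CLI)."""
    parser = argparse.ArgumentParser(description="hypothetical reviewer CLI sketch")
    parser.add_argument("results_json", help="path to the results JSON file")
    parser.add_argument("--write", action="store_true", help="write changes to PDFs")
    parser.add_argument("--debug", action="store_true", help="enable debug logging")
    return parser.parse_args(argv)


def log_level(args):
    """Map the --debug switch to a logging level."""
    return logging.DEBUG if args.debug else logging.INFO


args = parse_args(["results/processing_results.json", "--debug"])
logging.basicConfig(level=log_level(args))
# This message is only emitted because --debug lowered the threshold to DEBUG.
logging.getLogger("metadata_reviewer").debug("debug logging is on")
```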