pdf-mass-cleanuptools v6
Clean up that MetaDataMess
Needs:
- pip install pdf2image anthropic tqdm PyPDF2 rich
- sudo apt-get install poppler-utils (Debian/Ubuntu; pdf2image needs Poppler)
Before running, set your API key:
export ANTHROPIC_API_KEY='your-api-key-here'
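Before a long batch run it can help to check both prerequisites up front. The sketch below is a hypothetical helper (check_prerequisites is not part of the repository). It relies on only two facts: the anthropic Python client reads ANTHROPIC_API_KEY from the environment by default, and pdf2image calls Poppler's pdftoppm binary, which poppler-utils installs.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def check_prerequisites(
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of problems; an empty list means ready to run (hypothetical helper)."""
    env = os.environ if env is None else env
    problems = []
    # The anthropic client picks up the key from ANTHROPIC_API_KEY by default.
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    # pdf2image shells out to Poppler's pdftoppm, installed by poppler-utils.
    if which("pdftoppm") is None:
        problems.append("pdftoppm not found on PATH (install poppler-utils)")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```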
Using the main tool
Basic usage
python pdf_processor.py -i /path/to/pdfs -o /path/to/output
Test with a single file
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --test
Process only files matching a pattern
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --pattern "magazine_*.pdf"
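--pattern takes a shell-style glob. Assuming the tool matches it against file names directly inside the input directory (non-recursive) with *.pdf as the default, which are all assumptions to check against pdf_processor.py, file selection reduces to something like this hypothetical select_pdfs:

```python
import tempfile
from pathlib import Path


def select_pdfs(input_dir: str, pattern: str = "*.pdf") -> list[Path]:
    """Return files directly in input_dir that match a glob, sorted (hypothetical helper)."""
    # Path.glob uses shell-style wildcards: * matches any run of characters,
    # ? matches one character, [...] matches one character from a set.
    return sorted(p for p in Path(input_dir).glob(pattern) if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["magazine_01.pdf", "magazine_02.pdf", "invoice.pdf", "notes.txt"]:
            (Path(tmp) / name).touch()
        print([p.name for p in select_pdfs(tmp, "magazine_*.pdf")])
        # prints ['magazine_01.pdf', 'magazine_02.pdf']
```

Quote the pattern on the command line, as in the example above, so the shell passes magazine_*.pdf through unexpanded instead of globbing it against the current directory.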
Keep temporary files for inspection
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --no-cleanup
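pdf2image renders PDF pages to images, so the temporary files are most likely page images. A minimal sketch of how a --no-cleanup switch can be honoured, assuming the tool keeps its intermediates in one scratch directory (the working_dir helper and the pdf_cleanup_ prefix are hypothetical):

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def working_dir(keep: bool = False) -> Iterator[Path]:
    """Yield a scratch directory; delete it on exit unless keep is True."""
    path = Path(tempfile.mkdtemp(prefix="pdf_cleanup_"))
    try:
        yield path
    finally:
        if keep:
            # --no-cleanup: leave the files in place and say where they are.
            print(f"Keeping temporary files in {path}")
        else:
            shutil.rmtree(path, ignore_errors=True)
```

In a script this would be used as `with working_dir(keep=args.no_cleanup) as tmp:`, since argparse stores a --no-cleanup flag under the attribute name no_cleanup.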
Also write the metadata to the PDFs
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --write-metadata
Write the metadata and skip backups, if you dare
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --write-metadata --no-backup
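Because writing metadata means rewriting the PDF file, a backup is the simplest way to undo a bad run. The sketch below shows a backup-then-write order that --write-metadata and --no-backup could follow, under assumptions: the write_with_backup helper and the .pdf.bak naming are hypothetical, and write_fn stands in for whatever actually writes the metadata (with PyPDF2 that would typically involve PdfReader, PdfWriter and PdfWriter.add_metadata).

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def write_with_backup(
    pdf_path: Path,
    write_fn: Callable[[Path], None],
    backup: bool = True,
) -> Optional[Path]:
    """Copy pdf_path to a .bak file (unless backup is False), then call write_fn."""
    backup_path = None
    if backup:
        # Hypothetical naming: report.pdf -> report.pdf.bak next to the original.
        backup_path = pdf_path.with_name(pdf_path.name + ".bak")
        shutil.copy2(pdf_path, backup_path)  # copy2 also preserves timestamps
    write_fn(pdf_path)  # e.g. rewrite the file with new metadata
    return backup_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pdf = Path(tmp) / "report.pdf"
        pdf.write_bytes(b"%PDF-original")
        bak = write_with_backup(pdf, lambda p: p.write_bytes(b"%PDF-updated"))
        print(bak.read_bytes(), pdf.read_bytes())  # b'%PDF-original' b'%PDF-updated'
```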
Reviewing the metadata
Review only, saving changes to a new JSON file
python metadata_reviewer.py results/processing_results.json
Review and write changes back to PDFs
python metadata_reviewer.py results/processing_results.json --write
Enable debug logging
python metadata_reviewer.py results/processing_results.json --debug
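The three reviewer invocations above differ only in two switches. One plausible way to wire such a CLI is argparse with store_true flags, mapping --debug onto the logging level. This is a hypothetical reconstruction, not the actual source of metadata_reviewer.py; the positional name results_json and the INFO default level are assumptions.

```python
import argparse
import logging


def build_reviewer_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the documented metadata_reviewer.py options."""
    parser = argparse.ArgumentParser(prog="metadata_reviewer.py")
    parser.add_argument("results_json", help="e.g. results/processing_results.json")
    parser.add_argument("--write", action="store_true",
                        help="write reviewed metadata back to the PDFs")
    parser.add_argument("--debug", action="store_true",
                        help="enable debug logging")
    return parser


def configure_logging(debug: bool) -> int:
    """Set the root logger level from the --debug flag and return that level."""
    level = logging.DEBUG if debug else logging.INFO
    # force=True (Python 3.8+) replaces any handlers configured earlier.
    logging.basicConfig(level=level, format="%(levelname)s %(message)s", force=True)
    return level
```

A main function would then call `args = build_reviewer_parser().parse_args()` followed by `configure_logging(args.debug)`.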