Clean up that MetaDataMess
pdf-mass-cleanuptools v5
Needs:
- pip install pdf2image anthropic tqdm
- sudo apt-get install poppler-utils
Before running, set your API key:
export ANTHROPIC_API_KEY='your-api-key-here'
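A quick way to confirm the setup above before a long batch run is a small preflight check. This is only a sketch and not part of the tool: the function name `preflight` is made up for illustration. It relies on the fact that pdf2image calls poppler's `pdftoppm` and `pdfinfo` binaries, which poppler-utils provides.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of setup problems; an empty list means ready to run.

    Hypothetical helper for illustration, not part of pdf_processor.py.
    """
    env = os.environ if env is None else env
    problems = []

    # The anthropic client reads its key from this environment variable.
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")

    # pdf2image shells out to these poppler binaries.
    for tool in ("pdftoppm", "pdfinfo"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH (install poppler-utils)")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("missing:", problem)
```

Run it once in the same shell where you exported the key; no output means both requirements were found.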
Using the tool
Basic usage
python pdf_processor.py -i /path/to/pdfs -o /path/to/output
Test with a single file
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --test
Process only files matching a pattern
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --pattern "magazine_*.pdf"
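To preview which files a pattern would pick up before starting a run, something like the following can help. It assumes the pattern is a shell-style glob matched against file names directly inside the input directory; how pdf_processor.py actually applies `--pattern` may differ, and `matching_pdfs` is a hypothetical name.

```python
import fnmatch
from pathlib import Path


def matching_pdfs(input_dir, pattern="*.pdf"):
    """Return sorted paths of files in input_dir whose name matches the glob.

    Assumption: non-recursive, name-only matching (illustrative only).
    """
    return sorted(
        path
        for path in Path(input_dir).iterdir()
        if path.is_file() and fnmatch.fnmatch(path.name, pattern)
    )


if __name__ == "__main__":
    for path in matching_pdfs("/path/to/pdfs", "magazine_*.pdf"):
        print(path.name)
```

Quote the pattern on the command line, as in the example above, so the shell does not expand it before the script sees it.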
Keep temporary files for inspection
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --no-cleanup
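For reference, the flags used above could be declared with argparse roughly as follows. This is a sketch of one possible layout, not the actual code in pdf_processor.py: the attribute names (`input_dir`, `output_dir`, `cleanup`) and the `*.pdf` default are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags shown in this README (illustrative)."""
    parser = argparse.ArgumentParser(prog="pdf_processor.py")
    parser.add_argument("-i", dest="input_dir", required=True,
                        help="directory containing the PDFs")
    parser.add_argument("-o", dest="output_dir", required=True,
                        help="directory for the processed output")
    parser.add_argument("--test", action="store_true",
                        help="process a single file only")
    parser.add_argument("--pattern", default="*.pdf",  # assumed default
                        help="glob selecting which files to process")
    # --no-cleanup flips cleanup off so temporary files are kept.
    parser.add_argument("--no-cleanup", dest="cleanup", action="store_false",
                        help="keep temporary files for inspection")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Modelling `--no-cleanup` as a `store_false` flag keeps cleanup on by default, which matches the README describing it as an opt-in for inspection.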