# pdf-mass-cleanuptools v6

Clean up that MetaDataMess

## Needs:

+ pip install pdf2image anthropic tqdm PyPDF2 rich
+ sudo apt-get install poppler-utils

Before running: export ANTHROPIC_API_KEY='your-api-key-here'
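
The prerequisites above can be checked before a long run. The following is a minimal sketch, not part of this repository: `preflight` is a hypothetical helper name, and it assumes the only external requirements are the `ANTHROPIC_API_KEY` variable and poppler's `pdftoppm` binary (the program `pdf2image` calls, shipped in `poppler-utils`).

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of problems; an empty list means the setup looks ready.

    Hypothetical helper, not part of pdf_processor.py.
    """
    env = os.environ if env is None else env
    problems = []
    # The README asks for this variable to be exported before running.
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    # pdf2image shells out to poppler's pdftoppm (from poppler-utils).
    if which("pdftoppm") is None:
        problems.append("pdftoppm not found on PATH (install poppler-utils)")
    return problems
```

Calling `preflight()` with no arguments inspects the real environment; the `env` and `which` parameters exist only so the check can be exercised without touching the system.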
## Using the main tool

### Basic usage
python pdf_processor.py -i /path/to/pdfs -o /path/to/output

### Test with a single file
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --test

### Process files matching a pattern
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --pattern "magazine_*.pdf"
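
To see which files a pattern would select before processing anything, a preview along these lines may help. It is a sketch under one assumption: that `--pattern` is interpreted as a shell-style glob. This README does not say how `pdf_processor.py` matches names, and `preview_matches` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def preview_matches(filenames, pattern):
    """Return the names that a shell-style glob would select, sorted.

    Assumption: --pattern behaves like a case-sensitive glob; the real
    tool may match differently.
    """
    return sorted(name for name in filenames if fnmatchcase(name, pattern))


names = ["magazine_01.pdf", "notes.pdf", "magazine_02.PDF", "magazine_03.pdf"]
selected = preview_matches(names, "magazine_*.pdf")
print(selected)
```

Under this assumption `magazine_02.PDF` is not selected, because the match is case-sensitive. For a real folder, pass `os.listdir("/path/to/pdfs")` as `filenames`.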
### Keep temporary files for inspection
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --no-cleanup

### Write metadata
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --write-metadata

### Write metadata and skip backups, if you dare
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --write-metadata --no-backup
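
If backups are skipped, it may be worth keeping a copy of the PDFs yourself first. The sketch below is one way to do that with the standard library. `backup_pdfs` is a hypothetical helper, not part of this tool, and which directory actually holds the files that get rewritten is an assumption you should verify for your setup.

```python
import shutil
from pathlib import Path


def backup_pdfs(src_dir, backup_dir):
    """Copy every *.pdf directly inside src_dir into backup_dir.

    Hypothetical helper; returns the sorted list of copied file names.
    Subdirectories are not visited.
    """
    src, dst = Path(src_dir), Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for pdf in sorted(src.glob("*.pdf")):
        # copy2 copies the contents and also tries to preserve timestamps.
        shutil.copy2(pdf, dst / pdf.name)
        copied.append(pdf.name)
    return copied
```

For example, `backup_pdfs("/path/to/pdfs", "/path/to/pdfs_backup")` before running with `--no-backup`.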
## Reviewing the metadata

### Just review and save changes to a new JSON file
python metadata_reviewer.py results/processing_results.json

### Review and write changes back to PDFs
python metadata_reviewer.py results/processing_results.json --write

### Enable debug logging
python metadata_reviewer.py results/processing_results.json --debug
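
Before a review session it can be useful to glance at what the results file contains. The layout of `results/processing_results.json` is not documented here, so this sketch deliberately assumes nothing about its schema and only reports the top-level shape. `summarize` is a hypothetical helper.

```python
import json


def summarize(data):
    """Describe the top-level shape of parsed JSON without assuming a schema."""
    if isinstance(data, dict):
        keys = ", ".join(sorted(str(key) for key in data))
        return f"object with keys: {keys}"
    if isinstance(data, list):
        return f"list of {len(data)} item(s)"
    return f"{type(data).__name__} value"


# Parsing an inline example; for the real file, use json.load on an open handle.
example = json.loads('{"b": 1, "a": 2}')
print(summarize(example))
```

For the real file: open `results/processing_results.json`, pass the handle to `json.load`, and hand the result to `summarize`.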