# pdf-mass-cleanuptools v6
Clean up that MetaDataMess
## Needs
+ `pip install pdf2image anthropic tqdm PyPDF2 rich`
+ `sudo apt-get install poppler-utils` (Debian/Ubuntu; pdf2image relies on poppler to render pages)
+ Before running: `export ANTHROPIC_API_KEY='your-api-key-here'`
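
A quick pre-flight check can save you from a batch run that dies halfway through. The sketch below is a hypothetical helper, not part of this repo. It assumes the import names match the pip package names above, and it checks for poppler's `pdfinfo` and `pdftoppm`, the programs pdf2image calls by default to count and rasterize pages. Run it with the same Python interpreter you use for `pdf_processor.py`.

```python
import importlib.util
import os
import shutil

# Import names of the pip packages listed under "Needs" (assumed to match the package names).
REQUIRED_MODULES = ["pdf2image", "anthropic", "tqdm", "PyPDF2", "rich"]
# poppler-utils programs that pdf2image calls to count pages and rasterize them.
REQUIRED_BINARIES = ["pdfinfo", "pdftoppm"]


def missing_prerequisites(env=None):
    """Return a list of problems found; an empty list means everything was found."""
    env = os.environ if env is None else env
    problems = []
    for module in REQUIRED_MODULES:
        # find_spec returns None when the top-level package cannot be imported.
        if importlib.util.find_spec(module) is None:
            problems.append(f"Python package not importable: {module}")
    for binary in REQUIRED_BINARIES:
        if shutil.which(binary) is None:
            problems.append(f"poppler-utils program not on PATH: {binary}")
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    for issue in issues:
        print("MISSING:", issue)
    print("Ready to run." if not issues else f"{len(issues)} problem(s) found.")
```

Saved as, say, `check_setup.py` (a made-up name), it runs with `python check_setup.py` and lists anything still missing.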
## Using the main tool
### Basic usage
```bash
python pdf_processor.py -i /path/to/pdfs -o /path/to/output
```
### Test with a single file
```bash
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --test
```
### Process files matching a pattern
```bash
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --pattern "magazine_*.pdf"
```
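
Keep the quotes around the pattern: unquoted, your shell may expand `magazine_*.pdf` against the current directory before the script ever sees it. To preview which files a pattern would pick up before spending API calls, here is a minimal sketch. It assumes `--pattern` is matched shell-glob style against the file names directly inside the input folder (the `magazine_*.pdf` example suggests this, but check `pdf_processor.py`); `filter_names` and `preview_pattern` are hypothetical helpers, and matching here is case-sensitive, which the real tool may not be.

```python
import sys
from fnmatch import fnmatchcase
from pathlib import Path


def filter_names(names, pattern):
    """Return the names matching a shell-style glob pattern, sorted (case-sensitive)."""
    return sorted(name for name in names if fnmatchcase(name, pattern))


def preview_pattern(input_dir, pattern="*.pdf"):
    """List the files directly inside input_dir (not subfolders) whose names match pattern."""
    files = [p.name for p in Path(input_dir).iterdir() if p.is_file()]
    return filter_names(files, pattern)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python preview_pattern.py /path/to/pdfs "magazine_*.pdf"
    chosen = sys.argv[2] if len(sys.argv) > 2 else "*.pdf"
    for name in preview_pattern(sys.argv[1], chosen):
        print(name)
```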
### Keep temporary files for inspection
```bash
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --no-cleanup
```
### Write metadata to the PDFs
```bash
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --write-metadata
```
### Write metadata and skip backups (if you dare)
```bash
python pdf_processor.py -i /path/to/pdfs -o /path/to/output --write-metadata --no-backup
```
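
If you do skip the built-in backups, taking a copy of your own first is cheap insurance (from the shell, `cp -a /path/to/pdfs /path/to/pdfs-backup` does the job). A minimal Python sketch of the same idea: `backup_pdfs` is a hypothetical helper, not part of this repo, and it assumes the files that `--write-metadata` modifies are the `.pdf` files sitting directly in the input folder.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_pdfs(input_dir, backup_root=None):
    """Copy every .pdf directly inside input_dir into a new timestamped folder.

    The folder is created under backup_root, or next to input_dir if none is given.
    Returns the path of the new backup folder.
    """
    src = Path(input_dir).resolve()
    root = Path(backup_root) if backup_root is not None else src.parent
    dest = root / f"{src.name}-backup-{datetime.now():%Y%m%d-%H%M%S}"
    # Raises FileExistsError rather than mixing two backups into one folder.
    dest.mkdir(parents=True)
    for pdf in sorted(src.iterdir()):
        # Case-insensitive suffix check so files named *.PDF are backed up too.
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            # copy2 also keeps the file's timestamps and permission bits.
            shutil.copy2(pdf, dest / pdf.name)
    return dest
```

Call `backup_pdfs("/path/to/pdfs")` right before the `--no-backup` run; two calls within the same second would hit the same folder name and fail on purpose.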
## Reviewing the metadata
### Just review and save changes to a new JSON file
```bash
python metadata_reviewer.py results/processing_results.json
```
### Review and write changes back to PDFs
```bash
python metadata_reviewer.py results/processing_results.json --write
```
### Enable debug logging
```bash
python metadata_reviewer.py results/processing_results.json --debug
```