- New lib/storage.mjs: async S3 backup on every queue/log save
- Versioned S3 bucket (claw-apply-data) keeps every revision
- Auto-restore from S3 if local file is missing or corrupt
- saveQueue/saveLog now validate data type before writing
(prevents the exact bug that corrupted the queue)
- IAM role attached to EC2 instance for credential-free S3 access
- Config: storage.type = "local" (default) or "s3"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Secondary processes (standalone classifier, ad-hoc scripts) now write
to queue_updates.jsonl via writePendingUpdate() instead of modifying
jobs_queue.json directly. Primary processes pick up updates on next
loadQueue() call using atomic rename-then-read.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- addJobs: allows same job on multiple tracks (dedup key = track::id)
- Cross-track copies get composite id (job.id_track) to avoid batch collisions
- dedupeAfterFilter(): after collect, keeps highest-scored copy per URL, marks rest as 'duplicate'
- Called automatically at end of collect phase
- Remove unused readFileSync import from job_applier.mjs
- Remove unused makeJobId (dead code, nothing imports it)
- setup.mjs: use shared loadConfig instead of inline cfg()
- queue.mjs: add in-memory cache for queue and log to avoid
redundant disk reads during a single run
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add loadConfig() helper with clear errors for missing/malformed JSON
- Replace raw JSON.parse(readFileSync(...)) in both entry points
- Track retry_count on jobs; re-queue as 'new' up to max_retries (default 2)
- Add max_retries and DEFAULT_MAX_RETRIES constant
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>