storage.mjs is now a single interface: loadJSON() and saveJSON()
route to either local disk or S3 based on settings.storage.type.
The app never touches disk/S3 directly.
- All queue/log functions are now async (saveQueue, appendLog, etc.)
- All callers updated with await
- Data validation prevents saving corrupt types (strings, nulls)
- S3 versioned bucket preserves every write
- Config: storage.type = "local" (disk) or "s3" (S3 primary)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- New lib/storage.mjs: async S3 backup on every queue/log save
- Versioned S3 bucket (claw-apply-data) keeps every revision
- Auto-restore from S3 if local file is missing or corrupt
- saveQueue/saveLog now validate data type before writing
(prevents the exact bug that corrupted the queue)
- IAM role attached to EC2 instance for credential-free S3 access
- Config: storage.type = "local" (default) or "s3"
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Secondary processes (standalone classifier, ad-hoc scripts) now write
to queue_updates.jsonl via writePendingUpdate() instead of modifying
jobs_queue.json directly. Primary processes pick up updates on next
loadQueue() call using atomic rename-then-read.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- addJobs: allows same job on multiple tracks (dedup key = track::id)
- Cross-track copies get composite id (job.id_track) to avoid batch collisions
- dedupeAfterFilter(): after collect, keeps highest-scored copy per URL, marks rest as 'duplicate'
- Called automatically at end of collect phase
- Remove unused readFileSync import from job_applier.mjs
- Remove unused makeJobId (dead code, nothing imports it)
- setup.mjs: use shared loadConfig instead of inline cfg()
- queue.mjs: add in-memory cache for queue and log to avoid
redundant disk reads during a single run
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add loadConfig() helper with clear errors for missing/malformed JSON
- Replace raw JSON.parse(readFileSync(...)) in both entry points
- Track retry_count on jobs; re-queue as 'new' up to max_retries (default 2)
- Add max_retries and DEFAULT_MAX_RETRIES constant
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>