matthew/claw-apply - claw-apply - pq.io

Author	SHA1	Message	Date
Matthew Jackson	7e1bce924e	Remove separate job profile files: filter uses search config + profile.json. Search config already defines what each track is looking for (keywords, exclude_keywords, salary_min, remote). Profile.json defines who the candidate is. No need for a third file duplicating both. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 09:26:47 -08:00
Matthew Jackson	f14af48905	Route all file I/O through storage layer (S3 or disk) - filter.mjs: loadProfile now async, uses loadJSON - telegram_answers.mjs: answers read/write through storage layer - status.mjs: uses initQueue + loadQueue for S3 support - setup.mjs: await all loadConfig calls - storage.mjs: more robust getS3Key using URL parsing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-07 08:25:22 -08:00
Matthew Jackson	3ecabeea63	Fix: await loadConfig for settings.json (async function returns Promise) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 22:49:46 -08:00
Matthew Jackson	ba5cbedcf4	Make loadConfig async and route through storage layer (S3 or disk) - loadConfig now uses loadJSON when storage is initialized - Fix getS3Key to handle config/ and data/ paths (not just data/) - All loadConfig calls updated to await - settings.json still bootstraps from disk (needed to know storage type) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 22:47:07 -08:00
Matthew Jackson	534d318953	Make S3 the primary storage layer (not backup). storage.mjs is now a single interface: loadJSON() and saveJSON() route to either local disk or S3 based on settings.storage.type. The app never touches disk/S3 directly. - All queue/log functions are now async (saveQueue, appendLog, etc.) - All callers updated with await - Data validation prevents saving corrupt types (strings, nulls) - S3 versioned bucket preserves every write - Config: storage.type = "local" (disk) or "s3" (S3 primary) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 22:03:16 -08:00
Matthew Jackson	253d1888e9	Add S3-backed storage to prevent data loss - New lib/storage.mjs: async S3 backup on every queue/log save - Versioned S3 bucket (claw-apply-data) keeps every revision - Auto-restore from S3 if local file is missing or corrupt - saveQueue/saveLog now validate data type before writing (prevents the exact bug that corrupted the queue) - IAM role attached to EC2 instance for credential-free S3 access - Config: storage.type = "local" (default) or "s3" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 21:56:37 -08:00
Matthew Jackson	73887318b2	Fix saveQueue() called without argument in job_filter.mjs. Iterate over the full queue array instead of getJobsByStatus() results, and pass it to saveQueue(). The previous code passed no argument, which would corrupt the save or make it fail silently. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 19:50:27 -08:00
Matthew Jackson	a629291722	Auto-clear stale batch markers in filter before submitting. When a batch completes but scores aren't written back (collection error), jobs get stuck with filter_batch_id set and are never re-submitted. Now checks: if no filter_state.json exists (no batch in flight) but jobs have batch markers without scores, clear them so they get re-submitted on the next run. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 16:41:08 -08:00
Matthew Jackson	87cfce8eca	Filter: stop spamming Telegram on submit, collect+submit in one run - Removed Telegram notification on batch submit (only notify on collect when results are ready) - After collecting, immediately submit remaining unscored jobs in the same run instead of waiting for next cron cycle Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 13:39:19 -08:00
Matthew Jackson	c99ea10585	Richer search and filter summaries. Search: show per-track breakdown (found/added per track name). Filter: show top 5 scoring jobs with score, title, company and cost. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 13:34:21 -08:00
Matthew Jackson	4419363b3c	Fix process exit: use process.exit() directly instead of logStream.end callback. The logStream.end() callback wasn't firing reliably, leaving processes hanging. process.exit() is synchronous and forces exit regardless of open handles. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 12:21:55 -08:00
Matthew Jackson	d43e2025b2	Fix process not exiting after run, detect closed job listings - All entry points with log tee now call logStream.end() + process.exit() (log stream kept event loop alive, blocking next cron run) - easy_apply: detect "no longer accepting applications" and similar closed listing text before reporting as unsupported Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 12:19:00 -08:00
Matthew Jackson	51ca354c52	Audit fixes: remove dead code, fix run timeout bug, add log tee to all entry points - Remove unused APPLY_PRIORITY array (replaced by score-based sort) - Fix run timeout only breaking inner loop — now breaks outer platform loop too - Remove dead lastProgress variable in easy_apply step loop - Add stdout/stderr log tee to job_searcher, job_filter, telegram_poller Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 12:13:01 -08:00
Matthew Jackson	b1528ac0ad	refactor: extract magic numbers to constants, fix audit issues - Centralize all magic numbers/strings in lib/constants.mjs - Fix double-replaced import names in filter.mjs - Consolidate duplicate fs imports in job_applier/job_searcher - Remove empty JSDoc block in job_searcher - Update keywords.mjs model from claude-3-haiku to claude-haiku-4-5 - Extract Anthropic API URLs to constants - Convert :has-text() selectors to page.locator() API - Fix SIGTERM handler conflict — move partial-run notification into lock.onShutdown - Remove unused exports (LOCAL_USER_AGENT, DEFAULT_REVIEW_WINDOW_MINUTES) - Fix variable shadowing (b -> v) in job_filter reduce callback - Replace SKILL.md PM2 references with system cron Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 08:45:17 -08:00
Claw	37b95b6b85	feat: track token usage and estimated cost per filter run in filter_runs.json	2026-03-06 16:22:14 +00:00
Claw	c9b527c83a	feat: find-all → filter → dedup flow - addJobs: allows same job on multiple tracks (dedup key = track::id) - Cross-track copies get composite id (job.id_track) to avoid batch collisions - dedupeAfterFilter(): after collect, keeps highest-scored copy per URL, marks rest as 'duplicate' - Called automatically at end of collect phase	2026-03-06 15:55:00 +00:00
Claw	c88a71fc20	feat: one batch per track — separate GTM/AE batches with their own system prompts - submitBatch → submitBatches: groups jobs by track, submits one batch each - filter_state.json now stores batches[] array instead of single batch_id - Collect waits for all batches to finish before processing - Each track gets its own cached system prompt = better caching + cleaner scoring - Idempotent collect: skips already-scored jobs	2026-03-06 11:35:15 +00:00
Claw	85c88f9eed	fix: make filter idempotent - skip already-scored jobs on collect, exclude by filter_score on submit	2026-03-06 11:25:19 +00:00
Claw	56eb645e73	fix: import saveQueue statically instead of dynamic import; was causing queue writes to silently fail	2026-03-06 11:22:09 +00:00
Claw	64748d5889	fix: stamp filter_batch_id on submitted jobs; exclude already-submitted/filtered from resubmit - Submit phase now excludes jobs with filter_batch_id set (already in a batch) - After submitting, stamps each job with filter_batch_id = batchId - Filtered jobs already excluded by status='filtered' - Prevents duplicate submissions when batch errors cause state to clear without scores	2026-03-06 11:13:10 +00:00
Claw	85038b6ce1	fix: batch collect O(n²) → single queue write; correct model to claude-3-haiku-20240307 - updateJobStatus was called 4,652 times causing ~4,652 file reads/writes - Now loads queue once, applies all updates in memory, saves once - Model was using OpenClaw alias (sonnet-4-6) not native Anthropic ID - Only claude-3-haiku-20240307 is available on this API key; update settings.example.json	2026-03-06 10:56:54 +00:00
Claw	728e0773b9	fix: sanitize Unicode surrogates in job descriptions, handle custom_id > 64 chars	2026-03-06 10:18:54 +00:00
Claw	d610060dbb	feat: persistent run history logs for searcher and filter - search_runs.json: append-only history of every searcher run (started_at, finished, added, seen, platforms, lookback_days) - search_progress_last.json: snapshot of final progress state after each completed run — answers 'what keywords/tracks were searched?' - filter_runs.json: append-only history of every filter batch (batch_id, submitted/collected timestamps, model, passed/filtered/errors) Fixes the 'did the 90-day run complete?' ambiguity going forward	2026-03-06 10:16:06 +00:00
Claw	dbe9967713	feat: rewrite filter to use Anthropic Batch API - Batch API = 50% cost savings vs synchronous calls - Prompt caching on system prompt (profile + criteria shared across all jobs) - One request per job with custom_id = job ID for result matching - Two-phase state machine: submit → poll/collect (hourly cron safe) - filter_state.json tracks pending batch ID between runs - Model configurable via settings.filter.model (default: claude-sonnet-4-6) - Telegram notifications on submit + collect - Errors pass through — never block applications due to filter failure - --stats flag for queue overview	2026-03-06 10:12:47 +00:00
Claw	9bf904dada	feat: AI job filter — score jobs 0-10 with Claude Haiku before applying - lib/filter.mjs: batch scoring engine (10 jobs/call, Claude Haiku) - job_filter.mjs: standalone CLI with --dry-run and --stats flags - Threshold configurable globally + per-search in search_config.json (filter_min_score, default 5) - Job profiles (gtm/ae) passed as context via settings.filter.job_profiles - Filtered jobs get status='filtered' with filter_score + filter_reason - Filter errors pass jobs through (never block applications) - status.mjs: added 'AI filtered' line to report	2026-03-06 10:01:15 +00:00
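
Two of the queue behaviours described in the log above (commit c9b527c83a, "keeps highest-scored copy per URL, marks rest as 'duplicate'", and commit a629291722, clearing batch markers that have no scores when no batch is in flight) can be illustrated with a minimal sketch. This is not the repository's code: the job shape ({ id, url, status, filter_score, filter_batch_id }), the clearStaleBatchMarkers name, the batchInFlight parameter, and the tie-breaking rule are assumptions made for illustration. Only the dedupeAfterFilter name and the field names filter_score and filter_batch_id come from the commit messages.

```javascript
// Sketch only. Assumed job shape (hypothetical):
//   { id, url, status, filter_score?, filter_batch_id? }

// After collect: keep the highest-scored copy per URL and mark the
// other scored copies as 'duplicate'. Returns how many jobs were newly marked.
// Assumption: on equal scores, the copy that appears first in the queue wins.
function dedupeAfterFilter(queue) {
  const bestByUrl = new Map();
  for (const job of queue) {
    if (typeof job.filter_score !== 'number') continue; // unscored: leave alone
    const best = bestByUrl.get(job.url);
    if (!best || job.filter_score > best.filter_score) {
      bestByUrl.set(job.url, job);
    }
  }

  let marked = 0;
  for (const job of queue) {
    if (typeof job.filter_score !== 'number') continue;
    if (bestByUrl.get(job.url) !== job && job.status !== 'duplicate') {
      job.status = 'duplicate';
      marked++;
    }
  }
  return marked;
}

// Before submit: if no batch is in flight, any job that carries a batch
// marker but has no score is stuck, so drop the marker and let the next
// submit pick it up again. Returns how many markers were cleared.
// Assumption: the caller derives batchInFlight from whether a state file
// such as filter_state.json exists.
function clearStaleBatchMarkers(queue, batchInFlight) {
  if (batchInFlight) return 0;
  let cleared = 0;
  for (const job of queue) {
    if (job.filter_batch_id && typeof job.filter_score !== 'number') {
      delete job.filter_batch_id;
      cleared++;
    }
  }
  return cleared;
}
```

Both functions mutate the queue array in memory and return a count, so a caller can load the queue once, apply the changes, and save once, in line with the single-write approach described in commit 85038b6ce1. Running either function a second time on the same queue changes nothing, which fits the idempotency goal stated in commit 85c88f9eed.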