Commit Graph

25 Commits

Author SHA1 Message Date
7e1bce924e Remove separate job profile files — filter uses search config + profile.json
Search config already defines what each track is looking for (keywords,
exclude_keywords, salary_min, remote). Profile.json defines who the
candidate is. No need for a third file duplicating both.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 09:26:47 -08:00
f14af48905 Route all file I/O through storage layer (S3 or disk)
- filter.mjs: loadProfile now async, uses loadJSON
- telegram_answers.mjs: answers read/write through storage layer
- status.mjs: uses initQueue + loadQueue for S3 support
- setup.mjs: await all loadConfig calls
- storage.mjs: more robust getS3Key using URL parsing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 08:25:22 -08:00
3ecabeea63 Fix: await loadConfig for settings.json (async function returns Promise)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 22:49:46 -08:00
ba5cbedcf4 Make loadConfig async and route through storage layer (S3 or disk)
- loadConfig now uses loadJSON when storage is initialized
- Fix getS3Key to handle config/ and data/ paths (not just data/)
- All loadConfig calls updated to await
- settings.json still bootstraps from disk (needed to know storage type)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 22:47:07 -08:00
534d318953 Make S3 the primary storage layer (not backup)
storage.mjs is now a single interface: loadJSON() and saveJSON()
route to either local disk or S3 based on settings.storage.type.
The app never touches disk/S3 directly.

- All queue/log functions are now async (saveQueue, appendLog, etc.)
- All callers updated with await
- Data validation prevents saving corrupt types (strings, nulls)
- S3 versioned bucket preserves every write
- Config: storage.type = "local" (disk) or "s3" (S3 primary)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 22:03:16 -08:00
253d1888e9 Add S3-backed storage to prevent data loss
- New lib/storage.mjs: async S3 backup on every queue/log save
- Versioned S3 bucket (claw-apply-data) keeps every revision
- Auto-restore from S3 if local file is missing or corrupt
- saveQueue/saveLog now validate data type before writing
  (prevents the exact bug that corrupted the queue)
- IAM role attached to EC2 instance for credential-free S3 access
- Config: storage.type = "local" (default) or "s3"

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 21:56:37 -08:00
73887318b2 Fix saveQueue() called without argument in job_filter.mjs
Iterate over the full queue array instead of getJobsByStatus() results,
and pass it to saveQueue(). The previous code passed no argument, which
would corrupt or silently fail the save.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 19:50:27 -08:00
a629291722 Auto-clear stale batch markers in filter before submitting
When a batch completes but scores aren't written back (collection
error), jobs get stuck with filter_batch_id set and never re-submitted.
Now checks: if no filter_state.json exists (no batch in flight) but
jobs have batch markers without scores, clear them so they get
re-submitted on the next run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 16:41:08 -08:00
87cfce8eca Filter: stop spamming Telegram on submit, collect+submit in one run
- Removed Telegram notification on batch submit (only notify on collect
  when results are ready)
- After collecting, immediately submit remaining unscored jobs in the
  same run instead of waiting for next cron cycle

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 13:39:19 -08:00
c99ea10585 Richer search and filter summaries
Search: show per-track breakdown (found/added per track name)
Filter: show top 5 scoring jobs with score, title, company and cost

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 13:34:21 -08:00
4419363b3c Fix process exit: use process.exit() directly instead of logStream.end callback
logStream.end() callback wasn't firing reliably, leaving processes hanging.
process.exit() is synchronous and forces exit regardless of open handles.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 12:21:55 -08:00
d43e2025b2 Fix process not exiting after run, detect closed job listings
- All entry points with log tee now call logStream.end() + process.exit()
  (log stream kept event loop alive, blocking next cron run)
- easy_apply: detect "no longer accepting applications" and similar closed
  listing text before reporting as unsupported

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 12:19:00 -08:00
51ca354c52 Audit fixes: remove dead code, fix run timeout bug, add log tee to all entry points
- Remove unused APPLY_PRIORITY array (replaced by score-based sort)
- Fix run timeout only breaking inner loop — now breaks outer platform loop too
- Remove dead lastProgress variable in easy_apply step loop
- Add stdout/stderr log tee to job_searcher, job_filter, telegram_poller

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 12:13:01 -08:00
b1528ac0ad refactor: extract magic numbers to constants, fix audit issues
- Centralize all magic numbers/strings in lib/constants.mjs
- Fix double-replaced import names in filter.mjs
- Consolidate duplicate fs imports in job_applier/job_searcher
- Remove empty JSDoc block in job_searcher
- Update keywords.mjs model from claude-3-haiku to claude-haiku-4-5
- Extract Anthropic API URLs to constants
- Convert :has-text() selectors to page.locator() API
- Fix SIGTERM handler conflict — move partial-run notification into lock.onShutdown
- Remove unused exports (LOCAL_USER_AGENT, DEFAULT_REVIEW_WINDOW_MINUTES)
- Fix variable shadowing (b -> v) in job_filter reduce callback
- Replace SKILL.md PM2 references with system cron

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 08:45:17 -08:00
37b95b6b85 feat: track token usage and estimated cost per filter run in filter_runs.json 2026-03-06 16:22:14 +00:00
c9b527c83a feat: find-all → filter → dedup flow
- addJobs: allows same job on multiple tracks (dedup key = track::id)
- Cross-track copies get composite id (job.id_track) to avoid batch collisions
- dedupeAfterFilter(): after collect, keeps highest-scored copy per URL, marks rest as 'duplicate'
- Called automatically at end of collect phase
2026-03-06 15:55:00 +00:00
c88a71fc20 feat: one batch per track — separate GTM/AE batches with their own system prompts
- submitBatch → submitBatches: groups jobs by track, submits one batch each
- filter_state.json now stores batches[] array instead of single batch_id
- Collect waits for all batches to finish before processing
- Each track gets its own cached system prompt = better caching + cleaner scoring
- Idempotent collect: skips already-scored jobs
2026-03-06 11:35:15 +00:00
85c88f9eed fix: make filter idempotent - skip already-scored jobs on collect, exclude by filter_score on submit 2026-03-06 11:25:19 +00:00
56eb645e73 fix: import saveQueue statically instead of dynamic import; was causing queue writes to silently fail 2026-03-06 11:22:09 +00:00
64748d5889 fix: stamp filter_batch_id on submitted jobs; exclude already-submitted/filtered from resubmit
- Submit phase now excludes jobs with filter_batch_id set (already in a batch)
- After submitting, stamps each job with filter_batch_id = batchId
- Filtered jobs already excluded by status='filtered'
- Prevents duplicate submissions when batch errors cause state to clear without scores
2026-03-06 11:13:10 +00:00
85038b6ce1 fix: batch collect O(n²) → single queue write; correct model to claude-3-haiku-20240307
- updateJobStatus was called 4,652 times causing ~4,652 file reads/writes
- Now loads queue once, applies all updates in memory, saves once
- Model was using OpenClaw alias (sonnet-4-6) not native Anthropic ID
- Only claude-3-haiku-20240307 is available on this API key; update settings.example.json
2026-03-06 10:56:54 +00:00
728e0773b9 fix: sanitize Unicode surrogates in job descriptions, handle custom_id > 64 chars 2026-03-06 10:18:54 +00:00
d610060dbb feat: persistent run history logs for searcher and filter
- search_runs.json: append-only history of every searcher run
  (started_at, finished, added, seen, platforms, lookback_days)
- search_progress_last.json: snapshot of final progress state after
  each completed run — answers 'what keywords/tracks were searched?'
- filter_runs.json: append-only history of every filter batch
  (batch_id, submitted/collected timestamps, model, passed/filtered/errors)
Fixes the 'did the 90-day run complete?' ambiguity going forward
2026-03-06 10:16:06 +00:00
dbe9967713 feat: rewrite filter to use Anthropic Batch API
- Batch API = 50% cost savings vs synchronous calls
- Prompt caching on system prompt (profile + criteria shared across all jobs)
- One request per job with custom_id = job ID for result matching
- Two-phase state machine: submit → poll/collect (hourly cron safe)
- filter_state.json tracks pending batch ID between runs
- Model configurable via settings.filter.model (default: claude-sonnet-4-6)
- Telegram notifications on submit + collect
- Errors pass through — never block applications due to filter failure
- --stats flag for queue overview
2026-03-06 10:12:47 +00:00
9bf904dada feat: AI job filter — score jobs 0-10 with Claude Haiku before applying
- lib/filter.mjs: batch scoring engine (10 jobs/call, Claude Haiku)
- job_filter.mjs: standalone CLI with --dry-run and --stats flags
- Threshold configurable globally + per-search in search_config.json (filter_min_score, default 5)
- Job profiles (gtm/ae) passed as context via settings.filter.job_profiles
- Filtered jobs get status='filtered' with filter_score + filter_reason
- Filter errors pass jobs through (never block applications)
- status.mjs: added 'AI filtered' line to report
2026-03-06 10:01:15 +00:00