Update all docs: README, SKILL.md, SPEC.md for current architecture

- Add Telegram answer learning flow (poller + applier safety net)
- Add AI filtering, job scoring, cross-track dedup
- Add browser crash recovery, fuzzy select matching, shadow DOM details
- Update file structure with all new modules
- Update job statuses (no_modal, stuck, filtered, duplicate)
- Update scheduling info (OpenClaw crons, not crontab/PM2)
- Update roadmap

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 11:42:52 -08:00
parent 0920554dad
commit 0695d61954
3 changed files with 339 additions and 128 deletions
--- a/SPEC.md
+++ b/SPEC.md
@@ -1,34 +1,60 @@
 # claw-apply — Technical Spec

-Automated job search and application engine. Searches LinkedIn and Wellfound for matching roles, applies automatically using Playwright + Kernel stealth browsers, and self-learns from unknown questions.
+Automated job search and application engine. Searches LinkedIn and Wellfound for matching roles, AI-filters and scores them, applies automatically using Playwright + Kernel stealth browsers, and self-learns from unknown questions via Telegram.

 ---

 ## Architecture

-### Two agents, shared queue
+### Four agents, shared queue

 **JobSearcher** (`job_searcher.mjs`)
- Runs on schedule (default: hourly)
+- Runs on schedule (default: every 12 hours)
 - Searches configured platforms with configured keywords
 - LinkedIn: paginates through up to 40 pages of results
 - Wellfound: infinite-scrolls up to 10 times to load all results
+- Classifies each job: Easy Apply, external ATS (with platform detection), recruiter-only
 - Filters out excluded roles/companies
 - Deduplicates by job ID and URL against existing queue
+- Cross-track duplicate IDs get composite IDs (`{id}_{track}`)
 - Writes new jobs to `jobs_queue.json` with status `new`
 - Sends Telegram summary

+**JobFilter** (`job_filter.mjs`)
+- Runs on schedule (default: every hour at :30)
+- Two-phase: submit batch → collect results (designed for cron)
+- Submits unscored jobs to Claude AI via Anthropic Batch API (50% cost savings)
+- One batch per search track for prompt caching efficiency
+- Scores each job 1-10 based on match to profile and search track
+- Jobs below threshold (default 5) marked `filtered`
+- Cross-track deduplication: groups by URL, keeps highest score
+- State persisted in `data/filter_state.json` between phases
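
The threshold and cross-track deduplication rules listed for JobFilter can be sketched as below. This is an illustration only, not the code in `lib/filter.mjs`: the function names `filterAndDedupe` and `compositeId`, and the `score` field on each job, are assumptions made for the example.

```javascript
// Hypothetical sketch: names and the `score` field are assumptions.
// Marks low scorers as `filtered`, then keeps the highest-scoring job per URL
// and marks the others as `duplicate`.
function filterAndDedupe(jobs, threshold = 5) {
  const bestByUrl = new Map();
  for (const job of jobs) {
    if (typeof job.score !== 'number') continue; // unscored jobs are left untouched
    if (job.score < threshold) {
      job.status = 'filtered';
      continue;
    }
    const best = bestByUrl.get(job.url);
    if (!best) {
      bestByUrl.set(job.url, job);
    } else if (job.score > best.score) {
      best.status = 'duplicate'; // previous best loses to the new job
      bestByUrl.set(job.url, job);
    } else {
      job.status = 'duplicate';
    }
  }
  return jobs;
}

// Composite ID for a posting that shows up under a second search track.
function compositeId(id, track) {
  return `${id}_${track}`;
}
```

On a score tie this sketch keeps whichever job was seen first. The spec does not say how ties are broken, so that choice is arbitrary here.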
+
 **JobApplier** (`job_applier.mjs`)
- Runs on schedule (default: every 6 hours)
- Reads queue for status `new` + `needs_answer`
- Respects `max_applications_per_run` cap
- LinkedIn: navigates directly to job URL, detects apply type (Easy Apply / external / recruiter-only), fills multi-step modal
- Wellfound: navigates to job, fills form, submits
- Detects honeypot questions and skips
- On unknown required fields: messages user via Telegram, marks `needs_answer`
+- Runs on schedule (disabled by default until ready)
+- Processes Telegram replies at start (safety net for answer learning)
+- Reloads `answers.json` before each job (picks up mid-run Telegram replies)
+- Reads queue for status `new` + `needs_answer`, sorted by priority
+- Respects `max_applications_per_run` cap and `enabled_apply_types` filter
+- Groups jobs by platform to share browser sessions
+- LinkedIn Easy Apply: multi-step modal with shadow DOM handling
+- Wellfound: form fill and submit
+- On unknown required fields: generates AI answer, messages user via Telegram, marks `needs_answer`
+- Browser crash recovery: detects dead page, creates fresh browser session
+- Per-job timeout: 10 minutes. Overall run timeout: 45 minutes
 - On error: retries up to `max_retries` (default 2) before marking `failed`
 - Sends summary with granular skip reasons

+**TelegramPoller** (`telegram_poller.mjs`)
+- Runs every minute via OpenClaw cron
+- Polls Telegram `getUpdates` for replies to question messages
+- Matches replies via `reply_to_message_id` stored on jobs
+- "ACCEPT" → use AI-suggested answer. Anything else → use reply text
+- Saves answer to `answers.json` (reused for ALL future jobs)
+- Flips job back to `new` for retry
+- Sends confirmation reply on Telegram
+- Lightweight: single HTTP call, exits immediately if no updates
+
 **Preview mode** (`--preview`): shows queued jobs without applying.

 ### Shared modules
@@ -37,11 +63,66 @@ Automated job search and application engine. Searches LinkedIn and Wellfound for
 |--------|---------------|
 | `lib/constants.mjs` | All timeouts, selectors, defaults — no magic numbers in code |
 | `lib/browser.mjs` | Browser factory — Kernel stealth (default) with local Playwright fallback |
-| `lib/form_filler.mjs` | Generic form filling — custom answers first, then built-in profile matching |
-| `lib/queue.mjs` | Queue CRUD with in-memory caching, config file validation |
-| `lib/notify.mjs` | Telegram Bot API with rate limiting (1.5s between sends) |
-| `lib/linkedin.mjs` | LinkedIn search (paginated) + Easy Apply (multi-step modal) |
-| `lib/wellfound.mjs` | Wellfound search (infinite scroll) + apply |
+| `lib/session.mjs` | Kernel Managed Auth session refresh |
+| `lib/env.mjs` | .env loader (no dotenv dependency) |
+| `lib/form_filler.mjs` | Form filling — custom answers, built-in profile matching, fuzzy select matching |
+| `lib/ai_answer.mjs` | AI answer generation via Claude (profile + resume context) |
+| `lib/filter.mjs` | AI job scoring via Anthropic Batch API |
+| `lib/keywords.mjs` | AI-generated search keywords via Claude |
+| `lib/queue.mjs` | Queue CRUD with in-memory caching, atomic writes, config validation |
+| `lib/notify.mjs` | Telegram Bot API — send, getUpdates, reply (with rate limiting) |
+| `lib/telegram_answers.mjs` | Telegram reply processing — matches to jobs, saves answers |
+| `lib/search_progress.mjs` | Per-platform search resume tracking |
+| `lib/lock.mjs` | PID-based lockfile with graceful shutdown |
+| `lib/apply/index.mjs` | Apply handler registry with status normalization |
+| `lib/apply/easy_apply.mjs` | LinkedIn Easy Apply — shadow DOM, multi-step modal, post-submit detection |
+
+---
+
+## LinkedIn Easy Apply — Technical Details
+
+LinkedIn renders the Easy Apply modal inside **shadow DOM**. This means:
+- `document.querySelector()` inside `page.evaluate()` **cannot** find modal elements
+- `page.$()` and ElementHandle methods **pierce** shadow DOM and work correctly
+- All modal operations use ElementHandle-based operations, never `evaluate` with `document.querySelector`
+
+### Button detection
+
+`findModalButton()` uses three strategies in order:
+1. CSS selector via `page.$()` — aria-label exact match (pierces shadow DOM)
+2. CSS selector via `page.$()` — aria-label substring match
+3. `modal.$$('button')` + `btn.evaluate()` — text content matching
+
+Check order per step: **Next → Review → Submit** (submit only when no forward nav exists).
+
+### Modal flow
+
+```
+Easy Apply click → [fill fields → Next] × N → Review → Submit application
+```
+
+- Progress tracked via `<progress>` element (not `[role="progressbar"]`)
+- Stuck detection: re-reads progress value after clicking Next, triggers after 3 unchanged clicks
+- Submit verification: `waitForSelector(selector, { state: 'detached', timeout: 8000 })` (8 s timeout; event-driven, not a fixed sleep)
+- Post-submit: checks for success text, absent Submit button, or validation errors
+- Multiple `[role="dialog"]` elements: `findApplyModal()` identifies the apply modal and tags it with `data-claw-apply-modal`
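
The stuck-detection rule above (re-read the progress value after each Next click, trigger after 3 unchanged clicks) could look roughly like the following. The factory name `createStuckDetector` and its interface are hypothetical. Reading the `<progress>` value from the page is left out so the logic stands alone.

```javascript
// Hypothetical sketch: not the implementation in lib/apply/easy_apply.mjs.
// Call the returned function with the progress value read after each Next click;
// it returns true once the value has stayed unchanged for `maxUnchanged` clicks.
function createStuckDetector(initialProgress, maxUnchanged = 3) {
  let last = initialProgress;
  let unchanged = 0;
  return function onNextClicked(progressValue) {
    if (progressValue === last) {
      unchanged += 1;
    } else {
      unchanged = 0; // the modal advanced, so reset the counter
      last = progressValue;
    }
    return unchanged >= maxUnchanged;
  };
}
```

In the applier loop, a `true` result would be the point at which the job is marked `stuck` and the modal is dismissed.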
+
+### Form filling
+
+- Labels found by walking up ancestor DOM (LinkedIn doesn't use `label[for="id"]`)
+- Label deduplication for doubled text (e.g. "Phone country codePhone country code")
+- Resume selection: selects first radio if none checked, falls back to file upload
+- Select matching: `selectOptionFuzzy()` — exact → case-insensitive → substring → value
+- Phone always overwritten (LinkedIn pre-fills wrong numbers)
+- EEO/voluntary fields auto-select "Prefer not to disclose"
+- Honeypot detection: questions containing "digit code", "secret word", etc.
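
Two of the form-filling rules above are pure string logic and can be sketched in isolation: the select matching order (exact → case-insensitive → substring → value) and the removal of doubled label text. The helper names `pickOptionFuzzy` and `dedupeLabel`, and the `{ label, value }` option shape, are assumptions for illustration. The real `selectOptionFuzzy()` works on page elements and may differ.

```javascript
// Hypothetical sketch of the matching order: exact, case-insensitive,
// substring, then option value. Returns the matched option or null.
function pickOptionFuzzy(options, wanted) {
  const target = String(wanted).trim();
  const lower = target.toLowerCase();
  if (!lower) return null; // an empty answer would match every substring
  return (
    options.find((o) => o.label.trim() === target) ||
    options.find((o) => o.label.trim().toLowerCase() === lower) ||
    options.find((o) => o.label.toLowerCase().includes(lower)) ||
    options.find((o) => String(o.value).toLowerCase() === lower) ||
    null
  );
}

// Collapses a label whose text was rendered twice back to back,
// e.g. "Phone country codePhone country code".
function dedupeLabel(text) {
  const t = text.trim();
  const half = t.length / 2;
  if (t.length % 2 === 0 && t.slice(0, half) === t.slice(half)) {
    return t.slice(0, half);
  }
  return t;
}
```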
+
+### Dismiss flow
+
+Always discards — never leaves drafts:
+1. Click Dismiss/Close button or press Escape
+2. Wait for Discard confirmation dialog
+3. Click Discard (by `data-test-dialog-primary-btn` or text scan scoped to dialogs)

 ---

@@ -103,6 +184,7 @@ All user config is gitignored. Example templates are committed.
 {
  "max_applications_per_run": 50,
  "max_retries": 2,
+  "enabled_apply_types": ["easy_apply"],
  "notifications": {
    "telegram_user_id": "YOUR_TELEGRAM_USER_ID",
    "bot_token": "YOUR_TELEGRAM_BOT_TOKEN"
@@ -112,6 +194,10 @@ All user config is gitignored. Example templates are committed.
    "profiles": {
      "linkedin": "LinkedIn-YourName",
      "wellfound": "WellFound-YourName"
+    },
+    "connection_ids": {
+      "linkedin": "YOUR_LINKEDIN_CONNECTION_ID",
+      "wellfound": "YOUR_WELLFOUND_CONNECTION_ID"
    }
  },
  "browser": {
@@ -145,6 +231,7 @@ Flat array of pattern-answer pairs. Patterns are matched case-insensitively and
    "id": "li_4381658809",
    "platform": "linkedin",
    "track": "ae",
+    "apply_type": "easy_apply",
    "title": "Senior Account Executive",
    "company": "Acme Corp",
    "url": "https://linkedin.com/jobs/view/4381658809/",
@@ -153,6 +240,8 @@ Flat array of pattern-answer pairs. Patterns are matched case-insensitively and
    "status_updated_at": "2026-03-05T22:00:00Z",
    "retry_count": 0,
    "pending_question": null,
+    "ai_suggested_answer": null,
+    "telegram_message_id": null,
    "applied_at": null,
    "notes": null
  }
@@ -165,28 +254,50 @@ Flat array of pattern-answer pairs. Patterns are matched case-insensitively and
 |--------|---------|-------------|
 | `new` | Found, waiting to apply | Applier picks it up |
 | `applied` | Successfully submitted | Done |
-| `needs_answer` | Blocked on unknown question | Applier retries after user answers |
+| `needs_answer` | Blocked on unknown question | Telegram poller saves reply, flips to `new` |
 | `failed` | Failed after max retries | Manual review |
-| `skipped` | Honeypot detected | Permanent skip |
+| `already_applied` | Duplicate detected | Permanent skip |
+| `filtered` | Below AI score threshold | Permanent skip |
+| `duplicate` | Cross-track duplicate (lower score) | Permanent skip |
+| `skipped_honeypot` | Honeypot question detected | Permanent skip |
 | `skipped_recruiter_only` | LinkedIn recruiter-only | Permanent skip |
 | `skipped_external_unsupported` | External ATS | Saved for future ATS support |
 | `skipped_easy_apply_unsupported` | No Easy Apply button | Permanent skip |
+| `skipped_no_apply` | No apply button found | Permanent skip |
+| `no_modal` | Button found but modal didn't open | Retried |
+| `stuck` | Modal progress stalled | Retried |
+| `incomplete` | Modal didn't reach submit | Retried |

 ### `applications_log.json`

 Append-only history of every application attempt with outcome, timestamps, and metadata.

+### `telegram_offset.json`
+
+Stores the Telegram `getUpdates` offset to avoid reprocessing old messages.
+
+### `filter_state.json`
+
+Persists batch IDs between filter submit and collect phases.
+
 ---

-## Unknown question flow
+## Self-learning answer flow

 1. Applier encounters a required field with no matching answer
-2. Marks job as `needs_answer`, stores question in `pending_question`
-3. Sends Telegram: "Applying to Senior AE @ Acme Corp — question: 'What was your quota attainment?' — what should I answer?"
-4. Moves on to next job
-5. User replies with answer
-6. Answer saved to `answers.json` as pattern match
-7. Next applier run retries all `needs_answer` jobs
+2. Claude generates a suggested answer using profile + resume context
+3. Telegram message sent: question text, options (if select), AI suggestion
+4. Job marked `needs_answer` with `telegram_message_id` stored
+5. User replies on Telegram: their answer, or "ACCEPT" for the AI suggestion
+6. Telegram poller (every minute) picks up the reply:
+   - Matches via `reply_to_message_id` → job
+   - Saves answer to `answers.json` as pattern match
+   - Flips job status back to `new`
+   - Sends confirmation reply
+7. Next applier run: reloads answers, retries the job, fills the field automatically
+8. All future jobs with the same question pattern are answered automatically
+
+Safety net: applier also calls `processTelegramReplies()` at start of each run.
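
Step 6 of the flow above can be sketched as a single function over in-memory data. This is a hedged illustration, not the code in `lib/telegram_answers.mjs`: the function name `applyTelegramReply`, the `{ pattern, answer }` entry shape, and treating `pending_question` as a plain string are all assumptions. In the Telegram Bot API, an incoming reply carries the original message under `message.reply_to_message.message_id`, which is what gets compared with the `telegram_message_id` stored on the job.

```javascript
// Hypothetical sketch: matches one Telegram update to a waiting job,
// saves the answer, and flips the job back to `new`.
function applyTelegramReply(jobs, answers, update) {
  const msg = update.message;
  const repliedTo = msg && msg.reply_to_message && msg.reply_to_message.message_id;
  if (!repliedTo || typeof msg.text !== 'string') return null;

  const job = jobs.find(
    (j) => j.status === 'needs_answer' && j.telegram_message_id === repliedTo
  );
  if (!job) return null;

  const text = msg.text.trim();
  // "ACCEPT" means: use the AI suggestion. Anything else is the answer itself.
  const answer =
    text === 'ACCEPT' && job.ai_suggested_answer ? job.ai_suggested_answer : text;

  answers.push({ pattern: job.pending_question, answer }); // entry shape assumed
  job.status = 'new';
  job.pending_question = null;
  job.ai_suggested_answer = null;
  job.telegram_message_id = null;
  return { jobId: job.id, answer };
}
```

Persisting `answers` and the queue, and sending the confirmation reply, would happen around this call and are omitted here.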

 ---

@@ -199,6 +310,8 @@ When an application fails due to a transient error (timeout, network issue, page
 3. After `max_retries` (default 2) failures, job is marked `failed` permanently
 4. Failed jobs are logged to `applications_log.json` with error details

+Browser crash recovery: after an error, the applier checks if the page is still alive via `page.evaluate(() => true)`. If dead, it creates a fresh browser session and continues with the remaining jobs.
+
 ---

 ## File structure
@@ -208,23 +321,43 @@ claw-apply/
 ├── README.md                  Documentation
 ├── SKILL.md                   OpenClaw skill manifest
 ├── SPEC.md                    This file
+├── claw.json                  OpenClaw skill metadata
+├── package.json               npm manifest
 ├── job_searcher.mjs           Search agent
+├── job_filter.mjs             AI filter + scoring agent
 ├── job_applier.mjs            Apply agent
+├── telegram_poller.mjs        Telegram answer reply processor
 ├── setup.mjs                  Setup wizard
+├── status.mjs                 Queue + run status report
 ├── lib/
 │   ├── constants.mjs          Shared constants and defaults
 │   ├── browser.mjs            Kernel/Playwright browser factory
-│   ├── form_filler.mjs        Form filling with pattern matching
-│   ├── linkedin.mjs           LinkedIn search + Easy Apply
+│   ├── session.mjs            Kernel Managed Auth session refresh
+│   ├── env.mjs                .env loader
+│   ├── form_filler.mjs        Form filling with fuzzy select matching
+│   ├── ai_answer.mjs          AI answer generation via Claude
+│   ├── filter.mjs             AI job scoring via Anthropic Batch API
+│   ├── keywords.mjs           AI-generated search keywords
+│   ├── linkedin.mjs           LinkedIn search + job classification
 │   ├── wellfound.mjs          Wellfound search + apply
-│   ├── queue.mjs              Queue management + config validation
-│   └── notify.mjs             Telegram notifications + rate limiting
+│   ├── queue.mjs              Queue management with atomic writes
+│   ├── lock.mjs               PID lockfile + graceful shutdown
+│   ├── notify.mjs             Telegram Bot API (send, getUpdates, reply)
+│   ├── telegram_answers.mjs   Telegram reply → answers.json processing
+│   ├── search_progress.mjs    Per-platform search resume tracking
+│   └── apply/
+│       ├── index.mjs          Handler registry + status normalization
+│       ├── easy_apply.mjs     LinkedIn Easy Apply (shadow DOM, multi-step)
+│       ├── wellfound.mjs      Wellfound apply
+│       ├── greenhouse.mjs     Greenhouse ATS (stub)
+│       ├── lever.mjs          Lever ATS (stub)
+│       ├── workday.mjs        Workday ATS (stub)
+│       ├── ashby.mjs          Ashby ATS (stub)
+│       └── jobvite.mjs        Jobvite ATS (stub)
 ├── config/
 │   ├── *.example.json         Templates (committed)
 │   └── *.json                 User config (gitignored)
-└── data/
-    ├── jobs_queue.json         Job queue (auto-managed)
-    └── applications_log.json   Application history (auto-managed)
+└── data/                      Runtime data (gitignored, auto-managed)
 ```

 ---
@@ -232,20 +365,22 @@ claw-apply/
 ## Roadmap

 ### v1 (current)
- [x] LinkedIn Easy Apply (multi-step modal, pagination)
+- [x] LinkedIn Easy Apply (multi-step modal, shadow DOM)
 - [x] Wellfound apply (infinite scroll)
 - [x] Kernel stealth browsers + residential proxy
- [x] Self-learning answer bank with regex patterns
+- [x] AI job filtering via Anthropic Batch API
+- [x] Self-learning answer bank with Telegram Q&A loop
+- [x] AI-suggested answers via Claude
+- [x] Telegram answer polling (instant save + applier safety net)
+- [x] Browser crash recovery
 - [x] Retry logic with configurable max retries
 - [x] Preview mode (`--preview`)
- [x] Configurable application caps
- [x] Telegram notifications with rate limiting
- [x] Config validation with clear error messages
- [x] In-memory queue caching for performance
+- [x] Configurable application caps and retry limits
 - [x] Constants extracted — no magic numbers in code
+- [x] Atomic file writes for queue corruption prevention
+- [x] Cross-track deduplication after AI scoring

 ### v2 (planned)
- [ ] Indeed support
- [ ] External ATS support (Greenhouse, Lever)
- [ ] Job scoring and ranking
+- [ ] External ATS support (Greenhouse, Lever, Workday, Ashby, Jobvite)
 - [ ] Per-job cover letter generation via LLM
+- [ ] Indeed support