- Add Telegram answer learning flow (poller + applier safety net) - Add AI filtering, job scoring, cross-track dedup - Add browser crash recovery, fuzzy select matching, shadow DOM details - Update file structure with all new modules - Update job statuses (no_modal, stuck, filtered, duplicate) - Update scheduling info (OpenClaw crons, not crontab/PM2) - Update roadmap Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
387 lines
15 KiB
Markdown
387 lines
15 KiB
Markdown
# claw-apply — Technical Spec
|
||
|
||
Automated job search and application engine. Searches LinkedIn and Wellfound for matching roles, AI-filters and scores them, applies automatically using Playwright + Kernel stealth browsers, and self-learns from unknown questions via Telegram.
|
||
|
||
---
|
||
|
||
## Architecture
|
||
|
||
### Four agents, shared queue
|
||
|
||
**JobSearcher** (`job_searcher.mjs`)
|
||
- Runs on schedule (default: every 12 hours)
|
||
- Searches configured platforms with configured keywords
|
||
- LinkedIn: paginates through up to 40 pages of results
|
||
- Wellfound: infinite-scrolls up to 10 times to load all results
|
||
- Classifies each job: Easy Apply, external ATS (with platform detection), recruiter-only
|
||
- Filters out excluded roles/companies
|
||
- Deduplicates by job ID and URL against existing queue
|
||
- Cross-track duplicate IDs get composite IDs (`{id}_{track}`)
|
||
- Writes new jobs to `jobs_queue.json` with status `new`
|
||
- Sends Telegram summary
|
||
|
||
**JobFilter** (`job_filter.mjs`)
|
||
- Runs on schedule (default: every hour at :30)
|
||
- Two-phase: submit batch → collect results (designed for cron)
|
||
- Submits unscored jobs to Claude AI via Anthropic Batch API (50% cost savings)
|
||
- One batch per search track for prompt caching efficiency
|
||
- Scores each job 1-10 based on match to profile and search track
|
||
- Jobs below threshold (default 5) marked `filtered`
|
||
- Cross-track deduplication: groups by URL, keeps highest score
|
||
- State persisted in `data/filter_state.json` between phases
|
||
|
||
**JobApplier** (`job_applier.mjs`)
|
||
- Runs on schedule (disabled by default until ready)
|
||
- Processes Telegram replies at start (safety net for answer learning)
|
||
- Reloads `answers.json` before each job (picks up mid-run Telegram replies)
|
||
- Reads queue for status `new` + `needs_answer`, sorted by priority
|
||
- Respects `max_applications_per_run` cap and `enabled_apply_types` filter
|
||
- Groups jobs by platform to share browser sessions
|
||
- LinkedIn Easy Apply: multi-step modal with shadow DOM handling
|
||
- Wellfound: form fill and submit
|
||
- On unknown required fields: generates AI answer, messages user via Telegram, marks `needs_answer`
|
||
- Browser crash recovery: detects dead page, creates fresh browser session
|
||
- Per-job timeout: 10 minutes. Overall run timeout: 45 minutes
|
||
- On error: retries up to `max_retries` (default 2) before marking `failed`
|
||
- Sends summary with granular skip reasons
|
||
|
||
**TelegramPoller** (`telegram_poller.mjs`)
|
||
- Runs every minute via OpenClaw cron
|
||
- Polls Telegram `getUpdates` for replies to question messages
|
||
- Matches replies via `reply_to_message_id` stored on jobs
|
||
- "ACCEPT" → use AI-suggested answer. Anything else → use reply text
|
||
- Saves answer to `answers.json` (reused for ALL future jobs)
|
||
- Flips job back to `new` for retry
|
||
- Sends confirmation reply on Telegram
|
||
- Lightweight: single HTTP call, exits immediately if no updates
|
||
|
||
**Preview mode** (`--preview`): shows queued jobs without applying.
|
||
|
||
### Shared modules
|
||
|
||
| Module | Responsibility |
|
||
|--------|---------------|
|
||
| `lib/constants.mjs` | All timeouts, selectors, defaults — no magic numbers in code |
|
||
| `lib/browser.mjs` | Browser factory — Kernel stealth (default) with local Playwright fallback |
|
||
| `lib/session.mjs` | Kernel Managed Auth session refresh |
|
||
| `lib/env.mjs` | .env loader (no dotenv dependency) |
|
||
| `lib/form_filler.mjs` | Form filling — custom answers, built-in profile matching, fuzzy select matching |
|
||
| `lib/ai_answer.mjs` | AI answer generation via Claude (profile + resume context) |
|
||
| `lib/filter.mjs` | AI job scoring via Anthropic Batch API |
|
||
| `lib/keywords.mjs` | AI-generated search keywords via Claude |
|
||
| `lib/queue.mjs` | Queue CRUD with in-memory caching, atomic writes, config validation |
|
||
| `lib/notify.mjs` | Telegram Bot API — send, getUpdates, reply (with rate limiting) |
|
||
| `lib/telegram_answers.mjs` | Telegram reply processing — matches to jobs, saves answers |
|
||
| `lib/search_progress.mjs` | Per-platform search resume tracking |
|
||
| `lib/lock.mjs` | PID-based lockfile with graceful shutdown |
|
||
| `lib/apply/index.mjs` | Apply handler registry with status normalization |
|
||
| `lib/apply/easy_apply.mjs` | LinkedIn Easy Apply — shadow DOM, multi-step modal, post-submit detection |
|
||
|
||
---
|
||
|
||
## LinkedIn Easy Apply — Technical Details
|
||
|
||
LinkedIn renders the Easy Apply modal inside **shadow DOM**. This means:
|
||
- `document.querySelector()` inside `page.evaluate()` **cannot** find modal elements
|
||
- `page.$()` and ElementHandle methods **pierce** shadow DOM and work correctly
|
||
- All modal operations use ElementHandle-based operations, never `evaluate` with `document.querySelector`
|
||
|
||
### Button detection
|
||
|
||
`findModalButton()` uses three strategies in order:
|
||
1. CSS selector via `page.$()` — aria-label exact match (pierces shadow DOM)
|
||
2. CSS selector via `page.$()` — aria-label substring match
|
||
3. `modal.$$('button')` + `btn.evaluate()` — text content matching
|
||
|
||
Check order per step: **Next → Review → Submit** (submit only when no forward nav exists).
|
||
|
||
### Modal flow
|
||
|
||
```
|
||
Easy Apply click → [fill fields → Next] × N → Review → Submit application
|
||
```
|
||
|
||
- Progress tracked via `<progress>` element (not `[role="progressbar"]`)
|
||
- Stuck detection: re-reads progress value after clicking Next, triggers after 3 unchanged clicks
|
||
- Submit verification: `waitForSelector(state: 'detached', timeout: 8s)` — event-driven, not fixed sleep
|
||
- Post-submit: checks for success text, absent Submit button, or validation errors
|
||
- Multiple `[role="dialog"]` elements: `findApplyModal()` identifies the apply modal and tags it with `data-claw-apply-modal`
|
||
|
||
### Form filling
|
||
|
||
- Labels found by walking up ancestor DOM (LinkedIn doesn't use `label[for="id"]`)
|
||
- Label deduplication for doubled text (e.g. "Phone country codePhone country code")
|
||
- Resume selection: selects first radio if none checked, falls back to file upload
|
||
- Select matching: `selectOptionFuzzy()` — exact → case-insensitive → substring → value
|
||
- Phone always overwritten (LinkedIn pre-fills wrong numbers)
|
||
- EEO/voluntary fields auto-select "Prefer not to disclose"
|
||
- Honeypot detection: questions containing "digit code", "secret word", etc.
|
||
|
||
### Dismiss flow
|
||
|
||
Always discards — never leaves drafts:
|
||
1. Click Dismiss/Close button or press Escape
|
||
2. Wait for Discard confirmation dialog
|
||
3. Click Discard (by `data-test-dialog-primary-btn` or text scan scoped to dialogs)
|
||
|
||
---
|
||
|
||
## Config files
|
||
|
||
All user config is gitignored. Example templates are committed.
|
||
|
||
### `profile.json`
|
||
|
||
```json
|
||
{
|
||
"name": { "first": "Jane", "last": "Smith" },
|
||
"email": "jane@example.com",
|
||
"phone": "555-123-4567",
|
||
"location": {
|
||
"city": "San Francisco",
|
||
"state": "California",
|
||
"zip": "94102",
|
||
"country": "United States"
|
||
},
|
||
"linkedin_url": "https://linkedin.com/in/janesmith",
|
||
"resume_path": "/home/user/resume.pdf",
|
||
"years_experience": 7,
|
||
"work_authorization": {
|
||
"authorized": true,
|
||
"requires_sponsorship": false
|
||
},
|
||
"willing_to_relocate": false,
|
||
"desired_salary": 150000,
|
||
"cover_letter": "Your cover letter text here."
|
||
}
|
||
```
|
||
|
||
### `search_config.json`
|
||
|
||
```json
|
||
{
|
||
"first_run_days": 90,
|
||
"searches": [
|
||
{
|
||
"name": "Founding GTM",
|
||
"track": "gtm",
|
||
"keywords": ["founding account executive", "first sales hire"],
|
||
"platforms": ["linkedin", "wellfound"],
|
||
"filters": {
|
||
"remote": true,
|
||
"posted_within_days": 2,
|
||
"easy_apply_only": false
|
||
},
|
||
"exclude_keywords": ["BDR", "SDR", "staffing", "insurance"]
|
||
}
|
||
]
|
||
}
|
||
```
|
||
|
||
### `settings.json`
|
||
|
||
```json
|
||
{
|
||
"max_applications_per_run": 50,
|
||
"max_retries": 2,
|
||
"enabled_apply_types": ["easy_apply"],
|
||
"notifications": {
|
||
"telegram_user_id": "YOUR_TELEGRAM_USER_ID",
|
||
"bot_token": "YOUR_TELEGRAM_BOT_TOKEN"
|
||
},
|
||
"kernel": {
|
||
"proxy_id": "YOUR_KERNEL_PROXY_ID",
|
||
"profiles": {
|
||
"linkedin": "LinkedIn-YourName",
|
||
"wellfound": "WellFound-YourName"
|
||
},
|
||
"connection_ids": {
|
||
"linkedin": "YOUR_LINKEDIN_CONNECTION_ID",
|
||
"wellfound": "YOUR_WELLFOUND_CONNECTION_ID"
|
||
}
|
||
},
|
||
"browser": {
|
||
"provider": "kernel",
|
||
"playwright_path": null
|
||
}
|
||
}
|
||
```
|
||
|
||
### `answers.json`
|
||
|
||
Flat array of pattern-answer pairs. Patterns are matched case-insensitively and support regex. First match wins.
|
||
|
||
```json
|
||
[
|
||
{ "pattern": "quota attainment", "answer": "1.12" },
|
||
{ "pattern": "years.*enterprise", "answer": "5" },
|
||
{ "pattern": "1.*10.*scale", "answer": "9" }
|
||
]
|
||
```
|
||
|
||
---
|
||
|
||
## Data files (auto-managed)
|
||
|
||
### `jobs_queue.json`
|
||
|
||
```json
|
||
[
|
||
{
|
||
"id": "li_4381658809",
|
||
"platform": "linkedin",
|
||
"track": "ae",
|
||
"apply_type": "easy_apply",
|
||
"title": "Senior Account Executive",
|
||
"company": "Acme Corp",
|
||
"url": "https://linkedin.com/jobs/view/4381658809/",
|
||
"found_at": "2026-03-05T22:00:00Z",
|
||
"status": "new",
|
||
"status_updated_at": "2026-03-05T22:00:00Z",
|
||
"retry_count": 0,
|
||
"pending_question": null,
|
||
"ai_suggested_answer": null,
|
||
"telegram_message_id": null,
|
||
"applied_at": null,
|
||
"notes": null
|
||
}
|
||
]
|
||
```
|
||
|
||
### Job statuses
|
||
|
||
| Status | Meaning | Next action |
|
||
|--------|---------|-------------|
|
||
| `new` | Found, waiting to apply | Applier picks it up |
|
||
| `applied` | Successfully submitted | Done |
|
||
| `needs_answer` | Blocked on unknown question | Telegram poller saves reply, flips to `new` |
|
||
| `failed` | Failed after max retries | Manual review |
|
||
| `already_applied` | Duplicate detected | Permanent skip |
|
||
| `filtered` | Below AI score threshold | Permanent skip |
|
||
| `duplicate` | Cross-track duplicate (lower score) | Permanent skip |
|
||
| `skipped_honeypot` | Honeypot question detected | Permanent skip |
|
||
| `skipped_recruiter_only` | LinkedIn recruiter-only | Permanent skip |
|
||
| `skipped_external_unsupported` | External ATS | Saved for future ATS support |
|
||
| `skipped_easy_apply_unsupported` | No Easy Apply button | Permanent skip |
|
||
| `skipped_no_apply` | No apply button found | Permanent skip |
|
||
| `no_modal` | Button found but modal didn't open | Retried |
|
||
| `stuck` | Modal progress stalled | Retried |
|
||
| `incomplete` | Modal didn't reach submit | Retried |
|
||
|
||
### `applications_log.json`
|
||
|
||
Append-only history of every application attempt with outcome, timestamps, and metadata.
|
||
|
||
### `telegram_offset.json`
|
||
|
||
Stores the Telegram `getUpdates` offset to avoid reprocessing old messages.
|
||
|
||
### `filter_state.json`
|
||
|
||
Persists batch IDs between filter submit and collect phases.
|
||
|
||
---
|
||
|
||
## Self-learning answer flow
|
||
|
||
1. Applier encounters a required field with no matching answer
|
||
2. Claude generates a suggested answer using profile + resume context
|
||
3. Telegram message sent: question text, options (if select), AI suggestion
|
||
4. Job marked `needs_answer` with `telegram_message_id` stored
|
||
5. User replies on Telegram: their answer, or "ACCEPT" for the AI suggestion
|
||
6. Telegram poller (every minute) picks up the reply:
|
||
- Matches via `reply_to_message_id` → job
|
||
- Saves answer to `answers.json` as pattern match
|
||
- Flips job status back to `new`
|
||
- Sends confirmation reply
|
||
7. Next applier run: reloads answers, retries the job, fills the field automatically
|
||
8. All future jobs with the same question pattern are answered automatically
|
||
|
||
Safety net: applier also calls `processTelegramReplies()` at start of each run.
|
||
|
||
---
|
||
|
||
## Retry logic
|
||
|
||
When an application fails due to a transient error (timeout, network issue, page didn't load):
|
||
|
||
1. `retry_count` is incremented on the job
|
||
2. Job status is reset to `new` so the next run picks it up
|
||
3. After `max_retries` (default 2) failures, job is marked `failed` permanently
|
||
4. Failed jobs are logged to `applications_log.json` with error details
|
||
|
||
Browser crash recovery: after an error, the applier checks if the page is still alive via `page.evaluate(() => true)`. If dead, it creates a fresh browser session and continues with the remaining jobs.
|
||
|
||
---
|
||
|
||
## File structure
|
||
|
||
```
|
||
claw-apply/
|
||
├── README.md Documentation
|
||
├── SKILL.md OpenClaw skill manifest
|
||
├── SPEC.md This file
|
||
├── claw.json OpenClaw skill metadata
|
||
├── package.json npm manifest
|
||
├── job_searcher.mjs Search agent
|
||
├── job_filter.mjs AI filter + scoring agent
|
||
├── job_applier.mjs Apply agent
|
||
├── telegram_poller.mjs Telegram answer reply processor
|
||
├── setup.mjs Setup wizard
|
||
├── status.mjs Queue + run status report
|
||
├── lib/
|
||
│ ├── constants.mjs Shared constants and defaults
|
||
│ ├── browser.mjs Kernel/Playwright browser factory
|
||
│ ├── session.mjs Kernel Managed Auth session refresh
|
||
│ ├── env.mjs .env loader
|
||
│ ├── form_filler.mjs Form filling with fuzzy select matching
|
||
│ ├── ai_answer.mjs AI answer generation via Claude
|
||
│ ├── filter.mjs AI job scoring via Anthropic Batch API
|
||
│ ├── keywords.mjs AI-generated search keywords
|
||
│ ├── linkedin.mjs LinkedIn search + job classification
|
||
│ ├── wellfound.mjs Wellfound search + apply
|
||
│ ├── queue.mjs Queue management with atomic writes
|
||
│ ├── lock.mjs PID lockfile + graceful shutdown
|
||
│ ├── notify.mjs Telegram Bot API (send, getUpdates, reply)
|
||
│ ├── telegram_answers.mjs Telegram reply → answers.json processing
|
||
│ ├── search_progress.mjs Per-platform search resume tracking
|
||
│ └── apply/
|
||
│ ├── index.mjs Handler registry + status normalization
|
||
│ ├── easy_apply.mjs LinkedIn Easy Apply (shadow DOM, multi-step)
|
||
│ ├── wellfound.mjs Wellfound apply
|
||
│ ├── greenhouse.mjs Greenhouse ATS (stub)
|
||
│ ├── lever.mjs Lever ATS (stub)
|
||
│ ├── workday.mjs Workday ATS (stub)
|
||
│ ├── ashby.mjs Ashby ATS (stub)
|
||
│ └── jobvite.mjs Jobvite ATS (stub)
|
||
├── config/
|
||
│ ├── *.example.json Templates (committed)
|
||
│ └── *.json User config (gitignored)
|
||
└── data/ Runtime data (gitignored, auto-managed)
|
||
```
|
||
|
||
---
|
||
|
||
## Roadmap
|
||
|
||
### v1 (current)
|
||
- [x] LinkedIn Easy Apply (multi-step modal, shadow DOM)
|
||
- [x] Wellfound apply (infinite scroll)
|
||
- [x] Kernel stealth browsers + residential proxy
|
||
- [x] AI job filtering via Anthropic Batch API
|
||
- [x] Self-learning answer bank with Telegram Q&A loop
|
||
- [x] AI-suggested answers via Claude
|
||
- [x] Telegram answer polling (instant save + applier safety net)
|
||
- [x] Browser crash recovery
|
||
- [x] Retry logic with configurable max retries
|
||
- [x] Preview mode (`--preview`)
|
||
- [x] Configurable application caps and retry limits
|
||
- [x] Constants extracted — no magic numbers in code
|
||
- [x] Atomic file writes for queue corruption prevention
|
||
- [x] Cross-track deduplication after AI scoring
|
||
|
||
### v2 (planned)
|
||
- [ ] External ATS support (Greenhouse, Lever, Workday, Ashby, Jobvite)
|
||
- [ ] Per-job cover letter generation via LLM
|
||
- [ ] Indeed support
|