002-Jobsdb change detect workdesk (detailed implementation note)

Aug 10, 2024

Repository

1. Block HLD (High-level design)

Trigger source

  • trigger from page change
  • trigger from routine check

Process / Extract

  • regular check: if a new entry is found, scrape it
  • determine which posts should be scraped
  • store the result in the db
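
The extract step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the field names come from the `001_fetch_job` schema in section 4, while the helper names `select_pending` and `mark_result`, and the rule that an empty `done` field means "not processed yet", are assumptions.

```python
from typing import Iterable, Optional


def select_pending(records: Iterable[dict]) -> list:
    """Return fetch-queue records that still need scraping.

    Assumption: an empty or missing `done` field means the record is pending;
    `FETCH` and `ERROR` mark records that were already handled.
    """
    return [r for r in records if not r.get("done")]


def mark_result(record: dict,
                output: Optional[dict] = None,
                error: Optional[dict] = None) -> dict:
    """Return a copy of the record updated with the scrape outcome."""
    updated = dict(record)
    if error is not None:
        updated["done"] = "ERROR"
        updated["error"] = error
    else:
        updated["done"] = "FETCH"
        updated["output"] = output or {}
    return updated
```

The updated copy would then be written back to the db, so the next regular check skips it.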

Process / AI / QnA

  • regular check: if a new entry is found, send it to deepseek
  • a deepseek automator handles the Q&A between the script and deepseek

flow_reporting

  • regular check: if a new entry is found, ask deepseek
  • analyze the reason a job ad was judged a mismatch

pocketbase serves as a pool for exchanging "ideas": it simply stores the result extracted from each job ad and keeps it for the next process.

2. Flow HLD (High-level design)

Event source:

  • regular check: if a new entry is found, execute the flow

Process / Extract:

  • if a new entry is found, send the sanitized message (markdown) to deepseek (SendJobAdDetail)
  • let deepseek summarize the job ad (AskDeepseekForCompanyBackgroundInformation)
  • send Louis's background to deepseek (TellDeepseekForCandiateBackground)
  • combine and consolidate the backgrounds of the candidate and the job (ConsolateInformationsAltogether)
  • decide whether they match (DB field: ds_decision_YN)

Process / Draft or Analyze

  • if recommended, draft an application letter (DraftApplicationLetter)
  • if not recommended, briefly identify the reason (UpdateToPendingState)
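
The step sequence in this section can be sketched as one function. This is an illustration under stated assumptions: `ask` is a hypothetical callable standing in for the deepseek automator (one prompt in, one reply out), the prompt wording is invented, and the `"pending"` status value and the `markdown` key are guesses. Only the step names in the comments and the DB field names come from this note.

```python
from typing import Callable

# Hypothetical stand-in for the deepseek automator: prompt in, reply out.
Ask = Callable[[str], str]


def run_flow(job_ad_md: str, candidate_background: str, ask: Ask) -> dict:
    """Walk one job ad through the flow and return the fields to store."""
    # SendJobAdDetail
    ask("Here is a job ad:\n" + job_ad_md)
    # AskDeepseekForCompanyBackgroundInformation
    digest = ask("Summarize this job ad and the company behind it.")
    # TellDeepseekForCandiateBackground
    ask("Candidate background:\n" + candidate_background)
    # ConsolateInformationsAltogether
    decision = ask("Considering both, is this a match? Answer YES or NO.")

    # Reduce the free-text reply to the YES | NO value kept in ds_decision_YN.
    decision_yn = "YES" if decision.strip().upper().startswith("YES") else "NO"
    record = {
        "ds_digest": digest,
        "ds_decision": decision,
        "ds_decision_YN": decision_yn,
    }

    if decision_yn == "YES":
        # DraftApplicationLetter
        letter = ask("Draft an application letter.")
        record["draft_application_letter_json"] = {"markdown": letter}
    else:
        # UpdateToPendingState (the status value is an assumption)
        record["status"] = "pending"
    return record
```

Passing `ask` in as a parameter keeps the sketch testable with a fake function, without a live deepseek session.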

3. Flow Reporting

Process / Analyze

  • send greetings and state the scenario (SendGuidelines)
  • review the not-recommended reason (QnAReviewNotRecommendReason)

Load

  • send Louis's background to deepseek to review why the job was not recommended (UpdateDsDecisionReview)
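
A rough sketch of the reporting flow above, under the same assumptions as before: `ask` is a hypothetical stand-in for the deepseek automator, the prompts are invented, and the `reason` key inside `ds_decision_review` is a guess. The rule used to pick records (decision is `NO` and no review stored yet) is also an assumption.

```python
from typing import Callable

# Hypothetical stand-in for the deepseek automator: prompt in, reply out.
Ask = Callable[[str], str]


def needs_review(record: dict) -> bool:
    """Assumed rule: review records judged NO that have no review yet."""
    return (record.get("ds_decision_YN") == "NO"
            and not record.get("ds_decision_review"))


def review_record(record: dict, candidate_background: str, ask: Ask) -> dict:
    """Ask why the job was not recommended and return the updated record."""
    # SendGuidelines
    ask("Hello. You will review why a job ad was not recommended.")
    # QnAReviewNotRecommendReason
    reason = ask(
        "Candidate background:\n" + candidate_background
        + "\nEarlier decision:\n" + str(record.get("ds_decision", ""))
        + "\nWhy was this job not recommended?"
    )
    # UpdateDsDecisionReview
    updated = dict(record)
    updated["ds_decision_review"] = {"reason": reason}
    return updated
```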

4. Schema/DBML (Simplified / Minified, Conceptual only)

[
  {
    "001_fetch_job": {
      "url_to_fetch": "url to fetch, usually a jobsdb link",
      "done": "enum <FETCH | ERROR>",
      "error": "{error result in json}",
      "output": "{fetch output in json}",
      "updated": "record update time",
      "created": "record created time"
    },
    "002_job_list": {
      "advertiserName_HTML": "company name outerHTML",
      "advertiserName_MD": "company name in markdown",
      "collectionId": "pbc_2009428707",
      "collectionName": "002_job_list",
      "created": "record create time",
      "debug": "{debug use}",
      "draft_application_letter_json": "{application letter draft in markdown format with meta}",
      "ds_decision": "result from deepseek",
      "ds_decision_YN": "YES | NO",
      "ds_decision_review": "{deepseek review why not recommended}",
      "ds_digest": "deepseek insight of the job post",
      "ds_digest_meta": "{deepseek insight of the job post json format meta}",
      "error": "{error collection}",
      "id": "rcx0dv0qi270vgf",
      "jobAdDetails_HTML": "job ad detail outerHTML",
      "jobAdDetails_MD": "job ad detail markdown format",
      "jobId": "84267209",
      "jobLink": "link to original post",
      "jobTitle": "title from original post",
      "job_detail_classifications": "classification from original post",
      "job_detail_location": "location from original post",
      "job_detail_title": "title from original post",
      "job_detail_work_type": "type from original post",
      "name_of_company": "name from original post",
      "status": "flow status",
      "summarize_of_job_ad": "string",
      "test": "",
      "updated": "record update time"
    }
  }
]
a simple "GET" call to the result table
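
A minimal sketch of such a call, assuming the result table is the `002_job_list` collection and that PocketBase runs at its default local address `http://127.0.0.1:8090` (both assumptions). The path follows PocketBase's records list endpoint, `/api/collections/{collection}/records`, and the helper names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def records_url(base_url: str, collection: str, **params) -> str:
    """Build the list-records URL for a PocketBase collection."""
    url = f"{base_url.rstrip('/')}/api/collections/{quote(collection)}/records"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def list_records(base_url: str, collection: str, **params) -> list:
    """Send the GET request and return the `items` array of the response."""
    with urlopen(records_url(base_url, collection, **params)) as resp:
        return json.load(resp)["items"]


if __name__ == "__main__":
    # Example: the five most recently created recommended jobs.
    print(records_url(
        "http://127.0.0.1:8090",
        "002_job_list",
        perPage=5,
        sort="-created",
        filter='ds_decision_YN = "YES"',
    ))
```

Depending on the collection's access rules, the request may also need an auth token, which this sketch leaves out.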
code by louiscklaw