Resume PDF Parser (Node.js + TypeScript)

A simple Node.js project that extracts structured resume information from a PDF file and converts it into a clean JSON format.

This parser reads a resume PDF, groups the extracted text items into lines and the lines into sections, and outputs structured data such as:

Profile details
Education
Work experience
Skills
Projects (if present)

Features

Parses resume data from a PDF
Extracts structured fields:
- Name, email, links, summary
- Education history
- Work experience
- Skills
Writes the result to a clean resume.json
Built with TypeScript
Uses pdfjs-dist for PDF text extraction

Installation

npm install

Build

Compile TypeScript to JavaScript:

npm run build

Run

npm start

Build the project before running it; npm start produces output only after a successful build.

This will:

Read sample_resume.pdf
Extract resume data
Generate resume.json

Output example:

{
  "profile": {
    "name": "John Doe",
    "email": "john@example.com",
    "url": "linkedin.com/in/johndoe",
    "summary": "Software Engineer with experience in..."
  },
  "educations": [],
  "workExperiences": [],
  "skills": {
    "descriptions": ["JavaScript", "Node.js", "Cloud"]
  }
}
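The shape of this output can be written down as TypeScript types. The following is a minimal sketch inferred only from the sample above: the names Resume, ResumeProfile, and isResume are hypothetical and may not match the project's own types, and the element types of educations and workExperiences are left as unknown because the sample shows them empty.

```typescript
// Types inferred from the sample output; names are hypothetical.
interface ResumeProfile {
  name: string;
  email: string;
  url: string;
  summary: string;
}

interface Resume {
  profile: ResumeProfile;
  educations: unknown[]; // entry fields are not shown in the sample
  workExperiences: unknown[]; // entry fields are not shown in the sample
  skills: { descriptions: string[] };
}

// Hypothetical runtime check for JSON loaded from resume.json.
// It verifies only the fields visible in the sample output.
function isResume(value: unknown): value is Resume {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const profile = v["profile"] as Record<string, unknown> | null | undefined;
  const skills = v["skills"] as Record<string, unknown> | null | undefined;
  return (
    typeof profile === "object" &&
    profile !== null &&
    typeof profile["name"] === "string" &&
    typeof profile["email"] === "string" &&
    Array.isArray(v["educations"]) &&
    Array.isArray(v["workExperiences"]) &&
    typeof skills === "object" &&
    skills !== null &&
    Array.isArray(skills["descriptions"])
  );
}
```

A consumer could pass the result of JSON.parse on the contents of resume.json to isResume before relying on the fields.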

How It Works

readPdf – extracts raw text items from the PDF
groupTextItemsIntoLines – converts text items into readable lines
groupLinesIntoSections – detects resume sections (Education, Experience, Skills)
extractResumeFromSections – builds structured resume JSON
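The two grouping steps can be illustrated with a small, self-contained sketch. This is not the project's actual code: the simplified TextItem shape (text, x, y), the tolerance parameter, the keyword list, and the "short line containing a keyword" heading rule are all assumptions made for illustration. Only the function names come from the list above.

```typescript
// Hypothetical, simplified text item; real pdfjs-dist items carry more fields.
interface TextItem {
  text: string;
  x: number; // horizontal position on the page
  y: number; // vertical position; PDF y grows upward
}

type Sections = Record<string, string[]>;

// Assumed heading keywords; the project's real detection rules may differ.
const SECTION_KEYWORDS = ["education", "experience", "skills", "projects"];

// Items whose y values lie within `tolerance` of the line's first item share a line.
function groupTextItemsIntoLines(items: TextItem[], tolerance = 2): string[] {
  const byY = [...items].sort((a, b) => b.y - a.y); // top of page first
  const groups: TextItem[][] = [];
  let current: TextItem[] = [];
  let currentY = Number.NaN;
  for (const item of byY) {
    if (current.length > 0 && Math.abs(currentY - item.y) <= tolerance) {
      current.push(item);
    } else {
      current = [item];
      currentY = item.y;
      groups.push(current);
    }
  }
  // Within each line, order items left to right and join their text.
  return groups.map((group) =>
    group
      .sort((a, b) => a.x - b.x)
      .map((item) => item.text)
      .join(" ")
      .trim()
  );
}

// A short line (three words or fewer) containing a keyword starts a new section.
// Lines before the first heading are collected under "profile".
function groupLinesIntoSections(lines: string[]): Sections {
  let bucket: string[] = [];
  const sections: Sections = { profile: bucket };
  for (const line of lines) {
    const words = line.trim().toLowerCase().split(/\s+/).filter(Boolean);
    const keyword =
      words.length <= 3
        ? SECTION_KEYWORDS.find((k) => words.includes(k))
        : undefined;
    if (keyword !== undefined) {
      bucket = sections[keyword] ?? [];
      sections[keyword] = bucket;
    } else if (words.length > 0) {
      bucket.push(line);
    }
  }
  return sections;
}

// Example input, invented for illustration.
const sampleItems: TextItem[] = [
  { text: "Doe", x: 40, y: 700.5 },
  { text: "John", x: 10, y: 700 },
  { text: "Skills", x: 10, y: 650 },
  { text: "Node.js", x: 60, y: 620 },
  { text: "JavaScript", x: 10, y: 620 },
];
console.log(groupLinesIntoSections(groupTextItemsIntoLines(sampleItems)));
```

Grouping by vertical position before sorting by horizontal position keeps items on the same visual line together even when their baselines differ slightly. A keyword heuristic this simple will misclassify short body lines that happen to contain a keyword, so a real parser would likely combine it with other signals such as font weight or casing.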

License

MIT
