pdf-parse
    Class PDFParse

    Loads PDF documents and exposes helpers for text, image, table, metadata, and screenshot extraction.

    Index

    Constructors

    Properties

    progress: { loaded: number; total: number } = ...

    Accessors

    Methods

    • Extract embedded images from requested pages.

      Behavior notes:

      • Pages are selected according to ParseParameters (partial, first, last).
      • Images whose width or height is below params.imageThreshold are skipped.
      • Returned ImageResult contains per-page PageImages; each image entry includes:
        • data: Uint8Array (present when params.imageBuffer === true)
        • dataUrl: string (present when params.imageDataUrl === true)
        • width, height, kind, name
      • Works in both Node.js (canvas.toBuffer) and browser (canvas.toDataURL) environments.
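
The size filter described in the notes above can be sketched in isolation. This is a minimal illustration, not the library's code: `EmbeddedImageLike` and `passesThreshold` are hypothetical names, the threshold value of 80 is arbitrary, and treating "smaller than" as a strict comparison is an assumption.

```typescript
// Hypothetical shape holding only the fields the filter needs.
interface EmbeddedImageLike {
  name: string;
  width: number;
  height: number;
}

// An image is skipped when EITHER dimension is below the threshold,
// so it is kept only when both dimensions reach it.
// Assumption: the comparison is strict (a side equal to the threshold is kept).
function passesThreshold(image: EmbeddedImageLike, imageThreshold: number): boolean {
  return !(image.width < imageThreshold || image.height < imageThreshold);
}

// Illustrative data, not taken from a real document.
const candidates: EmbeddedImageLike[] = [
  { name: "logo", width: 400, height: 120 },
  { name: "bullet", width: 12, height: 12 },
  { name: "rule", width: 600, height: 4 }, // wide but too short: skipped
];

const kept = candidates.filter((img) => passesThreshold(img, 80));
console.log(kept.map((img) => img.name));
```

Because the rule uses OR, a long thin decoration such as a horizontal rule is dropped even though one of its sides is large.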

      Parameters

      • params: ParseParameters = {}

        ParseParameters controlling page selection, thresholds and output format.

      Returns Promise<ImageResult>

      Promise with extracted images grouped by page.

    • Load document-level metadata (info, outline, permissions, page labels) and optionally gather per-page link details.

      Parameters

      • params: ParseParameters = {}

        Parse options; set parsePageInfo to collect per-page metadata described in ParseParameters.

      Returns Promise<InfoResult>

      Aggregated document metadata in an InfoResult.

    • Render pages to raster screenshots.

      Behavior notes:

      • Pages are selected according to ParseParameters (partial, first, last).
      • params.scale sets the zoom factor; if params.desiredWidth is specified, it takes precedence over params.scale.
      • Each ScreenshotResult page contains:
        • data: Uint8Array (when params.imageBuffer === true)
        • dataUrl: string (when params.imageDataUrl === true)
        • pageNumber, width, height, scale
      • Works in both Node.js (canvas.toBuffer) and browser (canvas.toDataURL) environments.
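
The precedence between `scale` and `desiredWidth` can be illustrated with a small helper. This is a sketch under stated assumptions: `RenderOptionsLike` and `effectiveScale` are hypothetical names, `desiredWidth` is assumed to be a target width in pixels, the derived scale is assumed to be that width divided by the page width at scale 1, and the fallback scale of 1 is also an assumption.

```typescript
// Hypothetical subset of the render options mentioned in the notes above.
interface RenderOptionsLike {
  scale?: number;
  desiredWidth?: number;
}

// Resolve the zoom factor for one page.
// pageWidthAtScale1 is the page width in pixels when rendered at scale 1.
function effectiveScale(pageWidthAtScale1: number, options: RenderOptionsLike): number {
  // desiredWidth wins whenever it is provided (assumed: derived as a width ratio).
  if (options.desiredWidth !== undefined && options.desiredWidth > 0) {
    return options.desiredWidth / pageWidthAtScale1;
  }
  // Otherwise use the explicit scale, falling back to 1 (assumed default).
  return options.scale ?? 1;
}

const scaleOnly = effectiveScale(612, { scale: 2 });
const widthWins = effectiveScale(612, { scale: 2, desiredWidth: 306 });
console.log(scaleOnly, widthWins);
```

With both options set, the `scale: 2` request is ignored and the page is rendered at half size to reach the requested width.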

      Parameters

      • parseParams: ParseParameters = {}

        ParseParameters controlling page selection and render options.

      Returns Promise<ScreenshotResult>

      Promise with rendered page images.

    • Detect and extract tables from pages by analysing vector drawing operators, then populate cells with text.

      Behavior notes:

      • Scans operator lists for rectangles/lines that form table grids (uses PathGeometry and LineStore).
      • Normalizes detected geometry and matches positioned text to table cells.
      • Honors ParseParameters for page selection.
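
The step of matching positioned text to table cells can be sketched as a point-in-rectangle lookup. This is not the library's implementation and does not use PathGeometry or LineStore: `CellRect`, `PositionedText`, and `fillCells` are hypothetical names, and the sketch assumes the grid has already been detected and that coordinates are normalized to a top-left origin.

```typescript
// Hypothetical cell rectangle with its row/column position in the grid.
interface CellRect {
  row: number;
  col: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical text fragment anchored at a single point.
interface PositionedText {
  text: string;
  x: number;
  y: number;
}

// Place each text fragment into the cell whose rectangle contains its anchor point.
// Fragments that fall outside every cell are ignored.
function fillCells(cells: CellRect[], items: PositionedText[], rows: number, cols: number): string[][] {
  const grid: string[][] = Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => ""),
  );
  for (const item of items) {
    const cell = cells.find(
      (c) => item.x >= c.x && item.x < c.x + c.width && item.y >= c.y && item.y < c.y + c.height,
    );
    if (!cell) continue;
    const rowCells = grid[cell.row];
    if (!rowCells) continue;
    const current = rowCells[cell.col];
    // Join multiple fragments in the same cell with a single space.
    rowCells[cell.col] = current ? `${current} ${item.text}` : item.text;
  }
  return grid;
}

// A 2x2 grid of 50x20 cells with illustrative text positions.
const cells: CellRect[] = [
  { row: 0, col: 0, x: 0, y: 0, width: 50, height: 20 },
  { row: 0, col: 1, x: 50, y: 0, width: 50, height: 20 },
  { row: 1, col: 0, x: 0, y: 20, width: 50, height: 20 },
  { row: 1, col: 1, x: 50, y: 20, width: 50, height: 20 },
];
const items: PositionedText[] = [
  { text: "Name", x: 5, y: 5 },
  { text: "Qty", x: 55, y: 5 },
  { text: "Bolt", x: 5, y: 25 },
  { text: "12", x: 55, y: 25 },
  { text: "footer", x: 200, y: 200 }, // outside the grid, dropped
];

const table = fillCells(cells, items, 2, 2);
console.log(table);
```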

      Parameters

      • params: ParseParameters = {}

        ParseParameters controlling which pages to analyse (partial/first/last).

      Returns Promise<TableResult>

      Promise containing discovered tables per page.

    • Extract plain text for each requested page, optionally enriching hyperlinks and enforcing line or cell separators.

      Parameters

      • params: ParseParameters = {}

        Parse options controlling pagination, link handling, and line/cell thresholds.

      Returns Promise<TextResult>

      A TextResult containing page-wise text and a concatenated document string.
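
The relationship between page-wise text and the concatenated document string can be pictured as follows. This is only a sketch: `PageTextLike`, `TextResultLike`, and `buildTextResult` are hypothetical names, the field names are illustrative rather than the actual TextResult fields, and joining pages with a blank line is an assumption.

```typescript
// Hypothetical per-page entry.
interface PageTextLike {
  pageNumber: number;
  text: string;
}

// Hypothetical result shape: page-wise text plus one concatenated string.
interface TextResultLike {
  pages: PageTextLike[];
  text: string;
}

// Build the combined result from per-page text.
// Assumption: pages are joined in page order with a blank line between them.
function buildTextResult(pages: PageTextLike[]): TextResultLike {
  const ordered = [...pages].sort((a, b) => a.pageNumber - b.pageNumber);
  return {
    pages: ordered,
    text: ordered.map((page) => page.text).join("\n\n"),
  };
}

const textResult = buildTextResult([
  { pageNumber: 2, text: "Second page." },
  { pageNumber: 1, text: "First page." },
]);
console.log(textResult.text);
```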