Interface ParseParameters

Options to control parsing behavior and output formatting.

interface ParseParameters {
    cellSeparator?: string;
    cellThreshold?: number;
    desiredWidth?: number;
    disableNormalization?: boolean;
    first?: number;
    imageBuffer?: boolean;
    imageDataUrl?: boolean;
    imageThreshold?: number;
    includeMarkedContent?: boolean;
    itemJoiner?: string;
    last?: number;
    lineEnforce?: boolean;
    lineThreshold?: number;
    pageJoiner?: string;
    parseHyperlinks?: boolean;
    parsePageInfo?: boolean;
    partial?: number[];
    scale?: number;
}

Properties

`Optional`cellSeparator

cellSeparator?: string

String inserted between text items on the same line when a sufficiently large horizontal gap is detected. Typically used to emulate a cell/column separator (for example, "\t" for tabs). Default: '\t'.

`Optional`cellThreshold

cellThreshold?: number

Horizontal distance threshold used to decide when two text items on the same baseline should be treated as separate cells. A larger value produces fewer (wider) cells; a smaller value creates more cell breaks. Default: 7.
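The interaction between cellSeparator and cellThreshold can be sketched as follows. This is an illustration only, not the library's implementation: the `LineItem` shape, the `joinLine` helper, and the strict "greater than" comparison are assumptions made for the example.

```typescript
// Hypothetical item shape for illustration; not the library's internal type.
interface LineItem {
  text: string;
  x: number;     // left edge of the item
  width: number; // rendered width of the item
}

// Join items that share a baseline, inserting cellSeparator wherever the
// horizontal gap to the previous item exceeds cellThreshold.
function joinLine(
  items: LineItem[],
  cellSeparator: string = "\t",
  cellThreshold: number = 7,
): string {
  let out = "";
  let prevRight: number | undefined;
  for (const item of items) {
    if (prevRight !== undefined && item.x - prevRight > cellThreshold) {
      out += cellSeparator;
    }
    out += item.text;
    prevRight = item.x + item.width;
  }
  return out;
}

const row: LineItem[] = [
  { text: "Name", x: 0, width: 30 },
  { text: ":", x: 31, width: 3 },      // gap of 1: stays in the same cell
  { text: "Alice", x: 60, width: 28 }, // gap of 26: starts a new cell
];

console.log(JSON.stringify(joinLine(row)));
console.log(JSON.stringify(joinLine(row, " | ")));
```

Raising the threshold to 30 in this sketch would merge all three items into a single cell, which matches the "larger value, fewer cells" behavior described above.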

`Optional`desiredWidth

desiredWidth?: number

Desired screenshot width in pixels. When set, the scale option is ignored. Default: undefined.

`Optional`disableNormalization

disableNormalization?: boolean

When true, text normalization is NOT performed in the worker thread. For plain text extraction, normalizing in the worker (false) is usually recommended. Default: false.

`Optional`first

first?: number

Parse the first N pages (pages 1..N). Ignored when partial is provided. If both first and last are set, they define an explicit inclusive page range (first..last) and this "first N" semantics is ignored. Default: undefined.

`Optional`imageBuffer

imageBuffer?: boolean

Applies to both getImage() and getScreenshot(): include the image as a binary buffer. Default: true.

`Optional`imageDataUrl

imageDataUrl?: boolean

Applies to both getImage() and getScreenshot(): include the image as a base64 data URL string. Default: true.

`Optional`imageThreshold

imageThreshold?: number

Minimum image dimension (in pixels) for width or height. When set, images whose width OR height is less than or equal to this value are ignored by getImage(). Useful for excluding tiny decorative or tracking images. Default: 80. Set to 0 to disable.
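A minimal sketch of the filtering rule described above, assuming a hypothetical `ImageSize` shape and `keepImage` helper (neither is part of the library's API):

```typescript
// Hypothetical shape for illustration only.
interface ImageSize {
  width: number;
  height: number;
}

// Returns true when the image would be kept under the documented rule:
// an image is ignored if its width OR height is <= imageThreshold,
// and a threshold of 0 disables the filter.
function keepImage(img: ImageSize, imageThreshold: number = 80): boolean {
  if (imageThreshold <= 0) return true;
  return img.width > imageThreshold && img.height > imageThreshold;
}

const images: ImageSize[] = [
  { width: 1, height: 1 },     // tracking pixel
  { width: 400, height: 60 },  // thin banner: height is below the threshold
  { width: 640, height: 480 }, // regular figure
];

console.log(images.filter((img) => keepImage(img)).length);
console.log(images.filter((img) => keepImage(img, 0)).length);
```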

`Optional`includeMarkedContent

includeMarkedContent?: boolean

Include marked-content items in the items array of TextContent. This exposes PDF marked-content tags (MCID, role/props) and structural/accessibility information useful for mapping text ↔ structure. For plain text extraction, false is usually appropriate, since enabling this option produces larger output. Default: false.

`Optional`itemJoiner

itemJoiner?: string

Optional string used to join text items when returning a page's text. If provided, this value is used instead of the default empty-string joining behavior. Default: undefined.

`Optional`last

last?: number

Parse the last N pages (pages total-N+1..total). Ignored when partial is provided. If both first and last are set, they define an explicit inclusive page range (first..last) and this "last N" semantics is ignored. Default: undefined.

`Optional`lineEnforce

lineEnforce?: boolean

Enforce logical line breaks by inserting a newline when the vertical distance between text items exceeds lineThreshold. Useful to preserve paragraph/line structure when text items are emitted as separate segments. Default: true.

`Optional`lineThreshold

lineThreshold?: number

Vertical distance threshold used to decide whether nearby text items belong to different lines. Items whose vertical distance exceeds this value are placed on separate lines, so larger values produce fewer line breaks. Default: 4.6.
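The sketch below follows the lineEnforce description (a newline is inserted when the vertical distance between consecutive text items exceeds lineThreshold). The `PositionedItem` shape and the `joinWithLineBreaks` helper are hypothetical and exist only to illustrate the rule; the library's actual logic may differ.

```typescript
// Hypothetical item shape for illustration only.
interface PositionedItem {
  text: string;
  y: number; // vertical position of the item's baseline
}

// Concatenate items, inserting "\n" when lineEnforce is on and the vertical
// distance to the previous item exceeds lineThreshold.
function joinWithLineBreaks(
  items: PositionedItem[],
  lineEnforce: boolean = true,
  lineThreshold: number = 4.6,
): string {
  let out = "";
  let prevY: number | undefined;
  for (const item of items) {
    if (
      lineEnforce &&
      prevY !== undefined &&
      Math.abs(item.y - prevY) > lineThreshold
    ) {
      out += "\n";
    }
    out += item.text;
    prevY = item.y;
  }
  return out;
}

const segments: PositionedItem[] = [
  { text: "Hello ", y: 700 },
  { text: "world", y: 700.5 }, // 0.5 apart: same line
  { text: "Next", y: 688 },    // 12.5 apart: new line
];

console.log(JSON.stringify(joinWithLineBreaks(segments)));
console.log(JSON.stringify(joinWithLineBreaks(segments, false)));
```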

`Optional`pageJoiner

pageJoiner?: string

Optional string appended at the end of each page's extracted text to mark page boundaries. Supports placeholders page_number and total_number which are substituted accordingly. If omitted or empty, no page boundary marker is added. Default: '\n-- page_number of total_number --'.
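As an illustration of the placeholder substitution, here is a small sketch. The `renderPageJoiner` helper is hypothetical; it only demonstrates replacing page_number and total_number in a template string and returning nothing for an empty template.

```typescript
// Hypothetical helper for illustration; not part of the library's API.
// Replaces every occurrence of the two documented placeholders.
function renderPageJoiner(
  pageJoiner: string,
  pageNumber: number,
  totalNumber: number,
): string {
  if (pageJoiner === "") return ""; // empty template: no boundary marker
  return pageJoiner
    .replace(/page_number/g, String(pageNumber))
    .replace(/total_number/g, String(totalNumber));
}

const template = "\n-- page_number of total_number --";
console.log(JSON.stringify(renderPageJoiner(template, 2, 10)));
```

A custom template such as `"\n=== total_number/page_number ==="` would be expanded the same way in this sketch.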

`Optional`parseHyperlinks

parseHyperlinks?: boolean

Attempt to detect and include hyperlink annotations (e.g. URLs) associated with text. Detected links are formatted as Markdown inline links (for example: `[text](url)`). Default: false.

`Optional`parsePageInfo

parsePageInfo?: boolean

When getInfo() is used, collect per-page metadata such as embedded links, title, pageLabel, and dimensions. Extraction of ISBN, DOI, abstract, and references is a work in progress. Default: false.

`Optional`partial

partial?: number[]

Array of page numbers to parse. When provided, only these pages will be parsed and returned in the same order. Example: [1, 3, 5]. Parse only one page: [7]. Default: undefined.
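The precedence among partial, first, and last described in this section can be sketched as a single function. `resolvePages` and its `PageSelection` type are hypothetical names; dropping out-of-range page numbers and treating an empty partial array as "not provided" are assumptions of this sketch, not documented behavior.

```typescript
// Hypothetical subset of the options, for illustration only.
interface PageSelection {
  first?: number;
  last?: number;
  partial?: number[];
}

// Resolve the 1-based page numbers to parse for a document with `total` pages.
function resolvePages(total: number, opts: PageSelection): number[] {
  const inRange = (p: number): boolean =>
    Number.isInteger(p) && p >= 1 && p <= total;
  const range = (from: number, to: number): number[] => {
    const pages: number[] = [];
    for (let p = Math.max(1, from); p <= Math.min(total, to); p++) {
      pages.push(p);
    }
    return pages;
  };

  const { first, last, partial } = opts;
  // 1. partial wins over first/last (array order is preserved here).
  if (partial && partial.length > 0) return partial.filter(inRange);
  // 2. first and last together form an inclusive range first..last.
  if (first !== undefined && last !== undefined) return range(first, last);
  // 3. first alone: pages 1..N.
  if (first !== undefined) return range(1, first);
  // 4. last alone: pages total-N+1..total.
  if (last !== undefined) return range(total - last + 1, total);
  // 5. nothing set: all pages.
  return range(1, total);
}

console.log(resolvePages(10, { first: 3 }));
console.log(resolvePages(10, { last: 2 }));
console.log(resolvePages(10, { first: 3, last: 5 }));
console.log(resolvePages(10, { partial: [1, 3, 5], first: 2 }));
```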

`Optional`scale

scale?: number

Screenshot scale factor: use 1 for the original size, 1.5 for a 50% larger image, etc. Default: 1.
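Since desiredWidth takes priority over scale, the effective scale factor can be thought of as in the sketch below. `effectiveScale` and its `pageWidth` argument (the page width at scale 1) are hypothetical and only illustrate the documented precedence.

```typescript
// Hypothetical helper for illustration; not part of the library's API.
// pageWidth is the page's width at scale 1.
function effectiveScale(
  pageWidth: number,
  opts: { scale?: number; desiredWidth?: number },
): number {
  if (opts.desiredWidth !== undefined) {
    // desiredWidth is set, so scale is ignored.
    return opts.desiredWidth / pageWidth;
  }
  return opts.scale ?? 1; // documented default
}

console.log(effectiveScale(600, { scale: 1.5 }));
console.log(effectiveScale(600, { desiredWidth: 1200, scale: 3 }));
console.log(effectiveScale(600, {}));
```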
