Extract Images from PDF

Endpoint #

POST https://pixlab.davix.dev/v1/pdf

Action #

action=extract-images

Description #

The Extract Images from PDF action extracts embedded images from a PDF document using the H2I engine (PixLab).

This action scans the uploaded PDF and returns extracted images as separate output files. It is part of the public /v1/pdf API and is suitable for workflows that need to recover embedded image assets from PDF documents for reuse, inspection, or downstream processing. Generated outputs are returned as signed URLs under the public PDF output path.

Request Format #

Requests to /v1/pdf must use:

Content-Type: multipart/form-data
API key authentication in request headers (X-Api-Key in the cURL example below)
PDF upload through the files field

For non-merge PDF actions, the first uploaded PDF file is used as the primary input. This applies to action=extract-images.

Parameters #

action #

Type: string
Required: Yes
Accepted value: extract-images

Specifies that the request should extract embedded images from the uploaded PDF document.

files #

Type: file upload (multipart/form-data)
Required: Yes

The source PDF document. It is uploaded through the files field and must be a valid PDF upload. For this action, the first uploaded PDF file is used as the source input.

pages #

Type: string
Required: No
Default: all

Specifies which pages should be scanned for embedded images.

Supported forms:

all
first
single page such as 1
multiple pages such as 1,3,5
ranges such as 2-6

Notes:

page values use the same selector/parser behavior as to-images
values are parsed and clamped to the document page count
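To make the supported selector forms concrete, here is a minimal client-side sketch in Python of how such a selector could be expanded into page numbers. The function name `expand_pages` is hypothetical, and the clamping rule (out-of-range numbers are pulled to the nearest valid page) is an assumption; the service's own parser may behave differently, for example by dropping out-of-range pages.

```python
def expand_pages(selector: str, page_count: int) -> list[int]:
    """Expand a pages selector into sorted, de-duplicated 1-based page numbers.

    Hypothetical client-side helper: it mirrors the documented selector
    forms but is not the service's parser.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")

    text = selector.strip().lower()
    if text in ("", "all"):  # an empty selector falls back to the default, all
        return list(range(1, page_count + 1))
    if text == "first":
        return [1]

    def clamp(n: int) -> int:
        # Assumption: out-of-range values are clamped to the nearest valid page.
        return max(1, min(n, page_count))

    pages = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # Range form such as 2-6 (inclusive on both ends).
            start_text, end_text = part.split("-", 1)
            start, end = clamp(int(start_text)), clamp(int(end_text))
            if start > end:
                start, end = end, start
            pages.update(range(start, end + 1))
        else:
            # Single page such as 1, or one item of a list such as 1,3,5.
            pages.add(clamp(int(part)))
    return sorted(pages)
```

Under these assumptions, `expand_pages("2-6", 4)` yields pages 2 through 4, because the upper bound is clamped to the document length.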

imageFormat #

Type: string
Required: No
Default: png

Defines the output format for extracted images.

Supported values:

png
jpeg
jpg
webp

The selected value is passed internally as the target output format.

quality #

Type: integer-like
Required: No

Controls output quality for extracted image outputs when the selected format uses compression. The public /v1/pdf parameter table lists quality as supported for extract-images.

density #

Type: integer-like
Required: No

Controls rendering density for PDF page image extraction paths. The public /v1/pdf parameter table lists density as supported for extract-images.

Supported Parameters #

The Extract Images from PDF action supports the following public parameters:

Parameter	Description
`action`	Must be `extract-images`
`files`	Source PDF upload
`pages`	Page selector for extraction scope
`imageFormat`	Output image format
`quality`	Output quality control
`density`	Extraction/render density control

These are the documented public parameters relevant to action=extract-images.
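As an illustration, the parameters in this table can be assembled and checked on the client before a request is sent. The sketch below is a minimal Python example; `build_fields` is a hypothetical helper, and the integer check on quality and density is an assumption based on the "integer-like" type, since this page does not state numeric ranges.

```python
from typing import Optional, Union

SUPPORTED_FORMATS = {"png", "jpeg", "jpg", "webp"}  # values listed under imageFormat


def build_fields(
    pages: str = "all",
    image_format: str = "png",
    quality: Optional[Union[int, str]] = None,
    density: Optional[Union[int, str]] = None,
) -> dict[str, str]:
    """Return the text form fields for an extract-images request."""
    if image_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported imageFormat: {image_format!r}")

    fields = {
        "action": "extract-images",
        "pages": pages,
        "imageFormat": image_format,
    }
    # quality and density are optional; send them only when provided.
    for name, value in (("quality", quality), ("density", density)):
        if value is not None:
            # int() rejects non-numeric input such as "high"
            # (an assumed client-side check, not documented server behavior).
            fields[name] = str(int(value))
    return fields
```

Leaving quality and density out of the returned dictionary when they are not set lets the service apply whatever defaults it uses, which this page does not specify.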

Full cURL Example #

curl -sS -X POST "https://pixlab.davix.dev/v1/pdf" \
  -H "X-Api-Key: <YOUR_API_KEY>" \
  -H "Idempotency-Key: pdf-extract-images-001" \
  -F "action=extract-images" \
  -F "files=@/path/to/document.pdf" \
  -F "pages=all" \
  -F "imageFormat=jpeg" \
  -F "quality=90" \
  -F "density=300"

This example sets every documented public parameter for the Extract Images from PDF action:

action=extract-images
source PDF upload in files
pages
imageFormat
quality
density
optional idempotency header

The API accepts the idempotency key in either the Idempotency-Key or the X-Idempotency-Key header.
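For callers who are not using cURL, the same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: `build_request` is a hypothetical helper, the multipart body is assembled by hand, and the `application/pdf` part type is an assumption. The function builds the request without sending it.

```python
import urllib.request
import uuid

ENDPOINT = "https://pixlab.davix.dev/v1/pdf"


def build_request(
    api_key: str,
    pdf_bytes: bytes,
    filename: str,
    fields: dict[str, str],
    idempotency_key: str,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for /v1/pdf."""
    boundary = uuid.uuid4().hex
    chunks = []
    # Plain text fields such as action, pages, imageFormat, quality, density.
    for name, value in fields.items():
        chunks.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The PDF goes in the "files" field, as in the cURL example.
    chunks.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"  # assumed part type
        ).encode("utf-8")
    )
    chunks.append(pdf_bytes)
    chunks.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        ENDPOINT,
        data=b"".join(chunks),
        method="POST",
        headers={
            "X-Api-Key": api_key,
            "Idempotency-Key": idempotency_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The returned object can be passed to `urllib.request.urlopen`, and the response body decoded as JSON. Keeping construction separate from sending makes it easy to inspect the headers and body before any network call.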

Success Response #

Successful /v1/pdf requests return either a single output object or a results array, depending on the action. For action=extract-images, the public success pattern is a multi-output response because extracted images are returned as separate output files. Output URLs are signed and served from the public PDF output path.

Example response (illustrative; real output URLs are signed):

{
  "results": [
    {
      "url": "https://pixlab.davix.dev/pdf/image-1.jpeg"
    },
    {
      "url": "https://pixlab.davix.dev/pdf/image-2.jpeg"
    }
  ],
  "request_id": "req_abc123"
}

Response Fields #

results[] #

Array of extracted image outputs. Each result represents one generated output file.

url #

Signed output URL for an extracted image file. PDF action output URLs are signed under /pdf/<file>.

request_id #

Request identifier returned by the API when available.
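A short Python sketch of reading these fields from a success response follows. `extract_urls` is a hypothetical helper name; it assumes the response body has the shape shown in the example above and tolerates a missing request_id, since that field is only returned when available.

```python
import json
from typing import Optional


def extract_urls(body: str) -> tuple[list[str], Optional[str]]:
    """Return (output URLs, request_id) from a success response body."""
    payload = json.loads(body)
    # Tolerate a missing results array or entries without a url.
    urls = [item["url"] for item in payload.get("results", []) if "url" in item]
    # request_id may be absent, in which case None is returned.
    return urls, payload.get("request_id")
```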

Errors #

The public /v1/pdf endpoint can return the following PDF-route error codes:

missing_field
invalid_parameter
unsupported_media_type
pdf_page_limit_exceeded
rate_limit_exceeded
rate_limit_store_unavailable
monthly_quota_exceeded
server_busy
timeout
pdf_tool_failed

The shared upload/error layer can also return:

invalid_upload
file_too_large
too_many_files
total_upload_exceeded

HTTP Status Codes #

400 → invalid request fields or parameters
413 → upload size/count limits exceeded or PDF page limit exceeded
415 → unsupported media type
429 → rate limit or monthly quota exceeded
500 → PDF processing failure
503 → timeout, rate-limit store unavailable, or server busy
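One way a client might act on these errors is to retry only the ones that look transient, reusing the same Idempotency-Key on each attempt. The Python sketch below is an assumption-laden illustration: which error codes are worth retrying is a client-side judgment rather than documented behavior, the helper names are hypothetical, and how the error code is read from an error response is left to the caller because this page does not show the error body format.

```python
import random

# Assumption: these documented error codes describe temporary conditions.
TRANSIENT_CODES = {
    "rate_limit_exceeded",
    "rate_limit_store_unavailable",
    "server_busy",
    "timeout",
}


def is_retryable(error_code: str) -> bool:
    """Return True when a later retry has a reasonable chance of succeeding."""
    return error_code in TRANSIENT_CODES


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter, in seconds; attempt starts at 0."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

Classifying by error code rather than by status alone matters for 429, which covers both rate limiting and an exhausted monthly quota; only the former is likely to clear after a short wait.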

Usage Notes #

Idempotency-Key is optional and supported for retry-safe request handling.

This action is intended for extracting embedded image assets from PDF documents.

Each extracted image is returned as a separate output file.

pages can be used to limit extraction scope to selected document pages.

imageFormat controls the output format of extracted images.

Output files are returned through signed URLs. Applications that need long-term storage should store generated files externally rather than treating output URLs as permanent hosting.
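To copy outputs into your own storage, something like the following Python sketch could be used. Both helper names are hypothetical, and the sketch assumes that each output URL can be fetched with a plain GET and that its path ends in the file name, as in the example response.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlsplit


def local_name(url: str) -> str:
    """Derive a local filename from an output URL, ignoring any query string."""
    name = posixpath.basename(urlsplit(url).path)
    if not name:
        raise ValueError(f"no filename in URL: {url!r}")
    return name


def save_outputs(urls: list[str], dest_dir: str) -> list[str]:
    """Download each output URL into dest_dir and return the saved paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(dest_dir, local_name(url))
        # Assumption: a plain GET on the signed URL returns the image bytes.
        with urllib.request.urlopen(url) as response, open(path, "wb") as out:
            out.write(response.read())
        saved.append(path)
    return saved
```

Taking the name from the URL path only means that any signature carried in the query string never ends up in a local file name.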
