Extract Images from PDF

Endpoint #

POST https://pixlab.davix.dev/v1/pdf

Action #

action=extract-images

Description #

The Extract Images from PDF action extracts embedded images from a PDF document using the H2I engine (PixLab).

This action scans the uploaded PDF and returns extracted images as separate output files. It is part of the public /v1/pdf API and is suitable for workflows that need to recover embedded image assets from PDF documents for reuse, inspection, or downstream processing. Generated outputs are returned as signed URLs under the public PDF output path.

Request Format #

Requests to /v1/pdf must use:

  • Content-Type: multipart/form-data
  • API key authentication in request headers
  • PDF upload through the files field

For non-merge PDF actions, the first uploaded PDF file is used as the primary input. This applies to action=extract-images.

Parameters #

action #

Type: string
Required: Yes
Accepted value: extract-images

Specifies that the request should extract embedded images from the uploaded PDF document.

files #

Type: file upload (multipart/form-data)
Required: Yes

The source PDF document.

  • uploaded through the files field
  • must be a valid PDF upload
  • for this action, the first uploaded PDF file is used as the source input

pages #

Type: string
Required: No
Default: all

Specifies which pages should be scanned for embedded images.

Supported forms:

  • all
  • first
  • single page such as 1
  • multiple pages such as 1,3,5
  • ranges such as 2-6

Notes:

  • page values use the same selector/parser behavior as to-images
  • values are parsed and clamped to the document page count
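
As an illustration of the selector forms listed above, the following is a minimal client-side pre-check, not the server's parser. The function name `is_documented_pages_selector` is hypothetical, and the sketch assumes 1-based page numbers and that a range's start must not exceed its end; the API may accept forms beyond the ones documented here.

```python
import re

# Patterns for the documented numeric forms (1-based page numbers assumed).
_SINGLE = re.compile(r"[1-9]\d*")                # e.g. "1"
_LIST = re.compile(r"[1-9]\d*(,[1-9]\d*)+")      # e.g. "1,3,5"
_RANGE = re.compile(r"([1-9]\d*)-([1-9]\d*)")    # e.g. "2-6"


def is_documented_pages_selector(value: str) -> bool:
    """Return True if value matches one of the selector forms listed in the docs."""
    value = value.strip()
    if value in ("all", "first"):
        return True
    if _SINGLE.fullmatch(value) or _LIST.fullmatch(value):
        return True
    match = _RANGE.fullmatch(value)
    # Assumption: a descending range such as "6-2" is treated as invalid here.
    return bool(match) and int(match.group(1)) <= int(match.group(2))
```

A check like this only catches obvious typos before upload. Clamping to the document page count still happens on the server.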

imageFormat #

Type: string
Required: No
Default: png

Defines the output format for extracted images.

Supported values:

  • png
  • jpeg
  • jpg
  • webp

The selected value is passed internally as the target output format.

quality #

Type: integer-like
Required: No

Controls output quality for extracted image outputs when the selected format uses compression. The public /v1/pdf parameter table lists quality as supported for extract-images.

density #

Type: integer-like
Required: No

Controls the rendering density used by extraction paths that render PDF page content to images. The public /v1/pdf parameter table lists density as supported for extract-images.

Supported Parameters #

The Extract Images from PDF action supports the following public parameters:

Parameter      Description
action         Must be extract-images
files          Source PDF upload
pages          Page selector for extraction scope
imageFormat    Output image format
quality        Output quality control
density        Extraction/render density control

These are the documented public parameters relevant to action=extract-images.
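
The parameter set above can be assembled into form fields with a small helper. This is a sketch under stated assumptions: `build_form_fields` is a hypothetical name, the format check mirrors the documented imageFormat values, and quality and density are coerced to integers because the docs describe them as integer-like. Valid ranges for those two values are not documented here, so none are enforced.

```python
ALLOWED_FORMATS = {"png", "jpeg", "jpg", "webp"}  # documented imageFormat values


def build_form_fields(pages="all", image_format="png", quality=None, density=None):
    """Build the non-file form fields for action=extract-images."""
    if image_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported imageFormat: {image_format!r}")
    fields = {
        "action": "extract-images",
        "pages": pages,
        "imageFormat": image_format,
    }
    # Optional fields are sent only when set, so server-side defaults apply otherwise.
    if quality is not None:
        fields["quality"] = str(int(quality))
    if density is not None:
        fields["density"] = str(int(density))
    return fields
```

The PDF itself is not part of this dictionary; it is uploaded separately through the files field.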

Full cURL Example #

curl -sS -X POST "https://pixlab.davix.dev/v1/pdf" \
  -H "X-Api-Key: <YOUR_API_KEY>" \
  -H "Idempotency-Key: pdf-extract-images-001" \
  -F "action=extract-images" \
  -F "files=@/path/to/document.pdf" \
  -F "pages=all" \
  -F "imageFormat=jpeg" \
  -F "quality=90" \
  -F "density=300"

This example includes the full documented public parameter surface for the Extract Images from PDF action:

  • action=extract-images
  • source PDF upload in files
  • pages
  • imageFormat
  • quality
  • density
  • optional idempotency header

The API accepts either Idempotency-Key or X-Idempotency-Key as the idempotency header.
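
For callers not using cURL, the same request can be sketched with Python's standard library. This is an illustrative equivalent, not an official client: `build_extract_request` is a hypothetical helper, the multipart body is encoded by hand, and the file part is labelled application/pdf as an assumption about what the upload layer expects.

```python
import urllib.request
import uuid

ENDPOINT = "https://pixlab.davix.dev/v1/pdf"


def build_extract_request(api_key, pdf_bytes, filename="document.pdf",
                          fields=None, idempotency_key=None):
    """Build (but do not send) a multipart POST for action=extract-images."""
    form = {**(fields or {}), "action": "extract-images"}
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in form.items():
        head = (f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        parts.append(head.encode() + str(value).encode() + b"\r\n")
    # The PDF goes in the "files" field, as the request format requires.
    file_head = (f"--{boundary}\r\n"
                 f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
                 "Content-Type: application/pdf\r\n\r\n")
    parts.append(file_head.encode() + pdf_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    headers = {
        "X-Api-Key": api_key,
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    if idempotency_key:
        headers["Idempotency-Key"] = idempotency_key
    return urllib.request.Request(ENDPOINT, data=b"".join(parts),
                                  headers=headers, method="POST")
```

To send it, read the PDF with `open(path, "rb").read()`, pass the bytes to the helper, and call `urllib.request.urlopen(request, timeout=60)`. Error statuses surface as `urllib.error.HTTPError`.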

Success Response #

Successful /v1/pdf requests return either a single output object or a results array, depending on the action. For action=extract-images, the public success pattern is a multi-output response because extracted images are returned as separate output files. Output URLs are signed and served from the public PDF output path.

A conservative example of the public response shape:

{
  "results": [
    {
      "url": "https://pixlab.davix.dev/pdf/image-1.jpeg"
    },
    {
      "url": "https://pixlab.davix.dev/pdf/image-2.jpeg"
    }
  ],
  "request_id": "req_abc123"
}

Response Fields #

results[] #

Array of extracted image outputs. Each result represents one generated output file.

url #

Signed output URL for an extracted image file. PDF action output URLs are signed under /pdf/<file>.

request_id #

Request identifier returned by the API when available.
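
The fields described above can be read with a few lines of Python. This sketch assumes the multi-output shape shown in the success example (a results array of objects with a url key); `extract_urls` is a hypothetical helper name, and request_id is treated as optional because it is returned only when available.

```python
import json


def extract_urls(payload):
    """Return (urls, request_id) from a decoded extract-images response."""
    results = payload.get("results")
    if not isinstance(results, list):
        raise ValueError("expected a 'results' array in the response")
    # Skip any entry without a url rather than failing the whole response.
    urls = [item["url"] for item in results
            if isinstance(item, dict) and "url" in item]
    return urls, payload.get("request_id")


sample = json.loads("""
{
  "results": [
    {"url": "https://pixlab.davix.dev/pdf/image-1.jpeg"},
    {"url": "https://pixlab.davix.dev/pdf/image-2.jpeg"}
  ],
  "request_id": "req_abc123"
}
""")
urls, request_id = extract_urls(sample)
```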

Errors #

The public /v1/pdf endpoint documents the following PDF-route errors:

  • missing_field
  • invalid_parameter
  • unsupported_media_type
  • pdf_page_limit_exceeded
  • rate_limit_exceeded
  • rate_limit_store_unavailable
  • monthly_quota_exceeded
  • server_busy
  • timeout
  • pdf_tool_failed

The shared upload/error layer can also return:

  • invalid_upload
  • file_too_large
  • too_many_files
  • total_upload_exceeded

HTTP Status Codes #

  • 400 → invalid request fields or parameters
  • 413 → upload size/count limits exceeded or PDF page limit exceeded
  • 415 → unsupported media type
  • 429 → rate limit or monthly quota exceeded
  • 500 → PDF processing failure
  • 503 → timeout, rate-limit store unavailable, or server busy
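
One way a client might act on these status codes is to retry only the ones that look transient. The sketch below is an assumption, not documented API behavior: it treats 429 and 503 as retryable and everything else as final. `send_with_retry` and its `send` callable are hypothetical names, and `send` is expected to reuse the same Idempotency-Key on every attempt. A 429 caused by an exhausted monthly quota will not recover through retries.

```python
import time

# Assumption: only these statuses are worth retrying.
TRANSIENT_STATUSES = {429, 503}


def send_with_retry(send, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() -> (status, body), retrying transient statuses with backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status not in TRANSIENT_STATUSES or attempt == max_attempts:
            return status, body
        # Exponential backoff: base_delay, then 2x, then 4x, and so on.
        sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.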

Usage Notes #

Idempotency-Key is optional and supported for retry-safe request handling.

This action is intended for extracting embedded image assets from PDF documents.

Each extracted image is returned as a separate output file.

pages can be used to limit extraction scope to selected document pages.

imageFormat controls the output format of extracted images.

Output files are returned through signed URLs. Applications that need long-term storage should store generated files externally rather than treating output URLs as permanent hosting.
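
Copying outputs into your own storage can be sketched as follows. The helper names `local_name` and `save_outputs` are hypothetical, and the example writes to a local directory purely for illustration; an object store or other durable location would follow the same pattern. The file name is taken from the URL path, so any query string a signed URL might carry is ignored.

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def local_name(url):
    """Derive a file name from the URL path, ignoring any query string."""
    name = posixpath.basename(urlparse(url).path)
    if not name:
        raise ValueError(f"no file name in URL: {url!r}")
    return name


def save_outputs(urls, dest_dir):
    """Download each output URL into dest_dir and return the saved paths."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        target = os.path.join(dest_dir, local_name(url))
        with urllib.request.urlopen(url, timeout=60) as resp, open(target, "wb") as out:
            out.write(resp.read())
        saved.append(target)
    return saved
```

Downloading promptly after a successful response is the safer choice, since the lifetime of signed URLs is not stated in this section.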
