working with xfa forms #129

vision10 · 2025-12-01T15:25:37Z

What?

A first attempt at adding support for preserving XFA (XML Forms Architecture) forms and for extracting and modifying the JavaScript embedded in XFA templates.

Why?

By default, pdf-lib strips XFA data when loading and saving PDFs, so these forms lose all functionality. Additionally, there was no way to programmatically access or modify the JavaScript code embedded in XFA templates.

How?

**Preserve XFA forms**
- Added preserveXFA option to PDFDocument.load() and PDFDocument.save()
- When enabled, preserves the entire XFA array structure from the AcroForm dictionary
- Prevents XFA data loss during PDF modification

**XFA JavaScript Extraction (getXFAJavaScripts())**
- New method that extracts all JavaScript from XFA template XML
- Parses compressed PDF streams and XML structure
- Returns array of {field: string, event: string, script: string} objects
- Handles XFA's non-standard XML formatting (newlines in closing tags like </script\n>)
- Uses backward search to determine field and event context for each script
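
The extraction steps above can be sketched on a plain XML string. This is a minimal illustration, not the code in this PR: `extractScripts`, `lastAttr`, and the sample template are hypothetical names, and it assumes each script sits inside an `<event activity="...">` element within a `<field name="...">`. A script outside any field would be attributed to the nearest preceding field.

```typescript
// Hypothetical result shape, mirroring the {field, event, script} objects above.
interface XfaScript {
  field: string;
  event: string;
  script: string;
}

// Returns the value of `attr` on the last opening <tag ...> found in `xml`,
// or '' if there is none. This is the "backward search" for context.
function lastAttr(xml: string, tag: string, attr: string): string {
  const re = new RegExp(`<${tag}\\b[^>]*?\\b${attr}="([^"]*)"`, 'g');
  let value = '';
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) value = m[1];
  return value;
}

// Collects every <script> body, tolerating whitespace (including newlines)
// before the closing '>' of the end tag.
function extractScripts(templateXml: string): XfaScript[] {
  const results: XfaScript[] = [];
  const scriptRe = /<script\b[^>]*>([\s\S]*?)<\/script\s*>/g;
  let m: RegExpExecArray | null;
  while ((m = scriptRe.exec(templateXml)) !== null) {
    // Only look at the text before this script to find its field and event.
    const before = templateXml.slice(0, m.index);
    results.push({
      field: lastAttr(before, 'field', 'name'),
      event: lastAttr(before, 'event', 'activity'),
      script: m[1].trim(),
    });
  }
  return results;
}

// Illustrative template fragment with a newline inside one closing tag.
const sampleTemplate = [
  '<template><subform name="form1">',
  '<field name="Total">',
  '<event activity="exit"><script contentType="application/x-javascript">',
  'this.rawValue = 1 + 2;',
  '</script',
  '></event>',
  '<event activity="click"><script>xfa.host.messageBox("hi");</script></event>',
  '</field>',
  '</subform></template>',
].join('\n');

const extracted = extractScripts(sampleTemplate);
console.log(extracted);
```

Both scripts are reported under the field `Total`, one for the `exit` event and one for `click`.
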

**XFA JavaScript Modification (setXFAJavaScript(field, event, script))**
- New method to modify specific scripts by field name and event name
- Finds matching field/event in XML, replaces script content
- Creates new compressed stream with modified XML
- Returns boolean indicating success/failure
- Preserves XFA structure and all other scripts
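
The find-and-replace step can be illustrated on a plain string, leaving out stream compression. This is a rough sketch rather than the PR's implementation: `replaceScript` and `attrBefore` are hypothetical helpers, the new script text is assumed to be XML-safe already (no escaping is done), and only the first script matching the field and event is replaced.

```typescript
// Returns the value of `attr` on the last opening <tag ...> in `xml`, or ''.
function attrBefore(xml: string, tag: string, attr: string): string {
  const re = new RegExp(`<${tag}\\b[^>]*?\\b${attr}="([^"]*)"`, 'g');
  let value = '';
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) value = m[1];
  return value;
}

// Replaces the body of the first <script> whose nearest preceding field name
// and event activity match. Everything else in the XML is left untouched.
function replaceScript(
  templateXml: string,
  field: string,
  event: string,
  newScript: string, // assumed to be XML-safe; this sketch does not escape it
): { xml: string; replaced: boolean } {
  const scriptRe = /(<script\b[^>]*>)([\s\S]*?)(<\/script\s*>)/g;
  let m: RegExpExecArray | null;
  while ((m = scriptRe.exec(templateXml)) !== null) {
    const before = templateXml.slice(0, m.index);
    if (attrBefore(before, 'field', 'name') !== field) continue;
    if (attrBefore(before, 'event', 'activity') !== event) continue;
    // Splice the new body between the opening and closing tags.
    const bodyStart = m.index + m[1].length;
    const bodyEnd = bodyStart + m[2].length;
    const xml =
      templateXml.slice(0, bodyStart) + newScript + templateXml.slice(bodyEnd);
    return { xml, replaced: true };
  }
  return { xml: templateXml, replaced: false };
}

// Illustrative fragment: two events on the same field.
const formXml = [
  '<field name="Total">',
  '<event activity="exit"><script>this.rawValue = 1;</script',
  '></event>',
  '<event activity="click"><script>xfa.host.messageBox("old");</script></event>',
  '</field>',
].join('\n');

const updated = replaceScript(formXml, 'Total', 'click', 'xfa.host.messageBox("new");');
const missing = replaceScript(formXml, 'NoSuchField', 'click', 'void 0;');
console.log(updated.replaced, missing.replaced);
```

Returning the untouched XML together with `replaced: false` lets a caller map the outcome onto a boolean success flag like the one described above.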

Technical details:

- XFA data is stored as alternating name/stream pairs in a PDFArray
- The template section contains the JavaScript in XML <script> elements
- Uses a whitespace-tolerant regex for closing tags (<\/script\s*> instead of <\/script>), since XFA output can put a newline before the closing >; this is valid XML, but the plain pattern misses it
- Added PDFRef dereferencing for the XFA array lookup
- Uses decodePDFRawStream to handle compressed streams
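
Two of these details are easy to show in isolation: looking up a packet in an array of alternating names and streams, and the difference between the two closing-tag patterns. The sketch below uses plain arrays and a stand-in `FakeStream` type in place of pdf-lib's PDFArray and stream objects. `findPacket`, `FakeStream`, and the sample contents are hypothetical and only for illustration.

```typescript
// Stand-in for a decoded PDF stream. In the real code this would be a
// pdf-lib stream object reached after dereferencing a PDFRef.
interface FakeStream {
  text: string;
}

// Walks an array laid out as [name0, stream0, name1, stream1, ...] and
// returns the stream that follows the wanted name, if any.
function findPacket(
  pairs: ReadonlyArray<string | FakeStream>,
  wanted: string,
): FakeStream | undefined {
  for (let i = 0; i + 1 < pairs.length; i += 2) {
    if (pairs[i] === wanted) return pairs[i + 1] as FakeStream;
  }
  return undefined;
}

// Illustrative XFA array contents.
const xfaPairs: Array<string | FakeStream> = [
  'config', { text: '<config/>' },
  'template', { text: '<template><script>1;</script\n></template>' },
  'datasets', { text: '<xfa:datasets/>' },
];

const templatePacket = findPacket(xfaPairs, 'template');
const absentPacket = findPacket(xfaPairs, 'stylesheet');

// A closing tag with a newline before '>' defeats the plain pattern
// but is matched by the whitespace-tolerant one.
const closingTag = '</script\n>';
const plainMatches = /<\/script>/.test(closingTag);
const tolerantMatches = /<\/script\s*>/.test(closingTag);
console.log(templatePacket !== undefined, plainMatches, tolerantMatches);
```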

Testing?

**Unit Tests (7 tests in PDFDocumentXFA.spec.ts)**
- ✅ Extract XFA JavaScript from template (29 scripts from test PDF)
- ✅ Returns empty array for non-XFA PDFs
- ✅ Can modify XFA JavaScript
- ✅ Returns false when modifying non-existent field
- ✅ Preserves XFA structure after modification
- ✅ Can save and reload PDF with modified XFA JavaScript
- ✅ Extracts scripts from multiple events on same field
- Uses assets/pdfs/with_xfa_fields.pdf (included in repo)

**Integration Testing**
- Tested with a complex PDF of my own and with the ready-made test asset
- Save/reload cycle preserves all modifications

New Dependencies?

No new production dependencies. The implementation uses existing dependencies:

- pako (already in dependencies) for stream compression/decompression
- All XFA functionality is built using existing pdf-lib core modules

Screenshots

Anything Else?

Documentation updates

Sorin-nightz added 2 commits December 1, 2025 14:46

xfa forms

d34b0ea

xfa forms

d723509

github-actions bot added the needs-triage label Dec 1, 2025
