vision10 commented Dec 1, 2025

What?

A first attempt at preserving XFA (XML Forms Architecture) forms and at extracting and modifying the JavaScript embedded in XFA templates.

Why?

By default, pdf-lib strips XFA data when loading and saving PDFs, so these forms lose all functionality. There was also no way to programmatically access or modify the JavaScript embedded in XFA templates.

How?

  1. Preserve XFA Forms

    • Added preserveXFA option to PDFDocument.load() and PDFDocument.save()
    • When enabled, preserves the entire XFA array structure from the AcroForm dictionary
    • Prevents XFA data loss during PDF modification
  2. XFA JavaScript Extraction (getXFAJavaScripts())

    • New method that extracts all JavaScript from XFA template XML
    • Parses compressed PDF streams and XML structure
    • Returns array of {field: string, event: string, script: string} objects
    • Handles XFA's unusual XML formatting (whitespace, including newlines, inside closing tags, like </script\n>)
    • Uses backward search to determine field and event context for each script
  3. XFA JavaScript Modification (setXFAJavaScript(field, event, script))

    • New method to modify specific scripts by field name and event name
    • Finds matching field/event in XML, replaces script content
    • Creates new compressed stream with modified XML
    • Returns boolean indicating success/failure
    • Preserves XFA structure and all other scripts
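
The extraction and modification steps above can be sketched on a plain XML string. This is a minimal illustration, not the code in this PR: `extractScripts`, `replaceScript`, and `lastAttr` are hypothetical names, and the sketch assumes that the field is named by a `name` attribute on `<field>` and the event by an `activity` attribute on `<event>`. It skips stream decoding entirely and inserts the new script verbatim, so any XML escaping is left to the caller.

```typescript
// Hypothetical shape mirroring the {field, event, script} objects described above.
interface XfaScript {
  field: string;
  event: string;
  script: string;
}

// <script ...>body</script>, tolerating whitespace (including newlines)
// before the final '>' of the closing tag, e.g. "</script\n>".
const SCRIPT_SOURCE = "<script\\b[^>]*>([\\s\\S]*?)<\\/script\\s*>";

// Backward search: value of `attr` on the last `<tag ...>` opening that
// appears in `text`, or "" if there is none. It does not check whether
// that element was already closed, so this is only an approximation.
function lastAttr(text: string, tag: string, attr: string): string {
  const re = new RegExp(`<${tag}\\b[^>]*\\b${attr}="([^"]*)"`, "g");
  let value = "";
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    value = m[1] ?? "";
  }
  return value;
}

// Collect every script together with its nearest preceding field and event.
function extractScripts(xml: string): XfaScript[] {
  const re = new RegExp(SCRIPT_SOURCE, "g");
  const found: XfaScript[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) {
    const before = xml.slice(0, m.index);
    found.push({
      field: lastAttr(before, "field", "name"),
      event: lastAttr(before, "event", "activity"),
      script: (m[1] ?? "").trim(),
    });
  }
  return found;
}

// Replace the body of the first script whose context matches field/event.
// Returns the (possibly unchanged) XML and whether a replacement happened.
function replaceScript(
  xml: string,
  field: string,
  event: string,
  script: string,
): { xml: string; replaced: boolean } {
  const re = new RegExp(SCRIPT_SOURCE, "g");
  let m: RegExpExecArray | null;
  while ((m = re.exec(xml)) !== null) {
    const before = xml.slice(0, m.index);
    if (
      lastAttr(before, "field", "name") === field &&
      lastAttr(before, "event", "activity") === event
    ) {
      // The opening tag cannot contain '>', so the first '>' ends it.
      const bodyStart = m.index + m[0].indexOf(">") + 1;
      const bodyEnd = bodyStart + (m[1] ?? "").length;
      return {
        xml: xml.slice(0, bodyStart) + script + xml.slice(bodyEnd),
        replaced: true,
      };
    }
  }
  return { xml, replaced: false };
}
```

Only the script body is spliced, so the opening tag's attributes and the original closing tag (with whatever whitespace it contains) are left untouched, which is one way to keep the rest of the template byte-for-byte identical.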

Technical details:

  • XFA data is stored as alternating name/stream pairs in a PDFArray
  • Template section contains the JavaScript in XML <script> elements
  • Uses a regex that tolerates whitespace inside closing tags, as found in XFA templates (<\/script\s*> instead of <\/script>)
  • Added PDFRef dereferencing for XFA array lookup
  • Uses decodePDFRawStream to handle compressed streams
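
The alternating name/stream layout can be illustrated with a simplified model. In this sketch the XFA array is a flat array of strings and `Uint8Array` payloads; that is an assumption made for illustration, since the real array holds pdf-lib objects (names as PDF strings, streams usually behind `PDFRef`s that need dereferencing and decoding). `pairXfaPackets` and `findXfaPacket` are hypothetical helper names.

```typescript
// Simplified stand-in for the XFA entry: [name0, stream0, name1, stream1, ...].
type XfaEntries = ReadonlyArray<string | Uint8Array>;

interface XfaPacket {
  name: string;
  stream: Uint8Array;
}

// Walk the array two items at a time and keep only well-formed
// (string name, byte payload) pairs.
function pairXfaPackets(entries: XfaEntries): XfaPacket[] {
  const packets: XfaPacket[] = [];
  for (let i = 0; i + 1 < entries.length; i += 2) {
    const name = entries[i];
    const stream = entries[i + 1];
    if (typeof name === "string" && stream instanceof Uint8Array) {
      packets.push({ name, stream });
    }
  }
  return packets;
}

// Look up one packet (for example "template") by name.
function findXfaPacket(entries: XfaEntries, name: string): Uint8Array | undefined {
  for (const packet of pairXfaPackets(entries)) {
    if (packet.name === name) return packet.stream;
  }
  return undefined;
}
```

With this layout, locating the JavaScript comes down to finding the packet named `template` and decoding its stream before the XML is searched.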

Testing?

  1. Unit Tests (7 tests in PDFDocumentXFA.spec.ts)

    • ✅ Extract XFA JavaScript from template (29 scripts from test PDF)
    • ✅ Returns empty array for non-XFA PDFs
    • ✅ Can modify XFA JavaScript
    • ✅ Returns false when modifying non-existent field
    • ✅ Preserves XFA structure after modification
    • ✅ Can save and reload PDF with modified XFA JavaScript
    • ✅ Extracts scripts from multiple events on same field
    • Uses assets/pdfs/with_xfa_fields.pdf (included in repo)
  2. Integration Testing

    • Tested with my own complex PDF and with the ready-made one used by the unit tests
    • Save/reload cycle preserves all modifications

New Dependencies?

No new production dependencies. The implementation uses existing dependencies:

  • pako (already in dependencies) - for stream compression/decompression
  • All XFA functionality built using existing pdf-lib core modules

Screenshots

Suggested Reading?

  • PDF 1.7 Specification Section 12.7.8 (Interactive Forms - XFA)
  • Adobe XFA Specification 3.3
  • AcroForm dictionary structure (Section 12.7.2)
  • Stream objects and filters for encoding/compression (Sections 7.3.8 and 7.4)

Anything Else?

Documentation updates
