Incremental Update Implementation (cleaner) #111

adnsistemas · 2025-06-03T21:27:45Z

What?

Ability to make incremental updates to PDFs

Why?

Required for proper electronic signing of documents.

How?

Making an snapshot of the original PDF and marking changed Refs that need to be saved in the "increment"

Testing?

All tests and Node app.

New Dependencies?

No

Screenshots

N/A

Anything Else?

Had to make a couple of small changes to package.json in order to be able to build, test, and commit the changes.

Checklist

…-scribe-master

SamiSaves · 2025-07-01T07:37:33Z

@Sharcoux Hey, is this PR in a clean enough state to be merged or worked on (the previous pr was #110) ? I am also interested in this as we need it for PAdes long term validation support.

I could probably try and help with this to make it merge ready

benjinus · 2025-07-01T13:30:07Z

I think this is a very important feature update, everyone needs this👏

Sharcoux · 2025-07-01T14:45:46Z

Ok, I'll try to have a look. It's quite complex. If one of you want to have a second look at the change, that would be great.

SamiSaves · 2025-07-02T04:33:00Z

I'll see if I have time to check the pr later this week and I also should be able to test it out and validate that it works for our scenario too 👍

SamiSaves · 2025-07-04T06:07:32Z

Here is some reading related to this PR:

PDF spec 7.5.6 Incremental Updates (pdf page 52)
https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf#page=52

ETSI has a document "Part 1: Building blocks and PAdES baseline signatures" in which part 5.4 mentions incremental updates
https://www.etsi.org/deliver/etsi_en/319100_319199/31914201/01.02.01_60/en_31914201v010201p.pdf#page=10

SamiSaves · 2025-07-07T12:49:13Z

I was able to create an incrementally updated pdf with the branch:

async function addIncrementalUpdate(signedPdfBuffer: Buffer, update: string): Promise<Buffer> {
  // Load for incremental update
  const pdfDoc = await PDFDocument.load(signedPdfBuffer, {    forIncrementalUpdate: true  })

  // Add the update string as a text annotation to the first page to demonstrate any update
  const pages = pdfDoc.getPages()
  if (pages.length > 0) {
    pages[0].drawText(update, { x: 50, y: 700, size: 12 })
  }

  // Save incrementally (creates a new revision at the end)
  const newPdf = await pdfDoc.save({ useObjectStreams: false })
  return Buffer.from(newPdf)
}

I checked that increment didn't modify any of the old pdf bytes with this test:

const testPdf = await readFile(path.join(__dirname, './test.pdf'))
const incrementedPdf = addIncrementalUpdate(testPdf, 'testing incremental update')
assert(incrementalPdf.subarray(0, testPdf.length).equals(testPdf))

And I also observed from the pdf bytes that the new increment contains a new catalog, trailer and EOF.

adnsistemas · 2025-07-07T23:16:06Z

I added some tests in the PR, unit and app tests. One tests that the original PDF is unchanged.
There is not one that checks the increment, maybe is worth adding it.

SamiSaves

I went through the PR and didn't find much to say as I am not familiar with the codebase.

To me it looks good, it doesn't seem to break existing functionality and this is confirmed by existing tests. It has test coverage for the new feature as well.

I think the only thing missing here is some documentation what the incremental update is, what does it do and how to use it.

SamiSaves · 2025-07-03T12:02:50Z

package.json

-      "prettier --check || (echo '\n⚠️ Linting issues detected. Please run \"yarn lint\" to fix them and stage the changes.' && exit 1)",
-      "eslint --fix || (echo '\n⚠️ Linting issues detected. Please run \"yarn lint\" to fix them and stage the changes.' && exit 1)"


Probably should revert these changes. I wonder what here caused issues 🤔

I had to make changes in order to be able to run the tests. Don't know why, but in my environment the original failed with an error. Should be "reverted" to merge, but I can't do it in my branch.

SamiSaves · 2025-07-08T05:20:19Z

apps/deno/tests/test20.ts

+  let tw = font.widthOfTextAtSize('Electronic Signature Test #20', tamLetra);
+  let lw = fontBold.widthOfTextAtSize(usarLibreOffice, tamAdvertencia);
+  if (lw > tw) tw = lw;


I'd personally prefer more descriptive names for variables. As I am new to this project and the world of pdfs, it takes quite a bit of effort to understand what these are for.

SamiSaves · 2025-07-08T06:13:35Z

tests/api/PDFDocument.spec.ts

+        expect(pdfIncrementalBytes.byteLength).toBeGreaterThan(0);
+        expect(finalPdfBytes.byteLength).toBeLessThanOrEqual(
+          pdfSaveBytes.byteLength,
+        );


If we are expecting same output in both cases, would it make sense to compare that the actual bytes are equal?

Well, then do it, no?

Sharcoux · 2025-07-08T07:37:57Z

package.json

  "description": "Create and modify PDF files with JavaScript",
  "author": "Andrew Dillon <[email protected]>",
-  "packageManager": "[email protected]+",
+  "packageManager": "[email protected]",


Why this change?

Original don't run in my environment. Should be equivalent, but in my case npm complains about it.

Cantoo scribe master has

"packageManager": "[email protected]+",

This pr has

"packageManager": "[email protected]",

And your forks master has:

"packageManager": "[email protected]+sha512.a6b2f7906b721bba3d67d4aff083df04dad64c399707841b7acf00f6b133b7ac24255f2652fa22ae3534329dc6180534e98d17432037ff6fd140556e2bb3137e",

Have you tried with the cantoo master version? Perhaps the issue was the yarn with the full sha?

Yes, I try a lot of combinations that didn't work. But I'm not keen at all in configs.. so probably is something simple that I don't know about. Config and package changes shouldn't be necessaries, but I don't know how to exclude them from the PR.

Could you do somthing like this:

Revert the changes (no commiting yet)

Temporarily remove this from package.json

"husky": { "hooks": { "pre-commit": "lint-staged" } },

Stage and commit the reverted code (excluding the husky change above)

I think is done..
Also added a fix in largestObjectNumber determination, when XREF is a Stream. For that I had to fix some PDFparser tests expectations, a couple of which I'm not totally sure about, but it seems that all the expectations in those tests are "taken" from the test result, and not from the PDF actual content, so I'm guessing is "all right".

Sharcoux · 2025-07-08T07:38:04Z

package.json

    "{src,tests,apps}/**/*.{ts,js,json,html,css}": [
-      "prettier --check || (echo '\n⚠️ Linting issues detected. Please run \"yarn lint\" to fix them and stage the changes.' && exit 1)",
-      "eslint --fix || (echo '\n⚠️ Linting issues detected. Please run \"yarn lint\" to fix them and stage the changes.' && exit 1)"
+      "prettier --check",


Why those changes?

Can't run it without the change.

For me, in your forks branch cantoo-scribe-master eslint gets stuck trying to lint /dist/pdf-lib.esm.min.js. So it seems to be trying to scan files should be ignored.

It seems to be due to this change: adnsistemas@417a8a6#diff-a32a0887ed9d1d707bbb3b845b7df7fd40e673c47e7b60a3ebd896b68d3b8839R10

Could this be the reason why it didn't work earlier with these changes?

It didn't got stuck for me, it always failed with the message after the ||
I have the same problem with both projects (the one I forked from, and this). I guess is the versions of node/npm/Linux I'm running in, but I really don't know (and to be honest I don't care much about it so I didn't dig into the actual problem).

What happens when you run yarn lint?

In @cantoo/pdf-lib master branch (node 20.17.0, npm 10.8.2):
Invalid package manager specification in package.json ([email protected]+); expected a semver version
Then, changing that line to [email protected]:
yarn run v1.22.19 $ yarn lint:prettier && yarn lint:eslint $ prettier --write "./{src,tests,apps}/**/*.{ts,js,json,html,css}" --log-level error $ eslint .
And there it hungs, after running only yarn lint:eslint --verbose, and then again yarn lint it works.
Then I changed a file and staged it. Running yarn lint-staged gives:
`yarn run v1.22.19
$ lint-staged
✔ Backed up original state in git stash (d85c0fa)
⚠ Running tasks for staged files...
❯ package.json — 1 file
❯ {src,tests,apps}/**/*.{ts,js,json,html,css} — 1 file
✖ prettier --check || (echo '
⚠️ Linting issues detected. Please run "yarn lint" to fix them and stage the changes.' && exit 1) [FAILED]
◼ eslint --fix || (echo '
⚠️ Linting issues detected. Please run "yarn lint" to fix them and stage the changes.' && exit 1)
↓ Skipped because of errors from tasks.
✔ Reverting to original state because of errors...
✔ Cleaning up temporary files...

✖ prettier --check || (echo '
⚠️ Linting issues detected. Please run "yarn lint" to fix them and stage the changes.' && exit 1):
[error] No files matching the pattern were found: "||".
[error] No files matching the pattern were found: "(echo".
[error] No files matching the pattern were found: "
[error] ⚠️ Linting issues detected. Please run "yarn lint" to fix them and stage the changes.".
[error] No files matching the pattern were found: "&&".
[error] No files matching the pattern were found: "exit".
[error] No files matching the pattern were found: "1)".
Checking formatting...
All matched files use Prettier code style!
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.`

But yarn lint finds no problems.

Sharcoux · 2025-07-08T07:42:52Z

src/core/objects/PDFObject.ts

 import PDFContext from '../PDFContext';

 class PDFObject {
+  registerChange() {


Throwing an error will probably break something PDFPageLeaf, will it not? Shouldn't we just do nothing?

Every PDFObject should 'registerChange' (do noting is an option for specific descendants).
Otherwise you can't guarantee the increment will be correct.
If you just "forgot" to implement it, and the base implementation does nothing, will be difficult to trace why you have missing elements in the increment.

Sharcoux · 2025-07-08T07:45:42Z

rollup.config.mjs

  'Circular dependency: es/api/PDFPage.js -> es/api/PDFDocument.js -> es/api/form/PDFForm.js -> es/api/form/PDFOptionList.js -> es/api/PDFPage.js',
  'Circular dependency: es/api/PDFPage.js -> es/api/PDFDocument.js -> es/api/form/PDFForm.js -> es/api/form/PDFRadioGroup.js -> es/api/PDFPage.js',
  'Circular dependency: es/api/PDFPage.js -> es/api/PDFDocument.js -> es/api/form/PDFForm.js -> es/api/form/PDFTextField.js -> es/api/PDFPage.js',
+  'Circular dependency: es/core/index.js -> es/core/writers/PDFWriter.js -> es/api/snapshot/index.js -> es/api/snapshot/IncrementalDocumentSnapshot.js -> es/core/index.js',


IncrementalDocumentSnapshot should not depend on core/index.js.

Sharcoux · 2025-07-08T07:45:47Z

rollup.config.mjs

  'Circular dependency: es\\api\\PDFPage.js -> es\\api\\PDFDocument.js -> es\\api\\form\\PDFForm.js -> es\\api\\form\\PDFOptionList.js -> es\\api\\PDFPage.js',
  'Circular dependency: es\\api\\PDFPage.js -> es\\api\\PDFDocument.js -> es\\api\\form\\PDFForm.js -> es\\api\\form\\PDFRadioGroup.js -> es\\api\\PDFPage.js',
  'Circular dependency: es\\api\\PDFPage.js -> es\\api\\PDFDocument.js -> es\\api\\form\\PDFForm.js -> es\\api\\form\\PDFTextField.js -> es\\api\\PDFPage.js',
+  'Circular dependency: es\\core\\index.js -> es\\core\\writers\\PDFWriter.js -> es\\api\\snapshot\\index.js -> es\\api\\snapshot\\IncrementalDocumentSnapshot.js -> es\\core\\index.js',


Sharcoux · 2025-07-08T07:46:28Z

src/api/snapshot/DefaultDocumentSnapshot.ts

@@ -0,0 +1,29 @@
+import { PDFObject, PDFRef } from '../../core';


Do you want to import only the type? That might solve the cyclic dependency.

Sharcoux · 2025-07-08T07:46:34Z

src/api/snapshot/DocumentSnapshot.ts

@@ -0,0 +1,14 @@
+import { PDFObject, PDFRef } from '../../core';


Do you want to import only the type? That might solve the cyclic dependency.

Sharcoux · 2025-07-08T08:35:51Z

src/api/snapshot/IncrementalDocumentSnapshot.ts

@@ -0,0 +1,59 @@
+import { PDFContext, PDFObject, PDFRef } from '../../core';


Do you want to import only the type? That might solve the cyclic dependency.

Those imports are "inherited" from the project I took the incremental update base functionality. Not sure about the circular dependency issue, and solution.

I think this solution would work with DocumentSnapshot and DefaultDocumentSnapshot, in both all imports can be switched to import type I think. But in this file we have obj instanceof PDFRef ? obj : this.context.getObjectRef(obj) and that requires a normal import from core

Would it help if we import PDFRef straight from the file itself like PDFWriter? 🤔

import PDFRef from '../../core/objects/PDFRef';

Or maybe this.context.getObjectRef(obj) could handle the case of PDFRef and return the obj in that case?

Implemented most of the suggested changes and all seems to be working as before, no more circular dependency message, and no more '!' after getObjectRef.
Had to add a method to context, to be able to import only the type for PDFRef.
Is there a way to update this PR with the changes?

To answer my own question: Yes, by pushing to the branch the PR comes from.
Updated.

adnsistemas · 2025-07-08T17:04:35Z

I think the only thing missing here is some documentation what the incremental update is, what does it do and how to use it.
Didn't include it because makes the PR "messy". You can take the documentation from my project @adnsistemas/pdf-lib, the README.md has the documentation.

Sharcoux · 2025-07-10T15:11:15Z

src/api/snapshot/IncrementalDocumentSnapshot.ts

@@ -0,0 +1,59 @@
+import { PDFContext, PDFObject, PDFRef } from '../../core';


Or maybe this.context.getObjectRef(obj) could handle the case of PDFRef and return the obj in that case?

Sharcoux · 2025-07-10T15:17:21Z

src/api/snapshot/IncrementalDocumentSnapshot.ts

+  markObjsForSave(objs: PDFObject[]): void {
+    this.markRefsForSave(
+      objs.map((obj) =>
+        obj instanceof PDFRef ? obj : this.context.getObjectRef(obj)!,


It's probably not a good idea to assume this.context.getObjectRef(obj)!. It might be better to handle the case where it won't return what you expect. Does it make sense to accept filter the map? .filter((obj): result is PDFObject => !!obj)

I really don't like that '!' there, but as I didn't wrote that, I supposed it was just for TS not complaining. I like the filter a lot more, and makes more sense: if you can't get a Ref, then there is nothing to update in the increment.

tests/core/parser/PDFParser.spec.ts

SamiSaves · 2025-09-16T09:38:19Z

I did another round of review for the PR. I did some tests and didn't immediately find any issues with the incremental update feature.

I also did a review with Copilot and it suggested few possible issues which I wasn't able to reproduce or didn't see them causing issues with validation. For example one of the issues was that if you set a large arbitrary number for /Size manually, the code would take largestObjectNumber from there. Copilot implied that this would cause performance issues, but that wasn't the case for me at least.

I also decided to check the binary files with VirusTotal and no issues found there.

I found that some of the tests might benefit from more strict data checks that we persist original data correctly. I left comment on the one I found, but I suspect some other tests might benefit from something like this as well.

SamiSaves · 2025-09-27T09:14:33Z

@Sharcoux Do you see any more blockers for merge? Or should we wait for another reviewer like @erano067?

SamiSaves · 2025-10-24T06:53:35Z

The task where we need the incremental update is coming up on my todo list, so I am planning on forking this fork with the this PR to our organization as an internal package to see if we can use it in production.

I can report how it's working once I have it up and running.

Mikescops · 2025-10-24T21:33:55Z

Hello, I'm also interested by this PR, for now I'm using a Go project (https://github.com/subnoto/pdfsign) that is able to do the incremental update (used then for electronic signature) bundled into wasm.

I will try to switch the logic with this PR and give some feedback if anything breaks. The ideal scenario is to take a document that is already electronically signed, make some incremental updates and see if the signature integrity is broken or not.

Also hope @Sharcoux can shine in at some point.

Sharcoux · 2025-10-24T21:59:28Z

Hello guys. Sorry, I got a lot on my plate recently, but I promise we didn't forget about you.

Sharcoux · 2025-10-31T12:48:15Z

@Mikescops @SamiSaves @adnsistemas I reviewed. I also would have the following comments:

Having a test checking that existing signatures are still valid would be interesting. Like:

const shouldNotCompress =
  ref === this.context.trailerInfo.Encrypt ||
  object instanceof PDFStream ||
  object instanceof PDFInvalidObject ||
  ref.generationNumber !== 0 ||
  (object instanceof PDFDict &&
    (object as PDFDict).lookup(PDFName.of('Type')) === PDFName.of('Sig'));

What happens for encrypted pdf and the /Encrypt dict when using the incremental update?
If an object from the original pdf is in an object stream and is being updated, what happens?
We should probably test those cases:

// 1. Array updates
describe('incremental update with array modifications', () => {
  it('tracks array changes', async () => {
    const pdfDoc = await PDFDocument.load(bytes);
    const snapshot = pdfDoc.takeSnapshot();
    const page = pdfDoc.getPage(0);
    const annots = page.node.get(PDFName.of('Annots'));
    annots.push(newAnnotRef); // ← Est-ce tracké?
    const increment = await pdfDoc.save();
    // Now check that the annotation is in the increment
  });
});

// 2. Multiple snapshots
describe('multiple snapshots', () => {
  it('handles multiple snapshots correctly', async () => {
    const snap1 = pdfDoc.takeSnapshot();
    const snap2 = pdfDoc.takeSnapshot();
    // Now check which snapshot is really being used
  });
});

// 3. Signature preservation
describe('signature validation', () => {
  it('preserves existing signatures', async () => {
    const signed = await PDFDocument.load(signedPdfBytes);
    const snapshot = signed.takeSnapshot();
    signed.getPage(0).drawText('Modified');
    const result = await signed.save();
    // Now check (at least once with Adobe/Foxit) that the signature is still valid
  });
});

// 4. Memory check
describe('memory handling', () => {
  it('handles 50MB+ PDFs', async () => {
    const largePdf = createLargePdf(50 * 1024 * 1024);
    const before = process.memoryUsage().heapUsed;
    const doc = await PDFDocument.load(largePdf);
    doc.takeSnapshot();
    const after = process.memoryUsage().heapUsed;
    expect(after - before).toBeLessThan(200 * 1024 * 1024);
  });
});

I'm also a bit worried with doc.save() that would return a completely different object if someone, somewhere, at any point, has called takeSnapshot() in the code. When the user was expecting a full pdf, they will suddenly receive an incremental pdf instead. That could come as a surprise. I'd recommand adding the option "incremental" to the SaveOptions instead.

Sharcoux · 2025-10-31T12:26:34Z

src/core/objects/PDFDict.ts

  }

  set(key: PDFName, value: PDFObject): void {
+    this.registerChange();


Shouldn't we also intercept changes made to PDFArray, PDFStream, and others?

Like:

PDFArray.push(item)

PDFArray.set(index, item)

PDFStream.updateContents(bytes)

PDFRawStream modifications

I have problems implementing these test cases. I don't know in what case you modify this kind of pdf objects that is not in the internals of the library.
As far as I can tell, all of this are handled internally when the "user" makes "visible" changes on the PDF.
If someone can provide complete examples of use cases where you manually modify any of these, I can test and add the change registration if required.
Thanks.

Sharcoux · 2025-10-31T12:28:50Z

src/api/PDFDocument.ts

+      this.context.pdfFileDetails.prevStartXRef,
+      this.context,
+    );
+    if (!this.context.snapshot && this.context.pdfFileDetails.originalBytes) {


With !this.context.snapshot, if takeSnapshot is being called twice, only the first one will be used

This can also result in a memory leak

There are two ways of using incremental update. One is manual, calling takeSnapshot, the other is automatic, using the 'forIncrementalUpdate' on doc opening.
This is for the automatic path, and you only need one snapshot, taken on document opening.
If you take others snapshots you have to handle those snapshots yourself, those are not handled internally.

Isn't this a bit confusing for users?

Maybe. Originally I though of hiding the takeSnapshot() stuff, and only be able to use incremental update by opening the document forIncrementalUpdate.
But there may be use cases (that I don't know of) where you only want the increment, and do as you please with it.
The "use path" of manually handling the snapshot does not preserves the pdf bytes. That's another reason for it existance.
My general way of thinking about this is: PDF is not ease as it seems, is impossible to make a powerful tool "bulletproof simple", so, if you are using a PDF library, you need some PDF knowledge, or you will fail no matter how simple it is.
I don't see it more complex that understanding that PDF coordinates are "reversed". Maybe better documentation is requiered, but I wonder "is there a 'incremental update functionality' user that doesn't knows why he/she needs it?"

Sharcoux · 2025-10-31T12:32:29Z

src/core/PDFContext.ts

+    pdfSize: number;
+    prevStartXRef: number;
+    useObjectStreams: boolean;
+    originalBytes?: Uint8Array;


Are you sure about this? This will keep a reference to the whole pdf bytes.

To have a valid incremental update you need to preserve the original PDF "as is".
This is only done when you open a PDF for incremental update (automatic incremental update handling).
What's the alternative?

I wonder if we should not accept FileHandle or Path, or provide a way for the user to keep the bytes on his side and provide them on save, but I'm not sure.

The "problem" I see with that, is that you can't guarantee a valid output. Since you get the handle, or the path, until you use it to generate the original content, it may have changed.
Preserving the bytes you read as the original PDF is the only way to guarantee the output of save() is exactly what you read.
If I, as a user, want to handle the increment by myself, then I can take a snapshot, get the saveIncremental() and append it to whatever I want (that's why I haven't hidden that functionality, which was the original "incremental update" implementation).
The only reason I choose to preserve the bytes, as passed, is to make sure the final output is what you expect it to be.

adnsistemas · 2025-11-04T03:46:07Z

I'm also a bit worried with doc.save() that would return a completely different object if someone, somewhere, at any point, has called takeSnapshot() in the code. When the user was expecting a full pdf, they will suddenly receive an incremental pdf instead. That could come as a surprise. I'd recommand adding the option "incremental" to the SaveOptions instead.

That's not the case. Calling takeSnapshot() doesn't changes what save() does. That's why there is a saveIncremental().
When you open a pdf not for incremental update (normal opening), save() returns a new PDF (a reasembled one), whether you use takeSnapshot() or not.
When you open a pdf for incremental update, save() returns the original PDF plus the increment, wheter you use takeSnapshot() or not.
Taking snapshots of a PDF document allows you to "build the increment", from the point where you take the snapshot, to the point where you use saveIncremental(snapshot). It doesn't (and shouldn't) interfere with save().
save() has the additional paramenter of 'rewrite', for the case when you open a PDF for incremental update and you want it reassembled, instead of the original plus the increment.
I'm adding some of your suggestions, and answering your comments as I do (just adding a test to verify that takeSnapshot() doesn't chage result of saving).

adnsistemas · 2025-11-04T03:49:51Z

Having a test checking that existing signatures are still valid would be interesting.

This is absolutely out of scope of incremental update. Not changing the original PDF is what preserves signature validity, checking that by using incremental update, the first part of the resulting PDF is exactly as the original, guaranties signature validity preservation. That test already exists.

Mikescops · 2025-11-04T10:01:27Z

This is absolutely out of scope of incremental update. Not changing the original PDF is what preserves signature validity, checking that by using incremental update, the first part of the resulting PDF is exactly as the original, guaranties signature validity preservation. That test already exists.

Agree on that point, the signature validity is more of a use case rather than a test case. If the original part of the PDF is left untouched it shouldn't cause any issue with the signature.

Sharcoux · 2025-11-04T12:23:41Z

.
When you open a pdf for incremental update, save() returns the original PDF plus the increment, wheter you use takeSnapshot() or not.
Taking snapshots of a PDF document allows you to "build the increment", from the point where you take the snapshot, to the point where you use saveIncremental(snapshot). It doesn't (and shouldn't) interfere with save().
save() has the additional paramenter of 'rewrite', for the case when you open a PDF for incremental update and you want it reassembled, instead of the original plus the increment.

That's true, my bad, but still, having the result of save depend on the parameters load has been called with makes me feel uncomfortable because those 2 instructions might be quite distant from each other in the code...

Sharcoux · 2025-11-04T12:32:18Z

This is absolutely out of scope of incremental update. Not changing the original PDF is what preserves signature validity, checking that by using incremental update, the first part of the resulting PDF is exactly as the original, guaranties signature validity preservation. That test already exists.

I was worried of what the incremental update would do on a signed content, but apparently, the PDF spec seems to allow writing bytes after the /ByteRange. You can ignore that comment then.

adnsistemas · 2025-11-04T19:01:12Z

.
When you open a pdf for incremental update, save() returns the original PDF plus the increment, wheter you use takeSnapshot() or not.
Taking snapshots of a PDF document allows you to "build the increment", from the point where you take the snapshot, to the point where you use saveIncremental(snapshot). It doesn't (and shouldn't) interfere with save().
save() has the additional paramenter of 'rewrite', for the case when you open a PDF for incremental update and you want it reassembled, instead of the original plus the increment.

That's true, my bad, but still, having the result of save depend on the parameters load has been called with makes me feel uncomfortable because those 2 instructions might be quite distant from each other in the code...

I don't know what you mean. save() always produces the whole PDF, the only "caveat" is that if it was open 'forIncrementalUpdate' the way of writing the PDF file is different, as the original is respected. But the output is an equivalent PDF to the save({rewrite:true}) one. Both resulting PDF documents are exactly the same, the file structure is the only difference.
What's the use case that makes you uncomfortable? That's what I don't understands (I think we are talking about different aspects of usage).

…ify saving consistency

…e and incremental generation. Parameter adding for incremental update saving, to force useObjectStreams preference

- Add commit() method to PDFDocument enabling chained incremental updates - Fix font duplication: change 'modified' flag to 'alreadyEmbedded' in PDFFont - Use PDFWriter instead of PDFStreamWriter for incremental saves - Add comprehensive tests for commit() functionality (17 test cases)

erano067 · 2025-11-29T13:06:13Z

Hey everyone 👋

Been using this incremental update feature in production for a signing workflow and it's been working great. Wanted to share some feedback and a small addition I made.

What I tested:

The byte preservation works - I verified this both with unit tests and manually opening the output in Chrome, Firefox, and Preview on Mac. PDFs with signature fields also work fine.

I added a couple tests to cover this explicitly:

preserves bytes exactly for signature validity - loops through and checks byte-by-byte
preserves signature field widgets after incremental updates - uses with_signature.pdf

One thing I added: commit() method

I needed to do multiple incremental saves in a row without reloading the PDF each time. The current flow requires you to reload after each saveIncremental() which is slow and loses your in-memory state.

So I added a commit() method that handles this:

const pdfDoc = await PDFDocument.load(bytes, { forIncrementalUpdate: true });

page.drawText('First edit');
await pdfDoc.commit();

page.drawText('Second edit');  
const finalPdf = await pdfDoc.commit();

It basically calls saveIncremental() internally, then updates originalBytes and prevStartXRef so you can keep going. Added 19 tests for it.

Also fixed a font duplication bug - fonts were getting re-embedded on each commit because of a modified flag that should've been alreadyEmbedded.

@Sharcoux - RE your earlier questions:

The change tracking via registerChange() seems to work for all the cases I tried (pages, fonts, images, text). The high-level API calls it correctly.

On the memory concern - yeah originalBytes does hold the full PDF, but that's required by the PDF spec for valid incremental updates. With commit() we replace it after each save so old allocations can be GC'd.

@adnsistemas - any chance you could pull in my changes? PR is here: adnsistemas#3

Would be nice to get this merged - seems like it's been ready for a while and a few people are waiting on it.

Branch: https://github.com/erano067/pdf-lib/tree/incremental-commit

Sharcoux · 2025-11-29T13:51:44Z

Great feedback. I'll try to merge your version.

Previously, inline objects (arrays/dicts embedded directly in other objects, not registered as indirect objects) would not be tracked for incremental saves. This caused changes to inline arrays like MediaBox to be lost. The fix enhances PDFContext.registerObjectChange() to: 1. First check if the object is directly registered (indirect object) 2. If not, search through all indirect objects to find which one contains the inline object, and mark that parent for save Also fixed lint-staged config that had broken shell commands. Added test: 'tracks inline array modifications for incremental saves' All 673 tests pass.

erano067 · 2025-11-29T14:07:37Z

Hey, I took a closer look at the change tracking for incremental updates and found a gap that I've now fixed.

When you modify an array or stream in a PDF, those changes need to be tracked so they're included in the incremental save. The original implementation tracked changes to PDFDict but missed PDFArray and PDFRawStream.

I added registerChange() calls to:

PDFArray — for push(), insert(), remove(), and set() operations
PDFRawStream — for updateContents()

But then I discovered a trickier issue: inline objects weren't being tracked at all.

For example, when you do page.node.lookup(PDFName.MediaBox) and modify that array directly, it wasn't getting saved. Why? Because inline arrays (ones embedded directly inside a dict rather than stored as separate indirect objects) don't have their own reference — so context.getRef(array) returns undefined and the change was silently ignored.

The fix: I enhanced registerObjectChange() to detect when an object isn't directly registered, and in that case, it searches through all indirect objects to find which one contains the modified inline object, then marks that parent for save.

Added tests for:

Array modifications (new arrays and push() on existing arrays)
Stream content updates
Inline array modifications (the MediaBox edge case)
Encrypted PDFs correctly throwing an error with forIncrementalUpdate

All 673 tests pass.

It is ready in this PR

Add commit() method for multiple incremental updates without reloading

erano067 · 2025-12-05T13:48:34Z

Hey I synced with @adnsistemas. From what I see, his branch is production ready. What’s needed to get this branch merged?

Hopding and others added 14 commits November 6, 2021 17:28

Bump version to 1.17.1

4beebbe

Merge branch 'master' of github.com:Hopding/pdf-lib

93dd36e

add incremental save functionality

c6cbda3

make snapshot mechanism more generic

c3ea300

Incremental Update Change Automatic Registration

05376db

Generación de paquete npm

7d61af0

Merge branch 'master' of github.com:cantoo-scribe/pdf-lib into cantoo…

0f2abc3

…-scribe-master

Tests conciliation

40dcac3

Fixed merge with cantoo-scribe

872fbdc

Readme and package changes

b4e7323

Lint minor fix

d264a49

ELint exclussions

4b22104

Correcciones al proceso de publicación del paquete

b783161

Adecuación con la rama master de cantoo-scribe

417a8a6

github-actions bot added the needs-triage label Jun 3, 2025

SamiSaves approved these changes Jul 8, 2025

View reviewed changes

Sharcoux reviewed Jul 8, 2025

View reviewed changes

Sharcoux reviewed Jul 10, 2025

View reviewed changes

adnsistemas added 4 commits July 10, 2025 17:00

Circular dependency correction. Changed ref registration improvement

ce366d7

Circular dependency fix. Changed Refs registration improvement

3aac5c5

Version bump

2e21e49

Object number detection correction

cce6de3

SamiSaves reviewed Sep 16, 2025

View reviewed changes

tests/core/parser/PDFParser.spec.ts Show resolved Hide resolved

Improvement in originalbytes preservation test

bc251b4

Sharcoux reviewed Oct 31, 2025

View reviewed changes

adnsistemas added 2 commits November 4, 2025 17:15

Fix in XREFStreams saving of document. Added some saving tests to ver…

ef42bdf

…ify saving consistency

Fix in object streams usage on XREF generation, for incremental updat…

55a22c1

…e and incremental generation. Parameter adding for incremental update saving, to force useObjectStreams preference

chr33s mentioned this pull request Nov 27, 2025

Upstream changes chr33s/pdf#4

Open

3 tasks

erano067 added 2 commits November 29, 2025 14:05

Add signature preservation tests and PR documentation

9fc9a05

adnsistemas added 5 commits December 1, 2025 19:54

Merge branch 'cantoo-scribe-master' into partialsave

b784dd2

Minor optimizations

eb6bd83

Fix in file version

8ad9760

Updated documentation

4f8ce53

Merge pull request #3 from erano067/incremental-commit

53c1cec

Add commit() method for multiple incremental updates without reloading

		"prettier --check \|\| (echo '\n⚠️ Linting issues detected. Please run \"yarn lint\" to fix them and stage the changes.' && exit 1)",
		"eslint --fix \|\| (echo '\n⚠️ Linting issues detected. Please run \"yarn lint\" to fix them and stage the changes.' && exit 1)"

		@@ -0,0 +1,29 @@
		import { PDFObject, PDFRef } from '../../core';

		@@ -0,0 +1,14 @@
		import { PDFObject, PDFRef } from '../../core';

		@@ -0,0 +1,59 @@
		import { PDFContext, PDFObject, PDFRef } from '../../core';

Incremental Update Implementation (cleaner) #111

Are you sure you want to change the base?

Incremental Update Implementation (cleaner) #111

Uh oh!

Conversation

adnsistemas commented Jun 3, 2025

What?

Why?

How?

Testing?

New Dependencies?

Screenshots

Suggested Reading?

Anything Else?

Checklist

Uh oh!

SamiSaves commented Jul 1, 2025

Uh oh!

benjinus commented Jul 1, 2025

Uh oh!

Sharcoux commented Jul 1, 2025

Uh oh!

SamiSaves commented Jul 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SamiSaves commented Jul 4, 2025

Uh oh!

SamiSaves commented Jul 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adnsistemas commented Jul 7, 2025

Uh oh!

SamiSaves left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

SamiSaves commented Jul 2, 2025 •

edited

Loading

SamiSaves commented Jul 7, 2025 •

edited

Loading