Skip to content

Fixes #39198 - Bulk-insert new facts and add additive import mode#10942

Closed
pablomh wants to merge 1 commit into
theforeman:developfrom
pablomh:registration/bulk-fact-inserts
Closed

Fixes #39198 - Bulk-insert new facts and add additive import mode#10942
pablomh wants to merge 1 commit into
theforeman:developfrom
pablomh:registration/bulk-fact-inserts

Conversation

@pablomh
Copy link
Copy Markdown
Contributor

@pablomh pablomh commented Mar 30, 2026

What are the changes introduced in this pull request?

Two related improvements to the fact import pipeline:

1. Bulk inserts for new facts on persisted hosts

FactImporter#add_new_facts currently inserts each new fact individually via create! inside a loop. This PR replaces that with a single FactValue.insert_all call for non-nil facts on persisted hosts, reducing N INSERT statements to 1.

Nil-value compose facts (parent/hierarchy nodes produced by StructuredFactImporter#fill_hierarchy) are separated at the add_new_facts level and handled individually via the existing add_new_fact path. This preserves per-fact error handling semantics and keeps bulk_insert_new_facts with a single, clean responsibility: bulk insert a pre-filtered list of non-nil facts.

The host.new_record? path is preserved unchanged since bulk insert requires a known host_id.

2. Additive import mode

FactImporter#import! and HostFactImporter#import_facts now accept an additive: false keyword argument. When additive: true, the delete_removed_facts step is skipped — facts in the new set are added or updated, but facts already in the DB that are absent from the new set are preserved rather than deleted.

This enables callers to import a small, targeted subset of facts (e.g. only the facts needed for OS detection during host registration) without wiping the host's existing fact set. Katello uses this in the registration flow to import only the 3 facts needed for RhelLifecycleStatus to be immediately correct, deferring the full sync to the subsequent PUT /rhsm/consumers/:id call.

Considerations taken when implementing this change?

During host registration, ~193 RHSM facts are imported synchronously as part of the critical POST /rhsm/consumers path (which already takes ~3,256ms under no load). Under concurrent bulk registration, 193 individual INSERTs per host compounds significantly. The bulk insert reduces this to 1 INSERT per registration.

FactValue has no after_create callbacks — only a uniqueness validation enforced at the DB level — making insert_all safe here. unique_by: [:fact_name_id, :host_id] is specified explicitly.

The additive: keyword defaults to false, so all existing callers are unaffected.

What are the testing steps for this pull request?

  • Register a host and verify all facts are correctly imported, including compose/parent facts from structured fact importers
  • Verify host.new_record? path still builds facts correctly (e.g. during unattended provisioning)
  • Verify that additive: true preserves existing facts not present in the new set
  • Run existing fact importer tests including StructuredFactImporterTest

Fixes #39198

@pablomh
Copy link
Copy Markdown
Contributor Author

pablomh commented Mar 30, 2026

@pablomh pablomh force-pushed the registration/bulk-fact-inserts branch from 78ee0e8 to 8c913f2 Compare April 1, 2026 14:46
@pablomh pablomh changed the title Use bulk inserts when importing new facts for a persisted host Fixes #39198 - Use bulk inserts when importing new facts for a persisted host Apr 1, 2026
@pablomh pablomh force-pushed the registration/bulk-fact-inserts branch 2 times, most recently from b1a2a36 to 8070033 Compare April 2, 2026 19:24
@pablomh pablomh force-pushed the registration/bulk-fact-inserts branch from 8070033 to cf304fa Compare April 2, 2026 20:05
@pablomh pablomh changed the title Fixes #39198 - Use bulk inserts when importing new facts for a persisted host Fixes #39198 - Bulk-insert new facts and add additive import mode Apr 2, 2026
@pablomh pablomh force-pushed the registration/bulk-fact-inserts branch from cf304fa to 14ec47e Compare April 2, 2026 20:12
Two related improvements to the fact import pipeline:

1. Bulk inserts for new facts on persisted hosts

   add_new_facts inserts each new fact individually via create! inside a
   loop. With 193 RHSM facts per registration this produces 193 individual
   INSERT statements. For persisted hosts, replace the per-fact loop with a
   single FactValue.insert_all call for facts with non-nil values, reducing
   N INSERT statements to 1.

   Nil-value facts are compose/parent nodes produced by StructuredFactImporter's
   fill_hierarchy method. They are separated at the add_new_facts level and
   handled individually via the existing add_new_fact path, preserving per-fact
   error handling semantics.

   The host.new_record? path is preserved unchanged since bulk insert requires
   a known host_id.

2. Additive import mode

   import! and HostFactImporter#import_facts now accept an additive: false
   keyword argument. When additive: true, the delete_removed_facts step is
   skipped — facts in the new set are added or updated, but facts already in
   the DB that are absent from the new set are preserved.

   This enables callers to import a targeted subset of facts without wiping
   the host's existing fact set. Katello uses this to import only the 3 facts
   needed for RhelLifecycleStatus during registration, deferring the full sync
   to the subsequent PUT /rhsm/consumers/:id call.

   All existing callers are unaffected (additive defaults to false).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@pablomh pablomh force-pushed the registration/bulk-fact-inserts branch from 14ec47e to 6268428 Compare April 2, 2026 23:04
@pablomh
Copy link
Copy Markdown
Contributor Author

pablomh commented Apr 17, 2026

Performance testing results

Tested on a medium Satellite (8 CPU, 32 GB RAM) with 3 registration topologies (direct, standalone capsule, load-balanced capsule pair) across concurrency levels from 56 to 560 concurrent hosts. This PR was tested together with its companion Katello/katello#11701 (which depends on the additive: mode introduced here).

Direct impact of this PR (bulk INSERT):

  • Fact INSERT statements per registration: 108 → ~6 (−94%)
  • The remaining individual INSERTs are for compose/parent facts (nil-value hierarchy nodes from StructuredFactImporter) that continue through the existing add_new_fact path

Combined impact with Katello/katello#11701 (measured in the same batch):

Metric Before After Change
POST /rhsm/consumers P50 15,288 ms 8,247 ms −46%
ActiveRecord time P50 3,413 ms 566 ms −83%
Ruby allocations P50 1,570k 581k −63%
PUT /rhsm/consumers/:id P95 16,280 ms 5,180 ms −68%
First-try success rate (56–560 concurrent) 68% 90% +22 pp

The bulk INSERT accounts for most of the ActiveRecord time reduction — replacing ~108 individual round-trips to PostgreSQL with a single INSERT ... ON CONFLICT eliminates per-statement overhead that compounds under concurrent load. At 280+ concurrent registrations the per-INSERT overhead was the dominant contributor to DB time, as each INSERT acquired a row-level lock, wrote a WAL record, and waited for the WAL flush individually.

The additive: mode introduced here has no direct performance impact but is the enabler for Katello/katello#11701's critical-path fact reduction — without it, importing a 3-fact subset would delete the host's other 190 facts.

Copy link
Copy Markdown
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two related improvements to the fact import pipeline:

Why are these combined in a single PR? Combining multiple changes in a single PR makes life harder for reviewers so please separate this into its own PR.

@counters[:added] = facts_to_create.size
end

def bulk_insert_new_facts(names)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please be consistent in what it does:

Suggested change
def bulk_insert_new_facts(names)
def bulk_insert_new_fact_values(names)

@pablomh
Copy link
Copy Markdown
Contributor Author

pablomh commented May 7, 2026

Two related improvements to the fact import pipeline:

Why are these combined in a single PR? Combining multiple changes in a single PR makes life harder for reviewers so please separate this into its own PR.

Should I created an additional Redmine issue when I split them?

@pablomh
Copy link
Copy Markdown
Contributor Author

pablomh commented May 7, 2026

Split into two PRs per @ekohl's review feedback:

  • #10979 — Bulk insert for fact values (Redmine #39298)
  • #10980 — Additive fact import mode (Redmine #39299)

Closing this one in favour of the above.

@pablomh pablomh closed this May 7, 2026
@pablomh pablomh deleted the registration/bulk-fact-inserts branch May 7, 2026 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants