Fixes #39198 - Bulk-insert new facts and add additive import mode#10942
Fixes #39198 - Bulk-insert new facts and add additive import mode#10942pablomh wants to merge 1 commit into
Conversation
78ee0e8 to
8c913f2
Compare
b1a2a36 to
8070033
Compare
8070033 to
cf304fa
Compare
cf304fa to
14ec47e
Compare
Two related improvements to the fact import pipeline: 1. Bulk inserts for new facts on persisted hosts add_new_facts inserts each new fact individually via create! inside a loop. With 193 RHSM facts per registration this produces 193 individual INSERT statements. For persisted hosts, replace the per-fact loop with a single FactValue.insert_all call for facts with non-nil values, reducing N INSERT statements to 1. Nil-value facts are compose/parent nodes produced by StructuredFactImporter's fill_hierarchy method. They are separated at the add_new_facts level and handled individually via the existing add_new_fact path, preserving per-fact error handling semantics. The host.new_record? path is preserved unchanged since bulk insert requires a known host_id. 2. Additive import mode import! and HostFactImporter#import_facts now accept an additive: false keyword argument. When additive: true, the delete_removed_facts step is skipped — facts in the new set are added or updated, but facts already in the DB that are absent from the new set are preserved. This enables callers to import a targeted subset of facts without wiping the host's existing fact set. Katello uses this to import only the 3 facts needed for RhelLifecycleStatus during registration, deferring the full sync to the subsequent PUT /rhsm/consumers/:id call. All existing callers are unaffected (additive defaults to false). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
14ec47e to
6268428
Compare
Performance testing resultsTested on a medium Satellite (8 CPU, 32 GB RAM) with 3 registration topologies (direct, standalone capsule, load-balanced capsule pair) across concurrency levels from 56 to 560 concurrent hosts. This PR was tested together with its companion Katello/katello#11701 (which depends on the Direct impact of this PR (bulk INSERT):
Combined impact with Katello/katello#11701 (measured in the same batch):
The bulk INSERT accounts for most of the ActiveRecord time reduction — replacing ~108 individual round-trips to PostgreSQL with a single The |
ekohl
left a comment
There was a problem hiding this comment.
Two related improvements to the fact import pipeline:
Why are these combined in a single PR? Combining multiple changes in a single PR makes life harder for reviewers so please separate this into its own PR.
| @counters[:added] = facts_to_create.size | ||
| end | ||
|
|
||
| def bulk_insert_new_facts(names) |
There was a problem hiding this comment.
Please be consistent in what it does:
| def bulk_insert_new_facts(names) | |
| def bulk_insert_new_fact_values(names) |
Should I created an additional Redmine issue when I split them? |
What are the changes introduced in this pull request?
Two related improvements to the fact import pipeline:
1. Bulk inserts for new facts on persisted hosts
FactImporter#add_new_factscurrently inserts each new fact individually viacreate!inside a loop. This PR replaces that with a singleFactValue.insert_allcall for non-nil facts on persisted hosts, reducing N INSERT statements to 1.Nil-value compose facts (parent/hierarchy nodes produced by
StructuredFactImporter#fill_hierarchy) are separated at theadd_new_factslevel and handled individually via the existingadd_new_factpath. This preserves per-fact error handling semantics and keepsbulk_insert_new_factswith a single, clean responsibility: bulk insert a pre-filtered list of non-nil facts.The
host.new_record?path is preserved unchanged since bulk insert requires a knownhost_id.2. Additive import mode
FactImporter#import!andHostFactImporter#import_factsnow accept anadditive: falsekeyword argument. Whenadditive: true, thedelete_removed_factsstep is skipped — facts in the new set are added or updated, but facts already in the DB that are absent from the new set are preserved rather than deleted.This enables callers to import a small, targeted subset of facts (e.g. only the facts needed for OS detection during host registration) without wiping the host's existing fact set. Katello uses this in the registration flow to import only the 3 facts needed for
RhelLifecycleStatusto be immediately correct, deferring the full sync to the subsequentPUT /rhsm/consumers/:idcall.Considerations taken when implementing this change?
During host registration, ~193 RHSM facts are imported synchronously as part of the critical
POST /rhsm/consumerspath (which already takes ~3,256ms under no load). Under concurrent bulk registration, 193 individual INSERTs per host compounds significantly. The bulk insert reduces this to 1 INSERT per registration.FactValuehas noafter_createcallbacks — only a uniqueness validation enforced at the DB level — makinginsert_allsafe here.unique_by: [:fact_name_id, :host_id]is specified explicitly.The
additive:keyword defaults tofalse, so all existing callers are unaffected.What are the testing steps for this pull request?
host.new_record?path still builds facts correctly (e.g. during unattended provisioning)additive: truepreserves existing facts not present in the new setStructuredFactImporterTestFixes #39198