Skip to content

Fixes #39298 - Use bulk insert for new fact values on persisted hosts#10979

Open
pablomh wants to merge 1 commit into
theforeman:developfrom
pablomh:registration/bulk-insert-fact-values
Open

Fixes #39298 - Use bulk insert for new fact values on persisted hosts#10979
pablomh wants to merge 1 commit into
theforeman:developfrom
pablomh:registration/bulk-insert-fact-values

Conversation

@pablomh
Copy link
Copy Markdown
Contributor

@pablomh pablomh commented May 7, 2026

Summary

Replace per-fact create! loop with a single insert_all for new facts on persisted hosts, reducing DB round-trips by ~94% (108 INSERTs → 6).

Changes

  • fact_importer.rb: add_new_facts partitions into compose facts (nil values, handled individually) and regular facts (bulk inserted via insert_all)
  • bulk_insert_new_fact_values: new method using FactValue.insert_all with unique_by: [:fact_name_id, :host_id]
  • Counter initialization: @counters = { added: 0, updated: 0, deleted: 0 } prevents nil access

How to test

  1. bundle exec rake test TEST=test/unit/fact_importer_test.rb
  2. Register a host and verify facts are imported correctly

Copy link
Copy Markdown
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#8823 was the last attempt at improving fact import performance. Have you had a look at that? The benchmark code may also be useful as a reference.

Comment thread app/services/fact_importer.rb
Comment thread app/services/foreman_ansible/structured_fact_importer.rb
Comment thread app/services/fact_importer.rb Outdated
Copy link
Copy Markdown
Contributor Author

@pablomh pablomh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key difference is which method each PR targets: #8823 applied upsert_all to update_facts (existing facts changing value), where the current code already uses update_all with no callbacks — so there was no measurable gain. Our PR applies insert_all to add_new_facts (facts being created for the first time), where the current code calls create! per fact with individual INSERTs.

During registration this is the dominant path — all facts are new (100 per RHEL 8 host, 117 per RHEL 9/10 host) — and at concurrent scale the per-fact INSERTs compound. Measured on a 16-CPU Satellite with only this patch applied (no other changes), mixed workload (50% RHEL 8, 25% RHEL 9, 25% RHEL 10):

Concurrency INSERTs/host (before → after) AR P50 (before → after) AR P95 (before → after)
152 100 (el8) / 117 (el9,el10) → 1 2,610ms → 1,297ms (−50%) 8,214ms → 3,350ms (−59%)
304 100 (el8) / 117 (el9,el10) → 1 4,579ms → 2,004ms (−56%) 11,922ms → 6,232ms (−48%)
456 100 (el8) / 117 (el9,el10) → 1 8,492ms → 2,619ms (−69%) 13,036ms → 6,884ms (−47%)
608 100 (el8) / 117 (el9,el10) → 1 8,701ms → 3,215ms (−63%) 13,963ms → 7,566ms (−46%)
760 100 (el8) / 117 (el9,el10) → 1 9,789ms → 3,666ms (−63%) 14,138ms → 7,120ms (−50%)

Comment thread app/services/fact_importer.rb Outdated
Comment thread app/services/fact_importer.rb Outdated
Copy link
Copy Markdown
Member

@ekohl ekohl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These items may not be needed, but I wrote down my thoughts and notes when reviewing the code.

Comment thread app/services/fact_importer.rb Outdated
Comment thread app/services/fact_importer.rb
Comment thread app/services/fact_importer.rb
Replace per-record create! with insert_all for new fact values when
the host already exists in the database. For new (unsaved) hosts,
continue using build to defer persistence until the host is saved.

Key changes:

- bulk_insert_new_fact_values uses host.fact_values.insert_all with
  record_timestamps: true, letting Rails handle host_id, created_at,
  and updated_at automatically
- facts_to_create changed from Array to Hash via Hash#except, carrying
  both name and value to avoid redundant lookups
- add_new_fact simplified to always use build (only called for new
  records now)

Verified on foremanctl deployment: 117 fact_values created with
correct host_id and timestamps via association-scoped insert_all.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pablomh
Copy link
Copy Markdown
Contributor Author

pablomh commented May 21, 2026

Ping?

@pablomh
Copy link
Copy Markdown
Contributor Author

pablomh commented May 21, 2026

I took another pass over your suggestion and I think the nicest end state would be to apply the { name => value } shape already in update_facts, then carry that through add_new_facts into bulk_insert_new_fact_values.

Concretely, I mean:

  • in update_facts:
    @facts_to_create = facts.except(*db_facts_names)

  • in add_new_facts:
    use facts_to_create.each_key on the host.new_record? path, and pass the hash directly to bulk_insert_new_fact_values on the persisted-host path

  • in bulk_insert_new_fact_values:
    iterate |name, value| and use:
    host.fact_values.insert_all(records, unique_by: [:fact_name_id, :host_id], record_timestamps: true)

So roughly:

def add_new_facts
  if host.new_record?
    facts_to_create.each_key { |name| add_new_fact(name) }
  else
    bulk_insert_new_fact_values(facts_to_create)
  end
  @counters[:added] = facts_to_create.size
end

def bulk_insert_new_fact_values(new_facts)
  return if new_facts.empty?

  records = new_facts.map do |name, value|
    { value: value, fact_name_id: fact_names[name].id }
  end

  host.fact_values.insert_all(records, unique_by: [:fact_name_id, :host_id], record_timestamps: true)
end

and in update_facts:

db_facts_names = db_facts.map(&:first) & fact_names.keys
@facts_to_create = facts.except(*db_facts_names)

I like that shape because it keeps the current no-partition flow for compose facts, removes repeated facts[name] lookups, and makes the data flow a bit cleaner end-to-end.

If we go this way, I’d also add focused coverage for host_id, created_at, updated_at, and persisted structured/compose facts through the bulk path.

Does that match what you had in mind?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants