Route the bulk (many) function through the batching pipeline #2

@twinn

Summary

Calls to the user-defined bulk (plural) function — e.g. `get_many/1` — currently bypass the batching pipeline entirely. They execute the function body directly in the caller's process. This means callers that already have a list of ids in hand can't take advantage of coalescing with concurrent callers.

Motivating example

```elixir
defmodule MyApp.UserBatcher do
  use Flurry, repo: :none

  # `from/2` is an Ecto.Query macro, so it has to be imported.
  import Ecto.Query, only: [from: 2]

  @decorate batch(get(id))
  def get_many(ids) do
    Repo.all(from u in User, where: u.id in ^ids)
  end
end

# Caller A (has a single id):
user = MyApp.UserBatcher.get(42)

# Caller B (has a list of ids):
users = MyApp.UserBatcher.get_many([1, 2, 3])
```

Today, caller A goes through the producer/consumer pipeline and may coalesce with other singleton callers. Caller B calls `get_many/1` directly — it runs in caller B's process with its own DB connection, fires its own query, and never interacts with the pipeline.

What we'd want

Caller B should be able to opt into the pipeline too. When they call `get_many([1, 2, 3])`, the library should:

  1. Enqueue each id individually into the producer (same as if three separate `get/1` calls arrived)
  2. Let the producer coalesce them with any other pending work in the same group
  3. Call the bulk fn once with the combined id list
  4. Correlate the results and return them to caller B as a list in the original input order

Caller B doesn't see the batching — they just get back `[user1, user2, user3]`. But their requests participated in the global coalescing.

Semantic questions

  • Order preservation. Caller B passed `[1, 2, 3]`. The batched result might include ids from other callers. We need to return caller B's results in the order they asked, which means filtering + reordering the correlation map.
  • Duplicates in input. If caller B passes `[1, 1, 2]`, do they get back `[user1, user1, user2]` (preserving duplicates) or `[user1, user2]` (deduped)? Preserving is probably what users expect.
  • Single vs list mode return. If the decorator declares `returns: :list`, each input element corresponds to a list of records. Caller B's result would be `[[record, record], [], [record]]` — a list of lists. Works naturally with the existing correlation logic.
  • Transaction bypass. Caller B should still respect `in_transaction:` modes — if they're inside a transaction with `in_transaction: :bypass`, run inline in their process with a flat call to the bulk fn, same as the singular path does.
  • Interaction with `overridable:`. If the user has `overridable: [get: 1]` and also wants to override `get_many/1`, the overridable list needs to cover both functions: `overridable: [get: 1, get_many: 1]`. Or we generate both transparently. TBD.
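
To make the order and duplicate questions concrete, here is a minimal sketch of the filter-and-reorder step. It assumes the correlation result is a plain map keyed by id, and that a missing id maps to `nil` in single mode or `[]` in list mode; `ReorderSketch` and `pick/3` are hypothetical names, not part of the library.

```elixir
defmodule ReorderSketch do
  # `results_by_id` stands in for the correlation map built from one combined
  # batch; it may also contain entries requested by other callers.
  # `default` is what a missing id maps to (assumed: nil for single mode,
  # [] for `returns: :list`).
  def pick(ids, results_by_id, default \\ nil) do
    # Walking the caller's own id list preserves both order and duplicates,
    # and ignores ids that belong to other callers.
    Enum.map(ids, fn id -> Map.get(results_by_id, id, default) end)
  end
end

batch = %{1 => :user1, 2 => :user2, 3 => :user3, 42 => :user42}

# Single mode: duplicates preserved, other callers' ids (3, 42) filtered out.
single = ReorderSketch.pick([2, 1, 1], batch)
# single == [:user2, :user1, :user1]

# List mode: each input id maps to a list, empty when nothing matched.
listed = ReorderSketch.pick([1, 9], %{1 => [:a, :b]}, [])
# listed == [[:a, :b], []]
```

Mapping over the caller's input rather than over the batch result is what makes "preserve duplicates" the default; deduping would need an explicit `Enum.uniq/1` on the input first.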

Implementation sketch

  • The `@decorate batch(get(id))` already knows the bulk fn name (`get_many`). We could generate a wrapper around `get_many/1` that:
    1. Enqueues each id into the producer via a new `GenServer.call(producer, {:enqueue_many, args, ...})` variant
    2. Waits for all replies
    3. Returns results in the caller's original order
  • Or: keep the user's `get_many/1` as the internal bulk function (what the pipeline invokes), and generate a new public `get_many_batched/1` (or similar) that does the enqueue dance. Less invasive, and it avoids renaming the bulk fn.
  • Or, cleaner: rename the user's bulk function internally to a helper like `_flurry_get_bulk/1`, and generate both `get/1` and `get_many/1` as public entry points. `get_many/1` enqueues and blocks, same as `get/1` but with a list input.

The naming collision with the user's `def get_many/1` is the tricky part. Either we accept that the user-defined `get_many` is internal-only and the public `get_many` is generated, or we introduce a different public name for the pipeline-routed variant.
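
As a rough illustration of the `{:enqueue_many, ...}` idea, here is a toy stand-in for the producer built on a plain `GenServer`. It is a sketch under stated assumptions, not the library's pipeline: `ToyBatcher`, the fixed flush window, and the "bulk function returns a map of id to result" shape are all hypothetical.

```elixir
defmodule ToyBatcher do
  use GenServer

  # `bulk_fun` is assumed to take a list of ids and return %{id => result}.
  def start_link(bulk_fun, flush_ms \\ 20) do
    GenServer.start_link(__MODULE__, {bulk_fun, flush_ms})
  end

  # The singular entry point is just the list entry point with one id.
  def get(pid, id), do: pid |> get_many([id]) |> hd()

  # Blocks until the next flush replies with this caller's results.
  def get_many(pid, ids), do: GenServer.call(pid, {:enqueue_many, ids})

  @impl true
  def init({bulk_fun, flush_ms}) do
    {:ok, %{bulk_fun: bulk_fun, flush_ms: flush_ms, pending: []}}
  end

  @impl true
  def handle_call({:enqueue_many, ids}, from, state) do
    # Arm the flush timer only when the first request of a window arrives.
    if state.pending == [], do: Process.send_after(self(), :flush, state.flush_ms)
    # Reply later, from handle_info/2, once the combined batch has run.
    {:noreply, %{state | pending: [{from, ids} | state.pending]}}
  end

  @impl true
  def handle_info(:flush, state) do
    # One bulk call for every id queued in this window, across all callers.
    all_ids = state.pending |> Enum.flat_map(fn {_from, ids} -> ids end) |> Enum.uniq()
    results = state.bulk_fun.(all_ids)

    # Each caller gets its own ids back, in its own order, duplicates kept.
    for {from, ids} <- state.pending do
      GenServer.reply(from, Enum.map(ids, &Map.get(results, &1)))
    end

    {:noreply, %{state | pending: []}}
  end
end

{:ok, pid} = ToyBatcher.start_link(fn ids -> Map.new(ids, fn id -> {id, {:user, id}} end) end)

# Caller A (single id) and caller B (list of ids) share one flush window.
task = Task.async(fn -> ToyBatcher.get(pid, 42) end)
users = ToyBatcher.get_many(pid, [1, 1, 2])
user = Task.await(task)
```

The toy runs the bulk function inside the server process and ignores grouping, transactions, and error handling; it only shows that a single enqueue message carrying a list is enough for a list caller to coalesce with singleton callers.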

Workaround for now

Manually call the singular function through `Task.async_stream/3`:

```elixir
users =
  [1, 2, 3]
  |> Task.async_stream(&MyApp.UserBatcher.get/1)
  |> Enum.map(fn {:ok, user} -> user end)
```

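Each singular call goes through the pipeline and coalesces with others. This loses the ergonomic `get_many([1, 2, 3])` call site but achieves roughly the same effect. One caveat: `Task.async_stream/3` runs at most `max_concurrency` tasks at a time (by default `System.schedulers_online()`), so a long id list is fed to the pipeline in waves rather than all at once.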
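
Because `Task.async_stream/3` caps the number of in-flight tasks at `max_concurrency`, a variant of the workaround can raise that cap to the length of the list so every id is pending at the same time. The helper below is a hypothetical sketch (`WorkaroundSketch` and `get_many_via_singular/3` are not library names); `get_one` stands in for a function such as `&MyApp.UserBatcher.get/1`.

```elixir
defmodule WorkaroundSketch do
  # `get_one` is any 1-arity function, e.g. &MyApp.UserBatcher.get/1.
  def get_many_via_singular(ids, get_one, timeout \\ 5_000) do
    ids
    |> Task.async_stream(get_one,
      # One task per id so all calls can be pending in the same window;
      # max_concurrency must be positive, hence the max/2 for an empty list.
      max_concurrency: max(length(ids), 1),
      # ordered: true (the default) keeps results in input order.
      ordered: true,
      timeout: timeout
    )
    |> Enum.map(fn {:ok, result} -> result end)
  end
end

# Stand-in singular function for illustration only.
results = WorkaroundSketch.get_many_via_singular([3, 1, 1], fn id -> {:user, id} end)
# results == [{:user, 3}, {:user, 1}, {:user, 1}]
```

Spawning one process per id is cheap on the BEAM, but for very large lists it may be worth chunking the input instead of raising the cap without bound.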

Priority

Medium. The workaround works for most cases. This is quality-of-life for callers who naturally have lists in hand and want to participate in coalescing without contorting their code into async streams.
