load/offload with mmap #6
base: master
Conversation
…refine_offload
    Convert a tensor to a memory-mapped CPU tensor using PyTorch's native mmap support.
    """
    # Move to CPU if needed
    if t.is_cuda:
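The excerpt above only shows the start of the conversion. A minimal sketch of what such a helper could look like, assuming the function name `to_mmap_cpu` and the use of `torch.from_file` (which maps a file into memory when `shared=True`) — this is an illustration, not the PR's actual implementation:

```python
import os
import tempfile

import torch


def to_mmap_cpu(t: torch.Tensor) -> torch.Tensor:
    """Copy a tensor into a file-backed, memory-mapped CPU tensor (sketch)."""
    if not t.is_cpu:  # covers cuda, musa, and other accelerator backends
        t = t.cpu()
    t = t.contiguous()
    # Back the tensor with a temp file; the OS can page it out under pressure.
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    mm = torch.from_file(path, shared=True, size=t.numel(), dtype=t.dtype)
    mm.copy_(t)  # write the weights through the mapping into the file
    return mm.view(t.shape)
```

Because the data lives in a shared mapping rather than anonymous RAM, offloaded weights no longer count against resident CPU memory in the same way, which is the point of the change under review.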
Considering that ComfyUI may need to support other hardware, such as Moore Threads' MUSA (for a tensor on a Moore Threads GPU, t.is_cuda returns False and t.is_musa is True), should this condition be changed to
if not t.is_cpu
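The suggestion can be illustrated with a short check. `Tensor.is_cpu` is True only for CPU tensors, so negating it flags any accelerator backend without enumerating them one by one (a small sketch, not code from the PR):

```python
import torch

t = torch.zeros(4)  # a CPU tensor in this example
# `t.is_cuda` only detects the CUDA backend; `not t.is_cpu` instead
# treats every non-CPU device (cuda, musa, xpu, ...) uniformly.
needs_move_to_cpu = not t.is_cpu
```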
OK, I will update it later to handle the MUSA case.
Problem description: ComfyUI CPU OOM.
Reproduction cloud function address: https://cloud.siliconflow.cn/sft-d1s6t1r3jrms73f3ltpg/dedicated/functions/fniglisnqx?tab=definition
Public API endpoint: https://fniglisnqx.fn.6scloud.com
Environment info
I looked at the workflow; it contains fairly large LoRAs totaling more than 5 GB, so that part is very likely what causes the OOM. Try this version first, which adds mmap support for LoRAs: #8
It is probably not the LoRAs; I removed the LoRA and still hit a CPU OOM.
In the test workflow, only CLIP reaches ComfyUI's model_unload logic. Most of the nodes in the workflow are third-party nodes, and their offloading appears to be done internally by those nodes, outside ComfyUI's control.
When starting ComfyUI, add the environment variable MMAP_MEM_THRESHOLD_GB=5. It means that when free CPU memory drops below 5 GB, offloads go to mmap instead, avoiding exhausting CPU memory.
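The threshold semantics described above can be sketched as a small predicate (the function name and the way free memory is passed in are assumptions for illustration; only the MMAP_MEM_THRESHOLD_GB variable comes from the discussion):

```python
import os


def should_offload_to_mmap(free_cpu_mem_bytes: int) -> bool:
    # MMAP_MEM_THRESHOLD_GB=5 means: when free CPU memory is below 5 GB,
    # offloaded weights go to an mmap-backed file instead of anonymous RAM.
    threshold_gb = float(os.environ.get("MMAP_MEM_THRESHOLD_GB", "0"))
    return free_cpu_mem_bytes < threshold_gb * (1 << 30)
```

With MMAP_MEM_THRESHOLD_GB=5, a machine with 3 GB free would offload to mmap, while one with 8 GB free would keep using regular CPU memory.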