ComfyUI speed-up notes, collected from GitHub

ComfyUI bills itself as "the most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface", and there are many ways to make it faster. The headline example: ComfyUI Flux Accelerator can generate images up to 37.25% faster than the default settings; on an RTX 4090, a 512x512 image at 4 steps drops from 0.51s → 0.32s (37.25% faster).


Launch flags and environment tweaks

- Run ComfyUI with --disable-cuda-malloc; it may be possible to optimize the speed further.
- One user runs with --use-pytorch-cross-attention --fast --highvram --dont-upcast-attention, with ComfyUI and its dependencies fully up to date; another notes that in their case only --highvram made a difference.
- Try using an fp16 model config in the CheckpointLoader node. That should speed things up a bit on newer cards, and it should be at least as fast as the a1111 UI if you do that. The manual-install instructions launch with python main.py --force-fp16; note that --force-fp16 will only work if you installed the latest pytorch nightly.
- You can also try setting the environment variable PYTORCH_TUNABLEOP_ENABLED=1, which might speed things up at the cost of a very slow initial run.
- On the Windows portable build, find your ComfyUI main directory (usually something like C:\ComfyUI_windows_portable) and just put your arguments in the run_nvidia_gpu.bat file: open the .bat file with Notepad, make your changes, and save. A combined example is sketched below.
- A comment by @Exploder98 suggests removing bfloat16 from the supported inference dtypes, i.e. changing supported_inference_dtypes = [torch.bfloat16, torch.float16, torch.float32] to drop torch.bfloat16; one user reports this increased their speed by 50%.
- Out of curiosity, one user disabled xformers and used PyTorch cross-attention expecting a total collapse in performance, but the speed turned out to be the same.
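To make the flag soup concrete, here is a minimal launch sketch combining the options quoted above. It is an illustration rather than a recommendation: the install path is hypothetical, and each flag is worth benchmarking on your own hardware, since several of them only help (or only work) on certain GPUs and PyTorch builds.

```bash
# Minimal sketch of a manual-install launch using the flags discussed above.
cd ~/ComfyUI   # hypothetical install path; adjust to yours

# Optional PyTorch TunableOp autotuning: the first run becomes very slow
# while kernels are tuned, later runs may be faster.
export PYTORCH_TUNABLEOP_ENABLED=1

python main.py \
    --use-pytorch-cross-attention \
    --fast \
    --highvram \
    --disable-cuda-malloc
```

On the Windows portable build the same idea applies, except the flags are appended to the python line inside run_nvidia_gpu.bat.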
Automatic CFG and the "no uncond" trick

The "automatic CFG" update advertises up to a 28.5% speed increase ("Up to 28.5% faster generation speed than normal; negative weighting"; changelog entry 05.24: updated to latest ComfyUI version). In short: turning off the guidance makes the steps go twice as fast, and it can be done without any loss in quality when the sigmas are low enough (~1). The accompanying "no uncond" node completely disables the negative prompt and doubles the speed, rescaling the latent space in the post-CFG function up until the sigmas are at 1. If CFG is new to you, think of it as the temperature of your oven: a thermostat that ensures the image is always cooked the way you want. Mileage varies, though; one user reports very slow generation speed when using AutoCFG.
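Why does dropping the negative prompt roughly double per-step speed? This explanation is mine rather than from the quoted threads: classifier-free guidance evaluates the model twice per step, once on the positive conditioning and once on the negative/unconditional one, and blends the two predictions as

    denoised = uncond + cfg_scale * (cond - uncond)

so skipping the uncond pass halves the number of model evaluations per step. The latent rescaling mentioned above is what compensates for the missing guidance term.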
VRAM, offloading and multiple GPUs

- "Can I use multiple GPUs to make up for lack of VRAM and boost process speed?" (comfyanonymous/ComfyUI issue #2879). A related feature idea asks to let memory split across GPUs: with the arrival of Flux, even 24 GB cards are maxed out and models have to be swapped in and out during image creation, which is slow; if you have two GPUs this would be a massive speedup.
- A counterpoint from the same discussions: VRAM capacity isn't really the issue, it's getting the data to the cores fast enough, and big VRAM is just our best solution currently (one commenter points to a recent US Department of Energy paper on supercluster parallelization that retimed the data flow). Likewise, regardless of which upscale model you use, the speed loss from CPU offloading comes from transferring data back and forth as well as read/write operations.
- 8 GB of system RAM is barely enough for Windows to run smoothly, so low-VRAM mode and model swapping won't help much; even with a higher-VRAM card, everything still needs to be backed into system memory.
- A typical report (Windows portable, fully up to date; 13900K, 32 GB RAM, Windows 11, 4090 with the newest drivers): 1.5 and 2.0 flows work fine, but SDXL loads the checkpoints at about 19 GB of VRAM, pushes to 24 GB when a prompt runs, and then sits at 24 GB after the prompt finishes (or is cancelled) until the ComfyUI command prompt is closed.
- The comfyui-purgevram node (T8star1984/comfyui-purgevram) can be added after any node to clean up VRAM and memory.
- "flux1-dev-bnb-nf4" (a "bitsandbytes_NF4" build) is a new Flux model claimed to be nearly 4 times faster than the Flux Dev version and 3 times faster than the Flux Schnell version. One comparison: the single-file FP8 version generates a 1024x1024 image using about 14 GB of VRAM with a peak of 31 GB of RAM, while the NF4 version uses about 12.7 GB of VRAM with a peak of about 16 GB of RAM; both run at about the same speed, so the VRAM reduction was smaller than hoped.
- GGUF-quantized models (city96/ComfyUI-GGUF; thanks to city96 for active development of the node) can be slower: one user asks how to increase GGUF speed, reporting 34 seconds for a 4-step image with a 6 GB GGUF model versus 18-19 seconds with a 6 GB unet model.
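When chasing reports like the 19 GB to 24 GB one above, it helps to watch VRAM while the workflow runs. One simple way on NVIDIA hardware (plain nvidia-smi, nothing ComfyUI-specific) is:

```bash
# Print used/total VRAM once per second while a workflow runs.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
```

Usage staying pinned after a prompt finishes matches the "sits at 24 GB until the command prompt is closed" behavior described above: ComfyUI keeps models cached so it does not have to reload them for the next prompt.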
Faster inpainting with crop-and-stitch

The comfyorg/comfyui-crop-and-stitch nodes speed up inpainting by cropping before sampling and stitching back after sampling, so only the masked region is actually sampled. The main parameters (a worked example follows at the end of this section):

- context_expand_pixels: how much to grow the context area (i.e. the area for the sampling) around the original mask, in pixels. This provides more context for the sampling.
- context_expand_factor: how much to grow the context area around the original mask, as a factor; e.g. 1.1 grows it by 10% of the size of the mask.
- invert_mask: whether to fully invert the mask.
- fill_mask_holes: whether to fully fill any holes in the mask.

A few related workflow-level notes from the same threads: only parts of the graph that have an output with all the correct inputs will be executed; batching offers a reasonable speed boost (one user always renders with a batch size of 3); and iterative high-res upscaling is relatively slow and VRAM-hungry since it requires multiple iterations at high resolution, while Deep Shrink/HiDiffusion actually speed up generation while the scaling effect is active. Generation-node parameters that come up in these discussions include use_kv_cache (enable the KV cache to speed up inference), seed (a random seed for generating output) and control_after_generate (how the seed value changes every time it runs); a commonly quoted baseline is sampler euler with scheduler normal.
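Returning to the two context parameters above, a quick worked example (the numbers are mine, not from the README): for a 200x300 px mask, context_expand_factor = 1.1 grows the sampled context by 10% of the mask size, to roughly 220x330 px, whereas context_expand_pixels = 20 adds a flat 20 px margin on each side, giving 240x340 px. Either way the sampler sees a sliver of the surrounding image, which improves seams at a small extra cost compared with sampling only the bare mask.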
Compilation and accelerated runtimes

- TensorRT: NVIDIA describes TensorRT acceleration for the Stable Diffusion web UI (https://developer.nvidia.com/blog/unlock-faster-image-generation-in-stable-diffusion-web-ui-with-nvidia-tensorrt/), and users ask whether anyone has tried the equivalent in ComfyUI. The catch, as comfy points out, is that a compiled engine doesn't support ControlNet, GLiGEN, or any of the other fun and fancy stuff, and LoRAs need to be baked into the compiled "program": if you chain LoRAs you begin accumulating a multiplicative number of variants of the same model, one per combination of LoRA weights selected for a run, each needing its own pre-compilation.
- Reported TensorRT numbers for an sd1.5 model (realisticvisionV51) at 512x768 (some digits were lost in transcription): base speed 5 it/s with a ~4.1 GB model; TensorRT static around 8 it/s with a ~1.7 GB model (64% speed increase); TensorRT dynamic 7.2 it/s (60% speed increase). An sdxl model at 768x1024 was benchmarked the same way.
- stable-fast now has a ComfyUI extension: https://github.com/gameltb/ComfyUI_stable_fast (install sketch below). Some monkey patching is used in the current implementation, and FreeU and PatchModelAddDownscale are now supported experimentally; just use the comfy nodes normally.
- torch compile: in one wrapper project it only works on torch v1.13, and since other parts of comfy require torch 2 it is not possible to activate; the speed-up is not that great anyway, so it is better to deactivate it.
- T-GATE claims a 10%-50% speed-up for different diffusion models while only slightly reducing the quality of the generated images and maintaining the original composition.
- HyperTiling: try the "HyperTiling" node under _nodes_for_testing. It's not obvious, but HyperTiling is an attention optimization that improves on xformers and friends; its gain grows with image size, and it also adds around a 30% speed increase.
- The core keeps chipping away too, with commits like "Speed up fp8 matrix mult by using better code", "Speed up hunyuan dit inference a bit", "Speed up Sharpen node", "Speed up TAESD preview" and "Try to speed up the test-ui workflow" (comfyanonymous/ComfyUI@58c9838, @4ee9aad, @ae197f6, @e0c0029). One of these changes is described as having a very slight hit on inference speed and zero hit on memory use, with initial tests indicating it's absolutely worth using. A caveat on fp8: the A100 doesn't support the fp8 types, and presumably at some point TransformerEngine will get ported to Windows.
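A minimal install sketch for the stable-fast extension linked above. The clone URL comes from the thread; the dependency step is left vague on purpose, because the stable-fast wheel has to match your torch/CUDA build and the exact command lives in the repository README:

```bash
# Clone the extension into ComfyUI's custom node directory.
cd ComfyUI/custom_nodes
git clone https://github.com/gameltb/ComfyUI_stable_fast

# stable-fast itself must also be installed into the same Python
# environment that runs ComfyUI; pick the wheel matching your torch/CUDA
# version as described in the project's README, then restart ComfyUI.
```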
Ecosystem nodes and UI tweaks

- ComfyUI-RMBG (1038lab/ComfyUI-RMBG): a custom node for advanced image background removal and object segmentation, utilizing multiple models including RMBG-2.0, INSPYRENET, BEN, SAM, and GroundingDINO; it offers a good balance between speed and accuracy and is effective on both simple and complex scenes.
- comfyui_controlnet_aux (Fannovel16): added support for onnxruntime to speed up DWPose (see the repo's Q&A) and fixed a "TypeError: expected size to be one of int or Tuple[int]" crash.
- comfyui-faster-loading (nonnonstop): speeds up the loading of checkpoints with ComfyUI.
- ComfyUI-Workflows-Speedup (ccssu): iterates on speeding up whole workflows.
- ComfyUI_InstantIR_Wrapper (smthemex): InstantIR, blind image restoration with an instant generative reference, which can also be used to upscale images in ComfyUI.
- HelloMeme changelog highlights: 12/17/2024, ModelScope support (ModelScope demo); 12/08/2024, HelloMemeV2 (select "v2" in the version option of the LoadHelloMemeImage/Video node); improved expression consistency between the generated video and the driving video; better compatibility with third-party checkpoints (the authors say they will continuously collect compatible free third-party checkpoints).
- A (very) advanced and (very) experimental custom node lets you iteratively change the block weights of Flux models and check the difference each value makes.
- kijai's video wrappers (ComfyUI-DynamiCrafterWrapper, ComfyUI-HunyuanVideoWrapper) are where several of the video-model speed notes quoted here come from.
- UI tweaks: to change canvas zoom speed, open ComfyUI\web\lib\litegraph.js in a text editor, search for scale *= 1.1, and replace the 1.1 with a larger number like 1.5 for a faster zooming-in speed or a smaller number like 1.05 for a slower one. For panning, a custom "node" by codecringebinge enables arrow-key pan navigation of the canvas, with a customizable pan speed in ComfyUI's Settings under the "codecringebinge" subsection of the Settings dialog's left panel.

Installation is uniform across all of these: follow the ComfyUI manual installation instructions for Windows and Linux, install the ComfyUI dependencies (if you have another Stable Diffusion UI you might be able to reuse them), and clone extensions into custom_nodes. A recurring question, "Should I 'enable' the extension somehow? I only did git clone it into the custom_nodes folder", has a simple answer: no, a restart is enough, as sketched below.
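A generic install sketch, using ComfyUI-RMBG to complete the truncated clone command from its README. The URL is inferred from the repo name 1038lab/ComfyUI-RMBG, and the requirements step applies only if the node actually ships a requirements.txt:

```bash
# Clone a custom node into ComfyUI's custom_nodes directory.
cd ComfyUI/custom_nodes
git clone https://github.com/1038lab/ComfyUI-RMBG

# If the node ships a requirements file, install it into the same
# Python environment that runs ComfyUI.
pip install -r ComfyUI-RMBG/requirements.txt

# Restart ComfyUI afterwards. Nodes in custom_nodes are loaded
# automatically on startup, so there is no separate "enable" step.
```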
Troubleshooting slowdowns: collected reports

- "I've been using ComfyUI for a year or so, and something happened that has caused it to dramatically slow down image generation. It was easily doing an image in 15-20 seconds on my computer; now it's taking minutes." Similar reports: "Since updating on September 3rd, generations have become extremely slow, but I have a suspicion as to why", and "Since yesterday, using flux-dev causes my OS to lag and stutter permanently, and the loading of all models in the workflow is extremely slow."
- LoRA stacking: generation speed drops significantly with each added LoRA. With one LoRA (Q8) no drop was noticed, but with two or more the speed drops several times, and with four LoRAs it drops 3x; one issue template puts it at about 20% slower inference (with lower VRAM usage) on a default SDXL workflow with a LoRA, when speed and VRAM should have remained the same. The reporter updated the GGUF loader and ComfyUI at the same time, so they are not 100% sure which is to blame; the problem was reportedly solved after a later update, at least on Q8, and only Flux showed the deterioration.
- Saving: even with every save option set to false, image saving stays extremely slow; suspicion fell on the filename-prefix loop or the repeated regex. One user tried saving batches asynchronously and fixing the date metadata post-save so files kept their order, but couldn't get the filename/date handling right and gave up.
- Other reports: a full IPAdapter FaceID workflow where drawing didn't start for 60 seconds and every node ran very slowly; persistently blurred Flux dev/schnell output no matter which sampler or parameters were used; a 6 GB GTX 1660 Super whose speed stayed exactly the same in every generation regardless of the tweak being tested (is it the warning in the log, or is 6 GB of VRAM simply too little for the node?); a Flux model that took a long time to load, where fortunately the custom node fixed it on its side; and a FluxTrain user whose average training speed is well below the numbers others quote.
- Versus Automatic1111: one user gets 1.7 seconds in auto1111 for 512x512 at 20 steps (euler) but 3 seconds in comfy for the same image and settings, half the speed. Another thread about SDXL goes the other way: an Automatic1111 user with a 3060 (12 GB) generates a 20 base-step, 10 refiner-step 1024x1024 Euler a image in just a few seconds over a minute and planned to retry in ComfyUI once a refiner workflow was set up, while a user who had to knobble too many features in A1111 and found the speed way too slow concludes that "for the moment, certainly on older hardware, ComfyUI is the better option for SDXL work". Why would two versions of the same program run at such different speeds? Look at how each UI starts: on a Mac, Automatic1111 launches with --no-half --skip-torch-cuda-test --upcast-sampling --no-half-vae --use-cpu interrogate, and it is not clear which of these is essential for the speed-up.
- Launch and install time: one machine launches ComfyUI locally in about 10 seconds (with far more custom nodes installed), while a GPU cloud service takes about 40 seconds. On vast.ai the pip install step runs on a single CPU core and can take up to 2 hours depending on the instance. The desktop ComfyUI beta, meanwhile, correctly recognizes a 120 Hz display.
- General advice before filing an issue: if you get an error, update your ComfyUI; update your custom nodes and, if you use it, Forge and all extensions to the latest versions, and re-test after each update (one reporter updates daily, and another ran several tests with a clean installation and a perfectly configured environment, finding Forge and ComfyUI speeds unchanged). Beware, though, that a ComfyUI core change can break a custom node, and rolling ComfyUI back can break other custom nodes in turn; everyone's configuration is different, which is what makes these slowdowns hard to pin down.

Where might the bigger wins come from? The stable-fast project's roadmap lists converting models with stable-fast (estimated speed-up: 2X), training an LCM LoRA for the denoise unet (estimated speed-up: 5X), optionally training a new model on a better dataset, and continuous research, always moving toward something better and faster.

Handy keybinds while iterating:

- Ctrl + Enter: queue up the current graph for generation
- Ctrl + Shift + Enter: queue up the current graph as first for generation
- Ctrl + Alt + Enter: cancel the current generation
- Ctrl + Z / Ctrl + Y: undo / redo