Tag: image-generation

37 discussions across 10 posts tagged "image-generation".

AI Signal - July 14, 2026

Local Image to 3D (<2gb RAM, <20s, Apple Silicon, iPhone) r/LocalLLaMA Score: 850

Swift-mlx port of Hunyuan3D enabling image-to-3D generation on Apple Silicon in under 20 seconds using less than 2GB RAM, even running on iPhones. Represents significant progress in making 3D generation accessible on consumer devices.

#local-models #image-generation
I spent weeks optimizing Krea 2 & LTX 2.3 workflows—here they are for free r/StableDiffusion Score: 653

Community member shared optimized workflows for Krea 2 and LTX 2.3 image/video generation, providing free access to weeks of experimentation. Demonstrates the collaborative knowledge-sharing culture around open-source generative models.

#image-generation #open-source
I benchmarked every Krea 2 Turbo checkpoint format in ComfyUI - BF16 vs FP8 vs INT8 ConvRot vs MXFP8 vs NVFP4 (150 matched images) r/StableDiffusion Score: 266

Comprehensive benchmark of Krea 2 quantization formats showing INT8 ConvRot provides the best quality/speed tradeoff on consumer GPUs, outperforming both NVIDIA's NVFP4 and higher-precision formats. Rigorous methodology with 150 matched images across perceptual, semantic, and latent measurements.

#image-generation #local-models

AI Signal - July 07, 2026

I created a node for Krea2 that adds Multi-LORA support with no identity bleeding and per region bounding box control like Ideogram 4 r/StableDiffusion Score: 215

A custom ComfyUI node for Krea2 enables multiple character LoRAs in a single image with bounding-box control, preventing identity bleeding. This brings Ideogram 4-style regional prompting to Krea2.

#image-generation #open-source
SesquiLSR: tiny 1-2x learned latent upscaler for Flux2, Anima, SDXL and more r/StableDiffusion Score: 236

A tiny, fast latent upscaler offering arbitrary scale upscaling as an alternative to bilinear/bicubic for multiple model architectures. The ComfyUI implementation targets improved quality over traditional upscaling methods.

#image-generation #open-source
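
For context on the baseline the summary mentions, here is a minimal sketch of bilinear resizing of a single latent channel by a non-integer factor. It is not SesquiLSR's method, which is learned; the function name and the align-corners sampling are assumptions made for illustration.

```python
def bilinear_upscale(grid, scale):
    """Resize a 2D grid (one latent channel) by `scale` with bilinear interpolation.

    Uses align-corners sampling: output corners map exactly onto input corners.
    """
    h, w = len(grid), len(grid[0])
    new_h, new_w = round(h * scale), round(w * scale)
    out = []
    for i in range(new_h):
        # Position of output row i in input coordinates.
        y = i * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(new_w):
            x = j * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # Blend horizontally on the two neighbouring rows, then vertically.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


latent = [[0.0, 1.0], [2.0, 3.0]]
up = bilinear_upscale(latent, 1.5)  # 2x2 grid resized to 3x3
```

A learned upscaler replaces this fixed weighted average with a small network, which is where any quality gain over bilinear or bicubic would come from.
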
New Face Id lora seems to be great r/StableDiffusion Score: 283

New LTX Face ID LoRA from alissonerdx retains identity from close-up reference images, addressing one of the key challenges in consistent character generation across video models.

#image-generation
Character Loras with Krea2 (again) r/StableDiffusion Score: 117

After additional testing, the user reverses their earlier position and finds that Krea2 matches or exceeds Ideogram and Z Image for character accuracy with proper LoRA training. The post shares training settings that work well.

#image-generation

AI Signal - June 30, 2026

VNCCS 3.0 Has been released! r/StableDiffusion Score: 783

Complete rebuild of VNCCS, a ComfyUI extension, with so many changes it's effectively a new project. Represents continued innovation in the Stable Diffusion ecosystem, making complex workflows more accessible.

#image-generation #open-source
So is INT8-ConvRot the new hot thing? r/StableDiffusion Score: 129

ComfyUI's stable branch added native INT8 support, with claims that ConvRot quantization beats FP8 variants on speed/quality metrics while supporting wider GPU compatibility (2xxx-5xxx NVIDIA cards). This could democratize access to larger image generation models.

#image-generation
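
As a rough illustration of what INT8 weight quantization involves, the sketch below applies plain symmetric per-tensor quantization. This is a generic technique, not ComfyUI's implementation. The ConvRot name suggests an additional rotation applied before quantizing; the source does not describe it and it is not shown here. All function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8: map floats onto integers in [-127, 127]."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate floats from the integer codes."""
    return [x * scale for x in q]


weights = [0.02, -0.5, 0.31, 1.27, -0.004]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_err)
```

The largest weight sets the scale, so a single outlier coarsens the grid for every other value. That outlier sensitivity is the problem rotation-based schemes generally aim to reduce.
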
Bring the rotten tomatoes r/StableDiffusion Score: 541

Community reaction to Dario Amodei's anti-open-source stance, with calls to download and archive models while they remain available. Reflects concern that open-source image models may face restrictions.

#open-source #image-generation
Krea 2 vs Z-Image Turbo r/StableDiffusion Score: 167

Side-by-side comparison of Krea 2 and Z-Image Turbo image generation models at 2MP resolution, providing practical insight into model quality differences for practitioners evaluating which to use.

#image-generation

AI Signal - June 23, 2026

Krea 2 Turbo — Native ComfyUI Workflow + FP8 Weights (12GB, Drag & Drop) r/StableDiffusion Score: 373

Krea 2 now has native ComfyUI support built in, with FP8 quantized weights (24.76GB → 12.01GB). The quantization preserves critical layers while compressing weight matrices to float8_e4m3fn format, making high-quality image generation accessible on more modest hardware.

#image-generation #open-source
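
To make the float8_e4m3fn format concrete, here is a pure-Python sketch that rounds a value to the nearest number representable in that format (4 exponent bits, 3 mantissa bits, largest finite value 448). It only simulates the rounding. The post's actual conversion script and its choice of which layers to keep at higher precision are not described in the source, and the function name is hypothetical.

```python
import math

E4M3_MAX = 448.0      # largest finite float8_e4m3fn value
MIN_NORMAL_EXP = -6   # smallest normal exponent (exponent bias is 7)
MANTISSA_BITS = 3


def round_to_e4m3(x):
    """Round a float to the nearest float8_e4m3fn value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp returns mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) = e - 1.
    # Clamping the exponent makes small values fall on the subnormal grid.
    exp = max(math.frexp(mag)[1] - 1, MIN_NORMAL_EXP)
    step = 2.0 ** (exp - MANTISSA_BITS)  # spacing between neighbours at this exponent
    return sign * min(round(mag / step) * step, E4M3_MAX)


for value in (0.3, -1.0, 1000.0):
    print(value, "->", round_to_e4m3(value))
```

Each stored weight takes one byte instead of the two used by BF16, which is consistent with the roughly halved file size quoted above.
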
As promised Krea 2 Turbo + "Raw" Quantized in FP8, MXFP8, NVFP4, INT8 and Convrot INT8! r/StableDiffusion Score: 202

Community member released Krea 2 (Base & Turbo) quantized in multiple formats (FP8, MXFP8, NVFP4, INT8, ConvRot INT8) for different GPU tiers. Includes detailed comparison of Raw vs Turbo models and quantization tradeoffs. Demonstrates active open-source optimization ecosystem around new image models.

#image-generation #open-source
LTX-2.3 Water Sim LoRA flooding the Joker stairs (v2v test) r/StableDiffusion Score: 893

Demonstration of an LTX-2.3 water simulation IC-LoRA applied to the famous Joker stairs location. Wide shots work well; close-ups are more challenging. Shows progress in specialized LoRAs for physics simulation in video models, potentially useful for VFX and creative applications.

#image-generation #machine-learning

AI Signal - June 16, 2026

How far away are we from feature-length AI films? I made this trailer in one week for under $100 r/ChatGPT Score: 832

Creator produced a 4K film trailer in one week for under $100 using Seedance 2.0, Runway, ElevenLabs, Adobe Premiere, and ChatGPT. Demonstrates the accessibility of AI filmmaking tools for independent creators with minimal budgets.

#image-generation #tts
Quick SCAIL-2 test in ComfyUI r/StableDiffusion Score: 588

Demonstration of SCAIL-2 animation in ComfyUI using a Z-Image Turbo character LoRA and a TikTok dance clip as motion reference. The author created a helper node for longer clips to reduce identity drift. The workflow is available, showcasing local animation capabilities.

#image-generation #local-models
Nothing but Prompts. Ideogram 4 Has Scary Control r/StableDiffusion Score: 290

Recreation of iconic 1980s horror posters using only Ideogram 4 prompts and bounding boxes—no image reference, controlnets, or LoRAs. Demonstrates impressive compositional control available through prompting alone in newer image generation models.

#image-generation

AI Signal - June 09, 2026

Ideogram 4.0's Understanding of Characters and IP is Crazy for an Open Model r/StableDiffusion Score: 835

Ideogram 4.0 demonstrates exceptional character and IP knowledge without LoRAs, running locally in ComfyUI at 1.5 megapixels. Initial workflow issues and safety filters have been resolved, making it one of the most capable open image generation models. Generated at 1440x1024 using INT8 versions on consumer hardware.

#image-generation #open-source
I did not expect this quality from local so soon r/StableDiffusion Score: 704

Ideogram 4 running locally on RTX 3060 12GB with 64GB RAM producing high-quality results at ~80 seconds per 1MP image. Demonstrates that cutting-edge image generation is now viable on consumer hardware with careful optimization and cherry-picking.

#image-generation #local-models
Ideogram 4 isn't overhyped, it's underrated r/StableDiffusion Score: 299

Defense of Ideogram 4 as the closest open model to commercial quality (NB/GPT Image), surpassing recent releases like Ernie, MS Lens, and HiDream. Author emphasizes this is the first model since Z-Image to genuinely impress, suggesting it represents a quality tier shift for open image models.

#image-generation #open-source
How to bypass Ideogram 4's "Image blocked by safety filter" for swimwear/beachwear (Understanding the filter mechanics) r/StableDiffusion Score: 176

Technical analysis of Ideogram 4's safety filter mechanics, with methods to bypass it for legitimate use cases like swimwear/beachwear photography. Demonstrates how subtle prompt and parameter adjustments can work around overly aggressive filtering while staying within acceptable use.

#image-generation
Tried some 17MP ideogram 4 images for fun r/StableDiffusion Score: 100

Experimenting with 17-megapixel Ideogram 4 generations taking 10-15 minutes per image. Demonstrates the model's capability at very high resolutions, though composition is hard to predict until deep into generation. Uses Qwen3.6-35B for prompt engineering.

#image-generation #local-models
Ideogram 4: a solution for removing the annoying censorship has been found. r/StableDiffusion Score: 267

Two methods discovered to bypass Ideogram 4's safety filter: shifting first sigma step by +0.005 or +0.01, or using a custom preset with adjusted sigma values. Both methods work by slightly moving the starting point of the diffusion trajectory away from what triggers the filter.

#image-generation
Photanima v2.1 showcase. Each image takes about 2 seconds to generate. r/StableDiffusion Score: 297

Anima 2B model fine-tune (Photanima v2.1) generating quality images in ~2 seconds. Demonstrates exceptional speed and prompt adherence for a 2B model, showing the potential of small, specialized models for specific use cases.

#image-generation
Lodestone is thinking about training ideogram! Prove him it's a good idea! r/StableDiffusion Score: 191

Community discussion encouraging Lodestone (creator of Chroma) to create a fine-tune or variant of Ideogram 4. Reflects community desire for specialized variants of the new base model to address specific use cases and aesthetic preferences.

#image-generation #open-source

AI Signal - June 02, 2026

Nvidia releases Cosmos3-Super-Image2Video — 64B parameters r/StableDiffusion Score: 404

Nvidia released a 64B parameter image-to-video model (Cosmos3-Super-Image2Video) on Hugging Face. The near-perfect 0.98 upvote ratio and 132 comments indicate genuine excitement in the image generation community. At 64B parameters, the model demands significant resources for local inference, but it represents a meaningful step in open video generation capability.

#image-generation #open-source
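
To put the resource requirement in numbers, the arithmetic below estimates the weights-only footprint of a 64B parameter model at common precisions. It is a back-of-the-envelope sketch: activations, text encoders, and runtime overhead are ignored, and the function name is hypothetical.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Weights-only footprint in decimal GB; ignores activations and overhead."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


for name, bits in [("BF16", 16), ("FP8/INT8", 8), ("4-bit", 4)]:
    print(f"{name}: {weight_memory_gb(64, bits):.0f} GB")
```

Even at 4 bits per weight, the weights alone come to 32 GB, which is why a model of this size is hard to run on a single consumer GPU.
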
Does anyone else can't stand ComfyUI and prefers classic Automatic/Forge UI? r/StableDiffusion Score: 225

A user frustrated with ComfyUI's node-graph complexity asks for alternatives. The 265-comment thread surfaced SwarmUI (an Automatic-style front end over ComfyUI) and Forge Neo as actively maintained alternatives. Represents an ongoing developer experience split in the image generation community: power users favor ComfyUI's programmability, while others want a simpler, form-based interface.

#image-generation #development-tools

AI Signal - May 26, 2026

Nvidia solved VAE? Fast and High-Resolution Latent Decoding with Pixel Diffusion

NVIDIA's Pixel Diffusion (PiD) approach treats latent-to-image decoding as conditional pixel diffusion, combining decode and upscale into one step. This addresses long-standing quality issues with VAE decoding in diffusion models and could significantly improve image generation quality and speed.

#image-generation #open-source
I made an Anima AI Character & Artist search engine with 49,000 sample images

A community member built a searchable database of 49,000 sample images to explore character knowledge and artistic styles in the Anima Base model. The tool allows searching by characteristics beyond just names, making it practical to discover which characters and styles work out-of-the-box with the model.

#image-generation #development-tools
Reconstructing different angles from live footage

4D Gaussian Splatting converts flat images into three-dimensional spatial data, enabling reconstruction of different camera angles from single-viewpoint footage. This technology has implications for video editing, sports broadcasting, and virtual environments.

#image-generation
ComfyUI node for NVIDIA PiD pixel diffusion decoding

Community member created a ComfyUI node implementing NVIDIA's Pixel Diffusion decoder, making the research practical for image generation workflows. Supports multiple backbone models including Flux, SD3, and DINOv2 with auto-download of checkpoints.

#image-generation #development-tools

AI Signal - May 19, 2026

Lance by ByteDance: 3B Apache2 model for image and video understanding, generation, and editing r/StableDiffusion Score: 337

ByteDance releases Lance, a 3B parameter unified multimodal model supporting image/video understanding, generation, and editing. Apache 2.0 license, trained from scratch. Demonstrates strong performance across generation, editing, and video benchmarks despite small size.

#image-generation #open-source
bytedance released an open source model that attempts to do just about anything with only 3b parameters r/LocalLLaMA Score: 279

Duplicate coverage of ByteDance's Lance model emphasizing its unified architecture for image/video understanding, generation, and editing in 3B parameters. Community excited about Apache 2.0 licensing enabling commercial use and local deployment.

#image-generation #open-source #local-models

AI Signal - May 12, 2026

Animation is solved. This is like Pixar level quality.

Video showcasing AI-generated animation with claims of Pixar-level quality, generating significant discussion about the state of AI video generation. While the claim is hyperbolic, the video demonstrates continued progress in video quality and coherence, though the technology is still far from replacing production animation pipelines.

#image-generation
A new video model "Omni" from Google is leaked, user notes text coherence

Leaked Google "Omni" video model shows improved text coherence in generated videos, a long-standing weakness of video generation models. If validated, represents meaningful progress toward text-accurate video generation, important for practical applications requiring readable text.

#image-generation
Flux.2-Klein pipeline for real-time webcam stream processing in 30 FPS

Open-source pipeline achieving real-time video stream processing at 30 FPS with ~0.2s latency on RTX 5090, using Flux.2-Klein-4B with custom spatial-aware KV-cache that only recomputes changing regions. Demonstrates significant progress toward real-time image generation use cases.

#image-generation #open-source
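
The idea of recomputing only changing regions can be illustrated with a simple tile-difference check between consecutive frames. This sketch covers only the detection step, under the assumption that frames are compared tile by tile against a threshold. The pipeline's actual cache logic is not described in the source, and the function name, tile size, and threshold are hypothetical.

```python
def changed_tiles(prev, curr, tile=2, threshold=0.1):
    """Return (tile_row, tile_col) for tiles whose mean absolute difference exceeds threshold."""
    h, w = len(curr), len(curr[0])
    dirty = []
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            diffs = [
                abs(curr[y][x] - prev[y][x])
                for y in range(ty, min(ty + tile, h))
                for x in range(tx, min(tx + tile, w))
            ]
            if sum(diffs) / len(diffs) > threshold:
                dirty.append((ty // tile, tx // tile))
    return dirty


# Two 4x4 grayscale frames that differ in a single bottom-right pixel.
prev = [[0.0] * 4 for _ in range(4)]
curr = [row[:] for row in prev]
curr[3][3] = 1.0
print(changed_tiles(prev, curr))
```

In a scheme like this, cached keys and values for the unchanged tiles would be reused, and only the tokens covering the flagged tiles would be recomputed for the new frame.
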
HiDream-O1-Image - A pixel space model, no need for VAE, 8B parameters

Novel image generation architecture working directly in pixel space without VAE, using Pixel-level Unified Transformer (UiT). 8B parameter model that natively encodes raw pixels, eliminating VAE-related artifacts and simplifying the generation pipeline.

#image-generation #open-source