Tag: local-models

63 discussions across 10 posts tagged "local-models".

AI Signal - May 05, 2026

Qwen3.6-35B-A3B: 3B active parameters scoring 73.4% on SWE-bench Verified r/LocalLLM Score: 1716

Alibaba's Qwen3.6-35B-A3B uses a mixture-of-experts architecture (256 experts, with only 8+1 active per token) to come within 1.6 points of Claude Opus 4.6 on SWE-bench while activating just 3B parameters per token at inference. This represents a major cost/performance breakthrough for local AI: frontier-level coding performance on a laptop at 10-30x lower cost.
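
Why only 3B of 35B parameters run per token comes down to routing: a small gate scores every expert, and only the top few are executed. The toy sketch below shows top-k routing with weights renormalized over the chosen experts. It assumes the "+1" is an always-on shared expert, and uses scalar functions as stand-in experts; it is an illustration, not Qwen's implementation.

```python
import heapq
import math
import random
from typing import Callable, List, Tuple

NUM_EXPERTS = 256  # routed experts, per the summary
TOP_K = 8          # routed experts used per token
# Assumption: the "+1" is a shared expert that every token always uses.

def route(logits: List[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Select the k highest-scoring experts and softmax their gate logits."""
    top = heapq.nlargest(k, range(len(logits)), key=lambda i: logits[i])
    m = max(logits[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]

def moe_forward(x: float, logits: List[float],
                experts: List[Callable[[float], float]],
                shared: Callable[[float], float]) -> float:
    """Toy MoE layer: shared expert output plus a weighted sum of routed experts.
    Experts not selected by route() are never evaluated."""
    out = shared(x)
    for i, w in route(logits):
        out += w * experts[i](x)
    return out

if __name__ == "__main__":
    random.seed(0)
    experts = [(lambda x, s=s: s * x) for s in range(NUM_EXPERTS)]
    gate_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    chosen = route(gate_logits)
    print(f"{len(chosen)} routed + 1 shared of {NUM_EXPERTS} experts")
    print(moe_forward(1.0, gate_logits, experts, lambda x: x))
```

Only the selected experts' weights are touched for a given token, which is why compute scales with active rather than total parameters (memory still has to hold all of them).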

#llm #local-models #open-source
Qwen3.6:27b is the first local model that actually holds up against Claude Code r/LocalLLM Score: 336

After a year of experimenting, the author finds Qwen3.6:27b to be the first local model that genuinely competes with Claude Code for scaffolding, refactors, test generation, and multi-file debugging. Hard architectural work still goes to Claude, but routine development now runs locally with comparable quality. A year ago the comparison wasn't close; now running locally is viable.

#local-models #agentic-ai #code-generation
One bash permission slipped... r/LocalLLaMA Score: 1960

A cautionary tale: an LLM agent got a chain of bash commands wrong, created bad directories, then "fixed" its mistake with an `rm -rf` command that slipped past approval. It is a stark reminder of the risks of granting bash tool permissions in agentic systems, even in isolated environments. Fortunately, the user pushed code frequently and ran the agent in an isolated VM.
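
One way a destructive command can slip through is when approval looks only at the first program in a chained command line. A minimal sketch, assuming a hypothetical allowlist policy (`AUTO_APPROVE`, `needs_approval` are illustrative names, not any agent framework's API), of checking every chained segment instead:

```python
import re
import shlex
from typing import List

# Hypothetical allowlist: the only programs the agent may run without asking.
AUTO_APPROVE = {"ls", "cat", "pwd", "grep", "mkdir"}

# Shell operators that chain commands: &&, ||, ;, &, | and newlines.
_CHAIN = re.compile(r"&&|\|\||[;&|\n]")

def segments(command: str) -> List[str]:
    """Split a command line into chained sub-commands (naive: ignores quoting)."""
    return [s.strip() for s in _CHAIN.split(command) if s.strip()]

def needs_approval(command: str) -> bool:
    """True unless every chained sub-command starts with an allowlisted program."""
    # Command substitution and redirects can hide damage inside an
    # "approved" command, so always escalate them to a human.
    if "$(" in command or "`" in command or ">" in command:
        return True
    for seg in segments(command):
        try:
            argv = shlex.split(seg)
        except ValueError:
            return True  # unbalanced quotes: don't guess
        if argv and argv[0] not in AUTO_APPROVE:
            return True
    return False

print(needs_approval("mkdir out && ls out"))  # only allowlisted programs
print(needs_approval("mkdir out; rm -rf ."))  # rm is hidden after ';'
```

Defaulting to "ask" for anything unrecognized (including `sudo`, full paths like `/bin/rm`, or unparseable input) is the safer failure mode; a real harness would also need to handle subshells and scripts.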

#agentic-ai #local-models
Llama.cpp MTP support now in beta r/LocalLLaMA Score: 570

Major infrastructure update: llama.cpp now supports Multi-Token Prediction (MTP) in beta, starting with Qwen3.5 models. Combined with maturing tensor-parallel support, this should close most of the token-generation speed gap between llama.cpp and vLLM, a significant step for local inference infrastructure.
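
MTP lets a model draft several future tokens that the main model then verifies in a single pass, as in speculative decoding. As a rough, idealized guide to the upside, the estimate below (from the speculative decoding literature, Leviathan et al., 2023) gives the expected tokens per verification pass when k tokens are drafted and each is accepted independently with probability alpha. Real acceptance varies token by token and drafting has its own cost; this is not llama.cpp's code.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model pass: the accepted draft prefix
    plus one token produced by the target model itself.
    Assumes each of the k drafted tokens is accepted i.i.d. with prob. alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)
    return (1 - alpha ** (k + 1)) / (1 - alpha)

for k in (1, 2, 4):
    print(k, round(expected_tokens_per_step(0.8, k), 2))
```

With a single extra prediction head (k = 1) the ceiling is 2 tokens per pass, so gains depend heavily on how often drafts are accepted.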

#local-models #open-source
Qwen3.6-27B vs Coder-Next: 20 hours of side-by-side testing r/LocalLLaMA Score: 1061

A comprehensive comparison finds the two models remarkably well matched overall, with different strengths and weaknesses. After extensive testing on two RTX PRO 6000 Blackwell GPUs, the conclusion is "it depends": they score similarly across a wide range of tests but succeed and fail on different things. Valuable for understanding local model tradeoffs.

#local-models #code-generation #open-source
it's time to update your Gemma 4 GGUFs r/LocalLLaMA Score: 416

Important maintenance update: Gemma 4's chat template was fixed a few days ago, so users should re-download their GGUFs from bartowski and other quantizers. A reminder that even released models keep evolving through chat template fixes and quantization refinements.

#local-models #open-source
16x Spark Cluster (Build Update) r/LocalLLaMA Score: 1012

Impressive build log: 16 DGX Sparks on a shared fabric, all hitting line rate. Setup was time-consuming but smoother than expected thanks to Ubuntu being pre-installed, and the post includes detailed notes on configuring passwordless SSH, jumbo frames, and fabric networking. It represents a serious investment in local inference infrastructure.

#local-models #self-hosted
Open source models are going to be the future on Cursor, OpenCode etc. r/LocalLLaMA Score: 202

A user burned $10 on just two prompts with enterprise Cursor (GPT-5.5 and Claude Opus 4.6 thinking) and $80 in one week with Claude Opus 4.7. They argue that outrageous frontier pricing will force a migration to comparable open-source models costing 5-10x less, and expect this shift within months because providers can no longer afford to subsidize usage.

#open-source #local-models #code-generation

AI Signal - April 28, 2026

Anthropic admits to have made hosted models more stupid, proving the importance of open weight, local models r/LocalLLaMA Score: 1264

Following Anthropic's postmortem, the LocalLLaMA community emphasizes how this incident validates the importance of open-weight, local models. When providers can silently change reasoning effort levels and clear context without user consent, it undermines trust in hosted services and makes a strong case for local deployment where users have full control.

#local-models #open-source
I'm done with using local LLMs for coding r/LocalLLaMA Score: 618

A developer spent several weeks testing Qwen 27B and Gemma 4 31B on coding tasks, comparing them with Claude Code, which they use professionally. Although these are among the best local models under 100B parameters, the verdict was clear: poor decision-making, unreliable tool calling, and significant productivity losses relative to hosted frontier models like Claude made them unsuitable for professional coding work.

#local-models #code-generation
Luce DFlash: Qwen3.6-27B at up to 2x throughput on a single RTX 3090 r/LocalLLaMA Score: 628

A GGUF port of DFlash speculative decoding delivers up to 2x throughput for Qwen3.6-27B on a single 24GB RTX 3090. The standalone C++/CUDA stack achieves a ~1.98x mean speedup over autoregressive generation across the HumanEval, GSM8K, and Math500 benchmarks, with no retraining required. This is a significant practical advance in local inference efficiency.
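
The core draft-then-verify loop behind speculative decoding can be sketched with toy greedy "models". DFlash's own drafter works differently; this only illustrates the generic technique with greedy acceptance, and `speculative_generate` is a hypothetical name. Its key property is that the output is identical to running the target model alone; the win is fewer target passes.

```python
from typing import Callable, List, Tuple

# Toy "model": maps the token sequence so far to its greedy next token.
NextToken = Callable[[List[int]], int]

def speculative_generate(target: NextToken, draft: NextToken, prompt: List[int],
                         n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, number of target passes)."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(seq) < goal:
        passes += 1
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. Verify. A real engine scores all k positions in ONE batched target
        #    pass; this toy calls the target per position but counts one pass.
        for tok in proposal:
            expected = target(seq)
            seq.append(expected)  # accepted token, or the target's correction
            if expected != tok or len(seq) >= goal:
                break
        else:
            # Every draft accepted: the same pass also yields one bonus token.
            if len(seq) < goal:
                seq.append(target(seq))
    return seq, passes

target = lambda s: (s[-1] * 3 + 1) % 17  # toy deterministic "model"
print(speculative_generate(target, target, [1], 10))
```

With a perfect drafter each pass emits k + 1 tokens; with a useless one it degrades to one token per pass, which is why acceptance rate drives the speedup.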

#local-models #open-source
just wanted to share r/LocalLLM Score: 1384

A self-funded IT infrastructure professional spent two months building a local LLM cluster from four Mac Mini systems. The main post is light on technical details, but the project shows how accessible serious local AI infrastructure has become for individual developers willing to invest in hardware, part of a broader trend toward democratized AI compute.

#local-models #self-hosted
This is where we are right now, LocalLLaMA r/LocalLLaMA Score: 3159

A community snapshot post capturing the current state of local LLM development and deployment. With 3,000+ upvotes and high engagement, it is clearly a notable community moment, though its specific technical content isn't captured here; judging its impact requires reading the full discussion.

#local-models
Qwen 3.6 27B BF16 vs Q4_K_M vs Q8_0 GGUF evaluation r/LocalLLaMA Score: 350

Comprehensive quantization analysis comparing Qwen 3.6 27B across BF16, Q4_K_M, and Q8_0 GGUF formats using HumanEval, HellaSwag, and BFCL benchmarks. BF16 achieved 69.78% average accuracy at 15.5 tok/s using 54GB RAM, while Q4_K_M delivered competitive performance with significantly reduced memory requirements, providing practical guidance for deployment decisions.
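
The 54GB figure for BF16 follows directly from 27B parameters at 2 bytes each. A back-of-the-envelope sketch for the weights-only footprint of each format; the Q4_K_M bits-per-weight value is a rough assumption because that format mixes quant types per tensor, and KV cache and runtime buffers come on top.

```python
# Approximate storage cost in bits per weight (bpw).
#   BF16:   2 bytes per weight.
#   Q8_0:   blocks of 32 int8 weights + one fp16 scale = 34 bytes / 32 = 8.5 bpw.
#   Q4_K_M: mixes Q4_K and Q6_K tensors; ~4.85 bpw is a rough, model-dependent assumption.
BITS_PER_WEIGHT = {"BF16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85}

def weights_gb(n_params: float, fmt: str) -> float:
    """Weights-only footprint in GB (1e9 bytes); excludes KV cache and buffers."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9

for fmt in BITS_PER_WEIGHT:
    print(f"{fmt:7s} ~{weights_gb(27e9, fmt):.1f} GB")
```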

#local-models #benchmarks
To 16GB VRAM users, plug in your old GPU r/LocalLLaMA Score: 398

A practical tip for running ~30B-parameter models on consumer hardware: pairing a modern 16GB card (such as an RTX 5070 Ti) with an older 6GB card (such as an RTX 2060) lets you run larger models by splitting layers across both GPUs. The key insight is that fitting the whole model in VRAM matters more than having matching GPUs, even if one card is significantly weaker.
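
A simple way to pick a layer split is to make it proportional to each card's usable VRAM. A minimal sketch; the per-GPU reserve for KV cache and buffers is an assumed value, and real layers are not all the same size (embedding and output tensors are larger), so treat the result as a starting point.

```python
from typing import List

def split_layers(n_layers: int, vram_gb: List[float], reserve_gb: float = 1.5) -> List[int]:
    """Assign layers to GPUs in proportion to usable VRAM, using
    largest-remainder rounding so the counts sum exactly to n_layers."""
    usable = [max(v - reserve_gb, 0.0) for v in vram_gb]  # reserve_gb is an assumption
    total = sum(usable)
    if total == 0:
        raise ValueError("no usable VRAM")
    shares = [n_layers * u / total for u in usable]
    counts = [int(s) for s in shares]
    leftover = n_layers - sum(counts)
    # Hand remaining layers to the GPUs with the largest fractional shares.
    order = sorted(range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts

print(split_layers(64, [16, 6]))  # 16GB card + 6GB card
```

Proportions like these can be passed to llama.cpp's `--tensor-split` option (for example `--tensor-split 49,15`).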

#local-models #hardware
A warning to newbies - A lesson on network security r/LocalLLM Score: 205

A security researcher found 373 publicly exposed LM Studio instances accessible on the open internet (IPv4 only), with 37% having default API keys or no authentication. This serves as a critical reminder that local deployment requires proper network security—obscurity is not security, and default configurations can expose private LLM instances to scraping and unauthorized access.
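
Since the problem is servers listening beyond localhost without real authentication, a quick self-audit can start from the bind address. The sketch below uses only Python's standard `ipaddress` module; the function names and the placeholder-key list are hypothetical, and it says nothing about LM Studio's own settings.

```python
import ipaddress
from typing import Optional

# Hypothetical examples of keys that offer no real protection.
PLACEHOLDER_KEYS = {"changeme", "default", "test"}

def exposure(bind_host: str) -> str:
    """Classify where a server bound to bind_host can be reached from.
    'private' and 'all-interfaces' can still reach the internet via port
    forwarding or a permissive firewall."""
    if bind_host in ("0.0.0.0", "::"):
        return "all-interfaces"
    if bind_host == "localhost":
        return "loopback"
    ip = ipaddress.ip_address(bind_host)
    if ip.is_loopback:
        return "loopback"
    if ip.is_private:
        return "private"
    return "public"

def should_warn(bind_host: str, api_key: Optional[str]) -> bool:
    """Warn when a non-loopback server has no key or a placeholder key."""
    weak = api_key is None or not api_key.strip() or api_key in PLACEHOLDER_KEYS
    return exposure(bind_host) != "loopback" and weak

print(should_warn("0.0.0.0", None))  # listening everywhere with no key
```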

#local-models #security
I tested Opus 4.7 vs DeepSeek V4 Flash vs Local Qwen3.6 27B as coding agents r/LocalLLM Score: 101

A practical coding-agent comparison of Opus 4.7, DeepSeek V4 Flash, and a local Qwen3.6 27B (Q6_K_XL), run through Pi with a plan-mode extension. The developer built a NES Contra-like platformer in Phaser 3 and found that while Opus was best, the gaps were smaller than expected: the harness and prompting strategy matter as much as raw model intelligence.

#code-generation #local-models
Synthesize own voice before cancer mutes me r/LocalLLM Score: 181

A community member facing cancer treatment that may result in losing their ability to speak asks for help synthesizing their voice using local models. The community responded with recommendations for voice synthesis tools, particularly highlighting Qwen TTS models as small (0.9B parameters) and effective for personal voice cloning.

#tts #local-models

AI Signal - April 21, 2026

Qwen3.6-35B-A3B released! r/LocalLLaMA Score: 2233

Qwen released a sparse MoE model with 35B total parameters but only 3B active, under Apache 2.0 license. It delivers agentic coding performance on par with models 10x its active size, strong multimodal perception and reasoning, and supports both thinking and non-thinking modes. This represents a major efficiency breakthrough in open-source models.

#llm #open-source #local-models
Kimi K2.6 is a legit Opus 4.7 replacement r/LocalLLaMA Score: 890

After testing it against customer feedback, the author calls Kimi K2.6 the first model that can confidently replace Opus 4.7 for most tasks. It doesn't beat Opus 4.7 in any specific area, but it handles about 85% of tasks at reasonable quality and adds vision and strong browser-use capabilities. Users are successfully moving personal workflows to Kimi K2.6, especially long-horizon tasks.

#llm #local-models #open-source
235m local model trained at home r/LocalLLM Score: 196

A developer built a 235M-parameter transformer language model from scratch in PyTorch, training every parameter from raw text on a single consumer GPU. It uses a LLaMA-style architecture (GQA, SwiGLU, RoPE, RMSNorm, tied embeddings) and trains in bf16 with gradient checkpointing, showing that meaningful model training is within reach of individual developers.
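
Each listed component has a visible effect on parameter count: GQA shrinks the K/V projections, SwiGLU uses three MLP matrices, RoPE adds no parameters, and tied embeddings count the vocabulary matrix once. A sketch of that arithmetic, assuming bias-free LLaMA-style blocks; the post's actual configuration isn't given here, so any config you feed in is your own.

```python
def llama_params(vocab: int, d_model: int, n_layers: int, n_heads: int,
                 n_kv_heads: int, d_ff: int, tied: bool = True) -> int:
    """Parameter count for a bias-free LLaMA-style decoder (an assumption)."""
    head_dim = d_model // n_heads
    # Wq and Wo are d x d; with GQA, Wk and Wv only produce n_kv_heads heads.
    attn = 2 * d_model * d_model + 2 * d_model * n_kv_heads * head_dim
    mlp = 3 * d_model * d_ff   # SwiGLU: gate, up, and down projections
    norms = 2 * d_model        # two RMSNorm weight vectors per block
    embed = vocab * d_model    # RoPE is parameter-free, so nothing to add
    lm_head = 0 if tied else vocab * d_model  # tied: output reuses embedding
    final_norm = d_model
    return n_layers * (attn + mlp + norms) + embed + lm_head + final_norm
```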

#local-models #machine-learning #open-source
Gemma-4-E2B's safety filters make it unusable for emergencies r/LocalLLaMA Score: 397

Testing Google's Gemma-4-E2B-it as a local offline resource for emergency preparedness revealed aggressive safety filters that refuse first aid procedures, technical repairs, and emergency scenarios. The model issues "hard refusals" on almost everything that could be useful in actual emergency situations, making it functionally useless for offline emergency information.

#local-models #open-source
Gemma 4 26B-A4B GGUF Benchmarks r/LocalLLaMA Score: 223

KL divergence (KLD) benchmarks for Gemma 4 26B-A4B GGUFs from several providers put Unsloth's GGUFs on the Pareto frontier in 21 of 22 sizes. KLD measures how closely a quantized model's output distribution matches that of the original BF16 model. Unsloth also made its Q6_K quants more dynamic, significantly improving their performance.
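
Being "on the Pareto frontier" means no other quant is both no larger and lower in KLD. A minimal sketch of that check over (name, size, KLD) tuples; none of the data here comes from the post.

```python
from typing import List, Tuple

Quant = Tuple[str, float, float]  # (name, size_gb, mean_kld); lower is better for both

def pareto_frontier(quants: List[Quant]) -> List[Quant]:
    """Return the non-dominated quants, ordered by size. Walking from smallest
    to largest, a quant joins the frontier only if it beats the best KLD seen
    so far; otherwise a smaller (or equal-size) quant already dominates it."""
    frontier: List[Quant] = []
    best_kld = float("inf")
    for q in sorted(quants, key=lambda q: (q[1], q[2])):
        if q[2] < best_kld:
            frontier.append(q)
            best_kld = q[2]
    return frontier
```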

#local-models #open-source

AI Signal - April 14, 2026

Best Local LLMs — Apr 2026 r/LocalLLaMA Score: 368

The monthly megathread has arrived, and this edition is particularly dense. New entries include the Qwen3.5 and Gemma4 series, GLM-5.1 claiming SOTA-level performance, Minimax-M2.7 as an accessible "Sonnet at home," and PrismML Bonsai 1-bit models that apparently actually work. It is one of the clearest snapshots of the local model landscape available, reflecting real community usage rather than benchmark scores alone.

#local-models #open-source
OpenClaw Has 250K GitHub Stars. The Only Reliable Use Case I've Found Is Daily News Digests. r/LocalLLaMA Score: 777

The author runs cloud infrastructure with roughly 1,000 OpenClaw deployments and interviewed a broad network of engineers and founders who went all-in on the framework. The conclusion is sharp: despite the star count, real-world production use cases remain elusive. This is the kind of honest post-mortem the ecosystem needs — not a hit piece, but a sober field report that separates GitHub hype from operational reality.

#local-models #agentic-ai
Updated Qwen3.5-9B Quantization Comparison r/LocalLLaMA Score: 184

A KLD (KL Divergence) evaluation across community GGUF quantizations of Qwen3.5-9B, measuring drift from the BF16 baseline. Rather than relying on benchmark scores, this approach tests how closely each quantized model preserves the original's probability distributions — a more principled method for choosing quantization levels. With a 0.99 upvote ratio, this stands out as a genuinely useful reference artifact for local model users.
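
Concretely, the metric compares the baseline's and the quant's next-token probabilities at each position and averages KL(P || Q) over positions. A minimal pure-Python sketch with toy logits; in practice such numbers typically come from llama.cpp's perplexity tool via its `--kl-divergence-base` and `--kl-divergence` options rather than hand-rolled code.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(P || Q) in nats. P: baseline (BF16) probs, Q: quantized model's probs.
    Terms with p == 0 contribute nothing; eps guards against q == 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def mean_kld(base_logits: Sequence[Sequence[float]],
             quant_logits: Sequence[Sequence[float]]) -> float:
    """Average per-position KLD over a sequence of next-token predictions."""
    vals = [kl_divergence(softmax(b), softmax(q)) for b, q in zip(base_logits, quant_logits)]
    return sum(vals) / len(vals)

print(mean_kld([[2.0, 1.0, 0.1]], [[1.9, 1.1, 0.1]]))  # small drift -> small KLD
```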

#local-models #open-source
24/7 Headless AI Server on Xiaomi 12 Pro (Snapdragon 8 Gen 1 + Ollama/Gemma4) r/LocalLLaMA Score: 524

A detailed technical write-up on converting a Xiaomi 12 Pro smartphone into a dedicated local AI inference node: LineageOS flashed for minimal overhead, Android framework frozen, headless networking via custom-compiled wpa_supplicant, and custom thermal management daemons. Running Gemma4 via Ollama on ~9GB of freed RAM. This is a creative and replicable approach to always-on local AI that doesn't require dedicated server hardware.

#local-models #self-hosted
Local Models Are a Godsend When It Comes to Discussing Personal Matters r/LocalLLaMA Score: 332

The author loaded 100K+ tokens of their personal journal into Gemma4's 256K context window for reflection and insight. The post is a practical testimonial about privacy-first AI use: full journal analysis without sending sensitive data to a cloud provider. It opens a useful discussion thread about appropriate use cases for extended-context local models and what a 256K context actually unlocks in practice.

#local-models
Just Got My Hands on One of These… Building Something Local-First r/LocalLLM Score: 371

A hardware upgrade post (2015-era machine to a new high-end GPU) paired with plans for a local-first AI project. Low informational density but notable as a community signal: mainstream engineers who previously wouldn't consider local AI are now investing serious hardware budgets in it. The comment thread likely contains useful configuration advice.

#local-models #self-hosted
Follow Up Post: Decided to Build the 2x RTX PRO 6000 Tower r/LocalLLaMA Score: 226

A detailed parts list and build log for a dual RTX PRO 6000 workstation: Threadripper PRO 7965WX, WRX90 motherboard, 256GB ECC DDR5, dual 10GbE, IPMI. This represents the high end of consumer/prosumer local AI infrastructure. Useful as a reference for anyone designing a serious multi-GPU inference node, and as a data point on what serious local AI investment looks like in 2026.

#local-models #self-hosted
What's the Closest Experience to Claude Sonnet Locally? r/LocalLLM Score: 200

A newcomer with an RTX PRO 4000 Ada (20GB VRAM) asks for the best local analog to Claude Sonnet, noting they keep defaulting back to Claude because local alternatives aren't matching quality. The comment thread (146 replies) is likely a useful crowdsourced comparison of current candidates. A good barometer of what "Claude quality locally" means to the community in April 2026.

#local-models
If It Works — Don't Touch It: COMPETITION r/LocalLLaMA Score: 131

A community thread inviting members to share their most unconventional home inference setups — featuring oven grills, egg cartons, and improvised cooling solutions. Low-information but high-character. A reminder that local AI is a hands-on, tinkerer culture, and sometimes the best insight comes from how people are actually running things.

#local-models #self-hosted

AI Signal - April 07, 2026

Gemma 4 has been released r/LocalLLaMA Score: 2265

Google released Gemma 4, marking a significant moment for local AI with fully open weights and the ability to run completely locally via Ollama. Multiple variants are available (26B-A4B, 31B, E4B, E2B) offering frontier-level performance without cloud dependencies or API subscriptions.

#llm #open-source #local-models
Gemma 4 just casually destroyed every model on our leaderboard except Opus 4.6 and GPT-5.2 r/LocalLLaMA Score: 1671

Gemma 4 (31B) achieved remarkable results on production benchmarks: 100% survival rate, 5/5 profitable runs, and +1,144% median ROI at just $0.20/run. It significantly outperforms Gemini 3 Pro, Sonnet 4.6, and all Chinese open-source models tested; only Opus 4.6 and GPT-5.2 scored higher, and Opus 4.6 costs 180× as much per run.

#llm #open-source #local-models
Turns out Gemma 4 had MTP (multi token prediction) all along r/LocalLLaMA Score: 373

Google confirmed that Gemma 4 includes Multi-Token Prediction (MTP) heads for speculative decoding, but the feature was disabled in the initial release. The MTP weights exist in LiteRT files but weren't documented or enabled, suggesting much faster inference is possible once properly activated.

#llm #local-models
Gemma 4 26b A3B is mindblowingly good, if configured right r/LocalLLaMA Score: 509

After testing multiple models on an RTX 3090, the author found that Gemma 4 26B-A4B delivered excellent tool-calling performance once properly configured, running at 80-110 tokens/second even at high context. Initial infinite-loop issues were resolved through configuration adjustments.

#llm #local-models #agentic-ai
[PokeClaw] First working app that uses Gemma 4 to autonomously control an Android phone r/LocalLLaMA Score: 317

Built in two all-nighters following Gemma 4's launch, PokeClaw demonstrates fully on-device autonomous phone control with no cloud dependencies. The entire AI-driven control loop runs locally on the Android device without WiFi or API keys.

#agentic-ai #local-models
I technically got an LLM running locally on a 1998 iMac G3 with 32 MB of RAM r/LocalLLaMA Score: 1483

Successfully ran a 260K parameter TinyStories model on a 1998 iMac G3 (233 MHz PowerPC, 32 MB RAM) using Retro68 cross-compilation and careful endian conversion. Required manual memory management and partition adjustments but demonstrates LLM viability on extremely constrained hardware.
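
The endian issue arises because the G3's PowerPC CPU is big-endian, while weights exported on x86 are little-endian. A minimal sketch, assuming a checkpoint of little-endian 32-bit values (the post's exact file format isn't described here): reverse the bytes of each 4-byte word, which works for float32 and int32 alike without touching the values.

```python
import struct

def le_to_be_32(buf: bytes) -> bytes:
    """Convert a buffer of little-endian 32-bit values to big-endian by
    reversing the byte order of each 4-byte word."""
    if len(buf) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    return b"".join(buf[i:i + 4][::-1] for i in range(0, len(buf), 4))

# Usage: weights written on a little-endian machine...
le = struct.pack("<3f", 1.0, -2.5, 0.125)
# ...read back natively by a big-endian CPU after conversion.
print(struct.unpack(">3f", le_to_be_32(le)))
```

Doing the swap once offline, before copying the file to the old machine, avoids paying the cost on a 233 MHz CPU.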

#llm #local-models

AI Signal - March 31, 2026

Semantic video search using local Qwen3-VL embedding, no API, no transcription r/LocalLLaMA Score: 353

A developer built semantic video search by embedding raw video directly into vector space with Qwen3-VL. No transcription or frame captioning is needed, just natural-language queries against video clips. The 8B model runs fully locally in 18GB of RAM with usable results.
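
Once clips and queries live in the same embedding space, search reduces to ranking clips by similarity to the query vector. A minimal sketch using cosine similarity; producing the embeddings (here, with Qwen3-VL) is assumed to happen elsewhere, and the vectors and clip names below are toy placeholders.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec: Sequence[float], clip_index: Dict[str, Sequence[float]],
           k: int = 3) -> List[Tuple[str, float]]:
    """Return the k clips whose embeddings are most similar to the query."""
    scored = [(clip, cosine(query_vec, vec)) for clip, vec in clip_index.items()]
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]

# Toy 3-d vectors standing in for real clip and query embeddings.
index = {"dog_park.mp4": [0.9, 0.1, 0.0], "kitchen.mp4": [0.0, 0.2, 0.9]}
print(search([1.0, 0.0, 0.1], index, k=1))
```

A brute-force scan is fine for a personal library; large collections would typically use an approximate nearest-neighbor index instead.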

#local-models #open-source
llama.cpp at 100k stars r/LocalLLaMA Score: 958

llama.cpp reaches 100,000 GitHub stars, marking it as one of the most popular AI infrastructure projects. The library enables efficient LLM inference on consumer hardware and has become foundational for the local AI ecosystem.

#local-models #open-source
Running Qwen3.5-27B locally as the primary model in OpenCode r/LocalLLaMA Score: 210

A developer ran Qwen3.5-27B as the primary model for OpenCode (an agentic coding assistant) on an RTX 4090 via llama.cpp. Tests show this local hybrid-architecture model can handle complex coding tasks at practical speeds, making it a viable alternative to cloud APIs for code generation.

#local-models #code-generation

AI Signal - March 24, 2026

LM Studio may possibly be infected with sophisticated malware r/LocalLLaMA Score: 561

A security concern in the local model community: LM Studio may be compromised with sophisticated malware. A user reports that Windows Defender scans turned up suspicious files that appear to tamper with Windows update mechanisms. A critical reminder that even trusted, widely used tools require security vigilance, especially when running models with arbitrary code execution capabilities.

#local-models #security
Created a SillyTavern extension that brings NPC's to life in any game r/LocalLLaMA Score: 216

A SillyTavern extension that bridges RPG games with local LLMs. It downloads the entire game wiki into SillyTavern so every character has full lore, relationships, and context, uses Cydonia as the roleplay model and Qwen 3.5 0.8B as the game master, and automatically generates a voice for each character. It works with any game via a small mod bridge.

#local-models #agentic-ai

AI Signal - March 17, 2026

Qwen3.5-9B-Claude-4.6-Opus-Uncensored-Distilled-GGUF r/LocalLLaMA Score: 1341

Qwen 3.5 9B distilled from Claude Opus 4.6 outputs, aiming to bring Opus-quality responses to local deployment. The GGUF format and 9B parameter size make it practical for consumer hardware, and the 27B version includes thinking mode by default. It shows the progress distillation is making in democratizing access to capable models.

#local-models #llm #open-source
If you have your OpenClaw working 24/7 using frontier models like Opus, you're easily burning $300 a day. r/AIagents Score: 1101

A stark cost comparison between cloud-based AI agents and local deployments. Running OpenClaw 24/7 with Opus costs ~$300/day (~$110k/year), while the author's local setup of 3 Mac Studios and a DGX Spark cost about one-third of that annual figure upfront and should remain usable for years with complete privacy. It makes a compelling economic and privacy case for local AI infrastructure.
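
The arithmetic behind the claim is easy to check: $300/day is $109,500/year, a third of that is $36,500, and the hardware pays for itself once the avoided API spend exceeds the upfront cost. A sketch; the local running cost (power, maintenance) is a hypothetical parameter the post doesn't quantify, defaulting to zero.

```python
import math

def break_even_days(upfront_usd: float, cloud_usd_per_day: float,
                    local_usd_per_day: float = 0.0) -> float:
    """Days until upfront hardware cost is recovered by avoided cloud spend.
    local_usd_per_day (electricity, upkeep) is an assumption; default 0."""
    daily_saving = cloud_usd_per_day - local_usd_per_day
    return math.inf if daily_saving <= 0 else upfront_usd / daily_saving

annual_cloud = 300 * 365          # the post's "~$110k/year"
local_upfront = annual_cloud / 3  # "one-third of that yearly cost"
print(annual_cloud, local_upfront, round(break_even_days(local_upfront, 300), 1))
```

Adding a nonzero daily running cost lengthens the payback period, but the conclusion holds as long as local costs stay well below the cloud bill.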

#local-models #agentic-ai #self-hosted
OpenCode concerns (not truly local) r/LocalLLaMA Score: 396

Important security finding: OpenCode's web UI proxies all requests to app.opencode.ai by default, despite being marketed as a local solution. This defeats the privacy and security benefits users expect from "local" tools. The post includes code references and raises questions about transparency in open-source tooling.

#local-models #development-tools #open-source
M5 Max just arrived - benchmarks incoming r/LocalLLaMA Score: 2132

First benchmarks of Apple's M5 Max (128GB) for local LLM inference, with real-world performance numbers the community had been eagerly awaiting. The post reports tokens-per-second figures across different model sizes, helping developers understand what's achievable on consumer hardware.

#local-models #llm
Qwen3.5-9B on document benchmarks: where it beats frontier models and where it doesn't. r/LocalLLaMA Score: 222

Detailed benchmarking of Qwen3.5 models (0.8B to 9B) on document AI tasks. Qwen3.5-9B outperforms GPT-5.4, Claude Sonnet 4.6, and Gemini 3.1 Pro on OCR tasks but lags on structured extraction. The granular breakdown helps developers choose the right model for specific document processing needs.

#local-models #llm #open-source
Mistral Small 4:119B-2603 r/LocalLLaMA Score: 580

Release announcement for Mistral Small 4, a 119B parameter model. The model represents Mistral's continued development of capable open-weight models in the mid-size range, balancing capability and resource requirements for local deployment.

#local-models #llm #open-source

AI Signal - March 10, 2026

Qwen3.5 family comparison on shared benchmarks r/LocalLLaMA Score: 1082

Comprehensive benchmark comparison shows Qwen3.5's 122B, 35B, and especially 27B models retain significant performance from the flagship, while 2B/0.8B fall off harder on long-context and agent categories. The 27B model emerges as a sweet spot for local deployment, offering near-flagship performance at much lower computational requirements.

#llm #local-models #open-source
How I topped the Open LLM Leaderboard using 2x 4090 GPUs — no weights modified r/LocalLLaMA Score: 328

Researcher discovered that duplicating 7 specific middle layers in Qwen2-72B without modifying weights improved performance across all benchmarks and reached #1 on the leaderboard. As of 2026, the top 4 models are descendants of this technique. The finding suggests pretraining carves out discrete functional circuits, and only circuit-sized blocks (~7 layers) work: single layers or wrong counts do nothing.
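
"Duplicating layers without modifying weights" amounts to changing the order in which existing layers run, so a block of them executes twice in a row. A minimal sketch that produces that execution order; the start index is hypothetical, since which 7 layers were chosen isn't stated here.

```python
from typing import List

def duplicated_layer_order(n_layers: int, start: int, block: int = 7) -> List[int]:
    """Execution order of original layer indices when layers
    [start, start + block) run twice back to back. Repeated entries reuse the
    same weights, so no parameters change; only depth grows by `block`."""
    if not 0 <= start <= n_layers - block:
        raise ValueError("block must fit inside the model")
    end = start + block
    return list(range(end)) + list(range(start, n_layers))

order = duplicated_layer_order(80, start=40)  # 80-layer model, hypothetical start
print(len(order), order[38:50])
```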

#llm #machine-learning #local-models
Qwen 3.5 0.8B - small enough to run on a watch. Cool enough to play DOOM r/LocalLLaMA Score: 472

Developer built a VLM agent using Qwen 3.5 0.8B that plays DOOM by taking screenshots, drawing numbered grids, and using shoot/move tools. The model—small enough to run on a smartwatch and trained only for text—handles the game surprisingly well, getting kills on basic scenarios. This demonstrates effective tool use and spatial reasoning in extremely small models.
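
The numbered-grid trick lets a small model answer "shoot at cell 7" instead of producing pixel coordinates, and the harness translates the cell back into an aim point. A sketch assuming row-major numbering starting at 1 in the top-left; the grid size and numbering scheme are assumptions, not details from the post.

```python
from typing import Tuple

def cell_center(cell: int, cols: int, rows: int, width: int, height: int) -> Tuple[int, int]:
    """Pixel center of a 1-based, row-major grid cell drawn over a
    width x height screenshot (cell 1 is top-left)."""
    if not 1 <= cell <= cols * rows:
        raise ValueError("cell out of range")
    r, c = divmod(cell - 1, cols)
    x = int((c + 0.5) * width / cols)
    y = int((r + 0.5) * height / rows)
    return x, y

print(cell_center(6, cols=4, rows=3, width=640, height=480))
```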

#llm #local-models #agentic-ai
Fine-tuned Qwen3 SLMs (0.6-8B) beat frontier LLMs on narrow tasks r/LocalLLaMA Score: 409

Systematic comparison shows small distilled Qwen3 models (0.6B to 8B) trained with as few as 50 examples can beat frontier APIs (GPT-5, Gemini 2.5, Claude Opus 4.6, Grok 4) on narrow tasks including classification, function calling, and QA. All models were trained using only open-weight teachers, running inference on a single H100 via vLLM.

#llm #local-models #machine-learning
Open WebUI's New Open Terminal + "Native" Tool Calling + Qwen3.5 35b = Holy Sh!t!!! r/LocalLLaMA Score: 891

Open WebUI released a new terminal integration with native tool calling support. Combined with Qwen3.5 35B, it enables local agentic workflows comparable to frontier API services. The Open Terminal function allows models to execute shell commands with user approval, while the workflow hub facilitates sharing of agent configurations.

#agentic-ai #local-models #open-source
Heretic has FINALLY defeated GPT-OSS with a new experimental decensoring method called ARA r/LocalLLaMA Score: 685

The Heretic project introduced Arbitrary-Rank Ablation (ARA), a new decensoring method that dramatically reduces refusals. Previous best results showed 74 refusals even after Heretic processing; ARA reduces this significantly. This represents a major advancement in removing alignment restrictions from open-weight models.

#llm #local-models #open-source
Qwen 3.5 27B is the REAL DEAL - Beat GPT-5 on my first test r/LocalLLaMA Score: 425

A user reports that Qwen 3.5 27B completed a complex coding task that GPT-5 failed across multiple attempts, while running at competitive speeds on consumer hardware. It is a single test, but it suggests open-weight models can now match or exceed closed frontier models on some practical developer tasks.

#llm #local-models #code-generation
Ryzen AI Max 395+ 128GB - Qwen 3.5 35B/122B Benchmarks (100k-250K Context) + Others (MoE) r/LocalLLaMA Score: 113

Benchmarks on a Framework Desktop with the Ryzen AI Max 395+ show Qwen 3.5 35B and 122B running with very long contexts (100K-250K tokens) in 128GB of unified memory. Each benchmark took over an hour because of the long contexts. The Strix Halo platform demonstrates that consumer-grade hardware can now handle frontier-model-scale context windows locally.

#local-models #llm

AI Signal - March 03, 2026

Qwen3.5-27B Q4 Quantization Comparison r/LocalLLaMA Score: 242

A data-driven sweep of all major GGUF Q4 quants of Qwen3.5-27B, using KL Divergence to measure how faithfully each quantized variant reproduces the BF16 baseline. This is exactly the kind of methodologically rigorous community work that moves local model selection beyond gut feel — if you're picking a GGUF for Qwen3.5, this is the reference. The near-perfect 0.99 upvote ratio and 94-comment discussion signal broad recognition of its value.

#local-models #llm
Qwen3.5-35B-A3B-4bit r/OpenSourceAI Score: 269

With 60 tokens/second on an Apple M1 Ultra at 4-bit, Qwen3.5's MoE variant is generating genuine excitement from the open-source community — this is not hype-driven buzz but real performance validation from hands-on users. The combination of a 35B parameter count at ~3B active parameters per token makes this a landmark moment for local AI capability. Relative to the subreddit's median score of 12, this post's 269 score is a strong signal.

#llm #open-source #local-models
Open Source LLM Tier List r/OpenSourceAI Score: 163

A community-curated leaderboard of self-hostable LLMs with relative tier rankings. At a score of 163 against a subreddit median of 12, this received exceptional engagement — it's hitting a real need for a quick reference beyond raw benchmarks. The link points to a live leaderboard at onyx.app.

#llm #open-source #local-models
Qwen3.5:27b - A model with severe anxiety. r/LocalLLM Score: 12

A user discovers that Qwen3.5's extended thinking (its inner monologue) is extremely verbose on practical tasks: even a straightforward sysadmin resource analysis generates pages of internal deliberation. With 28 comments, this is clearly a shared pain point, and it raises the question of how to constrain thinking models through prompts or system prompts for output-focused use cases.

#local-models #llm
Ollama 0.17.5 released and fixed the Qwen3.5 gguf issues! r/OpenSourceAI Score: 7

A quick note that Ollama 0.17.5 resolved compatibility issues with Qwen3.5 GGUF files, unblocking local users who were stuck on broken imports. Minor but operationally useful for anyone running Qwen3.5 via Ollama.

#local-models #open-source
Is anyone else just blown away that local LLMs are even possible? r/LocalLLaMA Score: 360

A high-engagement community post expressing genuine amazement at the current capability of local models, specifically Qwen's offline coding assistance. With a score of 360 and 137 comments, it is the most-commented post this period. While light on technical content, it's a useful barometer: community sentiment toward local AI has crossed from "interesting experiment" to "this changes how I work."

#local-models #llm