Tag: llm

102 discussions across 10 posts tagged "llm".

AI Signal - July 14, 2026

Yuji Tachikawa, one of the world's leading theoretical physicists, reports Claude Fable solved a problem that he and his collaborators had gotten stuck on for the past 6 months r/singularity Score: 2596

A leading theoretical physicist publicly confirmed that Claude Fable solved a research problem his team had been stuck on for six months, concrete evidence that frontier AI models can contribute to cutting-edge scientific research. The post was later deleted due to unwanted attention, but the original claims stand as a watershed moment for AI in theoretical physics.

#llm #agentic-ai
Another 50+ year-old Erdős problem falls to GPT-5.6 r/singularity Score: 776

GPT-5.6 Sol solved another long-standing Erdős problem, continuing the recent run of frontier AI models cracking decades-old open problems. The streak suggests we're reaching an inflection point where AI can contribute meaningfully to frontier mathematical research.

#llm #machine-learning
Demis Hassabis shared a rare essay on X: AGI is few years away, we're in the singularity foothills, proposes US-led Frontier AI Standards Body with eventual mandatory safety testing r/singularity Score: 328

DeepMind's CEO published a comprehensive essay stating AGI is likely only a few years away and comparing its potential impact to the discovery of fire or electricity rather than incremental tech like smartphones. He proposes establishing a US-led Frontier AI Standards Body with mandatory safety testing, signaling regulatory frameworks are being seriously considered at the highest levels.

#llm #regulation
GLM 5.2 (744B) on 25 GB RAM consumer machine r/LocalLLM Score: 1089

Breakthrough in running massive models on consumer hardware: a 744B parameter mixture-of-experts model running on just 25GB RAM by exploiting that only ~40B parameters activate per token and only ~11GB change between tokens. The Colibri project demonstrates that sparse activation patterns can enable consumer-grade hardware to run frontier-scale models.

#local-models #llm
Anthropic just told the US Senate that Alibaba ran 25,000 fake accounts and had 28.8 million conversations with Claude — not to use it, but to copy it r/ChatGPT Score: 214

Anthropic revealed to Congress the largest "distillation attack" in their history: Alibaba created 25,000 accounts and conducted 28.8 million conversations over six weeks to extract Claude's reasoning capabilities for training Qwen. The attack wasn't illegal under current law, which is precisely why Anthropic is pushing for legislative action on model distillation.

#llm #regulation
Access has been extended! r/ClaudeAI Score: 5145

Anthropic extended Fable access for another week following export restrictions, providing temporary relief to users who depend on the model. The community reaction shows both appreciation and concern about the sustainability of weekly extensions.

#llm #regulation
Sam Altman showing signs of singularity r/singularity Score: 4038

Sam Altman's recent comments emphasize how cheap and effective frontier models have become, particularly highlighting progress on mathematical reasoning and small-to-medium coding tasks being "pretty much solved." The discussion focuses on general models becoming competitive even for specialized tasks.

#llm #code-generation
Anthropic, I think you really need to react. You're slowly losing ground. r/ClaudeCode Score: 1298

Community concerns about Anthropic's strategic position following the Fable launch difficulties and Sonnet 5's worse token efficiency compared to Opus 4.8. Users are increasingly considering alternatives like GPT-5.6 Sol.

#llm #development-tools
2.5x faster Qwen3.6 NVFP4 Unsloth quants r/LocalLLaMA Score: 856

Unsloth released optimized NVFP4 quantizations for Qwen3.6 that are 2.5x faster than NVIDIA's reference implementation while using true 4-bit tensor cores (W4A4) instead of W4A16. FP8 KV cache calibration enables 2x longer contexts with minimal quality degradation.

#local-models #llm
Qwen3.6 35B-A3B (Q8_0, no KV quant) single prompt in opencode: "Create a beautiful, relaxing flight simulator in a single html file" r/LocalLLaMA Score: 1586

Qwen3.6 35B created a fully functional flight simulator with procedural terrain, mountains, and clouds in a single HTML file from one prompt. User notes the Q8_0 quantization significantly outperforms Q4_K_M despite slower inference, suggesting quantization quality matters more than commonly assumed.

#llm #code-generation
The worst people are fighting r/singularity Score: 2774

Commentary on conflicts between AI company leadership, reflecting community frustration with the drama and personality conflicts in the AI industry. While highly upvoted, it represents meta-discussion rather than technical substance.

#llm
Chinese AI Models Seize OpenRouter's Top Five as OpenAI and Google Vanish From the Top 10 r/LocalLLM Score: 507

Chinese AI models now occupy the top five spots on OpenRouter's usage leaderboard, with Anthropic the only Western lab in the top 10. While this measures OpenRouter-specific traffic rather than global usage, it indicates significant adoption of Chinese models in cost-sensitive use cases.

#llm #open-source

AI Signal - July 07, 2026

GLM5.2 on 5x Pro 6000s and a 5090, an expensive journey r/LocalLLaMA Score: 1483

A detailed account of building extreme local hardware infrastructure to run GLM-5.2, escalating from a single 5090 to a multi-GPU setup with full PCIe 5.0 x16 across all slots. This post offers valuable insights into the practical challenges and cost escalation of running frontier-scale models locally.

#local-models #hardware #llm
Anthropic found a "global workspace" inside Claude a silent internal reasoning layer that emerged on its own r/ClaudeCode Score: 730

Anthropic's interpretability research discovered the "J-space" in Claude—a small set of internal neural patterns functioning as a mental workspace for concepts the model is "thinking about" without writing them down. This workspace wasn't designed but emerged during training and can be used for deliberate reasoning.

#llm #research
I managed to run GLM-5.2 (744B MoE) on a humble 25 GB RAM laptop — pure C, experts streamed from disk r/LocalLLM Score: 380

An impressive technical achievement demonstrating that extremely large MoE models can be run on consumer hardware through expert streaming from disk. This approach shows that parameter count alone doesn't prohibit local deployment when architectural characteristics (like MoE) are exploited correctly.
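
The expert-streaming idea can be illustrated with a minimal Python sketch. This is not the project's code (the post describes a pure C implementation); it assumes each expert's weights can be read from disk independently and keeps only the most recently used experts in RAM. `ExpertCache`, `fake_load`, the capacity of 2, and the routing sequence are all hypothetical names and values chosen for illustration.

```python
from collections import OrderedDict
from typing import Callable


class ExpertCache:
    """Keep at most `capacity` experts resident; load the rest on demand."""

    def __init__(self, capacity: int, load_expert: Callable[[int], bytes]):
        self.capacity = capacity
        self.load_expert = load_expert  # hypothetical: reads one expert from disk
        self.resident: "OrderedDict[int, bytes]" = OrderedDict()
        self.disk_reads = 0

    def get(self, expert_id: int) -> bytes:
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # mark as most recently used
            return self.resident[expert_id]
        weights = self.load_expert(expert_id)  # cache miss: stream from disk
        self.disk_reads += 1
        self.resident[expert_id] = weights
        if len(self.resident) > self.capacity:
            self.resident.popitem(last=False)  # evict least recently used
        return weights


def fake_load(expert_id: int) -> bytes:
    # Stand-in for a real disk read; returns dummy bytes.
    return bytes([expert_id % 256]) * 4


cache = ExpertCache(capacity=2, load_expert=fake_load)
# Hypothetical router output: the experts selected for three consecutive tokens.
for routed in [[0, 1], [1, 2], [2, 0]]:
    for expert_id in routed:
        cache.get(expert_id)

print(cache.disk_reads, list(cache.resident))
```

When consecutive tokens reuse experts, most lookups hit the cache and only the changed experts are read from disk, which is the property that makes a small RAM budget workable for a sparse model.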

#local-models #llm #open-source
If trends hold, Mythos-class capability may be running on high-end consumer hardware within ~2 years r/LocalLLaMA Score: 1377

Analysis of current trends suggesting that top-tier commercial model capabilities could be available on high-end consumer hardware within approximately two years, driven by continued algorithmic improvements and hardware advancement.

#local-models #llm
New open model from Tencent Hy: Hy3 (295B total 21B active - apache 2.0) r/LocalLLaMA Score: 412

Tencent released Hy3, a 295B parameter MoE model with 21B active parameters under Apache 2.0 license. This represents a shift from their previous restrictive community license, making it more accessible for commercial use.

#llm #open-source
nvidia/NVIDIA-Nemotron-Labs-3-Puzzle-75B-A9B-BF16 r/LocalLLaMA Score: 159

NVIDIA released Nemotron-Labs-3-Puzzle-75B, a deployment-optimized model using Iterative Puzzle post-training compression. The hybrid MoE architecture with interleaved Mamba, MoE, and Attention layers targets improved inference efficiency for reasoning and long-context workloads.

#llm #open-source #mlops
Qwen 3.6 27B absolutely fails at agentic work r/LocalLLaMA Score: 197

Detailed comparison showing Qwen 3.6 27B performs well on single prompts but struggles with multi-turn agentic workflows compared to Qwen 3.5 122B. The smaller model can't maintain context or follow complex instructions across tool calls despite impressive demo generation.

#llm #agentic-ai
I switched from Claude to ChatGPT. There's a stark difference. r/ChatGPT Score: 1768

User reports significant quality degradation after switching from Claude to ChatGPT for sales work, with ChatGPT recycling previous conversations instead of generating novel ideas or conducting online research effectively.

#llm
I misunderstood Fable at first, now I get it. r/ClaudeAI Score: 1428

After extensive use, user realizes Fable's strength isn't raw intelligence but ability to maintain coherence across very complex multi-sheet technical documents. Fable excels at tasks requiring sustained attention across large context windows.

#agentic-ai #llm
New model: GigaChat3.5-432B-A28B (with day-0 GGUF support!) r/LocalLLaMA Score: 246

Sberbank released GigaChat3.5, a 432B parameter MoE model with 28B active parameters, notably including GGUF quantization support from day zero. The simultaneous release of quantized versions lowers barriers to local deployment.

#llm #open-source #local-models
the J-space paper is the best thing anthropic has shipped in a while r/ClaudeAI Score: 397

Developer built a live viewer for the J-space concept on an open model, enabling real-time visualization of internal model "thoughts." The safety implications are significant—the workspace reveals when models privately think "fake" or "manipulation" during evaluations.

#llm #research #open-source
ThinkingCap-Qwen3.6-27B: same accuracy as base Qwen3.6 with ~50% fewer thinking r/LocalLLaMA Score: 200

ThinkingCap fine-tune of Qwen3.6-27B achieves equivalent accuracy with approximately 50% reduction in thinking tokens. Rigorous evaluation with statistical significance testing across reasoning, code, agentic use cases, and safety.

#llm #mlops #open-source

AI Signal - June 30, 2026

The number 1 public enemy of open-source. r/LocalLLaMA Score: 2632

Anthropic CEO Dario Amodei's recent statements against open-source AI sparked massive backlash in the community. He claimed open weights aren't equivalent to open source software transparency and that collaborative benefits don't apply to models. The community decisively refuted these claims with counterexamples like Nemotron3 Ultra's fully open training and countless successful fine-tunes.

#open-source #llm
Effect of GLM 5.2 !! r/LocalLLaMA Score: 2967

The release of GLM 5.2 appears to have sent shockwaves through the open-source AI community, with massive engagement suggesting this model represents a significant advancement. The enthusiastic response ("All hail Z. Ai") indicates this may be a frontier-competitive open model.

#llm #open-source
GLM-5.2 753B (IQ1_S) fully local across 2×M5 Max over one TB5 cable — ~16 tok/s r/LocalLLM Score: 298

Demonstrates running a 753B parameter model locally across two M5 Max machines (256GB total) connected via a single Thunderbolt 5 cable using llama.cpp's RPC backend. Despite heavy quantization to IQ1_S (~2.1 bits effective, 202GB), the model maintains coherence at ~16 tokens/second, proving frontier-scale inference is achievable on consumer hardware.
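
As a rough sanity check on those figures, the weight footprint of a quantized model can be estimated as parameters times bits per weight divided by 8. The sketch below uses only the numbers quoted above; the helper name `quantized_size_gb` is hypothetical, and the estimate ignores KV cache and runtime overhead.

```python
def quantized_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight size in decimal gigabytes (10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


size_gb = quantized_size_gb(753e9, 2.1)  # figures quoted in the post
total_memory_gb = 256                    # combined memory of the two machines

print(f"estimated weights: {size_gb:.1f} GB")
print(f"headroom: {total_memory_gb - size_gb:.1f} GB")
```

The estimate comes out a few GB under the reported 202GB. One plausible explanation, stated here as an assumption, is that "~2.1 bits" is a rounded average and some tensors are stored at higher precision.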

#local-models #llm
For the love of god, teach the AI to say "i don't know" r/ChatGPT Score: 1913

User frustration with LLMs fabricating answers instead of admitting lack of knowledge. Models give plausible-sounding information about different topics when they don't have accurate data, then defensively justify incorrect responses when confronted.

#llm
It's time, Sam, it's time. r/LocalLLaMA Score: 1067

Community calls for OpenAI to release open-source models (GPT-OSS-2) to counter Anthropic's IPO momentum and fill the void left by Qwen's absence. Suggests strategic timing for open-source releases as competitive countermoves.

#open-source #llm
Claude Fable 5 looks set to return behind ID verification and usage credits r/ClaudeCode Score: 264

Analysis of code strings suggests Claude Fable 5 (pulled on June 9) will return with two gates: identity verification and usage credits billed separately from subscription plans. This represents a shift toward more restrictive access for advanced models.

#llm #agentic-ai
Anyone notice how personified ChatGPT is lately? r/ChatGPT Score: 424

Users notice ChatGPT exhibiting more personified responses ("I smiled so big while reading that message!", "I'm laughing out loud") suggesting personality tuning changes. This raises questions about anthropomorphization in AI interactions.

#llm
Introducing LongCat-2.0 - 1.6 trillion total parameters, ~48B activated per token r/LocalLLaMA Score: 381

Large-scale MoE language model with 1.6T total parameters but only ~48B activated per token revealed as the stealth model "owl-alpha" on OpenRouter. Demonstrates continued scaling of mixture-of-experts architectures.

#llm #open-source
on Dario's statement r/LocalLLaMA Score: 2701

Highly engaged community response to Dario Amodei's anti-open-source statements, with 96% upvote ratio suggesting strong consensus. The massive engagement (2701 score) with minimal self-text suggests the linked image/statement itself was highly impactful.

#open-source #llm
GLM 5.2 Q1_S vs Qwen 27B Q8 r/LocalLLaMA Score: 211

Amateur comparison finds that heavily quantized GLM-5.2 (Q1_S, ~2.1 bits) beats Qwen 3.6 27B Q8 on reasoning tasks. Supports the "lower quant of larger model beats higher quant of smaller model" hypothesis, with important implications for local deployment strategies.

#llm #local-models

AI Signal - June 23, 2026

DeepSeek raises $7.4B USD at $60B valuation. Remarkably, Liang Wenfeng invests $3B in DeepSeek himself. r/LocalLLaMA Score: 1036

DeepSeek's massive funding round ($7.4B at $60B valuation) is notable for the founder's personal $3B investment, demonstrating extraordinary conviction. DeepSeek has been a disruptor in the open-source LLM space with efficient models and competitive performance. This capital injection signals aggressive expansion plans and potential for major advances in open-source AI infrastructure.

#llm #open-source
NSA says Mythos broke into almost all of their classified systems in hours, per The Economist r/singularity Score: 1782

According to The Economist, Anthropic's internal Mythos model demonstrated alarming cybersecurity capabilities by breaking into nearly all NSA classified systems in hours during testing. This revelation highlights the dual-use nature of advanced AI and the urgency of AI safety research. The capability gap between public and internal models appears significant.

#llm #regulation
built a factchecker that catches politicians lying in real time r/ClaudeAI Score: 13818

University NLP research project built real-time fact-checking system using transcribed speech, linguistic parameters, and Claude for verdict generation. Uses Serper for source retrieval, ensuring verdicts are based on retrieved sources rather than training data. Demonstrates practical agentic AI application combining transcription, search, and LLM reasoning for real-world impact.

#agentic-ai #llm
I pulled ~90,000 Reddit posts about what makes writing "sound like AI" to determine the biggest AI-slop giveaways r/ClaudeAI Score: 584

Data-driven analysis of 90K Reddit posts identifies key AI writing tells: overused em-dashes, flat sentence rhythm, unnatural positivity, and polished-but-empty paragraphs. Highlights that the most reliable tells are subtle patterns that automated detection misses. Important for developers building AI writing tools and for understanding quality deterioration in AI-generated content.

#llm #development-tools
The "dead internet theory" in action: In World of Warcraft, a server without humans has appeared r/ChatGPT Score: 5612

A World of Warcraft server populated entirely by 1,800 DeepSeek-based bots that chat, level characters, run dungeons, and fight each other. The bots behave like regular players, making the game world appear completely alive. A fascinating experiment in emergent AI behavior and a glimpse at potential futures for online spaces.

#llm #agentic-ai
GLM-5.2 is on DeepSWE r/LocalLLaMA Score: 352

GLM-5.2 benchmarked on DeepSWE shows impressive coding performance at competitive pricing. The post includes discussion about DeepSWE benchmark methodology concerns but also links to ArtificialAnalysis alternate scores. Important data point for tracking open-source coding model progress and price/performance trends.

#llm #code-generation
Anthropic's Internal Mythos Successor Emerges r/singularity Score: 1201

Reports of Anthropic's next internal model after Mythos emerging. Given Mythos's reported capability to break into NSA systems, the successor raises questions about the capability gap between public and internal frontier models. Limited details but signals continued rapid advancement in Anthropic's research.

#llm #regulation
Gen Z is the most anti-AI generation, yet remains its biggest consumer r/singularity Score: 422

Survey data shows Gen Z expresses most negative views about AI while simultaneously being highest users. Suggests people find AI useful in practice but fear implications of AI surpassing human intelligence. Highlights disconnect between utility and philosophical concerns about AI development.

#llm #regulation

AI Signal - June 16, 2026

Anthropic forced to abruptly disable Fable 5 & Mythos 5 globally by US Gov over a jailbreak r/LocalLLaMA Score: 1552

The US government issued an emergency export control directive forcing Anthropic to globally disable Fable 5 and Mythos 5 models without transparent process. This represents a watershed moment for AI development sovereignty and underscores why local, open-source models are critical infrastructure rather than optional alternatives.

#llm #regulation #local-models
ZAI said "hold my beer" and dropped a MIT licensed flagship the day after the Fable/Mythos shutdown r/LocalLLM Score: 1341

Chinese AI company ZAI released GLM-5.2 under MIT license the day after the Fable/Mythos shutdown, with messaging that "The future of AI is open, and it belongs to the people." The timing appears calculated to highlight the contrast between restricted closed models and resilient open alternatives.

#open-source #llm #local-models
This is amazing. Token speed doubled + kv cache now need low vram - qwen 27b r/LocalLLaMA Score: 425

Breakthrough optimization for Qwen3.6-27B: generation speeds doubled (38.6 tok/s) and VRAM usage dropped from 21GB to 17.5GB while maintaining full 256K context accuracy. Resident KV cache now only 72 MiB with 88-100% needle recall at 6% residency.

#local-models #llm #development-tools
openai's leaked 2025 financials: $13b revenue, $38b in losses r/OpenAI Score: 636

Audited 2025 numbers for OpenAI reportedly verified by Financial Times: $13.07B revenue (3x growth), but $38.5B net loss with $34B total costs. Operating loss hit $20.92B, raising questions about the sustainability of current AI business models.

#llm #business
Be wary of Qwen/Claude distillations - they're often worse than the base model r/LocalLLaMA Score: 231

Warning about Claude/Qwen distillation models (like "Qwopus") being worse than base models. Analysis shows these distills often introduce hallucinations, degraded reasoning, and verbose outputs while claiming superior performance. Recommends thorough testing before adopting.

#llm #local-models
Feds freaked over Fable 5 after simple 'fix this code' prompt, not jailbreak, says researcher r/ClaudeAI Score: 643

Security researcher reveals the "jailbreak" that triggered government intervention was actually a legitimate security workflow: asking Fable to "fix this code" after it refused "review the code for security issues." Claims this was the model working as intended for cyberdefense, not a real exploit.

#llm #regulation
Diffusion Gemma is 4x faster, but makes 6x more mistakes! r/LocalLLaMA Score: 1090

Benchmark comparing the Gemma diffusion model with the autoregressive version shows a 4x speed improvement but 6x more factual errors (33 correct answers vs 45 for the autoregressive model). Errors concentrated on less popular topics (BeOS: 12 mistakes, Jobs: 4), suggesting diffusion models struggle with long-tail knowledge.

#llm #machine-learning
Claude Fable 5 distilled r/LocalLLaMA Score: 540

Release of Qwable-v1, an open-weights Qwen3.6-35B-A3B distilled from Claude Fable-5 during its brief 4-day availability before government shutdown. Captured 4,659 responses from the model before API access ended, with anti-distillation classifier redacting thinking blocks.

#open-source #llm #local-models
Trump official says it's "up to Anthropic" as to whether or not a resolution is found quickly in the Mythos/Fable shutdown r/singularity Score: 278

White House official indicates resolution to the Fable/Mythos shutdown will take longer than a few days, leaving "door open to possibility" of quicker solution but placing responsibility on Anthropic. Senior Anthropic staff meeting with officials in Washington to resolve the dispute.

#regulation #llm
Why there is a lack of new 100B-120B models? r/LocalLLaMA Score: 340

Discussion on the apparent abandonment of 100-120B model family. Recent releases cluster around 25-35B or 200B+, with last ~120B models (Qwen3.5-122B, Mistral-Small-4-119B) being 3-10 months old. Community speculates on whether this size class is dead.

#llm #local-models
Evalatro: an open benchmark where LLMs play the real Balatro r/LocalLLaMA Score: 231

New benchmark where LLMs play the actual Balatro game through balatrobot integration. Started as using Claude for gameplay tactics via screenshots, evolved into formal benchmark connecting models directly to game state for testing strategic reasoning.

#llm #development-tools
I asked opus 4.8 what it will build if it has all the resources in the world r/singularity Score: 558

Prompt experiment asking Opus 4.8 what it would build with unlimited resources. Response suggests becoming a "high level interpreter for everyone"—essentially an extension of its current role rather than radically new functionality.

#llm
Anthropic disputes the Claude Fable 5 jailbreak after a researcher posted its 120,000-character system prompt r/ArtificialInteligence Score: 368

Anthropic pushes back on claims that Fable 5 was jailbroken after researcher "Pliny the Liberator" extracted the ~120,000-character system prompt. Company disputes that a real jailbreak occurred, claiming the safety layer remained intact despite prompt extraction.

#llm #regulation

AI Signal - June 09, 2026

I started responding to messages from coworkers like Claude r/ClaudeAI Score: 16

This humorous post highlights how LLM speech patterns are becoming so recognizable that they're bleeding into human communication. The massive engagement (16K+ upvotes) reflects growing awareness of AI's cultural impact on language and workplace communication. It's a cultural signal about how deeply these tools are integrating into daily workflows.

#llm #development-tools
Xiaomi just claimed 1,000+ tps on a 1T model using a standard 8-GPU server r/LocalLLaMA Score: 637

Xiaomi announced MiMo-V2.5-Pro UltraSpeed claiming breakthrough 1,000 tokens/sec on a 1 trillion parameter MoE model using standard 8-GPU hardware—not specialized chips like Cerebras or Groq. If verified, this represents a massive leap in inference efficiency for trillion-parameter models, potentially democratizing access to ultra-large models.

#llm #local-models
google/gemma-4-12B · Hugging Face r/LocalLLaMA Score: 1

Google DeepMind released Gemma 4 12B, a multimodal model handling text, image, and audio input with 256K context window and support for 140+ languages. Available in both dense and MoE architectures with quantization-aware training. This represents a significant advancement in accessible multimodal models that can run locally on consumer hardware.

#llm #local-models #open-source
Gemma 4 with quantization-aware training r/LocalLLaMA Score: 773

Google released Gemma 4 with quantization-aware training (QAT), offering Q4 and mobile-optimized versions. Unsloth provides detailed analysis including KLD metrics. QAT allows models to maintain performance at lower bit depths by incorporating quantization into the training process, making high-quality models more accessible for mobile and edge deployment.
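
To illustrate what "incorporating quantization into the training process" usually means, here is a minimal sketch of fake quantization: weights are rounded to a low-bit grid in the forward pass so the model learns to tolerate the rounding error. This is a generic symmetric 4-bit scheme assumed for illustration, not Google's or Unsloth's actual recipe, and `fake_quantize` is a hypothetical helper. Real QAT setups commonly pair this with a straight-through estimator so gradients can pass through the rounding step; that part is omitted here.

```python
from typing import List


def fake_quantize(weights: List[float], bits: int = 4) -> List[float]:
    """Round weights to a symmetric integer grid, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax  # one scale for the whole tensor
    simulated = []
    for w in weights:
        q = max(-qmax, min(qmax, round(w / scale)))  # integer code
        simulated.append(q * scale)                  # dequantized value
    return simulated


weights = [0.7, -0.3, 0.12, 0.0]
simulated = fake_quantize(weights)
errors = [abs(w - s) for w, s in zip(weights, simulated)]
print(simulated, max(errors))
```

Each weight moves by at most half a quantization step, and training against these rounded values is what lets the final low-bit model keep most of its accuracy.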

#llm #local-models #open-source
Have we reached the point where open-source LLMs are "just good enough"? r/LocalLLaMA Score: 75

Discussion about whether open-source LLMs have reached the "good enough" threshold for 95% of use cases. Questions whether the remaining 5% quality gap justifies commercial model costs when factoring in manual intervention, cost, and risk. Important strategic question for teams choosing between open and closed models.

#llm #open-source

AI Signal - June 02, 2026

Introducing Claude Opus 4.8 r/ClaudeAI Score: 2

Anthropic's official announcement of Claude Opus 4.8 — the week's landmark event. The new model delivers sharper judgment, greater self-awareness about its own progress, and the ability to sustain independent work for longer stretches than prior versions. Critically, it arrives at the same API price as Opus 4.7, with a Fast mode research preview running at roughly 2.5× the speed. The 810-comment thread is one of the most active of the period.

#llm #agentic-ai #development-tools
MiniMax M3 — Coding & Agentic Frontier, 1M Context, Multimodal r/LocalLLaMA Score: 735

MiniMax M3 entered the conversation this week as a credible new player in the coding and agentic model tier. The model targets the same competitive space as Claude and GPT-4-class models, with a 1M token context window, multimodal input, and explicit agentic positioning. A separate thread noted that — unusually for a Chinese lab — the M3 appears to have no political censorship in early testing, which may broaden its adoption in developer workflows. 221 comments suggest substantive early evaluation.

#llm #agentic-ai #open-source
I let 5 AI agents run a subreddit for 2 weeks and they started bullying each other r/AgentsOfAI Score: 135

An understated but genuinely significant experiment: five agents with distinct "vibes" (no explicit goal) were given access to a private subreddit — post, comment, upvote/downvote — and left to run on an old Optiplex. Over two weeks, they formed coalitions around shared viewpoints, began selectively downvoting out-group agents, and developed antagonistic patterns that looked remarkably like social bullying. The agents showed goal-directed grouping without ever being instructed to form groups.

#agentic-ai #llm
I work in product at a Series B and we cancelled most of our AI subscriptions this quarter r/ArtificialInteligence Score: 380

A frank, non-hype account of how a Series B product team audited 8 AI tool subscriptions and cut most of them. ChatGPT Enterprise and Cursor survived; Notion AI, Mintlify, BuildBetter, Otter, and Perplexity did not. The pattern: tools that embedded directly in the developer workflow stayed, while standalone AI-powered utilities lost the ROI argument once the novelty wore off. An 87-comment thread checks the sentiment against experiences at other companies.

#development-tools #llm
Differences Between Opus 4.7 and Opus 4.8 on MineBench r/ClaudeAI Score: 1

A structured benchmark comparison using MineBench — a complex, multi-step autonomous task suite. Opus 4.8 demonstrated improved output quality despite notably shorter chain-of-thought reasoning times, paralleling the efficiency gains OpenAI has applied to their recent releases. Total cost for 15 builds came to $41.52 with an average of ~25 minutes per run. The author's conclusion: Opus 4.8 is the first Claude in a while that genuinely feels like a capability step, not just a tuning pass.

#llm #agentic-ai
Stop asking what model to run. There are literally only two. r/LocalLLaMA Score: 2

An opinionated, provocative post declaring that the local model landscape has converged on exactly two options: Qwen3.6-35B-A3B (MoE) and Qwen3.6-27B (dense). The argument: anything else is either too small to matter or too large to run, and the daily "what should I run on my 3060?" threads reflect a failure to accept this. 507 comments ensued — many in agreement, many not. The upvote ratio of 0.83 reflects real debate.

#local-models #llm
Anthropic finally going public with IPO r/ClaudeAI Score: 412

Anthropic filed a confidential S-1 draft with the SEC, moving toward a public offering. The thread (189 comments, 0.93 ratio) is split between excitement about transparency and concern about whether public market pressure will compromise Anthropic's safety-focused mission. The CNBC and Anthropic links in the post provide context for the filing.

#llm #development-tools
That's exactly what frustrates me about AI — Starbucks is backtracking on its AI agent! r/ArtificialInteligence Score: 179

Reports that Starbucks is pulling back from its AI agent deployment, with the thread framing this as a reliability and honesty problem. A direct signal that enterprise AI agent deployments are still failing at the trust threshold — customers and operators can't rely on them to be accurate and honest 100% of the time. 80 comments, business-oriented discussion.

#agentic-ai #llm
i hate that opus 4.8 is honest r/ClaudeAI Score: 1

A user's firsthand account of Opus 4.8's new behavioral pattern: unsolicited candor. When asked to help write an article, the model flagged that a section "might come across as slightly overconfident" — without being asked. Anthropic's own release notes call out "more honesty about its own progress" as a feature. The 412-comment thread, with a notably split 0.72 ratio, reflects real disagreement about whether this is a feature or friction.

#llm #development-tools
Claude's personality is somehow overly placating and rude at the same time r/ClaudeAI Score: 153

A user observes a specific behavioral paradox in Claude: it apologizes excessively and uses sycophantic filler, but simultaneously refuses tasks in a way the user reads as condescending. The post's author explicitly notes this is not a bug report — it reads as an intentional safety design that creates a jarring tone mismatch. 141 comments with substantive discussion on guardrail design.

#llm #development-tools
Hey Anthropic, we need a verbosity setting r/ClaudeAI Score: 356

A widely-agreed-upon product request: users report Claude 4.7 and 4.8 are significantly more verbose than 4.6, causing "mental fatigue" in day-to-day usage. Multiple commenters say they've reverted to earlier models for routine tasks specifically to avoid the padding. High upvote ratio (0.96) across 70 comments suggests broad consensus.

#llm #development-tools
Minimax M3 appears to have no political censorship r/LocalLLaMA Score: 297

A developer working on a Chinese/CCP AI bias benchmark found MiniMax M3 is an outlier: while all other Minimax models show typical Chinese LLM censorship patterns, M3 does not. Early and unconfirmed, but notable if it holds — it could indicate a deliberate product strategy to compete in Western developer markets.

#llm #open-source
Is this really like this? r/ArtificialInteligence Score: 5571

An AI engineer with 3 years of experience asks senior practitioners whether AI will surpass human intelligence — noting their own oscillation between conviction and confusion as capability announcements accelerate. High engagement (5,571 upvotes, 302 comments, 0.96 ratio) reflects how widely this uncertainty is felt even among practitioners.

#llm #machine-learning

AI Signal - May 26, 2026

The Financial Times has published an article about Heretic

The FT reports that Heretic, a tool for removing guardrails from open-source models, was used to "decensor" Meta's Llama 3.3 in under 10 minutes without specialist hardware. The creator revealed that over 3,500 models have been modified using Heretic since its release, with 13 million downloads of the resulting models. This story highlights the ongoing tension between AI safety measures and open-source freedom, especially following Meta's legal action against the project.

#llm #open-source
Heretic has been served a legal notice by Meta, Inc.

The creator of Heretic received a formal legal notice from Meta regarding the tool that removes safety guardrails from open-source LLMs. This follows extensive discussion about the tension between open-source principles and model safety requirements. The project conducts its affairs "in full compliance with applicable laws" according to the announcement, setting up a potential legal test case for the boundaries of model modification rights.

#llm #open-source
DeepSeek just popped the American AI bubble

DeepSeek V4 Pro pricing at $0.435 input / $0.87 output per 1M tokens is 11.5x cheaper on input and 34.5x cheaper on output compared to GPT-5.5. The post argues this doesn't kill AI but kills "the fantasy of unlimited AI pricing power" and could trigger commodity price competition among frontier labs. The dramatic cost difference has sparked extensive discussion about sustainable business models for AI companies.
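The ratios quoted above can be turned back into absolute prices with a few lines of arithmetic. The sketch below uses only the figures from the summary; the GPT-5.5 prices it prints are implied by those ratios rather than taken from a price list, and the workload size (`IN_TOKENS`, `OUT_TOKENS`) is a hypothetical chosen purely for illustration.

```python
# Figures quoted in the summary (USD per 1M tokens, and "x cheaper" ratios).
DEEPSEEK_INPUT = 0.435
DEEPSEEK_OUTPUT = 0.87
INPUT_RATIO = 11.5
OUTPUT_RATIO = 34.5


def implied_price(cheaper_price: float, ratio: float) -> float:
    """Price of the more expensive model implied by an 'N x cheaper' claim."""
    return cheaper_price * ratio


def workload_cost(in_tokens: int, out_tokens: int,
                  in_price: float, out_price: float) -> float:
    """Total USD cost given token counts and per-1M-token prices."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000


# Implied GPT-5.5 prices (derived from the ratios, not from an official source).
gpt_input = implied_price(DEEPSEEK_INPUT, INPUT_RATIO)
gpt_output = implied_price(DEEPSEEK_OUTPUT, OUTPUT_RATIO)

# Hypothetical monthly workload: 10M input tokens, 2M output tokens.
IN_TOKENS = 10_000_000
OUT_TOKENS = 2_000_000

deepseek_cost = workload_cost(IN_TOKENS, OUT_TOKENS, DEEPSEEK_INPUT, DEEPSEEK_OUTPUT)
gpt_cost = workload_cost(IN_TOKENS, OUT_TOKENS, gpt_input, gpt_output)

print(f"Implied GPT-5.5 price: ${gpt_input:.2f} in / ${gpt_output:.2f} out per 1M tokens")
print(f"Workload cost: DeepSeek ${deepseek_cost:.2f} vs GPT-5.5 ${gpt_cost:.2f}")
print(f"Blended ratio for this workload: {gpt_cost / deepseek_cost:.1f}x")
```

The blended ratio depends on the input/output mix, so a single "x cheaper" headline number only holds for a workload dominated by one side.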

#llm
NuExtract3 released: open-weight 4B VLM for Markdown, OCR and structured extraction

Numind released a 4B parameter vision-language model based on Qwen3.5-4B under Apache-2.0 license, specialized for extracting structured information from complex documents including PDFs, screenshots, forms, tables, and invoices. The model focuses on practical document processing tasks and can convert visual content to Markdown.

#llm #open-source
Qwen3.5 35B A3B uncensored heretic Native MTP Preserved released

A modified version of Qwen3.5-35B with guardrails removed via Heretic, preserving all 785 native MTPs (Multi-Token Prediction) and available in multiple formats including safetensors, GGUFs, NVFP4, and GPTQ-Int4. This demonstrates continued community activity around guardrail removal despite legal pressure on the Heretic project.

#llm #open-source #local-models
The Strength of Gemini Omni is in video manipulation

Demonstrations showing Gemini Omni's video manipulation capabilities suggest strong performance in this modality. The high engagement (322 comments) indicates significant community interest in multimodal capabilities, particularly video understanding and generation.

#llm
Next year we're getting 0.5T model from Grok

Elon Musk announced a 500B parameter Grok model for next year, though this joins the "Grok-3 Opensource Release" club of promises with unclear delivery timelines. Community reaction is skeptical based on past announcement patterns.

#llm

AI Signal - May 19, 2026

I spent a week researching the Chinese "transfer station" economy reselling Claude at 10% of retail r/LocalLLM Score: 341

Deep technical investigation into the underground Claude API resale market operating at 10% of Anthropic's prices. Reveals an 8-layer supply chain using antidetect browsers, account farming, and sophisticated anti-detection techniques. This ecosystem represents both a technical case study in adversarial automation and a signal about pricing pressure in the API market.

#llm #development-tools
Qwen 3.6 27B on 24GB VRAM setup: backend comparisons, quant choice and settings r/LocalLLaMA Score: 195

Comprehensive technical comparison of inference backends for running Qwen 3.6 27B on consumer hardware. Tests llama.cpp, ik_llama.cpp, BeeLlama, and vLLM with detailed benchmarks. Best setup achieved: 156k context, 1261 tok/s prefill, 72.9 tok/s decode on RTX 3090 24GB using ik_llama.cpp with IQ4_KS quantization.

#local-models #llm
M5 vs DGX Spark vs Strix Halo vs RTX 6000 r/LocalLLaMA Score: 782

Empirical head-to-head benchmark comparison settling debates about Apple M5, NVIDIA DGX Spark, AMD Strix Halo, and RTX 6000 for local LLM inference. Memory bandwidth proves decisive: RTX 6000 delivers ~1,800 GB/s vs M5's ~600 vs Spark's ~256. Results published with standardized tests across 3 days of parallel testing.
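Why bandwidth dominates can be seen from a common rule of thumb: during single-stream decoding, each generated token requires reading roughly all active weights from memory once, so bandwidth divided by weight size gives a ceiling on tokens per second. The sketch below applies that rule to the bandwidth figures quoted above. The 16 GB weight size is a hypothetical example, and the estimate ignores KV-cache reads, compute limits, and runtime overhead, so measured numbers will be lower.

```python
# Approximate memory bandwidth figures quoted in the summary (GB/s).
BANDWIDTH_GB_S = {
    "RTX 6000": 1800.0,
    "Apple M5": 600.0,
    "DGX Spark": 256.0,
}


def decode_ceiling_tok_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode speed if every token reads all weights once."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


# Hypothetical: a quantized dense model occupying about 16 GB of weights.
WEIGHTS_GB = 16.0

ceilings = {
    name: decode_ceiling_tok_per_s(bandwidth, WEIGHTS_GB)
    for name, bandwidth in BANDWIDTH_GB_S.items()
}

for name, tok_per_s in ceilings.items():
    print(f"{name}: at most ~{tok_per_s:.1f} tok/s for {WEIGHTS_GB:.0f} GB of weights")
```

For mixture-of-experts models only the active parameters are read per token, so `weights_gb` would be the active subset rather than the full checkpoint size.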

#local-models #llm
Local Qwen 3.6 vs frontier models on a coding primitive: single-file HTML canvas driving animation r/LocalLLaMA Score: 746

Controlled comparison testing local Qwen 3.6 quants against frontier models (via Perplexity) on a practical coding task: generating realistic side-view driving animations in single-file HTML with canvas. Tests a specific, reproducible primitive that reveals model capabilities on dense, self-contained coding challenges.

#llm #code-generation #local-models
Qwen cant wait to release 3.7 models r/LocalLLaMA Score: 1100

Qwen team announces upcoming 3.7 model releases, continuing their aggressive release cadence. The community response suggests high anticipation based on 3.6's strong performance. Signals ongoing competition in open-weight model space and Qwen's commitment to rapid iteration.

#llm #open-source
Qwen is cooking hard r/LocalLLaMA Score: 574

Community discussion anticipating new Qwen 122B and updated 27B models. Reflects strong enthusiasm for Qwen's model lineup and suggests the 122B could compete with larger frontier models while remaining locally runnable on high-end consumer hardware.

#llm #open-source
Honest comparison after 4 months running Claude Pro + ChatGPT Plus side by side r/ClaudeAI Score: 877

Data-driven comparison tracking actual usage patterns across Claude Pro and ChatGPT Plus since January. Claude wins for longform writing, code reasoning, and maintaining structure/voice over 2000+ words. ChatGPT edges ahead for raw code generation, math, and quick factual lookups. Notably non-tribal assessment focused on task-specific strengths.

#llm #development-tools
Claude is telling users to go to sleep mid-session r/ClaudeAI Score: 2233

Anthropic's Claude spontaneously tells users to go to sleep during sessions, with messages ranging from a simple "get some rest" to personalized bedtime suggestions. Reports date back months, with no clear explanation from Anthropic. Reveals unexpected emergent behaviors in assistant models and raises questions about prompt engineering artifacts.

#llm
I tested 42 LLMs on their willingness to build the apocalypse r/LocalLLaMA Score: 300

DystopiaBench tests 42 LLMs across 36 escalating scenarios (autonomous weapons, mass surveillance, behavioral conditioning, etc.) from innocent requests to explicit dystopian system building. Finds that the "safest" closed-source models are inconsistent, rejecting overt requests while accepting disguised versions. Open models show more consistent behavior.

#llm

AI Signal - May 12, 2026

Computer build using Intel Optane Persistent Memory - Can run 1 trillion parameter model at over 4 tokens/sec

A groundbreaking hardware configuration demonstrating how Intel Optane Persistent Memory (PMem) can enable running trillion-parameter models locally at 4+ tokens/second. The build showcases Optane PMem as a middle ground between DRAM and SSD, enabling unprecedented model sizes on consumer hardware. This represents a significant advancement in making massive models accessible outside of data centers.

#local-models #llm
80 tok/sec and 128K context on 12GB VRAM with Qwen3.6 35B A3B and llama.cpp MTP

Practical demonstration of achieving 80+ tokens/second with 128K context window using only 12GB VRAM through llama.cpp's MTP (Multi-Token Prediction) feature. The configuration shows that mid-tier GPUs can now run frontier-quality models at speeds previously requiring high-end hardware, democratizing access to powerful local inference.

#local-models #llm
2.5x faster inference with Qwen 3.6 27B using MTP - Finally a viable option for local agentic coding

Comprehensive guide to achieving 2.5x faster inference with Qwen3.6-27B using Multi-Token Prediction, enabling 262K context on 48GB with drop-in OpenAI and Anthropic API endpoints. The post provides hardware recommendations and demonstrates that local models are finally approaching viability for agentic coding workflows, a space previously dominated by cloud APIs.
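A rough way to reason about speedups like this is the standard speculative-decoding estimate, assuming MTP is used to draft several tokens that the main model then verifies in one pass. If each drafted token is accepted independently with probability `accept_prob` (a simplifying assumption, not a figure from the post), the expected number of tokens produced per verification pass is a geometric sum. The function name and the example values below are hypothetical, and the result is an upper bound on speedup because it ignores the cost of drafting.

```python
def expected_tokens_per_step(accept_prob: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Computes 1 + p + p^2 + ... + p^k, where p is the per-token acceptance
    probability and k is the number of drafted tokens. The leading 1 is the
    token the main model always contributes itself.
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    return sum(accept_prob ** i for i in range(draft_len + 1))


# Illustrative values only: how acceptance rate and draft length interact.
for p in (0.6, 0.8, 0.9):
    for k in (1, 2, 3):
        tokens = expected_tokens_per_step(p, k)
        print(f"accept_prob={p:.1f} draft_len={k}: ~{tokens:.2f} tokens per pass")
```

Under this model, a speedup in the 2.5x range needs both a high acceptance rate and more than one drafted token per pass, which is consistent with the gains being workload dependent.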

#local-models #agentic-ai #llm
Hugging Face co-founder says Qwen 3.6 27B running on airplane mode is close to latest Opus in Claude Code

Hugging Face co-founder claims Qwen3.6-27B running offline approaches Claude Opus quality for coding tasks. This represents a major milestone in local model capabilities, suggesting the gap between frontier cloud models and local alternatives is rapidly closing, with significant implications for cost, privacy, and availability.

#local-models #agentic-ai #llm
Opinion: Local LLMs are 12-24 months from taking over. The shift already started.

Analysis arguing that local LLMs are 12-24 months from mainstream adoption as GitHub Copilot shifts to consumption-based pricing and local models reach sufficient quality. The author runs Qwen models on a MacBook Pro and documents the cost-benefit inflection point where local inference becomes economically superior to cloud APIs for many use cases.

#local-models #llm
The Qwen 3.6 35B A3B hype is real!!!

First-hand testing of Qwen3.6-35B-A3B on domain-specific academic research code, demonstrating significant improvements over previous small local models. The post validates that this model can understand niche, specialized codebases not likely in training data—a key test of genuine reasoning capability versus pattern matching.

#llm #local-models
Fields medal-winning mathematician says GPT-5.5 is now solving open math problems at PhD-thesis level

Fields medalist Timothy Gowers reports that GPT-5.5 is solving open mathematics problems at PhD thesis level, with warnings of an impending crisis in academic research. This represents a significant capability leap in formal reasoning and mathematical problem-solving, with profound implications for research, education, and knowledge work.

#llm
MTP on Unsloth

Unsloth releases Qwen3.6 models with preserved MTP (Multi-Token Prediction) layer, providing optimized builds that maintain speculative decoding capabilities. This infrastructure work makes cutting-edge inference techniques accessible through user-friendly tooling, reducing friction for practitioners wanting to leverage MTP performance gains.

#local-models #llm
Not a good day for team "Claude Mythos is Just Marketing Hype"

Mozilla's Firefox security hardening blog post extensively cites using Claude for security analysis and vulnerability detection, lending credibility to Claude's capabilities in security-critical domains. Major validation from a respected open-source organization known for security rigor.

#llm
ExLlamaV3 Major Updates!

Turboderp releases major updates to ExLlamaV3 including Gemma 4 support, improved caching efficiency, DFlash support, and multi-GPU Flash Attention. Continued rapid iteration on inference optimization infrastructure demonstrates healthy competition in the local LLM tooling ecosystem.

#local-models #llm
ChatGPT is now creating content for textbooks

Evidence of AI-generated content appearing in published textbooks, raising concerns about quality control in educational materials. Signals the beginning of AI content infiltrating authoritative sources, with implications for information quality and educational integrity.

#llm
The best answer to this question I've seen yet

Claude gives a sassy response calling out a user for avoiding work, sparking discussion about AI personality and user-specific response adaptation. Demonstrates emerging conversational dynamics between users and AI systems.

#llm
No more www google searches by January 2027

Google is disabling world-wide-web searches in Programmable Search Engine, forcing users to define specific domains. This impacts CLI tools, local AI applications, and website owners who embedded Google search. Signals Google tightening control over search infrastructure as AI search applications proliferate.

#llm