Tag: benchmarks
3 discussions across 1 post tagged "benchmarks".
AI Signal - April 28, 2026
- Comprehensive quantization analysis comparing Qwen 3.6 27B across BF16, Q4_K_M, and Q8_0 GGUF formats using the HumanEval, HellaSwag, and BFCL benchmarks. BF16 achieved 69.78% average accuracy at 15.5 tok/s using 54 GB RAM, while Q4_K_M delivered competitive performance with significantly reduced memory requirements, providing practical guidance for deployment decisions.
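The deployment trade-off described above (accuracy retained vs. memory spent) can be sketched as a simple ranking. Only the BF16 figures (69.78% average, 54 GB) come from the post; the Q4_K_M and Q8_0 numbers below are illustrative placeholders, not measured results.

```python
# Sketch: rank quant formats by average benchmark accuracy per GB of RAM.
# BF16 accuracy/RAM are from the post; the other rows are placeholders.
formats = {
    "BF16":   {"avg_acc": 69.78, "ram_gb": 54.0},
    "Q8_0":   {"avg_acc": 69.0,  "ram_gb": 29.0},   # placeholder
    "Q4_K_M": {"avg_acc": 68.0,  "ram_gb": 17.0},   # placeholder
}

def accuracy_per_gb(stats: dict) -> float:
    """Crude efficiency metric: average accuracy divided by RAM footprint."""
    return stats["avg_acc"] / stats["ram_gb"]

# Sort formats from most to least memory-efficient.
ranked = sorted(formats.items(), key=lambda kv: accuracy_per_gb(kv[1]), reverse=True)
for name, stats in ranked:
    print(f"{name}: {stats['avg_acc']:.2f}% avg, {stats['ram_gb']:.0f} GB, "
          f"{accuracy_per_gb(stats):.2f} %/GB")
```

With these placeholder numbers the heavy quantizations win on efficiency, which mirrors the post's conclusion that Q4_K_M is the practical choice when accuracy loss is small.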
- Showed 4 AI models some abstract Kandinsky-style Pokémon art with no hints (r/ArtificialInteligence, Score: 836)
A creative benchmark testing visual pattern recognition: showing abstract geometric Pokémon art to multiple models without hints. Opus 4.7 (no thinking) got all 4 immediately, GPT-5.5 (no thinking) got 3, Sonnet 4.6 (extended thinking) got 2, while Gemini 3.1 Pro spent 4.5 minutes thinking and incorrectly identified them as Sailor Moon characters.
- Benchmark comparison of GPT 5.4 vs 5.5 on MineBench reveals that while official benchmarks showed marginal gains, practical performance improvements were more impressive than expected. The 5.5 family also shows smaller differences between Pro and standard variants, suggesting OpenAI may be achieving similar outputs with less compute.