Tag: benchmarks
3 discussions across 1 post tagged "benchmarks".
AI Signal - April 28, 2026
- Comprehensive quantization analysis comparing Qwen 3.6 27B across BF16, Q4_K_M, and Q8_0 GGUF formats using the HumanEval, HellaSwag, and BFCL benchmarks. BF16 achieved 69.78% average accuracy at 15.5 tok/s using 54 GB RAM, while Q4_K_M delivered competitive performance with significantly reduced memory requirements, providing practical guidance for deployment decisions.
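The deployment trade-off described above (accuracy retained vs. memory spent) can be sketched as a simple ranking. Only the BF16 figures (69.78% average, 54 GB) come from the post; the Q4_K_M and Q8_0 numbers below are illustrative placeholders, not measured results.

```python
# Sketch: rank quant formats by average benchmark accuracy per GB of RAM.
# BF16 accuracy/RAM are from the post; the other rows are placeholders.
formats = {
    "BF16":   {"avg_acc": 69.78, "ram_gb": 54.0},
    "Q8_0":   {"avg_acc": 69.0,  "ram_gb": 29.0},   # placeholder
    "Q4_K_M": {"avg_acc": 68.0,  "ram_gb": 17.0},   # placeholder
}

def accuracy_per_gb(stats: dict) -> float:
    """Crude efficiency metric: average accuracy divided by RAM footprint."""
    return stats["avg_acc"] / stats["ram_gb"]

# Sort formats from most to least memory-efficient.
ranked = sorted(formats.items(), key=lambda kv: accuracy_per_gb(kv[1]), reverse=True)
for name, stats in ranked:
    print(f"{name}: {stats['avg_acc']:.2f}% avg, {stats['ram_gb']:.0f} GB, "
          f"{accuracy_per_gb(stats):.2f} %/GB")
```

With these placeholder numbers the heavy quantizations win on efficiency, which mirrors the post's conclusion that Q4_K_M is the practical choice when accuracy loss is small.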
- Showed 4 AI models some abstract Kandinsky-style Pokémon art with no hints (r/ArtificialInteligence, Score: 836)
A creative benchmark testing visual pattern recognition: showing abstract geometric Pokémon art to multiple models without hints. Opus 4.7 (no thinking) got all 4 immediately, GPT-5.5 (no thinking) got 3, Sonnet 4.6 (extended thinking) got 2, while Gemini 3.1 Pro spent 4.5 minutes thinking and incorrectly identified them as Sailor Moon characters.
- Benchmark comparison of GPT 5.4 vs 5.5 on MineBench reveals that while official benchmarks showed marginal gains, practical performance improvements were more impressive than expected. The 5.5 family also shows smaller differences between Pro and standard variants, suggesting OpenAI may be achieving similar outputs with less compute.