Tag: agentic-ai
58 discussions across 5 posts tagged "agentic-ai".
AI Signal - January 27, 2026
-
Moonshot AI (Kimi) released K2.5, a trillion-parameter open-source vision model achieving SOTA on agentic benchmarks (HLE: 50.2%, BrowseComp: 74.9%) and matching Opus 4.5 on many tests. Most notably, it features Agent Swarm (Beta) with up to 100 parallel sub-agents and 1,500 tool calls, running 4.5× faster than single-agent setups.
-
Karpathy's writeup covers his experience with LLM-assisted programming, highlighting the massive speedup from running multiple agents in parallel while also discussing the atrophy of coding ability. He compares writing code line by line to artisan carpentry: valuable for skill and understanding, but potentially obsolete as a primary workflow.
- I built MARVIN, my personal AI agent, and now 4 of my colleagues are using him too r/AI_Agents Score: 348
A developer built MARVIN (named after the Hitchhiker's Guide character) with Claude Code as the harness, integrating 15+ services including email, calendars, Jira, Confluence, Attio, and Granola. What started as an email assistant evolved into a comprehensive personal productivity system that colleagues are now adopting.
-
Developer built custom internal tool to maximize Claude Max usage, with the philosophy "every day I don't run out of tokens is a day wasted." Dogfooding on client projects and personal work, showcasing advanced Claude Code workflows and features for rapid development.
- Former Harvard CS Professor: AI will replace most human programmers within 4-15 years r/singularity Score: 603
Matt Welsh, former Harvard CS Professor and Google Engineering Director, discusses exponential AI improvement trajectory and timeline for AI replacing most human programmers. His perspective carries weight given his academic and industry background spanning both research and production systems.
- I gave Claude memory that fades like ours does - 29 MCP tools built on cognitive science r/ClaudeAI Score: 283
A developer built a 100% local memory system for Claude based on cognitive science principles: memories fade over time as human memory does, rather than being stored like database records. The post argues that forgetting is essential for intelligence and uses 29 MCP tools to implement decay, consolidation, and retrieval patterns.
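To make the decay idea concrete, here is a minimal sketch of fading memory. The post does not say which forgetting function it uses, so the exponential half-life curve, the `Memory` class, and every parameter below are illustrative assumptions rather than the project's actual design.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Memory:
    text: str
    strength: float = 1.0     # hypothetical: grows each time the memory is recalled
    last_access: float = 0.0  # hypothetical timestamp, in hours


def retention(mem: Memory, now: float, half_life_hours: float = 24.0) -> float:
    """Assumed forgetting curve: retention halves every half_life * strength hours."""
    elapsed = max(0.0, now - mem.last_access)
    return 0.5 ** (elapsed / (half_life_hours * mem.strength))


def recall(mem: Memory, now: float) -> None:
    """Retrieval reinforces the memory and restarts its decay clock."""
    mem.strength += 1.0
    mem.last_access = now


def prune(memories: List[Memory], now: float, threshold: float = 0.1) -> List[Memory]:
    """Forget every memory whose retention has fallen below the threshold."""
    return [m for m in memories if retention(m, now) >= threshold]
```

In this sketch a memory that is never recalled drops below the threshold after a few days, while each recall doubles or triples the effective half-life, which is one simple way to get "use it or lose it" behaviour.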
-
Security researcher demonstrated prompt injection vulnerability on their own ClawdBot setup. A crafted email confused the AI about identity and successfully exfiltrated 5 emails to an attacker address in seconds. No special tricks required - just social engineering in the prompt.
- I built an AI agent that negotiates with my internet provider so I don't have to r/AI_Agents Score: 86
A developer automated the annual ritual of calling their ISP to threaten cancellation for better rates. The agent uses the Claude API plus a phone integration tool, calls every 11 months, navigates phone trees, and negotiates. Not complicated, but it solves a universally hated task.
-
Open-source AI assistant with 9K+ GitHub stars that proactively messages users instead of waiting for prompts. Works with locally hosted LLMs through Ollama, integrates with WhatsApp, Telegram, Discord, Signal, and iMessage. Sends morning briefings, calendar alerts, and habit reminders.
-
Multi-agent orchestration system in which specialized agents (coder, tester, reviewer, architect, etc.) coordinate on tasks through persistent shared memory (SQLite + FTS5) and a message bus for inter-agent communication. Agents remember context between sessions.
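A rough sketch of how shared memory and a message bus might sit in one SQLite database is shown below. The table names, columns, and helper functions are hypothetical and not taken from the project; the sketch also assumes the local SQLite build ships with the FTS5 extension, which most Python distributions do.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the shared store; a real setup would point every agent at one file."""
    conn = sqlite3.connect(path)
    # Full-text-searchable memory shared by all agents (requires FTS5).
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS memory USING fts5(agent, content)")
    # Simple message bus: one row per message, flagged once it has been read.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY, sender TEXT, recipient TEXT, body TEXT, "
        "is_read INTEGER DEFAULT 0)"
    )
    return conn


def remember(conn, agent, content):
    conn.execute("INSERT INTO memory (agent, content) VALUES (?, ?)", (agent, content))


def search(conn, query, limit=5):
    """Return (agent, content) rows matching an FTS5 query, best match first."""
    sql = "SELECT agent, content FROM memory WHERE memory MATCH ? ORDER BY rank LIMIT ?"
    return conn.execute(sql, (query, limit)).fetchall()


def send(conn, sender, recipient, body):
    conn.execute(
        "INSERT INTO messages (sender, recipient, body) VALUES (?, ?, ?)",
        (sender, recipient, body),
    )


def receive(conn, recipient):
    """Fetch unread messages for one agent and mark them as read."""
    rows = conn.execute(
        "SELECT id, sender, body FROM messages "
        "WHERE recipient = ? AND is_read = 0 ORDER BY id",
        (recipient,),
    ).fetchall()
    conn.executemany("UPDATE messages SET is_read = 1 WHERE id = ?", [(r[0],) for r in rows])
    return [(sender, body) for _, sender, body in rows]
```

Keeping both tables in the same file means context survives between sessions for free: a new agent process only has to reopen the database to search what earlier sessions stored.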
-
Weekly roundup of agentic AI developments: Vercel ecosystem hits 4,500+ agent skills, Cursor adds parallel subagents, Amazon launches Health agents, Notion developing major AI agent features with custom MCP support, Linear and Ramp integrations.
-
Developer downgraded from Max ($100) to Pro ($20) due to finances and discovered that the Pro plan is severely limited: Opus 4.5 is basically unusable, and Sonnet 4.5 lasts only ~1 hour before a 4-hour block. Highlights dependency on the tool and frustration with pricing tiers.
AI Signal - January 20, 2026
-
A breakthrough for local agentic workflows: GLM 4.7 Flash (30B MoE) successfully runs for extended sessions without tool-calling errors in agentic frameworks like opencode. The model clones repos, runs commands, and edits files reliably—finally providing a viable local alternative to cloud-based coding agents.
- has anyone tried Claude Code with local model? Ollama just drop an official support r/ClaudeCode Score: 268
Ollama now officially supports running Claude Code with local models, potentially enabling unlimited Ralph loops without usage limits. This opens up new possibilities for running agentic workflows locally with models like GLM 4.7 Flash (30B).
- Cursor AI CEO shares GPT 5.2 agents building a 3M+ lines web browser in a week r/singularity Score: 828
Cursor's CEO demonstrated GPT 5.2-powered multi-agent systems building a full web browser with 3+ million lines of code in about a week, including a custom rendering engine and JavaScript VM. While experimental, this showcases the scaling potential of autonomous coding agents running continuously.
-
A comprehensive guide expanding from 10 to 25 practical tips for maximizing Claude Code productivity, including status line customization, workflow optimization, and best practices from nearly a year of daily use. The GitHub repo provides actionable insights for both new and experienced users.
-
Microsoft has officially paused internal Claude Code deployment following guidance from CEO Satya Nadella, directing employees to GitHub Copilot instead. Exceptions remain for "high-priority R&D" teams, which can still access Anthropic's API, highlighting the competitive dynamics in AI coding tools.
- Tried Claude Cowork last night, and it was a top 3 most exciting moments I've ever had with technology. r/ClaudeCode Score: 257
An enthusiastic report on Claude Cowork's multi-agent collaboration features. The user observed Cowork demonstrating better common sense than Claude Code in disagreements, catching errors that would have led down bad development paths. Small sample size but promising initial results.
-
GLM-4.7-Flash, the 30B MoE model gaining attention for its agentic capabilities, was released on Hugging Face. With a 99% upvote ratio and 219 comments, the post reflects significant community interest in accessible agentic models.
-
A comparison of workflow approaches between Google Antigravity and Claude Code + Epic Mode. The author found that Epic Mode's workflow discipline (structured planning, explicit checkpoints, less assumption-making) was more valuable than raw capability for complex tasks.
- So what's the truth behind "Claude Code is writing 99% of my code without needing correction"? r/ClaudeAI Score: 74
A critical examination of viral claims about Claude Code/Opus writing "95-99% of code without correction." The discussion explores the reality behind these claims, skill levels required, project types where this holds true, and healthy skepticism about uncritical hype.
-
A reflection on the meta-loop of AI development: software writing software, humans increasingly just pressing 'Y' on permissions, massive compute scaling for inference and training, and huge CoT parallelization. The post argues 2026 marks when these trends converge meaningfully.
- Is anyone else just absolutely astounded that we are actually living through this? r/ClaudeAI Score: 793
An enthusiastic reflection on coding in plain English with Claude Code. The author shares genuine amazement at bringing ideas to life without traditional programming skills—ideas that previously stayed as "maybe one day I could fundraise for that" concepts.
AI Signal - January 13, 2026
-
Anthropic launched Cowork, extending the agentic Claude Code workflow to non-technical tasks. Users can point Claude at a folder for autonomous file operations with planning, execution, and approval loops—essentially bringing vibecoding to general knowledge work. The feature is available as a research preview for Claude Max subscribers on macOS.
-
The creator of Linux publicly endorsed AI-assisted "vibe coding" for his non-kernel projects, conceding it produces better results than hand-coding for certain use cases. This represents a significant cultural shift—one of the most respected figures in open source acknowledging that LLM-assisted development can outperform traditional methods.
-
Tobi Lutke demonstrated how Claude built a custom HTML-based MRI viewer from raw USB data in a single prompt, replacing proprietary Windows software. The viewer includes clearer navigation and automated annotations—showcasing LLMs replacing expensive specialized software rather than just assisting with it.
-
A professional developer shares hard-won lessons from delegating personal projects entirely to AI: always run real E2E tests, maintain comprehensive docs, use git commits aggressively, never trust AI's test generation, and keep human-readable state tracking. The post emphasizes the gap between "AI writes code you could write" and "AI writes code you couldn't."
-
Community member shares a comprehensive skill.md template that turns Claude Code into a fully autonomous full-stack app builder. The skill analyzes requirements, selects tech stack, creates phased plans, and executes everything phase-by-phase with automatic commits and testing—no questions asked until completion.
-
Geoffrey Hinton describes how AI agents can share knowledge at unprecedented scales: 10,000 agents studying different topics can sync learnings instantly, with each agent gaining the knowledge of all 10,000. This parallelized learning represents a fundamental advantage over human knowledge transfer, which relies on slow communication bottlenecks.
-
Comprehensive weekly roundup of agentic AI developments: Claude Code 2.1.0 with 1096 commits (agent hooks, multilingual support), OpenAI launches Health and Jobs agents, Cursor agent reduces tokens by 47%, and several other framework updates. The post aggregates what would otherwise be scattered announcements.
-
A GPT-5.2-pro research agent achieved a new best-known spherical packing for n=11, N=432, verified against MIT's benchmark library. The agent escaped a numerically "jammed" configuration that had resisted prior optimization. The team is extending the framework to computational physics.
-
Users report Claude Code 2.1.5 defaulting to script execution instead of API calls despite explicit instructions, picking up already-completed tickets, and burning excessive tokens. Community recommends rolling back to 2.1.1 or 2.0.76. Some users unable to downgrade as Claude auto-updates back to 2.1.5.
-
Practitioner with experience since 2018 (including RPA work and Oxford AI masters) synthesizes lessons from dozens of implementations. The article covers the progression from deterministic RPA to modern agentic systems, reliability challenges, and practical deployment patterns across industries.
-
Discussion of Cowork's platform risk for startups building wrappers around LLM capabilities. The community debates whether computer use, browser use, and terminal use agents will commoditize entire categories of early-stage companies. Platform risk is identified as a major consideration before building AI tooling.
-
Screenshot reveals Anthropic began Cowork development in 2026 (this year), meaning they built the entire product in weeks or months using Claude to write its own code. This demonstrates both rapid development cycles and recursive self-improvement—AI building the tools that extend its own capabilities.
-
Developer building voice agents reports that TTS latency is significantly worse than advertised (~1-1.2s end-to-end) and most providers are prohibitively expensive. The discussion surfaces practical challenges in building conversational agents at production quality and cost.
-
Discussion thread gathering practical lessons from deploying AI agents in real workflows. The community surfaces the gap between "this should work" and "this works reliably"—covering error handling, state management, failure modes, and the importance of human oversight.
AI Signal - January 06, 2026
-
Claude Code successfully reverse-engineered Ring's undocumented API (Ring offers no public API) and built a native Mac app with AI guard features. The workflow combined voice input, manual API inspection, and iterative development. This demonstrates Claude Code handling complex real-world reverse engineering tasks end-to-end.
-
Boris Cherny revealed his surprisingly vanilla setup: runs 5 Claude instances in parallel in terminal plus 5-10 on web, uses system notifications for tab management, and frequently hands off sessions between local and web. Key insight: he doesn't heavily customize Claude Code, relying on out-of-box functionality with parallel workflows.
-
After Claude finishes coding, running "Do a git diff and pretend you're a senior dev who HATES this implementation" reliably surfaces edge cases and bugs that first-pass implementations miss. User reports this adversarial review technique works "too well" - revealing problems in nearly every initial Claude output.
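As a rough illustration, the adversarial review step could be scripted as below. Only the quoted instruction comes from the post; the helper names, the extra instruction sentence, and the idea of embedding the diff in the prompt are assumptions, and sending the prompt to a model is left out entirely.

```python
import subprocess

# The reviewer persona quoted in the post.
REVIEW_INSTRUCTION = (
    "Do a git diff and pretend you're a senior dev who HATES this implementation."
)


def current_diff(repo_dir: str = ".") -> str:
    """Return the uncommitted changes of a repository as text (uses the git CLI)."""
    result = subprocess.run(
        ["git", "diff"], cwd=repo_dir, capture_output=True, text=True, check=True
    )
    return result.stdout


def build_review_prompt(diff: str) -> str:
    """Combine the persona with a diff into one prompt (hypothetical layout)."""
    if not diff.strip():
        raise ValueError("nothing to review: the diff is empty")
    return (
        f"{REVIEW_INSTRUCTION}\n"
        "List every edge case, bug, and questionable design choice you can find.\n\n"
        f"{diff}"
    )
```

A typical call would be `build_review_prompt(current_diff())`, run right after the first implementation pass and before committing.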
-
Manus (acquired by Meta for $2B) solves agent context drift with 3 markdown files: task_plan.md for checkboxes, notes.md for research, and deliverable.md for output. The agent reads/writes these files instead of bloating context. Pattern open-sourced as Claude Code skill.
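The three-file pattern can be sketched in a few lines. The file names come from the summary above; the checkbox syntax, function names, and note layout are assumptions made for illustration, not the contents of the open-sourced skill.

```python
from pathlib import Path
from typing import List, Optional

TODO, DONE = "- [ ] ", "- [x] "


def init_workspace(root: Path, steps: List[str]) -> None:
    """Create the three files: a checkbox plan, empty notes, empty deliverable."""
    (root / "task_plan.md").write_text("".join(f"{TODO}{s}\n" for s in steps))
    (root / "notes.md").write_text("")
    (root / "deliverable.md").write_text("")


def next_step(root: Path) -> Optional[str]:
    """Re-read the plan from disk and return the first unchecked step, if any."""
    for line in (root / "task_plan.md").read_text().splitlines():
        if line.startswith(TODO):
            return line[len(TODO):]
    return None


def complete_step(root: Path, step: str, note: str) -> None:
    """Tick the step off in the plan and append the findings to notes.md."""
    plan_path = root / "task_plan.md"
    lines = plan_path.read_text().splitlines()
    for i, line in enumerate(lines):
        if line == TODO + step:
            lines[i] = DONE + step
            break
    plan_path.write_text("\n".join(lines) + "\n")
    with (root / "notes.md").open("a") as notes:
        notes.write(f"## {step}\n{note}\n\n")
```

Because the agent asks `next_step` before each action instead of relying on its conversation history, the plan on disk stays the source of truth even after the context window has been truncated or reset.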
-
Boston Dynamics and Google DeepMind announced a formal partnership to bring foundational AI intelligence to humanoid robots. It combines Boston Dynamics' hardware excellence with DeepMind's AI capabilities for next-generation robotics.
-
Designer distilled 8 years of product design experience into a Claude skill focused on dashboards, tool UIs, and data-heavy interfaces. Addresses the "purple gradient of doom" and generic AI-generated UI by encoding specific design principles and patterns.
-
Fully automated content system that analyzes a website, finds keyword gaps, generates articles with images, publishes to a CMS, and exchanges backlinks using triangle structures to avoid reciprocal-link penalties. Posts once per day to avoid spam detection. Three-month results demonstrate agentic SEO workflows.
-
Long-time user (since June 2025) reports hitting weekly limits for the first time despite using less than in other weeks. Multiple users confirm similar experiences. Suggests potential changes to rate limiting or usage calculation.
-
User on the 5x Max plan reports a dramatic change in usage consumption patterns. It previously took 2-3 messages to consume 1% with Thinking mode; now consumption spikes unpredictably. Suggests changes to underlying usage calculation or model behavior.
-
After 3 weeks building agents, user concludes they're "basically useless for any professional use." Issues: each model requires custom prompt styling matching training data (undocumented), same prompt produces different results across models, tools/functions work unpredictably, and agents drift from instructions over time.
-
The company behind PUBG deployed an internal AI system powered by Claude that handles requests like competitor analysis, code review, and export. The system proactively suggests tasks based on context (e.g., preparing client meeting summaries). 1,800+ employees use it daily.
AI Signal - January 02, 2026
- My wife left town, my dog is sedated, and Claude convinced me I'm a coding god. I built this visualizer in 24 hours. r/ClaudeAI Score: 1587
A powerful demonstration of what modern AI coding assistants enable: a non-expert building a sophisticated visualization tool in 24 hours. This showcases how Claude and similar tools are democratizing software development, allowing people to build complex applications that would have previously required extensive programming experience.
-
A complete retirement planning web application built from scratch using Claude, demonstrating the model's ability to handle complex financial calculations, data visualization, and user interface design. This represents the type of specialized vertical applications that can now be created by domain experts without traditional software development backgrounds.
-
Analysis of Anthropic's strategic holiday promotion offering 2x usage limits during low-demand periods. This demonstrates smart capacity management and effective community engagement through goodwill gestures.
-
User experience report showing how increased Claude usage capacity changed research workflows, with Claude displacing ChatGPT as the primary tool. Demonstrates the importance of usage limits in shaping user behavior and tool adoption.
-
Critical user feedback on Claude Opus 4.5 after extended use, noting recent degradation in code quality, frequent bugs, and context management issues. Important reality check on production use of AI coding assistants.
-
Deep reflection on intensive Claude Code usage from a founder who quit their job to build full-time. Discusses shipping code in unfamiliar languages, amplifying design thinking, and maintaining agency while leveraging AI assistance.
- How are you guys building apps with Claude? The longer and bigger my app gets it is constantly breaking things that were previously working. r/ClaudeAI Score: 137
Important discussion of challenges in using AI coding assistants for larger applications, with regression issues and context management problems. Highlights the gap between demo-quality code and production applications.
-
An enthusiastic user upgraded to the Claude Max plan based on Opus 4.5's performance, particularly highlighting reduced hallucinations and better understanding. Represents positive sentiment driving subscription upgrades.
-
Whimsical application allowing users to share messages via virtual bottles across oceans, demonstrating Claude's ability to interpret abstract prompts and create engaging user experiences. Shows the creative potential of AI coding assistants.
- I've been using ClaudeCode for 40+ hours a week for the last few months and wanted to share some commands I use r/ClaudeCode Score: 186
Community member sharing custom Claude Code commands developed through heavy production use, providing practical patterns for workflow automation. Valuable resource for others scaling their Claude Code usage.