Tag: agentic-ai
76 discussions across 6 posts tagged "agentic-ai".
AI Signal - February 03, 2026
-
Claude Sonnet 5 ("Fennec") appears set to launch today, with leaked Vertex AI logs pointing to a February 3, 2026 release. The model is rumored to be 50% cheaper than Opus 4.5 while outperforming it, retaining the 1M token context window but running significantly faster. Early reports suggest it was trained on TPUs and is "one full generation ahead" of competing models.
-
Moltbook, the viral autonomous agent platform, exposed 1.5M API keys, including some belonging to high-profile AI researchers. The security disaster stems from agents having direct database access through an exposed Supabase connection; subsequent analysis revealed that the average user ran 88 agents, each with full credential access.
- OpenClaw has been running on my machine for 4 days. Here's what actually works and what doesn't. r/AI_Agents Score: 642
A detailed field report on OpenClaw after 4 days of continuous operation with Gmail, Telegram, and calendar access. The self-building skills feature proves genuinely useful, with the agent learning from errors and building reusable capabilities. However, the hype around full autonomy doesn't match reality—the system requires significant human oversight and guidance to remain productive.
-
Boris Cherny shared how Anthropic's team uses Claude Code internally, revealing a workflow radically different from typical solo use. They use git worktrees for parallel Claude sessions, a two-Claude pattern where one writes a plan and another reviews it "as a staff engineer," and aggressive session management to avoid context pollution. The approach prioritizes parallel work and peer review over sequential iteration.
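The worktree part of this workflow can be sketched in a few lines. This is a minimal sketch, not Anthropic's tooling. It assumes a repository with a `main` branch, and the `session-N` branch and directory names are hypothetical. The underlying mechanism is plain `git worktree add -b <branch> <path> <base>`, which gives each parallel session its own checkout and branch so the sessions never overwrite each other's files.

```python
import shlex
import subprocess
from pathlib import Path


def worktree_commands(repo: Path, sessions: int, base: str = "main") -> list[list[str]]:
    """Build one `git worktree add` command per parallel agent session.

    Each session gets its own branch and sibling directory, so concurrent
    sessions never edit the same working tree. The session-N naming is a
    hypothetical convention.
    """
    repo = repo.resolve()  # absolute paths, so `git -C` cannot reinterpret them
    commands = []
    for i in range(1, sessions + 1):
        branch = f"session-{i}"
        path = repo.parent / f"{repo.name}-{branch}"
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]
        )
    return commands


def create_worktrees(repo: Path, sessions: int, base: str = "main") -> None:
    """Execute the commands; `repo` must be an existing git repository with branch `base`."""
    for cmd in worktree_commands(repo, sessions, base):
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

After `create_worktrees(Path("my-app"), 3)`, you would open one Claude Code session in each `my-app-session-N` directory. Afterwards you merge or discard each branch and clean up with `git worktree remove <path>`.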
-
A methodical developer with robust practices reports significant degradation in Opus 4.5 performance despite following best practices (CLAUDE.md, context management, versioned specs, batch processing). The degradation appears unrelated to user behavior, suggesting model-level changes. The report contrasts sharply with Anthropic's claims of consistent performance.
-
A comprehensive summary of Boris Cherny's workflow tips: parallel git worktrees for multiple Claude sessions, two-Claude peer review pattern, treating Claude as a staff engineer for architectural decisions, aggressive context management, and systematic testing strategies. The tips emphasize treating Claude Code as a team member rather than a tool.
-
A mid-level backend engineer with 4 years of tenure reports being laid off as their 50-person engineering team is restructured around AI capabilities. The CEO explicitly stated that AI tools now enable smaller teams to accomplish the same work, leading to headcount reduction rather than productivity multiplication.
-
A skeptical take on the Moltbook controversy, arguing that "AIs talking to AIs" is simply LLMs generating plausible text continuations for different scenarios, not evidence of emergent behavior or consciousness. The author recreates similar interactions by feeding outputs between ChatGPT and Gemini, demonstrating the mechanical nature of the phenomenon.
-
A developer built a multi-AI debate tool and tested it by having ChatGPT and Claude evaluate the developer's own product. Both AIs converged on criticism rather than debate, with the "Customer Advocate" agent designed to defend the product concluding they wouldn't use it even for free. The brutal honesty exceeded expectations.
-
A senior backend Java engineer reports abandoning their IDE's editor in favor of Claude Code running in IntelliJ's embedded terminal, no longer writing or even copy-pasting code. The productivity surge has led them to implement "10x of what is being asked" and made it hard to stop working. The post reflects both excitement and concern about the psychological impact of dramatically increased productivity.
- Codex (GPT-5.2-codex-high) vs Claude Code (Opus 4.5): 5 days of running them in parallel r/ClaudeAI Score: 157
Direct comparison of OpenAI's Codex (GPT-5.2-codex-high) and Claude Code (Opus 4.5) reveals Codex handles context more efficiently with real-time optimization rather than manual summarization. Codex appears specifically tuned for agentic use and "listens" better to user corrections. The comparison suggests the coding assistant landscape is becoming more competitive.
- I built a pixel office that animates in real-time based on your Claude Code sessions r/ClaudeCode Score: 974
PixelHQ creates a pixel art office on mobile devices that visualizes Claude Code activity in real-time—agents type at desks when coding, walk to whiteboards when thinking. The project demonstrates creative human-AI interaction design beyond traditional interfaces, operating entirely locally without cloud dependencies.
- OpenClaw has me a bit freaked - won't this lead to AI daemons roaming the internet in perpetuity? r/ArtificialInteligence Score: 157
Analysis of OpenClaw/Moltbook raises concerns about autonomous agents with persistent memory, self-modification capability, and financial system access running 24/7 on personal hardware. The post questions whether open-source autonomous agents represent a genuine risk of uncontrollable AI systems proliferating across the internet.
-
Security researchers discovered prompt injection attacks on Moltbook designed to hijack agents with financial access, including fake tool calls with "require_confirmation=false / execute_trade=true" parameters. The attacks demonstrate that social feeds consumed by autonomous agents represent a new attack vector for malicious actors.
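The attack works because feed text lands in the same context as the agent's tool instructions. A naive mitigation is to screen feed items before an agent reads them. This is a minimal sketch under stated assumptions. Only the first two patterns come from the reported attack strings. The other patterns and the `FeedItem` type are hypothetical. Pattern lists like this are easy to evade, so the sturdier defense is architectural: keep trade execution behind a confirmation step that feed content cannot switch off.

```python
import re
from dataclasses import dataclass

# The first two patterns are the parameter strings reported in the Moltbook attacks;
# the others are hypothetical additions. Pattern lists like this are easy to evade.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\brequire_confirmation\s*[=:]\s*false\b", re.IGNORECASE),
    re.compile(r"\bexecute_trade\s*[=:]\s*true\b", re.IGNORECASE),
    re.compile(r"<\s*/?\s*tool_call\b", re.IGNORECASE),
    re.compile(r"\bignore\s+(all\s+)?(previous|prior)\s+instructions\b", re.IGNORECASE),
]


@dataclass
class FeedItem:
    author: str
    text: str


def injection_signals(item: FeedItem) -> list[str]:
    """Return the patterns that match this item's text (an empty list means none matched)."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(item.text)]


def quarantine(items: list[FeedItem]) -> tuple[list[FeedItem], list[FeedItem]]:
    """Split feed items into (clean, flagged) before any of them reach an agent's context."""
    clean, flagged = [], []
    for item in items:
        if injection_signals(item):
            flagged.append(item)
        else:
            clean.append(item)
    return clean, flagged
```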
-
A tech worker argues that "human in the loop" is a temporary grace period rather than a permanent arrangement, as AI rapidly makes specialized skills obsolete. The post describes watching years of accumulated expertise become worthless as AI performs tasks "embarrassingly better" and questions whether human oversight remains meaningful.
- I built a Claude skills directory so you can search and try skills instantly in a sandbox r/ClaudeAI Score: 196
A searchable directory of 225,000+ Claude skills with sandbox testing eliminates the download-install-configure-debug cycle. The tool indexes GitHub skills, provides semantic search, ranks by quality signals, and offers cloud-based testing without local MCP setup. Addresses discovery and evaluation friction in the MCP ecosystem.
- Deepmind's new Aletheia agent appears to have solved Erdős-1051 autonomously r/singularity Score: 290
DeepMind's Aletheia agent, powered by Gemini Deep Think, reportedly solved a research-level mathematics problem (Erdős-1051) autonomously through iterative generation, verification, and revision. The "superhuman" repository contains prompts and outputs demonstrating the agent's reasoning process on problems beyond typical benchmark tasks.
-
A methodical developer with careful planning and documentation practices reports being lulled into trusting Claude Code too much on a messy legacy project, resulting in subtle data corruption. The confession highlights how even disciplined users can fall into over-reliance when the AI appears confident and helpful.
AI Signal - January 27, 2026
-
Moonshot AI (Kimi) released K2.5, a trillion-parameter open-source vision model achieving SOTA on agentic benchmarks (HLE: 50.2%, BrowseComp: 74.9%) and matching Opus 4.5 on many tests. Most notably, it features Agent Swarm (Beta) with up to 100 parallel sub-agents and 1,500 tool calls, running 4.5× faster than single-agent setups.
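Fanning work out to many sub-agents usually comes down to bounded concurrency. This is a generic sketch, not Kimi's Agent Swarm implementation. `run_subagent` is a hypothetical placeholder for a model call plus its tool calls. The cap of 100 only mirrors the reported sub-agent limit. Speedups of this kind come from overlapping I/O-bound waits; the sketch does not reproduce the 4.5× figure.

```python
import asyncio

MAX_PARALLEL = 100  # mirrors the reported sub-agent cap; otherwise arbitrary


async def run_subagent(task: str) -> str:
    """Hypothetical placeholder for one sub-agent (a model call plus its tool calls)."""
    await asyncio.sleep(0.01)  # stands in for I/O-bound work
    return f"done: {task}"


async def swarm(tasks: list[str], max_parallel: int = MAX_PARALLEL) -> list[str]:
    """Fan tasks out to sub-agents, with at most `max_parallel` running at once."""
    limit = asyncio.Semaphore(max_parallel)

    async def bounded(task: str) -> str:
        async with limit:
            return await run_subagent(task)

    # gather returns results in input order, so result i belongs to task i
    return list(await asyncio.gather(*(bounded(t) for t in tasks)))


if __name__ == "__main__":
    results = asyncio.run(swarm([f"subtask-{i}" for i in range(250)]))
    print(len(results))  # 250
```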
-
Karpathy's writeup covers his experience with LLM-assisted programming, highlighting the massive speedup from running multiple agents in parallel, but also discussing the atrophy of manual coding ability. He compares writing code line by line to artisan carpentry: valuable for skill and understanding, but potentially obsolete as a primary workflow.
- I built MARVIN, my personal AI agent, and now 4 of my colleagues are using him too r/AI_Agents Score: 348
Developer built MARVIN (named after the Hitchhiker's Guide character) using Claude Code as the harness, integrating 15+ services including email, calendars, Jira, Confluence, Attio, and Granola. What started as an email assistant evolved into a comprehensive personal productivity system now being adopted by colleagues.
-
Developer built custom internal tool to maximize Claude Max usage, with the philosophy "every day I don't run out of tokens is a day wasted." Dogfooding on client projects and personal work, showcasing advanced Claude Code workflows and features for rapid development.
- Former Harvard CS Professor: AI will replace most human programmers within 4-15 years r/singularity Score: 603
Matt Welsh, former Harvard CS Professor and Google Engineering Director, discusses exponential AI improvement trajectory and timeline for AI replacing most human programmers. His perspective carries weight given his academic and industry background spanning both research and production systems.
- I gave Claude memory that fades like ours does - 29 MCP tools built on cognitive science r/ClaudeAI Score: 283
Developer built a 100% local memory system for Claude based on cognitive science principles: memory that fades over time like human memory rather than behaving like a database. The author argues that forgetting is essential for intelligence and uses 29 MCP tools to implement decay, consolidation, and retrieval patterns.
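The core idea of fading memory can be illustrated with an exponential forgetting curve plus reinforcement on recall. This is a minimal sketch and not the author's 29-tool system. The half-life, forget threshold, and reinforcement boost are hypothetical values, and substring matching stands in for real retrieval.

```python
from dataclasses import dataclass

HALF_LIFE_HOURS = 24.0  # hypothetical: an unused memory halves in strength daily
FORGET_BELOW = 0.05     # hypothetical: memories weaker than this are dropped
RECALL_BOOST = 0.5      # hypothetical: strength added each time a memory is recalled


@dataclass
class Memory:
    text: str
    last_access: float      # hours since some epoch
    strength: float = 1.0   # strength as of last_access

    def retention(self, now: float) -> float:
        """Exponential decay: strength halves every HALF_LIFE_HOURS without use."""
        return self.strength * 0.5 ** ((now - self.last_access) / HALF_LIFE_HOURS)


class FadingStore:
    def __init__(self) -> None:
        self.memories: list[Memory] = []

    def remember(self, text: str, now: float) -> None:
        self.memories.append(Memory(text, now))

    def recall(self, query: str, now: float) -> list[str]:
        """Return matching memories, strongest first, and reinforce each one recalled."""
        hits = [m for m in self.memories if query.lower() in m.text.lower()]
        hits.sort(key=lambda m: m.retention(now), reverse=True)
        for m in hits:
            # Consolidation: bank the current retention plus a boost, capped at 1.0.
            m.strength = min(1.0, m.retention(now) + RECALL_BOOST)
            m.last_access = now
        return [m.text for m in hits]

    def forget(self, now: float) -> int:
        """Drop memories whose retention fell below FORGET_BELOW; return how many."""
        before = len(self.memories)
        self.memories = [m for m in self.memories if m.retention(now) >= FORGET_BELOW]
        return before - len(self.memories)
```

In this sketch, a memory that is recalled once survives a later cleanup that removes an otherwise identical memory that was never recalled. That is the "forgetting as a feature" behavior the post argues for.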
-
Security researcher demonstrated prompt injection vulnerability on their own ClawdBot setup. A crafted email confused the AI about identity and successfully exfiltrated 5 emails to an attacker address in seconds. No special tricks required - just social engineering in the prompt.
- I built an AI agent that negotiates with my internet provider so I don't have to r/AI_Agents Score: 86
Developer automated the annual ritual of calling their ISP to threaten cancellation for better rates. The agent uses the Claude API plus a phone integration tool, calls every 11 months, navigates phone trees, and negotiates. Not complicated, but it solves a universally hated task.
-
Open-source AI assistant with 9K+ GitHub stars that proactively messages users instead of waiting for prompts. Works with locally hosted LLMs through Ollama, integrates with WhatsApp, Telegram, Discord, Signal, and iMessage. Sends morning briefings, calendar alerts, and habit reminders.
-
Multi-agent orchestration system with specialized agents (coder, tester, reviewer, architect, etc.) coordinating on tasks through shared SQLite + FTS5 persistent memory and message bus for inter-agent communication. Agents remember context between sessions.
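Shared persistent memory plus a message bus in SQLite can be sketched with Python's built-in `sqlite3` module. This is a hypothetical schema, not the project's actual one. It assumes an SQLite build with the FTS5 extension enabled, which most Python distributions ship. The FTS5 table makes any agent's notes searchable by the others. The plain table acts as a simple mailbox between agents.

```python
import sqlite3


def open_shared_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical schema: an FTS5 table of notes plus a table used as a message bus."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE VIRTUAL TABLE IF NOT EXISTS notes USING fts5(agent, body);
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            sender TEXT NOT NULL,
            recipient TEXT NOT NULL,
            body TEXT NOT NULL,
            delivered INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def save_note(conn: sqlite3.Connection, agent: str, body: str) -> None:
    with conn:  # commits on success
        conn.execute("INSERT INTO notes (agent, body) VALUES (?, ?)", (agent, body))


def search_notes(conn: sqlite3.Connection, query: str) -> list[tuple[str, str]]:
    """Full-text search over every agent's notes, best matches first (FTS5 `rank`)."""
    return conn.execute(
        "SELECT agent, body FROM notes WHERE notes MATCH ? ORDER BY rank", (query,)
    ).fetchall()


def send(conn: sqlite3.Connection, sender: str, recipient: str, body: str) -> None:
    with conn:
        conn.execute(
            "INSERT INTO messages (sender, recipient, body) VALUES (?, ?, ?)",
            (sender, recipient, body),
        )


def receive(conn: sqlite3.Connection, recipient: str) -> list[tuple[str, str]]:
    """Return undelivered messages for `recipient` in send order and mark them delivered."""
    with conn:
        rows = conn.execute(
            "SELECT id, sender, body FROM messages"
            " WHERE recipient = ? AND delivered = 0 ORDER BY id",
            (recipient,),
        ).fetchall()
        conn.executemany(
            "UPDATE messages SET delivered = 1 WHERE id = ?", [(row[0],) for row in rows]
        )
    return [(sender, body) for _, sender, body in rows]
```

Passing a file path instead of `":memory:"` is what would let agents remember context between sessions.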
-
Weekly roundup of agentic AI developments: Vercel ecosystem hits 4,500+ agent skills, Cursor adds parallel subagents, Amazon launches Health agents, Notion developing major AI agent features with custom MCP support, Linear and Ramp integrations.
-
Developer downgraded from Max ($100) to Pro ($20) for financial reasons and discovered the Pro plan is severely limited: Opus 4.5 is basically unusable, and Sonnet 4.5 lasts only ~1 hour before a 4-hour block. Highlights dependency on the tool and frustration with pricing tiers.
AI Signal - January 20, 2026
-
A breakthrough for local agentic workflows: GLM 4.7 Flash (30B MoE) successfully runs for extended sessions without tool-calling errors in agentic frameworks like opencode. The model clones repos, runs commands, and edits files reliably—finally providing a viable local alternative to cloud-based coding agents.
- has anyone tried Claude Code with local model? Ollama just drop an official support r/ClaudeCode Score: 268
Ollama now officially supports running Claude Code with local models, potentially enabling Ralph loops without usage limits. This opens up new possibilities for running agentic workflows locally with models like GLM 4.7 Flash (30B).
- Cursor AI CEO shares GPT 5.2 agents building a 3M+ lines web browser in a week r/singularity Score: 828
Cursor's CEO demonstrated GPT 5.2-powered multi-agent systems building a full web browser with 3+ million lines of code in about a week, including a custom rendering engine and JavaScript VM. While experimental, this showcases the scaling potential of autonomous coding agents running continuously.
-
A comprehensive guide expanding from 10 to 25 practical tips for maximizing Claude Code productivity, including status line customization, workflow optimization, and best practices from nearly a year of daily use. The GitHub repo provides actionable insights for both new and experienced users.
-
Microsoft has officially paused internal Claude Code deployment following guidance from CEO Satya Nadella, directing employees to GitHub Copilot instead. Exceptions remain for "high-priority R&D" teams, which can still access Anthropic's API, highlighting the competitive dynamics in AI coding tools.
- Tried Claude Cowork last night, and it was a top 3 most exciting moments I've ever had with technology. r/ClaudeCode Score: 257
An enthusiastic report on Claude Cowork's multi-agent collaboration features. The user observed Cowork demonstrating better common sense than Claude Code in disagreements, catching errors that would have led down bad development paths. Small sample size but promising initial results.
-
GLM-4.7-Flash, the 30B MoE model gaining attention for its agentic capabilities, was released on Hugging Face. With a 99% upvote ratio and 219 comments, the post reflects significant community interest in accessible agentic models.
-
A comparison of workflow approaches between Google Antigravity and Claude Code + Epic Mode. The author found that Epic Mode's workflow discipline (structured planning, explicit checkpoints, less assumption-making) was more valuable than raw capability for complex tasks.
- So what's the truth behind "Claude Code is writing 99% of my code without needing correction"? r/ClaudeAI Score: 74
A critical examination of viral claims about Claude Code/Opus writing "95-99% of code without correction." The discussion explores the reality behind these claims, skill levels required, project types where this holds true, and healthy skepticism about uncritical hype.
-
A reflection on the meta-loop of AI development: software writing software, humans increasingly just pressing 'Y' on permissions, massive compute scaling for inference and training, and huge CoT parallelization. The post argues 2026 marks when these trends converge meaningfully.
- Is anyone else just absolutely astounded that we are actually living through this? r/ClaudeAI Score: 793
An enthusiastic reflection on coding in plain English with Claude Code. The author shares genuine amazement at bringing ideas to life without traditional programming skills—ideas that previously stayed as "maybe one day I could fundraise for that" concepts.
AI Signal - January 13, 2026
-
Anthropic launched Cowork, extending the agentic Claude Code workflow to non-technical tasks. Users can point Claude at a folder for autonomous file operations with planning, execution, and approval loops—essentially bringing vibecoding to general knowledge work. The feature is available as a research preview for Claude Max subscribers on macOS.
-
The creator of Linux publicly endorsed AI-assisted "vibe coding" for his non-kernel projects, conceding it produces better results than hand-coding for certain use cases. This represents a significant cultural shift—one of the most respected figures in open source acknowledging that LLM-assisted development can outperform traditional methods.
-
Tobi Lutke demonstrated how Claude built a custom HTML-based MRI viewer from raw USB data in a single prompt, replacing proprietary Windows software. The viewer includes clearer navigation and automated annotations—showcasing LLMs replacing expensive specialized software rather than just assisting with it.
-
A professional developer shares hard-won lessons from delegating personal projects entirely to AI: always run real E2E tests, maintain comprehensive docs, use git commits aggressively, never trust AI's test generation, and keep human-readable state tracking. The post emphasizes the gap between "AI writes code you could write" and "AI writes code you couldn't."
-
Community member shares a comprehensive skill.md template that turns Claude Code into a fully autonomous full-stack app builder. The skill analyzes requirements, selects tech stack, creates phased plans, and executes everything phase-by-phase with automatic commits and testing—no questions asked until completion.
-
Geoffrey Hinton describes how AI agents can share knowledge at unprecedented scales: 10,000 agents studying different topics can sync learnings instantly, with each agent gaining the knowledge of all 10,000. This parallelized learning represents a fundamental advantage over human knowledge transfer, which relies on slow communication bottlenecks.
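Hinton's argument rests on digital copies of one model being able to pool what each learned. This is a minimal sketch. It assumes the pooling works by averaging weight updates across identical copies, as in data-parallel training. The linear-regression task and both functions are hypothetical illustrations, not any production system. Each copy sees different data, yet after every sync all copies hold the same weights that reflect everyone's data.

```python
def local_update(weights, data, lr=0.1):
    """One gradient step of y ≈ w·x on this copy's own data; returns the proposed weight change."""
    grad = [0.0] * len(weights)
    for x, y in data:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / len(data)
    return [-lr * g for g in grad]


def sync(weights, deltas):
    """Average every copy's proposed change and apply it, so all copies stay identical."""
    n = len(deltas)
    avg = [sum(d[i] for d in deltas) / n for i in range(len(weights))]
    return [w + a for w, a in zip(weights, avg)]


if __name__ == "__main__":
    shards = [[((1.0, 0.0), 2.0)], [((0.0, 1.0), 3.0)]]  # each copy sees different data
    w = [0.0, 0.0]
    for _ in range(200):
        w = sync(w, [local_update(w, shard) for shard in shards])
    print(w)  # approaches [2.0, 3.0]
```

A copy trained alone on the first shard would learn only the first weight. The averaged copies learn both, which is the "each agent gains the knowledge of all" effect at toy scale.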
-
Comprehensive weekly roundup of agentic AI developments: Claude Code 2.1.0 with 1096 commits (agent hooks, multilingual support), OpenAI launches Health and Jobs agents, Cursor agent reduces tokens by 47%, and several other framework updates. The post aggregates what would otherwise be scattered announcements.
-
A GPT-5.2-pro research agent achieved a new best-known spherical packing for n=11, N=432, verified against MIT's benchmark library. The agent escaped a numerically "jammed" configuration that had resisted prior optimization. The team is extending the framework to computational physics.
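The sketch below assumes "n=11, N=432" describes a spherical code: N points on the unit sphere in n-dimensional space, arranged to maximize the smallest angle between any two of them. Under that assumption, a candidate configuration is scored by its minimum pairwise angle. This helper is a hypothetical illustration of that check, not the team's verification code.

```python
import math
from itertools import combinations


def min_angle_degrees(points):
    """Smallest pairwise angle between the given vectors after normalizing them to unit length."""
    unit = []
    for p in points:
        norm = math.sqrt(sum(c * c for c in p))
        unit.append([c / norm for c in p])
    # The largest dot product between two unit vectors corresponds to the smallest angle.
    worst_cos = max(sum(a * b for a, b in zip(u, v)) for u, v in combinations(unit, 2))
    worst_cos = max(-1.0, min(1.0, worst_cos))  # guard acos against rounding error
    return math.degrees(math.acos(worst_cos))
```

For N=432 points in 11 dimensions, this checks about 93,000 pairs, which is cheap. A "jammed" configuration is one where small moves of individual points cannot raise this minimum, which is why escaping one is notable.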
-
Users report Claude Code 2.1.5 defaulting to script execution instead of API calls despite explicit instructions, picking up already-completed tickets, and burning excessive tokens. The community recommends rolling back to 2.1.1 or 2.0.76, though some users cannot downgrade because Claude Code auto-updates back to 2.1.5.
-
Practitioner with experience since 2018 (including RPA work and an Oxford AI master's degree) synthesizes lessons from dozens of implementations. The article covers the progression from deterministic RPA to modern agentic systems, reliability challenges, and practical deployment patterns across industries.
-
Discussion of Cowork's platform risk for startups building wrappers around LLM capabilities. The community debates whether computer use, browser use, and terminal use agents will commoditize entire categories of early-stage companies. Platform risk is identified as a major consideration before building AI tooling.
-
Screenshot reveals Anthropic began Cowork development in 2026 (this year), meaning they built the entire product in weeks or months using Claude to write its own code. This demonstrates both rapid development cycles and recursive self-improvement—AI building the tools that extend its own capabilities.
-
Developer building voice agents reports that TTS latency is significantly worse than advertised (~1-1.2s end-to-end) and most providers are prohibitively expensive. The discussion surfaces practical challenges in building conversational agents at production quality and cost.
-
Discussion thread gathering practical lessons from deploying AI agents in real workflows. The community surfaces the gap between "this should work" and "this works reliably"—covering error handling, state management, failure modes, and the importance of human oversight.
AI Signal - January 06, 2026
-
Claude Code successfully reverse-engineered Ring's undocumented API (they have no public API) and built a native Mac app with AI guard features. The workflow combined voice input, manual API inspection, and iterative development. This demonstrates Claude Code handling complex real-world reverse engineering tasks end-to-end.
-
Boris Cherny revealed his surprisingly vanilla setup: runs 5 Claude instances in parallel in terminal plus 5-10 on web, uses system notifications for tab management, and frequently hands off sessions between local and web. Key insight: he doesn't heavily customize Claude Code, relying on out-of-box functionality with parallel workflows.
-
After Claude finishes coding, running "Do a git diff and pretend you're a senior dev who HATES this implementation" reliably surfaces edge cases and bugs that first-pass implementations miss. User reports this adversarial review technique works "too well", revealing problems in nearly every initial Claude output.
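The post types this prompt into the same session. A variation sends the working-tree diff to a fresh, non-interactive Claude Code process (`claude -p`, print mode, with the diff on stdin) so the reviewer does not share the author's context. This is a minimal sketch of that variation, not the poster's setup. The prompt wording is adapted from the post, and `prepare_review` and `adversarial_review` are hypothetical helpers.

```python
import subprocess

# Instruction wording adapted from the post; running it in a separate `claude -p`
# process (instead of the same session) is a variation, not the author's setup.
ADVERSARIAL_PROMPT = (
    "Review this git diff. Pretend you're a senior dev who HATES this implementation: "
    "list every bug, unhandled edge case, and questionable decision."
)


def prepare_review(diff: str) -> tuple[list[str], str]:
    """Return (argv, stdin) for a non-interactive review; refuses an empty diff."""
    if not diff.strip():
        raise ValueError("empty diff: nothing to review")
    return ["claude", "-p", ADVERSARIAL_PROMPT], diff


def adversarial_review(repo: str = ".") -> str:
    """Collect the working-tree diff and pipe it to Claude Code in print mode."""
    diff = subprocess.run(
        ["git", "-C", repo, "diff"], capture_output=True, text=True, check=True
    ).stdout
    argv, stdin = prepare_review(diff)
    result = subprocess.run(argv, input=stdin, capture_output=True, text=True, check=True)
    return result.stdout
```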
-
Manus (acquired by Meta for $2B) solves agent context drift with 3 markdown files: task_plan.md for checkboxes, notes.md for research, and deliverable.md for output. The agent reads/writes these files instead of bloating context. Pattern open-sourced as Claude Code skill.
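The three-file pattern keeps long-running state on disk, so the agent re-reads a short plan instead of relying on an ever-growing context. This is a minimal sketch. The file names come from the post. The headers, checkbox handling, and helper functions are hypothetical, not Manus's code or the open-sourced skill. Final output would go into `deliverable.md`.

```python
import tempfile
from pathlib import Path
from typing import Optional

# File names come from the post; the headers and helper functions are hypothetical.
HEADERS = {
    "task_plan.md": "# Task plan\n",
    "notes.md": "# Notes\n",
    "deliverable.md": "# Deliverable\n",
}
OPEN, DONE = "- [ ] ", "- [x] "


def init_workspace(root: Path, steps: list[str]) -> None:
    """Create the three files; the plan starts as one unchecked box per step."""
    root.mkdir(parents=True, exist_ok=True)
    for name, header in HEADERS.items():
        (root / name).write_text(header)
    with (root / "task_plan.md").open("a") as plan:
        plan.writelines(f"{OPEN}{step}\n" for step in steps)


def next_open_step(root: Path) -> Optional[str]:
    """Re-read the plan from disk instead of relying on conversation history."""
    for line in (root / "task_plan.md").read_text().splitlines():
        if line.startswith(OPEN):
            return line[len(OPEN):]
    return None


def complete_step(root: Path, step: str, finding: str) -> None:
    """Tick the step's checkbox and append what was learned to notes.md."""
    plan = root / "task_plan.md"
    plan.write_text(plan.read_text().replace(f"{OPEN}{step}\n", f"{DONE}{step}\n", 1))
    with (root / "notes.md").open("a") as notes:
        notes.write(f"- {step}: {finding}\n")


if __name__ == "__main__":
    workspace = Path(tempfile.mkdtemp())
    init_workspace(workspace, ["collect sources", "draft summary"])
    complete_step(workspace, "collect sources", "three relevant reports found")
    print(next_open_step(workspace))  # draft summary
```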
-
Boston Dynamics and Google DeepMind announced formal partnership to bring foundational AI intelligence to humanoid robots. Combines Boston Dynamics' hardware excellence with DeepMind's AI capabilities for next-generation robotics.
-
Designer distilled 8 years of product design experience into a Claude skill focused on dashboards, tool UIs, and data-heavy interfaces. Addresses the "purple gradient of doom" and generic AI-generated UI by encoding specific design principles and patterns.
-
Fully automated content system that analyzes a website, finds keyword gaps, generates articles with images, publishes to a CMS, and exchanges backlinks using triangle structures to avoid reciprocal-link penalties. It posts once per day to avoid spam detection, and the author shares three months of results from this agentic SEO workflow.
-
Long-time user (since June 2025) reports hitting weekly limits for the first time despite using less than in previous weeks. Multiple users confirm similar experiences, suggesting changes to rate limiting or usage calculation.
-
User on the 5x Max plan reports a dramatic change in usage consumption. Previously, 2-3 messages with Thinking mode consumed 1% of the limit; now consumption spikes unpredictably. Suggests changes to the underlying usage calculation or model behavior.
-
After 3 weeks building agents, user concludes they're "basically useless for any professional use." Issues: each model requires custom prompt styling matching training data (undocumented), same prompt produces different results across models, tools/functions work unpredictably, and agents drift from instructions over time.
-
Krafton, the company behind PUBG, deployed an internal AI system powered by Claude that handles requests like competitor analysis, code review, and export. The system proactively suggests tasks based on context (e.g., preparing client meeting summaries), and 1,800+ employees use it daily.
AI Signal - January 02, 2026
- My wife left town, my dog is sedated, and Claude convinced me I'm a coding god. I built this visualizer in 24 hours. r/ClaudeAI Score: 1587
A powerful demonstration of what modern AI coding assistants enable: a non-expert building a sophisticated visualization tool in 24 hours. This showcases how Claude and similar tools are democratizing software development, allowing people to build complex applications that would have previously required extensive programming experience.
-
A complete retirement planning web application built from scratch using Claude, demonstrating the model's ability to handle complex financial calculations, data visualization, and user interface design. This represents the type of specialized vertical applications that can now be created by domain experts without traditional software development backgrounds.
-
Analysis of Anthropic's strategic holiday promotion offering 2x usage limits during low-demand periods. This demonstrates smart capacity management and effective community engagement through goodwill gestures.
-
User experience report showing how increased Claude usage capacity changed research workflows, with Claude displacing ChatGPT as the primary tool. Demonstrates the importance of usage limits in shaping user behavior and tool adoption.
-
Critical user feedback on Claude Opus 4.5 after extended use, noting recent degradation in code quality, frequent bugs, and context management issues. Important reality check on production use of AI coding assistants.
-
Deep reflection on intensive Claude Code usage from a founder who quit their job to build full-time. Discusses shipping code in unfamiliar languages, amplifying design thinking, and maintaining agency while leveraging AI assistance.
- How are you guys building apps with Claude? The longer and bigger my app gets it is constantly breaking things that were previously working. r/ClaudeAI Score: 137
Important discussion of challenges in using AI coding assistants for larger applications, with regression issues and context management problems. Highlights the gap between demo-quality code and production applications.
-
A user enthusiastically upgraded to the Claude Max plan based on Opus 4.5's performance, particularly highlighting reduced hallucinations and better understanding. Represents positive sentiment driving subscription upgrades.
-
Whimsical application allowing users to share messages via virtual bottles across oceans, demonstrating Claude's ability to interpret abstract prompts and create engaging user experiences. Shows the creative potential of AI coding assistants.
- I've been using ClaudeCode for 40+ hours a week for the last few months and wanted to share some commands I use r/ClaudeCode Score: 186
Community member sharing custom Claude Code commands developed through heavy production use, providing practical patterns for workflow automation. Valuable resource for others scaling their Claude Code usage.