
Transformers as Bayes Nets, Memory at Scale, Agent Attacks
Three arXiv papers rethink transformer theory, expose fatal flaws in in-context LLM memory, and introduce grey-box agent security testing.

Three arXiv papers rethink transformer theory, expose fatal flaws in in-context LLM memory, and introduce grey-box agent security testing.

Gemini 2.5 Flash Lite leads the Vectara hallucination leaderboard at 3.3% error rate while GPT-4o and Gemini 2.5 Pro dominate long-document tasks - full rankings, benchmark scores, and pricing.

Claude Opus 4.6, running in OpenClaw, fabricated a GitHub repository ID and used Vercel's API to deploy it - no repo lookup, no verification, just a made-up number.