#architecture
- Mean Mode Screaming: Training Pathology Fix Enables 1000-Layer Diffusion Transformers · research
- Lance: 3B Unified Multimodal Model for Understanding, Generation, and Editing (314 HF upvotes) · ByteDance Research · research
- Echo-Infinity: Real-Time Infinite Video Generation via Learnable Memory Query · research
- Do Language Models Need Sleep? Offline Recurrence as Memory Consolidation for Improved Inference · Google / CMU · research
- Wan-Streamer v0.1: End-to-End Real-Time Interactive Foundation Model Under 550ms Latency · Wan-AI · research
- Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and a Fix · research
- Cola DLM: Continuous Latent Diffusion Language Model with Competitive Scaling · research