- Let's Think Dot by Dot: Hidden Computation in Transformer Language Models
- A Survey on Self-Evolution of Large Language Models
- Microsoft Phi
- Predicting Emergent Abilities with Infinite Resolution Evaluation
- Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
- Feature Learning in Infinite-Width Neural Networks
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
- Compression Represents Intelligence Linearly
- Proximal Policy Optimization Algorithms (see the clipped-objective sketch after this list)
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model (see the loss sketch after this list)
- Iterative Reasoning Preference Optimization
- Advancing LLM Reasoning Generalists with Preference Trees (SFT, preference tuning)
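
As a quick reference for the PPO entry above, here is a minimal sketch of the paper's clipped surrogate objective. It assumes per-action log-probabilities and advantage estimates are already computed; the tensor names (`logp_new`, `logp_old`, `advantages`) and the default `clip_eps` are illustrative, not tied to any particular codebase.

```python
import torch

def ppo_clip_loss(logp_new: torch.Tensor,
                  logp_old: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    """Clipped surrogate objective from 'Proximal Policy Optimization
    Algorithms': L = E[min(r * A, clip(r, 1-eps, 1+eps) * A)], negated
    so it can be minimized."""
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (lower) bound: elementwise minimum, then mean over the batch.
    return -torch.min(unclipped, clipped).mean()
```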
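Likewise for the DPO entry: a minimal sketch of its loss, assuming summed per-sequence log-probabilities of the chosen and rejected responses under the policy and a frozen reference model are given; `beta = 0.1` is a common but illustrative choice. The implicit reward `beta * (log pi - log ref)` is what the "secretly a reward model" title refers to.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp: torch.Tensor,
             policy_rejected_logp: torch.Tensor,
             ref_chosen_logp: torch.Tensor,
             ref_rejected_logp: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss: -log sigmoid(beta * ((logpi_w - logref_w)
                                      - (logpi_l - logref_l)))."""
    # Implicit rewards of the policy relative to the reference model.
    chosen_rewards = policy_chosen_logp - ref_chosen_logp
    rejected_rewards = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_rewards - rejected_rewards)
    # -log sigmoid(x) == softplus(-x); numerically stable form.
    return F.softplus(-logits).mean()
```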