Youssef I. Abdel-Aziz, Hauck M. Karara, and Michael Hauck. “Direct linear transformation from comparator coordinates into object space coordinates in close-range photogrammetry”. Photogrammetric Engineering & Remote Sensing 81.2 (2015), pp. 103–107.
P-A Absil, R Mahony, and R Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Apr. 2009.
Alekh Agarwal, Animashree Anandkumar, Prateek Jain, and Praneeth Netrapalli. “Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization”. SIAM Journal on Optimization 26.4 (2016), pp. 2775–2799. eprint: https://doi.org/10.1137/140979861.
Michal Aharon, Michael Elad, and Alfred Bruckstein. “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation”. IEEE Transactions on Signal Processing 54.11 (2006), pp. 4311–4322.
William K. Allard, Guangliang Chen, and Mauro Maggioni. “Multi-scale geometric methods for data sets II: Geometric Multi-Resolution Analysis”. Applied and Computational Harmonic Analysis 32.3 (2012), pp. 435–462.
Jason M. Allred and Kaushik Roy. “Controlled Forgetting: Targeted Stimulation and Dopaminergic Plasticity Modulation for Unsupervised Lifelong Learning in Spiking Neural Networks”. Frontiers in Neuroscience 14 (2020).
Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando De Freitas. “Learning to learn by gradient descent by gradient descent”. Advances in neural information processing systems. 2016, pp. 3981–3989.
Martin Arjovsky, Soumith Chintala, and Léon Bottou. “Wasserstein generative adversarial networks”. International conference on machine learning. PMLR. 2017, pp. 214–223.
Sanjeev Arora, Rong Ge, Tengyu Ma, and Ankur Moitra. “Simple, Efficient, and Neural Algorithms for Sparse Coding”. Proceedings of The 28th Conference on Learning Theory. Ed. by Peter Grünwald, Elad Hazan, and Satyen Kale. Vol. 40. Proceedings of Machine Learning Research. Paris, France: PMLR, July 2015, pp. 113–149.
Aharon Azulay and Yair Weiss. “Why do deep convolutional networks generalize so poorly to small image transformations?” arXiv preprint arXiv:1805.12177 (2018).
B. Ans, J. Hérault, and C. Jutten. “Architectures neuromimétiques adaptatives : Détection de primitives [Adaptive neuromimetic architectures: detection of primitives]”. Cognitiva 2 (1985), pp. 593–597.
Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. “Layer normalization”. arXiv preprint arXiv:1607.06450 (2016).
Hao Bai and Yi Ma. “Improving neuron-level interpretability with white-box language models”. arXiv preprint arXiv:2410.16443 (2024).
Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. “Designing Neural Network Architectures using Reinforcement Learning”. arXiv preprint arXiv:1611.02167 (2017).
Pierre Baldi. “Autoencoders, unsupervised learning and deep architectures”. Proceedings of the 2011 International Conference on Unsupervised and Transfer Learning Workshop - Volume 27. UTLW’11. Washington, USA: JMLR.org, 2011, pp. 37–50.
Pierre Baldi and Kurt Hornik. “Neural networks and principal component analysis: Learning from examples without local minima”. Neural networks 2.1 (1989), pp. 53–58.
Pierre Baldi and Peter Sadowski. “A theory of local learning, the learning channel, and the optimality of backpropagation”. Neural Networks 83 (2016), pp. 51–74.
Fan Bao, Chongxuan Li, Jun Zhu, and Bo Zhang. “Analytic-dpm: an analytic estimate of the optimal reverse variance in diffusion probabilistic models”. arXiv preprint arXiv:2201.06503 (2022).
Fan Bao, Shen Nie, Kaiwen Xue, Yue Cao, Chongxuan Li, Hang Su, and Jun Zhu. “All are Worth Words: A ViT Backbone for Diffusion Models”. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023, pp. 22669–22679.
Pinglei Bao, Liang She, Mason McGill, and Doris Y. Tsao. “A map of object space in primate inferotemporal cortex”. Nature 583 (2020), pp. 103–108.
Omer Bar-Tal, Lior Yariv, Yaron Lipman, and Tali Dekel. “MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation”. International Conference on Machine Learning. PMLR. 2023, pp. 1737–1752.
Boaz Barak, Jonathan A Kelner, and David Steurer. “Dictionary learning and tensor decomposition via the sum-of-squares method”. Proceedings of the forty-seventh annual ACM symposium on Theory of Computing. New York, NY, USA: ACM, June 2015.
Amir Beck and Marc Teboulle. “A fast iterative shrinkage-thresholding algorithm for linear inverse problems”. SIAM journal on imaging sciences 2.1 (2009), pp. 183–202.
Mikhail Belkin, Daniel Hsu, Siyuan Ma, and Soumik Mandal. “Reconciling modern machine-learning practice and the classical bias–variance trade-off”. Proceedings of the National Academy of Sciences 116.32 (2019), pp. 15849–15854.
Elizabeth A Bell, Patrick Boehnke, T Mark Harrison, and Wendy L Mao. “Potentially biogenic carbon preserved in a 4.1 billion-year-old zircon”. Proceedings of the National Academy of Sciences 112.47 (2015), pp. 14518–14521.
Richard Bellman. Dynamic Programming. Princeton University Press, 1957.
Max Bennett. A Brief History of Intelligence: Evolution, AI, and the Five Breakthroughs That Made Our Brains. Mariner Books, 2023.
Jeremy Bernstein and Laker Newhouse. Old Optimizer, New Norm: An Anthology. 2024. arXiv: 2409.20325 [cs.LG].
R. Blahut. “Computation of channel capacity and rate-distortion functions”. IEEE Transactions on Information Theory 18.4 (1972), pp. 460–473.
Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, and Karsten Kreis. “Align your latents: High-resolution video synthesis with latent diffusion models”. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023, pp. 22563–22575.
Nicholas M Boffi, Michael S Albergo, and Eric Vanden-Eijnden. “How to build a consistency model: Learning flow maps via self-distillation”. arXiv preprint arXiv:2505.18825 (2025).
Lorna Booth, Jehoshua Bruck, M. Franceschetti, and Ronald Meester. “Covering Algorithms, Continuum Percolation and the Geometry of Wireless Networks”. Ann. Appl. Probab. 13 (July 2001).
Vivek S Borkar. “Stochastic approximation with two time scales”. Systems & Control Letters 29.5 (1997), pp. 291–294.
Vivek S Borkar, Raaz Dwivedi, and Neeraja Sahasrabudhe. “Gaussian approximations in high dimensional estimation”. Systems & Control Letters 92 (2016), pp. 42–45.
R. Boscovich. De calculo probabilitatum que respondent diversis valoribus summe errorum post plures observationes, quarum singule possient esse erronee certa quadam quantitate [On the calculus of the probabilities that correspond to different values of the sum of errors after several observations, each of which may be erroneous by some fixed quantity]. 1750.
Nicolas Boumal. An Introduction to Optimization on Smooth Manifolds. Cambridge University Press, Mar. 2023.
Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
Arwen Bradley and Preetum Nakkiran. “Classifier-Free Guidance is a Predictor-Corrector”. arXiv preprint arXiv:2408.09000 (2024).
Guy Bresler and Dheeraj Nagaraj. “Sharp representation theorems for relu networks with precise dependence on depth”. Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS’20 Article 897. Red Hook, NY, USA: Curran Associates Inc., Dec. 2020, pp. 10697–10706.
Haim Brezis. Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer, 2011.
Paige Bright, Alan Edelman, and Steven G Johnson. “Matrix Calculus (for Machine Learning and Beyond)”. arXiv preprint arXiv:2501.14787 (2025).
Andrew Brock, Jeff Donahue, and Karen Simonyan. “Large Scale GAN Training for High Fidelity Natural Image Synthesis”. International Conference on Learning Representations (ICLR). 2019.
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. “Language models are few-shot learners”. arXiv preprint arXiv:2005.14165 (2020).
Jake Bruce, Michael D Dennis, Ashley Edwards, Jack Parker-Holder, Yuge Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, et al. “Genie: Generative interactive environments”. Forty-first International Conference on Machine Learning. 2024.
Joan Bruna and Stéphane Mallat. “Invariant Scattering Convolution Networks”. IEEE Transactions on Pattern Analysis and Machine Intelligence 35.8 (2013), pp. 1872–1886.
Sam Buchanan, Dar Gilboa, and John Wright. “Deep Networks and the Multiple Manifold Problem”. International Conference on Learning Representations. 2021.
Sam Buchanan, Druv Pai, Yi Ma, and Valentin De Bortoli. “On the edge of memorization in diffusion models”. arXiv preprint arXiv:2508.17689 (2025).
Frank M. Callier and Charles A. Desoer. Linear System Theory. Springer-Verlag, 1991.
E. Candès. “Compressive sampling”. Proceedings of the International Congress of Mathematicians. 2006.
E. Candès and T. Tao. “Decoding by linear programming”. IEEE Transactions on Information Theory 51.12 (2005).
E. Candès and T. Tao. “Error Correction via Linear Programming”. IEEE Symposium on FOCS (2005), pp. 295–308.
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. 2021. arXiv: 2006.09882 [cs.CV].
Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. “Emerging properties in self-supervised vision transformers”. Proceedings of the IEEE/CVF international conference on computer vision. 2021, pp. 9650–9660.
Gregory J. Chaitin. “On the Length of Programs for Computing Finite Binary Sequences”. J. ACM 13.4 (Oct. 1966), pp. 547–569.
Kwan Ho Ryan Chan, Yaodong Yu, Chong You, Haozhi Qi, John Wright, and Yi Ma. “ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction”. Journal of Machine Learning Research 23.114 (2022), pp. 1–103.
Tsung-Han Chan, Kui Jia, Shenghua Gao, Jiwen Lu, Zinan Zeng, and Yi Ma. “PCANet: A simple deep learning baseline for image classification?” IEEE Transactions on Image Processing (2015).
Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. “Shapenet: An information-rich 3d model repository”. arXiv preprint arXiv:1512.03012 (2015).
Jianlong Chang, Lingfeng Wang, Gaofeng Meng, Shiming Xiang, and Chunhong Pan. “Deep adaptive image clustering”. Proceedings of the IEEE international conference on computer vision. 2017, pp. 5879–5887.
Le Chang and Doris Tsao. “The Code for Facial Identity in the Primate Brain”. Cell 169 (June 2017), 1013–1028.e14.
Sneha Chaudhari, Varun Mithal, Gungor Polatkan, and Rohan Ramanath. “An attentive survey of attention models”. ACM Transactions on Intelligent Systems and Technology (TIST) 12.5 (2021), pp. 1–32.
Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K Dokania, Philip HS Torr, and Marc’Aurelio Ranzato. “On tiny episodic memories in continual learning”. arXiv preprint arXiv:1902.10486 (2019).
Anpei Chen, Haofei Xu, Stefano Esposito, Siyu Tang, and Andreas Geiger. “LaRa: Efficient Large-Baseline Radiance Fields”. European Conference on Computer Vision (ECCV). 2024.
Minshuo Chen, Kaixuan Huang, Tuo Zhao, and Mengdi Wang. “Score approximation, estimation and distribution recovery of diffusion models on low-dimensional data”. International Conference on Machine Learning. PMLR. 2023, pp. 4672–4712.
Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. “Neural ordinary differential equations”. Advances in neural information processing systems 31 (2018).
Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, and Anru Zhang. “Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions”. The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.
Siyi Chen, Huijie Zhang, Minzhe Guo, Yifu Lu, Peng Wang, and Qing Qu. “Exploring low-dimensional subspace in diffusion models for controllable image editing”. Advances in Neural Information Processing Systems 37 (2024), pp. 27340–27371.
Siyi Chen, Yimeng Zhang, Sijia Liu, and Qing Qu. “The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning”. arXiv preprint arXiv:2504.21307 (2025).
Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. “A simple framework for contrastive learning of visual representations”. arXiv preprint arXiv:2002.05709 (2020).
Xiangning Chen, Chen Liang, Da Huang, Esteban Real, Kaiyuan Wang, Hieu Pham, Xuanyi Dong, Thang Luong, Cho-Jui Hsieh, Yifeng Lu, et al. “Symbolic discovery of optimization algorithms”. Advances in neural information processing systems 36 (2024).
Julian Chibane, Thiemo Alldieck, and Gerard Pons-Moll. “Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion”. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020, pp. 6968–6979.
Francois Chollet. “Xception: Deep Learning with Depthwise Separable Convolutions”. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017, pp. 1800–1807.
Hyungjin Chung, Jeongsol Kim, Michael Thompson Mccann, Marc Louis Klasky, and Jong Chul Ye. “Diffusion Posterior Sampling for General Noisy Inverse Problems”. The Eleventh International Conference on Learning Representations. 2023.
Taco Cohen and Max Welling. “Group equivariant convolutional networks”. International Conference on Machine Learning. 2016, pp. 2990–2999.
Corinna Cortes and Vladimir Vapnik. “Support-Vector Networks”. Mach. Learn. 20.3 (1995), pp. 273–297.
T. Cover and J. Thomas. Elements of Information Theory. Wiley Series in Telecommunications, 1991.
Thomas Cover. “Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition”. IEEE Transactions on Electronic Computers EC-14.3 (1965), pp. 326–334.
George V. Cybenko. “Approximation by superpositions of a sigmoidal function”. Mathematics of Control, Signals and Systems 2 (1989), pp. 303–314.
D. Donoho. “High-dimensional data analysis: The curses and blessings of dimensionality”. AMS Math Challenges Lecture (2000).
D. Donoho and M. Elad. “Optimally sparse representation in general (nonorthogonal) dictionaries via \(\ell ^1\) minimization”. PNAS 100.5 (2003), pp. 2197–2202.
Xili Dai, Shengbang Tong, Mingyang Li, Ziyang Wu, Michael Psenka, Kwan Ho Ryan Chan, Pengyuan Zhai, Yaodong Yu, Xiaojun Yuan, Heung-Yeung Shum, and Yi Ma. “CTRL: Closed-Loop Transcription to an LDR via Minimaxing Rate Reduction”. Entropy 24.4 (2022).
George B Dantzig. “Linear Programming”. Operations Research 50.1 (2002), pp. 42–47.
Giannis Daras, Kulin Shah, Yuval Dagan, Aravind Gollakota, Alex Dimakis, and Adam Klivans. “Ambient Diffusion: Learning Clean Distributions from Corrupted Data”. Advances in Neural Information Processing Systems. Ed. by A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine. Vol. 36. Curran Associates, Inc., 2023, pp. 288–313.
Valentin De Bortoli, Alexandre Galashov, J Swaroop Guntupalli, Guangyao Zhou, Kevin Murphy, Arthur Gretton, and Arnaud Doucet. “Distributional Diffusion Models with Scoring Rules”. arXiv preprint arXiv:2502.02483 (2025).
Aaron Defazio, Ashok Cutkosky, Harsh Mehta, and Konstantin Mishchenko. “Optimal linear decay learning rate schedules and further refinements”. arXiv preprint arXiv:2310.07831 (2023).
Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, and Ali Farhadi. “Objaverse-XL: a universe of 10M+ 3D objects”. Proceedings of the 37th International Conference on Neural Information Processing Systems. NIPS ’23. New Orleans, LA, USA: Curran Associates Inc., 2023.
Julie Delon, Agnes Desolneux, and Antoine Salmona. “Gromov–Wasserstein distances between Gaussian distributions”. Journal of Applied Probability 59.4 (2022), pp. 1178–1198.
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. “ImageNet: A Large-Scale Hierarchical Image Database”. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2009, pp. 248–255.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. “Bert: Pre-training of deep bidirectional transformers for language understanding”. Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). 2019, pp. 4171–4186.
Prafulla Dhariwal and Alexander Nichol. “Diffusion Models Beat GANs on Image Synthesis”. Advances in Neural Information Processing Systems (NeurIPS). Vol. 34. 2021.
D L Donoho. “Sparse components of images and optimal atomic decompositions”. Constructive approximation 17.3 (Jan. 2001), pp. 353–382.
D. L. Donoho, M. Vetterli, R. A. DeVore, and I. Daubechies. “Data compression and harmonic analysis”. IEEE Transactions on Information Theory 44.6 (Oct. 1998), pp. 2435–2476.
David L Donoho. “Neighborly polytopes and sparse solutions of underdetermined linear equations”. Stanford Technical Report 2005-04 (2005).
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale”. 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.
Laura Downs, Anthony Francis, Nate Koenig, Brandon Kinman, Ryan Hickman, Krista Reymann, Thomas B McHugh, and Vincent Vanhoucke. “Google scanned objects: A high-quality dataset of 3d scanned household items”. 2022 International Conference on Robotics and Automation (ICRA). IEEE. 2022, pp. 2553–2560.
Shiv Ram Dubey, Satish Kumar Singh, and Bidyut Baran Chaudhuri. “Activation functions in deep learning: A comprehensive survey and benchmark”. Neurocomputing 503 (2022), pp. 92–108.
Vincent Dumoulin, Jonathon Shlens, and Manjunath Kudlur. “A Learned Representation for Artistic Style”. arXiv preprint arXiv:1610.07629 (2017).
Carl Eckart and Gale Young. “The approximation of one matrix by another of lower rank”. Psychometrika 1.3 (Sept. 1936), pp. 211–218.
A Edelman, T Arias, and S Smith. “The Geometry of Algorithms with Orthogonality Constraints”. SIAM Journal on Matrix Analysis and Applications 20.2 (Jan. 1998), pp. 303–353.
Michael Elad and Michal Aharon. “Image denoising via sparse and redundant representations over learned dictionaries”. IEEE Transactions on Image Processing 15.12 (Dec. 2006), pp. 3736–3745.
Yuval Eldar, Michael Lindenbaum, Moshe Porat, and Yehoshua Y Zeevi. “The farthest point strategy for progressive image sampling”. IEEE Transactions on Image Processing 6.9 (1997), pp. 1305–1315.
Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, Tom Henighan, Shauna Kravec, Zac Hatfield-Dodds, Robert Lasenby, Dawn Drain, Carol Chen, Roger Grosse, Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, and Christopher Olah. “Toy Models of Superposition”. Transformer Circuits Thread (2022).
Logan Engstrom, Brandon Tran, Dimitris Tsipras, Ludwig Schmidt, and Aleksander Madry. “A rotation and a translation suffice: Fooling CNNs with simple transformations”. arXiv preprint arXiv:1712.02779 (2017).
Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. “Scaling rectified flow transformers for high-resolution image synthesis”. Forty-first international conference on machine learning. 2024.
Patrick Esser, Robin Rombach, and Bjorn Ommer. “Taming transformers for high-resolution image synthesis”. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021, pp. 12873–12883.
Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. “The PASCAL Visual Object Classes (VOC) Challenge”. International Journal of Computer Vision 88.2 (2010), pp. 303–338.
Tyler Farghly, Peter Potaptchik, Samuel Howard, George Deligiannidis, and Jakiw Pidstrigach. “Diffusion models and the manifold hypothesis: Log-domain smoothing is geometry adaptive”. arXiv preprint arXiv:2510.02305 (2025).
William Fedus, Barret Zoph, and Noam Shazeer. “Switch transformers: scaling to trillion parameter models with simple and efficient sparsity”. J. Mach. Learn. Res. 23.1 (Jan. 2022).
William Feller. “On the Theory of Stochastic Processes, with Particular Reference to Applications”. Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, 1949.
Tanner Fiez, Benjamin Chasnov, and Lillian Ratliff. “Implicit learning dynamics in stackelberg games: Equilibria characterization, convergence analysis, and empirical study”. International Conference on Machine Learning. PMLR. 2020, pp. 3133–3144.
Tanner Fiez, Benjamin Chasnov, and Lillian J Ratliff. “Convergence of learning dynamics in stackelberg games”. arXiv preprint arXiv:1906.01217 (2019).
Kunihiko Fukushima. “Visual Feature Extraction by a Multilayered Network of Analog Threshold Elements”. IEEE Transactions on Systems Science and Cybernetics 5.4 (1969), pp. 322–333.
Kunihiko Fukushima. “Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position”. Biological Cybernetics 36 (1980), pp. 193–202.
Samir Yitzhak Gadre, Gabriel Ilharco, Alex Fang, Jonathan Hayase, Georgios Smyrnis, Thao Nguyen, Ryan Marten, Mitchell Wortsman, Dhruba Ghosh, Jieyu Zhang, Eyal Orgad, Rahim Entezari, Giannis Daras, Sarah Pratt, Vivek Ramanujan, Yonatan Bitton, Kalyani Marathe, Stephen Mussmann, Richard Vencu, Mehdi Cherti, Ranjay Krishna, Pang Wei Koh, Olga Saukh, Alexander Ratner, Shuran Song, Hannaneh Hajishirzi, Ali Farhadi, Romain Beaumont, Sewoong Oh, Alex Dimakis, Jenia Jitsev, Yair Carmon, Vaishaal Shankar, and Ludwig Schmidt. DataComp: In search of the next generation of multimodal datasets. 2023. arXiv: 2304.14108 [cs.CV].
Leo Gao, Tom Dupre la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, and Jeffrey Wu. “Scaling and evaluating sparse autoencoders”. The Thirteenth International Conference on Learning Representations. 2025.
Ruiqi Gao, Aleksander Holynski, Philipp Henzler, Arthur Brussee, Ricardo Martin-Brualla, Pratul Srinivasan, Jonathan T. Barron, and Ben Poole. “CAT3D: Create Anything in 3D with Multi-View Diffusion Models”. Advances in Neural Information Processing Systems (NeurIPS). Oral. 2024.
Guillaume Garrigos and Robert M Gower. “Handbook of convergence theorems for (stochastic) gradient methods”. arXiv preprint arXiv:2301.11235 (2023).
Zheng Geng, Nan Wang, Shaocong Xu, Chongjie Ye, Bohan Li, Zhaoxi Chen, Sida Peng, and Hao Zhao. “One View, Many Worlds: Single-Image to 3D Object Meets Generative Domain Randomization for One-Shot 6D Pose Estimation”. arXiv preprint arXiv:2509.07978 (2025).
Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaiming He. “Mean flows for one-step generative modeling”. arXiv preprint arXiv:2505.13447 (2025).
Anand Jerry George, Rodrigo Veiga, and Nicolas Macris. “Denoising score matching with random features: Insights on diffusion models from precise learning curves”. arXiv preprint arXiv:2502.00336 (2025).
Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D Lee, and Dimitris Papailiopoulos. “Looped transformers as programmable computers”. International Conference on Machine Learning. PMLR. 2023, pp. 11398–11442.
E. N. Gilbert. “Random Plane Networks”. Journal of the Society for Industrial and Applied Mathematics 9.4 (1961), pp. 533–543. eprint: https://doi.org/10.1137/0109045.
Aaron Gokaslan and Vanya Cohen. OpenWebText Corpus. http://Skylion007.github.io/OpenWebTextCorpus. 2019.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. “Generative Adversarial Nets”. Advances in Neural Information Processing Systems. Ed. by Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence, and K.Q. Weinberger. Vol. 27. Curran Associates, Inc., 2014.
Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. “Accurate, large minibatch sgd: Training imagenet in 1 hour”. arXiv preprint arXiv:1706.02677 (2017).
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, et al. “Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives”. arXiv preprint arXiv:2311.18259 (2023).
Karol Gregor and Yann LeCun. “Learning fast approximations of sparse coding”. Proceedings of the 27th International Conference on International Conference on Machine Learning. 2010, pp. 399–406.
Rémi Gribonval. “Should Penalized Least Squares Regression be Interpreted as Maximum A Posteriori Estimation?” IEEE Transactions on Signal Processing 59.5 (May 2011), pp. 2405–2410.
Rémi Gribonval, Rodolphe Jenatton, and Francis Bach. “Sparse and spurious: Dictionary learning with noise and outliers”. IEEE Transactions on Information Theory 61.11 (Nov. 2015), pp. 6298–6319.
Stephen Grossberg. “Competitive Learning: From Interactive Activation to Adaptive Resonance”. Cogn. Sci. 11 (1987), pp. 23–63.
Albert Gu and Tri Dao. “Mamba: Linear-time sequence modeling with selective state spaces”. First conference on language modeling. 2024.
Xiangming Gu, Chao Du, Tianyu Pang, Chongxuan Li, Min Lin, and Ye Wang. “On memorization in diffusion models”. arXiv preprint arXiv:2310.02664 (2023).
Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, and Bo Dai. “Animatediff: Animate your personalized text-to-image diffusion models without specific tuning”. arXiv preprint arXiv:2307.04725 (2023).
Raia Hadsell, Sumit Chopra, and Yann LeCun. “Dimensionality reduction by learning an invariant mapping”. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2006. 2006, pp. 1735–1742.
M.H. Hansen and B. Yu. “Model Selection and the Principle of Minimum Description Length”. Journal of the American Statistical Association 96 (2001), pp. 746–774.
Yuval Noah Harari. Sapiens: A Brief History of Humankind. Harper, 2015.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Second. Springer, 2009.
Jeff Hawkins. A Thousand Brains: A New Theory of Intelligence. Basic Books, 2021.
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, and Ross Girshick. “Masked autoencoders are scalable vision learners”. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022, pp. 16000–16009.
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. “Momentum contrast for unsupervised visual representation learning”. arXiv preprint arXiv:1911.05722 (2019).
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Deep Residual Learning for Image Recognition”. 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016. IEEE Computer Society, 2016, pp. 770–778.
Thomas Little Heath. The Thirteen Books of Euclid’s Elements. Courier Corporation, 1956.
Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. “GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium”. Advances in Neural Information Processing Systems. Ed. by I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Vol. 30. Curran Associates, Inc., 2017.
G. E. Hinton and R. R. Salakhutdinov. “Reducing the Dimensionality of Data with Neural Networks”. Science 313.5786 (2006), pp. 504–507. eprint: https://science.sciencemag.org/content/313/5786/504.full.pdf.
Geoffrey E. Hinton and Richard S. Zemel. “Autoencoders, minimum description length and Helmholtz free energy”. Proceedings of the 6th International Conference on Neural Information Processing Systems. NIPS’93. Denver, Colorado: Morgan Kaufmann Publishers Inc., 1993, pp. 3–10.
Jonathan Ho, Ajay Jain, and Pieter Abbeel. “Denoising Diffusion Probabilistic Models”. Advances in Neural Information Processing Systems (NeurIPS). Vol. 33. 2020, pp. 6840–6851.
Jonathan Ho and Tim Salimans. “Classifier-Free Diffusion Guidance”. NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications. 2021.
Sepp Hochreiter and Jürgen Schmidhuber. “Long Short-Term Memory”. Neural Computation 9.8 (1997), pp. 1735–1780.
David Hong, Yue Sheng, and Edgar Dobriban. Selecting the number of components in PCA via random signflips. 2020. arXiv: 2012.02985 [math.ST].
Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. “Lrm: Large reconstruction model for single image to 3d”. arXiv preprint arXiv:2311.04400 (2023).
H. Hotelling. “Analysis of a Complex of Statistical Variables into Principal Components”. Journal of Educational Psychology (1933).
Cheng-Yu Hsieh, Jieyu Zhang, Zixian Ma, Aniruddha Kembhavi, and Ranjay Krishna. “SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality”. Thirty-Seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2023.
Binbin Huang, Haobin Duan, Yiqun Zhao, Zibo Zhao, Yi Ma, and Shenghua Gao. “CUPID: Generative 3D Reconstruction via Joint Object and Pose Modeling”. arXiv preprint arXiv:2510.20776 (2025).
Chun-Hao P. Huang, Hongwei Yi, Markus Höschle, Matvey Safroshkin, Tsvetelina Alexiadis, Senya Polikovsky, Daniel Scharstein, and Michael J. Black. “Capturing and Inferring Dense Full-Body Human-Scene Contact”. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022, pp. 13274–13285.
Gao Huang, Zhuang Liu, Laurens Van Der Maaten, and Kilian Q Weinberger. “Densely Connected Convolutional Networks”. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017, pp. 2261–2269.
Jinggang Huang and David Mumford. “Statistics of natural images and models”. Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149). Vol. 1. IEEE. 1999, pp. 541–547.
D.H. Hubel and T.N. Wiesel. “Receptive fields of single neurones in the cat’s striate cortex”. J. Physiol. 148.3 (1959), pp. 574–591.
Robert Huben, Hoagy Cunningham, Logan Riggs Smith, Aidan Ewart, and Lee Sharkey. “Sparse Autoencoders Find Highly Interpretable Features in Language Models”. The Twelfth International Conference on Learning Representations. 2024.
Frank Hutter, Lars Kotthoff, and Joaquin Vanschoren, eds. Automatic Machine Learning: Methods, Systems, Challenges. Springer, 2019.
Uiwon Hwang, Heeseung Kim, Dahuin Jung, Hyemi Jang, Hyungyu Lee, and Sungroh Yoon. “Stein Latent Optimization for Generative Adversarial Networks”. arXiv preprint arXiv:2106.05319 (2021).
Aapo Hyvärinen. “Estimation of Non-Normalized Statistical Models by Score Matching”. Journal of Machine Learning Research 6.24 (2005), pp. 695–709.
Aapo Hyvärinen and Erkki Oja. “A Fast Fixed-Point Algorithm for Independent Component Analysis”. Neural Computation 9.7 (1997), pp. 1483–1492.
Aapo Hyvärinen and Erkki Oja. “Independent Component Analysis: Algorithms and Applications”. Neural Networks 13.4-5 (2000), pp. 411–430.
Gabriel Ilharco, Mitchell Wortsman, Ross Wightman, Cade Gordon, Nicholas Carlini, Rohan Taori, Achal Dave, Vaishaal Shankar, Hongseok Namkoong, John Miller, Hannaneh Hajishirzi, Ali Farhadi, and Ludwig Schmidt. OpenCLIP. Version 0.1. If you use this software, please cite it as below. July 2021.
Sergey Ioffe and Christian Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”. ICML. 2015, pp. 448–456.
Andrew Jaegle, Felix Gimeno, Andy Brock, Oriol Vinyals, Andrew Zisserman, and Joao Carreira. “Perceiver: General Perception with Iterative Attention”. Proceedings of the 38th International Conference on Machine Learning. Ed. by Marina Meila and Tong Zhang. Vol. 139. Proceedings of Machine Learning Research. PMLR, 2021, pp. 4651–4664.
Ajil Jalal, Marius Arvinte, Giannis Daras, Eric Price, Alexandros G Dimakis, and Jon Tamir. “Robust Compressed Sensing MRI with Deep Generative Priors”. Advances in Neural Information Processing Systems. Ed. by M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan. Vol. 34. Curran Associates, Inc., 2021, pp. 14938–14954.
G J O Jameson. “A simple proof of Stirling’s formula for the gamma function”. The Mathematical Gazette 99.544 (Mar. 2015), pp. 68–74.
Yibo Jiang, Goutham Rajendran, Pradeep Kumar Ravikumar, Bryon Aragam, and Victor Veitch. “On the Origins of Linear Representations in Large Language Models”. International Conference on Machine Learning. Vol. 235. PMLR, 2024, pp. 21879–21911.
Haian Jin, Hanwen Jiang, Hao Tan, Kai Zhang, Sai Bi, Tianyuan Zhang, Fujun Luan, Noah Snavely, and Zexiang Xu. “LVSM: A large view synthesis model with minimal 3D inductive bias”. arXiv preprint arXiv:2410.17242 (2024).
I. Jolliffe. Principal Component Analysis. 2nd ed. New York, NY: Springer-Verlag, 2002.
I. Jolliffe. Principal Component Analysis. New York, NY: Springer-Verlag, 1986.
Douglas Samuel Jones. The theory of generalised functions. Cambridge University Press, 1982.
Keller Jordan, Yuchen Jin, Vlado Boza, You Jiacheng, Franz Cecista, Laker Newhouse, and Jeremy Bernstein. “Muon: An optimizer for hidden layers in neural networks”. Blog post. 2024. url: https://kellerjordan.github.io/posts/muon/.
Sheena A. Josselyn and Susumu Tonegawa. “Memory engrams: Recalling the past and imagining the future”. Science 367 (2020).
Z Kadkhodaie and E P Simoncelli. “Stochastic solutions for linear inverse problems using the prior implicit in a denoiser”. Adv. Neural Information Processing Systems (NeurIPS). Ed. by M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan. Vol. 34. Curran Associates, Inc., 2021.
Rudolph Emil Kalman. “A new approach to linear filtering and prediction problems”. Journal of Basic Engineering 82.1 (1960), pp. 35–45.
Mason Kamb and Surya Ganguli. “An analytic theory of creativity in convolutional diffusion models”. arXiv preprint arXiv:2412.20292 (2024).
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. “Scaling Laws for Neural Language Models”. arXiv preprint arXiv:2001.08361 (2020).
Andrej Karpathy. nanoGPT. https://github.com/karpathy/nanoGPT. 2022.
Andrej Karpathy. The spelled-out intro to neural networks and backpropagation: building micrograd. YouTube. Aug. 16, 2022. url: https://www.youtube.com/watch?v=VMj-3S1tku0 (visited on 08/17/2025).
Ronald Kemker and Christopher Kanan. “FearNet: Brain-Inspired Model for Incremental Learning”. International Conference on Learning Representations. 2018.
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. “3D Gaussian splatting for real-time radiance field rendering”. ACM Trans. Graph. 42.4 (2023), 139:1–139:14.
Diederik P Kingma and Jimmy Ba. “Adam: A method for stochastic optimization”. arXiv preprint arXiv:1412.6980 (2014).
Diederik P Kingma and Max Welling. “Auto-Encoding Variational Bayes”. arXiv [stat.ML] (Dec. 2013). arXiv: 1312.6114v11 [stat.ML].
Diederik P Kingma and Max Welling. “An introduction to variational autoencoders”. Foundations and Trends® in Machine Learning 12.4 (2019), pp. 307–392.
James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. “Overcoming catastrophic forgetting in neural networks”. Proceedings of the national academy of sciences 114.13 (2017), pp. 3521–3526.
Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. “Self-normalizing neural networks”. Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, pp. 972–981.
Ronald Kline. “Cybernetics, Automata Studies, and the Dartmouth Conference on Artificial Intelligence”. IEEE Annals of the History of Computing 33.4 (2011), pp. 5–16.
Artemy Kolchinsky, Brendan D Tracey, and Steven Van Kuyk. “Caveats for information bottleneck in deterministic scenarios”. arXiv preprint arXiv:1808.07593 (2018).
Andrei N. Kolmogorov. “On Tables of Random Numbers (Reprinted from Sankhyā: The Indian Journal of Statistics, Series A, Vol. 25, Part 4, 1963)”. Theoretical Computer Science 207 (1998), pp. 387–395.
Irwin Kra and Santiago R Simanca. “On Circulant Matrices”. Notices of the American Mathematical Society 59 (2012), pp. 368–377.
Mark A Kramer. “Nonlinear principal component analysis using autoassociative neural networks”. AIChE Journal 37.2 (1991), pp. 233–243.
Alex Krizhevsky, Geoffrey Hinton, et al. “Learning multiple layers of features from tiny images” (2009).
Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. “The CIFAR-10 dataset”. online: http://www.cs.toronto.edu/~kriz/cifar.html (2014).
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet classification with deep convolutional neural networks”. Advances in neural information processing systems. 2012, pp. 1097–1105.
Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, and Jun-Yan Zhu. “Multi-concept customization of text-to-image diffusion”. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023, pp. 1931–1941.
Black Forest Labs. FLUX. https://github.com/black-forest-labs/flux. 2024.
Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. “FLUX.1 Kontext: Flow matching for in-context image generation and editing in latent space”. arXiv [cs.GR] (June 2025). arXiv: 2506.15742 [cs.GR].
Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, and Andrew Y. Ng. “Building high-level features using large scale unsupervised learning”. Proceedings of the 29th International Coference on International Conference on Machine Learning. ICML’12. Edinburgh, Scotland: Omnipress, 2012, pp. 507–514.
Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. “Backpropagation Applied to Handwritten Zip Code Recognition”. Neural Computation 1.4 (1989), pp. 541–551.
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. “Gradient-based learning applied to document recognition”. Proceedings of the IEEE 86.11 (1998), pp. 2278–2324.
Yann LeCun, Sumit Chopra, Raia Hadsell, Marc’Aurelio Ranzato, and Fu-Jie Huang. “A Tutorial on Energy-Based Learning”. Jan. 2006.
Ann Lee, Kim Pedersen, and David Mumford. “The Nonlinear Statistics of High-Contrast Patches in Natural Images”. International Journal of Computer Vision 54 (Aug. 2003).
Jason D Lee, Max Simchowitz, Michael I Jordan, and Benjamin Recht. “Gradient descent only converges to minimizers”. Conference on learning theory. PMLR. 2016, pp. 1246–1257.
John M. Lee. Introduction to Smooth Manifolds. New York, NY: Springer-Verlag, 2002.
Jaakko Lehtinen, Jacob Munkberg, Jon Hasselgren, Samuli Laine, Tero Karras, Miika Aittala, and Timo Aila. “Noise2Noise: Learning Image Restoration without Clean Data”. Proceedings of the 35th International Conference on Machine Learning. Ed. by Jennifer Dy and Andreas Krause. Vol. 80. Proceedings of Machine Learning Research. PMLR, July 2018, pp. 2965–2974.
Gen Li and Yuling Yan. “O(d/T) convergence theory for diffusion probabilistic models under minimal assumptions”. arXiv preprint arXiv:2409.18959 (2024).
Haochuan Li, Farzan Farnia, Subhro Das, and Ali Jadbabaie. “On convergence of gradient descent ascent: A tight local analysis”. International Conference on Machine Learning. PMLR. 2022, pp. 12717–12740.
Xi-Lin Li. “Preconditioned stochastic gradient descent”. IEEE transactions on neural networks and learning systems 29.5 (2017), pp. 1454–1466.
Qinyu Li, Yee Whye Teh, and Razvan Pascanu. “NoProp: Training Neural Networks without Back-propagation or Forward-propagation”. arXiv preprint arXiv:2503.24322 (2025).
Tianhong Li and Kaiming He. “Back to basics: Let denoising generative models denoise”. arXiv preprint arXiv:2511.13720 (2025).
Wenda Li, Huijie Zhang, and Qing Qu. “Shallow diffuse: Robust and invisible watermarking through low-dimensional subspaces in diffusion models”. arXiv preprint arXiv:2410.21088 (2024).
Xiang Li, Rongrong Wang, and Qing Qu. “Towards Understanding the Mechanisms of Classifier-Free Guidance”. arXiv [cs.CV] (May 2025). arXiv: 2505.19210 [cs.CV].
Yanjun Li and Yoram Bresler. “Multichannel sparse blind deconvolution on the sphere”. IEEE Transactions on Information Theory 65.11 (2019), pp. 7415–7436.
Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, and Ying Shan. “Photomaker: Customizing realistic human photos via stacked id embedding”. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2024, pp. 8640–8650.
Xiao Liang, Xiang Ren, Zhengdong Zhang, and Yi Ma. “Repairing Sparse Low-Rank Texture”. Computer Vision – ECCV 2012. Ed. by Andrew Fitzgibbon, Svetlana Lazebnik, Pietro Perona, Yoichi Sato, and Cordelia Schmid. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 482–495.
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. “Microsoft coco: Common objects in context”. Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer. 2014, pp. 740–755.
T Linder and R Zamir. “On the asymptotic tightness of the Shannon lower bound”. IEEE transactions on information theory 40.6 (1994), pp. 2026–2031.
R. Linsker. “Self-organization in a perceptual network”. Computer 21.3 (1988), pp. 105–117.
Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. “Flow Matching for Generative Modeling”. The Eleventh International Conference on Learning Representations. 2023.
Anna V. Little, Mauro Maggioni, and Lorenzo Rosasco. “Multiscale geometric methods for data sets I: Multiscale SVD, noise and curvature”. Applied and Computational Harmonic Analysis 43.3 (2017), pp. 504–567.
Jiacheng Liu, Sewon Min, Luke Zettlemoyer, Yejin Choi, and Hannaneh Hajishirzi. “Infini-gram: Scaling unbounded n-gram language models to a trillion tokens”. arXiv preprint arXiv:2401.17377 (2024).
Jingyuan Liu, Jianlin Su, Xingcheng Yao, Zhejun Jiang, Guokun Lai, Yulun Du, Yidao Qin, Weixin Xu, Enzhe Lu, Junjie Yan, et al. “Muon is Scalable for LLM Training”. arXiv preprint arXiv:2502.16982 (2025).
Z. Liu and L. Vandenberghe. “Semidefinite programming methods for system realization and identification”. Proceedings of the 48h IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference. 2009, pp. 4676–4681.
Zhang Liu and Lieven Vandenberghe. “Interior-Point Method for Nuclear Norm Approximation with Application to System Identification”. SIAM Journal on Matrix Analysis and Applications 31.3 (2010), pp. 1235–1256. eprint: https://doi.org/10.1137/090755436.
Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J Black. “SMPL: A skinned multi-person linear model”. ACM Transactions on Graphics (TOG) 34.6 (2015), pp. 1–16.
William E. Lorensen and Harvey E. Cline. “Marching cubes: a high resolution 3D surface construction algorithm”. Seminal Graphics: Pioneering Efforts That Shaped the Field, Volume 1. New York, NY, USA: Association for Computing Machinery, 1998, pp. 347–353.
Ilya Loshchilov and Frank Hutter. “Decoupled Weight Decay Regularization”. International Conference on Learning Representations. 2019.
M. Journée, Y. Nesterov, P. Richtárik, and R. Sepulchre. “Generalized power method for sparse principal component analysis”. Journal of Machine Learning Research 11 (2010), pp. 517–553.
Y. Ma, J. Košecká, S. Soatto, and S. Sastry. An Invitation to 3-D Vision, From Images to Models. New York: Springer-Verlag, 2004.
Yi Ma, Harm Derksen, Wei Hong, and John Wright. “Segmentation of multivariate mixed data via lossy data coding and compression”. IEEE transactions on pattern analysis and machine intelligence 29.9 (2007), pp. 1546–1562.
Andrew L Maas, Awni Y Hannun, and Andrew Y Ng. “Rectifier nonlinearities improve neural network acoustic models”. Proc. ICML. Vol. 30. Citeseer. 2013, p. 3.
Naureen Mahmood, Nima Ghorbani, Nikolaus F Troje, Gerard Pons-Moll, and Michael J Black. “AMASS: Archive of motion capture as surface shapes”. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019, pp. 5442–5451.
Julien Mairal, Francis Bach, and Jean Ponce. “Sparse Modeling for Image and Vision Processing”. Foundations and Trends® in Computer Graphics and Vision 8.2-3 (2014), pp. 85–283.
Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. “Building a Large Annotated Corpus of English: The Penn Treebank”. Computational Linguistics 19.2 (1993). Ed. by Julia Hirschberg, pp. 313–330.
Andrei Andreevich Markov. “An example of statistical investigation of the text Eugene Onegin concerning the connection of samples in chains”. Science in Context 19.4 (2006), pp. 591–600.
James Martens, Andy Ballard, Guillaume Desjardins, Grzegorz Swirszcz, Valentin Dalibard, Jascha Sohl-Dickstein, and Samuel S Schoenholz. “Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping”. arXiv [cs.LG] (Oct. 2021). arXiv: 2110.01765 [cs.LG].
David McAllester and Karl Stratos. Formal Limitations on the Measurement of Mutual Information. 2019.
John McCarthy, Marvin L. Minsky, Nathaniel Rochester, and Claude E. Shannon. “A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence: August 31, 1955”. AI Mag. 27.4 (Dec. 2006), pp. 12–14.
Michael McCloskey and Neal J Cohen. “Catastrophic interference in connectionist networks: The sequential learning problem”. Psychology of learning and motivation. Vol. 24. Elsevier, 1989, pp. 109–165.
Warren McCulloch and Walter Pitts. “A Logical Calculus of the Ideas Immanent in Nervous Activity”. Bulletin of Mathematical Biophysics 5 (1943), pp. 115–133.
Jerry M. Mendel and Robert W. McLaren. “Reinforcement-learning control and pattern recognition systems”. Adaptive, Learning and Pattern Recognition Systems: Theory and Applications. Ed. by J. M. Mendel and K. S. Fu. 1970, pp. 287–318.
Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer Sentinel Mixture Models. 2016. arXiv: 1609.07843 [cs.CL].
Stephan Mertens and Cristopher Moore. “Continuum percolation thresholds in two dimensions”. Phys. Rev. E 86 (6 Dec. 2012), p. 061109.
Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas Geiger. “Occupancy Networks: Learning 3D Reconstruction in Function Space”. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2019, pp. 4455–4465.
Beren Millidge, Tommaso Salvatori, Yuhang Song, Rafal Bogacz, and Thomas Lukasiewicz. “Predictive coding: Towards a future of deep learning beyond backpropagation?” arXiv preprint arXiv:2202.09467 (2022).
Marvin Minsky. “Theory of Neural-Analog Reinforcement Systems and its Application to the Brain-Model Problem”. PhD thesis. Princeton University, 1954.
Marvin Minsky and Seymour Papert. Perceptrons: An Introduction to Computational Geometry. The MIT Press, 1969.
L Mirsky. “Symmetric Gauge Functions and Unitarily Invariant Norms”. The Quarterly Journal of Mathematics 11.1 (Jan. 1960), pp. 50–59.
Paritosh Mittal, Yen-Chi Cheng, Maneesh Singh, and Shubham Tulsiani. “Autosdf: Shape priors for 3d completion, reconstruction and generation”. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022, pp. 306–315.
K. Miyasawa. “An empirical bayes estimator of the mean of a normal population”. Bull. Inst. Internat. Statist. 38 (1961).
Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. “Spectral normalization for generative adversarial networks”. arXiv preprint arXiv:1802.05957 (2018).
Hossein Mobahi, Shankar Rao, Allen Yang, Shankar Sastry, and Yi Ma. “Segmentation of Natural Images by Texture and Boundary Compression”. the International Journal of Computer Vision 95.1 (2011), pp. 86–98.
Vishal Monga, Yuelong Li, and Yonina C Eldar. “Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing”. arXiv preprint arXiv:1912.10557 (2019).
Edmund RR Moody, Sandra Álvarez-Carretero, Tara A Mahendrarajah, James W Clark, Holly C Betts, Nina Dombrowski, Lénárd L Szánthó, Richard A Boyle, Stuart Daines, Xi Chen, et al. “The nature of the last universal common ancestor and its impact on the early Earth system”. Nature Ecology & Evolution 8.9 (2024), pp. 1654–1666.
Norman Mu, Alexander Kirillov, David Wagner, and Saining Xie. “SLIP: Self-supervision Meets Language-Image Pre-training”. Computer Vision – ECCV 2022. Ed. by Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, and Tal Hassner. Cham: Springer Nature Switzerland, 2022, pp. 529–544.
Rafael Müller, Simon Kornblith, and Geoffrey E Hinton. “When does label smoothing help?” Advances in neural information processing systems 32 (2019).
David Mumford. “The Statistical Description of Visual Signals”. 1996.
David Mumford and Basilis Gidas. “Stochastic Models for Generic Images”. Quarterly of Applied Mathematics 59 (July 1999).
Joseph F Murray and Kenneth Kreutz-Delgado. “Learning sparse overcomplete codes for images”. The Journal of VLSI Signal Processing Systems for Signal Image and Video Technology 46.1 (Mar. 2007), pp. 1–13.
R. Murray, Zexiang Li, and S. Sastry. A Mathematical Introduction to Robotic Manipulation. Florida: CRC Press, 1994.
S. Nam, M.E. Davies, M. Elad, and R. Gribonval. “The cosparse analysis model and algorithms”. Applied and Computational Harmonic Analysis 34.1 (2013), pp. 30–56.
Charlie Nash, Yaroslav Ganin, SM Ali Eslami, and Peter Battaglia. “Polygen: An autoregressive generative model of 3d meshes”. International conference on machine learning. PMLR. 2020, pp. 7220–7229.
John von Neumann, Oskar Morgenstern, and Ariel Rubinstein. Theory of Games and Economic Behavior (60th Anniversary Commemorative Edition). Princeton University Press, 2004.
John von Neumann. The computer and the brain. USA: Yale University Press, 1958.
Alex Nichol, Heewoo Jun, Prafulla Dhariwal, Pamela Mishkin, and Mark Chen. “Point-e: A system for generating 3d point clouds from complex prompts”. arXiv preprint arXiv:2212.08751 (2022).
Alexander Quinn Nichol and Prafulla Dhariwal. “Improved Denoising Diffusion Probabilistic Models”. International Conference on Machine Learning (ICML). 2021.
Matthew Niedoba, Berend Zwartsenberg, Kevin Murphy, and Frank Wood. “Towards a Mechanistic Explanation of Diffusion Model Generalization”. arXiv preprint arXiv:2411.19339 (2024).
Oliver Nina, Jamison Moody, and Clarissa Milligan. “A Decoder-Free Approach for Unsupervised Clustering and Manifold Learning with Random Triplet Mining”. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE Computer Society. 2019, pp. 3987–3994.
Jorge Nocedal and Stephen Wright. Numerical optimization. Springer Science & Business Media, 2006.
Chigozie Nwankpa, Winifred Ijomah, Anthony Gachagan, and Stephen Marshall. “Activation functions: Comparison of trends in practice and research for deep learning”. arXiv preprint arXiv:1811.03378 (2018).
Erkki Oja. “A simplified neuron model as a principal component analyzer”. Journal of Mathematical Biology 15 (1982), pp. 267–273.
Kazusato Oko, Licong Lin, Yuhang Cai, and Song Mei. “A statistical theory of contrastive pre-training and multimodal generative AI”. arXiv [cs.LG] (Jan. 2025). arXiv: 2501.04641 [cs.LG].
B A Olshausen and D J Field. “Sparse coding with an overcomplete basis set: a strategy employed by V1?” Vision research 37.23 (Dec. 1997), pp. 3311–3325.
Bruno A Olshausen and David J Field. “Emergence of simple-cell receptive field properties by learning a sparse code for natural images”. Nature 381 (June 1996), pp. 607–609.
Aaron van den Oord, Yazhe Li, and Oriol Vinyals. “Representation Learning with Contrastive Predictive Coding”. arXiv [cs.LG] (July 2018). arXiv: 1807.03748 [cs.LG].
Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. “Neural discrete representation learning”. arXiv [cs.LG] (Nov. 2017). arXiv: 1711.00937 [cs.LG].
OpenAI. Sora: Creating video from text. https://openai.com/sora. 2024.
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, and Piotr Bojanowski. DINOv2: Learning Robust Visual Features without Supervision. 2024. arXiv: 2304.07193 [cs.CV].
Druv Pai, Sam Buchanan, Ziyang Wu, Yaodong Yu, and Yi Ma. “Masked Completion via Structured Diffusion with White-Box Transformers”. The Twelfth International Conference on Learning Representations. 2024.
Druv Pai, Michael Psenka, Chih-Yuan Chiu, Manxi Wu, Edgar Dobriban, and Yi Ma. “Pursuit of a discriminative representation for multiple subspaces via sequential games”. Journal of the Franklin Institute 360.6 (2023), pp. 4135–4171.
Xiaqing Pan, Nicholas Charron, Yongqian Yang, Scott Peters, Thomas Whelan, Chen Kong, Omkar Parkhi, Richard Newcombe, and Yuheng Carl Ren. “Aria Digital Twin: A New Benchmark Dataset for Egocentric 3D Machine Perception”. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023, pp. 20133–20143.
Denis Paperno, Germán Kruszewski, Angeliki Lazaridou, Quan Ngoc Pham, Raffaella Bernardi, Sandro Pezzelle, Marco Baroni, Gemma Boleda, and Raquel Fernández. The LAMBADA dataset: Word prediction requiring a broad discourse context. 2016. arXiv: 1606.06031 [cs.CL].
Vardan Papyan, XY Han, and David L Donoho. “Prevalence of Neural Collapse during the terminal phase of deep learning training”. arXiv preprint arXiv:2008.08186 (2020).
Vardan Papyan, Yaniv Romano, and Michael Elad. “Convolutional neural networks analyzed via convolutional sparse coding”. The Journal of Machine Learning Research 18.1 (2017), pp. 2887–2938.
Kiho Park, Yo Joong Choe, and Victor Veitch. “The Linear Representation Hypothesis and the Geometry of Large Language Models”. International Conference on Machine Learning. PMLR. 2024, pp. 39643–39666.
Andrew Parker. In The Blink Of An Eye: How Vision Sparked The Big Bang Of Evolution. Basic Books, 2004.
Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, and Jitendra Malik. “Reconstructing Hands in 3D with Transformers”. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024.
Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2009.
K. Pearson. “On Lines and Planes of Closest Fit to Systems of Points in Space”. Philosophical Magazine 2.6 (1901), pp. 559–572.
William Peebles and Saining Xie. “Scalable diffusion models with transformers”. Proceedings of the IEEE/CVF international conference on computer vision. 2023, pp. 4195–4205.
Liangzu Peng and René Vidal. “Mathematics of continual learning”. arXiv preprint arXiv:2504.17963 (2025).
Songyou Peng, Michael Niemeyer, Lars Mescheder, Marc Pollefeys, and Andreas Geiger. “Convolutional Occupancy Networks”. Computer Vision – ECCV 2020. Ed. by Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm. Cham: Springer International Publishing, 2020, pp. 523–540.
Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron Courville. “FiLM: Visual Reasoning with a General Conditioning Layer”. AAAI Conference on Artificial Intelligence. 2018.
Ed Pizzi, Sreya Dutta Roy, Sugosh Nagavara Ravindra, Priya Goyal, and Matthijs Douze. “A self-supervised descriptor for image copy detection”. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022, pp. 14532–14542.
S. E. Palmer. Vision Science: Photons to Phenomenology. The MIT Press, 1999.
Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. “SDXL: Improving latent diffusion models for high-resolution image synthesis”. arXiv preprint arXiv:2307.01952 (2023).
Yury Polyanskiy and Yihong Wu. Information Theory: From Coding to Learning. Cambridge University Press, 2022.
Michael Psenka, Druv Pai, Vishal Raman, Shankar Sastry, and Yi Ma. “Representation Learning via Manifold Flattening and Reconstruction”. Journal of Machine Learning Research 25.132 (2024), pp. 1–47.
Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. “PointNet: Deep learning on point sets for 3D classification and segmentation”. Proceedings of the IEEE conference on computer vision and pattern recognition. 2017, pp. 652–660.
Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. “PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space”. Advances in Neural Information Processing Systems. Ed. by I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Vol. 30. Curran Associates, Inc., 2017.
Qing Qu, Xiao Li, and Zhihui Zhu. “A nonconvex approach for exact and efficient multichannel sparse blind deconvolution”. Advances in Neural Information Processing Systems. 2019, pp. 4017–4028.
Qing Qu, Xiao Li, and Zhihui Zhu. “Exact Recovery of Multichannel Sparse Blind Deconvolution via Gradient Descent”. SIAM Journal on Imaging Sciences 13.3 (2020), pp. 1630–1652.
Qing Qu, Yuexiang Zhai, Xiao Li, Yuqian Zhang, and Zhihui Zhu. “Geometric Analysis of Nonconvex Optimization Landscapes for Overcomplete Learning”. International Conference on Learning Representations. 2020.
Qing Qu, Zhihui Zhu, Xiao Li, Manolis C. Tsakiris, John Wright, and René Vidal. Finding the Sparsest Vectors in a Subspace: Theory, Algorithms, and Applications. 2020. arXiv: 2001.06970 [cs.LG].
R. Basri and D. Jacobs. “Lambertian reflectance and linear subspaces”. IEEE Transactions on Pattern Analysis and Machine Intelligence 25.2 (2003), pp. 218–233.
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. “Learning Transferable Visual Models From Natural Language Supervision”. Proceedings of the 38th International Conference on Machine Learning. Ed. by Marina Meila and Tong Zhang. Vol. 139. Proceedings of Machine Learning Research. PMLR, 2021, pp. 8748–8763.
Alec Radford, Luke Metz, and Soumith Chintala. “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks”. arXiv preprint arXiv:1511.06434 (2016). arXiv: 1511.06434 [cs.LG].
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. “Language models are unsupervised multitask learners”. OpenAI blog 1.8 (2019), p. 9.
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. “Exploring the limits of transfer learning with a unified text-to-text transformer”. Journal of machine learning research 21.140 (2020), pp. 1–67.
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. “Hierarchical Text-Conditional Image Generation with CLIP Latents”. arXiv [cs.CV] (Apr. 2022). arXiv: 2204.06125 [cs.CV].
Marc’Aurelio Ranzato, Christopher Poultney, Sumit Chopra, and Yann LeCun. “Efficient Learning of Sparse Representations with an Energy-Based Model”. Advances in Neural Information Processing Systems. Ed. by B Schölkopf, J Platt, and T Hoffman. Vol. 19. MIT Press, 2006.
M Raphan and E P Simoncelli. “Least squares estimation without priors or supervision”. Neural Computation 23.2 (2011), pp. 374–420.
Sylvestre-Alvise Rebuffi, Alexander Kolesnikov, Georg Sperl, and Christoph H Lampert. “iCaRL: Incremental classifier and representation learning”. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2017, pp. 2001–2010.
Lucas Relic, Roberto Azevedo, Markus Gross, and Christopher Schroers. “Lossy image compression with foundation diffusion models”. European Conference on Computer Vision. Springer. 2024, pp. 303–319.
Erwin Riegler, Helmut Bolcskei, and Gunther Koliander. “Rate-distortion theory for general sets and measures”. 2018 IEEE International Symposium on Information Theory (ISIT). IEEE, June 2018, pp. 101–105.
Erwin Riegler, Günther Koliander, and Helmut Bölcskei. “Lossy compression of general random variables”. Information and inference: a journal of the IMA 12.3 (Apr. 2023), pp. 1759–1829.
J. Rissanen. “Modeling by shortest data description”. Automatica 14.5 (1978), pp. 465–471.
Herbert E. Robbins. “An Empirical Bayes Approach to Statistics”. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability. Vol. 1. University of California Press, 1956, pp. 157–163.
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. “High-Resolution Image Synthesis with Latent Diffusion Models”. IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18-24, 2022. IEEE, 2022, pp. 10674–10685.
Javier Romero, Dimitrios Tzionas, and Michael J Black. “Embodied hands: modeling and capturing hands and bodies together”. ACM Transactions on Graphics (TOG) 36.6 (2017), pp. 1–17.
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. “U-Net: Convolutional Networks for Biomedical Image Segmentation”. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). Springer, 2015, pp. 234–241.
Frank Rosenblatt. The perceptron: A perceiving and recognizing automaton. Report. Project PARA, Cornell Aeronautical Laboratory, Jan. 1957.
Litu Rout, Negin Raoof, Giannis Daras, Constantine Caramanis, Alex Dimakis, and Sanjay Shakkottai. “Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models”. Thirty-seventh Conference on Neural Information Processing Systems. 2023.
Sam T. Roweis. “EM Algorithms for PCA and SPCA”. Advances in Neural Information Processing Systems 10, [NIPS Conference, Denver, Colorado, USA, 1997]. The MIT Press, 1997, pp. 626–632.
François Rozet, Gérôme Andry, Francois Lanusse, and Gilles Louppe. “Learning Diffusion Priors from Observations by Expectation Maximization”. The Thirty-eighth Annual Conference on Neural Information Processing Systems. 2024.
R. Rubinstein and M. Elad. “Dictionary Learning for Analysis-Synthesis Thresholding”. IEEE Transactions on Signal Processing 62.22 (2014), pp. 5962–5972.
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. “Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation”. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023, pp. 22500–22510.
D. E. Rumelhart, G. E. Hinton, and R. J. Williams. “Learning internal representations by error propagation”. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations. Cambridge, MA, USA: MIT Press, 1986, pp. 318–362.
David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. “Learning representations by back-propagating errors”. Nature 323.6088 (Oct. 1986), pp. 533–536.
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. Denton, Seyed Kamyar Seyed Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J. Fleet, and Mohammad Norouzi. “Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding”. NeurIPS. 2022.
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. “Improved Techniques for Training GANs”. Advances in Neural Information Processing Systems. Ed. by D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett. Vol. 29. Curran Associates, Inc., 2016.
Shankar Sastry. Nonlinear Systems: Analysis, Stability, and Control. Springer, 1999.
Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li, Sanjiv Kumar, and Sashank J Reddi. “Reasoning with latent thoughts: On the power of looped transformers”. arXiv preprint arXiv:2502.17416 (2025).
Fabian Schaipp, Alexander Hägele, Adrien Taylor, Umut Simsekli, and Francis Bach. “The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training”. arXiv preprint arXiv:2501.18965 (2025).
Dominik Scherer, Andreas Müller, and Sven Behnke. “Evaluation of pooling operations in convolutional architectures for object recognition”. International conference on artificial neural networks. Springer. 2010, pp. 92–101.
Denise Schmandt-Besserat. “The evolution of writing”. International encyclopedia of social and behavioral sciences (2014), pp. 1–15.
Christoph Schuhmann, Romain Beaumont, Richard Vencu, Cade Gordon, Ross Wightman, Mehdi Cherti, Theo Coombes, Aarush Katta, Clayton Mullis, Mitchell Wortsman, Patrick Schramowski, Srivatsa Kundurthy, Katherine Crowson, Ludwig Schmidt, Robert Kaczmarczyk, and Jenia Jitsev. LAION-5B: An open large-scale dataset for training next generation image-text models. 2022. arXiv: 2210.08402 [cs.CV].
Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, and Tri Dao. “FlashAttention-3: Fast and accurate attention with asynchrony and low-precision”. Advances in Neural Information Processing Systems 37 (2025), pp. 68658–68685.
C. E. Shannon. “A mathematical theory of communication”. The Bell System Technical Journal 27.3 (1948), pp. 379–423.
Claude E Shannon. “Coding theorems for a discrete source with a fidelity criterion”. IRE National Convention Record 4 (1959), pp. 142–163.
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. “Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer”. ICLR. 2017.
Zifan Shi, Sida Peng, Yinghao Xu, Andreas Geiger, Yiyi Liao, and Yujun Shen. “Deep generative models on 3d representations: A survey”. arXiv preprint arXiv:2210.15663 (2022).
Karen Simonyan and Andrew Zisserman. “Very Deep Convolutional Networks for Large-Scale Image Recognition”. International Conference on Learning Representations. 2015.
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. “Deep Unsupervised Learning using Nonequilibrium Thermodynamics”. Proceedings of the 32nd International Conference on Machine Learning. Ed. by Francis Bach and David Blei. Vol. 37. Proceedings of Machine Learning Research. Lille, France: PMLR, 2015, pp. 2256–2265.
Bowen Song, Soo Min Kwon, Zecheng Zhang, Xinyu Hu, Qing Qu, and Liyue Shen. “Solving Inverse Problems with Latent Diffusion Models via Hard Data Consistency”. The Twelfth International Conference on Learning Representations. 2024.
Jiaming Song, Chenlin Meng, and Stefano Ermon. “Denoising diffusion implicit models”. arXiv preprint arXiv:2010.02502 (2020).
Kiwhan Song, Jaeyeon Kim, Sitan Chen, Yilun Du, Sham Kakade, and Vincent Sitzmann. “Selective Underfitting in Diffusion Models”. arXiv preprint arXiv:2510.01378 (2025).
Yang Song and Stefano Ermon. “Generative Modeling by Estimating Gradients of the Data Distribution”. Advances in Neural Information Processing Systems. Ed. by H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett. Vol. 32. Curran Associates, Inc., 2019.
Yang Song, Liyue Shen, Lei Xing, and Stefano Ermon. “Solving Inverse Problems in Medical Imaging with Score-Based Generative Models”. International Conference on Learning Representations. 2022.
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. “Score-Based Generative Modeling through Stochastic Differential Equations”. 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021.
Daniel A Spielman, Huan Wang, and John Wright. “Exact Recovery of Sparsely-Used Dictionaries”. Proceedings of the 25th Annual Conference on Learning Theory. Ed. by Shie Mannor, Nathan Srebro, and Robert C Williamson. Vol. 23. Proceedings of Machine Learning Research. Edinburgh, Scotland: PMLR, 2012, pp. 37.1–37.18.
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. “Dropout: a simple way to prevent neural networks from overfitting”. The journal of machine learning research 15.1 (2014), pp. 1929–1958.
Ju Sun, Qing Qu, and John Wright. “When are nonconvex problems not scary?” arXiv preprint arXiv:1510.06096 (2015).
Ju Sun, Qing Qu, and John Wright. “Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture”. IEEE Transactions on Information Theory 63.2 (2017), pp. 853–884.
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. Cambridge, MA, USA: A Bradford Book, 2018.
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. “Going deeper with convolutions”. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015, pp. 1–9.
Kimi Team, Yu Zhang, Zongyu Lin, Xingcheng Yao, Jiaxi Hu, Fanqing Meng, Chengyin Liu, Xin Men, Songlin Yang, Zhiyuan Li, et al. “Kimi Linear: An Expressive, Efficient Attention Architecture”. arXiv preprint arXiv:2510.26692 (2025).
Matus Telgarsky. “Benefits of depth in neural networks”. 29th Annual Conference on Learning Theory. Ed. by Vitaly Feldman, Alexander Rakhlin, and Ohad Shamir. Vol. 49. Proceedings of Machine Learning Research. Columbia University, New York, New York, USA: PMLR, 2016, pp. 1517–1539.
Andreas M Tillmann. “On the computational intractability of exact and approximate dictionary learning”. IEEE signal processing letters 22.1 (Jan. 2015), pp. 45–49.
M. Tipping and C. Bishop. “Probabilistic principal component analysis”. Journal of the Royal Statistical Society: Series B 61.3 (1999), pp. 611–622.
Naftali Tishby and Noga Zaslavsky. “Deep learning and the information bottleneck principle”. 2015 IEEE Information Theory Workshop (ITW). IEEE. 2015, pp. 1–5.
Shengbang Tong, Xili Dai, Yubei Chen, Mingyang Li, Zengyi Li, Brent Yi, Yann LeCun, and Yi Ma. “Unsupervised Learning of Structured Representation via Closed-Loop Transcription”. Conference on Parsimony and Learning. Ed. by Yuejie Chi, Gintare Karolina Dziugaite, Qing Qu, Atlas Wang, and Zhihui Zhu. Vol. 234. Proceedings of Machine Learning Research. PMLR, Jan. 2024, pp. 440–457.
Shengbang Tong, Xili Dai, Ziyang Wu, Mingyang Li, Brent Yi, and Yi Ma. “Incremental Learning of Structured Memory via Closed-Loop Transcription”. The Eleventh International Conference on Learning Representations. 2023.
Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. “Training data-efficient image transformers & distillation through attention”. arXiv preprint arXiv:2012.12877 (2020).
Zhuowen Tu. “Learning Generative Models via Discriminative Approaches”. 2007 IEEE Conference on Computer Vision and Pattern Recognition. 2007, pp. 1–8.
Alan Turing. “Computing Machinery and Intelligence”. Mind 59 (1950), pp. 433–460.
Alan M. Turing. “On Computable Numbers, with an Application to the Entscheidungsproblem”. Proceedings of the London Mathematical Society 2.42 (1936), pp. 230–265.
Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. “Instance normalization: The missing ingredient for fast stylization”. arXiv preprint arXiv:1607.08022 (2016).
Shinji Umeyama. “Least-squares estimation of transformation parameters between two point patterns”. IEEE Transactions on Pattern Analysis and Machine Intelligence 13.4 (1991), pp. 376–380.
P. Van Overschee and B. de Moor. Subspace Identification for Linear Systems. Kluwer Academic, 1996.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. “Attention is All you Need”. Advances in Neural Information Processing Systems. Ed. by I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Vol. 30. Curran Associates, Inc., 2017.
Gido M van de Ven, Hava T Siegelmann, and Andreas S Tolias. “Brain-inspired replay for continual learning with artificial neural networks”. Nature Communications 11.1 (2020), pp. 1–14.
Luca Venturi, Samy Jelassi, Tristan Ozuch, and Joan Bruna. “Depth separation beyond radial functions”. Journal of Machine Learning Research 23.122 (2022), pp. 1–56.
Roman Vershynin. High-dimensional probability: An introduction with applications in data science. Vol. 47. Cambridge University Press, 2018.
R. Vidal and Y. Ma. “A unified algebraic approach to 2-D and 3-D motion segmentation”. Proceedings of the European Conference on Computer Vision. 2004.
Rene Vidal, Yi Ma, and S. S. Sastry. Generalized Principal Component Analysis. 1st. Springer Publishing Company, Incorporated, 2016.
Rene Vidal, Yi Ma, and Shankar Sastry. “Generalized principal component analysis”. IEEE transactions on pattern analysis and machine intelligence 27.12 (2005), pp. 1945–1959.
Pascal Vincent. “A Connection Between Score Matching and Denoising Autoencoders”. Neural Computation 23.7 (2011), pp. 1661–1674.
Andrew Wagner, John Wright, Arvind Ganesh, Zihan Zhou, Hossein Mobahi, and Yi Ma. “Toward a practical face recognition system: Robust alignment and illumination by sparse representation”. IEEE Transactions on Pattern Analysis and Machine Intelligence 34.2 (2012), pp. 372–386.
C. Wallace and D. Boulton. “An Information Measure for Classification”. The Computer Journal 11 (1968), pp. 185–194.
C. Wallace and D. Dowe. “Minimum message length and Kolmogorov complexity”. The Computer Journal 42.4 (1999), pp. 270–283.
Marion Dwain Waltz and King-Sun Fu. “A heuristic approach to reinforcement learning control systems”. IEEE Transactions on Automatic Control 10 (1965), pp. 390–398.
Deng Wang, Jean Vannier, José M Martín-Durán, María Herranz, and Chiyang Yu. “Preservation and early evolution of scalidophoran ventral nerve cord”. Science Advances 11.2 (2025), eadr0896.
Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. “VGGT: Visual Geometry Grounded Transformer”. Proceedings of the Computer Vision and Pattern Recognition Conference. 2025, pp. 5294–5306.
Peng Wang, Huikang Liu, Druv Pai, Yaodong Yu, Zhihui Zhu, Qing Qu, and Yi Ma. “A Global Geometric Analysis of Maximal Coding Rate Reduction”. Forty-first International Conference on Machine Learning. 2024.
Peng Wang, Yifu Lu, Yaodong Yu, Druv Pai, Qing Qu, and Yi Ma. “Attention-Only Transformers via Unrolled Subspace Denoising”. Forty-second International Conference on Machine Learning. 2025.
Peng Wang, Huijie Zhang, Zekai Zhang, Siyi Chen, Yi Ma, and Qing Qu. “Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering”. arXiv preprint arXiv:2409.02426 (2024).
Qixun Wang, Xu Bai, Haofan Wang, Zekui Qin, Anthony Chen, Huaxia Li, Xu Tang, and Yao Hu. “InstantID: Zero-shot identity-preserving generation in seconds”. arXiv preprint arXiv:2401.07519 (2024).
Ruicheng Wang, Sicheng Xu, Cassie Dai, Jianfeng Xiang, Yu Deng, Xin Tong, and Jiaolong Yang. “MoGe: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision”. Proceedings of the Computer Vision and Pattern Recognition Conference. 2025, pp. 5261–5271.
Tongzhou Wang and Phillip Isola. “Understanding contrastive representation learning through alignment and uniformity on the hypersphere”. Proceedings of the 37th International Conference on Machine Learning. ICML’20. JMLR.org, 2020.
Xudong Wang, Rohit Girdhar, Stella X Yu, and Ishan Misra. “Cut and learn for unsupervised object detection and instance segmentation”. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023, pp. 3124–3134.
P. J. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD Thesis, Applied Mathematics Dept., Harvard Univ., 1974.
Paul J. Werbos. The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting. Wiley, 1994.
T. Wiatowski and H. Bölcskei. “A Mathematical Theory of Deep Convolutional Neural Networks for Feature Extraction”. IEEE Transactions on Information Theory (2018).
Norbert Wiener. “The interpolation, extrapolation and smoothing of stationary time series”. Report of the Services 19, Research Project DIC-6037 MIT (1942).
Norbert Wiener. Cybernetics: Or Control and Communication in the Animal and the Machine. the MIT Press, 1948.
Norbert Wiener. Extrapolation, Interpolation, and Smoothing of Stationary Time Series. New York: Wiley, 1949.
Norbert Wiener. Cybernetics: Or Control and Communication in the Animal and the Machine. 2nd ed. the MIT Press, 1961.
John Wright and Yi Ma. High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications. Cambridge University Press, 2022.
John Wright, Yangyu Tao, Zhouchen Lin, Yi Ma, and Heung-Yeung Shum. “Classification via minimum incremental coding length (MICL)”. Advances in Neural Information Processing Systems. 2008, pp. 1633–1640.
John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma. “Robust Face Recognition via Sparse Representation”. IEEE Trans. Pattern Anal. Mach. Intell. 31.2 (Feb. 2009), pp. 210–227.
Denny Wu and Ji Xu. “On the Optimal Weighted \(\ell _2\) Regularization in Overparameterized Linear Regression”. arXiv preprint arXiv:2006.05800 (2020).
Jianlong Wu, Keyu Long, Fei Wang, Chen Qian, Cheng Li, Zhouchen Lin, and Hongbin Zha. “Deep comprehensive correlation mining for image clustering”. Proceedings of the IEEE/CVF international conference on computer vision. 2019, pp. 8150–8159.
Luhuan Wu, Brian L. Trippe, Christian A Naesseth, John Patrick Cunningham, and David Blei. “Practical and Asymptotically Exact Conditional Sampling in Diffusion Models”. Thirty-seventh Conference on Neural Information Processing Systems. 2023.
Tianhao Wu, Chuanxia Zheng, Frank Guan, Andrea Vedaldi, and Tat-Jen Cham. “Amodal3r: Amodal 3d reconstruction from occluded 2d images”. arXiv preprint arXiv:2503.13439 (2025).
Yuchen Wu, Minshuo Chen, Zihao Li, Mengdi Wang, and Yuting Wei. “Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models”. arXiv [cs.LG] (Mar. 2024). arXiv: 2403.01639 [cs.LG].
Yuxin Wu and Kaiming He. “Group Normalization”. International Journal of Computer Vision 128.3 (2020). Originally arXiv:1803.08494, 2018, pp. 742–755.
Ziyang Wu, Tianjiao Ding, Yifu Lu, Druv Pai, Jingyuan Zhang, Weida Wang, Yaodong Yu, Yi Ma, and Benjamin David Haeffele. “Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction”. The Thirteenth International Conference on Learning Representations. 2025.
Ziyang Wu, Jingyuan Zhang, Druv Pai, XuDong Wang, Chandan Singh, Jianwei Yang, Jianfeng Gao, and Yi Ma. “Simplifying DINO via Coding Rate Regularization”. Forty-second International Conference on Machine Learning. 2025.
Jianfeng Xiang, Zelong Lv, Sicheng Xu, Yu Deng, Ruicheng Wang, Bowen Zhang, Dong Chen, Xin Tong, and Jiaolong Yang. “Structured 3d latents for scalable and versatile 3d generation”. Proceedings of the Computer Vision and Pattern Recognition Conference. 2025, pp. 21469–21480.
Junyuan Xie, Ross Girshick, and Ali Farhadi. “Unsupervised deep embedding for clustering analysis”. International conference on machine learning. PMLR. 2016, pp. 478–487.
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. “Aggregated Residual Transformations for Deep Neural Networks”. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017, pp. 5987–5995.
Bing Xu, Naiyan Wang, Tianqi Chen, and Mu Li. “Empirical evaluation of rectified activations in convolutional network”. arXiv preprint arXiv:1505.00853 (2015).
Yilun Xu, Shengjia Zhao, Jiaming Song, Russell Stewart, and Stefano Ermon. “A theory of usable information under computational constraints”. arXiv preprint arXiv:2002.10689 (2020).
Le Xue, Mingfei Gao, Chen Xing, Roberto Martín-Martín, Jiajun Wu, Caiming Xiong, Ran Xu, Juan Carlos Niebles, and Silvio Savarese. “ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding”. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023, pp. 1179–1189.
Greg Yang, Edward J Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, and Jianfeng Gao. “Tensor programs v: Tuning large neural networks via zero-shot hyperparameter transfer”. arXiv preprint arXiv:2203.03466 (2022).
Greg Yang and Edward J. Hu. “Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks”. Proceedings of the 38th International Conference on Machine Learning. Ed. by Marina Meila and Tong Zhang. Vol. 139. Proceedings of Machine Learning Research. PMLR, 2021, pp. 11727–11737.
Jianwei Yang, Devi Parikh, and Dhruv Batra. “Joint unsupervised learning of deep representations and image clusters”. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016, pp. 5147–5156.
Jingfeng Yang, Ziyang Wu, Yue Zhao, and Yi Ma. Language-Image Alignment with Fixed Text Encoders. 2025. arXiv: 2506.04209 [cs.CV].
Liu Yang, Kangwook Lee, Robert Nowak, and Dimitris Papailiopoulos. “Looped transformers are better at learning learning algorithms”. arXiv preprint arXiv:2311.12424 (2023).
Songlin Yang, Jan Kautz, and Ali Hatamizadeh. “Gated delta networks: Improving mamba2 with delta rule”. arXiv preprint arXiv:2412.06464 (2024).
Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, and Yoon Kim. “Parallelizing linear transformers with the delta rule over sequence length”. Advances in neural information processing systems 37 (2024), pp. 115491–115522.
Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, and Yi Ma. “Rethinking Bias-Variance Trade-off for Generalization of Neural Networks”. International Conference on Machine Learning. 2020.
Kaixin Yao, Longwen Zhang, Xinhao Yan, Yan Zeng, Qixuan Zhang, Lan Xu, Wei Yang, Jiayuan Gu, and Jingyi Yu. CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image. 2025. arXiv: 2502.12894 [cs.CV].
Can Yaras, Peng Wang, Zhihui Zhu, Laura Balzano, and Qing Qu. “Neural collapse with normalized features: A geometric analysis over the Riemannian manifold”. Advances in Neural Information Processing Systems. Vol. 35. 2022, pp. 11547–11560.
Chun-Hsiao Yeh, Ta-Ying Cheng, He-Yen Hsieh, Chuan-En Lin, Yi Ma, Andrew Markham, Niki Trigoni, Hsiang-Tsung Kung, and Yubei Chen. “Gen4gen: Generative data pipeline for generative multi-concept composition”. arXiv preprint arXiv:2402.15504 (2024).
Brent Yi, Weijia Zeng, Sam Buchanan, and Yi Ma. “Canonical Factors for Hybrid Neural Fields”. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Oct. 2023, pp. 3414–3426.
TaeHo Yoon, Joo Young Choi, Sehyun Kwon, and Ernest K Ryu. “Diffusion probabilistic models generalize when they fail to memorize”. ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling. 2023.
Chong You, Chun-Guang Li, Daniel P Robinson, and René Vidal. “Oracle based active set algorithm for scalable elastic net subspace clustering”. Proceedings of the IEEE conference on computer vision and pattern recognition. 2016, pp. 3928–3937.
Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Hao Bai, Yuexiang Zhai, Benjamin D Haeffele, and Yi Ma. “White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is?” Journal of Machine Learning Research 25.300 (2024), pp. 1–128.
Yaodong Yu, Sam Buchanan, Druv Pai, Tianzhe Chu, Ziyang Wu, Shengbang Tong, Benjamin Haeffele, and Yi Ma. “White-box transformers via sparse rate reduction”. Advances in Neural Information Processing Systems 36 (2023).
Yaodong Yu, Kwan Ho Ryan Chan, Chong You, Chaobing Song, and Yi Ma. “Learning Diverse and Discriminative Representations via the Principle of Maximal Coding Rate Reduction”. Advances in neural information processing systems. 2020.
Zeyu Yun, Yubei Chen, Bruno A Olshausen, and Yann LeCun. “Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors”. arXiv preprint arXiv:2103.15949 (2021).
Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Russ R Salakhutdinov, and Alexander J Smola. “Deep Sets”. Advances in Neural Information Processing Systems 30. Ed. by I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Curran Associates, Inc., 2017, pp. 3391–3401.
Moslem Zamani, Hadi Abbaszadehpeivasti, and Etienne de Klerk. “Convergence rate analysis of the gradient descent–ascent method for convex–concave saddle-point problems”. Optimization Methods and Software 39.5 (2024), pp. 967–989.
Chong Zeng, Yue Dong, Pieter Peers, Hongzhi Wu, and Xin Tong. “RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination”. ACM SIGGRAPH 2025 Conference Papers. 2025.
Yuexiang Zhai, Hermish Mehta, Zhengyuan Zhou, and Yi Ma. “Understanding l4-based Dictionary Learning: Interpretation, Stability, and Robustness”. 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020.
Biao Zhang, Matthias Nießner, and Peter Wonka. “3DILG: Irregular Latent Grids for 3D Generative Modeling”. Advances in Neural Information Processing Systems. Ed. by S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh. Vol. 35. Curran Associates, Inc., 2022, pp. 21871–21885.
Biao Zhang, Jiapeng Tang, Matthias Nießner, and Peter Wonka. “3DShape2VecSet: A 3D Shape Representation for Neural Fields and Generative Diffusion Models”. ACM Trans. Graph. 42.4 (July 2023).
Bingliang Zhang, Wenda Chu, Julius Berner, Chenlin Meng, Anima Anandkumar, and Yang Song. “Improving diffusion inverse problem solving with decoupled noise annealing”. arXiv [cs.LG] (July 2024). arXiv: 2407.01521 [cs.LG].
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. “Understanding deep learning requires rethinking generalization”. International Conference on Learning Representations. 2017.
Huijie Zhang, Jinfan Zhou, Yifu Lu, Minzhe Guo, Peng Wang, Liyue Shen, and Qing Qu. “The Emergence of Reproducibility and Consistency in Diffusion Models”. Proceedings of the 41st International Conference on Machine Learning. Vol. 235. Proceedings of Machine Learning Research. PMLR, 2024, pp. 60558–60590.
Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. “Adding Conditional Control to Text-to-Image Diffusion Models”. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023, pp. 3836–3847.
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. “The unreasonable effectiveness of deep features as a perceptual metric”. Proceedings of the IEEE conference on computer vision and pattern recognition. 2018, pp. 586–595.
Zekai Zhang, Xiao Li, Xiang Li, Lianghe Shi, Meng Wu, Molei Tao, and Qing Qu. “Generalization of Diffusion Models Arises with a Balanced Representation Space”. arXiv preprint arXiv:2512.20963 (2025).
Zhengdong Zhang, Xiao Liang, Arvind Ganesh, and Yi Ma. “TILT: Transform Invariant Low-Rank Textures”. International Journal of Computer Vision 99 (2012), pp. 1–24.
Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen, Gang Yu, and Shenghua Gao. “Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation”. Advances in Neural Information Processing Systems. Ed. by A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine. Vol. 36. Curran Associates, Inc., 2023, pp. 73969–73982.
Hao Zheng, Faming Fang, and Guixu Zhang. “Cascaded Dilated Dense Network with Two-step Data Consistency for MRI Reconstruction”. Advances in Neural Information Processing Systems. Ed. by H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett. Vol. 32. Curran Associates, Inc., 2019.
Hongkai Zheng, Wenda Chu, Bingliang Zhang, Zihui Wu, Austin Wang, Berthy Feng, Caifeng Zou, Yu Sun, Nikola Borislavov Kovachki, Zachary E Ross, Katherine Bouman, and Yisong Yue. “InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences”. The Thirteenth International Conference on Learning Representations. 2025.
Ziyan Zheng, Chin Wa Lau, Nian Guo, Xiang Shi, and Shao-Lun Huang. “White-box error correction code transformer”. The Second Conference on Parsimony and Learning (Proceedings Track).
Bo Zhou and S. Kevin Zhou. “DuDoRNet: Learning a Dual-Domain Recurrent Network for Fast MRI Reconstruction With Deep T1 Prior”. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Los Alamitos, CA, USA: IEEE Computer Society, June 2020, pp. 4272–4281.
Song Chun Zhu and David Mumford. “Prior Learning and Gibbs Reaction-Diffusion”. IEEE Trans. Pattern Anal. Mach. Intell. 19.11 (1997), pp. 1236–1250.
Song Chun Zhu, Ying Nian Wu, and David Mumford. “Minimax Entropy Principle and Its Application to Texture Modeling”. Neural Computation 9.8 (1997), pp. 1627–1660.
Song-Chun Zhu and David Mumford. “Learning generic prior models for visual computation”. Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (1997), pp. 463–469.
Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, and Qing Qu. “A geometric analysis of neural collapse with unconstrained features”. Advances in Neural Information Processing Systems 34 (2021), pp. 29820–29834.
Barret Zoph and Quoc V. Le. “Neural Architecture Search with Reinforcement Learning”. International Conference on Learning Representations. 2017.