Preface to Version 2.0
As an open-source book, the first version, entitled “Learning Deep Representations of
Data Distributions” and released on August 18, 2025, has been continually updated
over the past six months. Meanwhile, based on our experience using the book in a
new course taught at the University of Hong Kong in Fall 2025, and on feedback
that we have received from colleagues, teaching assistants, and students, we have
identified numerous points throughout the book that merit further revision and
expansion.
Hence, we have decided to make substantial changes and upgrades to the content
and organization of the book in a new Version 2.0, and have changed its title
to:
Principles and Practice of Deep Representation Learning
or A Mathematical Theory of Memory
The new version also allows us to explicitly reveal strong conceptual and technical
connections among materials across different chapters and sections of the book,
so that, we believe, the overall pedagogical value of the book has been improved
significantly over the first version.
In our opinion, the new version gives a much more unified, complete, and
streamlined presentation of this subject. The new version also incorporates
newly developed theoretical insights as well as a growing number of practical
applications to real-world data that have not been properly documented and
systematically explained elsewhere; Version 2.0 serves as a timely remedy to this
situation.
Major Changes in Version 2.0
- We have split Chapter 3 of the first version into two chapters, now
Chapters 3 and 4. The new Chapter 3 focuses on the denoising process for
learning low-dimensional distributions. We have added a new section that
characterizes conditions under which the process leads to generalization
or memorization. The new Chapter 4 focuses on representation learning
based on a lossy coding approach and the principle of maximizing
information gain. Since the old version only illustrated how to apply the
principle to learn image representations in the supervised setting, the
new version adds a case study in the unsupervised setting. It also
provides theoretical justification for popular unsupervised learning
methods such as contrastive learning and, in particular, DINO.
- In the updated Chapter 5 on designing deep representations via unrolled
optimization, we have added a principled derivation of a causal version of
the white-box transformer architecture CRATE, which is important for
processing sequential data, such as text, that we often see in applications.
- In the updated Chapter 6 on consistent representation learning, we have added
a new section that further elucidates the relationships between distribution
learning and representation learning. This provides theoretical clarification
and justification for popular practical autoencoding methods such as
variational autoencoding and representation autoencoders.
- In the updated Chapter 7, we have added a section that provides a
principled explanation of representation learning with paired data and
conditioned generation through the perspective of mutual information. It
provides theoretical justification for popular practical methods such as
CLIP and cross attention.
- The expanded application Chapter 8 now features many more detailed
implementations of distribution learning and representation learning
for real-world data, including natural 2D images, natural 3D objects,
human body motions, and natural languages, under almost all popular
and practical settings: supervised, weakly supervised, unsupervised, and
conditioned.
- The final Chapter 9, on open directions, has also been significantly
rewritten. We attempt to give a clearer taxonomy of different levels of
intelligence so that we can clarify what we have done and understood
about intelligence (with this book) and what we have not. We believe that
open problems associated with more advanced forms of intelligence can be
well posed, so that they can be studied qualitatively and quantitatively via
scientific and mathematical means, instead of remaining a mysterious or
merely philosophical subject.
Contributors to Version 2.0
Besides the authors, many students and colleagues have joined this project and
contributed valuable content to different parts of the book during the preparation of
Version 2.0. Below is an incomplete list of people and their specific contributions, in
alphabetical order:
- Tianzhe Chu: Experiments in Section 8.4, and AI tooling for the book.
- Prof. Shenghua Gao: Sections 8.8 and 8.9.
- Bingbing Huang: Sections 8.8 and 8.9.
- Kerui Min: Chinese translation.
- Prof. Qing Qu: Section 3.3.
- Shengbang Peter Tong: Sections 6.3 and 8.6.2.
- Chengyu Wang: Sections 8.8 and 8.9.
- Ziyang Robin Wu: Section 8.3 and website development.
- Jingfeng Yang: Section 8.3.
- Chun-Hsiao Daniel Yeh: Sections 8.6 and 8.7.
- Brent Yi: Section 8.10.
- Dr. Yaodong Yu: Chapter 5 is based on his PhD thesis.
- Dr. Zibo Zhao: Section 8.8.
We also thank Dr. Kevin Murphy and Dr. Bill Mark for extensive technical feedback
on the manuscript; Jan Cavel for contributing an unofficial Romanian translation; and
Stephen Butterfill and Jeroen Van Goey for contributing corrections and fixes.