Chinese Journal of Applied Chemistry ›› 2022, Vol. 39 ›› Issue (1): 3-17.DOI: 10.19894/j.issn.1000-0518.210479

• Review •

Protein Sequence Design Using Generative Models

WU Qing-Lin1, REN Yu-Bin2, ZHAI Xiao-Wei1, CHEN Dong1, LIU Kai2

  1. College of Energy Engineering, Zhejiang University, Hangzhou 310012, China
  2. Department of Chemistry, Tsinghua University, Beijing 100084, China
  • Received:2021-09-26 Accepted:2021-11-11 Published:2022-01-01 Online:2022-01-10
  • Contact: Dong CHEN,Kai LIU
  • Supported by:
    the National Natural Science Foundation of China(21878258);Zhejiang Provincial Natural Science Foundation of China(Y20B060027)


Proteins are the material basis of all living things: they are the main bearers of life activities and participate in the regulation of physiological functions. Designing proteins with specific functions is of great significance in protein engineering, biomedicine, and materials science. Protein sequence design refers to designing and identifying amino acid sequences that fold into a desired structure and carry out a desired function; it is the core of rational protein engineering and holds great potential for research and application. With the exponential growth of protein sequence data and the rapid development of deep learning, generative models are increasingly used in protein sequence design. This review briefly introduces the significance of protein sequence design and the methods developed for it, discusses in detail the principles of the four main generative models used for the task, and surveys recent research on and applications of generative models in protein sequence representation, generation, and optimization. Finally, future directions for protein sequence design are outlined.
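To make the sequence representation and generation steps concrete, the sketch below (an illustrative example, not code from the works reviewed) one-hot encodes protein sequences over the 20 standard amino acids and then samples new sequences from a position-wise frequency model fitted to aligned training sequences. This is the simplest possible generative baseline; the VAEs, GANs, and reinforcement-learning methods discussed in the review replace the frequency table with learned neural models, but the encode-fit-sample loop is the same.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq):
    """Encode a protein sequence as an (L, 20) one-hot matrix."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x

def fit_position_frequencies(seqs, pseudocount=1.0):
    """Per-position amino-acid frequencies from aligned, equal-length sequences.

    A pseudocount avoids zero probabilities for unseen residues.
    """
    counts = sum(one_hot(s) for s in seqs) + pseudocount
    return counts / counts.sum(axis=1, keepdims=True)

def sample_sequence(freqs, rng):
    """Draw one new sequence, position by position, from the fitted model."""
    return "".join(
        AMINO_ACIDS[rng.choice(len(AMINO_ACIDS), p=p)] for p in freqs
    )

# Toy usage: three aligned 4-residue sequences as "training data".
rng = np.random.default_rng(0)
training = ["ACDE", "ACDF", "ACDG"]
freqs = fit_position_frequencies(training)
new_seq = sample_sequence(freqs, rng)
```

Sampled sequences share the length of the training alignment and favor residues observed at each position, which is why such position-independent models ignore the residue-residue couplings that deep generative models are designed to capture.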

Key words: Protein sequence design, Generative model, Variational autoencoder, Generative adversarial network, Representation learning, Reinforcement learning
