Chinese Journal of Applied Chemistry ›› 2022, Vol. 39 ›› Issue (1): 3-17. DOI: 10.19894/j.issn.1000-0518.210479

• Review •

Protein Sequence Design Using Generative Models

WU Qing-Lin1, REN Yu-Bin2, ZHAI Xiao-Wei1, CHEN Dong1, LIU Kai2

  1. College of Energy Engineering, Zhejiang University, Hangzhou 310012, China
  2. Department of Chemistry, Tsinghua University, Beijing 100084, China
  • Received: 2021-09-26 Accepted: 2021-11-11 Published: 2022-01-01 Online: 2022-01-10
  • Contact: Dong CHEN, Kai LIU
  • About author: kailiu@mail.tsinghua.edu.cn
  • Supported by:
    the National Natural Science Foundation of China (21878258); Zhejiang Provincial Natural Science Foundation of China (Y20B060027)

Abstract:

Proteins are the material basis of all living organisms: they are the main carriers of life activities and participate in the regulation of physiological functions. Designing proteins with specific functions is of great significance in protein engineering, biomedicine, and materials science. Protein sequence design aims to design and identify amino acid sequences that fold into a desired structure and carry the corresponding function. It is the core problem of all rational protein engineering and has great potential for research and application. With the exponential growth of protein sequence data and the rapid development of deep learning, generative models are increasingly used in protein sequence design. This review briefly introduces the significance of protein sequence design and the main methods developed for it. The principles of the four main generative models used for protein sequence design are then discussed in detail. Recent research on and applications of generative models in protein sequence representation, generation, and optimization are presented. Finally, future directions for protein sequence design are discussed.

Key words: Protein sequence design, Generative model, Variational autoencoder, Generative adversarial network, Representation learning, Reinforcement learning
