Artificial intelligence (AI) has come a long way, from imitating human language to mimicking biological processes such as evolution. In a recent study published in the journal Nature Biotechnology, researchers tested whether a language model called ProGen, developed by Salesforce Research, could generate amino acid sequences for enzymes that work in real-life conditions. The study was a collaboration between Salesforce Research and researchers at the University of California, San Francisco and the University of California, Berkeley.
Proteins can be represented as a language whose alphabet is the 20 amino acids that make up every protein. In the same way that words are strung together one by one to form sentences, amino acids are strung together one by one to make proteins. Using this insight, the team applied neural language modeling to proteins to generate realistic yet novel protein sequences.
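To make the language analogy concrete, here is a minimal Python sketch of how a protein sequence can be turned into tokens and next-residue training pairs, the basic setup a language model learns from. The vocabulary, the function names, and the five-letter example fragment are illustrative assumptions, not ProGen's actual tokenizer or data.

```python
# The 20 standard amino acids, written with their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical vocabulary: one integer id per amino acid
# (an assumption for illustration, not ProGen's real tokenizer).
TOKEN_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def tokenize(sequence):
    """Map a protein sequence to a list of integer token ids."""
    return [TOKEN_ID[aa] for aa in sequence.upper()]


def next_token_pairs(token_ids):
    """Pair each prefix with the residue that follows it.

    A language model is trained to predict the target from the context,
    just as a text model predicts the next word in a sentence.
    """
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]


if __name__ == "__main__":
    fragment = "MKVLA"  # made-up fragment, not a real protein
    ids = tokenize(fragment)
    for context, target in next_token_pairs(ids):
        print(context, "->", target)
```

Each printed line is one training example: the residues seen so far, followed by the residue the model should predict next.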
The team trained ProGen on 280 million proteins and fine-tuned it on a dataset of 56,000 proteins from five different families. After generating one million artificial sequences, the team selected 100 to test, comparing their function to that of natural proteins and checking whether they followed the “grammar” of amino acid composition. The team then tested five of those 100 artificial proteins in cells and compared them to hen egg white lysozyme (HEWL), an enzyme found in chicken eggs. To their surprise, two of the five showed activity similar to HEWL, breaking down bacterial cell walls.
The results of the study highlight the potential of AI to revolutionize the field of protein design. Proteins play a crucial role in human biology and are involved in everything from the functioning of the body to the development of diseases. By using AI to design new proteins, researchers hope to more effectively treat diseases or even prevent them in the first place.
The methods described in the paper are available on GitHub, enabling the research community to build upon this work and accelerate research on AI for protein design. In addition, the team’s use of conditional language models allows for significantly more control over the types of sequences generated, making them more useful for designing proteins with specific properties.
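The idea of conditioning can be illustrated with a toy Python sketch in which a control tag, such as a protein family label, steers what gets generated. This uses a simple count-based model rather than a neural network, and the tag names, training sequences, and function names are all made up for illustration. It is not the method or code from the paper.

```python
import random


def train_conditional_bigram(examples):
    """Count next-residue frequencies keyed by (family tag, previous residue)."""
    counts = {}
    for tag, seq in examples:
        prev = "^"  # start-of-sequence marker
        for aa in seq + "$":  # "$" marks the end of the sequence
            bucket = counts.setdefault((tag, prev), {})
            bucket[aa] = bucket.get(aa, 0) + 1
            prev = aa
    return counts


def generate(counts, tag, rng, max_len=50):
    """Sample a sequence one residue at a time, conditioned on the tag."""
    seq = []
    prev = "^"
    while len(seq) < max_len:
        options = counts.get((tag, prev))
        if not options:
            break  # nothing was ever observed after this residue for this tag
        residues = list(options)
        weights = [options[r] for r in residues]
        aa = rng.choices(residues, weights=weights)[0]
        if aa == "$":
            break
        seq.append(aa)
        prev = aa
    return "".join(seq)


if __name__ == "__main__":
    # Hypothetical training data: (family tag, sequence) pairs.
    data = [("family_a", "MKVL"), ("family_b", "GSHW")]
    model = train_conditional_bigram(data)
    rng = random.Random(0)
    print(generate(model, "family_a", rng))
    print(generate(model, "family_b", rng))
```

Changing only the tag changes which kind of sequence comes out, which is the sense in which a conditional model gives more control over generation.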
While AI has been used to generate proteins in the past, this study sets itself apart from prior research by demonstrating the potential of language models for protein design. The team’s success in generating artificial proteins that showed activity similar to a natural enzyme shaped by millions of years of evolution is a testament to the growing power of AI and its ability to mimic complex biological processes.
In conclusion, the study published in Nature Biotechnology shows that a language model can generate working enzymes, and its openly available methods and conditional generation give other researchers a base to build on. The future looks bright for AI in the field of biology, and we can’t wait to see what comes next.