SiMologics — Protein Sequence Analysis

Model Attribution

SiMologics is built on four open-source AI models for antibody sequence analysis and design: AntiBERTy, ProGen2-OAS, IgCraft, and BioPhi (Sapiens). We acknowledge and thank the research teams behind each model.

Outputs generated using these models are for research purposes only. Please cite the original papers when publishing results obtained via SiMologics.

AntiBERTy

Licence: MIT

Sequence Analysis (embed, classify, fill, log-likelihood)

AntiBERTy is a BERT-based antibody language model trained on 558 million unpaired antibody sequences from the Observed Antibody Space (OAS) database. It produces 512-dimensional per-residue and per-sequence embeddings that encode evolutionary, structural, and functional information. SiMologics uses it for sequence embedding, species and chain-type classification, masked residue prediction, and log-likelihood scoring.
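As an illustration of the log-likelihood scoring mentioned above, the sketch below computes a pseudo-log-likelihood in the way commonly used for masked language models: each residue is masked in turn, and the log-probability the model assigns to the true residue is accumulated. The names `predict_masked`, `MASK`, and `uniform_model` are hypothetical stand-ins for this example and are not part of the AntiBERTy API. The real model, and SiMologics itself, may batch or normalise the computation differently (for example, averaging over positions instead of summing).

```python
import math
from typing import Callable, Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "_"  # hypothetical mask symbol used only in this sketch


def pseudo_log_likelihood(
    sequence: str,
    predict_masked: Callable[[str, int], Dict[str, float]],
) -> float:
    """Sum of log P(true residue | all other residues), one mask at a time."""
    total = 0.0
    for i, residue in enumerate(sequence):
        # Hide position i, then ask the model for a distribution at that position.
        masked = sequence[:i] + MASK + sequence[i + 1:]
        probs = predict_masked(masked, i)
        total += math.log(probs[residue])
    return total


def uniform_model(masked_sequence: str, position: int) -> Dict[str, float]:
    """Toy stand-in for a trained model: every residue is equally likely."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


score = pseudo_log_likelihood("EVQLVESGG", uniform_model)
print(score)
```

Under the uniform toy model every position contributes log(1/20), so the score only reflects sequence length. With a trained model, higher (less negative) scores indicate sequences the model considers more typical of its training data.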

Citation: Ruffolo, J. A., Gray, J. J., & Sulam, J. (2021). Deciphering antibody affinity maturation with language models and weakly supervised learning. arXiv:2112.07782.

Source: https://github.com/jeffreyruffolo/AntiBERTy

ProGen2-OAS

Variant: ProGen2 (OAS fine-tune)

Licence: Apache 2.0

Sequence Generation

ProGen2 is a family of protein language models trained on hundreds of millions of protein sequences using a causal (autoregressive) transformer architecture. The OAS fine-tuned variant is specialised for antibody variable region generation. SiMologics uses it to extend user-provided seed sequences and generate novel antibody sequences via temperature-controlled nucleus sampling.
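To make the sampling step above concrete, here is a minimal sketch of nucleus (top-p) sampling with a temperature parameter for a single generation step. The function names `nucleus_filter` and `sample_next` and the example logits are assumptions made for illustration; they are not taken from the ProGen2 code base, and the SiMologics implementation may differ in detail.

```python
import math
import random
from typing import Dict, Optional


def nucleus_filter(logits: Dict[str, float], temperature: float, top_p: float) -> Dict[str, float]:
    """Return the renormalised distribution over the top-p nucleus of tokens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature scaling: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    # Numerically stable softmax.
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    norm = sum(exps.values())
    probs = {tok: e / norm for tok, e in exps.items()}
    # Keep the smallest set of most likely tokens whose cumulative mass reaches top_p.
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


def sample_next(
    logits: Dict[str, float],
    temperature: float = 1.0,
    top_p: float = 0.9,
    rng: Optional[random.Random] = None,
) -> str:
    """Draw one token from the temperature-scaled, nucleus-filtered distribution."""
    rng = rng or random.Random()
    dist = nucleus_filter(logits, temperature, top_p)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


# Made-up logits for the next residue after some seed sequence.
example_logits = {"A": 2.0, "G": 1.0, "S": 0.0, "W": -3.0}
print(nucleus_filter(example_logits, temperature=1.0, top_p=0.9))   # nucleus is A and G
print(nucleus_filter(example_logits, temperature=10.0, top_p=0.9))  # flatter: all four kept
print(sample_next(example_logits, temperature=1.0, top_p=0.9, rng=random.Random(0)))
```

Lower temperature and lower top-p both make generation more conservative; raising either admits rarer residues and therefore more diverse sequences.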

Citation: Nijkamp, E., et al. (2023). ProGen2: Exploring the boundaries of protein language models. Cell Systems, 14(11).

Source: https://github.com/salesforce/progen

IgCraft

Version: latest

Licence: MIT

Antibody Design (generate, inpaint, inverse fold, CDR graft)

IgCraft is an antibody-specific generative model that supports unconditional antibody generation, region inpainting (redesigning selected CDR and framework regions given IMGT-formatted input), inverse folding (predicting sequence from a PDB structure), and CDR grafting (transplanting donor CDR sequences onto an acceptor scaffold).
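The idea of CDR grafting on IMGT-numbered input can be sketched as a simple copy operation: framework positions come from the acceptor and CDR positions come from the donor. This is a baseline illustration only, not IgCraft's method, which is a generative model. The CDR ranges below are the standard IMGT definitions (CDR1 27-38, CDR2 56-65, CDR3 105-117); the function names and the toy numbered fragments are hypothetical, and insertion codes (such as 111A) are left out to keep the sketch short.

```python
from typing import Dict

# Standard IMGT CDR boundaries for variable domains (inclusive positions).
IMGT_CDR_RANGES = {"CDR1": (27, 38), "CDR2": (56, 65), "CDR3": (105, 117)}


def in_cdr(position: int) -> bool:
    """True if an IMGT position falls inside CDR1, CDR2, or CDR3."""
    return any(start <= position <= end for start, end in IMGT_CDR_RANGES.values())


def graft_cdrs(acceptor: Dict[int, str], donor: Dict[int, str]) -> Dict[int, str]:
    """Combine acceptor framework residues with donor CDR residues."""
    framework = {pos: aa for pos, aa in acceptor.items() if not in_cdr(pos)}
    cdrs = {pos: aa for pos, aa in donor.items() if in_cdr(pos)}
    return dict(sorted({**framework, **cdrs}.items()))


def to_sequence(numbered: Dict[int, str]) -> str:
    """Flatten an IMGT-numbered mapping into a plain sequence string."""
    return "".join(aa for _, aa in sorted(numbered.items()))


# Toy fragments around the FR1/CDR1/FR2 boundary (made-up residues).
acceptor = {25: "A", 26: "S", 27: "G", 28: "F", 39: "M", 40: "H"}
donor = {25: "T", 26: "T", 27: "G", 28: "Y", 29: "T", 39: "I", 40: "N"}

grafted = graft_cdrs(acceptor, donor)
print(to_sequence(grafted))
```

Because unoccupied IMGT positions are simply absent from each mapping, the grafted CDR takes the donor's length even when it differs from the acceptor's.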

Citation: Not yet finalised (internal and/or preprint). See the GitHub repository for the latest citation guidance.

Source: https://github.com/oxpig/IgCraft

BioPhi (Sapiens)

Variant: Sapiens

Licence: CC BY 4.0

Humanisation & Humanness Scoring

BioPhi incorporates Sapiens, a BERT-based model trained on human antibody repertoire data to score the humanness of each residue in a sequence. SiMologics uses it for two tasks: Humanise (suggest mutations to increase humanness while preserving CDRs) and Score (compute per-residue humanness without modifying the sequence).
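The two tasks can be illustrated with a small sketch that starts from per-position residue probabilities, which is the kind of output a masked language model such as Sapiens produces. The functions `score_humanness` and `humanise`, the 0-based `cdr_indices` set, and the toy probabilities are all assumptions for illustration; they do not reproduce the BioPhi or Sapiens API, and the actual humanisation procedure may apply extra rules or multiple passes.

```python
from typing import Dict, List, Set


def score_humanness(sequence: str, probs: List[Dict[str, float]]) -> List[float]:
    """Score: probability the model assigns to each observed residue (sequence unchanged)."""
    return [probs[i].get(aa, 0.0) for i, aa in enumerate(sequence)]


def humanise(sequence: str, probs: List[Dict[str, float]], cdr_indices: Set[int]) -> str:
    """Humanise: outside the CDRs, swap each residue for the most probable one."""
    out = []
    for i, aa in enumerate(sequence):
        if i in cdr_indices:
            out.append(aa)  # CDR residues are preserved
        else:
            out.append(max(probs[i], key=probs[i].get))
    return "".join(out)


# Toy 4-residue example with made-up probabilities (not real model output).
sequence = "QVKL"
probs = [
    {"Q": 0.7, "E": 0.3},
    {"V": 0.9, "L": 0.1},
    {"K": 0.2, "Q": 0.8},
    {"L": 0.4, "V": 0.6},
]
cdr_indices = {3}  # hypothetical: treat the last position as part of a CDR

print(score_humanness(sequence, probs))
print(humanise(sequence, probs, cdr_indices))
```

In this toy case only the third residue changes (K to Q); the last position also has a more probable alternative, but it is left untouched because it is marked as CDR.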

Citation: Prihoda, D., et al. (2022). BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning. mAbs, 14(1).

Source: https://github.com/Merck/BioPhi

SiMologics does not claim ownership of the above models. Each model is used in accordance with its original licence. Contact info@simologics.com with any licence questions.