What is RNAmigos?

RNAmigos is a graph neural network that predicts small-molecule ligands for RNA binding sites.
We trained RNAmigos on a set of RNA-small molecule complexes from the Protein Data Bank (PDB) to predict the associated ligand given only the binding site structure. The input to RNAmigos is an RNA binding site structure, supplied either as atomic coordinates, which are converted internally to a base pairing network, or directly as a base pairing network. From the base pairing network, RNAmigos predicts a molecular fingerprint. This fingerprint can then be used to identify similar molecules in a ligand database and thus accelerate virtual screening. For full details, see the article.

What isn't RNAmigos?

RNAmigos is not a docking tool or an affinity predictor: it does not require a docked complex as input, and it does not return an affinity score. It also does not scan full RNAs for binding sites; the tool assumes that the input structure already represents a binding site.

Uploading Structure Data

We accept two input formats: mmCIF (.cif) and JSON (.json). mmCIF is a format for describing atomic coordinates; JSON files must encode a graph in the node-link format. Each node must have an 'nt' attribute holding one of 5 possible nucleotide characters ('A', 'U', 'C', 'G', or 'N'), and each edge must have a 'label' attribute holding one of 13 possible edge labels, following the Leontis-Westhof nomenclature. Here is a sample node-link graph, shown in Python dictionary notation:

{'directed': False,
 'multigraph': False,
 'graph': {},
 'nodes': [{'nt': 'A', 'id': ('A', 5)},
  {'nt': 'G', 'id': ('A', 6)},
  {'nt': 'U', 'id': ('A', 26)},
  {'nt': 'A', 'id': ('A', 7)},
  {'nt': 'C', 'id': ('A', 25)},
  {'nt': 'U', 'id': ('A', 9)},
  {'nt': 'U', 'id': ('A', 8)},
  {'nt': 'U', 'id': ('A', 24)},
  {'nt': 'A', 'id': ('A', 11)},
  {'nt': 'G', 'id': ('A', 10)},
  {'nt': 'C', 'id': ('A', 23)},
  {'nt': 'U', 'id': ('A', 22)},
  {'nt': 'G', 'id': ('A', 12)},
  {'nt': 'C', 'id': ('A', 13)},
  {'nt': 'C', 'id': ('A', 21)},
  {'nt': 'G', 'id': ('A', 20)}],
 'links': [{'label': 'B53', 'source': ('A', 5), 'target': ('A', 6)},
  {'label': 'CWW', 'source': ('A', 5), 'target': ('A', 26)},
  {'label': 'B53', 'source': ('A', 6), 'target': ('A', 7)},
  {'label': 'CWW', 'source': ('A', 6), 'target': ('A', 25)},
  {'label': 'TSW', 'source': ('A', 6), 'target': ('A', 9)},
  {'label': 'CSS', 'source': ('A', 26), 'target': ('A', 9)},
  {'label': 'B53', 'source': ('A', 26), 'target': ('A', 25)},
  {'label': 'B53', 'source': ('A', 7), 'target': ('A', 8)},
  {'label': 'CWW', 'source': ('A', 7), 'target': ('A', 24)},
  {'label': 'B53', 'source': ('A', 25), 'target': ('A', 24)},
  {'label': 'B53', 'source': ('A', 9), 'target': ('A', 8)},
  {'label': 'B53', 'source': ('A', 9), 'target': ('A', 10)},
  {'label': 'CHW', 'source': ('A', 8), 'target': ('A', 11)},
  {'label': 'B53', 'source': ('A', 24), 'target': ('A', 23)},
  {'label': 'B53', 'source': ('A', 11), 'target': ('A', 10)},
  {'label': 'B53', 'source': ('A', 11), 'target': ('A', 12)},
  {'label': 'CWW', 'source': ('A', 11), 'target': ('A', 22)},
  {'label': 'CWW', 'source': ('A', 10), 'target': ('A', 23)},
  {'label': 'CSW', 'source': ('A', 10), 'target': ('A', 22)},
  {'label': 'B53', 'source': ('A', 23), 'target': ('A', 22)},
  {'label': 'B53', 'source': ('A', 22), 'target': ('A', 21)},
  {'label': 'B53', 'source': ('A', 12), 'target': ('A', 13)},
  {'label': 'CWW', 'source': ('A', 12), 'target': ('A', 21)},
  {'label': 'CWW', 'source': ('A', 13), 'target': ('A', 20)},
  {'label': 'B53', 'source': ('A', 21), 'target': ('A', 20)}]}

The 'id' attribute can be whatever you like; in this example it represents a chain and position in a PDB structure. IMPORTANT: the file must not exceed 1MB. We do not process whole RNAs, only binding sites, which should contain just a handful of residues.
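
Before uploading, it can help to check a graph against the requirements above. The following is a minimal sketch, not the server's own validation: the function name check_graph, the string ids such as "A.5", and the reading of the 1MB limit as 1,000,000 bytes are all assumptions made for illustration. It only checks that 'nt' is one of the five allowed characters and that a 'label' is present; it does not check the label against the 13 allowed values. Note that a .json file needs double-quoted strings and lowercase false, which json.dumps produces.

```python
import json

ALLOWED_NT = {"A", "U", "C", "G", "N"}
MAX_BYTES = 1_000_000  # assumption: the 1MB limit read as 1,000,000 bytes


def _key(node_id):
    """JSON turns tuples into lists; make ids hashable for lookups."""
    return tuple(node_id) if isinstance(node_id, list) else node_id


def check_graph(graph):
    """Return a list of problems found in a node-link graph dictionary."""
    problems = []
    ids = set()
    for node in graph.get("nodes", []):
        ids.add(_key(node.get("id")))
        if node.get("nt") not in ALLOWED_NT:
            problems.append(f"node {node.get('id')!r}: bad or missing 'nt'")
    for link in graph.get("links", []):
        if not link.get("label"):
            problems.append("link without a 'label' attribute")
        for end in ("source", "target"):
            if _key(link.get(end)) not in ids:
                problems.append(f"link {end} {link.get(end)!r} is not a node id")
    return problems


# A three-node excerpt of the sample graph, with hypothetical string ids.
graph = {
    "directed": False,
    "multigraph": False,
    "graph": {},
    "nodes": [
        {"nt": "A", "id": "A.5"},
        {"nt": "G", "id": "A.6"},
        {"nt": "U", "id": "A.26"},
    ],
    "links": [
        {"label": "B53", "source": "A.5", "target": "A.6"},
        {"label": "CWW", "source": "A.5", "target": "A.26"},
    ],
}

text = json.dumps(graph, indent=1)  # valid JSON text, ready to save as .json
size_ok = len(text.encode("utf-8")) <= MAX_BYTES
problems = check_graph(graph)
```

An empty problems list together with size_ok being true means the graph passes these basic checks.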

Uploading Ligand Database

If you provide your own ligand database to screen, we will return the ligands in that database that are most similar to the prediction. Each line must have the format [SMILES] [some data].
More info on the SMILES format here. Here is a sample file with 10 ligands.

c1ccc(cc1)[C@@H](C(=O)O)N 004
c1nc(c2c(n1)n3c(n2)[C@@H]([C@@H]4[C@H](C[C@H]3O4)O)OP(=O)(O)O)N 02I
C1COCCN1CC2=CC[C@H](NC2)C(=O)O 04X
c1nc(c2c(n1)n(cn2)[C@@H]3[C@H]([C@H]([C@@H](O3)COP(=O)(O)O)O)O)N 0A
C1=CN(C(=O)N=C1N)[C@@H]2[C@H]([C@H]([C@@H](O2)COP(=O)(O)O)O)O 0C
COc1cc2c(cc1OC)nc(nc2N)N3CCNCC3 0EC
c1nc2c(n1[C@@H]3[C@H]([C@H]([C@@H](O3)COP(=O)(O)O)O)O)N=C(NC2=O)N 0G
c1ccc(cc1)C[C@H](C(=O)N2CCC[C@H]2C(=O)N[C@@H](CCCNC(=[NH2+])N)[C@@H](CCl)O)N 0G6
CS[C@@H]([C@@H](C(=O)O)N)C(=O)O 0TD
C1=CN(C(=O)NC1=O)[C@@H]2[C@H]([C@H]([C@@H](O2)COP(=O)(O)O)O)O 0U

The default library consists of all ligands that have been found co-crystallized with RNA in the PDB.
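
The one-ligand-per-line format can be read with a few lines of Python. This is a sketch under the assumption that the SMILES string is separated from the remaining data by whitespace, as in the sample file; the function names are hypothetical and the sketch does not check that the SMILES strings are chemically valid.

```python
def parse_ligand_line(line):
    """Split one database line into (smiles, data); data may be empty."""
    parts = line.strip().split(None, 1)  # split on the first run of whitespace
    if not parts:
        return None  # blank line
    smiles = parts[0]
    data = parts[1] if len(parts) > 1 else ""
    return smiles, data


def parse_ligand_db(text):
    """Parse a whole ligand database, skipping blank lines."""
    entries = []
    for line in text.splitlines():
        entry = parse_ligand_line(line)
        if entry is not None:
            entries.append(entry)
    return entries


# Two lines taken from the sample file.
sample = """c1ccc(cc1)[C@@H](C(=O)O)N 004
COc1cc2c(cc1OC)nc(nc2N)N3CCNCC3 0EC
"""
ligands = parse_ligand_db(sample)
```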

Interpreting output

When your query completes, we display the 30 ligands from the ligand library that are most similar to the prediction. You can download the full list of distances to each element in the library by clicking the 'Download' button. The download contains three files: hits.csv, a list of ligands and their distances to the prediction; graph.json, a JSON description of the base pairing network used to make the prediction; and fingerprint.txt, the predicted MACCS fingerprint.
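
To make the idea of a distance between the prediction and a library ligand concrete, here is a small sketch that ranks ligands by Jaccard (Tanimoto) distance between binary fingerprints. The distance measure actually used by RNAmigos is not stated on this page, so Jaccard is an assumption chosen because it is common for binary fingerprints. The six-bit vectors and the ligand names are made up for illustration and are far shorter than a real MACCS fingerprint.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length binary fingerprints."""
    on_a = {i for i, bit in enumerate(a) if bit}
    on_b = {i for i, bit in enumerate(b) if bit}
    union = on_a | on_b
    if not union:
        return 0.0  # two all-zero fingerprints are treated as identical
    return 1.0 - len(on_a & on_b) / len(union)


def rank_library(predicted, library):
    """Return (distance, name) pairs sorted from most to least similar."""
    scored = [(jaccard_distance(predicted, fp), name) for name, fp in library.items()]
    return sorted(scored)


# Toy data: hypothetical ligand names and 6-bit fingerprints.
predicted = [1, 1, 0, 1, 0, 0]
library = {
    "lig_a": [1, 1, 0, 1, 0, 0],
    "lig_b": [1, 0, 0, 1, 1, 0],
    "lig_c": [0, 0, 1, 0, 1, 1],
}
ranking = rank_library(predicted, library)
```

Smaller distances mean closer matches, so the first entries of ranking play the role of the top hits shown on the results page.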

Source Code

GitLab
GitHub

Citing


@article {Oliver701326,
	author = {Oliver, Carlos and Mallet, Vincent and Sarrazin Gendron, Roman and Reinharz, Vladimir and Hamilton, William L. and Moitessier, Nicolas and Waldisp{\"u}hl, J{\'e}r{\^o}me},
	title = {Augmented base pairing networks encode RNA-small molecule binding preferences},
	elocation-id = {701326},
	year = {2020},
	doi = {10.1101/701326},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2020/02/01/701326},
	eprint = {https://www.biorxiv.org/content/early/2020/02/01/701326.full.pdf},
	journal = {bioRxiv}
}

Contact

rnamigos@cs.mcgill.ca