Showing posts with label molecule. Show all posts
Showing posts with label molecule. Show all posts

Monday, November 23, 2015

Live Molecule search: a quick hack using JSME, Cactus, AngularJS and BootStrap

For an internal project, I needed a web control that will display the structure of the molecule as I type in the name (or SMILES, CAS number, IUPAC name or a host of other ways you can textually describe a molecule in one line). Now this is not really easy, but thanks to the some open APIs provided by the CADD group at NIH/NCI (http://cactus.nci.nih.gov), it was really easy to get this done.

Here is a quick overview of the hack I wrote. The code for the same is available on GitHub (https://github.com/tovganesh/mol-auto-view.js)

The final control. You give a molecule description, the molecule appears in draw area, click "Save to list" and it gets saved to a list that is depicted on the right hand side.


The Cactus (http://cactus.nci.nih.gov/chemical/structure)

The molecule description can be given in various ways, and it usually a very tiresome task to think through and write interpreters. One can of course use tools like Open Babel or CDK, but a more cleaner interface is a web service offered by NCI, called the Cactus. A simple API call as follows,
gives you a mol file that can be viewed:

http://cactus.nci.nih.gov/chemical/structure/[Molecule Description]/pdf


JSME (http://peter-ertl.com/jsme/)

Next to view this structure, I use the simple but powerful 2D molecule drawing tool by Peter Ertl and Bruno Bienfait (http://peter-ertl.com/jsme/)

I, particularly use the following API:

jsmeApplet.reset();
jsmeApplet.readMolFile(dataFromCactusAPICall);


The above call is the clear the drawing area, and read in the new molecule file obtained from call to the Cactus API.

Next, I need to get the molecule file data from JSME and well as a graphics representation to show in a list. This is does using the following API call:

molFile: jsmeApplet.molFile(),
molImage: btoa(jsmeApplet.getMolecularAreaGraphicsString())


The first call is for getting the molecule data, as a mol file. The second call is to get the graphics of the molecule as a SVG data, that is then encoded to base64.

Putting it all together: Angular JS and Bootstrap

Putting it all together required using Angular JS and some basic Bootstrap. The following is used for the text field that takes the molecule description:



The above code is used to fetch the mol data from giving a call to the cactus API, as you type the molecule text. The model change even is delayed by 750ms, so that the API call is only fired, when the user pauses for some time. 

Thats it! I hope you find it useful. To see the full code see Github: https://github.com/tovganesh/mol-auto-view.js



Saturday, July 12, 2014

A quick and dirty algorithm for Maximum Common Substructure

While there is vast amount of literature dedicated to algorithms for Maximum Common Substructure (MCS) search, particularly with respect to molecular graph (see for instance: http://scholar.google.co.in/scholar?hl=en&q=maximum+common+substructure&btnG=), I wanted to quickly write an algorithm without reading any literature on the subject. And thus this algorithm was written, typically over the weekend.

First consider finding an MCS between two molecular graphs:

1. Convert the molecule into a bit representation. This is akin to generating fingerprint of the molecule, but in this case a fingerprint is generated for each Atom in the following way:

  • Each Atom is represented as a bit-set of 64 bits, that will be able to capture a maximum of seven bonds that are connected to the atom under consideration.
  • Each set of 9-bits represents a connection: 7 bits for storing atomic number of connected atom type, 2 bits for bond type (0-single, 1-double, 2-triple, 3-aromatic).  
  • The order in which the bits are stored in the bit-set is always in sorted order: first sorted on atomic number and then on the bond type.
2. Match the corresponding bit atom representations (generated above)  among the two molecular graphs, and build a pair list. This matching is done in two stage: the first stage involves exact bit pattern match, where as the second stage involves partial match. When doing the partial match, the atom pair match that has the maximum bit matches is taken. The pair list thus generated becomes the seed for 'growing' the graph that would hopefully be the maximum common substructure.

3. For each of the pair list generated in step 2, start growing the graph using a breadth first traversal method, at each stage checking if the corresponding expansion in the paired molecule is also possible. The pair stops growing, when no such expansion is possible. And the algorithm stops when there is no growth for any of the pairs in the list. The pair with the maximum number of elements then forms the maximum common substructure. Use this pair to next generate the template molecular sub-graph representing MCS.  

For finding MCS among a set of molecules: 

4. Find the molecule with least number of atoms from the set, and mark it as reference molecule. Follow steps 1 to 3 for the pair of reference molecule and every other molecule in the set. Instead of just picking the pair with maximum elements in step 3, save the complete list (L). Next, pick the pair list that has least number of elements as reference (Lp). For a member in this list (iLp), search the complete list (L), to see if every list has the member iLp, if yes add it to a new list (Lnew). Repeat this procedure for every member of list Lp. Finally, the list with maximum number of elements in Lnew is the MCS for the set of molecules. 

That is it. I have tested the algorithm on different sets of congeneric series of molecules, and it pretty much works for all. For non-congeneric set however, one needs to build a cluster algorithm to find the MCS in sub group of molecules, that is work for some other weekend. 

For now, an implementation of this is available as a scripting function within VLifeMDS (www.vlifesciences.com, visit https://portal.vlifesciences.com/ to buy a license). I will also hopefully make an implementation available with MeTA Studio soon.