Chemical Quantum Images: Simplified molecular input line entry specification

Monday, 17 March 2008

Simplified molecular input line entry specification

The whole name is almost as catchy as "SMILES". I used to think it was a strange way of representing molecules for computers, but it actually seems more straightforward than IUPAC nomenclature. It's also shorter, and it is possible to give each molecule one unique (canonical) string. (It's more difficult to pronounce, though.)

You can try out SMILES strings at this page; it's kind of fun. How to write them is described on Wikipedia, for example.

Ethane is just CC.

Add double and triple bonds like this:
C#CC=C for butenyne.

Add a branch in parentheses:
CC(C)CCC for 2-methylpentane

If you want a ring, put the same number after each of the two atoms to be joined together:
C1C(C)CCC1 for methylcyclopentane
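
To make the rules so far concrete, here is a minimal Python sketch of a reader for just this subset: plain atoms, the bond symbols, branches and single-digit ring closures. The names parse_smiles and ring_count are made up for this illustration and are not part of any SMILES toolkit; aromatic atoms, bracket atoms and stereo marks are deliberately left out.

```python
import re

# Tokens for a small SMILES subset: Cl/Br, one-letter aliphatic atoms,
# bond symbols, branch parentheses and single-digit ring closures.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#]|[()]|\d")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def parse_smiles(smiles):
    """Return (atoms, bonds); bonds are (i, j, order) tuples of atom indices."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in {smiles!r}")

    atoms, bonds = [], []
    previous = None     # index of the atom the next atom will bond to
    pending = None      # bond symbol waiting for its second atom
    branch_stack = []   # atoms to return to when a branch closes
    open_rings = {}     # ring digit -> (atom index, bond symbol at opening)

    for token in tokens:
        if token == "(":
            branch_stack.append(previous)
        elif token == ")":
            previous = branch_stack.pop()
        elif token in BOND_ORDER:
            pending = token
        elif token.isdigit():
            if token in open_rings:
                # Second occurrence of the digit: close the ring with a bond.
                partner, opening_bond = open_rings.pop(token)
                symbol = pending or opening_bond or "-"
                bonds.append((partner, previous, BOND_ORDER[symbol]))
            else:
                open_rings[token] = (previous, pending)
            pending = None
        else:
            # An atom: bond it to the previous atom (single bond by default).
            atoms.append(token)
            index = len(atoms) - 1
            if previous is not None:
                bonds.append((previous, index, BOND_ORDER[pending or "-"]))
            previous, pending = index, None

    if open_rings or branch_stack:
        raise ValueError(f"unclosed ring or branch in {smiles!r}")
    return atoms, bonds


def ring_count(atoms, bonds):
    """Number of rings in a connected molecule: bonds - atoms + 1."""
    return len(bonds) - len(atoms) + 1


for text in ["CC", "C#CC=C", "CC(C)CCC", "C1C(C)CCC1"]:
    atoms, bonds = parse_smiles(text)
    print(text, len(atoms), "heavy atoms,", ring_count(atoms, bonds), "ring(s)")
```

Hydrogens are implicit in these strings, so the sketch only counts heavy atoms. A real toolkit does far more (valence checks, aromaticity, stereochemistry); this is only meant to show how little bookkeeping the basic notation needs.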

Add a pyridyl group to the C next to the methyl group (aromatic atoms are written in lower case, and you have to include a second ring closure):
C1(c2ncccc2)C(C)CCC1 for 1-(2-pyridyl)-2-methylcyclopentane

You can add an extra oxirane ring:
C1(c2ncccc2)C(C)CCC13OC3
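
Note that the "13" in that string is two separate ring-closure numbers, 1 and 3, on the same carbon (a two-digit ring number would be written %13). The following Python sketch lists which atoms each ring-closure number joins; ring_closures is a hypothetical helper written for this post, not a function from an existing library, and it only looks at atoms and ring labels.

```python
import re

# Atom tokens (bracket atoms, Cl/Br, one-letter aliphatic or aromatic atoms),
# then ring labels (%nn or a single digit), then any other character.
ATOM = r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]"
TOKEN = re.compile(rf"({ATOM})|(%\d\d|\d)|(.)")


def ring_closures(smiles):
    """List (label, first_atom, second_atom) for each ring closure, in closing order."""
    open_labels = {}   # ring label -> index of the atom that opened it
    closures = []
    atom_index = -1    # index of the most recently read atom
    for match in TOKEN.finditer(smiles):
        atom, label, _other = match.groups()
        if atom:
            atom_index += 1
        elif label:
            number = int(label.lstrip("%"))
            if number in open_labels:
                closures.append((number, open_labels.pop(number), atom_index))
            else:
                open_labels[number] = atom_index
    if open_labels:
        raise ValueError(f"unclosed ring labels: {sorted(open_labels)}")
    return closures


print(ring_closures("C1(c2ncccc2)C(C)CCC13OC3"))
# → [(2, 1, 6), (1, 0, 11), (3, 11, 13)]
```

Atoms are numbered from 0 in the order they are written. Atom 11 appears in two closures: it closes the cyclopentane ring (label 1) and opens the oxirane (label 3), which is what makes the two rings share that carbon.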

You can mess with stereochemistry (using @ and @@ on atoms written in square brackets):
C1(c2ncccc2)[C@@H](C)CC[C@]13OC3

If you still haven't had enough, you can add a double bond in E configuration to the pyridyl ring:
C1(c2nc(/C=C(Cl)\C)ccc2)[C@@H](C)CC[C@]13OC3

or in Z configuration:
C1(c2nc(/C=C(Cl)/C)ccc2)[C@@H](C)CC[C@]13OC3
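
The slashes only say how the two marked neighbours sit relative to each other: the same symbol on both sides of the double bond means opposite sides, as in F/C=C/F, and different symbols mean the same side. A tiny Python sketch of that rule follows; double_bond_geometry is a made-up name for this illustration, and it assumes the first substituent is written before the double bond and the second after it, as in the strings above.

```python
VALID = {"/", "\\"}


def double_bond_geometry(before, after):
    """Relate the substituents marked by the slashes around a double bond.

    before: direction symbol on the bond leading into the double bond
    after:  direction symbol on the bond leading out of it
    """
    if before not in VALID or after not in VALID:
        raise ValueError("direction symbols must be '/' or '\\'")
    # Same symbol on both sides -> substituents on opposite sides (trans).
    return "trans" if before == after else "cis"


# First string above: ring /C=C(Cl)\C, so the ring and the methyl are cis.
print(double_bond_geometry("/", "\\"))
# Second string above: ring /C=C(Cl)/C, so the ring and the methyl are trans.
print(double_bond_geometry("/", "/"))
```

Whether the result is called E or Z depends on the CIP priorities, which the slashes know nothing about. In the first string the methyl is cis to the ring, so the Cl on the same carbon is trans to it; Cl outranks methyl, and the ring outranks H, which gives E. In the second string the Cl ends up on the same side as the ring, which gives Z.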

Go SMILES!

9 comments:

Ψ*Ψ said...: Cool! I always kinda wondered how that worked.; 18 March 2008 at 01:16
Egon Willighagen said...: It might interest you that OpenSMILES.org is working on an open standard for SMILES.; 18 March 2008 at 08:15
Felix said...: Ψ*Ψ: I am glad they forced me into that cheminformatics class, and told me what it's about

egon: it seems cool that so much is open source these days. maybe eventually I will use SMILES, this was just for curiosity; 18 March 2008 at 12:28
Lightnir said...: You find drawing structures by using SMILES cool? Yeah, right... When you want to do something really cool, try superimposing molecules by using SMILES Arbitrary Target Specification (SMARTS). It's a kind of query language based on SMILES that allows you to search for molecule fragments within SMILES strings. Check obfit from the Open Babel package for more info.; 19 March 2008 at 08:09
Felix said...: I am waiting until they teach me that in class ...; 19 March 2008 at 10:27
Georg-Martin Krapper said...: One of the coolest things about SMILES is that there are algorithms for generating the canonical SMILES for a molecule. In general you can write a number of equally valid SMILES for a molecule. The canonicalisation algorithms identify one of these SMILES as the canonical SMILES. You can identify duplicate molecules in databases by simple string matching. SMILES notation also makes it very easy to build molecular models using obscure modelling tools such as the emacs editor.; 20 March 2008 at 23:35
Lightnir said...: Felix: Why wait if you can learn it faster by yourself? You may be interested in this short article.
GMC2007: The emacs part sounds like fun to me ]:); 21 March 2008 at 18:04
Georg-Martin Krapper said...: A couple of responses to lightnir's comment.

Although SMARTS notation can be used to specify how molecules should be overlaid, it goes well beyond that. The SMARTS language enables powerful and general definition of substructures. And that is the basis of chemistry. BTW good to see the use of recursive SMARTS in your example.

Although canonicalisation of SMILES is very useful, you are not obliged to write SMILES in their canonical form. If you need to build structures from scratch, it's usually quicker to type them in as SMILES (hence my reference to emacs). This makes it easy to force a particular ordering of atoms since many 3D structure generators will maintain SMILES order. If you're doing covalent docking with something like GOLD, you will typically need to have ordered the molecules so that the covalent link atom always occurs at a fixed point (e.g. first atom) in the molecule.; 22 March 2008 at 19:11
Shawn Wilkinson said...: Emacs isn't obscure. It's the best thing ever. Ever.

/soapbox; 9 April 2008 at 02:18
