Introduction

 

SMILES is an acronym for Simplified Molecular Input Line Entry System. It is a chemical notation system that represents a molecular structure as a linear string of symbols. In this regard, it is similar to the Wiswesser Line Notation (Wiswesser, 1954). However, the SMILES notation system was designed specifically for computer use by chemists (Weininger, 1988), and it is much easier to master than Wiswesser Line Notation. Anyone with a chemistry background can learn the SMILES encoding rules quickly. Weininger (1988) presents the history of SMILES as a chemical language together with its basic encoding rules.

 

This help file outlines the basic rules for writing a SMILES notation for a chemical structure. It also discusses several differences between the SMILES interpreting software used by the EPI Suite programs and the SMILES interpreting software used by other programs (such as BioByte Corporation’s ClogP (tm) Program). The encoding rules outlined in this help file apply to the ECOSAR and EPI Suite software programs.

 

Learning to write a SMILES notation for most chemicals is not difficult. However, writing a SMILES notation for a complicated ring system can be tricky and time-consuming. The CAS Number Data Base (SMILECAS, which is part of the EPI Suite) is a fast way to obtain SMILES notations. This database contains the SMILES notation and chemical name for 112,000 CAS numbers. When the CAS Number Data Base is used, entering the CAS number of a chemical imports its SMILES notation and chemical name directly into the EPI Suite programs.

 

What is a SMILES Notation?

 

A SMILES notation depicts a molecular structure as a two-dimensional picture, as if drawn on a piece of paper. No attempt is made to represent the structure in three dimensions (Weininger, 1988). Just as a single chemical structure can be drawn correctly in many different ways, it can be written correctly as many different SMILES notations. In fact, any modestly large structure has dozens of SMILES notations that depict it correctly. Any one of the correct notations is acceptable as computer input.
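
The point that one structure has several correct notations can be illustrated with a short Python sketch. This is not the EPI Suite interpreter; it is a hypothetical helper that assumes a very small subset of SMILES (the atoms C, N and O, the double-bond symbol "=", and parentheses, with no rings), ignores implicit hydrogens, and compares two notations by brute force, so it is only practical for molecules of a few atoms.

```python
from itertools import permutations

def parse(smiles):
    """Parse a tiny SMILES subset into (atoms, bonds). Illustrative only."""
    atoms = []    # element symbol for each atom, in order of appearance
    bonds = {}    # frozenset({i, j}) -> bond order
    stack = []    # atoms to return to when a branch closes
    prev = None   # atom that the next atom will be bonded to
    order = 1     # order of the next bond (1 unless "=" was seen)
    for ch in smiles:
        if ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch == "=":
            order = 2
        elif ch in "CNO":
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds[frozenset((prev, idx))] = order
            prev = idx
            order = 1
        else:
            raise ValueError(f"unsupported symbol {ch!r} in this sketch")
    return atoms, bonds

def same_structure(smiles_a, smiles_b):
    """Return True if both notations describe the same atoms and bonds."""
    atoms_a, bonds_a = parse(smiles_a)
    atoms_b, bonds_b = parse(smiles_b)
    if sorted(atoms_a) != sorted(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    n = len(atoms_a)
    # Try every relabelling of atoms in A onto atoms in B (small molecules only).
    for perm in permutations(range(n)):
        if any(atoms_a[i] != atoms_b[perm[i]] for i in range(n)):
            continue
        mapped = {frozenset(perm[i] for i in pair): o
                  for pair, o in bonds_a.items()}
        if mapped == bonds_b:
            return True
    return False

# Three notations for ethanol, and one for its isomer dimethyl ether.
print(same_structure("CCO", "OCC"))    # True
print(same_structure("CCO", "C(O)C"))  # True
print(same_structure("CCO", "COC"))    # False
```

The three ethanol notations differ only in the atom chosen as the starting point and in how the branch is written, which is why any of them is an equally valid input.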

 

SMILES notations are composed of atoms (designated by atomic symbols), bonds, parentheses (used to show branching), and numbers (used to designate ring opening and closing positions). Numbers are not used for any other purpose in SMILES notation.
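
The four kinds of symbols listed above can be made concrete with a small tokenizer. The following Python sketch is an illustration only, not the parser used by EPI Suite; the function name tokenize and the set of accepted symbols are assumptions, limited to a few common atomic symbols (including lowercase aromatic atoms), the bond symbols "-", "=" and "#", parentheses, and single-digit ring numbers.

```python
def tokenize(smiles):
    """Split a SMILES string into (kind, text) tokens. Illustrative subset only."""
    tokens = []
    i = 0
    while i < len(smiles):
        two = smiles[i:i + 2]
        ch = smiles[i]
        if two in ("Cl", "Br"):
            # Two-letter atomic symbols must be checked before single letters.
            tokens.append(("atom", two))
            i += 2
            continue
        if ch in "BCNOPSFI" or ch in "cnos":
            kind = "atom"      # atomic symbol (lowercase marks an aromatic atom)
        elif ch in "-=#":
            kind = "bond"      # explicit single, double, or triple bond
        elif ch in "()":
            kind = "branch"    # parentheses open and close a branch
        elif ch.isdigit():
            kind = "ring"      # number marks a ring opening or closing position
        else:
            raise ValueError(f"unsupported symbol {ch!r} at position {i}")
        tokens.append((kind, ch))
        i += 1
    return tokens

# Acetic acid: two atoms, a branch holding a double-bonded O, then another O.
for kind, text in tokenize("CC(=O)O"):
    print(kind, text)

# Cyclohexane: the digit 1 appears twice, once to open the ring and once to close it.
print(tokenize("C1CCCCC1"))
```

In the cyclohexane example the only numbers are the two ring markers, which matches the rule that numbers designate ring positions and nothing else.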

 

See the Example SMILES section for SMILES notations of some common chemical structures.

 

For methods of entering SMILES into the program, see the Entering SMILES section.