Molecular Structure Manipulation Tools
Scripts for manipulation, comparison and conversion of molecular structures.
Installation
- Extract the compressed file.
'tar -xf struc_manip.tar.gz
'
- Check for the packages which are required by some of the scripts in the distribution.
They may be either found as part of the UNIX distribution (try yum install <package>
) or they can be downloaded from the URLs specified.
- Set up for quick access
either
- Add the contained
lib
directory to you PYTHONPATH
environment variable
(e.g. 'setenv PYTHONPATH <PATH>/lib
' in the .cshrc)
- add the
scripts
directory to your path
or
set up the <PATH>/struc_caller
routine by modifying the path inside
Description
Syntax: '[<PATH>/struc_caller] <script_name>.py <arguments>'
for a short instruction: call '<script_name>.py' without any arguments
File handling is based upon the Open Babel conversion tools. Therefore all file types available in openbabel can be accessed (try babel -H). Additionally there is support for Columbus/Newton-X format ("col") and Tinker format ("txyz2").
superimpose.py
Superposition of different structures of one molecule using a quaternion fit [1].
Syntax: 'superimpose.py <ref_struc> <struc1> <struc2> ... -t<file_type>'
e.g. superimpose.py ref.xyz struc1.xyz struc2.xyz -txyz
Structures <struc1>, <struc2>, ... are superimposed onto <ref_struc>.
<file_type> is a file type supported by openbabel (type 'babel -help' for a complete list).
Superposition is done through aligning the centers of mass for translation, and a quaternion fit for rotations.
Only proper rotations are considered.
The numbering of the atoms is crucial. Renumbering is not implemented in this script. It could be done manually or through python programming with usage of struc_linalg.py.
babel.py
File format conversion utility, based on and similar to the babel
program but with additional support for Columbus/Newton-X ("col") and Tinker ("txyz2") formats.
Syntax: 'babel.py <in_format> <in_file> <out_format> <out_file>'
e.g. babel.py tmol coord col geom
(to convert between Turbomole and Columbus formats)
combine_vib.py
Insert molecules into a larger template file and optionally combine the molecular vibrations.
Syntax: 'combine_vib.py'
Requirements: Structure files for the template and the molecules to insert. Normal modes (if required) are given as molden format files.
Execution: Follow the explanations on the screen. The non-trivial task is to identify the corresponding atoms between the structures. For this it is suggested to open both molecules at the same time in a molecular structure viewing program to compare the indices. Note that the program asks for the indices of the atoms in the (larger) template file corresponding to indices in the molecule inserted (putting the indices in the opposite order will give a wrong result).
Output:
combined_struc.xyz: Structure file with inserted structures.
combined_vib.mld: Molden file with combined normal modes.
arc2pot.py
Conversion utility from a Tinker archive to a set of point charges in Columbus format. The structure can be separated into QM and MM regions and the functionality for adding link atoms is supplied. The script can be used to extract several frames in order to perform a calculation in an averaged potential. The point charges are created in Columbus format as potential.xyz (Bohr). Additionally the geometry of the core region for the first frame is written to geom.xyz in standard xyz format (Angstrom).
Syntax: 'arc2pot.py <tinker-arc> <prm-file>'
<tinker-arc> is the Tinker archive with the structures. Point charges for the atom types are parsed from <prm-file>.
The remaining input is parsed from a file arc2pot.in with the following information. Only values with no specified default have to be entered.
Keyword | Default | Description |
at_list | | Specification of the atoms which shall be converted to point charges (i.e. the MM region). Use to(<start>,<end>) to specify ranges. |
frames | | Specification of the dynamics frames to use for conversions. |
link_atom | empty | Use link_atom[<QM_atom>]=[<MM_atom>,<ratio>,<symbol>] to introduce a link atom with atomic symbol <symbol> into geom.xyz. This link atom is placed on the connecting line between the QM and MM atoms at <ratio> times their distance away from the QM atom. Per default the point charge of the MM_atom is set to zero use zero_scatter for charge redistribution. |
zero_scatter | empty | Specify charges to redistribute. Point charges of atoms in <zero_i> are set to zero and their summed up charge is distributed over <scatter_i>. If an atom is specified in <zero_i> which is not in <at_list>, then its charge is added to the scatter charge. If an empty list [] is specified for <scatter_i>, then the charges are discarded.
It is specified as a nested list, i.e. zero_scatter=[ [<zero_0>,<scatter_0>], [<zero_1>,<scatter_1>], ...] , where <zero_i> and <scatter_i> are each lists as well. Typical choices for <zero_i> are either just the MM link atom, or both the QM region and the MM link atom. A typical choice for <scatter_i> would consist of all the atoms bonded to the MM link atom. |
new_charge | empty | Manual modification of a charge. Use new_charge[<index>]=<charge> to modify the charge of the atom indexed with <index> in the output potential.xyz file. new_charge is applied before zero_scatter , in case they should be used in connection. |
sep | False | If this is set to True then the information is read in from separate input files. |
scale | 1. | Scaling factor of the charges. The default 1. means that charges are divided by the number of frames. A value different from 1. would only be needed in special cases (for example when a symmetrization is performed later). |
lvprt | 0 | Print level for debug and verification purposes. |
- Please use square brackets to enclose lists and to(<start>,<end>)
for specifying ranges. Use "+=" when specifying zero_scatter
Example arc2pot.in
at_list = [1,3,5] + to(20,9230) + [9260,9262]
frames = to(1,50)
# link atom, where only the charge of the MM link atom is moved
link_atom[9263] = [9262, 0.68, 'H']
zero_scatter+=[[ [9262], [9251,9253,9258] ]]
# charge consistent link atom, considering the partial charge of the whole QM region as well as the MM link atom
link_atom[9231] = [9230, 0.69, 'H']
zero_scatter+=[[ to(9230,9244), [9219,9221,9226] ]]
at_list here contains the indices 1, 3, 5, 20-9230, 9260, 9262.
Formally the file is parsed as Python code (with an additional to(s,e)=range(s,e+1) function). Any other Python syntax may be used for constructing the at_list, frames, and zero_scatter lists and the link_atom and new_charge dictionaries.
pot_reduce.py
Reduce a file pot_old.xyz (Bohr) of point charges in Columbus format by combining close lying charges. A dynamic threshold is employed with the goal to describe areas close to the region of interest very accurately and to preserve the large scale electrostatic moments of a significantly larger system. Two point charges i and j of the same sign are combined into their center of charge if the distance dij between them is
where ri and rj are their respective distances to the origin, r0 should approximately define the molecular cavity, and C is a global scaling constant.
Within the Columbus program package, it is also possible to restore gradients for all of the original point charges. It is therefore possible to compute direct QM/MM dynamics in this fashion.
Syntax: 'pot_reduce.py [<mode>]'
If <mode> is left out or mode=reduce
, a reduction is performed. If mode=expand
, a gradient computed in the reduced set of point charges is expanded to the whole set of input charges. This expansion is implemented within the Columbus program package.
The input is parsed from a file pot_reduce.in with the following information. Only values with no specified default have to be entered.
Keyword | Default | Description |
C | 0.0 | Scaling threshold. The lower this threshold is, the more point charges are retained. Suggested values: 0.01≤C≤0.2 |
r0 | 0.0 | Radius (Bohr) of the molecular cavity. |
orig | [0.,0.,0.] | Specify the origin in Bohr radii. ri, rj and r0 are taken relative to this. Typically the origin should lie at the center of mass of the QM region. |
scale | [1.,1.,1.] | Scaling factor to allow specifying an elliptical cavity. Internally all quantities are rescaled by this factor. If for example the system extends 12 Bohr in the x-direction, 6 Bohr in the y-direction, and 4 Bohr in the z direction. You may specify r0=4.; scale=[3.,1.5,1.] |
formout | 'col' | Output format. 'col' will produce potential.xyz in Columbus format, 'tmol' pointcharges in Turbomole format. |
- Please use square brackets to enclose lists and to(<start>,<end>)
for specifying ranges.
Example pot_reduce.in
C=0.1
r0=0.
orig=[3.0,-4.7,5.0]
formout='col'
interpolate.py
Linear interpolation between two cartesian molecular structures of the same molecule.
Syntax: 'interpolate.py <start_struc> <end_struc> <out_dir> <steps_nr> <file_type>'
e.g. interpolate.py start.xyz end.xyz interpolate 10 xyz
<end_stuc> is superimposed onto <start_struc>, then <steps_nr> intermediate structures are constructed and
put into the directory <out_dir>.
Linear interpolation is a clearly defined way of finding intermediate structures. But well chosen
internal coordinates are probably more physical, especially when rotations play an important role.
distances.py
Computation of the distances (i.e. norm of the difference vector, RMSD) between a set of structures of one molecule.
Syntax: 'distances.py <struc1> <struc2> ... [-t<file_type>] [-mwp<mass_weight_power>] [-fit<fitting>]'
e.g. distances.py struc1.xyz struc2.xyz -txyz
A table with the distances between the specified structures is printed out.
<file_type> is a file type supported by openbabel (type 'babel -help' for a complete list).
The default for <mass_weight_power> is -mwp1 which leads to regular mass weighting (i.e. Å2amu for the squared norm and Åamu1/2 for the distance); -mwp0 means cartesian coordinates.
-fit1 which is the default specifies that structures are initially fitted onto the first structure, -fit0 leaves the structures unchanged.
int_coor.py, int_coor_multi.py
Print out values of internal coordinates.
Syntax: 'int_coor.py <coors> <file1> [<file2> ...]'
<coors> contains the types of internal coordinates (dist, bend, tors) considered and the atoms involved.
Example: int_coor.py dist 4 5 bend 1 2 3 tors 5 6 7 8 struc.xyz struc2.xyz
int_coor_multi.py
may be used to analyze a file containing multiple structures.
renumber_xyz.py
Renumbering of an xyz file.
Syntax: 'renumber_xyz.py <in_file> <renumber_file> <out_file>'
e.g. renumber_xyz.py struc.xyz renumber.txt struc_renumber.xyz
Renumber the xyz-file <in_file>. <renumber_file> contains the old numbers of the atoms in the new order (if optionally some
of the atoms are left out, they are added at the end). The result is written into <out_file>.
Python subroutine libraries
The underlying python subroutine libraries can be used as a basis for more complex tasks. For more information, look at the documentation included in the python files (either in the source code or with the python 'help(...)' command).
- struc_linalg.py - package for performing linear algebra operations on structures
- superposition.py - superposition of molecules with a quaternion fit [1]
- file_handler.py - basic file operations
[1] Karney, C. F. F. Journal of Molecular Graphics & Modelling 2007, 25, 595.
Contact
Felix Plasser (email: felix.plasser "at" univie.ac.at)
University of Vienna, Institute for Theoretical Chemistry
Währingerstr. 17, 1090, Vienna, Austria