Quick Start

This section covers the fundamentals and the command line interface (CLI) of Ta-dah!

For installation instructions see Installation.

If you are interested in using it as a C++ library, have a look at the API Examples and browse through the API documentation. This section might still prove useful.

Overview

To train a model, a config file (see Config file) and a dataset (see Dataset format) are required.

For prediction, a potential (trained model) and a dataset are required. The potential file is an output of the training process. The potential file is technically the same as the config file.

Training

To train a model, run the following command in a terminal:

ta-dah train -c config.train

This will train a model using energies only. To also train on forces, add the -F flag; for stresses, add the -S flag. Alternatively, add the FORCE true and STRESS true keys to the config file. Note that flags take priority over config values.

Here config.train is a Config file. The training dataset(s), descriptors, cutoffs, model and all parameters are specified in the config file. The output of this command is a trained model, written as a pot.tadah file.

Here is a minimal example of a config.train file:

DBFILE     db.train
INIT2B     true
RCUT2B     5.3
TYPE2B     D2_LJ
RCTYPE2B   Cut_Dummy
MODEL      M_KRR     Kern_Linear

For an explanation of the KEYS, see Config.
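
For scripted workflows it can be handy to generate the config file rather than edit it by hand. The following is a minimal sketch, not part of Ta-dah! itself: the helper name render_config is hypothetical, and it assumes the config parser only needs whitespace-separated KEY VALUE(S) lines. The keys and values are copied from the minimal example, with the FORCE and STRESS keys added to train on forces and stresses as well.

```python
def render_config(entries):
    """Render (KEY, value, ...) tuples as aligned 'KEY   value(s)' lines."""
    width = max(len(entry[0]) for entry in entries) + 3
    lines = []
    for key, *values in entries:
        lines.append(key.ljust(width) + "  ".join(str(v) for v in values))
    return "\n".join(lines) + "\n"


# Keys and values taken from the minimal example, plus FORCE and STRESS.
entries = [
    ("DBFILE", "db.train"),
    ("INIT2B", "true"),
    ("RCUT2B", 5.3),
    ("TYPE2B", "D2_LJ"),
    ("RCTYPE2B", "Cut_Dummy"),
    ("MODEL", "M_KRR", "Kern_Linear"),
    ("FORCE", "true"),
    ("STRESS", "true"),
]

config_text = render_config(entries)
print(config_text, end="")
```

Redirecting the printed text to config.train gives a file that can be passed to ta-dah train with the -c flag.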

An example dataset can be downloaded from here.

Prediction

To predict energies using an existing pot.tadah model, run:

ta-dah predict -p pot.tadah -d db.predict

To also predict forces, add the -F flag; for stresses, add the -S flag.

Alternatively, a config file can be used to specify the prediction dataset(s) (DBFILE) and whether forces and stresses should be calculated (FORCE and STRESS keys).

config.predict example:

DBFILE     db.predict
FORCE      false
STRESS     true

To predict using the pot.tadah model and config.predict:

ta-dah predict -p pot.tadah -c config.predict
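
When training and prediction are driven from a script, the commands above can be assembled programmatically. The sketch below is only an illustration: the helper name tadah_command is hypothetical, and it uses only the subcommands and flags mentioned in this section (-c, -p, -d, -F, -S). It runs ta-dah only if the executable is found on the PATH.

```python
import shutil
import subprocess


def tadah_command(subcommand, config=None, potential=None, dataset=None,
                  force=False, stress=False):
    """Build an argument list using only flags described in this section."""
    cmd = ["ta-dah", subcommand]
    if potential is not None:
        cmd += ["-p", potential]
    if config is not None:
        cmd += ["-c", config]
    if dataset is not None:
        cmd += ["-d", dataset]
    if force:
        cmd.append("-F")
    if stress:
        cmd.append("-S")
    return cmd


# Train on energies and forces, then predict energies, forces and stresses.
train_cmd = tadah_command("train", config="config.train", force=True)
predict_cmd = tadah_command("predict", potential="pot.tadah",
                            dataset="db.predict", force=True, stress=True)

if __name__ == "__main__":
    for cmd in (train_cmd, predict_cmd):
        print(" ".join(cmd))
        # Only execute when the ta-dah executable is actually installed.
        if shutil.which("ta-dah"):
            subprocess.run(cmd, check=True)
```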

The predict subcommand outputs three files:

energy.pred There are two columns. The first column lists the dataset energy per atom, the second the predicted energy per atom. The ordering of rows follows the ordering of datasets in the config file or as given with the -d flag: all energies from the first dataset are listed first, then those from the second, and so on.

forces.pred The same idea as above, but for forces. The first row is the force on the first atom of the first dataset in the x-direction, the second row is the force on the same atom in the y-direction, and so on.

stress.pred The first 6 rows list the components of the stress tensor for the first configuration, followed by the 6 components for the second configuration, and so on. The ordering is xx, xy, xz, yy, yz, zz.
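
A common next step is to compare the two columns of energy.pred and to regroup the stress components. The sketch below is an illustration under stated assumptions: it assumes the columns are whitespace-separated and that the files contain no header lines, neither of which is confirmed by this section. The function names are hypothetical and the sample numbers are made up.

```python
import math


def read_two_columns(lines):
    """Parse whitespace-separated 'reference predicted' rows, skipping blanks."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((float(fields[0]), float(fields[1])))
    return pairs


def rmse(pairs):
    """Root-mean-square difference between the two columns."""
    return math.sqrt(sum((ref - pred) ** 2 for ref, pred in pairs) / len(pairs))


def stress_tensors(values):
    """Group a flat list (6 values per configuration, ordered
    xx, xy, xz, yy, yz, zz) into symmetric 3x3 tensors."""
    tensors = []
    for i in range(0, len(values) - len(values) % 6, 6):
        xx, xy, xz, yy, yz, zz = values[i:i + 6]
        tensors.append([[xx, xy, xz], [xy, yy, yz], [xz, yz, zz]])
    return tensors


# Made-up rows standing in for the contents of energy.pred.
sample_energy = ["-3.10 -3.12", "-2.95 -2.91"]
pairs = read_two_columns(sample_energy)
print(f"energy RMSE per atom: {rmse(pairs):.4f}")
```

With a real file, the lines can be read with open("energy.pred") and passed to read_two_columns in the same way.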

Built-in help

Ta-dah! provides some basic help. Try running it with the -h flag:

ta-dah -h

To read more about a particular subcommand, try:

ta-dah train -h

Units and Ta-dah!

In principle, Ta-dah! will work with any units. The units used are determined by the units in the training dataset(s). So if your dataset has energy in electronvolts and distance in Angstroms, then the resulting model will have the same units. The unit of force must be eV/Angstrom in this case. The stress tensor has units of energy (pressure*volume). In other words, the units of stress are the same as those of energy, and the unit of force is energy/distance.

The units selected in LAMMPS must be consistent with the model units. In this case they would correspond to metal units in LAMMPS.
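
Because the stress is expressed as pressure*volume, a stress reported as a pressure has to be multiplied by the cell volume before it goes into a dataset. The sketch below illustrates that arithmetic for the eV and Angstrom case. It assumes the source data gives stress in GPa, which is an assumption for the sake of the example; the function names are hypothetical, and the sign convention expected by Ta-dah! is not covered here. The conversion factor follows from 1 eV = 1.602176634e-19 J.

```python
EV_PER_A3_IN_GPA = 160.2176634  # 1 eV/Angstrom^3 expressed in GPa


def cell_volume(a, b, c):
    """Volume of the cell spanned by vectors a, b, c: |a . (b x c)|."""
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    return abs(a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2])


def pressure_to_energy(stress_gpa, volume_a3):
    """Convert a stress component in GPa to eV by multiplying by the volume."""
    return stress_gpa / EV_PER_A3_IN_GPA * volume_a3


# Made-up cubic cell with a 4 Angstrom edge.
volume = cell_volume((4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0))
print(f"volume: {volume} Angstrom^3")
print(f"1 GPa in this cell: {pressure_to_energy(1.0, volume):.6f} eV")
```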

Note

Ta-dah! has been tested with units of eV and Angstrom. Unless you have a good reason to use different units, those are the recommended ones.

Config file

For a list and explanation of supported KEY-VALUE(S) pairs, see Config.

Dataset format

Dataset(s) are included using the DBFILE key in a Config file. More than one dataset can be specified.

There is no restriction on the number of atoms in different structures, so it is fine to have a structure with 12 atoms and another with 24 atoms in the same dataset.

The dataset has the following structure:

Comment line
eweight fweight sweight
ENERGY
cell vector a
cell vector b
cell vector c
stress tensor row s_1
stress tensor row s_2
stress tensor row s_3
Element px py pz fx fy fz
...
<blank line>
Comment line
eweight fweight sweight
...

The first line is a comment line; it is used as a label for the structure.
eweight fweight sweight are (optional) weighting parameters used for training. If this line is missing, the weights default to 1.0 1.0 1.0. Do not leave a blank line in its place.
Each cell vector contains 3 numbers.
Each stress tensor row contains 3 numbers.
The number of lines beginning with Element is equal to the number of atoms in the structure.
Element is an atom label, usually the chemical element symbol.
px, py and pz are the Cartesian coordinates of the atom position.
fx, fy and fz are the components of the force vector acting on the atom.
Configurations are separated by a blank line.

If forces and/or stresses are not available, they can be set to zero to satisfy the parser.
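
The layout described above can be read with a few lines of Python, which is useful for checking a dataset before training. This is a minimal sketch and not the parser used by Ta-dah!: the function names are hypothetical, it assumes the comment line is never empty, and it assumes the ENERGY line holds a single number so that a line with exactly three fields directly after the comment can be recognised as the optional weights line. The sample data is made up.

```python
def parse_block(lines):
    """Parse the non-blank lines of one structure into a dict."""
    label, rest = lines[0], lines[1:]
    weights = (1.0, 1.0, 1.0)            # documented default
    if len(rest[0].split()) == 3:        # assumption: three fields means weights
        weights = tuple(float(x) for x in rest[0].split())
        rest = rest[1:]
    energy = float(rest[0].split()[0])
    cell = [[float(x) for x in row.split()] for row in rest[1:4]]
    stress = [[float(x) for x in row.split()] for row in rest[4:7]]
    atoms = []
    for row in rest[7:]:                 # one line per atom
        fields = row.split()
        atoms.append({
            "element": fields[0],
            "position": tuple(float(x) for x in fields[1:4]),
            "force": tuple(float(x) for x in fields[4:7]),
        })
    return {"label": label, "weights": weights, "energy": energy,
            "cell": cell, "stress": stress, "atoms": atoms}


def parse_dataset(text):
    """Split the text on blank lines and parse each structure."""
    structures, block = [], []
    for line in text.splitlines() + [""]:  # trailing "" flushes the last block
        if line.strip():
            block.append(line)
        elif block:
            structures.append(parse_block(block))
            block = []
    return structures


# Made-up two-structure dataset; forces and stresses set to zero where unknown.
sample = """\
Made-up dimer, default weights
-1.5
10.0 0.0 0.0
0.0 10.0 0.0
0.0 0.0 10.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
Ar 0.0 0.0 0.0 0.1 0.0 0.0
Ar 3.0 0.0 0.0 -0.1 0.0 0.0

Made-up trimer, explicit weights
1.0 0.5 0.0
-2.5
10.0 0.0 0.0
0.0 10.0 0.0
0.0 0.0 10.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
Ar 0.0 0.0 0.0 0.0 0.0 0.0
Ar 3.0 0.0 0.0 0.0 0.0 0.0
Ar 0.0 3.0 0.0 0.0 0.0 0.0
"""

structures = parse_dataset(sample)
for s in structures:
    print(s["label"], len(s["atoms"]), "atoms, weights", s["weights"])
```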

See also Units and Ta-dah!

Ta-dah! and LAMMPS

Once trained, the pot.tadah file can be used with LAMMPS like any other pair potential.

pair_style      tadah/tadah
pair_coeff      * * /path/to/pot.tadah ELEMENT

Here is an example LAMMPS script file and pot.file.

See Installing LAMMPS interface for interface installation instructions.

Support for multi-species systems

Ta-dah! is capable of generating machine-learned potentials for mono- and multi-component systems.