Quick Start

This section covers the fundamentals and the command-line interface (CLI) of Ta-dah!

For installation instructions see Installation.

If you are interested in using it as a C++ library, have a look at the API Examples and browse through the API documentation. This section might still prove useful.

Overview

To train a model, a config file (see Config file) and a dataset (see Dataset format) are required.

For prediction, a potential (trained model) and a dataset are required. The potential file is an output of the training process. The potential file is technically the same as the config file.

Training

To train a model, run the following command in a terminal:

ta-dah train -c config.train

This will train a model using energies only. To also train on forces, add the -F flag; for stresses, add the -S flag. Alternatively, add the FORCE true and STRESS true keys to the config file. Note that flags take priority over config values.

Here config.train is a Config file. The training dataset(s), descriptors, cutoffs, model and all parameters are specified in the config file. The output of this command is a trained model written to a pot.tadah file.

Here is a minimal example of the config.train file:

DBFILE     db.train
INIT2B     true
RCUT2B     5.3
TYPE2B     D2_LJ
RCTYPE2B   Cut_Dummy
MODEL      M_KRR     Kern_Linear

For an explanation of the KEYS, see Config.

An example dataset can be downloaded from here.
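The training step can also be scripted. The following is a minimal sketch in Python, assuming the ta-dah executable is on the PATH. The helper names format_config, train_command and train are hypothetical and not part of Ta-dah!. The sketch reproduces the minimal config shown above and uses only the -c, -F and -S flags described in this section.

```python
import subprocess
from pathlib import Path

# KEY and VALUE(S) pairs of the minimal config.train shown above.
MINIMAL_CONFIG = [
    ("DBFILE", ["db.train"]),
    ("INIT2B", ["true"]),
    ("RCUT2B", ["5.3"]),
    ("TYPE2B", ["D2_LJ"]),
    ("RCTYPE2B", ["Cut_Dummy"]),
    ("MODEL", ["M_KRR", "Kern_Linear"]),
]


def format_config(entries):
    """Return config text with one 'KEY VALUE(S)' line per entry."""
    lines = [f"{key:<10} {' '.join(values)}" for key, values in entries]
    return "\n".join(lines) + "\n"


def train_command(config, forces=False, stress=False):
    """Build the argument list for `ta-dah train`."""
    cmd = ["ta-dah", "train", "-c", str(config)]
    if forces:
        cmd.append("-F")  # also train on forces
    if stress:
        cmd.append("-S")  # also train on stresses
    return cmd


def train(config="config.train", forces=False, stress=False):
    """Write the minimal config and run the training (requires ta-dah)."""
    Path(config).write_text(format_config(MINIMAL_CONFIG))
    subprocess.run(train_command(config, forces, stress), check=True)
```

Calling train(forces=True) would then be equivalent to running ta-dah train -c config.train -F in a terminal, provided db.train exists in the working directory.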

Prediction

To predict energies using an existing pot.tadah model, run:

ta-dah predict -p pot.tadah -d db.predict

To also predict forces, add the -F flag; for stresses, add the -S flag.

Alternatively, a config file can be used to specify the prediction dataset(s) (DBFILE key) and whether forces and stresses should be calculated (FORCE and STRESS keys).

config.predict example:

DBFILE     db.predict
FORCE      false
STRESS     true

To predict using the pot.tadah model and config.predict:

ta-dah predict -p pot.tadah -c config.predict

The predict subcommand writes three output files:

  • energy.pred has two columns. The first column lists the dataset energy per atom and the second the predicted energy per atom. Rows follow the order of the datasets in the config file or as given with the -d flag: all energies from the first dataset are listed first, then those from the second, and so on.

  • forces.pred follows the same idea, but lists forces. The first row is the force on the first atom of the first dataset in the x-direction, the second row is the force on the same atom in the y-direction, and so on.

  • stress.pred lists stress tensor components. The first 6 rows are the components for the first configuration, followed by 6 rows for the second configuration, and so on. The component order is xx, xy, xz, yy, yz, zz.
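The prediction files can be post-processed with a short script. The sketch below is an illustration only. It assumes that each row holds two whitespace-separated numbers (dataset value, predicted value) and skips any line that does not parse that way, since the exact layout beyond the description above is not specified here. The function names are hypothetical.

```python
import math


def parse_two_columns(lines):
    """Collect (dataset, predicted) pairs from two-column text lines."""
    pairs = []
    for line in lines:
        fields = line.split()
        try:
            pairs.append((float(fields[0]), float(fields[1])))
        except (IndexError, ValueError):
            continue  # skip blank lines or anything that is not two numbers
    return pairs


def rmse(pairs):
    """Root-mean-square error between dataset and predicted values."""
    if not pairs:
        raise ValueError("no data rows found")
    return math.sqrt(sum((ref - pred) ** 2 for ref, pred in pairs) / len(pairs))


def stress_tensors(values):
    """Rebuild symmetric 3x3 tensors from rows ordered xx, xy, xz, yy, yz, zz."""
    if len(values) % 6 != 0:
        raise ValueError("expected 6 components per configuration")
    tensors = []
    for i in range(0, len(values), 6):
        xx, xy, xz, yy, yz, zz = values[i:i + 6]
        tensors.append([[xx, xy, xz], [xy, yy, yz], [xz, yz, zz]])
    return tensors
```

For example, opening energy.pred and passing the file object to parse_two_columns, then the result to rmse, gives the energy error per atom in the units of the training dataset. For stress.pred, stress_tensors can be applied to either column after extracting it from the parsed pairs.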

Built-in help

Ta-dah! provides some basic help. Try running it with the -h flag.

ta-dah -h

To read more about a particular subcommand, try

ta-dah train -h

Units and Ta-dah!

In principle, Ta-dah! will work with any units. The units of a model are determined by the units of the training dataset(s). So if your dataset has energies in electronvolts and distances in Angstroms, the resulting model will use the same units. The unit of force must be eV/Angstrom in this case. The stress tensor has units of energy (pressure*volume). In other words, the unit of stress is the same as the unit of energy, and the unit of force is energy/distance.

The units selected in LAMMPS must be consistent with the model units. For eV and Angstrom, they correspond to metal units in LAMMPS.
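Because the stress tensor is expressed in energy units (pressure*volume), dividing a component by the cell volume gives a pressure. The following sketch assumes eV and Angstrom units and uses the exact conversion 1 eV/Angstrom^3 = 160.2176634 GPa. The function names are hypothetical, and no sign convention is applied or implied.

```python
EV_PER_A3_TO_GPA = 160.2176634  # 1 eV/Angstrom^3 expressed in GPa


def cell_volume(a, b, c):
    """Volume of the cell spanned by vectors a, b, c (scalar triple product)."""
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    return abs(a[0] * cross[0] + a[1] * cross[1] + a[2] * cross[2])


def stress_to_gpa(stress_energy, volume):
    """Convert a stress component in eV (pressure*volume) to GPa."""
    return stress_energy / volume * EV_PER_A3_TO_GPA
```

For a cubic cell with a 4 Angstrom edge, cell_volume returns 64 Angstrom^3, so a stress component of 0.64 eV corresponds to 0.01 eV/Angstrom^3, or roughly 1.6 GPa.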

Note

Ta-dah! has been tested with units of eV and Angstrom. Unless you have a good reason to use different units, those are the recommended ones.

Config file

For a list and explanation of supported KEY-VALUE(S) pairs, see Config.

Dataset format

Dataset(s) are included using the DBFILE key in a Config file. More than one dataset can be specified.

There is no restriction on the number of atoms in different structures, so it is fine to have a structure with 12 atoms and another one with 24 atoms in the same dataset.

The dataset has the following structure:

Comment line
eweight fweight sweight
ENERGY
cell vector a
cell vector b
cell vector c
stress tensor row s_1
stress tensor row s_2
stress tensor row s_3
Element px py pz fx fy fz
...
<blank line>
Comment line
eweight fweight sweight
...

  • The first line is a comment line; it is used as a label for the structure.

  • eweight fweight sweight are optional weighting parameters used for training. If this line is missing, the weights default to 1.0 1.0 1.0. Do not leave a blank line in its place.

  • Each cell vector contains 3 numbers.

  • Each stress tensor row contains 3 numbers.

  • The number of lines beginning with Element is equal to the number of atoms in the structure.

  • Element is an atom label, usually a chemical element symbol.

  • px, py and pz are Cartesian coordinates of the atom position.

  • fx, fy and fz are components of the force vector acting on the atom.

  • Configurations are separated by a blank line.

If forces and/or stresses are not available, they can be set to zero to satisfy the parser.
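The layout described above can be read with a few lines of Python. This is a sketch for illustration, not the parser used by Ta-dah!. It assumes that a second line with exactly three numbers is the optional weights line, and that otherwise the second line holds the energy. The function names and the sample values are made up.

```python
def parse_structure(lines):
    """Parse one structure (a list of non-blank lines) into a dict."""
    label = lines[0].strip()
    rows = [line.split() for line in lines[1:]]
    # Assumption: three numbers on the second line mean the optional
    # weights line is present; otherwise the weights default to 1.0.
    if len(rows[0]) == 3:
        weights = [float(x) for x in rows[0]]
        rows = rows[1:]
    else:
        weights = [1.0, 1.0, 1.0]
    energy = float(rows[0][0])
    cell = [[float(x) for x in row] for row in rows[1:4]]
    stress = [[float(x) for x in row] for row in rows[4:7]]
    atoms = []
    for row in rows[7:]:
        px, py, pz, fx, fy, fz = (float(x) for x in row[1:7])
        atoms.append({"element": row[0], "position": (px, py, pz), "force": (fx, fy, fz)})
    return {"label": label, "weights": weights, "energy": energy,
            "cell": cell, "stress": stress, "atoms": atoms}


def parse_dataset(text):
    """Split the text on blank lines and parse each structure."""
    structures, block = [], []
    for line in text.splitlines() + [""]:  # the trailing "" flushes the last block
        if line.strip():
            block.append(line)
        elif block:
            structures.append(parse_structure(block))
            block = []
    return structures


# Two illustrative structures with made-up values; the second one omits
# the weights line and sets forces and stresses to zero.
SAMPLE = """first structure
1.0 0.5 0.0
-3.5
4.0 0.0 0.0
0.0 4.0 0.0
0.0 0.0 4.0
0.1 0.0 0.0
0.0 0.1 0.0
0.0 0.0 0.1
Ta 0.0 0.0 0.0 0.2 0.0 0.0
Ta 2.0 2.0 2.0 -0.2 0.0 0.0

second structure
-1.2
3.0 0.0 0.0
0.0 3.0 0.0
0.0 0.0 3.0
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
Ta 0.0 0.0 0.0 0.0 0.0 0.0
"""

structures = parse_dataset(SAMPLE)
print([(s["label"], len(s["atoms"])) for s in structures])
```

The sample also shows that structures with different numbers of atoms can share one dataset file.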

See also Units and Ta-dah!

Ta-dah! and LAMMPS

Once trained, the pot.tadah file can be used with LAMMPS like any other pair potential.

pair_style      tadah/tadah
pair_coeff      * * /path/to/pot.tadah ELEMENT

Here is an example LAMMPS script file and pot.file.

See Installing LAMMPS interface for interface installation instructions.

Support for multi-species systems

Ta-dah! is capable of generating machine-learned potentials for mono- and multi-component systems.