Config KEYS

This section describes the format of the configuration file used by Ta-dah!.

The primary building block of a configuration file is a KEY/VALUE pair. Each KEY/VALUE pair must be on a single line, and a line may contain at most one KEY. The KEY is always of type <string> and must appear first on the line, followed by its VALUE. The format and type of the VALUE are specific to each KEY.

Usually only a small subset of KEYs is needed to train a model. Ta-dah! throws an error if a required KEY is missing:

[user@host:~] $ ta-dah train -c config.train
terminate called after throwing an instance of 'std::runtime_error'
  what():  Key not found: DBFILE
Aborted (core dumped)

The output above indicates that the DBFILE KEY has not been specified in the config.train file. Adding the DBFILE KEY with a corresponding value to config.train resolves the issue.

Note that the meaning of some KEYs may vary depending on which model or descriptor is used. The documentation of each model and descriptor lists the KEYs it requires and explains them further.

The ‘#’ symbol can be used for comments.

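The ‘Max number of values:’ line of each KEY gives the maximum number of values that can be assigned to that KEY. A limit of 2147483647 (the largest 32-bit signed integer) means that the number of values is effectively unlimited. Values can be supplied either on a single line:

KEY VALUE1 VALUE2 VALUE3

or by repeating the KEY: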

KEY VALUE1
KEY VALUE2
KEY VALUE3

If the number of values exceeds the maximum allowed, the code throws an error. For example, defining RCTYPE2B twice, where only one value is allowed, results in the following error message:

[user@host:~] $ ta-dah train -c config.train
terminate called after throwing an instance of 'std::runtime_error'
  what():  Repeated key RCTYPE2B Cut_Cos
Aborted (core dumped)
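The rules above (one KEY per line, KEY first, values accumulated across repeated lines, ‘#’ comments, and a per-KEY limit) can be sketched as a small parser. This is a minimal illustration, not Ta-dah!'s own C++ parser: parse_config, MAX_VALUES and the error text are hypothetical, and the sketch assumes that everything after ‘#’ on a line is ignored.

```python
from collections import defaultdict

# Hypothetical limits for illustration; the real limits are listed under
# 'Max number of values:' for each KEY in this section.
MAX_VALUES = {"DBFILE": 2147483647, "RCTYPE2B": 1, "RCUT2B": 1}


def parse_config(text: str, max_values: dict[str, int]) -> dict[str, list[str]]:
    """Minimal KEY/VALUE parser sketch (not Ta-dah!'s implementation)."""
    config: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = line.split("#", 1)[0]
        tokens = line.split()
        if not tokens:
            continue  # blank or comment-only line
        key, values = tokens[0], tokens[1:]
        # Repeating a KEY appends to the values collected so far.
        config[key].extend(values)
        limit = max_values.get(key)
        if limit is not None and len(config[key]) > limit:
            raise RuntimeError(f"Too many values for key {key}: {config[key]}")
    return dict(config)
```

With this sketch, `DBFILE a.db b.db` followed by `DBFILE c.db` yields three DBFILE values, while two RCTYPE2B lines raise an error because that KEY accepts a single value.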

AGRID3B

Description:

Dummy. See INIT3B.

AGRIDMB

AGRIDMB [int] N

Max number of values: 1

Description:

This KEY controls the ‘size’ of an angular grid for some many-body descriptors. The exact meaning depends on the particular descriptor.

Example: 1

AGRIDMB 2

ALPHA

ALPHA [double] N

Max number of values: 1

Default: 1.0

Description:

Weight precision hyper-parameter. Used as the starting guess by the evidence approximation algorithm.

Example: 1

ALPHA 0.23

BASIS

BASIS [double] N1 N2 N3 …

Max number of values: 2147483647

Description:

Basis vectors used by the non-linear KRR model.

BETA

BETA [double] N

Max number of values: 1

Default: 1.0

Description:

Noise precision hyper-parameter. Used as the starting guess by the evidence approximation algorithm.

Example: 1

BETA 0.0001

BIAS

BIAS [bool] true | false

Max number of values: 1

Default: true

Description:

Controls whether a constant 1 is appended to every descriptor (a bias term). When true, DSIZE increases by 1.

Example: 1

BIAS false
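A minimal sketch of what BIAS does to a single descriptor vector, assuming the constant is placed at the end as the word ‘append’ suggests (the exact position inside Ta-dah!'s internal vector is not specified here). The function name apply_bias is hypothetical.

```python
def apply_bias(descriptor: list[float], bias: bool = True) -> list[float]:
    """Return a copy of the descriptor, with a constant 1.0 appended if bias is True."""
    # Assumption: the bias term goes at the end of the vector.
    return descriptor + [1.0] if bias else list(descriptor)
```

The length of the returned vector grows by one when bias is enabled, which is the DSIZE increase described above.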

CEMBFUNC

CEMBFUNC [double] N1 N2 N3 …

Max number of values: 2147483647

Description:

Position parameters of the embedding function. They are used by some many-body descriptors; e.g., in F_RLR they control the position of the x-intercept.

Example: 1

CEMBFUNC 0.14 0.45 1.00 1.1

CGRID2B

Max number of values: 2147483647

Description:

This KEY controls the position parameters of the radial basis functions used by a two-body descriptor, e.g., the position of a Gaussian function. The parameter list can be provided manually or generated automatically. This KEY is often used together with SGRID2B, and in most cases CGRID2B and SGRID2B must have the same size. Usually the largest value should be smaller than the cutoff distance used for the two-body descriptor. Note that not all descriptors use this parameter; e.g., D2_LJ has a fixed grid.

CGRID2B [int] -A [int] N [double] min [double] max

Description:

Generate the grid using one of two available algorithms. The grid contains N points between min and max, inclusive. Both min and max must be positive. The algorithm -A is selected as either -1 or -2 (note the leading minus sign). Algorithm -1 produces points evenly spaced between min and max on a linear scale. Algorithm -2 does the same on a logarithmic scale.

Example: 1

CGRID2B -1 6 1.0 6.73

Example: 2

CGRID2B -2 6 1.0 6.73
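The two generators can be sketched as follows, assuming that ‘evenly spaced on the log scale’ means evenly spaced in ln(r), i.e. a geometric progression from min to max (the base of the logarithm does not change the result). make_grid is a hypothetical name, and this sketch requires N ≥ 2. The same rules apply to SGRID2B.

```python
import math


def make_grid(algorithm: int, n: int, rmin: float, rmax: float) -> list[float]:
    """Sketch of the -1 (linear) and -2 (logarithmic) grid generators."""
    if rmin <= 0 or rmax <= 0:
        raise ValueError("min and max must be positive")
    if n < 2:
        raise ValueError("this sketch requires at least two points")
    if algorithm == -1:
        # Equal differences between neighbouring points.
        step = (rmax - rmin) / (n - 1)
        return [rmin + i * step for i in range(n)]
    if algorithm == -2:
        # Equal differences in log(r), i.e. equal ratios between neighbours.
        lmin, lmax = math.log(rmin), math.log(rmax)
        step = (lmax - lmin) / (n - 1)
        return [math.exp(lmin + i * step) for i in range(n)]
    raise ValueError("algorithm must be -1 or -2")
```

For Example 1 the linear spacing is (6.73 - 1.0)/5 = 1.146; with algorithm -2 the same endpoints are kept, but consecutive points differ by a constant ratio instead of a constant step.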

CGRID2B [double] N1 N2 N3 …

Description:

Provide grid manually. Each number must be greater than zero.

Example: 1

CGRID2B 1.14 3.45 4.55 6.73

CGRID3B

Description:

Dummy. See INIT3B.

CGRIDMB

Description:

See CGRID2B for a description.

CHECKPRESS

CHECKPRESS [bool] true | false

Max number of values: 1

Description:

This KEY is used for debugging purposes. When true, the LAMMPS interface throws an error if the pressure is NaN. This KEY is only relevant when the potential file is used with LAMMPS. It has no effect on training or prediction.

DBFILE

DBFILE [string] /path/to/dbfile …

Max number of values: 2147483647

Description:

Absolute or relative path to the database file. Relative paths are interpreted with respect to the working directory of the script. More than one dataset can be included, either by listing paths on the same line separated by spaces or by repeating the KEY.

Example: 1

DBFILE /path/to/dbfile

Example: 2

DBFILE /path/to/dbfile1 /path/to/dbfile2

DIMER

DIMER [bool] F [double] BOND_LENGTH [bool] B

Max number of values: 3

Default: false 0 false

Description:

Controls DIMER models. Users should not modify this KEY.

EWEIGHT

EWEIGHT [double] N

Max number of values: 1

Default: 1.0

Description:

Global energy scaling factor for all configurations used in the training process. Note that energies are always scaled by 1/(number of atoms). Individual scaling factors for every configuration can be set in a dataset file. The combined scaling factor is: EWEIGHT*(configuration eweight)/(number of atoms)

Example: 1

EWEIGHT 0.96
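The combined energy scaling factor is simple arithmetic; the sketch below just evaluates the formula above. The function name and the 16-atom configuration are hypothetical.

```python
def energy_weight(eweight: float, config_eweight: float, natoms: int) -> float:
    """Combined energy scaling factor: EWEIGHT * (configuration eweight) / natoms."""
    return eweight * config_eweight / natoms
```

With EWEIGHT 0.96, a configuration eweight of 1.0 and 16 atoms, the energy of that configuration is scaled by 0.96/16 = 0.06.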

FORCE

FORCE [bool] true | false

Max number of values: 1

Default: false

Description:

Set to true to calculate force descriptors and/or use forces during the training process.

FWEIGHT

FWEIGHT [double] N

Max number of values: 1

Default: 1.0

Description:

Global force scaling factor for all configurations used in the training process. Note that each force component is always scaled by 1/(number of atoms)/3. Individual scaling factors for every configuration can be set in a dataset file. The combined scaling factor is: FWEIGHT*(configuration fweight)/(number of atoms)/3

Example: 1

FWEIGHT 1e-2
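The sketch below evaluates the formula above for every one of the 3 × (number of atoms) force components of a configuration. force_weights is a hypothetical helper, not a Ta-dah! function.

```python
def force_weights(fweight: float, config_fweight: float, natoms: int) -> list[float]:
    """Weight applied to each of the 3*natoms force components of one configuration."""
    w = fweight * config_fweight / natoms / 3
    return [w] * (3 * natoms)
```

Summed over all components, the weights add up to FWEIGHT*(configuration fweight), so a large configuration does not outweigh a small one simply because it has more force components.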

HPOEVERY

HPOEVERY [string] DIR [int] N

Max number of values: 2

Description:

<EXPERIMENTAL>. This KEY is used during hyperparameter optimisation with HPO. It writes a potential file to the directory DIR every N steps.

Example: 1

HPOEVERY potfiles 100

INIT2B

INIT2B [bool] true | false

Max number of values: 1

Default: false

Description:

If set to true the two-body descriptor will be calculated.

Example: 1

INIT2B true

INIT3B

INIT3B [bool] true | false

Max number of values: 1

Default: false

Description:

This is a dummy flag, as Ta-dah! does not calculate three-body descriptors. Three-body interactions can be included through some of the many-body descriptors.

INITMB

INITMB [bool] true | false

Max number of values: 1

Default: false

Description:

If set to true the many-body descriptor will be calculated.

Example: 1

INITMB true

LAMBDA

LAMBDA [int | double] N

Max number of values: 1

Default: 0

Description:

This KEY controls the regularisation parameter \(\lambda\) for both M_BLR and M_KRR. If N=0, no regularisation is applied. If N>0, \(\lambda\) is set to N. If N<0, the evidence approximation algorithm is used to estimate the value of \(\lambda\).

Example: 1

LAMBDA -1

Example: 2

LAMBDA 1e-4
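To illustrate the N=0 and N>0 cases, the sketch below fits a single weight w in y ≈ w·x, assuming that \(\lambda\) acts as a standard L2 (ridge) penalty, which gives w = Σxy / (Σx² + λ). This is a one-feature illustration, not Ta-dah!'s solver; the evidence approximation used for N<0 is not reproduced, and fit_single_weight is a hypothetical name.

```python
def fit_single_weight(x: list[float], y: list[float], lam: float) -> float:
    """Least-squares fit of y ~ w*x with an optional L2 penalty lam on w."""
    if lam < 0:
        # In Ta-dah! a negative LAMBDA selects the evidence approximation;
        # that algorithm is not part of this sketch.
        raise NotImplementedError("evidence approximation not sketched here")
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # lam == 0: ordinary least squares; lam > 0: ridge regularisation.
    return sxy / (sxx + lam)
```

For x = [1, 2, 3] and y = [2, 4, 6], Σxy = 28 and Σx² = 14, so λ = 0 gives w = 2 while λ = 14 shrinks it to w = 1.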

MODEL

MODEL [string] MODEL [string] FUNCTION

Max number of values: 2

Description:

This KEY defines the MODEL to be used for training. MODEL can be any class that inherits from M_Base. FUNCTION can be any child class of Func_Base.

Example: 1

MODEL M_BLR BF_Linear

Example: 2

MODEL M_BLR BF_Polynomial2

Example: 3

MODEL M_KRR Kern_Linear

MPARAMS

MPARAMS [double] N1 N2 N3 …

Max number of values: 2147483647

Description:

List of parameters used by some models. See the model description for more details. Note that many models do not require this parameter at all.

Example: 1

MPARAMS 0.1

NMEAN

NMEAN [double] N1 N2 N3 …

Max number of values: 2147483647

Description:

Mean values of the columns of the DesignMatrix. This vector is obtained during standardisation (see NORM) of the DesignMatrix.

NORM

NORM [bool] true | false

Max number of values: 1

Default: false

Description:

Set to true to standardise descriptors. Note that this usually makes sense only when energies are being used for fitting.

Example: 1

NORM true

NSTDEV

NSTDEV [double] N1 N2 N3 …

Max number of values: 2147483647

Description:

Standard deviations obtained during standardisation (see NORM) of the columns of the DesignMatrix. The size of the vector is equal to the number of columns of the DesignMatrix.
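A minimal sketch of column standardisation that produces the kind of vectors stored under NMEAN and NSTDEV. It assumes the population standard deviation and leaves columns with zero spread (such as a constant bias column) unchanged; Ta-dah! may handle both points differently. standardise is a hypothetical name.

```python
import statistics


def standardise(rows: list[list[float]]):
    """Standardise the columns of a design matrix given as a list of rows.

    Returns (scaled_rows, means, stdevs); means and stdevs correspond to
    the NMEAN and NSTDEV vectors, one entry per column.
    """
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Assumption: population standard deviation (divide by the number of rows).
    stdevs = [statistics.pstdev(c) for c in cols]
    scaled = [
        # Assumption: columns with zero spread are left unchanged.
        [(v - m) / s if s > 0 else v for v, m, s in zip(row, means, stdevs)]
        for row in rows
    ]
    return scaled, means, stdevs
```

Both returned vectors have one entry per column of the DesignMatrix, matching the size requirement stated above.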

OUTPREC

OUTPREC [int] N

Max number of values: 1

Default: 10

Description:

Number of decimal places used when dumping a potential file.

Example: 1

OUTPREC 12

RCTYPE2B

RCTYPE2B [string] Cut_NAME

Max number of values: 1

Description:

Cutoff type to be used with a two-body descriptor.

Example: 1

RCTYPE2B Cut_Cos

RCTYPE3B

RCTYPE3B [string] Cut_NAME

Max number of values: 1

Description:

Dummy. See INIT3B.

RCTYPEMB

RCTYPEMB [string] Cut_NAME

Max number of values: 1

Description:

Cutoff type to be used with a many-body descriptor.

Example: 1

RCTYPEMB Cut_Cos

RCUT2B

RCUT2B [double] N

Max number of values: 1

Description:

Cutoff distance used by the two-body descriptor.

Example: 1

RCUT2B 6.7

RCUT3B

RCUT3B [double] N

Max number of values: 1

Description:

Dummy. See INIT3B.

RCUTMB

RCUTMB [double] N

Max number of values: 1

Description:

Cutoff distance used by the many-body descriptor.

Example: 1

RCUTMB 4.9

SBASIS

SBASIS [int] N

Max number of values: 1

Description:

The number of basis functions to use when constructing the DesignMatrix. Note that many models do not require this parameter at all.

Example: 1

SBASIS 10

SEMBFUNC

SEMBFUNC [double] N1 N2 N3 …

Max number of values: 2147483647

Description:

Shape parameters of the embedding function. They are used by some many-body descriptors; e.g., in F_RLR they control the depth of the function.

Example: 1

SEMBFUNC 0.14 0.45 1.00 1.1

SETFL

SETFL [string] /path/to/setfl

Max number of values: 1

Description:

Path to a setfl file containing an EAM potential.

Example: 1

SETFL Ta1_Ravelo_2013.eam.alloy

SGRID2B

Max number of values: 2147483647

Description:

This KEY controls the shape parameters of the radial basis functions used by a two-body descriptor, e.g., the width of a Gaussian function. As with CGRID2B, the parameter list can be provided manually or generated automatically. This KEY is usually used together with CGRID2B.

SGRID2B [int] -A [int] N [double] min [double] max

Description:

Generate the grid using one of two available algorithms. The grid contains N points between min and max, inclusive. Both min and max must be positive. The algorithm -A is selected as either -1 or -2 (note the leading minus sign). Algorithm -1 produces points evenly spaced between min and max on a linear scale. Algorithm -2 does the same on a logarithmic scale.

Example: 1

SGRID2B -1 6 1.0 6.73

Example: 2

SGRID2B -2 6 1.0 6.73

SGRID2B [double] N1 N2 N3 …

Description:

Provide grid manually. Each number must be greater than zero.

Example: 1

SGRID2B 1.14 3.45 4.55 6.73

SGRID3B

Description:

Dummy. See INIT3B.

SGRIDMB

Description:

See SGRID2B for a description.

SIGMA

SIGMA [int] N [double] D …

Max number of values: 2147483647

Description:

The \(\Sigma\) matrix used in Bayesian Linear Regression (M_BLR). The \(\Sigma\) matrix is \(N\times N\). Its elements follow N and are stored in column-major order, i.e., column by column.

Example: 1

SIGMA 2 1.2 2.2 2.3 3.3
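A minimal sketch of how the column-major values map back to an N×N matrix: element (i, j) sits at position j·N + i of the list that follows N. parse_sigma is a hypothetical helper that takes the VALUE tokens as strings.

```python
def parse_sigma(values: list[str]) -> list[list[float]]:
    """Rebuild the N x N Sigma matrix from the tokens 'N d1 d2 ...' (column-major)."""
    n = int(values[0])
    data = [float(v) for v in values[1:]]
    if len(data) != n * n:
        raise ValueError(f"expected {n * n} elements, got {len(data)}")
    # Column-major storage: element (i, j) is data[j * n + i].
    return [[data[j * n + i] for j in range(n)] for i in range(n)]
```

For the values of Example 1, the first column of \(\Sigma\) is (1.2, 2.2) and the second is (2.3, 3.3), so the first row is (1.2, 2.3).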

STRESS

STRESS [bool] true | false

Max number of values: 1

Default: false

Description:

Set to true to calculate stress descriptors and/or use virial stress during the training process.

SWEIGHT

SWEIGHT [double] N

Max number of values: 1

Default: 1.0

Description:

Global stress scaling factor for all configurations used in the training process. Note that each stress component is always scaled by 1/6. Individual scaling factors for every configuration can be set in a dataset file. The combined scaling factor is: SWEIGHT*(configuration sweight)/6

Example: 1

SWEIGHT 1e-1
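The sketch below evaluates the formula above for the six stress components of one configuration (a symmetric 3×3 stress tensor has six independent components, which is presumably why the factor is 1/6). stress_weights and the configuration sweight of 2.0 are hypothetical.

```python
def stress_weights(sweight: float, config_sweight: float) -> list[float]:
    """Weight applied to each of the six stress components of one configuration."""
    return [sweight * config_sweight / 6] * 6
```

With SWEIGHT 1e-1 and a configuration sweight of 2.0, each component is weighted by 0.2/6 = 1/30, and the six weights sum to 0.2.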

TYPE2B

TYPE2B [string] D2_NAME

Max number of values: 1

Description:

Type of a two-body descriptor to be used. Every two-body descriptor inherits from D2_Base.

Example: 1

TYPE2B D2_LJ

Example: 2

TYPE2B D2_Blip

TYPE3B

TYPE3B [string] D3_NAME

Max number of values: 1

Description:

Dummy. See INIT3B.

TYPEMB

TYPEMB [string] DM_NAME

Max number of values: 1

Description:

Type of a many-body descriptor to be used. Every many-body descriptor inherits from DM_Base.

Example: 1

TYPEMB DM_EAD

VERBOSE

VERBOSE [int] N

Max number of values: 1

Default: 0

Description:

Sets the verbosity level. For N>0, detailed output is printed for diagnostic purposes.

Example: 1

VERBOSE 1

WEIGHTS

WEIGHTS [double] N1 N2 N3 …

Max number of values: 2147483647

Description:

The machine learned coefficients for the model.

Example: 1

WEIGHTS 0.12 1.2 0.3