Config KEYS
This section describes the format of the configuration file used by Ta-dah!.
The primary building block of a configuration file is a KEY/VALUE pair. Each KEY/VALUE pair must be on a single line, and no more than one KEY may appear on a line. The KEY is always of type <string> and must appear first on the line, followed by the VALUE. The format and type of a VALUE are specific to a given KEY.
Usually only a small subset of KEYs is needed to train a model. Ta-dah! throws an error if a required KEY is missing:
[user@host:~] $ ta-dah train -c config.train
terminate called after throwing an instance of 'std::runtime_error'
what(): Key not found: DBFILE
Aborted (core dumped)
The output above indicates that the DBFILE KEY has not been specified in the config.train file. Adding the DBFILE KEY and its corresponding value to config.train solves the issue.
Note that the meaning of some KEYs may vary depending on which model or descriptor is used. The documentation of each model or descriptor lists the required KEYs and provides further explanation.
The ‘#’ symbol can be used for comments.
The ‘Max number of values:’ line in each entry below gives the maximum number of values that can be assigned to the KEY. Multiple values can be given either on a single line:
KEY VALUE1 VALUE2 VALUE3
or by repeating the KEY:
KEY VALUE1
KEY VALUE2
KEY VALUE3
If the number of values exceeds the maximum allowed, the code throws an error. For example, defining RCTYPE2B twice, where only one value is allowed, results in the following error message:
[user@host:~] $ ta-dah train -c config.train
terminate called after throwing an instance of 'std::runtime_error'
what(): Repeated key RCTYPE2B Cut_Cos
Aborted (core dumped)
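The rules above (one KEY per line, values following the KEY, ‘#’ comments, a per-KEY maximum on the number of values, and values accumulating across repeated KEY lines) can be sketched as a small reader. This is a hypothetical C++ illustration, not Ta-dah!'s parser: the class name ConfigSketch, treating everything after ‘#’ on a line as a comment, and rejecting undeclared KEYs are assumptions, and the error strings only imitate the messages shown above.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical KEY/VALUE reader illustrating the rules described above.
// NOT Ta-dah!'s implementation; names and comment handling are assumptions.
class ConfigSketch {
public:
    // Register a KEY together with its maximum number of values.
    void declare(const std::string& key, std::size_t max_values) {
        max_values_[key] = max_values;
    }

    // Read config text line by line.
    void parse(std::istream& in) {
        std::string line;
        while (std::getline(in, line)) {
            // Assumption: everything from '#' to the end of the line is a comment.
            line = line.substr(0, line.find('#'));
            std::istringstream ls(line);
            std::string key;
            if (!(ls >> key)) continue;  // blank or comment-only line
            auto limit = max_values_.find(key);
            if (limit == max_values_.end())
                throw std::runtime_error("Unknown key: " + key);
            // Values from repeated KEY lines accumulate in the same list.
            std::vector<std::string>& vals = values_[key];
            std::string v;
            while (ls >> v) {
                if (vals.size() >= limit->second)
                    throw std::runtime_error("Repeated key " + key + " " + v);
                vals.push_back(v);
            }
        }
    }

    // All values of a KEY; throws if the KEY was never set.
    const std::vector<std::string>& get(const std::string& key) const {
        auto it = values_.find(key);
        if (it == values_.end())
            throw std::runtime_error("Key not found: " + key);
        return it->second;
    }

private:
    std::map<std::string, std::size_t> max_values_;
    std::map<std::string, std::vector<std::string>> values_;
};
```

In this sketch, `DBFILE a.db b.db` followed by `DBFILE c.db` yields three DBFILE values, while a second `RCTYPE2B Cut_Cos` line exceeds the limit of one and throws.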
AGRID3B
Description:
Dummy. See INIT3B.
AGRIDMB
- AGRIDMB [int] N
Max number of values: 1
Description:
This KEY controls the ‘size’ of an angular grid for some many-body descriptors. The exact meaning depends on the particular descriptor.
Example: 1
AGRIDMB 2
ALPHA
- ALPHA [double] N
Max number of values: 1
Default: 1.0
Description:
Weight precision hyper-parameter. Starting guess used in evidence approximation algorithm.
Example: 1
ALPHA 0.23
BASIS
- BASIS [double] N1 N2 N3 …
Max number of values: 2147483647
Description:
Basis vectors used by the non-linear KRR model.
BETA
- BETA [double] N
Max number of values: 1
Default: 1.0
Description:
Noise precision hyper-parameter. Starting guess used in evidence approximation algorithm.
Example: 1
BETA 0.0001
BIAS
- BIAS [bool] true | false
Max number of values: 1
Default: true
Description:
Controls whether a constant 1 is appended to every descriptor. When true, DSIZE increases by 1.
Example: 1
BIAS false
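The effect of BIAS true on a single descriptor can be illustrated as below. This is a hypothetical sketch: the function name with_bias is made up, and placing the 1 at the end follows the word "append" above; the position Ta-dah! actually uses is not stated on this page.

```cpp
#include <cassert>
#include <vector>

// Hypothetical illustration of BIAS true: a constant 1 is appended to the
// descriptor, so its length (DSIZE) grows by one. Position is an assumption.
std::vector<double> with_bias(std::vector<double> descriptor) {
    descriptor.push_back(1.0);
    return descriptor;
}
```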
CEMBFUNC
- CEMBFUNC [double] N1 N2 N3 …
Max number of values: 2147483647
Description:
A list of position parameters for the embedding function, used by some many-body descriptors. For example, in F_RLR it controls the position of the x-intercept.
Example: 1
CEMBFUNC 0.14 0.45 1.00 1.1
CGRID2B
Max number of values: 2147483647
Description:
This KEY controls the position parameters used by the radial basis functions of a two-body descriptor, e.g., the position of a Gaussian function. The parameter list can be provided manually or generated automatically. This KEY is often used together with SGRID2B, and usually CGRID2B and SGRID2B must be the same size. In most cases the maximum value should be smaller than the cutoff distance used for the two-body descriptor. Note that not all descriptors use this parameter; e.g., D2_LJ has a fixed grid.
- CGRID2B [int] -A [int] N [double] min [double] max
Description:
Generate the grid using one of two available algorithms. The grid contains N points between min and max, inclusive. Both min and max must be positive. The algorithm A is selected as either -1 or -2; note the minus sign in front of A. Algorithm -1 provides evenly spaced points between min and max on a linear scale. Algorithm -2 does the same on a log scale.
Example: 1
CGRID2B -1 6 1.0 6.73
Example: 2
CGRID2B -2 6 1.0 6.73
- CGRID2B [double] N1 N2 N3 …
Description:
Provide grid manually. Each number must be greater than zero.
Example: 1
CGRID2B 1.14 3.45 4.55 6.73
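The two automatic algorithms for CGRID2B (and, identically, SGRID2B) can be sketched as follows. This is an illustration under assumptions: the function name make_grid is hypothetical, "log scale" is taken to mean points evenly spaced in log(r) (a geometric sequence), and the sketch requires N >= 2; Ta-dah!'s exact formulas are not given on this page.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of the -1 (linear) and -2 (log) grid algorithms.
// Produces n points from min_val to max_val, both ends inclusive.
std::vector<double> make_grid(int algo, int n, double min_val, double max_val) {
    if (algo != -1 && algo != -2)
        throw std::invalid_argument("algorithm must be -1 or -2");
    if (n < 2)
        throw std::invalid_argument("this sketch requires N >= 2");
    if (min_val <= 0.0 || max_val <= 0.0)
        throw std::invalid_argument("min and max must be positive");
    std::vector<double> grid(n);
    for (int i = 0; i < n; ++i) {
        const double t = static_cast<double>(i) / (n - 1);  // runs from 0 to 1
        grid[i] = (algo == -1)
            // linear scale: equal steps in r
            ? min_val + t * (max_val - min_val)
            // log scale: equal steps in log(r), i.e. a geometric sequence
            : std::exp(std::log(min_val) + t * (std::log(max_val) - std::log(min_val)));
    }
    return grid;
}
```

Under these assumptions, `make_grid(-1, 6, 1.0, 6.73)` corresponds to Example 1 and `make_grid(-2, 6, 1.0, 6.73)` to Example 2; the log grid packs more points near min, where the radial functions usually vary fastest.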
CGRID3B
Description:
Dummy. See INIT3B.
CGRIDMB
Description:
See CGRID2B for a description.
CHECKPRESS
- CHECKPRESS [bool] true | false
Max number of values: 1
Description:
This KEY is used for debugging purposes. When true, the LAMMPS interface throws an error on a NaN pressure. This KEY is only valid when the potential file is used with LAMMPS; it has no effect on training or prediction.
DBFILE
- DBFILE [string] /path/to/dbfile …
Max number of values: 2147483647
Description:
Absolute or relative path to the database file. Relative paths are taken with respect to the working directory of the script. More than one dataset can be included, either by listing paths on the same line separated by spaces or by repeating the KEY multiple times.
Example: 1
DBFILE /path/to/dbfile
Example: 2
DBFILE /path/to/dbfile1 /path/to/dbfile2
DIMER
- DIMER [bool] F [double] BOND_LENGTH [bool] B
Max number of values: 3
Default: false 0 false
Description:
Control for DIMER models. Users should not modify this KEY.
EWEIGHT
- EWEIGHT [double] N
Max number of values: 1
Default: 1.0
Description:
Global energy scaling factor for all configurations used in the training process. Note that energies are always scaled by 1/(number of atoms). Individual scaling factors for every configuration can be set in a dataset file. The combined scaling factor is: EWEIGHT*(configuration eweight)/(number of atoms)
Example: 1
EWEIGHT 0.96
FORCE
- FORCE [bool] true | false
Max number of values: 1
Default: false
Description:
Set to true to calculate force descriptors and/or use forces during the training process.
FWEIGHT
- FWEIGHT [double] N
Max number of values: 1
Default: 1.0
Description:
Global force scaling factor for all configurations used in the training process. Note that each force component is always scaled by 1/(number of atoms)/3. Individual scaling factors for every configuration can be set in a dataset file. The combined scaling factor is: FWEIGHT*(configuration fweight)/(number of atoms)/3
Example: 1
FWEIGHT 1e-2
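The combined energy and force scaling factors stated in the EWEIGHT and FWEIGHT entries can be written out directly. The function and parameter names below are hypothetical; only the formulas come from this page. For example, EWEIGHT 0.96 with a configuration eweight of 1 and 4 atoms scales that configuration's energy by 0.96/4 = 0.24.

```cpp
#include <cassert>
#include <stdexcept>

// Combined scaling factors as stated above (names are illustrative):
//   energy:          EWEIGHT * (configuration eweight) / natoms
//   force component: FWEIGHT * (configuration fweight) / natoms / 3
double energy_scale(double eweight, double config_eweight, int natoms) {
    if (natoms <= 0) throw std::invalid_argument("natoms must be positive");
    return eweight * config_eweight / natoms;
}

double force_scale(double fweight, double config_fweight, int natoms) {
    if (natoms <= 0) throw std::invalid_argument("natoms must be positive");
    return fweight * config_fweight / natoms / 3.0;
}
```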
HPOEVERY
- HPOEVERY [string] DIR [int] N
Max number of values: 2
Description:
<EXPERIMENTAL>. This KEY is used during hyperparameter optimisation with HPO. It writes a potential file to directory DIR every N steps.
Example: 1
HPOEVERY potfiles 100
INIT2B
- INIT2B [bool] true | false
Max number of values: 1
Default: false
Description:
If set to true the two-body descriptor will be calculated.
Example: 1
INIT2B true
INIT3B
- INIT3B [bool] true | false
Max number of values: 1
Default: false
Description:
This is a dummy flag, as Ta-dah! does not calculate three-body descriptors. Three-body interactions can be included with some of the many-body descriptors.
INITMB
- INITMB [bool] true | false
Max number of values: 1
Default: false
Description:
If set to true the many-body descriptor will be calculated.
Example: 1
INITMB true
LAMBDA
- LAMBDA [int | double] N
Max number of values: 1
Default: 0
Description:
This KEY controls the regularisation parameter \(\lambda\) for both M_BLR and M_KRR. If N=0, no regularisation is applied. If N>0, \(\lambda\) is set to N. If N<0, the evidence-approximation algorithm is used to estimate the value of \(\lambda\).
Example: 1
LAMBDA -1
Example: 2
LAMBDA 1e-4
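For orientation, the block below shows where \(\lambda\) enters in the textbook form of regularised (ridge) linear regression, which we assume M_BLR follows; Ta-dah!'s exact objective, including the EWEIGHT/FWEIGHT/SWEIGHT scaling of individual rows, is not reproduced here. With design matrix \(\Phi\), targets \(\mathbf{t}\) and weights \(\mathbf{w}\):

```latex
\mathbf{w} = \arg\min_{\mathbf{w}} \; \lVert \mathbf{t} - \Phi\mathbf{w} \rVert^2 + \lambda \lVert \mathbf{w} \rVert^2
\quad\Longrightarrow\quad
\mathbf{w} = \left(\Phi^\top \Phi + \lambda I\right)^{-1} \Phi^\top \mathbf{t},
\qquad
\lambda = \frac{\alpha}{\beta}
```

In the standard Bayesian treatment with weight precision \(\alpha\) (cf. ALPHA) and noise precision \(\beta\) (cf. BETA), the posterior mean has exactly this form with \(\lambda = \alpha/\beta\); that Ta-dah! relates the three KEYs this way is an assumption. Setting \(\lambda = 0\) recovers ordinary least squares, matching the N=0 case above.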
MODEL
- MODEL [string] MODEL [string] FUNCTION
Max number of values: 2
Description:
This KEY defines the MODEL to be used for training. MODEL can be any class which inherits from M_Base. FUNCTION can be any child class of Func_Base.
Example: 1
MODEL M_BLR BF_Linear
Example: 2
MODEL M_BLR BF_Polynomial2
Example: 3
MODEL M_KRR Kern_Linear
MPARAMS
- MPARAMS [double] N1 N2 N3 …
Max number of values: 2147483647
Description:
List of parameters used by some models. See model description for more details. Note that many models do not require this parameter at all.
Example: 1
MPARAMS 0.1
NMEAN
- NMEAN [double] N1 N2 N3 …
Max number of values: 2147483647
Description:
Mean values for the columns of the DesignMatrix. This vector is obtained during standardisation (see NORM) of the DesignMatrix.
NORM
- NORM [bool] true | false
Max number of values: 1
Default: false
Description:
Set to true to standardise descriptors. Note that this usually makes sense only when energies are being used for fitting.
Example: 1
NORM true
NSTDEV
- NSTDEV [double] N1 N2 N3 …
Max number of values: 2147483647
Description:
Standard deviations obtained during standardisation (see NORM) of the columns of the DesignMatrix. The size of the vector is equal to the number of columns of the DesignMatrix.
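The standardisation behind NORM, NMEAN and NSTDEV can be sketched as follows: each column of the DesignMatrix is shifted by its mean and divided by its standard deviation. This is a hypothetical illustration; the struct and function names, the row-major nested-vector storage, the use of the population (divide-by-rows) standard deviation, and leaving zero-spread columns (such as a BIAS column of ones) untouched are all assumptions not confirmed by this page.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Per-column statistics, analogous to the NMEAN and NSTDEV vectors.
struct Standardisation {
    std::vector<double> mean;
    std::vector<double> stdev;
};

// Standardise the columns of X in place; X is a list of rows (descriptors).
Standardisation standardise(std::vector<std::vector<double>>& X) {
    const std::size_t rows = X.size();
    const std::size_t cols = rows ? X[0].size() : 0;
    Standardisation s{std::vector<double>(cols, 0.0), std::vector<double>(cols, 0.0)};
    for (const auto& row : X)
        for (std::size_t j = 0; j < cols; ++j) s.mean[j] += row[j] / rows;
    for (const auto& row : X)
        for (std::size_t j = 0; j < cols; ++j) {
            const double d = row[j] - s.mean[j];
            s.stdev[j] += d * d / rows;  // population variance (assumption)
        }
    for (std::size_t j = 0; j < cols; ++j) s.stdev[j] = std::sqrt(s.stdev[j]);
    for (auto& row : X)
        for (std::size_t j = 0; j < cols; ++j)
            if (s.stdev[j] > 0.0)  // skip constant columns, e.g. a bias column
                row[j] = (row[j] - s.mean[j]) / s.stdev[j];
    return s;
}
```

The returned vectors play the role of NMEAN and NSTDEV: the same shift and scale must be applied to new descriptors at prediction time, which is presumably why they are stored alongside the model.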
OUTPREC
- OUTPREC [int] N
Max number of values: 1
Default: 10
Description:
Number of decimal places used when dumping a potential file.
Example: 1
OUTPREC 12
RCTYPE2B
- RCTYPE2B [string] Cut_NAME
Max number of values: 1
Description:
Cutoff type to be used with a two-body descriptor.
Example: 1
RCTYPE2B Cut_Cos
RCTYPE3B
- RCTYPE3B [string] Cut_NAME
Max number of values: 1
Description:
Dummy. See INIT3B.
RCTYPEMB
- RCTYPEMB [string] Cut_NAME
Max number of values: 1
Description:
Cutoff type to be used with a many-body descriptor.
Example: 1
RCTYPEMB Cut_Cos
RCUT2B
- RCUT2B [double] N
Max number of values: 1
Description:
Cutoff distance used by the two-body descriptor.
Example: 1
RCUT2B 6.7
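RCUT2B (or RCUTMB) sets the cutoff distance \(r_c\), and RCTYPE2B (or RCTYPEMB) selects the function that smoothly switches interactions off at \(r_c\). As an illustration, the sketch below implements a widely used cosine cutoff, \(f(r) = \tfrac{1}{2}\left(\cos(\pi r / r_c) + 1\right)\) for \(r < r_c\) and 0 otherwise. That Cut_Cos has exactly this form is an assumption; the function name cut_cos is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical cosine cutoff: 1 at r = 0, decays smoothly to 0 at r = rc,
// and stays 0 beyond rc. rc plays the role of RCUT2B (or RCUTMB).
double cut_cos(double r, double rc) {
    const double pi = std::acos(-1.0);
    if (r >= rc) return 0.0;
    return 0.5 * (std::cos(pi * r / rc) + 1.0);
}
```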
RCUT3B
- RCUT3B [double] N
Max number of values: 1
Description:
Dummy. See INIT3B.
RCUTMB
- RCUTMB [double] N
Max number of values: 1
Description:
Cutoff distance used by the many-body descriptor.
Example: 1
RCUTMB 4.9
SBASIS
- SBASIS [int] N
Max number of values: 1
Description:
The number of basis functions to use when constructing the DesignMatrix. Note that many models do not require this parameter at all.
Example: 1
SBASIS 10
SEMBFUNC
- SEMBFUNC [double] N1 N2 N3 …
Max number of values: 2147483647
Description:
A list of shape parameters for the embedding function, used by some many-body descriptors. For example, in F_RLR it controls the depth of the function.
Example: 1
SEMBFUNC 0.14 0.45 1.00 1.1
SETFL
- SETFL [string] /path/to/setfl
Max number of values: 1
Description:
Path to a setfl file containing an EAM potential.
Example: 1
SETFL Ta1_Ravelo_2013.eam.alloy
SGRID2B
Max number of values: 2147483647
Description:
This KEY controls the shape parameters of the radial basis functions of a two-body descriptor, e.g., the width of a Gaussian function. As with CGRID2B, the parameter list can be provided manually or generated automatically. This KEY is usually employed together with CGRID2B.
- SGRID2B [int] -A [int] N [double] min [double] max
Description:
Generate the grid using one of two available algorithms. The grid contains N points between min and max, inclusive. Both min and max must be positive. The algorithm A is selected as either -1 or -2; note the minus sign in front of A. Algorithm -1 provides evenly spaced points between min and max on a linear scale. Algorithm -2 does the same on a log scale.
Example: 1
SGRID2B -1 6 1.0 6.73
Example: 2
SGRID2B -2 6 1.0 6.73
- SGRID2B [double] N1 N2 N3 …
Description:
Provide grid manually. Each number must be greater than zero.
Example: 1
SGRID2B 1.14 3.45 4.55 6.73
SGRID3B
Description:
Dummy. See INIT3B.
SGRIDMB
Description:
See SGRID2B for a description.
SIGMA
- SIGMA [int] N [double] D …
Max number of values: 2147483647
Description:
The \(\Sigma\) matrix used in Bayesian Linear Regression (M_BLR). The \(\Sigma\) matrix is \(N\times N\) and is stored in column-major order, i.e., column by column.
Example: 1
SIGMA 2 1.2 2.2 2.3 3.3
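Column-major storage means element (i, j), in row i and column j, sits at position j*N + i of the listed values. For the example above (N=2 followed by 1.2 2.2 2.3 3.3), the first column is (1.2, 2.2) and the second is (2.3, 3.3). The hypothetical sketch below (struct and function names are illustrative, not Ta-dah! API) reads such a value list and indexes it accordingly.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// N x N matrix stored column by column: (i, j) lives at data[j * n + i].
struct ColMajorMatrix {
    int n;
    std::vector<double> data;
    double operator()(int i, int j) const { return data[j * n + i]; }
};

// Interpret SIGMA values: N first, then N*N entries in column-major order.
ColMajorMatrix read_sigma(const std::vector<double>& values) {
    if (values.empty()) throw std::invalid_argument("missing N");
    const int n = static_cast<int>(values[0]);
    if (n < 0 || values.size() != 1 + static_cast<std::size_t>(n) * n)
        throw std::invalid_argument("expected N followed by N*N values");
    return ColMajorMatrix{n, std::vector<double>(values.begin() + 1, values.end())};
}
```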
STRESS
- STRESS [bool] true | false
Max number of values: 1
Default: false
Description:
Set to true to calculate stress descriptors and/or use virial stress during the training process.
SWEIGHT
- SWEIGHT [double] N
Max number of values: 1
Default: 1.0
Description:
Global stress scaling factor for all configurations used in the training process. Note that each stress component is always scaled by 1/6. Individual scaling factors for every configuration can be set in a dataset file. The combined scaling factor is: SWEIGHT*(configuration sweight)/6
Example: 1
SWEIGHT 1e-1
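The combined stress factor follows the same pattern as the energy and force factors, except that it does not depend on the number of atoms. With SWEIGHT 1e-1 and a configuration sweight of 1, each stress component is scaled by 0.1/6, roughly 0.0167. The function name below is hypothetical; only the formula comes from this page.

```cpp
#include <cassert>

// Combined stress scaling factor as stated above:
//   SWEIGHT * (configuration sweight) / 6
double stress_scale(double sweight, double config_sweight) {
    return sweight * config_sweight / 6.0;
}
```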
TYPE2B
- TYPE2B [string] D2_NAME
Max number of values: 1
Description:
Type of the two-body descriptor to be used. Every two-body descriptor inherits from D2_Base.
Example: 1
TYPE2B D2_LJ
Example: 2
TYPE2B D2_Blip
TYPE3B
- TYPE3B [string] D3_NAME
Max number of values: 1
Description:
Dummy. See INIT3B.
TYPEMB
- TYPEMB [string] DM_NAME
Max number of values: 1
Description:
Type of the many-body descriptor to be used. Every many-body descriptor inherits from DM_Base.
Example: 1
TYPEMB DM_EAD
VERBOSE
- VERBOSE [int] N
Max number of values: 1
Default: 0
Description:
Set the verbosity level. For N>0, detailed output is provided for diagnostic purposes.
Example: 1
VERBOSE 1
WEIGHTS
- WEIGHTS [double] N1 N2 N3 …
Max number of values: 2147483647
Description:
The machine-learned coefficients of the model.
Example: 1
WEIGHTS 0.12 1.2 0.3