Config KEYS
===========

This section describes the format of the configuration file used by Ta-dah!.

The primary building block of a configuration file is a KEY/VALUE pair.
Each KEY/VALUE pair must be on a single line, with no more than one KEY per line.
The KEY is always a type and must appear first on the line, followed by the VALUE.
The format and type of a VALUE are specific to a given KEY.

Usually only a small subset of KEYs is needed to train a model.
Ta-dah! throws an error if a required KEY is missing:

.. code-block:: bash

    [user@host:~] $ ta-dah train -c config.train
    terminate called after throwing an instance of 'std::runtime_error'
      what():  Key not found: DBFILE
    Aborted (core dumped)

The output above indicates that the DBFILE_ KEY has not been specified in the
:file:`config.train` file. Adding the DBFILE_ KEY and its corresponding value to
:file:`config.train` resolves the issue.

Note that the meaning of some KEYs may vary depending on which model or descriptor
is being used. The documentation of each model or descriptor lists the KEYs it
requires and provides further explanation.

The '#' symbol can be used for comments.

The `Max number of values:` line indicates the maximum number of values that can be
appended to the KEY. Values can be appended either on a single line:

.. code-block:: bash

    KEY VALUE1 VALUE2 VALUE3

or by repeating the KEY:

.. code-block:: bash

    KEY VALUE1
    KEY VALUE2
    KEY VALUE3

If the number of values exceeds the maximum allowed, the code throws an error.
For example, defining RCTYPE2B_ twice, where only one value is allowed, results in
the following error message:

.. code-block:: bash

    [user@host:~] $ ta-dah train -c config.train
    terminate called after throwing an instance of 'std::runtime_error'
      what():  Repeated key RCTYPE2B Cut_Cos
    Aborted (core dumped)

.. _AGRID3B:

AGRID3B
.......

*Description*: Dummy. See INIT3B_.

.. _AGRIDMB:

AGRIDMB
.......
AGRIDMB [*int*] N

*Max number of values*: 1

*Description*: This KEY controls the 'size' of an angular grid for some many-body
descriptors. The exact meaning depends on the particular descriptor.

*Example*: 1 ::

    AGRIDMB 2

.. _ALPHA:

ALPHA
.....

ALPHA [*double*] N

*Max number of values*: 1

*Default*: 1.0

*Description*: Weight precision hyper-parameter. Starting guess used in the
evidence approximation algorithm.

*Example*: 1 ::

    ALPHA 0.23

.. _BASIS:

BASIS
.....

BASIS [*double*] N1 N2 N3 ...

*Max number of values*: 2147483647

*Description*: Basis vectors used by the non-linear KRR model.

.. _BETA:

BETA
....

BETA [*double*] N

*Max number of values*: 1

*Default*: 1.0

*Description*: Noise precision hyper-parameter. Starting guess used in the
evidence approximation algorithm.

*Example*: 1 ::

    BETA 0.0001

.. _BIAS:

BIAS
....

BIAS [*bool*] true | false

*Max number of values*: 1

*Default*: true

*Description*: Controls whether to append 1 to every descriptor. Increases DSIZE by 1.

*Example*: 1 ::

    BIAS false

.. _CEMBFUNC:

CEMBFUNC
........

CEMBFUNC [*double*] N1 N2 N3 ...

*Max number of values*: 2147483647

*Description*: Position parameters of the embedding function. This KEY is used by
some many-body descriptors; e.g., it controls where the x-intercept is in
:cpp:class:`F_RLR`.

*Example*: 1 ::

    CEMBFUNC 0.14 0.45 1.00 1.1

.. _CGRID2B:

CGRID2B
.......

*Max number of values*: 2147483647

*Description*: This KEY controls the position parameters used by the radial basis
functions of a two-body descriptor, e.g., the position of a Gaussian function.
The parameter list can be provided manually or generated automatically.
This KEY is often used together with SGRID2B_. Usually CGRID2B_ and SGRID2B_
must be the same size. In most cases the maximum value should be smaller than
the cutoff distance used for the two-body descriptor. Note that not all
descriptors use this parameter, e.g., :cpp:class:`D2_LJ` has a fixed grid.

CGRID2B [*int*] -A [*int*] N [*double*] min [*double*] max

*Description*: Generate the grid using one of two available algorithms.
The grid contains :code:`N` points between the :code:`min` and :code:`max`
positions, inclusive. Both :code:`min` and :code:`max` must be positive.
The algorithm :code:`-A` is selected as either :code:`-1` or :code:`-2`.
Note the minus sign in front of :code:`-A`.
Algorithm :code:`-1` provides evenly spaced points between :code:`min` and
:code:`max` on a linear scale.
Algorithm :code:`-2` does the same but on a log scale.

*Example*: 1 ::

    CGRID2B -1 6 1.0 6.73

*Example*: 2 ::

    CGRID2B -2 6 1.0 6.73

CGRID2B [*double*] N1 N2 N3 ...

*Description*: Provide the grid manually. Each number must be greater than zero.

*Example*: 1 ::

    CGRID2B 1.14 3.45 4.55 6.73

.. _CGRID3B:

CGRID3B
.......

*Description*: Dummy. See INIT3B_.

.. _CGRIDMB:

CGRIDMB
.......

*Description*: See CGRID2B_ for a description.

.. _CHECKPRESS:

CHECKPRESS
..........

CHECKPRESS [*bool*] true | false

*Max number of values*: 1

*Description*: This KEY is used for debugging purposes. When true, the LAMMPS
interface throws an error on NaN pressure. This KEY is only valid when the
potential file is used with LAMMPS. It has no effect on training or prediction.

.. _DBFILE:

DBFILE
......

DBFILE [*string*] /path/to/dbfile ...

*Max number of values*: 2147483647

*Description*: Absolute or relative path to the database file. A relative path
is taken relative to the script working directory. More than one dataset can be
included, either by listing paths on the same line separated by spaces or by
repeating the KEY multiple times.

*Example*: 1 ::

    DBFILE /path/to/dbfile

*Example*: 2 ::

    DBFILE /path/to/dbfile1 /path/to/dbfile2

.. _DIMER:

DIMER
.....

DIMER [*bool*] F [*double*] BOND_LENGTH [*bool*] B

*Max number of values*: 3

*Default*: false 0 false

*Description*: Control for DIMER models. Users should not modify this KEY.

.. _EWEIGHT:

EWEIGHT
.......
EWEIGHT [*double*] N

*Max number of values*: 1

*Default*: 1.0

*Description*: Global energy scaling factor for all configurations used in the
training process. Note that energies are always scaled by 1/(number of atoms).
Individual scaling factors for every configuration can be set in a dataset file.
The combined scaling factor is:
:code:`EWEIGHT*(configuration eweight)/(number of atoms)`

*Example*: 1 ::

    EWEIGHT 0.96

.. _FORCE:

FORCE
.....

FORCE [*bool*] true | false

*Max number of values*: 1

*Default*: false

*Description*: Set to true to calculate force descriptors and/or use forces
during the training process.

.. _FWEIGHT:

FWEIGHT
.......

FWEIGHT [*double*] N

*Max number of values*: 1

*Default*: 1.0

*Description*: Global force scaling factor for all configurations used in the
training process. Note that each force component is always scaled by
1/(number of atoms)/3. Individual scaling factors for every configuration can be
set in a dataset file. The combined scaling factor is:
:code:`FWEIGHT*(configuration fweight)/(number of atoms)/3`

*Example*: 1 ::

    FWEIGHT 1e-2

.. _HPOEVERY:

HPOEVERY
........

HPOEVERY [*string*] DIR [*int*] N

*Max number of values*: 2

*Description*: This KEY is used during hyperparameter optimisation with HPO.
It writes a potential file every N steps to the directory DIR.

*Example*: 1 ::

    HPOEVERY potfiles 100

.. _INIT2B:

INIT2B
......

INIT2B [*bool*] true | false

*Max number of values*: 1

*Default*: false

*Description*: If set to true, the two-body descriptor will be calculated.

*Example*: 1 ::

    INIT2B true

.. _INIT3B:

INIT3B
......

INIT3B [*bool*] true | false

*Max number of values*: 1

*Default*: false

*Description*: This is a dummy flag, as Ta-dah! does not calculate three-body
descriptors. Three-body interactions can be included with some of the many-body
descriptors.

.. _INITMB:

INITMB
......

INITMB [*bool*] true | false

*Max number of values*: 1

*Default*: false

*Description*: If set to true, the many-body descriptor will be calculated.

*Example*: 1 ::

    INITMB true

.. _LAMBDA:

LAMBDA
......

LAMBDA [*int | double*] N

*Max number of values*: 1

*Default*: 0

*Description*: This KEY controls the regularisation parameter :math:`\lambda`
for both :cpp:class:`M_BLR` and :cpp:class:`M_KRR`.
If :code:`N=0` then no regularisation is applied.
If :code:`N>0` then :math:`\lambda` is set to the value of :code:`N`.
If :code:`N<0` then the evidence-approximation algorithm is used to estimate the
value of :math:`\lambda`.

*Example*: 1 ::

    LAMBDA -1

*Example*: 2 ::

    LAMBDA 1e-4

.. _MODEL:

MODEL
.....

MODEL [*string*] MODEL [*string*] FUNCTION

*Max number of values*: 2

*Description*: This KEY defines the :code:`MODEL` to be used for training.
:code:`MODEL` can be any class which inherits from :cpp:class:`M_Base`.
:code:`FUNCTION` can be any child class of :cpp:class:`Func_Base`.

*Example*: 1 ::

    MODEL M_BLR BF_Linear

*Example*: 2 ::

    MODEL M_BLR BF_Polynomial2

*Example*: 3 ::

    MODEL M_KRR Kern_Linear

.. _MPARAMS:

MPARAMS
.......

MPARAMS [*double*] N1 N2 N3 ...

*Max number of values*: 2147483647

*Description*: List of parameters used by some models. See the model description
for more details. Note that many models do not require this parameter at all.

*Example*: 1 ::

    MPARAMS 0.1

.. _NMEAN:

NMEAN
.....

NMEAN [*double*] N1 N2 N3 ...

*Max number of values*: 2147483647

*Description*: Mean values for the columns of the :cpp:class:`DesignMatrix`.
This vector is obtained during standardisation (see NORM_) of the
:cpp:class:`DesignMatrix`.

.. _NORM:

NORM
....

NORM [*bool*] true | false

*Max number of values*: 1

*Default*: false

*Description*: Set to true to standardise descriptors. Note that this usually
makes sense only when energies are being used for fitting.

*Example*: 1 ::

    NORM true

.. _NSTDEV:

NSTDEV
......

NSTDEV [*double*] N1 N2 N3 ...

*Max number of values*: 2147483647

*Description*: Standard deviations obtained during standardisation (see NORM_)
of the columns of the :cpp:class:`DesignMatrix`.
The size of the vector is equal to the number of columns of the
:cpp:class:`DesignMatrix`.

.. _OUTPREC:

OUTPREC
.......

OUTPREC [*int*] N

*Max number of values*: 1

*Default*: 10

*Description*: Number of decimal places used when dumping a potential file.

*Example*: 1 ::

    OUTPREC 12

.. _RCTYPE2B:

RCTYPE2B
........

RCTYPE2B [*string*] Cut_NAME

*Max number of values*: 1

*Description*: Cutoff type to be used with a two-body descriptor.

*Example*: 1 ::

    RCTYPE2B Cut_Cos

.. _RCTYPE3B:

RCTYPE3B
........

RCTYPE3B [*string*] Cut_NAME

*Max number of values*: 1

*Description*: Dummy. See INIT3B_.

.. _RCTYPEMB:

RCTYPEMB
........

RCTYPEMB [*string*] Cut_NAME

*Max number of values*: 1

*Description*: Cutoff type to be used with a many-body descriptor.

*Example*: 1 ::

    RCTYPEMB Cut_Cos

.. _RCUT2B:

RCUT2B
......

RCUT2B [*double*] N

*Max number of values*: 1

*Description*: Cutoff distance used by the two-body descriptor.

*Example*: 1 ::

    RCUT2B 6.7

.. _RCUT3B:

RCUT3B
......

RCUT3B [*double*] N

*Max number of values*: 1

*Description*: Dummy. See INIT3B_.

.. _RCUTMB:

RCUTMB
......

RCUTMB [*double*] N

*Max number of values*: 1

*Description*: Cutoff distance used by the many-body descriptor.

*Example*: 1 ::

    RCUTMB 4.9

.. _SBASIS:

SBASIS
......

SBASIS [*int*] N

*Max number of values*: 1

*Description*: The number of basis functions to use when constructing the
:cpp:class:`DesignMatrix`. Note that many models do not require this parameter
at all.

*Example*: 1 ::

    SBASIS 10

.. _SEMBFUNC:

SEMBFUNC
........

SEMBFUNC [*double*] N1 N2 N3 ...

*Max number of values*: 2147483647

*Description*: Shape parameters of the embedding function. This KEY is used by
some many-body descriptors; e.g., it controls the depth of the function in
:cpp:class:`F_RLR`.

*Example*: 1 ::

    SEMBFUNC 0.14 0.45 1.00 1.1

.. _SETFL:

SETFL
.....

SETFL [*string*] /path/to/dbfile

*Max number of values*: 1

*Description*: Path to a setfl file containing an EAM potential.

*Example*: 1 ::

    SETFL Ta1_Ravelo_2013.eam.alloy

.. _SGRID2B:

SGRID2B
.......

*Max number of values*: 2147483647

*Description*: Controls the number of shape parameters for the radial basis
functions of a two-body descriptor, e.g., the width of a Gaussian function.
As with CGRID2B_, the parameter list can be provided manually or generated
automatically. This KEY is usually employed together with CGRID2B_.

SGRID2B [*int*] -A [*int*] N [*double*] min [*double*] max

*Description*: Generate the grid using one of two available algorithms.
The grid contains :code:`N` points between :code:`min` and :code:`max`, inclusive.
Both :code:`min` and :code:`max` must be positive.
The algorithm :code:`-A` is selected as either :code:`-1` or :code:`-2`.
Note the minus sign in front of :code:`-A`.
Algorithm :code:`-1` provides evenly spaced points between :code:`min` and
:code:`max` on a linear scale.
Algorithm :code:`-2` does the same but on a log scale.

*Example*: 1 ::

    SGRID2B -1 6 1.0 6.73

*Example*: 2 ::

    SGRID2B -2 6 1.0 6.73

SGRID2B [*double*] N1 N2 N3 ...

*Description*: Provide the grid manually. Each number must be greater than zero.

*Example*: 1 ::

    SGRID2B 1.14 3.45 4.55 6.73

.. _SGRID3B:

SGRID3B
.......

*Description*: Dummy. See INIT3B_.

.. _SGRIDMB:

SGRIDMB
.......

*Description*: See SGRID2B_ for a description.

.. _SIGMA:

SIGMA
.....

SIGMA [*int*] N [*double*] D ...

*Max number of values*: 2147483647

*Description*: The :math:`\Sigma` matrix used in Bayesian Linear Regression
:cpp:class:`M_BLR`. The :math:`\Sigma` matrix is :math:`N\times N`.
It is stored in column-major order, i.e., column by column.

*Example*: 1 ::

    SIGMA 2 1.2 2.2 2.3 3.3

.. _STRESS:

STRESS
......

STRESS [*bool*] true | false

*Max number of values*: 1

*Default*: false

*Description*: Set to true to calculate stress descriptors and/or use virial
stress during the training process.

.. _SWEIGHT:

SWEIGHT
.......
SWEIGHT [*double*] N

*Max number of values*: 1

*Default*: 1.0

*Description*: Global stress scaling factor for all configurations used in the
training process. Note that each stress component is always scaled by 1/6.
Individual scaling factors for every configuration can be set in a dataset file.
The combined scaling factor is:
:code:`SWEIGHT*(configuration sweight)/6`

*Example*: 1 ::

    SWEIGHT 1e-1

.. _TYPE2B:

TYPE2B
......

TYPE2B [*string*] D2_NAME

*Max number of values*: 1

*Description*: Type of the two-body descriptor to be used. Every two-body
descriptor inherits from :cpp:class:`D2_Base`.

*Example*: 1 ::

    TYPE2B D2_LJ

*Example*: 2 ::

    TYPE2B D2_Blip

.. _TYPE3B:

TYPE3B
......

TYPE3B [*string*] D3_NAME

*Max number of values*: 1

*Description*: Dummy. See INIT3B_.

.. _TYPEMB:

TYPEMB
......

TYPEMB [*string*] DM_NAME

*Max number of values*: 1

*Description*: Type of the many-body descriptor to be used. Every many-body
descriptor inherits from :cpp:class:`DM_Base`.

*Example*: 1 ::

    TYPEMB DM_EAD

.. _VERBOSE:

VERBOSE
.......

VERBOSE [*int*] N

*Max number of values*: 1

*Default*: 0

*Description*: Set the verbosity level. For :code:`N>0` it provides detailed
output for diagnostic purposes.

*Example*: 1 ::

    VERBOSE 1

.. _WEIGHTS:

WEIGHTS
.......

WEIGHTS [*double*] N1 N2 N3 ...

*Max number of values*: 2147483647

*Description*: The machine-learned coefficients for the model.

*Example*: 1 ::

    WEIGHTS 0.12 1.2 0.3
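The general rules given at the start of this section (one KEY per line, '#'
comments, values appended on one line or by repeating the KEY, and a per-KEY
maximum) can be illustrated with a short Python sketch. This is not how Ta-dah!
reads its configuration file. The names ``parse_config`` and ``MAX_VALUES`` are
hypothetical, the limits are copied from the entries above for three KEYs only,
and the sketch assumes that everything after a '#' on a line is ignored.

.. code-block:: python

    from typing import Dict, List

    # Hypothetical table of limits, copied from the "Max number of values"
    # lines of three KEYs; it is not Ta-dah!'s internal table.
    MAX_VALUES = {"DBFILE": 2147483647, "RCTYPE2B": 1, "RCUT2B": 1}


    def parse_config(text: str) -> Dict[str, List[str]]:
        """Collect KEY/VALUE pairs, merging repeated KEYs into one list."""
        config: Dict[str, List[str]] = {}
        for raw in text.splitlines():
            # Assumption: '#' starts a comment that runs to the end of the line.
            line = raw.split("#", 1)[0].strip()
            if not line:
                continue
            key, *values = line.split()  # the KEY comes first on the line
            merged = config.setdefault(key, [])
            merged.extend(values)
            limit = MAX_VALUES.get(key)
            if limit is not None and len(merged) > limit:
                # Message shaped after the error shown earlier, for illustration.
                raise ValueError(f"Repeated key {key} {' '.join(values)}")
        return config


    example = """
    # two datasets, given by repeating the KEY
    DBFILE /path/to/dbfile1
    DBFILE /path/to/dbfile2
    RCUT2B 6.7
    """
    config = parse_config(example)
    print(config["DBFILE"])

Both ways of appending values end up in the same list, so
``DBFILE a b`` and two separate ``DBFILE`` lines are treated alike in this sketch.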