API Examples

Example 1 - Training Process and Simple Prediction

This example shows how to perform training and how to predict with a trained model.

To compile with g++ and run:

$ g++ -std=c++17 -O3 ex1.cpp -o ex1.out -ltadah -fopenmp
$ ./ex1.out

Ta-dah! models and descriptors are selected at compile time, but all model parameters are provided in the config file.

Here we use the Blip2B two-body descriptor and train on both energies and virial stresses.
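
In outline, the pattern looks like the sketch below: the type aliases fix the descriptor and model at compile time (the same choices appear in ex1.cpp further down), while all numeric parameters such as cutoffs and weights are read from the config file.

// Minimal sketch, not a complete program; see ex1.cpp below for the full flow.
using D2 = D2_Blip;            // two-body Blip descriptor, chosen at compile time
using M  = M_BLR<BF_Linear>;   // Bayesian linear regression with a linear basis
Config config("config");       // RCUT2B, STRESS, weights, ... read at run time
M model(config);               // model parameters come from the config object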

Files:

  • ex1.cpp Example C++ script for training and prediction.

  • config Config file used for training, contains all model parameters.

  • config_pred List of datasets used for prediction (DBFILE key). Keys FORCE and STRESS control whether forces and stresses are predicted.

  • tdata.db Dataset which we will use for both training and prediction. The dataset was generated using the EAM model for Ta by R. Ravelo. https://journals.aps.org/prb/abstract/10.1103/PhysRevB.88.134101

Functions

int main()

Example 1 C++ file:

#include <tadah/cutoffs/cut_all.h>
#include <tadah/descriptors/d_all.h>
#include <tadah/models/m_all.h>
#include <tadah/basis_functions/bf_all.h>
#include <tadah/kernels/kern_all.h>
#include <tadah/config.h>
#include <tadah/structure.h>
#include <tadah/descriptors_calc.h>
#include <tadah/nn_finder.h>
#include <tadah/output/output.h>
#include <fstream>

/**  @file ex1.cpp
 * This example shows how to perform training and
 * how to predict with a trained model.
 *
 * To compile with `g++` and run:
 *
 * \code{.sh}
 *  $ g++ -std=c++17 -O3 ex1.cpp -o ex1.out -ltadah -fopenmp
 *  $ ./ex1.out
 * \endcode
 *
 * Ta-dah! models and descriptors are selected at compile time, but all model
 * parameters are provided in the config file.
 *
 * Here we use the Blip2B two-body descriptor and train on both energies
 * and virial stresses.
 *
 * Files:
 *
 *   - `ex1.cpp`
 *      Example C++ script for training and prediction.
 *   - `config`
 *      Config file used for training, contains all model parameters.
 *   - `config_pred` 
 *      List of datasets used for prediction (\ref DBFILE key).
 *      Keys \ref FORCE and \ref STRESS control whether forces and stresses
 *      are predicted.
 *   - `tdata.db`
 *      Dataset which we will use for both training and prediction.
 *      The dataset was generated using the EAM model for Ta by R. Ravelo.
 *      https://journals.aps.org/prb/abstract/10.1103/PhysRevB.88.134101
 */
int main() {
    
    std::cout << "TRAINING STAGE" << std::endl;
    // Config file configures almost all model parameters.
    // See below for a more detailed explanation of used key-value(s) pairs.
    Config config("config");

    // First we load all training data from a list
    // of training datasets into a StructureDB object.
    // Paths to datasets are specified with a key DBFILE in a config file.
    std::cout << "StructureDB loading data..." << std::flush;
    StructureDB stdb(config);
    std::cout << "Done!" << std::endl;

    // Next we pass StructureDB object to the nearest neighbour calculator.
    // NNFinder will create full nearest neighbours lists for every atom
    // in every structure. These lists will be stored by individual Structures
    // in a StructureDB object.
    // The lists are calculated up to the max cutoff from the config file:
    // cutoff_max = max(RCUT2B, RCUT3B, RCUTMB).
    std::cout << "Calculating nearest neighbours..." << std::flush;
    NNFinder nnf(config);
    nnf.calc(stdb);
    std::cout << "Done!" << std::endl;

    // STEP 1a: Select descriptors.
    // All three types must be specified.
    // Use a Dummy descriptor if a given type is not required.

    // D2 - TWO-BODY
    //using D2=D2_LJ;
    //using D2=D2_BP;
    using D2=D2_Blip;
    //using D2=D2_Dummy;
    //using D2=D2_EAM;

    // D3 - THREE-BODY
    using D3=D3_Dummy;

    // DM - MANY-BODY
    //using DM=DM_EAM;
    //using DM=DM_EAD;
    using DM=DM_Dummy;

    // STEP 1b: Select cutoffs, C2 for D2, etc
    using C2=Cut_Cos;
    using C3=Cut_Dummy;
    using CM=Cut_Dummy;

    // STEP 1c: Prepare descriptor calculator
    DescriptorsCalc<D2,D3,DM,C2,C3,CM> dc(config);

    // STEP 2a: Select Basis Function (BF) or Kernels (K).
    // BF is used for M_BLR - Bayesian Linear Regression
    // K is used with M_KRR - Kernel Ridge Regression
    // See documentation for more BF and K
    using BF=BF_Linear;
    //using BF=BF_Polynomial2;
    //using K=Kern_Linear;
    //using K=Kern_Quadratic;

    // STEP 2b: Select Model
    using M=M_BLR<BF>;
    //using M=M_KRR<K>;

    // STEP 2c: Instantiate a model
    M model(config);

    std::cout << "TRAINING STAGE..." << std::flush;

    // STEP 3: Training - Option 1.
    // Train with StructureDB only. We have to provide calculators here.
    // Descriptors are calculated in batches to construct a design matrix
    // and then are discarded.
    // This is usually the best choice unless you need descriptors for something else
    // after the training is done.
    model.train(stdb,dc);

    // STEP 3: Training - Option 2.
    // Train with StructureDB and a precalculated StDescriptorsDB.
    //StDescriptorsDB st_desc_db = dc.calc(stdb);
    //model.train(st_desc_db,stdb);
    std::cout << "Done!" << std::endl;

    // STEP 4: Save model to a text file.
    // Once the model is trained we can dump it to a file.
    // Saved models can be used with LAMMPS or can be reloaded
    // to make predictions.
    std::cout << "Saving LAMMPS pot.tadah file..." << std::flush;
    Config param_file = model.get_param_file();
    std::ofstream outfile("pot.tadah");
    outfile << param_file << std::endl;
    outfile.close();
    std::cout << "Done!" << std::endl;

    std::cout << "PREDICTION STAGE..." << std::endl;
    // STEP 1: We will reuse the LAMMPS param file and add to it the
    // DBFILE(s) from the config_pred file.
    // In other words, training datasets go in the config file
    // and validation datasets are listed in config_pred.
    param_file.add("config_pred");

    // STEP 2: Load DBFILE from config_pred
    std::cout << "StructureDB loading data..." << std::flush;
    StructureDB stdb2(param_file);
    std::cout << "Done!" << std::endl;

    // STEP 3: Calculate nearest neighbours
    std::cout << "Calculating nearest neighbours..." << std::flush;
    NNFinder nnf2(param_file);
    nnf2.calc(stdb2);
    std::cout << "Done!" << std::endl;

    // STEP 4: Prepare DescriptorCalc
    DescriptorsCalc<D2,D3,DM,C2,C3,CM> dc2(param_file);

    // STEP 5: Predict. Results are saved to a new StructureDB object
    // - it will only contain predicted values,
    // so there are no atom positions, etc.

    bool err_bool=false;       // predict error, requires LAMBDA -1
    t_type predicted_error;    // container for prediction error
    std::cout << "Predicting..." << std::flush;
    StructureDB stpred = model.predict(param_file,stdb2,dc2);
    //StructureDB stpred = model.predict(param_file,stdb2,dc2,predicted_error);
    std::cout << "Done!" << std::endl;

    std::cout << "Dumping results to disk..." << std::flush;
    Output output(param_file,err_bool);
    output.print_predict_all(stdb2,stpred,predicted_error);
    std::cout << "Done!" << std::endl;

    return 0;
}
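
The commented-out predict call above also fills a container with prediction error estimates; as the inline comments note, this requires training with LAMBDA -1. A minimal sketch of that variant, reusing the objects defined in ex1.cpp and assuming the model was trained with LAMBDA set to -1:

// Sketch only: assumes LAMBDA -1 was used for training (see the comments in ex1.cpp).
bool err_bool = true;                  // error-prediction flag passed to Output
t_type predicted_error;                // filled by the predict overload below
StructureDB stpred = model.predict(param_file, stdb2, dc2, predicted_error);
Output output(param_file, err_bool);
output.print_predict_all(stdb2, stpred, predicted_error);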

Config file used for training:

# For description of KEYS and corresponding values  see Config documentation:
# https://ta-dah.readthedocs.io/en/latest/config.html

DBFILE tdata.db

INIT2B true
INIT3B false
INITMB false

RCUT2B 5.3

FORCE false
STRESS true

SGRID2B -2 4 0.1 1.0
CGRID2B -1 4 1.0 5.3

LAMBDA 0
BIAS true
NORM true
VERBOSE 1

EWEIGHT 1.0
#FWEIGHT 1e-2
#SWEIGHT 1e-3

Config file used for prediction:

DBFILE tdata.db
FORCE true
STRESS true

Example 2 - Prediction using existing model

This example shows how to predict with a trained model. An example model is provided in the pot.tadah file.

To compile with g++ and run:

$ g++ -std=c++17 -O3 ex2.cpp -o ex2.out -ltadah -fopenmp
$ ./ex2.out

Ta-dah! models and descriptors are selected at compile time, but all model parameters are provided in the pot.tadah file. The model, cutoffs and descriptors in the ex2.cpp file must match those in the pot.tadah file. See the code comments below for more detail.
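
The correspondence between pot.tadah keys and the compile-time type aliases is mechanical; a sketch of the mapping used by ex2.cpp below (keys that are absent fall back to the Dummy types):

// Key-to-type correspondence (sketch; see ex2.cpp below for the full version).
// pot.tadah: TYPE2B    D2_Blip              ->  using D2 = D2_Blip;
// pot.tadah: RCTYPE2B  Cut_Cos              ->  using C2 = Cut_Cos;
// pot.tadah: MODEL     M_KRR  Kern_Linear   ->  using K = Kern_Linear; using M = M_KRR<K>;
// no TYPE3B / TYPEMB keywords               ->  using D3 = D3_Dummy; using DM = DM_Dummy;
// no RCTYPE3B / RCTYPEMB keywords           ->  using C3 = Cut_Dummy; using CM = Cut_Dummy;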

Files:

  • ex2.cpp Example C++ script for prediction using an already available model.

  • config_pred List of datasets used for prediction (DBFILE key). Keys FORCE and STRESS control whether forces and stresses are predicted.

  • tdata.db Dataset which we will use for prediction. The dataset was generated using the EAM model for Ta by R. Ravelo. https://journals.aps.org/prb/abstract/10.1103/PhysRevB.88.134101

Functions

int main()

Example 2 C++ file:

#include <tadah/cutoffs/cut_all.h>
#include <tadah/descriptors/d_all.h>
#include <tadah/models/m_all.h>
#include <tadah/descriptors_calc.h>
#include <tadah/config.h>
#include <tadah/structure.h>
#include <tadah/nn_finder.h>
#include <tadah/output/output.h>
#include <fstream>

/**  @file ex2.cpp
 * This example shows how to predict with a trained model.
 * An example model is provided in the `pot.tadah` file.
 *
 * To compile with `g++` and run:
 *
 * \code{.sh}
 *  $ g++ -std=c++17 -O3 ex2.cpp -o ex2.out -ltadah -fopenmp
 *  $ ./ex2.out
 * \endcode
 *
 * Ta-dah! models and descriptors are selected at compile time, but all model
 * parameters are provided in the `pot.tadah` file. The model, cutoffs and descriptors
 * in the `ex2.cpp` file must match those in the `pot.tadah` file.
 * See the code comments below for more detail.
 *
 * Files:
 *
 *   - `ex2.cpp`
 *      Example C++ script for prediction using an already available model.
 *   - `config_pred` 
 *      List of datasets used for prediction (\ref DBFILE key).
 *      Keys \ref FORCE and \ref STRESS control whether forces and stresses
 *      are predicted.
 *   - `tdata.db`
 *      Dataset which we will use for prediction.
 *      The dataset was generated using the EAM model for Ta by R. Ravelo.
 *      https://journals.aps.org/prb/abstract/10.1103/PhysRevB.88.134101
 */
int main() {
    
    // STEP 0: Load the model saved in `pot.tadah` as a Config object.
    Config param_file("pot.tadah");

    // STEP 1a: Select descriptors. All three types must be specified.
    // Use Dummy if given type is not required.
    // Look for the keywords `TYPE2B`, `TYPE3B` and `TYPEMB` in `pot.tadah`.
    // If a keyword is not listed, use the corresponding Dummy descriptor.

    // D2 - TWO-BODY
    // `pot.tadah`: TYPE2B      D2_Blip
    using D2=D2_Blip;

    // D3 - THREE-BODY
    // `pot.tadah` no keyword
    using D3=D3_Dummy;

    // DM - MANY-BODY
    // `pot.tadah` no keyword
    using DM=DM_Dummy;

    // STEP 1b: Select cutoffs for descriptors, C2 for D2, etc.
    // Look for the keywords `RCTYPE2B`, `RCTYPE3B` and `RCTYPEMB` in `pot.tadah`.
    // If a keyword is not listed, use `Cut_Dummy`.
    // `pot.tadah`: RCTYPE2B      Cut_Cos
    using C2=Cut_Cos;
    // `pot.tadah`: no keywords for three-body and many-body
    using C3=Cut_Dummy;
    using CM=Cut_Dummy;

    // STEP 2a: Select Basis Function (BF) or Kernels (K).
    // BF is used for M_BLR - Bayesian Linear Regression
    // K is used with M_KRR - Kernel Ridge Regression
    // KEYWORD `MODEL`: the first argument is the model, the second is the BF/Kernel
    // `pot.tadah`: MODEL     M_KRR     Kern_Linear
    using K=Kern_Linear;

    // STEP 2b: Select Model and instantiate object.
    // `pot.tadah`: MODEL     M_KRR     Kern_Linear
    using M=M_KRR<K>;
    M model(param_file);

    std::cout << "PREDICTION STAGE" << std::endl;
    // We will reuse the param_file Config object and add to it the
    // DBFILE(s) from the config_pred file.
    // config_pred contains the prediction datasets and the FORCE/STRESS keys.
    // As we are reusing the existing config, we have
    // to remove the FORCE and STRESS keys first.
    param_file.remove("FORCE");
    param_file.remove("STRESS");
    param_file.add("config_pred");

    // Load DBFILE from config_pred
    std::cout << "StructureDB loading data..." << std::flush;
    StructureDB stdb(param_file);
    std::cout << "Done!" << std::endl;

    // Calculate nearest neighbours
    std::cout << "Calculating nearest neighbours..." << std::flush;
    NNFinder nnf2(param_file);
    nnf2.calc(stdb);
    std::cout << "Done!" << std::endl;

    // Prepare the descriptor calculator. Optionally, descriptors can be
    // precalculated up front and stored in a StDescriptorsDB (commented out below).
    //std::cout << "Calculating descriptors..." << std::flush;
    DescriptorsCalc<D2,D3,DM,C2,C3,CM> dc2(param_file);
    //StDescriptorsDB st_desc_db = dc2.calc(stdb);
    //std::cout << "Done!" << std::endl;

    // Open file streams for energy, force and stress predictions
    // (not used further in this example).
    std::ofstream out_force("forces.pred");
    std::ofstream out_energy("energy.pred");
    std::ofstream out_stress("stress.pred");

    // Predict energies (and forces/stresses if FORCE/STRESS are true). The result
    // is saved to a new StructureDB object - it will only contain predicted values,
    // so there are no atom positions, etc.
    //t_type pred_err;    // TODO dump it ...
    ////StructureDB stpred = model.predict(param_file, stdb,dc2,pred_err);
    //std::cout << "Predicting..." << std::flush;
    //StructureDB stpred = model.predict(param_file,st_desc_db, stdb);
    //std::cout << "Done!" << std::endl;

    bool err_bool=false;       // predict error, requires LAMBDA -1
    t_type predicted_error;    // container for prediction error
    std::cout << "Predicting..." << std::flush;
    StructureDB stpred = model.predict(param_file,stdb,dc2);
    //StructureDB stpred = model.predict(param_file,stdb,dc2,predicted_error);
    std::cout << "Done!" << std::endl;

    std::cout << "Dumping results to disk..." << std::flush;
    Output output(param_file,err_bool);
    output.print_predict_all(stdb,stpred,predicted_error);
    std::cout << "Done!" << std::endl;

    return 0;
}

Trained model used for prediction:

ALPHA      1.0
BETA      1.0
BIAS      true
CGRID2B     -1     4     1.0     5.3
CHECKPRESS      false
DIMER     false     0     false
EWEIGHT      1.0
FWEIGHT      1.0
INIT2B      true
INIT3B      false
INITMB      false
LAMBDA      0
MODEL     M_BLR     BF_Linear
NMEAN     1     194.0239012     198.413198     167.8651188     8.242704556
NORM      true
NSTDEV     0     2.79425471     2.914508228     1.43784523     0.3225202259
OUTPREC      10
RCTYPE2B      Cut_Cos
RCUT2B      5.3
SGRID2B     -2     4     0.1     1.0
SWEIGHT      1.0
TYPE2B      D2_Blip
WEIGHTS     -4.415741551     0.4218953511     -0.3750771288     -0.06328812171     -0.05230011264

Config file used for prediction:

DBFILE tdata.db
FORCE true
STRESS true