Data Management

Use Data Management to record projects, model versions, data paths, and notes. It neither generates structures nor trains models. Its purpose is to preserve provenance when several DFT rounds, nep.txt files, candidate pools, and training results accumulate.

When to use it

Start recording versions when any of the following applies:

The same material system has gone through several active-learning iterations.
Each iteration has its own candidate structures, cleaned structures, DFT results, trained model, and test results.
You need to compare the errors, notes, and applicability of different nep.txt files.
You often have to rediscover which directory contains a previously successful model.

A quick one-off card experiment does not necessarily need a project record.

Recommended record structure

Create a Project for a material system or research topic, then record each model iteration as a Model(version).

A typical record looks like this:

::

Record at least the following for each version:

The model or results directory.
The source of the associated training set.
A short note describing which structures were added, which anomalies were removed, and where the model is intended to be used.
Tags such as surface, defect, magnetic, failed, or baseline.
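
As an illustration only, the minimum fields listed above could be modelled as follows. This is a sketch, not the application's actual schema: the class names, field names, and the `find_by_tag` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelVersion:
    """One model iteration. Field names are hypothetical, not the app's schema."""
    name: str
    model_dir: str             # the model or results directory
    training_set_source: str   # where the associated training set came from
    note: str = ""             # structures added, anomalies removed, intended use
    tags: List[str] = field(default_factory=list)  # e.g. "surface", "baseline"


@dataclass
class Project:
    """A material system or research topic that groups model versions."""
    name: str
    versions: List[ModelVersion] = field(default_factory=list)

    def find_by_tag(self, tag: str) -> List[ModelVersion]:
        """Return every version carrying the given tag (hypothetical helper)."""
        return [v for v in self.versions if tag in v.tags]
```

Keeping the note and tags on the version rather than on the project lets failed and baseline iterations of the same material system sit side by side and still be told apart.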

Place in the main workflow

Use Data Management at the end of a meaningful iteration:

::

This keeps records at meaningful version boundaries instead of creating an entry for every temporary file.

Common operations

Create, edit, or delete a Project.
Create, edit, or delete a Model(version).
Open a local directory or URL.
Press Ctrl+F to search projects, tags, notes, or paths.

After you select a NEP results directory, the application attempts to calculate training RMSE from energy_train.out, force_train.out, and virial_train.out. If an output is missing or its columns are incomplete, the dialog identifies that file and preserves the current manual value instead of treating the missing result as zero error.
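
The fallback behavior described above can be sketched as follows. This is not the application's code. It assumes that each output file stores predictions in the first half of its columns and reference values in the second half, with 2, 6, and 12 columns for energy, force, and virial respectively; check these counts against your NEP version. The function names and the `manual` dictionary are hypothetical.

```python
import math
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Assumed column counts per file: predictions first, references second.
EXPECTED_COLUMNS = {
    "energy_train.out": 2,
    "force_train.out": 6,
    "virial_train.out": 12,
}


def rmse_from_out(path: Path, n_columns: int) -> Optional[float]:
    """Return the RMSE, or None if the file is missing or incomplete."""
    if not path.is_file():
        return None
    half = n_columns // 2
    total, count = 0.0, 0
    with path.open() as fh:
        for line in fh:
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            if len(parts) < n_columns:
                return None  # incomplete columns: do not guess
            try:
                values = [float(x) for x in parts[:n_columns]]
            except ValueError:
                return None  # non-numeric content counts as incomplete
            for pred, ref in zip(values[:half], values[half:]):
                total += (pred - ref) ** 2
                count += 1
    return math.sqrt(total / count) if count else None


def update_rmse(results_dir, manual: Dict[str, float]) -> Tuple[Dict[str, float], List[str]]:
    """Overwrite manual values only where a file yields a valid RMSE."""
    updated = dict(manual)
    problems = []  # files to report back to the user
    for filename, n_columns in EXPECTED_COLUMNS.items():
        value = rmse_from_out(Path(results_dir) / filename, n_columns)
        if value is None:
            problems.append(filename)  # keep the manual value as-is
        else:
            updated[filename] = value
    return updated, problems
```

Returning `None` rather than `0.0` for an unusable file is the point of the design: a missing result never masquerades as a perfect fit.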

Storage location

The local database is stored in the user configuration directory by default:

Windows: C:\Users\<You>\AppData\Local\NepTrainKit\mlpman.db
Linux: ~/.config/NepTrainKit/mlpman.db

The database stores management metadata only; it does not copy large training datasets. If a training directory moves, update its path in the corresponding record.
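
To locate the database for a backup, the default paths listed above can be resolved with a small helper. This is a sketch under assumptions: `default_db_path` is a hypothetical function, the use of the `LOCALAPPDATA` environment variable on Windows is a guess at how the location is derived, and every non-Windows platform is treated like Linux.

```python
import os
import sys
from pathlib import Path


def default_db_path(platform: str = sys.platform, env=os.environ, home=None) -> Path:
    """Return the documented default location of mlpman.db (hypothetical helper)."""
    home = Path(home) if home is not None else Path.home()
    if platform.startswith("win"):
        # Assumption: LOCALAPPDATA points at C:\Users\<You>\AppData\Local.
        local = env.get("LOCALAPPDATA")
        base = Path(local) if local else home / "AppData" / "Local"
    else:
        # Assumption: other platforms follow the documented Linux layout.
        base = home / ".config"
    return base / "NepTrainKit" / "mlpman.db"


if __name__ == "__main__":
    path = default_db_path()
    print(path, "(exists)" if path.exists() else "(not found)")
```

Because the database holds metadata only, copying this single file is enough to back up the records; the training directories it points to must be backed up separately.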