NEP Dataset Display
1.User Interface
As shown in the figure, the overall user interface of the software primarily includes the toolbar, result visualization area, structure display area, information display area, and working path display area.

2.Data Transmission Area
Data Import
Users can import files in the following two ways:
Click the import button at the top left of the menu and select the file path to import.
Drag and drop the file directly into the software interface for import.
Important
The software automatically detects the type of the nep.txt file and imports it, including the norm, dipole, and polarizability types.
Currently, the software supports:
train.xyz and matching *.out files
nep.txt (optional; falls back to NEP89 if absent) + train.xyz
DeepMD training directory (auto-detected)
Data Export
After completing the operation, the user can click the save button to export the results as two files:
export_remove_model.xyz: Contains information about the deleted structures.
export_good_model.xyz: Contains information about the remaining structures.
In the export menu, you can click “Export Selected Structures” to export the currently selected structures.
3.Toolbar
Results Toolbar (with icons)
-
Input: index expression; supports Python-like slices 1:10, :100, ::3 and lists 1, 5, 9.
Toggle: “Use original indices” to interpret the expression in original file order; otherwise the expression uses current view indices mapped back to the originals.
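The index expression can be illustrated with a minimal sketch. It assumes that commas separate the individual slices and indices and that slices follow Python semantics; the function names `parse_index_expression` and `resolve_indices` are hypothetical and are not taken from the software.

```python
def parse_index_expression(expr, n):
    """Expand an expression such as "1:10, ::3, 5" into sorted unique indices in range(n)."""
    selected = set()
    for token in expr.split(","):
        token = token.strip()
        if not token:
            continue
        if ":" in token:
            parts = token.split(":")
            if len(parts) > 3:
                raise ValueError(f"invalid slice: {token!r}")
            # Empty fields become None, exactly as in a Python slice.
            bounds = [int(p) if p.strip() else None for p in parts]
            selected.update(range(n)[slice(*bounds)])
        else:
            index = int(token)
            if index < 0:
                index += n  # allow Python-style negative indices
            if 0 <= index < n:
                selected.add(index)
    return sorted(selected)


def resolve_indices(expr, view_to_original, n_original, use_original_indices):
    """Return original-file indices for an expression (assumed toggle behavior)."""
    if use_original_indices:
        return parse_index_expression(expr, n_original)
    view = parse_index_expression(expr, len(view_to_original))
    # Map positions in the current view back to positions in the original file.
    return [view_to_original[i] for i in view]
```

For example, if the current view holds the original frames 4, 8, 15, 16, 23, the expression `:2` resolves to frames 4 and 8 in view mode and to frames 0 and 1 when “Use original indices” is enabled.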
-
Input: x_min, x_max, y_min, y_max, and logic AND/OR to combine masks.
Uses the currently active subplot (e.g., energy, force, descriptor projection).
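How the two axis ranges might combine can be sketched as follows. This is an illustration only: the function name `range_mask` is hypothetical, and treating the bounds as inclusive is an assumption.

```python
def range_mask(xs, ys, x_min, x_max, y_min, y_max, logic="AND"):
    """Return one boolean per point, combining the x-range and y-range tests."""
    logic = logic.upper()
    if logic not in ("AND", "OR"):
        raise ValueError("logic must be 'AND' or 'OR'")
    mask = []
    for x, y in zip(xs, ys):
        in_x = x_min <= x <= x_max  # inclusive bounds (assumption)
        in_y = y_min <= y <= y_max
        mask.append((in_x and in_y) if logic == "AND" else (in_x or in_y))
    return mask
```

With AND, a point must fall inside both ranges; with OR, falling inside either range is enough, so OR always selects at least as many points as AND.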
-
Input: N (count of structures).
Behavior: on the active subplot, computes per-structure error (sum of absolute residuals) and selects the top-N unique structures.
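The ranking described here can be sketched in a few lines. The sketch assumes each plotted point carries the id of the structure it belongs to; the function name `top_n_by_error` and its arguments are hypothetical.

```python
from collections import defaultdict


def top_n_by_error(reference, predicted, structure_ids, n):
    """Return the ids of the n structures with the largest summed absolute residual."""
    totals = defaultdict(float)
    for ref, pred, sid in zip(reference, predicted, structure_ids):
        totals[sid] += abs(ref - pred)
    # Each structure id appears once in `totals`, so the result is unique by construction.
    ranked = sorted(totals, key=lambda sid: totals[sid], reverse=True)
    return ranked[:n]
```

Summing per structure means that a structure with many moderately wrong points (for example, many force components) can outrank one with a single large outlier.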
-
Input: Max num, Min distance (in descriptor space, unitless).
Behavior: computes descriptors, then runs farthest-point sampling; selects retained structures.
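A minimal farthest-point sampling sketch shows how the two inputs could interact. It assumes Euclidean distance, starts from the first structure, and stops once either limit is reached; these choices and the function name are assumptions, not the software's confirmed behavior.

```python
import math


def farthest_point_sampling(points, max_num, min_distance):
    """Greedily pick indices that are mutually far apart in descriptor space."""
    if not points or max_num <= 0:
        return []
    selected = [0]  # starting point is arbitrary (assumption)
    # Distance from every point to its nearest already-selected point.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(selected) < max_num:
        candidate = max(range(len(points)), key=nearest.__getitem__)
        # Stop when nothing new is left or the best candidate is too close.
        if nearest[candidate] <= 0.0 or nearest[candidate] < min_distance:
            break
        selected.append(candidate)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[candidate]))
    return selected
```

A larger Min distance ends the sampling earlier and returns fewer, more diverse structures; Max num caps the result regardless of distance.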
Mouse Selection: Toggle selection mode; left-drag to select, right-click to deselect.
-
Parameter: uses Settings → “Covalent radius coefficient” to detect too-short bonds.
Behavior: scans all visible structures; selects those below threshold.
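One plausible reading of the threshold is that a pair of atoms is too close when its distance falls below the coefficient times the sum of the two covalent radii. The sketch below follows that assumption, ignores periodic boundaries, and uses a small illustrative radius table; `has_short_bond` is a hypothetical name.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstrom for a few elements (illustrative subset).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}


def has_short_bond(symbols, positions, coefficient):
    """Return True if any atom pair is closer than coefficient * (r_i + r_j)."""
    atoms = zip(symbols, positions)
    for (sym_i, pos_i), (sym_j, pos_j) in combinations(atoms, 2):
        threshold = coefficient * (COVALENT_RADII[sym_i] + COVALENT_RADII[sym_j])
        if math.dist(pos_i, pos_j) < threshold:
            return True
    return False
```

With a coefficient of 0.7, an O-H pair is flagged below roughly 0.68 angstrom, so a normal water molecule passes while a strongly compressed one is selected.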
Delete Selected Items: Remove selected structures from the current set.
-
Input: add key/value pairs or remove existing keys on all selected structures.
Values: numbers or JSON (e.g., arrays) are parsed automatically; otherwise treated as text.
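The value handling can be approximated by trying a JSON parse first and falling back to plain text. This is a sketch of that idea rather than the software's actual parser; `parse_tag_value` is a hypothetical name.

```python
import json


def parse_tag_value(raw):
    """Interpret a user-entered value as a number or JSON if possible, else as text."""
    try:
        return json.loads(raw.strip())
    except json.JSONDecodeError:
        return raw  # not valid JSON, keep the original string
```

Under this scheme `42` becomes an integer, `[1, 2, 3]` becomes a list, and `relaxed` stays a string. Note that JSON literals such as `true` would also be converted, which may or may not match the software's behavior.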
-
Output: TSV with selected descriptors; when available, appends the reference energy column.
-
Inputs: group regex patterns (; or , separated), Max generations, Population size, Convergence tol, Alignment mode.
Modes:
REF_GROUP: shift each regex group toward the mean energy of the currently selected reference structures.
ZERO_BASELINE: shift each group toward zero energy.
DFT_TO_NEP: shift toward NEP energies (requires loaded NEP energy array).
-
Inputs: Functional (e.g., pbe, scan, b3lyp), D3 cutoff, D3 cutoff_cn, Mode.
Modes:
NEP Only: overwrite with NEP predictions.
DFT-D3 only: overwrite with D3 corrections (diagnostic mode).
NEP with DFT-D3: overwrite with NEP + D3.
Add DFT-D3 / Subtract DFT-D3: add/subtract D3 corrections to current values.
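The five modes above amount to simple element-wise arithmetic on three arrays. The sketch below spells this out under the assumption that the current values, the NEP predictions, and the D3 corrections are already available as equal-length sequences; `apply_d3_mode` is a hypothetical helper, and computing the D3 correction itself is outside its scope.

```python
def apply_d3_mode(mode, current, nep, d3):
    """Return the new values for one of the five modes listed above."""
    if mode == "NEP Only":
        return list(nep)
    if mode == "DFT-D3 only":
        return list(d3)
    if mode == "NEP with DFT-D3":
        return [n + d for n, d in zip(nep, d3)]
    if mode == "Add DFT-D3":
        return [c + d for c, d in zip(current, d3)]
    if mode == "Subtract DFT-D3":
        return [c - d for c, d in zip(current, d3)]
    raise ValueError(f"unknown mode: {mode!r}")
```

The first three modes discard the current values, whereas Add and Subtract modify them in place, so applying Add followed by Subtract restores the original data.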
Structure Toolbar (with icons)
Search and Select (with icons)
Important
After clicking Search, matched structures turn green but are not selected. Use Select/Deselect to change selection state.
Search by formula: toggle the small “formula” checkbox next to the search box to switch between searching by Config_type (default) and by chemical formula. Autocompletion updates to your available formulas.
Important
If no Config_type is available, clicking the Select button selects all visible items, and clicking the Deselect button deselects all items.
4.Result Visualization and Structure Display
The result visualization area consists of five subplots, displaying the descriptors, energy, force, pressure, and potential energy information of the dataset. We use the pyqtgraph library to encapsulate the plotting functions, and all five subplots support switching to the main plot by double-clicking.
By clicking on a data point in the main plot, the corresponding crystal structure will be displayed in the right-side display area. The atom sizes and colors in the crystal structure are set based on the atomic radius and the CPK color scheme, respectively.
Plotting Details: During the plotting process, energy, force, pressure, and potential energy data are all read from the NEP output files in the working path. For descriptor projection, we use NEP_CPU to obtain the descriptor of each atom and compute its average as the structure descriptor. Then, principal component analysis (PCA) is used to project the structure descriptors into a two-dimensional space for easier visualization.
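The descriptor projection can be sketched with NumPy. The sketch assumes the per-atom descriptors have already been obtained (the NEP_CPU call is not shown) and are stored as one row per atom, with a separate list of atom counts per structure; the function names are hypothetical, and PCA is done here via a plain SVD rather than any particular library routine.

```python
import numpy as np


def structure_descriptors(per_atom, atoms_per_structure):
    """Average per-atom descriptor rows into one descriptor row per structure."""
    per_atom = np.asarray(per_atom, dtype=float)
    # Row offsets at which one structure ends and the next begins.
    boundaries = np.cumsum(atoms_per_structure)[:-1]
    blocks = np.split(per_atom, boundaries)
    return np.vstack([block.mean(axis=0) for block in blocks])


def pca_2d(descriptors):
    """Project descriptors onto their first two principal components."""
    centered = descriptors - descriptors.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T
```

Each row of the result gives the (x, y) position of one structure in the descriptor subplot; structures with similar local environments end up close together.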
5.Information Display Area
In the information display area on the right, the system will show detailed information about the selected xyz file. By default, when you click on any data point in the subplots, the display area on the right will synchronize and show the detailed information of the selected structure.
Below the display area, the frame number of the current structure in the original file will be shown. Users can adjust the frame number to view the corresponding structure and its detailed information.
Auto playback
Use the play/pause button near the index to automatically step through currently visible/selected structures.
Playback respects your current selection and stops at the end of the range.
Appendix: Toolbar Reference
The display includes two toolbars: the Results toolbar (left panel) and the Structure toolbar (right panel). Tools act on the active/main plot unless otherwise noted.
