API Reference

The API pages below are generated by automodule. Source docstrings are shown in their original English.

High-level data structures and helpers for NEP datasets.

class NepTrainKit.core.structure.Structure(lattice, atomic_properties, properties, additional_fields)

Bases: object

Container for EXTXYZ frames (lattice, positions, species, and fields).

Notes

Coordinates are stored in Cartesian Angstroms under pos.
Frame-level attributes like energy, pbc, and virial live in additional_fields.

Examples

>>> from NepTrainKit.core.structure import Structure
>>> # Read structures lazily by iterating over the file
>>> for structure in Structure.iter_read_multiple(filename="train.xyz"):
...     print(structure)
>>> # Read all structures into a list
>>> structure_list = Structure.read_multiple(filename="train.xyz")

Parameters:

lattice (list[float] | npt.NDArray[np.float64])
atomic_properties (dict[str, npt.NDArray[Any]])
properties (list[dict[str, str]])
additional_fields (dict[str, Any])

property tag: str: Alias for the Config_type additional field.

get_prop_key(additional_fields=True, atomic_properties=True)

List all property keys available on the structure.

Parameters:

additional_fields (bool, default=True) – Include keys from additional_fields.
atomic_properties (bool, default=True) – Include keys from atomic_properties.

Returns:

Combined keys such as ["pos", "energy", ...].

Return type:

list of str

remove_atomic_properties(key)

Remove an atomic array property.

Parameters:: key (str) – Name of the property to delete.

classmethod read_xyz(filename)

Read a single EXTXYZ structure from a file path.

Parameters:: filename (str) – Path to an .xyz file containing a single frame.
Returns:: Parsed structure instance.
Return type:: Structure

static iter_read_multiple(filename, cancel_event=None)

Iterate frames in a multi-structure EXTXYZ file.

Parameters:

filename (str) – Path to a multi-frame .xyz file.
cancel_event (threading.Event or None, optional) – If provided, iteration stops early once the event is set.

Yields:

Structure – Parsed structures one by one.

Examples

>>> from NepTrainKit.core.structure import Structure
>>> for structure in Structure.iter_read_multiple(filename="train.xyz"):
...     print(structure)
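The cancel_event parameter follows the usual cooperative-cancellation pattern. The sketch below illustrates that pattern with a hypothetical stand-in generator, iter_frames, which is not part of NepTrainKit; the real reader may check the event at different points.

```python
import threading


def iter_frames(frames, cancel_event=None):
    """Hypothetical stand-in for a frame iterator; not NepTrainKit code."""
    for frame in frames:
        # Stop early once the caller has set the event.
        if cancel_event is not None and cancel_event.is_set():
            return
        yield frame


cancel = threading.Event()
seen = []
for frame in iter_frames(["frame0", "frame1", "frame2"], cancel_event=cancel):
    seen.append(frame)
    if frame == "frame1":
        cancel.set()  # request cancellation; the generator stops before frame2

print(seen)  # ['frame0', 'frame1']
```

In a GUI, the event would typically be set from another thread while a worker thread consumes the iterator.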

property cell

Simulation cell lattice vectors.

Returns:: Row-wise lattice vectors [a, b, c].
Return type:: ndarray, shape (3, 3)

property volume

Cell volume.

Returns:: Volume of the simulation cell.
Return type:: float

property abc

Lattice vector lengths (a, b, c).

Returns:: Lengths of the three lattice vectors in Å.
Return type:: ndarray, shape (3,), dtype float

property angles

Lattice angles (alpha, beta, gamma) in degrees.

Returns:: Angles α, β, γ in degrees.
Return type:: ndarray, shape (3,), dtype float
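The cell, volume, abc, and angles properties are all derived from the same row-wise lattice vectors. As a rough illustration of the underlying geometry (a standard-library sketch, not the library's implementation; the helper name cell_parameters is hypothetical):

```python
import math


def cell_parameters(cell):
    """Lengths, angles in degrees, and volume from row-wise lattice vectors."""
    a, b, c = cell

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def cross(u, v):
        return (
            u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0],
        )

    lengths = [math.sqrt(dot(v, v)) for v in (a, b, c)]
    # alpha is the angle between b and c, beta between a and c, gamma between a and b.
    pairs = ((b, c, 1, 2), (a, c, 0, 2), (a, b, 0, 1))
    angles = [
        math.degrees(math.acos(dot(u, v) / (lengths[i] * lengths[j])))
        for u, v, i, j in pairs
    ]
    # The volume is the absolute value of the scalar triple product a . (b x c).
    volume = abs(dot(a, cross(b, c)))
    return lengths, angles, volume


lengths, angles, volume = cell_parameters(
    [[4.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 6.0]]
)
print(lengths, angles, volume)
```

For this orthorhombic cell the lengths are 4, 5, and 6 Å, all angles are 90 degrees, and the volume is 120 Å³.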

property numbers

Atomic numbers of all atoms in the cell.

Returns:: List of atomic numbers in the same order as elements.
Return type:: list[int]

property spin_num: int

Number of atoms with non-zero magnetic moment.

Returns:: Count of atoms whose force_mag entry is not [0, 0, 0]. Returns 0 if force_mag is absent.
Return type:: int

property formula

Chemical formula string (plain text).

Returns:: Plain-text formula such as H2O or Fe3O4, without subscripts.
Return type:: str

property html_formula: str

Chemical formula string with HTML subscripts.

Returns:: Formula like H<sub>2</sub>O for direct HTML rendering.
Return type:: str
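The relationship between formula and html_formula can be pictured as a simple text transformation. The helper below, to_html_formula, is a hypothetical illustration that wraps each run of digits in sub tags; how NepTrainKit actually builds the string is not documented here.

```python
import re


def to_html_formula(formula):
    """Wrap every run of digits in <sub> tags (illustrative helper)."""
    return re.sub(r"(\d+)", r"<sub>\1</sub>", formula)


print(to_html_formula("H2O"))    # H<sub>2</sub>O
print(to_html_formula("Fe3O4"))  # Fe<sub>3</sub>O<sub>4</sub>
```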

property per_atom_energy

Energy per atom.

Returns:: Total energy divided by the number of atoms (same units as energy).
Return type:: float

property energy

Total energy of the structure.

Returns:: Value stored in additional_fields['energy'].
Return type:: float

property has_energy

Check if energy data is available.

Returns:: True if additional_fields contains energy.
Return type:: bool

property forces

Per-atom force array.

Returns:: Forces in eV/Å for each atom.
Return type:: ndarray, shape (N, 3), dtype float64

property has_forces

Check if force data is available.

Returns:: True if atomic_properties contains self.force_label.
Return type:: bool

property bec: NDArray[float64]: Per-atom Born effective charge tensor as (N, 9).

property has_bec: bool: Return True when BEC data is present.

property has_virial

Check if virial or stress data is available.

Returns:: True if additional_fields contains virial or stress.
Return type:: bool

property virial

Virial vector (flattened).

If only stress is present, convert via \(\mathrm{virial} = -\mathrm{stress} \times V\).

Returns:: Flattened virial in eV; ordering: [xx, xy, xz, yx, yy, yz, zx, zy, zz].
Return type:: ndarray, shape (9,), dtype float
Raises:: ValueError – If neither virial nor stress is available.
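The stress-to-virial conversion above is a per-component scaling. A minimal sketch, assuming the stress is given as nine flattened components in eV/Å³ and the volume in Å³ (the helper name virial_from_stress is hypothetical):

```python
def virial_from_stress(stress, volume):
    """Apply virial = -stress * V to a flattened 9-component stress."""
    return [-component * volume for component in stress]


# Diagonal stress in eV/Å^3 for a 100 Å^3 cell (illustrative numbers).
stress = [0.01, 0.0, 0.0, 0.0, 0.02, 0.0, 0.0, 0.0, 0.03]
virial = virial_from_stress(stress, 100.0)
print(virial)
```

The diagonal entries come out as approximately -1, -2, and -3 eV, with the component ordering unchanged.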

property nep_virial

Virial in NEP 6-component order per atom.

Returns:: [xx, yy, zz, xy, yz, zx] components in eV/atom.
Return type:: ndarray, shape (6,), dtype float
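Going from the flattened 9-component virial to the NEP 6-component order is a matter of picking components and dividing by the atom count. A minimal sketch, assuming the 9-component ordering [xx, xy, xz, yx, yy, yz, zx, zy, zz] listed for virial (the helper name to_nep_virial is hypothetical):

```python
def to_nep_virial(virial9, num_atoms):
    """Reorder a flattened 3x3 virial to [xx, yy, zz, xy, yz, zx] per atom."""
    xx, xy, xz, yx, yy, yz, zx, zy, zz = virial9
    return [value / num_atoms for value in (xx, yy, zz, xy, yz, zx)]


# Use 1..9 as placeholder values so the reordering is easy to follow.
print(to_nep_virial([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], num_atoms=2))
# [0.5, 2.5, 4.5, 1.0, 3.0, 3.5]
```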

property nep_dipole

Dipole moment per atom in NEP format.

Returns:: Dipole vector in e·Å/atom, parsed from additional_fields['dipole'].
Return type:: ndarray, shape (3,), dtype float64

property nep_polarizability

Polarizability tensor per atom in NEP 6-component order.

Returns:: [xx, yy, zz, xy, yz, zx] components in Å³/atom.
Return type:: ndarray, shape (6,), dtype float64

get_chemical_symbols()

Return chemical symbols for all atoms.

Returns:: Same as elements.
Return type:: list[str]

property elements

Chemical symbols of all atoms.

Returns:: Symbol for each atom.
Return type:: ndarray, shape (N,), dtype str

property positions

Cartesian coordinates of all atoms.

Returns:: Positions in Å.
Return type:: ndarray, shape (N, 3), dtype float

property num_atoms

Number of atoms in the structure.

Return type:: int

copy()

Return a deep copy of the structure and all its arrays.

Returns:: Independent duplicate of the current instance.
Return type:: Structure

set_lattice(new_lattice, in_place=False)

Scale positions to a new lattice and update pos.

Parameters:

new_lattice (numpy.ndarray) – New lattice matrix in Angstroms with shape (3, 3).
in_place (bool, default=False) – If True, modify this object; otherwise return a copy.

Returns:

Updated structure (self if in_place=True).

Return type:

Structure

supercell(scale_factor, order='atom-major', tol=1e-05)

Return type:: Structure

adjust_reasonable(coefficient=0.7)

Check whether the structure is physically reasonable based on covalent radii.

For each pair of nearest-neighbour atoms, the actual bond length is compared with coefficient * (R_cov1 + R_cov2). If any bond is shorter than this threshold, the structure is considered non-physical.

Parameters:: coefficient (float, optional) – Scaling factor for the sum of covalent radii. Default is 0.7.
Returns:: True if the structure passes the check, False if any bond is unphysically short.
Return type:: bool
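The threshold test described above can be sketched for a single pair of atoms. The radii below are illustrative values in Å and the helper name bond_is_reasonable is hypothetical; the radii table and neighbour search used by NepTrainKit may differ.

```python
# Illustrative covalent radii in Å; not necessarily the table NepTrainKit uses.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "O": 0.66}


def bond_is_reasonable(symbol_i, symbol_j, distance, coefficient=0.7):
    """Return False when the pair is closer than coefficient * (R_i + R_j)."""
    threshold = coefficient * (COVALENT_RADII[symbol_i] + COVALENT_RADII[symbol_j])
    return distance >= threshold


print(bond_is_reasonable("O", "H", 0.96))  # True: 0.96 Å is above 0.7 * 0.97 Å
print(bond_is_reasonable("O", "H", 0.50))  # False: unphysically short
```

A whole structure would pass only if every nearest-neighbour pair passes this test.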

classmethod parse_xyz(lines)

Parse an extended XYZ block into a Structure instance.

Parameters:: lines (list[str] or str) – Raw XYZ content. If a single string is provided it is split on newlines internally.
Returns:: New object with lattice, atomic properties and global metadata extracted from the comment line.
Return type:: Structure

Examples

>>> xyz = '''2
... Lattice="4.0 0.0 0.0 0.0 4.0 0.0 0.0 0.0 4.0" Properties=species:S:1:pos:R:3
... H 0.0 0.0 0.0
... H 1.0 0.0 0.0'''
>>> struct = Structure.parse_xyz(xyz)
>>> struct.num_atoms
2
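The Properties entry on the comment line packs name, type, and column count into colon-separated triples. The sketch below shows one way such a descriptor could be split into a list of string dictionaries; the helper parse_properties and the keys "name", "type", and "count" are assumptions for illustration, not the library's documented layout.

```python
def parse_properties(descriptor):
    """Split 'species:S:1:pos:R:3' into one dict per property (illustrative)."""
    tokens = descriptor.split(":")
    if len(tokens) % 3 != 0:
        raise ValueError("Properties must be name:type:count triples")
    return [
        {"name": name, "type": kind, "count": count}
        for name, kind, count in zip(tokens[0::3], tokens[1::3], tokens[2::3])
    ]


props = parse_properties("species:S:1:pos:R:3")
print(props)
```

Here S marks a string column and R a real-valued column, so each atom line carries one species string followed by three position values.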

classmethod read_multiple(filename)

Read a multi-structure XYZ file and return a list of Structure objects.

Parameters:: filename (str or os.PathLike) – Path to a multi-frame .xyz file.
Returns:: List of Structure instances, one per frame.
Return type:: list[Structure]

Examples

>>> from NepTrainKit.core.structure import Structure
>>> structure_list = Structure.read_multiple("train.xyz")
>>> len(structure_list)
42

classmethod read_multiple_fast(filename, max_workers=None, **kwargs)

High-performance multi-frame EXTXYZ reader backed by a C++ parser.

This uses a pybind11 extension (NepTrainKit._native._io) and Python mmap to index and parse frames in native code, then constructs Structure objects. Falls back to read_multiple on error or if the extension is unavailable.

Parameters:

filename (str)
max_workers (int | None)

write(file, *, atomic_float_digits=None)

Write the structure as an EXTXYZ frame to a file-like object.

Parameters:

file (IO) – Open text stream supporting write().
atomic_float_digits (int, optional) – Significant digits for per-atom floating-point fields.

get_all_distances()

Compute all-pairs distances using periodic minimum image.

Returns:: Matrix of shape (N, N) with distances.
Return type:: numpy.ndarray

get_mini_distance_info()

Minimum interatomic distance for each element pair.

Returns:: Map from element pair (A, B) with A <= B to the minimal distance across the structure.
Return type:: dict[tuple[str, str], float]
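The shape of the returned mapping can be illustrated with a small sketch. It uses plain Euclidean distances with no periodic images, which is a simplification, and the helper name min_distance_by_pair is hypothetical.

```python
import math
from itertools import combinations


def min_distance_by_pair(symbols, positions):
    """Map each sorted element pair to its smallest distance (non-periodic)."""
    result = {}
    for i, j in combinations(range(len(symbols)), 2):
        # Sorting the symbols gives a key (A, B) with A <= B.
        key = tuple(sorted((symbols[i], symbols[j])))
        distance = math.dist(positions[i], positions[j])
        if distance < result.get(key, math.inf):
            result[key] = distance
    return result


info = min_distance_by_pair(
    ["O", "H", "H"],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
)
print(info)
```

The O-H pair appears under the key ("H", "O") with the shorter of its two distances, 1.0.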

get_bond_pairs()

Likely bonded pairs using a covalent-radii heuristic.

Returns:: Upper-triangular pairs where distance < 1.15 * (r_i + r_j).
Return type:: list[tuple[int, int]]

get_bad_bond_pairs(coefficient=0.8, max_atoms=None)

Pairs that violate a short-bond threshold.

Parameters:

coefficient (float, default=0.8) – Threshold relative to the sum of covalent radii.
max_atoms (int or None, default=None) – Skip the Python fallback when the structure exceeds this size. Native scans are not size-limited. None keeps the complete fallback behavior.

Returns:

Upper-triangular pairs shorter than the threshold.

Return type:

list[tuple[int, int]]

NepTrainKit.core.structure.calculate_pairwise_distances(lattice_params, atom_coords, fractional=True, block_size=2048)

All-pairs distances under periodic minimum-image convention.

This implementation is robust for triclinic (skewed) cells. It first reduces fractional deltas into [-0.5, 0.5) and then checks the 26 neighboring image shifts to ensure the true shortest image vector is selected under the lattice metric.

Parameters:

lattice_params (numpy.ndarray) – Lattice matrix with shape (3, 3). Row-wise lattice vectors [a, b, c].
atom_coords (numpy.ndarray) – Coordinates with shape (N, 3).
fractional (bool, default=True) – If True, atom_coords are fractional; otherwise Cartesian.
block_size (int, default=2048) – Row-block size to balance memory and speed for large N.

Returns:

Distance matrix of shape (N, N).

Return type:

numpy.ndarray
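The two-step procedure described above (reduce the fractional delta, then test neighbouring image shifts) can be sketched for a single pair of atoms. This is a standard-library illustration under the row-wise lattice convention, not the blocked NumPy implementation; the helper name minimum_image_distance is hypothetical.

```python
import math
from itertools import product


def minimum_image_distance(cell, frac_i, frac_j):
    """Shortest periodic distance between two fractional positions."""
    # Reduce each fractional delta into [-0.5, 0.5).
    delta = [(fj - fi + 0.5) % 1.0 - 0.5 for fi, fj in zip(frac_i, frac_j)]
    best = math.inf
    # Try the reduced delta itself plus its 26 neighbouring image shifts.
    for shift in product((-1, 0, 1), repeat=3):
        frac = [d + s for d, s in zip(delta, shift)]
        # Row-wise lattice vectors: cartesian = frac @ cell.
        cart = [sum(frac[k] * cell[k][axis] for k in range(3)) for axis in range(3)]
        best = min(best, math.sqrt(sum(x * x for x in cart)))
    return best


cubic = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
skewed = [[1.0, 0.0, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]

d_cubic = minimum_image_distance(cubic, (0.05, 0.0, 0.0), (0.95, 0.0, 0.0))
d_skewed = minimum_image_distance(skewed, (0.0, 0.0, 0.0), (0.45, 0.45, 0.0))
print(d_cubic, d_skewed)
```

In the skewed cell the reduced delta (0.45, 0.45, 0) has a Cartesian length of about 0.97, while the image shifted by -1 along the first lattice vector is only about 0.47 long. This is why the neighbouring shifts are checked rather than trusting the reduced delta alone.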

NepTrainKit.core.structure.is_organic_cluster(symbols)

Check whether the structure represents an organic molecular cluster.

Parameters:: symbols (list[str])
Return type:: bool

NepTrainKit.core.structure.get_vibration_modes(structure, min_frequency=0.0)

Extract vibrational modes stored in per-atom arrays on an ASE Atoms object.

Parameters:

structure (ase.Atoms) – Atomic structure that potentially carries vibrational mode information.
min_frequency (float, optional) – Absolute frequency threshold (in the same units as the stored data) used to filter out near-zero translational modes. Set to 0.0 to keep all provided modes. Defaults to 0.0.

Returns:

Pair of (frequencies, modes) where modes has shape (n_modes, n_atoms, 3). Missing frequencies are returned as nan. When no data is attached to the structure the function returns two empty arrays.

Return type:

tuple(ndarray, ndarray)

NepTrainKit.core.structure.get_clusters(structure)

Connected-atom clusters under ASE natural cutoffs.

Parameters:: structure (ase.Atoms) – ASE atoms object used for neighbor analysis.
Returns:: Cluster index lists and a boolean list marking organic clusters.
Return type:: tuple[list[list[int]], list[bool]]
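Grouping atoms into connected clusters is a connected-components problem once the bonded pairs are known. The sketch below takes the pairs as given and uses a small union-find; how get_clusters derives neighbours from ASE natural cutoffs is not reproduced here, and the helper name connected_clusters is hypothetical.

```python
def connected_clusters(num_atoms, bond_pairs):
    """Group atom indices into connected components via union-find."""
    parent = list(range(num_atoms))

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in bond_pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    clusters = {}
    for atom in range(num_atoms):
        clusters.setdefault(find(atom), []).append(atom)
    return sorted(clusters.values())


print(connected_clusters(5, [(0, 1), (1, 2), (3, 4)]))  # [[0, 1, 2], [3, 4]]
```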

NepTrainKit.core.structure.unwrap_molecule(structure, cluster_indices): Unwrap atoms in a molecular cluster back into the primary simulation cell.

NepTrainKit.core.structure.process_organic_clusters(structure, new_structure, clusters, is_organic_list)

Recenter and unwrap organic molecular clusters.

Parameters:

structure (ase.Atoms) – Original ASE atoms with the reference cell.
new_structure (ase.Atoms) – Target ASE atoms whose positions will be updated.
clusters (list[list[int]]) – Atom-index clusters from get_clusters().
is_organic_list (list[bool]) – Flags indicating organic clusters.

NepTrainKit.core.structure.load_npy_structure(folders, order_file=None, cancel_event=None, base_root=None)

Recursively load DeepMD datasets beneath folders.

Parameters:

folders (str | Path)
base_root (str | Path | None)

NepTrainKit.core.structure.get_type_map(structures)

Parameters:: structures (list[Structure])
Return type:: list[str]

NepTrainKit.core.structure.write_structures_extxyz_atomic(destination, structures, *, atomic_float_digits=None)

Write EXTXYZ frames without exposing a partially written destination.

Parameters:

destination (str | Path)
atomic_float_digits (int | None)

Return type:

None

NepTrainKit.core.structure.save_npy_structure(folder, structures, type_map=None)

Save structures to a DeepMD-style .npy dataset layout.

Parameters:

folder (PathLike) – Target root folder. One subfolder per Config_type is created.
structures (list[Structure]) – Structures to persist. Per-atom arrays are saved under set.000 and per-frame values under the config folder.
type_map (list[str])

class NepTrainKit.core.structure.FastStructure(lattice, atomic_properties, properties, additional_fields)

Bases: Structure

Structure subclass that uses a C++-accelerated parser for EXTXYZ IO.

Parameters:

lattice (list[float] | npt.NDArray[np.float64])
atomic_properties (dict[str, npt.NDArray[Any]])
properties (list[dict[str, str]])
additional_fields (dict[str, Any])

classmethod read_multiple(filename, max_workers=None)

Read a multi-structure XYZ file and return a list of Structure objects.

Parameters:

filename (str or os.PathLike) – Path to a multi-frame .xyz file.
max_workers (int | None)

Returns:

List of Structure instances, one per frame.

Return type:

list[Structure]

Examples

>>> from NepTrainKit.core.structure import Structure
>>> structure_list = Structure.read_multiple("train.xyz")
>>> len(structure_list)
42

classmethod iter_read_multiple(filename, max_workers=None)

Iterate frames in a multi-structure EXTXYZ file.

Parameters:

filename (str) – Path to a multi-frame .xyz file.
cancel_event (threading.Event or None, optional) – If provided, iteration stops early once the event is set.
max_workers (int | None)

Yields:

Structure – Parsed structures one by one.

Examples

>>> from NepTrainKit.core.structure import Structure
>>> for structure in Structure.iter_read_multiple(filename="train.xyz"):
...     print(structure)

NepTrainKit policy layer over the public nep_adapters interface.

class NepTrainKit.core.calculator.BackendSelection(requested, resolved, cuda_status, reason)

Bases: object

One immutable backend decision made while loading a model.

Parameters:

requested (NepBackend)
resolved (NepBackend)
cuda_status (None)
reason (str)

requested: NepBackend

resolved: NepBackend

cuda_status: None

reason: str

property summary: str

class NepTrainKit.core.calculator.NepCalculator(model_file='nep.txt', backend=None, chunk_max_atoms=None)

Bases: object

Select one backend, then expose typed NEP predictions to the application.

Parameters:

model_file (PathLike)
backend (NepBackend | str | None)
chunk_max_atoms (int | None)

model_info: ModelInfo

close()

Return type:: None

cancel()

Return type:: None

reset_cancel()

Return type:: None

predict(structures, *, progress=None)

Parameters:

structures (Iterable[Atoms] | Atoms)
progress (Callable[[int, int], None] | None)

Return type:

None

predict_with_descriptors(structures, *, mean=True, progress=None)

Return prediction and descriptors while sharing CPU descriptor work.

Parameters:

structures (Iterable[Atoms] | Atoms)
mean (bool)
progress (Callable[[int, int], None] | None)

Return type:

tuple[None, NDArray[float64]]

descriptors(structures, *, mean=True, progress=None)

Parameters:

structures (Iterable[Atoms] | Atoms)
mean (bool)
progress (Callable[[int, int], None] | None)

Return type:

NDArray[float64]

dipoles(structures)

Parameters:: structures (Iterable[Atoms] | Atoms)
Return type:: NDArray[float64]

polarizabilities(structures)

Parameters:: structures (Iterable[Atoms] | Atoms)
Return type:: NDArray[float64]

predict_dftd3(structures, *, functional, cutoff, cutoff_cn)

Parameters:

structures (Iterable[Atoms] | Atoms)
functional (str)
cutoff (float)
cutoff_cn (float)

Return type:

None

predict_with_dftd3(structures, *, functional, cutoff, cutoff_cn)

Parameters:

structures (Iterable[Atoms] | Atoms)
functional (str)
cutoff (float)
cutoff_cn (float)

Return type:

None

calculate_to_ase(atoms_list, calc_descriptor=False)

Parameters:

atoms_list (Atoms | Iterable[Atoms])
calc_descriptor (bool)

Return type:

None

NepTrainKit.core.calculator.Nep3Calculator: alias of NepCalculator

class NepTrainKit.core.calculator.NepAseCalculator(model_file='nep.txt', backend=None, chunk_max_atoms=None, *args, **kwargs)

Bases: Calculator

Parameters:

model_file (PathLike)
backend (NepBackend | str | None)
chunk_max_atoms (int | None)

implemented_properties: list[str] = ['energy', 'energies', 'forces', 'stress', 'descriptor', 'mforces']: Properties calculator can handle (energy, forces, …)

calculate(atoms=None, properties=('energy',), system_changes=['positions', 'numbers', 'cell', 'pbc', 'initial_charges', 'initial_magmoms'])

Do the calculation.

properties: list of str: List of what needs to be calculated. Can be any combination of ‘energy’, ‘forces’, ‘stress’, ‘dipole’, ‘charges’, ‘magmom’ and ‘magmoms’.
system_changes: list of str: List of what has changed since last calculation. Can be any combination of these six: ‘positions’, ‘numbers’, ‘cell’, ‘pbc’, ‘initial_charges’ and ‘initial_magmoms’.

Subclasses need to implement this, but can ignore properties and system_changes if they want. Calculated properties should be inserted into results dictionary like shown in this dummy example:

self.results = {'energy': 0.0,
                'forces': np.zeros((len(atoms), 3)),
                'stress': np.zeros(6),
                'dipole': np.zeros(3),
                'charges': np.zeros(len(atoms)),
                'magmom': 0.0,
                'magmoms': np.zeros(len(atoms))}

The subclass implementation should first call this implementation to set the atoms attribute and create any missing directories.

class NepTrainKit.core.io.ResultData(*args, **kwargs)

Bases: DistributionAnalysisMixin, QObject

Manage structures, descriptors, and plots for NEP result files. Subclasses implement _load_dataset() and expose their plot datasets through datasets. The class also centralises selection and synchronisation utilities shared by the GUI.

Parameters:

nep_txt_path (Path)
data_xyz_path (Path)
descriptor_path (Path)
calculator_factory (Callable[[str], Any] | None)
import_options (dict[str, Any] | None)

STRUCTURE_SYNC_RULES: dict[str, StructureSyncRule] = {}

FORCE_CPU_BACKEND = False

predictionStatusSignal: alias of str

atoms_num_list: NDArray

move_to_load_thread(thread)

Move this long-lived result object to a loader and remember its owner.

Parameters:: thread (PySide6.QtCore.QThread)
Return type:: None

request_cancel(): Request cooperative cancel during load. Also forward to calculator.

reset_cancel(): Clear the cancellation flag so future operations proceed.

load_structures(): Populate structure from disk or a prefetched cache. The method honours _prefetched_structures first; otherwise it delegates to the importer registry and honours import_options.

set_structures(structures)

Provide pre-parsed structures so load_structures can skip file IO.

Parameters:: structures (list[Structure])

has_completer_cache(search_type=None, max_items=50000)

Return True if a completer cache exists for search_type and max_items.

Parameters:

search_type (SearchType | str | None)
max_items (int)

Return type:

bool

ensure_completer_cache(search_type=None, max_items=50000)

Build and cache completer mappings for the requested search type.

Parameters:

search_type (SearchType | str | None)
max_items (int)

Return type:

None

get_completer_cache(search_type=None, max_items=50000)

Return cached completer mapping for search_type; builds it if needed.

Parameters:

search_type (SearchType | str | None)
max_items (int)

Return type:

dict[str, int]

search_config(config, search_type)

Return structure indices matching the selected search mode.

Parameters:

config (str)
search_type (SearchType)

Return type:

list[int]

search_config_tags(filter_spec, search_type)

Return structure indices matching a tag/formula filter spec.

Parameters:

filter_spec (dict)
search_type (SearchType)

Return type:

list[int]

search_structures(filter_spec): Evaluate a typed composite structure filter without changing selection.

sync_structures(fields=None, structure_indices=None)

Apply registered StructureSyncRule objects to datasets.

Parameters:

fields (Iterable[str] or str, optional) – Subset of rule names to apply. None means all registered rules.
structure_indices (Sequence[int], optional) – Visible structure indices affected by the update. None uses all active structures.

Return type:

None

write_prediction(): Create a nep.in stub when large datasets require prediction mode. The GUI expects a nep.in file to mark prediction runs for large (>1000) structure collections.

set_cache_outputs_override(enabled)

Override cache persistence for this result instance only.

Parameters:: enabled (bool | None)
Return type:: None

cache_outputs_enabled()

Return whether loader-generated cache files should be written.

Return type:: bool

load(): Load structures, descriptors, and dataset arrays in sequence. The routine instantiates a calculator (optionally via calculator_factory), parses structures, and then delegates to subclass hooks for descriptors and dataset-specific properties.

property datasets: list[NepPlotData]: Return the plot datasets exposed by the subclass.

property descriptor: Return the descriptor dataset prepared in _load_descriptors().

property num: Return the number of active structures in the dataset.

property structure: Return the StructureData wrapper for the active structures.

property abcs: NDArray[float32]: Return the cached lattice vector lengths (a, b, c) for all structures.

property angles: NDArray[float32]: Return the cached lattice angles (alpha, beta, gamma) for all structures.

get_reference_per_atom_energy_array(use_active=False)

Return reference per-atom energies as a flat float64 array.

Parameters:: use_active (bool)
Return type:: NDArray[float64]

get_predicted_per_atom_energy_array(use_active=False)

Return predicted per-atom energies as a flat float64 array.

Parameters:: use_active (bool)
Return type:: NDArray[float64]

is_select(i)

Return True if the structure index is marked as selected.

Parameters:: i (int)
Return type:: bool

property can_undo_selection: bool: Return whether a previous selection state is available.

clear_selection_history()

Drop stored selection undo states.

Return type:: None

undo_selection()

Restore the previous selection state.

Return type:: bool

select(indices)

Mark structures denoted by indices as selected.

Parameters:: indices (Sequence[int] | int)
Return type:: None

uncheck(indices)

Remove structures denoted by indices from the selection set.

Parameters:: indices (Sequence[int] | int)
Return type:: None

inverse_select()

Invert the current selection over the active structure set.

Return type:: None

apply_selection(indices, mode)

Apply one cached result to selection as a single undoable change.

Parameters:

indices (Iterable[int])
mode (str)

Return type:

bool

select_structures_by_index(index_expression, use_origin=True)

Resolve an index expression into raw structure indices.

Parameters:

index_expression (str)
use_origin (bool)

Return type:

list[int]

select_structures_by_range(dataset, x_min, x_max, y_min, y_max, use_and=True)

Return structure indices whose scatter positions fall in the given bounds.

Parameters:

dataset (NepPlotData)
x_min (float)
x_max (float)
y_min (float)
y_max (float)
use_and (bool)

Return type:

list[int]

select_structures_by_lattice_range(a_range, b_range, c_range, alpha_range, beta_range, gamma_range)

Return structure indices whose lattice parameters fall within the given ranges.

Uses a fixed tolerance of 1e-4 to handle floating-point precision loss from float32 storage of lattice vectors, independent of range size.

Parameters:

a_range (tuple[float, float])
b_range (tuple[float, float])
c_range (tuple[float, float])
alpha_range (tuple[float, float])
beta_range (tuple[float, float])
gamma_range (tuple[float, float])

Return type:

list[int]
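The reason for the fixed tolerance can be seen by round-tripping a value through float32. The sketch below assumes the tolerance widens both ends of an inclusive range; that reading, and the helper name in_range, are assumptions rather than documented behaviour.

```python
import struct

TOLERANCE = 1e-4  # fixed tolerance quoted in the description above


def in_range(value, bounds, tol=TOLERANCE):
    """Inclusive range test widened by tol on both ends (assumed behaviour)."""
    low, high = bounds
    return low - tol <= value <= high + tol


# Round-trip 4.05 through float32 to mimic the cached lattice lengths.
stored = struct.unpack("f", struct.pack("f", 4.05))[0]
print(stored == 4.05)                 # False: float32 cannot hold 4.05 exactly
print(in_range(stored, (4.0, 4.05)))  # True: the tolerance absorbs the rounding
```

Without the tolerance, a structure whose lattice length equals a range bound could be dropped purely because of the float32 rounding.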

get_selected_structures()

Return the selected structures in the order of their raw index.

Return type:: list[Structure]

export_selected_xyz(save_file_path)

Write the currently selected structures to save_file_path.

Parameters:: save_file_path (str | Path)
Return type:: None

export_selected_npy(save_path)

Export selected structures as a DeepMD-style deepmd/npy dataset.

Parameters:: save_path (str | Path)
Return type:: None

export_active_xyz(save_file_path)

Write active (non-removed) structures to save_file_path.

Parameters:: save_file_path (str | Path)
Return type:: None

export_active_npy(save_path)

Export active (non-removed) structures as a DeepMD-style deepmd/npy dataset.

Parameters:: save_path (str | Path)
Return type:: None

export_removed_xyz(save_file_path)

Write removed structures (if any) to save_file_path.

Parameters:: save_file_path (str | Path)
Return type:: None

export_removed_npy(save_path)

Export removed structures as a DeepMD-style deepmd/npy dataset.

Parameters:: save_path (str | Path)
Return type:: None

export_current_npy(save_path, index)

Export a single structure as a DeepMD-style deepmd/npy dataset.

Parameters:

save_path (str | Path)
index (int)

Return type:

None

export_model_extxyz(save_path)

Export active and removed structures into save_path folder as extxyz.

Parameters:: save_path (str | Path)
Return type:: None

export_model_npy(save_path)

Export active and removed structures into save_path folder as deepmd/npy.

Parameters:: save_path (str | Path)
Return type:: None

export_model_xyz(save_path)

Export active and removed structures into save_path folder.

Parameters:: save_path (str | Path)
Return type:: None

get_atoms(index)

Return the ASE atoms object for the original index.

Parameters:: index (int)

remove(i)

Remove the structure i across all datasets.

Parameters:: i (int)
Return type:: None

property is_revoke: bool: Return True if any structures have been removed.

revoke()

Undo the most recent removal across structures and datasets.

Return type:: None

delete_selected(): Remove and clear all currently selected structures.

iter_non_physical_structure_indices(radius_coefficient)

Yield progress increments while collecting non-physical structures.

Parameters:: radius_coefficient (float)

consume_non_physical_structure_indices()

Return and clear indices collected by the non-physical scan.

Return type:: list[int]

iter_unbalanced_force_indices(threshold)

Yield progress units while collecting structures with non-zero net force.

Parameters:: threshold (float) – Maximum allowed magnitude of the summed force vector ΣF. Structures whose net force exceeds this value are recorded for later selection.

consume_unbalanced_force_indices()

Return and clear indices collected by the net-force scan.

Return type:: list[int]
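The net-force criterion amounts to summing the per-atom forces and comparing the magnitude of the result with the threshold. A minimal sketch for one structure (the helper names net_force_magnitude and is_unbalanced are hypothetical):

```python
import math


def net_force_magnitude(forces):
    """Magnitude of the vector sum of all per-atom forces."""
    total = [sum(force[axis] for force in forces) for axis in range(3)]
    return math.sqrt(sum(component * component for component in total))


def is_unbalanced(forces, threshold):
    """True when the net force exceeds the threshold."""
    return net_force_magnitude(forces) > threshold


balanced = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
unbalanced = [(1.0, 0.0, 0.0), (-0.9, 0.0, 0.0)]
print(is_unbalanced(balanced, 1e-3))    # False: the forces cancel
print(is_unbalanced(unbalanced, 1e-3))  # True: about 0.1 eV/Å remains
```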

sparse_descriptor_selection(n_samples, distance, restrict_to_selection=False)

Return FPS-selected structure indices and whether they should be deselected.

Parameters:

n_samples (int)
distance (float)
restrict_to_selection (bool)

Return type:

tuple[list[int], bool]
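Farthest point sampling (FPS) repeatedly picks the point that lies farthest from everything selected so far. The sketch below is a generic standard-library version; starting from index 0 and treating distance as a stopping threshold are assumptions, and the helper name farthest_point_sampling is hypothetical.

```python
import math


def farthest_point_sampling(points, n_samples, min_distance=0.0):
    """Greedy FPS over a list of coordinate tuples; returns selected indices."""
    if not points or n_samples <= 0:
        return []
    selected = [0]  # start from the first point (an arbitrary choice)
    # Distance from every point to its nearest selected point.
    nearest = [math.dist(point, points[0]) for point in points]
    while len(selected) < min(n_samples, len(points)):
        candidate = max(range(len(points)), key=nearest.__getitem__)
        # Stop when even the farthest remaining point is too close.
        if nearest[candidate] <= min_distance:
            break
        selected.append(candidate)
        for i, point in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(point, points[candidate]))
    return selected


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (0.1, 0.0)]
print(farthest_point_sampling(points, 3))                    # [0, 2, 1]
print(farthest_point_sampling(points, 3, min_distance=5.0))  # [0, 2]
```

With a minimum distance of 5, the sampler stops after two points because every remaining point lies within 5 units of an already selected one.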

sparse_point_selection(n_samples, distance, descriptor_source='reduced', restrict_to_selection=False, training_path=None, sampling_mode='count', r2_threshold=0.9, selection_strategy='global')

Delegate sparse sampling to the sampler helper.

Parameters:

n_samples (int)
distance (float)
descriptor_source (str)
restrict_to_selection (bool)
training_path (str | None)
sampling_mode (str)
r2_threshold (float)
selection_strategy (str)

Return type:

tuple[list[int], bool]

export_descriptor_data(path)

Write descriptor values for the current selection to path.

Parameters:: path (str | Path)
Return type:: None

get_editable_structure_tags()

Return the editable tags for currently selected structures.

Return type:: set[str]

update_structure_metadata(remove_tags, new_tag_info, rename_map=None)

Apply metadata removals, additions, and key renames to the selected structures.

Parameters:

remove_tags (Iterable[str])
new_tag_info (Mapping[str, str])
rename_map (Mapping[str, str] | None)

Return type:

None

iter_shift_energy_baseline(group_patterns, alignment_mode, max_generations, population_size, convergence_tol, reference_indices=None, precomputed_baseline=None, baseline_store=None, source_summary=None)

Shift dataset energies and yield progress units for UI hooks.

Parameters:

group_patterns (Sequence[str])
alignment_mode (str)
max_generations (int)
population_size (int)
convergence_tol (float)
reference_indices (Sequence[int] | None)
baseline_store (dict | None)
source_summary (dict | None)

apply_dft_d3_correction(mode, functional, cutoff, cutoff_cn)

Apply DFT-D3 corrections and synchronise dependent datasets.

Parameters:

mode (int)
functional (str)
cutoff (float)
cutoff_cn (float)

Return type:

None

class NepTrainKit.core.io.StructureSyncRule(dataset_attr, target, collector, precondition=<function StructureSyncRule.<lambda>>, dtype=None)

Bases: object

Declarative instruction that synchronises structure attributes into datasets.

Parameters:

dataset_attr (str)
target (str | slice | Callable[[Any], Any])
collector (Callable[[ResultData, Any, ndarray | None], tuple[ndarray, NDArray[Any]]])
precondition (Callable[[ResultData], bool])
dtype (Any)

dataset_attr: str

target: str | slice | Callable[[Any], Any]

collector: Callable[[ResultData, Any, ndarray | None], tuple[ndarray, NDArray[Any]]]

precondition()

dtype: Any = None

apply(result_data, structure_indices=None)

Execute the rule on result_data if the precondition passes.

Parameters:

result_data (ResultData)
structure_indices (ndarray | None)

Return type:

None

class NepTrainKit.core.io.NepPlotData(data_list, **kwargs)

Bases: NepData

Two-column plot helper that separates NEP predictions from references.

Parameters:

data_list (Sequence[Any] | NDArray[Any])
kwargs (Any)

property x: NDArray[Any]: Flattened NEP predictions suitable for scatter plots.

property y: NDArray[Any]: Flattened reference values.

property structure_index: NDArray[int32]: Map each flattened point back to its parent structure index.

class NepTrainKit.core.io.StructureData(data_list, group_list=1, index_list=None, **kwargs)

Bases: NepData

Utility mixin for structure-level queries.

Parameters:

data_list (Sequence[Any] | NDArray[Any])
group_list (int | Sequence[int])
index_list (Sequence[int] | NDArray[Any] | None)
kwargs (Any)

geometry_snapshot(source_indices=None)

Return cached contiguous geometry for all or selected source rows.

Parameters:: source_indices (Sequence[int] | NDArray[int64] | None)
Return type:: GeometrySnapshot

cached_geometry_analysis(namespace, key, build): Cache a geometry-derived result for this immutable dataset.

has_completer_cache(search_type=None, max_items=50000)

Return True if a completer cache exists for search_type and max_items.

Parameters:

search_type (SearchType | str | None)
max_items (int)

Return type:

bool

ensure_completer_cache(max_items=50000)

Build and cache completer mappings for tag/formula/elements.

Notes

Designed to run in a background thread (e.g. dataset load thread).
Results are stored as dict[SearchType, dict[str,int]] and can be fed directly into ConfigTypeSearchLineEdit.setCompleterKeyWord(…).

Parameters:: max_items (int)
Return type:: None
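
The shape of one completer mapping (dict[str, int]) can be pictured with the sketch below. It assumes that the integer is an occurrence count and that max_items keeps the most frequent keywords; neither assumption is confirmed by the docstring, and build_completer_mapping is a hypothetical helper, not part of NepTrainKit.

```python
from collections import Counter

def build_completer_mapping(values, max_items=50000):
    """Count each non-empty keyword and keep at most `max_items` of the most frequent."""
    counts = Counter(v for v in values if v)  # skip empty strings
    return dict(counts.most_common(max_items))

# Hypothetical Config_type tags, one per structure.
tags = ["bulk", "surface", "bulk", "", "defect", "bulk", "surface"]
mapping = build_completer_mapping(tags, max_items=2)
```

Building the mapping once in a background thread keeps keyword completion responsive on large datasets, which matches the intent stated in the notes above.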

get_element_count_cache(elements=None)

Return element count arrays aligned with active structures.

Parameters:: elements (set[str] | None)
Return type:: dict[str, NDArray[int32]]
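
A minimal sketch of what "element count arrays aligned with active structures" could look like is given below. The input format (one list of chemical symbols per structure) and the helper name element_counts_sketch are assumptions for illustration only.

```python
from collections import Counter

import numpy as np

def element_counts_sketch(symbols_per_structure, elements=None):
    """Return {element: int32 array}, with one count per structure."""
    counters = [Counter(symbols) for symbols in symbols_per_structure]
    if elements is None:
        # Default to every element that appears anywhere in the dataset.
        elements = set().union(*counters) if counters else set()
    return {
        element: np.array([c[element] for c in counters], dtype=np.int32)
        for element in sorted(elements)
    }

# Hypothetical dataset: one water-like frame and one O2-like frame.
counts = element_counts_sketch([["H", "H", "O"], ["O", "O"]])
```

Because every array has one entry per structure, a condition such as counts["O"] >= 2 yields a boolean mask that can be used directly to filter structures.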

get_completer_cache(search_type=None, max_items=50000)

Return cached completer mapping for search_type; builds it if needed.

Parameters:

search_type (SearchType | str | None)
max_items (int)

Return type:

dict[str, int]

get_all_config(search_type=None)

Return structure metadata used for filtering.

Parameters:: search_type (SearchType, optional) – Metadata selector. Defaults to SearchType.TAG.
Returns:: Value per active structure matching search_type.
Return type:: list[str]

search_config(config, search_type)

Return structure indices whose metadata match config.

Parameters:

config (str) – Regular expression used for matching.
search_type (SearchType) – Attribute family to inspect.

Returns:

Structure indices satisfying the pattern; empty on failure.

Return type:

list[int]
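
The regex-based lookup can be sketched with the standard re module. The helper below is hypothetical and assumes that "empty on failure" covers an invalid pattern; the library may treat other failures the same way.

```python
import re

def search_by_regex(pattern, values):
    """Return indices of `values` matching `pattern`; empty list if the pattern is invalid."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        return []
    return [i for i, value in enumerate(values) if compiled.search(value)]

# Hypothetical tags, one per structure.
tags = ["bulk_300K", "surface_300K", "bulk_600K"]
bulk_indices = search_by_regex(r"^bulk", tags)
invalid = search_by_regex("(", tags)  # unbalanced parenthesis, so no matches
```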

search_config_tags(filter_spec, search_type)

Return structure indices matching a tag/formula filter spec.

Uses simple substring matching (not regex) with group-based logic: groups are combined with AND, and the conditions within a group are combined with AND or OR according to that group's mode.

Parameters:

filter_spec (dict) – Dictionary with groups (list of group dicts, each having conditions and mode keys).
search_type (SearchType) – One of SearchType.TAG or SearchType.FORMULA.

Returns:

Matching structure indices.

Return type:

list[int]
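
The group logic described above can be sketched as follows. The keys groups, conditions, and mode come from the parameter description; treating each condition as a plain string and writing mode as "AND" or "OR" are assumptions, and matches_filter is a hypothetical helper.

```python
def matches_filter(value, filter_spec):
    """Return True if `value` satisfies every group in `filter_spec`."""
    for group in filter_spec.get("groups", []):
        # Plain substring test for each condition (no regex).
        hits = [condition in value for condition in group["conditions"]]
        combine = all if group.get("mode", "AND").upper() == "AND" else any
        if not combine(hits):
            return False  # groups are combined with AND, so one failure rejects
    return True

# Hypothetical spec: (contains "bulk" OR "slab") AND (contains "300K").
filter_spec = {
    "groups": [
        {"mode": "OR", "conditions": ["bulk", "slab"]},
        {"mode": "AND", "conditions": ["300K"]},
    ]
}
tags = ["bulk_300K", "slab_600K", "slab_300K", "defect_300K"]
indices = [i for i, tag in enumerate(tags) if matches_filter(tag, filter_spec)]
```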

class NepTrainKit.core.io.NepTrainResultData(*args, **kwargs)

Bases: ResultData

Result loader for NEP training outputs with energy, force, stress, and virial datasets.

The loader normalises NEP predictions into plot-ready datasets and registers synchronisation rules used by the UI.

Examples

>>> from NepTrainKit.core.io import NepTrainResultData
# Load the xyz file
>>> result_dataset = NepTrainResultData.from_path(r"D:/Desktop/dataset3635-addD3/train.xyz")
>>> result_dataset.load()
>>> print(result_dataset)
# Select structures at indices 0 and 10
>>> result_dataset.select([0, 10])
>>> print(result_dataset)
# Delete the selected structures
>>> result_dataset.delete_selected()
>>> print(result_dataset)
# Get the indices of the 10 points with the largest energy error
>>> index = result_dataset.energy.get_max_error_index(10)
# Select the 10 points with the largest energy error and delete them
>>> result_dataset.select(index)
>>> result_dataset.delete_selected()
>>> print(result_dataset)
# Revoke the last deletion
>>> result_dataset.revoke()
# Perform farthest point sampling (normal global sampling)
>>> index, reverse = result_dataset.sparse_descriptor_selection(100, 0.001, False)
# Perform sampling within a region (select the first 300 structures)
>>> index = result_dataset.select_structures_by_index(":300")
>>> result_dataset.select(index)
>>> index, reverse = result_dataset.sparse_descriptor_selection(100, 0.001, True)
# Uncheck or inverse select based on the reverse flag
>>> if reverse:
...     result_dataset.uncheck(index)
... else:
...     result_dataset.select(index)
...     result_dataset.inverse_select()
>>> print(result_dataset)

Parameters:

nep_txt_path (Path | str)
data_xyz_path (Path | str)
energy_out_path (Path | str)
force_out_path (Path | str)
stress_out_path (Path | str)
virial_out_path (Path | str)
descriptor_path (Path | str)
charge_out_path (Path | str | None)
bec_out_path (Path | str | None)
charge_model (bool | None)
spin_force_out_path (Path | str | None)
spin_model (bool | None)

STRUCTURE_SYNC_RULES = {
    'energy': StructureSyncRule(dataset_attr='energy', target='x_cols', collector=<function collect_energy_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
    'force': StructureSyncRule(dataset_attr='force', target='x_cols', collector=<function collect_force_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
    'stress': StructureSyncRule(dataset_attr='stress', target='x_cols', collector=<function collect_stress_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
    'virial': StructureSyncRule(dataset_attr='virial', target='x_cols', collector=<function collect_virial_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
}

property datasets: Return datasets exposed to the UI in display order.

property energy: Return the per-structure energy dataset.

property force: Return the force dataset respecting per-atom settings.

property stress: Return the stress dataset derived from predicted virials.

property virial: Return the per-structure virial dataset.

property bec: Return the per-atom Born effective charge dataset when available.

property spin_force: Return the magnetic force dataset when available.

classmethod from_path(path, model_type=0, *, structures=None, nep_txt_path=None, cache_outputs=None)

Create an instance from a NEP result directory.

Parameters:

path (PathLike) – Directory containing NEP outputs and descriptors.
model_type (int, optional) – NEP model type hint used to select descriptor fallbacks.
structures (list[Structure], optional) – Pre-loaded structures to attach instead of reading from disk.
cache_outputs (bool, optional) – Per-result cache policy. False keeps predictions in memory.
nep_txt_path (Path | str | None)

Returns:

Configured loader bound to the resolved directory.

Return type:

NepTrainResultData

class NepTrainKit.core.io.NepPolarizabilityResultData(*args, **kwargs)

Bases: ResultData

Result loader for NEP polarizability evaluations.

Parameters:

nep_txt_path (Path | str)
data_xyz_path (Path | str)
polarizability_out_path (Path | str)
descriptor_path (Path | str)

FORCE_CPU_BACKEND = True

property datasets: Return the polarizability datasets in display order.

property polarizability_diagonal: Return the diagonal polarizability dataset.

property polarizability_no_diagonal: Return the off-diagonal polarizability dataset.

property descriptor: Return the descriptor dataset associated with the polarizability run.

classmethod from_path(path, *, structures=None)

Create a polarizability loader from a NEP dataset directory.

Parameters:

path (PathLike) – Directory containing NEP outputs.
structures (list[Structure], optional) – Pre-loaded structures to attach instead of reading from disk.

Returns:

Configured loader bound to the resolved directory.

Return type:

NepPolarizabilityResultData

class NepTrainKit.core.io.NepDipoleResultData(*args, **kwargs)

Bases: ResultData

Result loader for NEP dipole predictions.

Parameters:

nep_txt_path (Path | str)
data_xyz_path (Path | str)
dipole_out_path (Path | str)
descriptor_path (Path | str)

FORCE_CPU_BACKEND = True

property datasets: Return the dipole datasets in display order.

property dipole: Return the dipole dataset.

property descriptor: Return the descriptor dataset associated with the dipole run.

classmethod from_path(path, *, structures=None)

Create a dipole loader from a NEP dataset directory.

Parameters:

path (PathLike) – Directory containing NEP outputs.
structures (list[Structure], optional) – Pre-loaded structures to attach instead of reading from disk.

Returns:

Configured loader bound to the resolved directory.

Return type:

NepDipoleResultData

class NepTrainKit.core.io.TaceResultData(*args, **kwargs)

Bases: ResultData

Result loader for TACE EXTXYZ prediction files.

Parameters:

nep_txt_path (Path | str)
data_xyz_path (Path | str)
descriptor_path (Path | str)
import_options (dict[str, Any] | None)

property datasets: list[NepPlotData]: Return the plot datasets exposed by the subclass.

property energy: NepPlotData

property force: NepPlotData

property virial: NepPlotData

property mforce: NepPlotData

classmethod from_path(path, *, structures=None, nep_txt_path=None, import_options=None)

Parameters:

path (str | Path)
structures (list[Structure] | None)
nep_txt_path (Path | str | None)
import_options (dict[str, Any] | None)

Return type:

TaceResultData

class NepTrainKit.core.io.DeepmdResultData(*args, **kwargs)

Bases: ResultData

Result loader that adapts DeepMD outputs to the ResultData interface.

The loader reads DeepMD numpy exports, normalises them, and exposes consistent NEP plot datasets for downstream visualisation.

Parameters:

nep_txt_path (Path | str)
data_xyz_path (Path | str)
energy_out_path (Path | str)
force_out_path (Path | str)
virial_out_path (Path | str)
descriptor_path (Path | str)
spin_out_path (Path | str | None)
cached_outputs_only (bool)

STRUCTURE_SYNC_RULES = {
    'energy': StructureSyncRule(dataset_attr='energy', target='x_cols', collector=<function collect_energy_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
    'force': StructureSyncRule(dataset_attr='force', target='x_cols', collector=<function collect_force_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
    'stress': StructureSyncRule(dataset_attr='stress', target='x_cols', collector=<function collect_stress_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
    'virial': StructureSyncRule(dataset_attr='virial', target='x_cols', collector=<function collect_virial_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
}

classmethod from_path(path, *, structures=None, nep_txt_path=None, model_type=None)

Create an instance from a DeepMD dataset directory.

Parameters:

path (Path or str) – Directory that contains DeepMD set.* data and outputs.
structures (list[Structure], optional) – Pre-loaded structures to attach instead of reading from disk.
nep_txt_path (Path or str, optional) – Override the NEP model text file used for (re)calculation.
model_type (int, optional) – Ignored. Accepted for API compatibility with NEP loaders.

Returns:

Configured loader pointing at the resolved dataset.

Return type:

DeepmdResultData

load_structures()

Load structures from DeepMD numpy sets into the local dataset.

Notes

Recognises DeepMD set.* partitions and aggregates per-set arrays.
Respects the optional cancel_event for graceful cancellation.

Examples

>>> # Constructed via DeepmdResultData.from_path(...)

property datasets: Return the datasets exposed to the UI in canonical order.

property energy: Return the per-atom energy dataset.

property force: Return the per-atom force dataset.

property spin: Return the per-atom spin dataset.

property mforce: Return the magnetic force dataset.

property virial: Return the per-atom virial dataset.

export_model_xyz(save_path)

Export current and removed structures into dedicated directories.

Parameters:: save_path (Path or str) – Destination directory that will receive export_good_model and export_remove_model folders.
Return type:: None

NepTrainKit.core.io.is_deepmd_path(folder)

Return True when folder looks like a DeepMD dataset directory.

Parameters:: folder (str | Path)
Return type:: bool

NepTrainKit.core.io.load_result_data(path)

Load result data for path via the first matching loader.

Parameters:: path (PathLike) – File or directory to be loaded.
Returns:: The loaded dataset if any loader recognises path, else None.
Return type:: ResultData or None

Examples

>>> from NepTrainKit.core.io import load_result_data
>>> dataset = load_result_data("./train.xyz")
>>> dataset.load()
>>> print(dataset)

NepTrainKit.core.io.register_result_loader(loader)

Register a loader so that it participates in result discovery.

Parameters:: loader (ResultLoader) – An instance of a concrete ResultLoader subclass.
Returns:: The same loader instance, allowing use as a decorator.
Return type:: ResultLoader

Examples

>>> from NepTrainKit.core.io import register_result_loader
>>> from NepTrainKit.core.io.registry import DeepmdFolderLoader, NepModelTypeLoader
>>> register_result_loader(DeepmdFolderLoader())
>>> register_result_loader(
...     NepModelTypeLoader("nep_train", {0, 3},
...                        'NepTrainKit.core.io:NepTrainResultData')
... )
>>> register_result_loader(
...     NepModelTypeLoader("nep_dipole", {1},
...                        'NepTrainKit.core.io:NepDipoleResultData')
... )
>>> register_result_loader(
...     NepModelTypeLoader("nep_polar", {2},
...                        'NepTrainKit.core.io:NepPolarizabilityResultData')
... )
>>> register_result_loader(OtherLoader())  # OtherLoader: any user-defined ResultLoader subclass

NepTrainKit.core.io.matches_result_loader(path)

Return True if any registered loader recognises path.

Parameters:: path (str or os.PathLike) – File or directory to be examined.
Returns:: True if path is recognised by at least one loader.
Return type:: bool

Examples

>>> from NepTrainKit.core.io import matches_result_loader
>>> matches_result_loader("./train.xyz")
True
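
The "first matching loader" behaviour shared by load_result_data, register_result_loader, and matches_result_loader follows a common registry pattern, sketched below in isolation. Every name here (SuffixLoader, register, matches_any, load_first, and the matches/load methods) is hypothetical and does not reflect the actual ResultLoader interface.

```python
from pathlib import Path

_LOADERS = []  # hypothetical registry, kept in registration order

class SuffixLoader:
    """Hypothetical loader that recognises a path by its file suffix."""

    def __init__(self, name, suffix):
        self.name = name
        self.suffix = suffix

    def matches(self, path):
        return Path(path).suffix == self.suffix

    def load(self, path):
        return f"{self.name}:{Path(path).name}"

def register(loader):
    """Add a loader and return it, so the call can be chained or used as a decorator."""
    _LOADERS.append(loader)
    return loader

def matches_any(path):
    """Return True if at least one registered loader recognises `path`."""
    return any(loader.matches(path) for loader in _LOADERS)

def load_first(path):
    """Return the result of the first matching loader, or None if none match."""
    for loader in _LOADERS:
        if loader.matches(path):
            return loader.load(path)
    return None

register(SuffixLoader("xyz", ".xyz"))
register(SuffixLoader("fallback_xyz", ".xyz"))
```

In a registry like this, registration order decides which loader wins when several recognise the same path, and callers should handle a None result before using it.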

NepTrainKit.core.io.farthest_point_sampling(points, n_samples, min_dist=0.1, selected_data=None)

Greedy FPS with optional warm-start and minimum-distance constraint.

Parameters:

points (numpy.ndarray) – Input point set of shape (N, D).
n_samples (int) – Maximum number of samples to select.
min_dist (float, default=0.1) – Minimum allowed distance to any already selected point.
selected_data (numpy.ndarray or None, optional) – Warm-start set with shape (M, D). If provided, selection respects the minimum distance from this set.

Returns:

Indices of selected points.

Return type:

list[int]

Examples

>>> import numpy as np
>>> from NepTrainKit.core.io import farthest_point_sampling
>>> P = np.random.rand(100, 3).astype(np.float32)
>>> idx = farthest_point_sampling(P, 5, min_dist=0.0)
>>> len(idx) <= 5
True
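
For readers unfamiliar with the technique, greedy farthest point sampling with a warm start and a minimum-distance cutoff can be sketched in NumPy as below. This is an independent sketch, not the library's code: the function name fps_sketch, the choice of index 0 as the cold-start seed, and the stopping rule are assumptions.

```python
import numpy as np

def fps_sketch(points, n_samples, min_dist=0.1, selected_data=None):
    """Greedily pick points that are far from everything selected so far."""
    points = np.asarray(points, dtype=float)
    if len(points) == 0 or n_samples <= 0:
        return []
    selected = []
    if selected_data is not None and len(selected_data) > 0:
        # Warm start: distance from each point to its nearest warm-start point.
        warm = np.asarray(selected_data, dtype=float)
        diff = points[:, None, :] - warm[None, :, :]
        min_d = np.linalg.norm(diff, axis=2).min(axis=1)
    else:
        # Cold start: seed with index 0 (an arbitrary choice for this sketch).
        selected.append(0)
        min_d = np.linalg.norm(points - points[0], axis=1)
        min_d[0] = -np.inf  # never pick the same index twice
    while len(selected) < n_samples:
        candidate = int(np.argmax(min_d))
        if min_d[candidate] < min_dist:
            break  # every remaining point is too close to the selection
        selected.append(candidate)
        d_new = np.linalg.norm(points - points[candidate], axis=1)
        min_d = np.minimum(min_d, d_new)
        min_d[candidate] = -np.inf
    return selected

# Hypothetical 2-D points; point 2 lies within min_dist of point 0.
pts = [[0.0, 0.0], [1.0, 0.0], [0.05, 0.0], [0.0, 2.0]]
cold = fps_sketch(pts, 4, min_dist=0.1)
warm = fps_sketch(pts, 2, min_dist=0.1, selected_data=[[0.0, 0.0]])
```

The cutoff explains why fewer than n_samples indices may be returned: once every remaining point is closer than min_dist to the selection, the loop stops.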

NepTrainKit.core.io.allocate_sqrt_quotas(group_sizes, n_samples)

Allocate one slot per non-empty group, then distribute the remaining slots in proportion to sqrt(size).

Parameters:

group_sizes (dict[Hashable, int])
n_samples (int)

Return type:

dict[Hashable, int]
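
One plausible way to realise this allocation is sketched below. The rounding strategy (hand out remaining slots one at a time), the cap at the group size, and the zero quota for empty groups are assumptions; sqrt_quotas_sketch is a hypothetical helper and may differ from the library's behaviour.

```python
import math

def sqrt_quotas_sketch(group_sizes, n_samples):
    """Give each non-empty group one slot, then share the rest by sqrt(size)."""
    groups = {key: size for key, size in group_sizes.items() if size > 0}
    quotas = {key: 0 for key in group_sizes}
    if not groups or n_samples <= 0:
        return quotas
    # Step 1: one slot per non-empty group (largest first if the budget is short).
    for key in sorted(groups, key=groups.get, reverse=True)[:n_samples]:
        quotas[key] = 1
    remaining = n_samples - sum(quotas.values())
    # Step 2: give each remaining slot to the group whose quota is smallest
    # relative to sqrt(size), never exceeding the group size.
    while remaining > 0:
        open_groups = [key for key in groups if quotas[key] < groups[key]]
        if not open_groups:
            break
        best = min(open_groups, key=lambda key: quotas[key] / math.sqrt(groups[key]))
        quotas[best] += 1
        remaining -= 1
    return quotas

# Hypothetical groups: two non-empty, one empty.
quotas = sqrt_quotas_sketch({"a": 100, "b": 25, "c": 0}, 8)
```

Square-root weighting lets large groups receive more samples without crowding out small ones, which is the usual motivation for this kind of quota.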

NepTrainKit.core.io.centered_fps(points, n_samples, min_dist, selected_data=None)

Run FPS from the feature-space center, or from a warm-start set.

Parameters:

n_samples (int)
min_dist (float)

Return type:

list[int]
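
Starting "from the feature-space center" could mean seeding the selection with the point nearest to the centroid, after which the usual greedy farthest-point loop continues. The sketch below shows only that seeding step; interpreting the center as the arithmetic mean of the points is an assumption, and centered_seed_index is a hypothetical helper.

```python
import numpy as np

def centered_seed_index(points):
    """Return the index of the point closest to the mean of all points."""
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    return int(np.argmin(np.linalg.norm(points - center, axis=1)))

# Hypothetical 2-D features; the third point sits nearest the centroid.
features = [[0.0, 0.0], [4.0, 0.0], [2.0, 0.1], [2.0, 3.0]]
seed = centered_seed_index(features)
```

Seeding from the center makes the result independent of the order in which points are stored, unlike a fixed or random first index.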

NepTrainKit.core.io.structure_element_set_key(structure)

Return a stable element-set key for ASE or NepTrainKit structures.

Return type:: tuple[str, …]
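
A stable element-set key can be pictured as a sorted tuple of the unique chemical symbols in a structure. The sketch below works on a plain list of symbols; alphabetical ordering and the helper name element_set_key_sketch are assumptions rather than documented behaviour.

```python
def element_set_key_sketch(symbols):
    """Return a sorted, duplicate-free tuple of chemical symbols."""
    return tuple(sorted(set(symbols)))

# The key is the same regardless of atom order or stoichiometry.
key_a = element_set_key_sketch(["O", "H", "H"])
key_b = element_set_key_sketch(["H", "O", "O", "H"])
```

Because tuples are hashable, such a key can be used as a dictionary key to group structures by chemical system.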