API Reference

The pages below are generated automatically by automodule; the source docstrings are kept in their original English.

High-level data structures and helpers for NEP datasets.

class NepTrainKit.core.structure.Structure(lattice, atomic_properties, properties, additional_fields)

Bases: object

Container for EXTXYZ frames (lattice, positions, species, and fields).

Notes

  • Coordinates are stored in Cartesian Angstroms under pos.

  • Frame-level attributes like energy, pbc, and virial live in additional_fields.

Examples

>>> from NepTrainKit.core.structure import Structure
>>> # read structures from a file by iterating over it
>>> for structure in Structure.iter_read_multiple(filename="train.xyz"):
...     print(structure)
>>> # read all structures from a file into a list
>>> structure_list = Structure.read_multiple(filename="train.xyz")
Parameters:
  • lattice (list[float] | npt.NDArray[np.float64])

  • atomic_properties (dict[str, npt.NDArray[Any]])

  • properties (list[dict[str, str]])

  • additional_fields (dict[str, Any])

property tag: str

Alias for the Config_type additional field.

get_prop_key(additional_fields=True, atomic_properties=True)

List all property keys available on the structure.

Parameters:
  • additional_fields (bool, default=True) – Include keys from additional_fields.

  • atomic_properties (bool, default=True) – Include keys from atomic_properties.

Returns:

Combined keys such as ["pos", "energy", ...].

Return type:

list of str

remove_atomic_properties(key)

Remove an atomic array property.

Parameters:

key (str) – Name of the property to delete.

classmethod read_xyz(filename)

Read a single EXTXYZ structure from a file path.

Parameters:

filename (str) – Path to an .xyz file containing a single frame.

Returns:

Parsed structure instance.

Return type:

Structure

static iter_read_multiple(filename, cancel_event=None)

Iterate frames in a multi-structure EXTXYZ file.

Parameters:
  • filename (str) – Path to a multi-frame .xyz file.

  • cancel_event (threading.Event or None, optional) – If provided and is_set(), stop early.

Yields:

Structure – Parsed structures one by one.

Examples

>>> from NepTrainKit.core.structure import Structure
>>> for structure in Structure.iter_read_multiple(filename="train.xyz"):
...     print(structure)
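
The cooperative-cancellation behaviour described for cancel_event can be sketched in plain Python. This is not NepTrainKit's implementation: iter_frames is a hypothetical helper that only splits multi-frame XYZ text into raw frame blocks, and it assumes every frame starts with an atom-count line followed by one comment line.

```python
import threading
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def iter_frames(lines: Iterable[str],
                cancel_event: Optional[threading.Event] = None) -> Iterator[List[str]]:
    """Yield raw XYZ frames (count line, comment line, atom lines) one by one.

    Hypothetical helper for illustration; stops early once cancel_event is set.
    """
    it = iter(lines)
    for count_line in it:
        if cancel_event is not None and cancel_event.is_set():
            return  # cooperative cancel: stop before reading the next frame
        if not count_line.strip():
            continue  # tolerate blank lines between frames
        n_atoms = int(count_line.split()[0])
        # One comment line plus n_atoms atom lines belong to this frame.
        yield [count_line, *islice(it, n_atoms + 1)]


text = """2
Lattice="4 0 0 0 4 0 0 0 4" Properties=species:S:1:pos:R:3
H 0.0 0.0 0.0
H 1.0 0.0 0.0
1
Lattice="4 0 0 0 4 0 0 0 4" Properties=species:S:1:pos:R:3
O 0.0 0.0 0.0
"""

stop = threading.Event()
frames = []
for frame in iter_frames(text.splitlines(), cancel_event=stop):
    frames.append(frame)
    stop.set()  # request cancellation after the first frame
```

Because the event is checked once per frame, a consumer (for example a GUI thread) can stop a long read without interrupting a frame that is already being assembled.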
property cell

Simulation cell lattice vectors.

Returns:

Row-wise lattice vectors [a, b, c].

Return type:

ndarray, shape (3, 3)

property volume

Cell volume.

Returns:

Volume of the simulation cell.

Return type:

float

property abc

Lattice vector lengths (a, b, c).

Returns:

Lengths of the three lattice vectors in Å.

Return type:

ndarray, shape (3,), dtype float

property angles

Lattice angles (alpha, beta, gamma) in degrees.

Returns:

Angles α, β, γ in degrees.

Return type:

ndarray, shape (3,), dtype float
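
The relationship between cell, abc, angles, and volume can be illustrated with a small standalone sketch. It assumes row-wise lattice vectors [a, b, c] as documented above and the usual crystallographic convention (alpha between b and c, beta between a and c, gamma between a and b); lattice_parameters is a hypothetical helper, not part of NepTrainKit.

```python
import math


def lattice_parameters(cell):
    """Return (lengths, angles_deg, volume) for row-wise lattice vectors [a, b, c]."""
    a, b, c = cell

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def cross(u, v):
        return (u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0])

    lengths = tuple(math.sqrt(dot(v, v)) for v in (a, b, c))

    def angle(u, v, len_u, len_v):
        # Clamp to [-1, 1] to guard against rounding before acos.
        cos_t = max(-1.0, min(1.0, dot(u, v) / (len_u * len_v)))
        return math.degrees(math.acos(cos_t))

    angles = (angle(b, c, lengths[1], lengths[2]),   # alpha
              angle(a, c, lengths[0], lengths[2]),   # beta
              angle(a, b, lengths[0], lengths[1]))   # gamma
    volume = abs(dot(a, cross(b, c)))                # scalar triple product
    return lengths, angles, volume


cell = [(4.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, 0.0, 6.0)]
lengths, angles, volume = lattice_parameters(cell)
```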

property numbers

Atomic numbers of all atoms in the cell.

Returns:

List of atomic numbers in the same order as elements.

Return type:

list[int]

property spin_num: int

Number of atoms with non-zero magnetic moment.

Returns:

Count of atoms whose force_mag entry is not [0, 0, 0]. Returns 0 if force_mag is absent.

Return type:

int

property formula

Chemical formula string (plain text).

Returns:

Formula like H2O, Fe3O4 without sub-scripts.

Return type:

str

property html_formula: str

Chemical formula string with HTML sub-scripts.

Returns:

Formula like H<sub>2</sub>O for direct HTML rendering.

Return type:

str
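
As a rough illustration of how formula and html_formula relate, the sketch below builds both strings from a list of element symbols. The element ordering (order of first appearance) is an assumption; the library may order elements differently. plain_formula and html_formula_from_symbols are hypothetical names.

```python
from collections import Counter


def plain_formula(symbols):
    """Plain-text formula, e.g. ['H', 'H', 'O'] -> 'H2O'."""
    counts = Counter(symbols)  # keeps first-appearance order of the elements
    return "".join(el if n == 1 else f"{el}{n}" for el, n in counts.items())


def html_formula_from_symbols(symbols):
    """Same formula with counts wrapped in <sub> tags for HTML rendering."""
    counts = Counter(symbols)
    return "".join(el if n == 1 else f"{el}<sub>{n}</sub>" for el, n in counts.items())


water = ["H", "H", "O"]
magnetite = ["Fe"] * 3 + ["O"] * 4
water_plain = plain_formula(water)
magnetite_plain = plain_formula(magnetite)
water_html = html_formula_from_symbols(water)
```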

property per_atom_energy

Energy per atom.

Returns:

Total energy divided by the number of atoms (same units as energy).

Return type:

float

property energy

Total energy of the structure.

Returns:

Value stored in additional_fields['energy'].

Return type:

float

property has_energy

Check if energy data is available.

Returns:

True if additional_fields contains energy.

Return type:

bool

property forces

Per-atom force array.

Returns:

Forces in eV/Å for each atom.

Return type:

ndarray, shape (N, 3), dtype float64

property has_forces

Check if force data is available.

Returns:

True if atomic_properties contains self.force_label.

Return type:

bool

property bec: ndarray[tuple[Any, ...], dtype[float64]]

Per-atom Born effective charge tensor as (N, 9).

property has_bec: bool

Return True when BEC data is present.

property has_virial

Check if virial or stress data is available.

Returns:

True if additional_fields contains virial or stress.

Return type:

bool

property virial

Virial vector (flattened).

If only stress is present, convert via \(\mathrm{virial} = -\mathrm{stress} \times V\).

Returns:

Flattened virial in eV; ordering: [xx, xy, xz, yx, yy, yz, zx, zy, zz].

Return type:

ndarray, shape (9,), dtype float

Raises:

ValueError – If neither virial nor stress is available.
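
The stress-to-virial fallback can be sketched as follows. The helper name stress_to_virial is hypothetical, and the units (stress in eV/Å³, volume in Å³) are an assumption chosen so that the result comes out in eV as documented.

```python
def stress_to_virial(stress, volume):
    """Convert a flattened 9-component stress into a virial via virial = -stress * V."""
    if len(stress) != 9:
        raise ValueError("expected 9 stress components")
    return [-s * volume for s in stress]


# Ordering follows the documented layout [xx, xy, xz, yx, yy, yz, zx, zy, zz].
stress = [0.01, 0.0, 0.0, 0.0, 0.02, 0.0, 0.0, 0.0, 0.03]  # assumed eV/Å^3
virial = stress_to_virial(stress, volume=100.0)             # eV
```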

property nep_virial

Virial in NEP 6-component order per atom.

Returns:

[xx, yy, zz, xy, yz, zx] components in eV/atom.

Return type:

ndarray, shape (6,), dtype float
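
A minimal sketch of the reordering from the 9-component virial to the 6-component per-atom form is shown below. It simply picks the xx, yy, zz, xy, yz, zx entries and divides by the atom count; whether the library symmetrises off-diagonal pairs first is not stated here, so treat that detail as an assumption. to_nep_virial is a hypothetical name.

```python
def to_nep_virial(virial9, num_atoms):
    """Map [xx, xy, xz, yx, yy, yz, zx, zy, zz] to per-atom [xx, yy, zz, xy, yz, zx]."""
    xx, xy, xz, yx, yy, yz, zx, zy, zz = virial9
    return [v / num_atoms for v in (xx, yy, zz, xy, yz, zx)]


nep6 = to_nep_virial([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], num_atoms=2)
```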

property nep_dipole

Dipole moment per atom in NEP format.

Returns:

Dipole vector in e·Å/atom, parsed from additional_fields['dipole'].

Return type:

ndarray, shape (3,), dtype float64

property nep_polarizability

Polarizability tensor per atom in NEP 6-component order.

Returns:

[xx, yy, zz, xy, yz, zx] components in ų/atom.

Return type:

ndarray, shape (6,), dtype float64

get_chemical_symbols()

Return chemical symbols for all atoms.

Returns:

Same as elements.

Return type:

list[str]

property elements

Chemical symbols of all atoms.

Returns:

Symbol for each atom.

Return type:

ndarray, shape (N,), dtype str

property positions

Cartesian coordinates of all atoms.

Returns:

Positions in Å.

Return type:

ndarray, shape (N, 3), dtype float

property num_atoms

Number of atoms in the structure.

Return type:

int

copy()

Return a deep copy of the structure and all its arrays.

Returns:

Independent duplicate of the current instance.

Return type:

Structure

set_lattice(new_lattice, in_place=False)

Scale positions to a new lattice and update pos.

Parameters:
  • new_lattice (numpy.ndarray) – New lattice matrix in Angstroms with shape (3, 3).

  • in_place (bool, default=False) – If True, modify this object; otherwise return a copy.

Returns:

Updated structure (self if in_place=True).

Return type:

Structure

supercell(scale_factor, order='atom-major', tol=1e-05)
Return type:

Structure

adjust_reasonable(coefficient=0.7)

Check whether the structure is physically reasonable based on covalent radii.

For each pair of nearest-neighbour atoms, the actual bond length is compared with coefficient * (R_cov1 + R_cov2). If any bond is shorter than this threshold, the structure is considered non-physical.

Parameters:

coefficient (float, optional) – Scaling factor for the sum of covalent radii. Default is 0.7.

Returns:

True if the structure passes the check, False if any bond is unphysically short.

Return type:

bool
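
The short-bond criterion can be illustrated with a simplified standalone check. This sketch ignores periodic boundaries and tests every pair rather than only nearest neighbours; the radii table is a small illustrative subset (values in Å), and is_reasonable is a hypothetical function, not the library method.

```python
import math

# Illustrative covalent radii in Å (assumed values for this sketch only).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "O": 0.66}


def is_reasonable(symbols, positions, coefficient=0.7):
    """Return False if any pair is closer than coefficient * (R_cov1 + R_cov2)."""
    n = len(symbols)
    for i in range(n):
        for j in range(i + 1, n):
            threshold = coefficient * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
            if math.dist(positions[i], positions[j]) < threshold:
                return False
    return True


# H2 near its equilibrium bond length passes; a squeezed pair does not.
ok = is_reasonable(["H", "H"], [(0.0, 0.0, 0.0), (0.74, 0.0, 0.0)])
squeezed = is_reasonable(["H", "H"], [(0.0, 0.0, 0.0), (0.30, 0.0, 0.0)])
```

With the default coefficient of 0.7 the H-H threshold is 0.7 × 0.62 Å = 0.434 Å, which is why the 0.30 Å pair is rejected.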

classmethod parse_xyz(lines)

Parse an extended XYZ block into a Structure instance.

Parameters:

lines (list[str] or str) – Raw XYZ content. If a single string is provided it is split on newlines internally.

Returns:

New object with lattice, atomic properties and global metadata extracted from the comment line.

Return type:

Structure

Examples

>>> xyz = '''2
... Lattice="4.0 0.0 0.0 0.0 4.0 0.0 0.0 0.0 4.0" Properties=species:S:1:pos:R:3
... H 0.0 0.0 0.0
... H 1.0 0.0 0.0'''
>>> struct = Structure.parse_xyz(xyz)
>>> struct.num_atoms
2
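
To show what "global metadata extracted from the comment line" involves, here is a minimal sketch that splits an extended XYZ comment line into key/value pairs and decodes the Properties descriptor. It uses only the standard library and handles just the simple key=value and key="quoted value" forms; parse_comment_line and parse_properties are hypothetical helpers, not NepTrainKit functions.

```python
import shlex


def parse_comment_line(comment):
    """Split 'key=value key2="quoted value"' into a dict of raw strings."""
    fields = {}
    for token in shlex.split(comment):  # shlex keeps quoted values together
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


def parse_properties(spec):
    """Decode 'name:type:count:...' triples into (name, type, count) tuples."""
    parts = spec.split(":")
    return [(parts[i], parts[i + 1], int(parts[i + 2]))
            for i in range(0, len(parts), 3)]


comment = 'Lattice="4.0 0.0 0.0 0.0 4.0 0.0 0.0 0.0 4.0" Properties=species:S:1:pos:R:3'
fields = parse_comment_line(comment)
lattice = [float(x) for x in fields["Lattice"].split()]
columns = parse_properties(fields["Properties"])
```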
classmethod read_multiple(filename)

Read a multi-structure XYZ file and return a list of Structure objects.

Parameters:

filename (str or os.PathLike) – Path to a multi-frame .xyz file.

Returns:

List of Structure instances, one per frame.

Return type:

list[Structure]

Examples

>>> from NepTrainKit.core.structure import Structure
>>> structure_list = Structure.read_multiple("train.xyz")
>>> len(structure_list)
42
classmethod read_multiple_fast(filename, max_workers=None, **kwargs)

High-performance multi-frame EXTXYZ reader backed by a C++ parser.

This uses a pybind11 extension (NepTrainKit.core._fastxyz) and Python mmap to index and parse frames in native code, then constructs Structure objects. Falls back to read_multiple on error or if the extension is unavailable.

Parameters:
  • filename (str)

  • max_workers (int | None)

write(file)

Write the structure as an EXTXYZ frame to a file-like object.

Parameters:

file (IO) – Open text stream supporting write().

get_all_distances()

Compute all-pairs distances using periodic minimum image.

Returns:

Matrix of shape (N, N) with distances.

Return type:

numpy.ndarray

get_mini_distance_info()

Minimum interatomic distance for each element pair.

Returns:

Map from element pair (A, B) with A <= B to the minimal distance across the structure.

Return type:

dict[tuple[str, str], float]
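
The shape of the returned mapping can be sketched from a precomputed distance matrix. min_distance_by_pair is a hypothetical helper; it assumes a symmetric (N, N) matrix such as the one described for get_all_distances() and sorts each element pair so that A <= B.

```python
def min_distance_by_pair(symbols, distances):
    """Map each sorted element pair (A, B) to its smallest interatomic distance."""
    result = {}
    n = len(symbols)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only, skip self-distances
            pair = tuple(sorted((symbols[i], symbols[j])))
            d = distances[i][j]
            if pair not in result or d < result[pair]:
                result[pair] = d
    return result


symbols = ["O", "H", "H"]
distances = [[0.00, 0.96, 0.97],
             [0.96, 0.00, 1.52],
             [0.97, 1.52, 0.00]]
info = min_distance_by_pair(symbols, distances)
```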

get_bond_pairs()

Likely bonded pairs using a covalent-radii heuristic.

Returns:

Upper-triangular pairs where distance < 1.15 * (r_i + r_j).

Return type:

list[tuple[int, int]]

get_bad_bond_pairs(coefficient=0.8)

Pairs that violate a short-bond threshold.

Parameters:

coefficient (float, default=0.8) – Threshold relative to the sum of covalent radii.

Returns:

Upper-triangular pairs shorter than the threshold.

Return type:

list[tuple[int, int]]

NepTrainKit.core.structure.calculate_pairwise_distances(lattice_params, atom_coords, fractional=True, block_size=2048)

All-pairs distances under periodic minimum-image convention.

This implementation is robust for triclinic (skewed) cells. It first reduces fractional deltas into [-0.5, 0.5) and then checks the 26 neighboring image shifts to ensure the true shortest image vector is selected under the lattice metric.

Parameters:
  • lattice_params (numpy.ndarray) – Lattice matrix with shape (3, 3). Row-wise lattice vectors [a, b, c].

  • atom_coords (numpy.ndarray) – Coordinates with shape (N, 3).

  • fractional (bool, default=True) – If True, atom_coords are fractional; otherwise Cartesian.

  • block_size (int, default=2048) – Row-block size to balance memory and speed for large N.

Returns:

Distance matrix of shape (N, N).

Return type:

numpy.ndarray
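
The two-step procedure (reduce fractional deltas, then scan the 26 neighbouring images) can be sketched for a single pair of atoms in pure Python. This is an illustration of the technique, not the blocked NumPy implementation; minimum_image_distance is a hypothetical name and the lattice is assumed to hold row-wise vectors.

```python
import itertools
import math


def minimum_image_distance(lattice, frac_i, frac_j):
    """Shortest periodic distance between two fractional positions."""
    # Step 1: reduce each fractional delta into [-0.5, 0.5).
    delta = [fj - fi for fi, fj in zip(frac_i, frac_j)]
    delta = [d - math.floor(d + 0.5) for d in delta]
    best = math.inf
    # Step 2: test the reduced vector and its 26 neighbouring image shifts.
    for shift in itertools.product((-1, 0, 1), repeat=3):
        f = [d + s for d, s in zip(delta, shift)]
        # Row-wise lattice: cartesian = f[0]*a + f[1]*b + f[2]*c.
        cart = [sum(f[k] * lattice[k][axis] for k in range(3)) for axis in range(3)]
        best = min(best, math.sqrt(sum(c * c for c in cart)))
    return best


cubic = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
d_cubic = minimum_image_distance(cubic, (0.05, 0.0, 0.0), (0.95, 0.0, 0.0))

# In a strongly skewed cell the reduced delta alone is not the shortest image.
skewed = [[1.0, 0.0, 0.0], [0.9, 0.5, 0.0], [0.0, 0.0, 10.0]]
d_skewed = minimum_image_distance(skewed, (0.0, 0.0, 0.0), (0.4, 0.4, 0.0))
```

In the skewed example the reduced delta (0.4, 0.4, 0) has a Cartesian length of about 0.79 Å, while the image shifted by (-1, 0, 0) is only about 0.31 Å long, which is why the extra scan matters for triclinic cells.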

NepTrainKit.core.structure.is_organic_cluster(symbols)

Check whether the structure represents an organic molecular cluster.

Parameters:

symbols (list[str])

Return type:

bool

NepTrainKit.core.structure.get_vibration_modes(structure, min_frequency=0.0)

Extract vibrational modes stored in per-atom arrays on an ASE Atoms object.

Parameters:
  • structure (ase.Atoms) – Atomic structure that potentially carries vibrational mode information.

  • min_frequency (float, optional) – Absolute frequency threshold (in the same units as the stored data) used to filter out near-zero translational modes. Set to 0.0 to keep all provided modes. Defaults to 0.0.

Returns:

Pair of (frequencies, modes) where modes has shape (n_modes, n_atoms, 3). Missing frequencies are returned as nan. When no data is attached to the structure the function returns two empty arrays.

Return type:

tuple(ndarray, ndarray)

NepTrainKit.core.structure.get_clusters(structure)

Connected-atom clusters under ASE natural cutoffs.

Parameters:

structure (ase.Atoms) – ASE atoms object used for neighbor analysis.

Returns:

Cluster index lists and a boolean list marking organic clusters.

Return type:

tuple[list[list[int]], list[bool]]

NepTrainKit.core.structure.unwrap_molecule(structure, cluster_indices)

Unwrap atoms in a molecular cluster back into the primary simulation cell.

NepTrainKit.core.structure.process_organic_clusters(structure, new_structure, clusters, is_organic_list)

Recenter and unwrap organic molecular clusters.

Parameters:
  • structure (ase.Atoms) – Original ASE atoms with the reference cell.

  • new_structure (ase.Atoms) – Target ASE atoms whose positions will be updated.

  • clusters (list[list[int]]) – Atom-index clusters from get_clusters().

  • is_organic_list (list[bool]) – Flags indicating organic clusters.

NepTrainKit.core.structure.load_npy_structure(folders, order_file=None, cancel_event=None, base_root=None)

Recursively load DeepMD datasets beneath folders.

Parameters:
  • folders (str | Path)

  • base_root (str | Path | None)

NepTrainKit.core.structure.get_type_map(structures)
Parameters:

structures (list[Structure])

Return type:

list[str]

NepTrainKit.core.structure.save_npy_structure(folder, structures, type_map=None)

Save structures to a DeepMD-style .npy dataset layout.

Parameters:
  • folder (PathLike) – Target root folder. One subfolder per Config_type is created.

  • structures (list[Structure]) – Structures to persist. Per-atom arrays are saved under set.000 and per-frame values under the config folder.

  • type_map (list[str])

class NepTrainKit.core.structure.FastStructure(lattice, atomic_properties, properties, additional_fields)

Bases: Structure

Structure subclass that uses a C++-accelerated parser for EXTXYZ IO.

Parameters:
  • lattice (list[float] | npt.NDArray[np.float64])

  • atomic_properties (dict[str, npt.NDArray[Any]])

  • properties (list[dict[str, str]])

  • additional_fields (dict[str, Any])

classmethod read_multiple(filename, max_workers=None)

Read a multi-structure XYZ file and return a list of Structure objects.

Parameters:
  • filename (str or os.PathLike) – Path to a multi-frame .xyz file.

  • max_workers (int | None)

Returns:

List of Structure instances, one per frame.

Return type:

list[Structure]

Examples

>>> from NepTrainKit.core.structure import Structure
>>> structure_list = Structure.read_multiple("train.xyz")
>>> len(structure_list)
42
classmethod iter_read_multiple(filename, max_workers=None)

Iterate frames in a multi-structure EXTXYZ file.

Parameters:
  • filename (str) – Path to a multi-frame .xyz file.

  • cancel_event (threading.Event or None, optional) – If provided and is_set(), stop early.

  • max_workers (int | None)

Yields:

Structure – Parsed structures one by one.

Examples

>>> from NepTrainKit.core.structure import Structure
>>> for structure in Structure.iter_read_multiple(filename="train.xyz"):
...     print(structure)

Runtime NEP calculator wrapper handling CPU/GPU backends.

class NepTrainKit.core.calculator.NepCalculator(model_file='nep.txt', backend=None, batch_size=None, native_stdio='silent')

Bases: object

Initialise the NEP calculator and load a CPU/GPU backend.

Parameters:
  • model_file (str | Path)

  • backend (NepBackend | None)

  • batch_size (int | None)

  • native_stdio (str | Path | Literal['inherit', 'silent'] | None)

cancel()
Return type:

None

load_nep()
Return type:

None

compose_structures(structures)
Parameters:

structures (Iterable[Structure] | Structure)

Return type:

tuple[list[list[int]], list[list[float]], list[list[float]], list[int]]

calculate(structures, return_charge=False, mean_virial=True)
calculate_dftd3(structures, functional, cutoff, cutoff_cn, mean_virial=True)
Parameters:
  • structures (Iterable[Structure] | Structure)

  • functional (str)

  • cutoff (float)

  • cutoff_cn (float)

  • mean_virial (bool)

calculate_with_dftd3(structures, functional, cutoff, cutoff_cn, mean_virial=True)
Parameters:
  • structures (Iterable[Structure] | Structure)

  • functional (str)

  • cutoff (float)

  • cutoff_cn (float)

  • mean_virial (bool)

get_descriptor(structure)
Parameters:

structure (Structure)

Return type:

ndarray[tuple[Any, ...], dtype[float32]]

get_structures_descriptor(structures, mean_descriptor=True)
Parameters:
  • structures (list[Structure])

  • mean_descriptor (bool)

Return type:

ndarray[tuple[Any, ...], dtype[float32]]

get_structures_polarizability(structures)
Parameters:

structures (list[Structure])

Return type:

ndarray[tuple[Any, ...], dtype[float64]]

get_structures_dipole(structures)
Parameters:

structures (list[Structure])

Return type:

ndarray[tuple[Any, ...], dtype[float64]]

calculate_to_ase(atoms_list, calc_descriptor=False)
Parameters:

atoms_list (Atoms | Iterable[Atoms])

NepTrainKit.core.calculator.Nep3Calculator

alias of NepCalculator

class NepTrainKit.core.calculator.NepAseCalculator(model_file='nep.txt', backend=None, batch_size=None, *args, **kwargs)

Bases: Calculator

Parameters:
  • model_file (str | Path)

  • backend (NepBackend | None)

  • batch_size (int | None)

implemented_properties: list[str] = ['energy', 'energies', 'forces', 'stress', 'descriptor']

Properties calculator can handle (energy, forces, …)

calculate(atoms=None, properties=['energy'], system_changes=['positions', 'numbers', 'cell', 'pbc', 'initial_charges', 'initial_magmoms'])

Do the calculation.

properties: list of str

List of what needs to be calculated. Can be any combination of ‘energy’, ‘forces’, ‘stress’, ‘dipole’, ‘charges’, ‘magmom’ and ‘magmoms’.

system_changes: list of str

List of what has changed since last calculation. Can be any combination of these six: ‘positions’, ‘numbers’, ‘cell’, ‘pbc’, ‘initial_charges’ and ‘initial_magmoms’.

Subclasses need to implement this, but can ignore properties and system_changes if they want. Calculated properties should be inserted into results dictionary like shown in this dummy example:

self.results = {'energy': 0.0,
                'forces': np.zeros((len(atoms), 3)),
                'stress': np.zeros(6),
                'dipole': np.zeros(3),
                'charges': np.zeros(len(atoms)),
                'magmom': 0.0,
                'magmoms': np.zeros(len(atoms))}

The subclass implementation should first call this implementation to set the atoms attribute and create any missing directories.

class NepTrainKit.core.io.ResultData(*args, **kwargs)

Bases: QObject

Manage structures, descriptors, and plots for NEP result files. Subclasses implement _load_dataset() and expose their plot datasets through datasets. The class also centralises selection and synchronisation utilities shared by the GUI.

Parameters:
  • nep_txt_path (Path)

  • data_xyz_path (Path)

  • descriptor_path (Path)

  • calculator_factory (Callable[[str], Any] | None)

  • import_options (dict[str, Any] | None)

STRUCTURE_SYNC_RULES: dict[str, StructureSyncRule] = {}
atoms_num_list: ndarray[tuple[Any, ...], dtype[_ScalarT]]
request_cancel()

Request cooperative cancel during load. Also forward to calculator.

reset_cancel()

Clear the cancellation flag so future operations proceed.

load_structures()

Populate structure from disk or a prefetched cache. The method honours _prefetched_structures first; otherwise it delegates to the importer registry and honours import_options.

set_structures(structures)

Provide pre-parsed structures so load_structures can skip file IO.

Parameters:

structures (list[Structure])

has_completer_cache(search_type=None, max_items=50000)

Return True if a completer cache exists for search_type and max_items.

Parameters:
  • search_type (SearchType | str | None)

  • max_items (int)

Return type:

bool

ensure_completer_cache(search_type=None, max_items=50000)

Build and cache completer mappings for the requested search type.

Parameters:
  • search_type (SearchType | str | None)

  • max_items (int)

Return type:

None

get_completer_cache(search_type=None, max_items=50000)

Return cached completer mapping for search_type; builds it if needed.

Parameters:
  • search_type (SearchType | str | None)

  • max_items (int)

Return type:

dict[str, int]

search_config(config, search_type)

Return structure indices matching the selected search mode.

Parameters:
  • config (str)

  • search_type (SearchType)

Return type:

list[int]

sync_structures(fields=None, structure_indices=None)

Apply registered StructureSyncRule objects to datasets.

Parameters:
  • fields (Iterable[str] or str, optional) – Subset of rule names to apply. None means all registered rules.

  • structure_indices (Sequence[int], optional) – Visible structure indices affected by the update. None uses all active structures.

Return type:

None

write_prediction()

Create a nep.in stub when large datasets require prediction mode. The GUI expects a nep.in file to mark prediction runs for large (>1000) structure collections.

static cache_outputs_enabled()

Return whether loader-generated cache files should be written.

Return type:

bool

load()

Load structures, descriptors, and dataset arrays in sequence. The routine instantiates a calculator (optionally via calculator_factory), parses structures, and then delegates to subclass hooks for descriptors and dataset-specific properties.

property datasets: list[NepPlotData]

Return the plot datasets exposed by the subclass.

property descriptor

Return the descriptor dataset prepared in _load_descriptors().

property num

Return the number of active structures in the dataset.

property structure

Return the StructureData wrapper for the active structures.

property abcs: ndarray[tuple[Any, ...], dtype[float32]]

Return the cached lattice vector lengths (a, b, c) for all structures.

property angles: ndarray[tuple[Any, ...], dtype[float32]]

Return the cached lattice angles (alpha, beta, gamma) for all structures.

get_reference_per_atom_energy_array(use_active=False)

Return reference per-atom energies as a flat float64 array.

Parameters:

use_active (bool)

Return type:

ndarray[tuple[Any, ...], dtype[float64]]

get_predicted_per_atom_energy_array(use_active=False)

Return predicted per-atom energies as a flat float64 array.

Parameters:

use_active (bool)

Return type:

ndarray[tuple[Any, ...], dtype[float64]]

is_select(i)

Return True if the structure index is marked as selected.

Parameters:

i (int)

Return type:

bool

select(indices)

Mark structures denoted by indices as selected.

Parameters:

indices (Sequence[int] | int)

Return type:

None

uncheck(indices)

Remove structures denoted by indices from the selection set.

Parameters:

indices (Sequence[int] | int)

Return type:

None

inverse_select()

Invert the current selection over the active structure set.

Return type:

None

select_structures_by_index(index_expression, use_origin=True)

Resolve an index expression into raw structure indices.

Parameters:
  • index_expression (str)

  • use_origin (bool)

Return type:

list[int]

select_structures_by_range(dataset, x_min, x_max, y_min, y_max, use_and=True)

Return structure indices whose scatter positions fall in the given bounds.

Parameters:
  • dataset (NepPlotData)

  • x_min (float)

  • x_max (float)

  • y_min (float)

  • y_max (float)

  • use_and (bool)

Return type:

list[int]

select_structures_by_lattice_range(a_range, b_range, c_range, alpha_range, beta_range, gamma_range)

Return structure indices whose lattice parameters fall within the given ranges.

Uses a fixed tolerance of 1e-4 to handle floating-point precision loss from float32 storage of lattice vectors, independent of range size.

Parameters:
  • a_range (tuple[float, float])

  • b_range (tuple[float, float])

  • c_range (tuple[float, float])

  • alpha_range (tuple[float, float])

  • beta_range (tuple[float, float])

  • gamma_range (tuple[float, float])

Return type:

list[int]
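
The fixed-tolerance comparison can be sketched as below. How the tolerance is applied (here: both bounds are widened by 1e-4) is an assumption, and select_by_lattice is a hypothetical stand-in that works on plain tuples rather than the cached float32 arrays.

```python
TOL = 1e-4  # fixed tolerance, independent of the width of the range


def in_range(value, bounds, tol=TOL):
    """Inclusive range test with both bounds widened by tol."""
    lo, hi = bounds
    return lo - tol <= value <= hi + tol


def select_by_lattice(abcs, angles, ranges):
    """ranges holds six (min, max) tuples for a, b, c, alpha, beta, gamma."""
    selected = []
    for index, (lengths, angs) in enumerate(zip(abcs, angles)):
        values = (*lengths, *angs)
        if all(in_range(v, r) for v, r in zip(values, ranges)):
            selected.append(index)
    return selected


# 4.00003 mimics a value that drifted slightly when stored as float32.
abcs = [(4.00003, 4.0, 4.0), (5.0, 5.0, 5.0)]
angles = [(90.0, 90.0, 90.0), (90.0, 90.0, 90.0)]
ranges = [(3.9, 4.0)] * 3 + [(90.0, 90.0)] * 3
hits = select_by_lattice(abcs, angles, ranges)
```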

get_selected_structures()

Return the selected structures in the order of their raw index.

Return type:

list[Structure]

export_selected_xyz(save_file_path)

Write the currently selected structures to save_file_path.

Parameters:

save_file_path (str | Path)

Return type:

None

export_selected_npy(save_path)

Export selected structures as a DeepMD-style deepmd/npy dataset.

Parameters:

save_path (str | Path)

Return type:

None

export_active_xyz(save_file_path)

Write active (non-removed) structures to save_file_path.

Parameters:

save_file_path (str | Path)

Return type:

None

export_active_npy(save_path)

Export active (non-removed) structures as a DeepMD-style deepmd/npy dataset.

Parameters:

save_path (str | Path)

Return type:

None

export_removed_xyz(save_file_path)

Write removed structures (if any) to save_file_path.

Parameters:

save_file_path (str | Path)

Return type:

None

export_removed_npy(save_path)

Export removed structures as a DeepMD-style deepmd/npy dataset.

Parameters:

save_path (str | Path)

Return type:

None

export_current_npy(save_path, index)

Export a single structure as DeepMD-style deepmd/npy dataset.

Parameters:
  • save_path (str | Path)

  • index (int)

Return type:

None

export_model_extxyz(save_path)

Export active and removed structures into save_path folder as extxyz.

Parameters:

save_path (str | Path)

Return type:

None

export_model_npy(save_path)

Export active and removed structures into save_path folder as deepmd/npy.

Parameters:

save_path (str | Path)

Return type:

None

export_model_xyz(save_path)

Export active and removed structures into save_path folder.

Parameters:

save_path (str | Path)

Return type:

None

get_atoms(index)

Return the ASE atoms object for the original index.

Parameters:

index (int)

remove(i)

Remove the structure i across all datasets.

Parameters:

i (int)

Return type:

None

property is_revoke: bool

Return True if any structures have been removed.

revoke()

Undo the most recent removal across structures and datasets.

Return type:

None
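
The remove/revoke pair behaves like an undo stack. The sketch below models only that bookkeeping on a list of indices; RemovalHistory is a hypothetical class, and the real ResultData also keeps its datasets in sync, which is not shown.

```python
class RemovalHistory:
    """Track active indices and undo removals batch by batch."""

    def __init__(self, count):
        self.active = list(range(count))
        self._batches = []  # each entry is one removal that can be undone

    def remove(self, indices):
        batch = sorted(set(indices) & set(self.active))
        if batch:
            self.active = [i for i in self.active if i not in batch]
            self._batches.append(batch)

    @property
    def is_revoke(self):
        """True while at least one removal can still be undone."""
        return bool(self._batches)

    def revoke(self):
        """Restore the most recently removed batch."""
        if self._batches:
            self.active = sorted(self.active + self._batches.pop())


history = RemovalHistory(5)
history.remove([1, 3])
history.remove([0])
history.revoke()  # only the last removal ([0]) is restored
```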

delete_selected()

Remove and clear all currently selected structures.

iter_non_physical_structure_indices(radius_coefficient)

Yield progress increments while collecting non-physical structures.

Parameters:

radius_coefficient (float)

consume_non_physical_structure_indices()

Return and clear indices collected by the non-physical scan.

Return type:

list[int]

iter_unbalanced_force_indices(threshold)

Yield progress units while collecting structures with non-zero net force.

Parameters:

threshold (float) – Maximum allowed magnitude of the summed force vector ΣF. Structures whose net force exceeds this value are recorded for later selection.
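
The scan can be pictured as a generator that yields one progress unit per structure while collecting offending indices on the side. This is a sketch under that assumption; iter_unbalanced and its collected argument are hypothetical and operate on plain lists of force vectors.

```python
import math


def iter_unbalanced(forces_per_structure, threshold, collected):
    """Yield 1 per structure; append indices whose |sum of forces| exceeds threshold."""
    for index, forces in enumerate(forces_per_structure):
        net = [sum(f[axis] for f in forces) for axis in range(3)]
        if math.sqrt(sum(c * c for c in net)) > threshold:
            collected.append(index)
        yield 1  # one progress unit, e.g. for a progress dialog


dataset = [
    [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],  # forces cancel, net force is zero
    [(0.5, 0.0, 0.0), (0.0, 0.0, 0.0)],   # net force of 0.5 along x
]
collected = []
progress = sum(iter_unbalanced(dataset, threshold=0.1, collected=collected))
```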

consume_unbalanced_force_indices()

Return and clear indices collected by the net-force scan.

Return type:

list[int]

discover_atomic_numeric_fields(scope=DistributionScope.ACTIVE)

Discover available numeric fields for distribution analysis.

Parameters:

scope (str | DistributionScope, default="active") – Structure subset to inspect. selected inspects only selected active structures.

Return type:

list[FieldSpec]

iter_distribution_analysis(request=None)

Build distribution statistics for selected numeric fields.

Notes

Results are cached by (structure.version, request, force_mode) and stored for retrieval via get_distribution_analysis().

Parameters:

request (DistributionRequest | Mapping[str, Any] | None)

get_distribution_analysis()

Return the last computed distribution-analysis payload.

Return type:

dict[str, Any]

resolve_distribution_bin_indices(analysis_id, metric_key, series_key, bin_index)

Resolve structure indices represented by a histogram bin.

Parameters:
  • analysis_id (int)

  • metric_key (str)

  • series_key (str)

  • bin_index (int)

Return type:

list[int]

iter_dataset_summary(group_by=SearchType.TAG)

Aggregate dataset-wide statistics for use in summary dialogs.

Notes

This generator yields a progress unit after each structure so that callers can drive a progress dialog. Results are cached on the instance and later returned by get_dataset_summary().

Parameters:

group_by (SearchType, default=SearchType.TAG) – Attribute used for grouping the distribution table. TAG uses Structure.tag (Config_type), while FORMULA uses Structure.formula.

get_dataset_summary()

Return the most recently computed dataset summary.

Return type:

dict[str, Any]

sparse_descriptor_selection(n_samples, distance, restrict_to_selection=False)

Return FPS-selected structure indices and whether they should be deselected.

Parameters:
  • n_samples (int)

  • distance (float)

  • restrict_to_selection (bool)

Return type:

tuple[list[int], bool]
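
Farthest point sampling (FPS) with a minimum-distance cutoff can be sketched as follows. The starting point (index 0), the Euclidean metric, and the stopping rule are assumptions made for illustration; farthest_point_sampling is not the library's sampler.

```python
import math


def farthest_point_sampling(points, n_samples, min_distance=0.0):
    """Greedy FPS: repeatedly pick the point farthest from the selected set."""
    if not points or n_samples <= 0:
        return []
    selected = [0]  # assumption: start from the first point
    # nearest[i] is the distance from point i to its closest selected point.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(selected) < min(n_samples, len(points)):
        candidate = max(range(len(points)), key=nearest.__getitem__)
        if nearest[candidate] <= min_distance:
            break  # every remaining point is too close to the selection
        selected.append(candidate)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[candidate]))
    return selected


points = [(0.0, 0.0), (1.0, 0.0), (0.1, 0.0), (5.0, 0.0)]
by_count = farthest_point_sampling(points, n_samples=3)
by_distance = farthest_point_sampling(points, n_samples=4, min_distance=2.0)
```

The first call stops once three points are chosen; the second stops earlier because no remaining point lies more than 2.0 away from the selected set.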

sparse_point_selection(n_samples, distance, descriptor_source='reduced', restrict_to_selection=False, training_path=None, sampling_mode='count', r2_threshold=0.9)

Delegate sparse sampling to the sampler helper.

Parameters:
  • n_samples (int)

  • distance (float)

  • descriptor_source (str)

  • restrict_to_selection (bool)

  • training_path (str | None)

  • sampling_mode (str)

  • r2_threshold (float)

Return type:

tuple[list[int], bool]

export_descriptor_data(path)

Write descriptor values for the current selection to path.

Parameters:

path (str | Path)

Return type:

None

get_editable_structure_tags()

Return the editable tags for currently selected structures.

Return type:

set[str]

update_structure_metadata(remove_tags, new_tag_info, rename_map=None)

Apply metadata removals, additions, and key renames to the selected structures.

Parameters:
  • remove_tags (Iterable[str])

  • new_tag_info (Mapping[str, str])

  • rename_map (Mapping[str, str] | None)

Return type:

None

iter_shift_energy_baseline(group_patterns, alignment_mode, max_generations, population_size, convergence_tol, reference_indices=None, precomputed_baseline=None, baseline_store=None, source_summary=None)

Shift dataset energies and yield progress units for UI hooks.

Parameters:
  • group_patterns (Sequence[str])

  • alignment_mode (str)

  • max_generations (int)

  • population_size (int)

  • convergence_tol (float)

  • reference_indices (Sequence[int] | None)

  • baseline_store (dict | None)

  • source_summary (dict | None)

apply_dft_d3_correction(mode, functional, cutoff, cutoff_cn)

Apply DFT-D3 corrections and synchronise dependent datasets.

Parameters:
  • mode (int)

  • functional (str)

  • cutoff (float)

  • cutoff_cn (float)

Return type:

None

class NepTrainKit.core.io.StructureSyncRule(dataset_attr, target, collector, precondition=<function StructureSyncRule.<lambda>>, dtype=None)

Bases: object

Declarative instruction that synchronises structure attributes into datasets.

Parameters:
  • dataset_attr (str)

  • target (str | slice | Callable[[Any], Any])

  • collector (Callable[[ResultData, Any, ndarray | None], tuple[ndarray, ndarray[tuple[Any, ...], dtype[Any]]]])

  • precondition (Callable[[ResultData], bool])

  • dtype (Any)

dataset_attr: str
target: str | slice | Callable[[Any], Any]
collector: Callable[[ResultData, Any, ndarray | None], tuple[ndarray, ndarray[tuple[Any, ...], dtype[Any]]]]
precondition()
dtype: Any = None
apply(result_data, structure_indices=None)

Execute the rule on result_data if the precondition passes.

Parameters:
  • result_data (ResultData)

  • structure_indices (ndarray | None)

Return type:

None

class NepTrainKit.core.io.NepPlotData(data_list, **kwargs)

Bases: NepData

Two-column plot helper that separates NEP predictions from references.

Parameters:
  • data_list (Sequence[Any] | ndarray[tuple[Any, ...], dtype[Any]])

  • kwargs (Any)

property x: ndarray[tuple[Any, ...], dtype[Any]]

Flattened NEP predictions suitable for scatter plots.

property y: ndarray[tuple[Any, ...], dtype[Any]]

Flattened reference values.

property structure_index: ndarray[tuple[Any, ...], dtype[int32]]

Map each flattened point back to its parent structure index.

class NepTrainKit.core.io.StructureData(data_list, group_list=1, index_list=None, **kwargs)

Bases: NepData

Utility mixin for structure-level queries.

Parameters:
  • data_list (Sequence[Any] | ndarray[tuple[Any, ...], dtype[Any]])

  • group_list (int | Sequence[int])

  • index_list (Sequence[int] | ndarray[tuple[Any, ...], dtype[Any]] | None)

  • kwargs (Any)

has_completer_cache(search_type=None, max_items=50000)

Return True if a completer cache exists for search_type and max_items.

Parameters:
  • search_type (SearchType | str | None)

  • max_items (int)

Return type:

bool

ensure_completer_cache(max_items=50000)

Build and cache completer mappings for tag/formula/elements.

Notes

  • Designed to run in a background thread (e.g. dataset load thread).

  • Results are stored as dict[SearchType, dict[str, int]] and can be fed directly into ConfigTypeSearchLineEdit.setCompleterKeyWord(...).

Parameters:

max_items (int)

Return type:

None

get_completer_cache(search_type=None, max_items=50000)

Return cached completer mapping for search_type; builds it if needed.

Parameters:
  • search_type (SearchType | str | None)

  • max_items (int)

Return type:

dict[str, int]

get_all_config(search_type=None)

Return structure metadata used for filtering.

Parameters:

search_type (SearchType, optional) – Metadata selector. Defaults to SearchType.TAG.

Returns:

Value per active structure matching search_type.

Return type:

list[str]

search_config(config, search_type)

Return structure indices whose metadata match config.

Parameters:
  • config (str) – Regular expression used for matching.

  • search_type (SearchType) – Attribute family to inspect.

Returns:

Structure indices satisfying the pattern; empty on failure.

Return type:

list[int]
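
A regular-expression search with an empty result on failure might look like the sketch below. The function search_by_pattern is hypothetical, and whether the library uses re.search or re.match semantics is an assumption here.

```python
import re

def search_by_pattern(values, pattern):
    """Hypothetical: return indices of values matching a regular expression."""
    try:
        regex = re.compile(pattern)
    except re.error:
        # An invalid pattern yields no matches instead of raising.
        return []
    return [i for i, value in enumerate(values) if regex.search(value)]

tags = ["bulk_300K", "surface_100", "bulk_600K", "dimer"]
hits = search_by_pattern(tags, r"^bulk_\d+K$")
bad = search_by_pattern(tags, r"(unclosed")
```

Returning an empty list for a malformed pattern keeps an interactive search box usable while the user is still typing.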

class NepTrainKit.core.io.NepTrainResultData(*args, **kwargs)

Bases: ResultData

Result loader for NEP training outputs with energy, force, stress, and virial datasets.

The loader normalises NEP predictions into plot-ready datasets and registers synchronisation rules used by the UI.

Examples

>>> from NepTrainKit.core.io import NepTrainResultData
# Load the xyz file
>>> result_dataset = NepTrainResultData.from_path(r"D:/Desktop/dataset3635-addD3/train.xyz")
>>> result_dataset.load()
>>> print(result_dataset)
# Select structures at indices 0 and 10
>>> result_dataset.select([0, 10])
>>> print(result_dataset)
# Delete the selected structures
>>> result_dataset.delete_selected()
>>> print(result_dataset)
# Get the indices of the 10 points with the largest energy error
>>> index = result_dataset.energy.get_max_error_index(10)
# Select the 10 points with the largest energy error and delete them
>>> result_dataset.select(index)
>>> result_dataset.delete_selected()
>>> print(result_dataset)
# Revoke the last deletion
>>> result_dataset.revoke()
# Perform farthest point sampling (normal global sampling)
>>> index, reverse = result_dataset.sparse_descriptor_selection(100, 0.001, False)
# Perform sampling within a region (select the first 300 structures)
>>> index = result_dataset.select_structures_by_index(":300")
>>> result_dataset.select(index)
>>> index, reverse = result_dataset.sparse_descriptor_selection(100, 0.001, True)
# Uncheck or inverse select based on the reverse flag
>>> if reverse:
...     result_dataset.uncheck(index)
... else:
...     result_dataset.select(index)
...     result_dataset.inverse_select()
>>> print(result_dataset)
Parameters:
  • nep_txt_path (Path | str)

  • data_xyz_path (Path | str)

  • energy_out_path (Path | str)

  • force_out_path (Path | str)

  • stress_out_path (Path | str)

  • virial_out_path (Path | str)

  • descriptor_path (Path | str)

  • charge_out_path (Path | str | None)

  • bec_out_path (Path | str | None)

  • charge_model (bool | None)

  • spin_force_out_path (Path | str | None)

STRUCTURE_SYNC_RULES = {
    'energy': StructureSyncRule(dataset_attr='energy', target='x_cols', collector=<function collect_energy_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
    'force': StructureSyncRule(dataset_attr='force', target='x_cols', collector=<function collect_force_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
    'stress': StructureSyncRule(dataset_attr='stress', target='x_cols', collector=<function collect_stress_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
    'virial': StructureSyncRule(dataset_attr='virial', target='x_cols', collector=<function collect_virial_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
}
property datasets

Return datasets exposed to the UI in display order.

property energy

Return the per-structure energy dataset.

property force

Return the force dataset respecting per-atom settings.

property stress

Return the stress dataset derived from predicted virials.
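
As a rough illustration of deriving stress from virials, the sketch below assumes the virial is given per atom in eV and the cell volume comes from the lattice determinant in cubic angstroms. The helper virial_to_stress is hypothetical, and the sign convention and exact scaling used by NepTrainKit may differ.

```python
import numpy as np

EV_PER_A3_TO_GPA = 160.21766208  # 1 eV/Angstrom^3 expressed in GPa

def virial_to_stress(virial_per_atom, n_atoms, lattice):
    """Hypothetical: convert a per-atom virial (eV) to stress (GPa)."""
    lattice = np.asarray(lattice, dtype=float).reshape(3, 3)
    volume = abs(np.linalg.det(lattice))          # cell volume in Angstrom^3
    virial = np.asarray(virial_per_atom, dtype=float)
    # Total virial = per-atom virial * atom count; divide by volume, then convert units.
    return virial * n_atoms / volume * EV_PER_A3_TO_GPA

# Eight atoms in a 5 Angstrom cubic cell, six independent virial components.
stress = virial_to_stress([0.01] * 6, n_atoms=8, lattice=np.eye(3) * 5.0)
```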

property virial

Return the per-structure virial dataset.

property bec

Return the per-atom Born effective charge dataset when available.

property spin_force

Return the magnetic force dataset when available.

classmethod from_path(path, model_type=0, *, structures=None, nep_txt_path=None)

Create an instance from a NEP result directory.

Parameters:
  • path (PathLike) – Directory containing NEP outputs and descriptors.

  • model_type (int, optional) – NEP model type hint used to select descriptor fallbacks.

  • structures (list[Structure], optional) – Pre-loaded structures to attach instead of reading from disk.

  • nep_txt_path (Path | str | None)

Returns:

Configured loader bound to the resolved directory.

Return type:

NepTrainResultData

class NepTrainKit.core.io.NepPolarizabilityResultData(*args, **kwargs)

Bases: ResultData

Result loader for NEP polarizability evaluations.

Parameters:
  • nep_txt_path (Path | str)

  • data_xyz_path (Path | str)

  • polarizability_out_path (Path | str)

  • descriptor_path (Path | str)

property datasets

Return the polarizability datasets in display order.

property polarizability_diagonal

Return the diagonal polarizability dataset.

property polarizability_no_diagonal

Return the off-diagonal polarizability dataset.

property descriptor

Return the descriptor dataset associated with the polarizability run.

classmethod from_path(path, *, structures=None)

Create a polarizability loader from a NEP dataset directory.

Parameters:
  • path (PathLike) – Directory containing NEP outputs.

  • structures (list[Structure], optional) – Pre-loaded structures to attach instead of reading from disk.

Returns:

Configured loader bound to the resolved directory.

Return type:

NepPolarizabilityResultData

class NepTrainKit.core.io.NepDipoleResultData(*args, **kwargs)

Bases: ResultData

Result loader for NEP dipole predictions.

Parameters:
  • nep_txt_path (Path | str)

  • data_xyz_path (Path | str)

  • dipole_out_path (Path | str)

  • descriptor_path (Path | str)

property datasets

Return the dipole datasets in display order.

property dipole

Return the dipole dataset.

property descriptor

Return the descriptor dataset associated with the dipole run.

classmethod from_path(path, *, structures=None)

Create a dipole loader from a NEP dataset directory.

Parameters:
  • path (PathLike) – Directory containing NEP outputs.

  • structures (list[Structure], optional) – Pre-loaded structures to attach instead of reading from disk.

Returns:

Configured loader bound to the resolved directory.

Return type:

NepDipoleResultData

class NepTrainKit.core.io.TaceResultData(*args, **kwargs)

Bases: ResultData

Result loader for TACE EXTXYZ prediction files.

Parameters:
  • nep_txt_path (Path | str)

  • data_xyz_path (Path | str)

  • descriptor_path (Path | str)

  • import_options (dict[str, Any] | None)

property datasets: list[NepPlotData]

Return the plot datasets exposed by the subclass.

property energy: NepPlotData

property force: NepPlotData

property virial: NepPlotData

property mforce: NepPlotData

classmethod from_path(path, *, structures=None, nep_txt_path=None, import_options=None)

Parameters:
  • path (str | Path)

  • structures (list[Structure] | None)

  • nep_txt_path (Path | str | None)

  • import_options (dict[str, Any] | None)

Return type:

TaceResultData

class NepTrainKit.core.io.DeepmdResultData(*args, **kwargs)

Bases: ResultData

Result loader that adapts DeepMD outputs to the ResultData interface.

The loader reads DeepMD numpy exports, normalises them, and exposes consistent NEP plot datasets for downstream visualisation.

Parameters:
  • nep_txt_path (Path | str)

  • data_xyz_path (Path | str)

  • energy_out_path (Path | str)

  • force_out_path (Path | str)

  • virial_out_path (Path | str)

  • descriptor_path (Path | str)

  • spin_out_path (Path | str | None)

STRUCTURE_SYNC_RULES = {
    'energy': StructureSyncRule(dataset_attr='energy', target='x_cols', collector=<function collect_energy_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
    'force': StructureSyncRule(dataset_attr='force', target='x_cols', collector=<function collect_force_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
    'stress': StructureSyncRule(dataset_attr='stress', target='x_cols', collector=<function collect_stress_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
    'virial': StructureSyncRule(dataset_attr='virial', target='x_cols', collector=<function collect_virial_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None),
}
classmethod from_path(path, *, structures=None, nep_txt_path=None, model_type=None)

Create an instance from a DeepMD dataset directory.

Parameters:
  • path (Path or str) – Directory that contains DeepMD set.* data and outputs.

  • structures (list[Structure], optional) – Pre-loaded structures to attach instead of reading from disk.

  • nep_txt_path (Path or str, optional) – Override the NEP model text file used for (re)calculation.

  • model_type (int, optional) – Ignored. Accepted for API compatibility with NEP loaders.

Returns:

Configured loader pointing at the resolved dataset.

Return type:

DeepmdResultData

load_structures()

Load structures from DeepMD numpy sets into the local dataset.

Notes

  • Recognises DeepMD set.* partitions and aggregates per-set arrays.

  • Respects the optional cancel_event for graceful cancellation.

Examples

>>> # Constructed via DeepmdResultData.from_path(...)
property datasets

Return the datasets exposed to the UI in canonical order.

property energy

Return the per-atom energy dataset.

property force

Return the per-atom force dataset.

property spin

Return the per-atom spin dataset.

property virial

Return the per-atom virial dataset.

export_model_xyz(save_path)

Export current and removed structures into dedicated directories.

Parameters:

save_path (Path or str) – Destination directory that will receive export_good_model and export_remove_model folders.

Return type:

None

NepTrainKit.core.io.is_deepmd_path(folder)

Return True when folder looks like a DeepMD dataset directory.

Parameters:

folder (str | Path)

Return type:

bool
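
One plausible heuristic for such a check is sketched below. It relies on the usual DeePMD-kit layout (a type.raw file next to set.* subdirectories); the function looks_like_deepmd is hypothetical, and the actual test performed by is_deepmd_path may be stricter or looser.

```python
from pathlib import Path
import tempfile

def looks_like_deepmd(folder):
    """Hypothetical heuristic: type.raw plus at least one set.* directory."""
    folder = Path(folder)
    if not folder.is_dir():
        return False
    has_type = (folder / "type.raw").is_file()
    has_set = any(p.is_dir() for p in folder.glob("set.*"))
    return has_type and has_set

# Demonstrate on a throwaway directory so nothing on disk is assumed.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty_result = looks_like_deepmd(root)      # nothing inside yet
    (root / "type.raw").write_text("0\n0\n")
    (root / "set.000").mkdir()
    filled_result = looks_like_deepmd(root)     # minimal DeePMD-like layout
```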

NepTrainKit.core.io.load_result_data(path)

Load result data for path via the first matching loader.

Parameters:

path (PathLike) – File or directory to be loaded.

Returns:

The loaded dataset if any loader recognises path, else None.

Return type:

ResultData or None

Examples

>>> from NepTrainKit.core.io import load_result_data
>>> dataset = load_result_data("./train.xyz")
>>> if dataset is not None:
...     dataset.load()
...     print(dataset)
NepTrainKit.core.io.register_result_loader(loader)

Register a loader so that it participates in result discovery.

Parameters:

loader (ResultLoader) – An instance of a concrete ResultLoader subclass.

Returns:

The same loader instance, allowing use as a decorator.

Return type:

ResultLoader

Examples

>>> from NepTrainKit.core.io import register_result_loader
>>> from NepTrainKit.core.io.registry import DeepmdFolderLoader, NepModelTypeLoader
>>> register_result_loader(DeepmdFolderLoader())
>>> register_result_loader(
...     NepModelTypeLoader("nep_train", {0, 3},
...                        'NepTrainKit.core.io:NepTrainResultData')
... )
>>> register_result_loader(
...     NepModelTypeLoader("nep_dipole", {1},
...                        'NepTrainKit.core.io:NepDipoleResultData')
... )
>>> register_result_loader(
...     NepModelTypeLoader("nep_polar", {2},
...                        'NepTrainKit.core.io:NepPolarizabilityResultData')
... )
>>> # OtherLoader is a placeholder for any user-defined ResultLoader subclass
>>> register_result_loader(OtherLoader())
NepTrainKit.core.io.matches_result_loader(path)

Return True if any registered loader recognises path.

Parameters:

path (str or os.PathLike) – File or directory to be examined.

Returns:

True if path is recognised by at least one loader.

Return type:

bool

Examples

>>> from NepTrainKit.core.io import matches_result_loader
>>> matches_result_loader("./train.xyz")
True
NepTrainKit.core.io.farthest_point_sampling(points, n_samples, min_dist=0.1, selected_data=None)

Greedy FPS with optional warm-start and minimum-distance constraint.

Parameters:
  • points (numpy.ndarray) – Input point set of shape (N, D).

  • n_samples (int) – Maximum number of samples to select.

  • min_dist (float, default=0.1) – Minimum allowed distance to any already selected point.

  • selected_data (numpy.ndarray or None, optional) – Warm-start set with shape (M, D). If provided, selection respects the minimum distance from this set.

Returns:

Indices of selected points.

Return type:

list[int]

Examples

>>> import numpy as np
>>> from NepTrainKit.core.io import farthest_point_sampling
>>> P = np.random.rand(100, 3).astype(np.float32)
>>> idx = farthest_point_sampling(P, 5, min_dist=0.0)
>>> len(idx) <= 5
True
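
For readers unfamiliar with the technique, a minimal NumPy sketch of greedy farthest point sampling with a warm-start set and a minimum-distance cutoff is given below. It is an independent illustration, not the library code: the name fps_sketch, the use of Euclidean distance, and the choice of index 0 as the first sample when no warm-start set is given are all assumptions.

```python
import numpy as np

def fps_sketch(points, n_samples, min_dist=0.1, selected_data=None):
    """Hypothetical greedy FPS: repeatedly pick the point farthest from the chosen set."""
    points = np.asarray(points, dtype=float)
    if len(points) == 0 or n_samples <= 0:
        return []
    picked = []
    if selected_data is not None and len(selected_data) > 0:
        seed = np.asarray(selected_data, dtype=float)
        # Distance from every point to its nearest warm-start point.
        diff = points[:, None, :] - seed[None, :, :]
        nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    else:
        # No warm start: assume the first point seeds the selection.
        picked.append(0)
        nearest = np.linalg.norm(points - points[0], axis=1)
        nearest[0] = -1.0  # mark as taken so it is never chosen again
    while len(picked) < n_samples:
        candidate = int(np.argmax(nearest))
        # Stop once the farthest remaining point is closer than min_dist
        # (this also triggers when every point has been taken).
        if nearest[candidate] < min_dist:
            break
        picked.append(candidate)
        dist = np.linalg.norm(points - points[candidate], axis=1)
        nearest = np.minimum(nearest, dist)
        nearest[picked] = -1.0
    return picked

line = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
spread = fps_sketch(line, 3, min_dist=0.5)
cutoff = fps_sketch(line, 3, min_dist=3.0)
warm = fps_sketch(line, 2, min_dist=0.5, selected_data=np.array([[10.0, 0.0]]))
```

The min_dist cutoff is why the function may return fewer than n_samples indices, which matches the `len(idx) <= 5` check in the example above.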