API Reference
The following pages are generated automatically by automodule; the source docstrings are kept in their original English.
High-level data structures and helpers for NEP datasets.
- class NepTrainKit.core.structure.Structure(lattice, atomic_properties, properties, additional_fields)
Bases: object
Container for EXTXYZ frames (lattice, positions, species, and fields).
Notes
Coordinates are stored in Cartesian Angstroms under pos. Frame-level attributes like energy, pbc, and virial live in additional_fields.
Examples
>>> from NepTrainKit.core.structure import Structure
>>> # read structures from a file by iterating over it
>>> for structure in Structure.iter_read_multiple(filename="train.xyz"):
...     print(structure)
>>> # read all structures from a file
>>> structure_list = Structure.read_multiple(filename="train.xyz")
- Parameters:
lattice (list[float] | npt.NDArray[np.float64])
atomic_properties (dict[str, npt.NDArray[Any]])
properties (list[dict[str, str]])
additional_fields (dict[str, Any])
- property tag: str
Alias for the Config_type additional field.
- get_prop_key(additional_fields=True, atomic_properties=True)
List all property keys available on the structure.
- Parameters:
additional_fields (bool, default=True) – Include keys from additional_fields.
atomic_properties (bool, default=True) – Include keys from atomic_properties.
- Returns:
Combined keys such as [“pos”, “energy”, …].
- Return type:
list of str
- remove_atomic_properties(key)
Remove an atomic array property.
- Parameters:
key (str) – Name of the property to delete.
- classmethod read_xyz(filename)
Read a single EXTXYZ structure from a file path.
- Parameters:
filename (str) – Path to an .xyz file containing a single frame.
- Returns:
Parsed structure instance.
- Return type:
Structure
- static iter_read_multiple(filename, cancel_event=None)
Iterate frames in a multi-structure EXTXYZ file.
- Parameters:
filename (str) – Path to a multi-frame .xyz file.
cancel_event (threading.Event or None, optional) – If provided and is_set(), stop early.
- Yields:
Structure – Parsed structures one by one.
Examples
>>> from NepTrainKit.core.structure import Structure
>>> for structure in Structure.iter_read_multiple(filename="train.xyz"):
...     print(structure)
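The cooperative early stop described for cancel_event can be sketched without the library. This is a minimal illustration, not the library's parser: iter_frames is a hypothetical helper that reads plain XYZ frames (atom count, comment line, atom lines) from a text stream and checks a threading.Event before each frame.

```python
import io
import threading


def iter_frames(stream, cancel_event=None):
    """Yield one list of lines per XYZ frame, stopping early if cancelled."""
    while True:
        # Check the flag before starting a new frame (cooperative cancel).
        if cancel_event is not None and cancel_event.is_set():
            return
        header = stream.readline()
        if not header.strip():
            return  # end of file or trailing blank line
        n_atoms = int(header)
        comment = stream.readline().rstrip("\n")
        atoms = [stream.readline().rstrip("\n") for _ in range(n_atoms)]
        yield [header.strip(), comment] + atoms


text = "1\nframe0\nH 0 0 0\n1\nframe1\nH 1 0 0\n1\nframe2\nH 2 0 0\n"
stop = threading.Event()
seen = []
for frame in iter_frames(io.StringIO(text), cancel_event=stop):
    seen.append(frame[1])
    if len(seen) == 2:
        stop.set()  # request cancellation; the third frame is never parsed
```

Because the flag is only checked between frames, a frame that is already being read is always delivered complete.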
- property cell
Simulation cell lattice vectors.
- Returns:
Row-wise lattice vectors [a, b, c].
- Return type:
ndarray, shape (3, 3)
- property volume
Cell volume.
- Returns:
Volume of the simulation cell.
- Return type:
float
- property abc
Lattice vector lengths (a, b, c).
- Returns:
Lengths of the three lattice vectors in Å.
- Return type:
ndarray, shape (3,), dtype float
- property angles
Lattice angles (alpha, beta, gamma) in degrees.
- Returns:
Angles α, β, γ in degrees.
- Return type:
ndarray, shape (3,), dtype float
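The volume, abc, and angles properties all follow from the row-wise lattice vectors. A minimal sketch, assuming the usual crystallographic convention (α between b and c, β between a and c, γ between a and b); lattice_summary is a hypothetical helper and not part of NepTrainKit.

```python
import math


def lattice_summary(cell):
    """Return (lengths, angles_in_degrees, volume) for a row-wise 3x3 cell."""
    a, b, c = cell

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    def cross(u, v):
        return (
            u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0],
        )

    def angle(u, v):
        cos_t = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
        # Clamp to guard against rounding slightly outside [-1, 1].
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

    lengths = tuple(math.sqrt(dot(v, v)) for v in (a, b, c))
    angles = (angle(b, c), angle(a, c), angle(a, b))
    volume = abs(dot(a, cross(b, c)))  # scalar triple product
    return lengths, angles, volume


lengths, angles, volume = lattice_summary(
    [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
)
```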
- property numbers
Atomic numbers of all atoms in the cell.
- Returns:
List of atomic numbers in the same order as elements.
- Return type:
list[int]
- property spin_num: int
Number of atoms with non-zero magnetic moment.
- Returns:
Count of atoms whose force_mag entry is not [0, 0, 0]. Returns 0 if force_mag is absent.
- Return type:
int
- property formula
Chemical formula string (plain text).
- Returns:
Formula like H2O, Fe3O4 without subscripts.
- Return type:
str
- property html_formula: str
Chemical formula string with HTML sub-scripts.
- Returns:
Formula like H<sub>2</sub>O for direct HTML rendering.
- Return type:
str
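The two formula properties differ only in how the counts are rendered. A minimal sketch under one assumption: elements are listed in order of first appearance (the ordering the library actually uses is not stated above). plain_formula and html_formula are hypothetical helpers.

```python
from collections import Counter


def plain_formula(symbols):
    """Build a plain-text formula such as 'H2O' from a list of symbols."""
    counts = Counter(symbols)  # keeps first-seen order on Python 3.7+
    return "".join(
        sym + (str(n) if n > 1 else "") for sym, n in counts.items()
    )


def html_formula(symbols):
    """Same formula, with counts wrapped in HTML <sub> tags."""
    counts = Counter(symbols)
    return "".join(
        sym + (f"<sub>{n}</sub>" if n > 1 else "") for sym, n in counts.items()
    )
```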
- property per_atom_energy
Energy per atom.
- Returns:
Total energy divided by the number of atoms (same units as energy).
- Return type:
float
- property energy
Total energy of the structure.
- Returns:
Value stored in additional_fields['energy'].
- Return type:
float
- property has_energy
Check if energy data is available.
- Returns:
True if additional_fields contains energy.
- Return type:
bool
- property forces
Per-atom force array.
- Returns:
Forces in eV/Å for each atom.
- Return type:
ndarray, shape (N, 3), dtype float64
- property has_forces
Check if force data is available.
- Returns:
True if atomic_properties contains self.force_label.
- Return type:
bool
- property bec: ndarray[tuple[Any, ...], dtype[float64]]
Per-atom Born effective charge tensor as (N, 9).
- property has_bec: bool
Return True when BEC data is present.
- property has_virial
Check if virial or stress data is available.
- Returns:
True if additional_fields contains virial or stress.
- Return type:
bool
- property virial
Virial vector (flattened).
If only stress is present, convert via \(\mathrm{virial} = -\mathrm{stress} \times V\).
- Returns:
Flattened virial in eV; ordering: [xx, xy, xz, yx, yy, yz, zx, zy, zz].
- Return type:
ndarray, shape (9,), dtype float
- Raises:
ValueError – If neither virial nor stress is available.
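The stress fallback can be written out directly from the formula virial = -stress × V. A minimal sketch, assuming the stress is a 3×3 tensor in eV/Å³ and the volume is in Å³ so that the product is in eV; virial_from_stress is a hypothetical helper, not the library's code.

```python
def virial_from_stress(stress, volume):
    """Flatten -stress * V row by row: [xx, xy, xz, yx, yy, yz, zx, zy, zz]."""
    return [-component * volume for row in stress for component in row]


virial = virial_from_stress(
    [[0.01, 0.0, 0.0], [0.0, 0.02, 0.0], [0.0, 0.0, 0.03]],
    volume=100.0,
)
```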
- property nep_virial
Virial in NEP 6-component order per atom.
- Returns:
[xx, yy, zz, xy, yz, zx] components in eV/atom.
- Return type:
ndarray, shape (6,), dtype float
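Going from the flattened 9-component virial to the NEP 6-component order is a reindexing plus a division by the atom count. A minimal sketch, assuming the off-diagonal entries xy, yz, and zx are taken as stored (no symmetrisation); to_nep_virial is a hypothetical helper.

```python
def to_nep_virial(virial9, num_atoms):
    """Reorder [xx, xy, xz, yx, yy, yz, zx, zy, zz] to [xx, yy, zz, xy, yz, zx] per atom."""
    xx, xy, xz, yx, yy, yz, zx, zy, zz = virial9
    return [value / num_atoms for value in (xx, yy, zz, xy, yz, zx)]
```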
- property nep_dipole
Dipole moment per atom in NEP format.
- Returns:
Dipole vector in e·Å/atom, parsed from additional_fields['dipole'].
- Return type:
ndarray, shape (3,), dtype float64
- property nep_polarizability
Polarizability tensor per atom in NEP 6-component order.
- Returns:
[xx, yy, zz, xy, yz, zx] components in ų/atom.
- Return type:
ndarray, shape (6,), dtype float64
- get_chemical_symbols()
Return chemical symbols for all atoms.
- Returns:
Same as elements.
- Return type:
list[str]
- property elements
Chemical symbols of all atoms.
- Returns:
Symbol for each atom.
- Return type:
ndarray, shape (N,), dtype str
- property positions
Cartesian coordinates of all atoms.
- Returns:
Positions in Å.
- Return type:
ndarray, shape (N, 3), dtype float
- property num_atoms
Number of atoms in the structure.
- Return type:
int
- copy()
Return a deep copy of the structure and all its arrays.
- Returns:
Independent duplicate of the current instance.
- Return type:
Structure
- set_lattice(new_lattice, in_place=False)
Scale positions to a new lattice and update pos.
- Parameters:
new_lattice (numpy.ndarray) – New lattice matrix in Angstroms with shape (3, 3).
in_place (bool, default=False) – If True, modify this object; otherwise return a copy.
- Returns:
Updated structure (self if in_place=True).
- Return type:
Structure
- adjust_reasonable(coefficient=0.7)
Check whether the structure is physically reasonable based on covalent radii.
For each pair of nearest-neighbour atoms, the actual bond length is compared with coefficient * (R_cov1 + R_cov2). If any bond is shorter than this threshold, the structure is considered non-physical.
- Parameters:
coefficient (float, optional) – Scaling factor for the sum of covalent radii. Default is 0.7.
- Returns:
True if the structure passes the check, False if any bond is unphysically short.
- Return type:
bool
- classmethod parse_xyz(lines)
Parse an extended XYZ block into a Structure instance.
- Parameters:
lines (list[str] or str) – Raw XYZ content. If a single string is provided it is split on newlines internally.
- Returns:
New object with lattice, atomic properties and global metadata extracted from the comment line.
- Return type:
Structure
Examples
>>> xyz = '''2
... Lattice="4.0 0.0 0.0 0.0 4.0 0.0 0.0 0.0 4.0" Properties=species:S:1:pos:R:3
... H 0.0 0.0 0.0
... H 1.0 0.0 0.0'''
>>> struct = Structure.parse_xyz(xyz)
>>> struct.num_atoms
2
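The Properties entry on the comment line packs name, type, and column count into colon-separated triples. A minimal sketch of how such a descriptor can be split; parse_properties and the dictionary keys name, type, and count are assumptions for illustration and may not match the keys the library stores in properties.

```python
def parse_properties(descriptor):
    """Split 'species:S:1:pos:R:3' into one dict per per-atom column group."""
    fields = descriptor.split(":")
    if len(fields) % 3 != 0:
        raise ValueError(f"malformed Properties descriptor: {descriptor!r}")
    return [
        {"name": name, "type": kind, "count": count}
        for name, kind, count in zip(fields[0::3], fields[1::3], fields[2::3])
    ]


props = parse_properties("species:S:1:pos:R:3")
```

In the extended XYZ convention, S marks a string column and R a real-valued column, so the example describes one species column followed by three position columns.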
- classmethod read_multiple(filename)
Read a multi-structure XYZ file and return a list of Structure objects.
- Parameters:
filename (str or os.PathLike) – Path to a multi-frame .xyz file.
- Returns:
List of Structure instances, one per frame.
- Return type:
list[Structure]
Examples
>>> from NepTrainKit.core.structure import Structure
>>> structure_list = Structure.read_multiple("train.xyz")
>>> len(structure_list)
42
- classmethod read_multiple_fast(filename, max_workers=None, **kwargs)
High-performance multi-frame EXTXYZ reader backed by a C++ parser.
This uses a pybind11 extension (NepTrainKit.core._fastxyz) and Python mmap to index and parse frames in native code, then constructs Structure objects. Falls back to read_multiple on error or if the extension is unavailable.
- Parameters:
filename (str)
max_workers (int | None)
- write(file)
Write the structure as an EXTXYZ frame to a file-like object.
- Parameters:
file (IO) – Open text stream supporting write().
- get_all_distances()
Compute all-pairs distances using periodic minimum image.
- Returns:
Matrix of shape (N, N) with distances.
- Return type:
numpy.ndarray
- get_mini_distance_info()
Minimum interatomic distance for each element pair.
- Returns:
Map from element pair (A, B) with A <= B to the minimal distance across the structure.
- Return type:
dict[tuple[str, str], float]
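The element-pair map can be illustrated with a small stand-alone function. This sketch uses plain Cartesian distances without periodic images, which is a simplification; min_distance_by_pair is a hypothetical helper.

```python
import math
from itertools import combinations


def min_distance_by_pair(symbols, positions):
    """Map each sorted element pair (A, B) to its shortest interatomic distance."""
    result = {}
    for i, j in combinations(range(len(symbols)), 2):
        key = tuple(sorted((symbols[i], symbols[j])))  # enforces A <= B
        distance = math.dist(positions[i], positions[j])
        if distance < result.get(key, math.inf):
            result[key] = distance
    return result


water = min_distance_by_pair(
    ["O", "H", "H"],
    [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
)
```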
- get_bond_pairs()
Likely bonded pairs using a covalent-radii heuristic.
- Returns:
Upper-triangular pairs where distance < 1.15 * (r_i + r_j).
- Return type:
list[tuple[int, int]]
- get_bad_bond_pairs(coefficient=0.8)
Pairs that violate a short-bond threshold.
- Parameters:
coefficient (float, default=0.8) – Threshold relative to the sum of covalent radii.
- Returns:
Upper-triangular pairs shorter than the threshold.
- Return type:
list[tuple[int, int]]
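Both get_bond_pairs and get_bad_bond_pairs compare pair distances against a scaled sum of covalent radii, differing only in the factor. A minimal sketch, assuming non-periodic distances and a tiny illustrative radii table (approximate values in Å, not the table the library uses); pairs_below is a hypothetical helper.

```python
import math
from itertools import combinations

# Approximate covalent radii in Å, for illustration only.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "O": 0.66}


def pairs_below(symbols, positions, factor):
    """Return upper-triangular pairs (i, j) with distance < factor * (r_i + r_j)."""
    pairs = []
    for i, j in combinations(range(len(symbols)), 2):
        limit = factor * (COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]])
        if math.dist(positions[i], positions[j]) < limit:
            pairs.append((i, j))
    return pairs


symbols = ["O", "H", "H"]
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
bonded = pairs_below(symbols, positions, factor=1.15)  # likely bonds
too_short = pairs_below(symbols, positions, factor=0.8)  # short-bond violations
```

For this water-like geometry both O-H contacts count as bonds, and none of them falls under the 0.8 threshold.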
- NepTrainKit.core.structure.calculate_pairwise_distances(lattice_params, atom_coords, fractional=True, block_size=2048)
All-pairs distances under periodic minimum-image convention.
This implementation is robust for triclinic (skewed) cells. It first reduces fractional deltas into [-0.5, 0.5) and then checks the 26 neighboring image shifts to ensure the true shortest image vector is selected under the lattice metric.
- Parameters:
lattice_params (numpy.ndarray) – Lattice matrix with shape (3, 3). Row-wise lattice vectors [a, b, c].
atom_coords (numpy.ndarray) – Coordinates with shape (N, 3).
fractional (bool, default=True) – If True, atom_coords are fractional; otherwise Cartesian.
block_size (int, default=2048) – Row-block size to balance memory and speed for large N.
- Returns:
Distance matrix of shape (N, N).
- Return type:
numpy.ndarray
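The two-step procedure described above (reduce the fractional delta, then test neighbouring image shifts) can be shown for a single pair of atoms. This is a pure-Python sketch of the idea rather than the blocked NumPy implementation; min_image_distance is a hypothetical helper.

```python
import itertools
import math


def min_image_distance(lattice, frac_i, frac_j):
    """Shortest periodic distance between two fractional positions."""
    # Step 1: reduce each fractional delta into [-0.5, 0.5).
    delta = [(fj - fi + 0.5) % 1.0 - 0.5 for fi, fj in zip(frac_i, frac_j)]
    best = math.inf
    # Step 2: try the zero shift plus the 26 neighbouring image shifts.
    for shift in itertools.product((-1, 0, 1), repeat=3):
        frac = [d + s for d, s in zip(delta, shift)]
        # Cartesian vector = frac @ lattice (rows are the lattice vectors).
        cart = [sum(frac[k] * lattice[k][axis] for k in range(3)) for axis in range(3)]
        best = min(best, math.sqrt(sum(c * c for c in cart)))
    return best


skewed = [(4.0, 0.0, 0.0), (3.6, 1.0, 0.0), (0.0, 0.0, 10.0)]
d_skewed = min_image_distance(skewed, (0.0, 0.0, 0.0), (0.45, 0.45, 0.0))
```

In the skewed cell the reduced delta (0.45, 0.45, 0) maps to a vector more than 3 Å long, while the image shifted by one b vector is much shorter, which is why step 2 is needed for triclinic cells.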
- NepTrainKit.core.structure.is_organic_cluster(symbols)
Check whether the structure represents an organic molecular cluster.
- Parameters:
symbols (list[str])
- Return type:
bool
- NepTrainKit.core.structure.get_vibration_modes(structure, min_frequency=0.0)
Extract vibrational modes stored in per-atom arrays on an ASE Atoms object.
- Parameters:
structure (ase.Atoms) – Atomic structure that potentially carries vibrational mode information.
min_frequency (float, optional) – Absolute frequency threshold (in the same units as the stored data) used to filter out near-zero translational modes. Set to 0.0 to keep all provided modes. Defaults to 0.0.
- Returns:
Pair of (frequencies, modes) where modes has shape (n_modes, n_atoms, 3). Missing frequencies are returned as nan. When no data is attached to the structure, the function returns two empty arrays.
- Return type:
tuple(ndarray, ndarray)
- NepTrainKit.core.structure.get_clusters(structure)
Connected-atom clusters under ASE natural cutoffs.
- Parameters:
structure (ase.Atoms) – ASE atoms object used for neighbor analysis.
- Returns:
Cluster index lists and a boolean list marking organic clusters.
- Return type:
tuple[list[list[int]], list[bool]]
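Grouping atoms into connected clusters is a connected-components problem once the neighbour pairs are known. A minimal union-find sketch that takes the bonded pairs as given (the ASE natural-cutoff neighbour search is not reproduced here); connected_clusters is a hypothetical helper.

```python
def connected_clusters(n_atoms, bonds):
    """Group atom indices into clusters linked by the given (i, j) bonds."""
    parent = list(range(n_atoms))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in bonds:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups = {}
    for atom in range(n_atoms):
        groups.setdefault(find(atom), []).append(atom)
    return sorted(groups.values())


clusters = connected_clusters(6, [(0, 1), (1, 2), (3, 4)])
```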
- NepTrainKit.core.structure.unwrap_molecule(structure, cluster_indices)
Unwrap atoms in a molecular cluster back into the primary simulation cell.
- NepTrainKit.core.structure.process_organic_clusters(structure, new_structure, clusters, is_organic_list)
Recenter and unwrap organic molecular clusters.
- Parameters:
structure (ase.Atoms) – Original ASE atoms with the reference cell.
new_structure (ase.Atoms) – Target ASE atoms whose positions will be updated.
clusters (list[list[int]]) – Atom-index clusters from get_clusters().
is_organic_list (list[bool]) – Flags indicating organic clusters.
- NepTrainKit.core.structure.load_npy_structure(folders, order_file=None, cancel_event=None, base_root=None)
Recursively load DeepMD datasets beneath folders.
- Parameters:
folders (str | Path)
base_root (str | Path | None)
- NepTrainKit.core.structure.get_type_map(structures)
- Parameters:
structures (list[Structure])
- Return type:
list[str]
- NepTrainKit.core.structure.save_npy_structure(folder, structures, type_map=None)
Save structures to a DeepMD-style .npy dataset layout.
- Parameters:
folder (PathLike) – Target root folder. One subfolder per Config_type is created.
structures (list[Structure]) – Structures to persist. Per-atom arrays are saved under set.000 and per-frame values under the config folder.
type_map (list[str])
- class NepTrainKit.core.structure.FastStructure(lattice, atomic_properties, properties, additional_fields)
Bases: Structure
Structure subclass that uses a C++-accelerated parser for EXTXYZ IO.
- Parameters:
lattice (list[float] | npt.NDArray[np.float64])
atomic_properties (dict[str, npt.NDArray[Any]])
properties (list[dict[str, str]])
additional_fields (dict[str, Any])
- classmethod read_multiple(filename, max_workers=None)
Read a multi-structure XYZ file and return a list of Structure objects.
- Parameters:
filename (str or os.PathLike) – Path to a multi-frame .xyz file.
max_workers (int | None)
- Returns:
List of Structure instances, one per frame.
- Return type:
list[Structure]
Examples
>>> from NepTrainKit.core.structure import Structure
>>> structure_list = Structure.read_multiple("train.xyz")
>>> len(structure_list)
42
- classmethod iter_read_multiple(filename, max_workers=None)
Iterate frames in a multi-structure EXTXYZ file.
- Parameters:
filename (str) – Path to a multi-frame .xyz file.
cancel_event (threading.Event or None, optional) – If provided and is_set(), stop early.
max_workers (int | None)
- Yields:
Structure – Parsed structures one by one.
Examples
>>> from NepTrainKit.core.structure import Structure
>>> for structure in Structure.iter_read_multiple(filename="train.xyz"):
...     print(structure)
Runtime NEP calculator wrapper handling CPU/GPU backends.
- class NepTrainKit.core.calculator.NepCalculator(model_file='nep.txt', backend=None, batch_size=None, native_stdio='silent')
Bases: object
Initialise the NEP calculator and load a CPU/GPU backend.
- Parameters:
model_file (str | Path)
backend (NepBackend | None)
batch_size (int | None)
native_stdio (str | Path | Literal['inherit', 'silent'] | None)
- cancel()
- Return type:
None
- load_nep()
- Return type:
None
- compose_structures(structures)
- calculate(structures, return_charge=False, mean_virial=True)
- calculate_dftd3(structures, functional, cutoff, cutoff_cn, mean_virial=True)
- calculate_with_dftd3(structures, functional, cutoff, cutoff_cn, mean_virial=True)
- get_descriptor(structure)
- Parameters:
structure (Structure)
- Return type:
ndarray[tuple[Any, …], dtype[float32]]
- get_structures_descriptor(structures, mean_descriptor=True)
- Parameters:
structures (list[Structure])
mean_descriptor (bool)
- Return type:
ndarray[tuple[Any, …], dtype[float32]]
- get_structures_polarizability(structures)
- Parameters:
structures (list[Structure])
- Return type:
ndarray[tuple[Any, …], dtype[float64]]
- get_structures_dipole(structures)
- Parameters:
structures (list[Structure])
- Return type:
ndarray[tuple[Any, …], dtype[float64]]
- calculate_to_ase(atoms_list, calc_descriptor=False)
- Parameters:
atoms_list (Atoms | Iterable[Atoms])
- NepTrainKit.core.calculator.Nep3Calculator
alias of
NepCalculator
- class NepTrainKit.core.calculator.NepAseCalculator(model_file='nep.txt', backend=None, batch_size=None, *args, **kwargs)
Bases: Calculator
- Parameters:
model_file (str | Path)
backend (NepBackend | None)
batch_size (int | None)
- implemented_properties: list[str] = ['energy', 'energies', 'forces', 'stress', 'descriptor']
Properties calculator can handle (energy, forces, …)
- calculate(atoms=None, properties=['energy'], system_changes=['positions', 'numbers', 'cell', 'pbc', 'initial_charges', 'initial_magmoms'])
Do the calculation.
- properties: list of str
List of what needs to be calculated. Can be any combination of ‘energy’, ‘forces’, ‘stress’, ‘dipole’, ‘charges’, ‘magmom’ and ‘magmoms’.
- system_changes: list of str
List of what has changed since last calculation. Can be any combination of these six: ‘positions’, ‘numbers’, ‘cell’, ‘pbc’, ‘initial_charges’ and ‘initial_magmoms’.
Subclasses need to implement this, but can ignore properties and system_changes if they want. Calculated properties should be inserted into results dictionary like shown in this dummy example:
self.results = {'energy': 0.0,
                'forces': np.zeros((len(atoms), 3)),
                'stress': np.zeros(6),
                'dipole': np.zeros(3),
                'charges': np.zeros(len(atoms)),
                'magmom': 0.0,
                'magmoms': np.zeros(len(atoms))}
The subclass implementation should first call this implementation to set the atoms attribute and create any missing directories.
- class NepTrainKit.core.io.ResultData(*args, **kwargs)
Bases: QObject
Manage structures, descriptors, and plots for NEP result files. Subclasses implement _load_dataset() and expose their plot datasets through datasets. The class also centralises selection and synchronisation utilities shared by the GUI.
- Parameters:
nep_txt_path (Path)
data_xyz_path (Path)
descriptor_path (Path)
calculator_factory (Callable[[str], Any] | None)
import_options (dict[str, Any] | None)
- STRUCTURE_SYNC_RULES: dict[str, StructureSyncRule] = {}
- atoms_num_list: ndarray[tuple[Any, ...], dtype[_ScalarT]]
- request_cancel()
Request cooperative cancel during load. Also forward to calculator.
- reset_cancel()
Clear the cancellation flag so future operations proceed.
- load_structures()
Populate structure from disk or a prefetched cache. The method honours _prefetched_structures first; otherwise it delegates to the importer registry and honours import_options.
- set_structures(structures)
Provide pre-parsed structures so load_structures can skip file IO.
- Parameters:
structures (list[Structure])
- has_completer_cache(search_type=None, max_items=50000)
Return True if a completer cache exists for search_type and max_items.
- Parameters:
search_type (SearchType | str | None)
max_items (int)
- Return type:
bool
- ensure_completer_cache(search_type=None, max_items=50000)
Build and cache completer mappings for the requested search type.
- Parameters:
search_type (SearchType | str | None)
max_items (int)
- Return type:
None
- get_completer_cache(search_type=None, max_items=50000)
Return cached completer mapping for search_type; builds it if needed.
- Parameters:
search_type (SearchType | str | None)
max_items (int)
- Return type:
dict[str, int]
- search_config(config, search_type)
Return structure indices matching the selected search mode.
- Parameters:
config (str)
search_type (SearchType)
- Return type:
list[int]
- sync_structures(fields=None, structure_indices=None)
Apply registered StructureSyncRule objects to datasets.
- Parameters:
fields (Iterable[str] or str, optional) – Subset of rule names to apply. None means all registered rules.
structure_indices (Sequence[int], optional) – Visible structure indices affected by the update. None uses all active structures.
- Return type:
None
- write_prediction()
Create a nep.in stub when large datasets require prediction mode. The GUI expects a nep.in file to mark prediction runs for large (>1000) structure collections.
- static cache_outputs_enabled()
Return whether loader-generated cache files should be written.
- Return type:
bool
- load()
Load structures, descriptors, and dataset arrays in sequence. The routine instantiates a calculator (optionally via calculator_factory), parses structures, and then delegates to subclass hooks for descriptors and dataset-specific properties.
- property datasets: list[NepPlotData]
Return the plot datasets exposed by the subclass.
- property descriptor
Return the descriptor dataset prepared in _load_descriptors().
- property num
Return the number of active structures in the dataset.
- property structure
Return the StructureData wrapper for the active structures.
- property abcs: ndarray[tuple[Any, ...], dtype[float32]]
Return the cached lattice vector lengths (a, b, c) for all structures.
- property angles: ndarray[tuple[Any, ...], dtype[float32]]
Return the cached lattice angles (alpha, beta, gamma) for all structures.
- get_reference_per_atom_energy_array(use_active=False)
Return reference per-atom energies as a flat float64 array.
- Parameters:
use_active (bool)
- Return type:
ndarray[tuple[Any, …], dtype[float64]]
- get_predicted_per_atom_energy_array(use_active=False)
Return predicted per-atom energies as a flat float64 array.
- Parameters:
use_active (bool)
- Return type:
ndarray[tuple[Any, …], dtype[float64]]
- is_select(i)
Return True if the structure index is marked as selected.
- Parameters:
i (int)
- Return type:
bool
- select(indices)
Mark structures denoted by indices as selected.
- Parameters:
indices (Sequence[int] | int)
- Return type:
None
- uncheck(indices)
Remove structures denoted by indices from the selection set.
- Parameters:
indices (Sequence[int] | int)
- Return type:
None
- inverse_select()
Invert the current selection over the active structure set.
- Return type:
None
- select_structures_by_index(index_expression, use_origin=True)
Resolve an index expression into raw structure indices.
- Parameters:
index_expression (str)
use_origin (bool)
- Return type:
list[int]
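An index expression such as ":300" reads like Python slice syntax. The sketch below resolves a comma-separated mix of single indices and slices; this grammar is an assumption for illustration (the expression syntax the library accepts is not specified here), and resolve_index_expression is a hypothetical helper.

```python
def resolve_index_expression(expression, total):
    """Turn e.g. ':3' or '0, 5, 2:8:2' into a list of indices below `total`."""
    indices = []
    for token in expression.split(","):
        token = token.strip()
        if not token:
            continue
        if ":" in token:
            # Empty slice parts become None, as in Python's own slicing.
            parts = [int(p) if p.strip() else None for p in token.split(":")]
            indices.extend(range(total)[slice(*parts)])
        else:
            indices.append(int(token))
    return indices
```

Slicing range(total) clips out-of-range bounds, so ":300" on a 10-structure dataset simply yields all ten indices.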
- select_structures_by_range(dataset, x_min, x_max, y_min, y_max, use_and=True)
Return structure indices whose scatter positions fall in the given bounds.
- Parameters:
dataset (NepPlotData)
x_min (float)
x_max (float)
y_min (float)
y_max (float)
use_and (bool)
- Return type:
list[int]
- select_structures_by_lattice_range(a_range, b_range, c_range, alpha_range, beta_range, gamma_range)
Return structure indices whose lattice parameters fall within the given ranges.
Uses a fixed tolerance of 1e-4 to handle floating-point precision loss from float32 storage of lattice vectors, independent of range size.
- Parameters:
a_range (tuple[float, float])
b_range (tuple[float, float])
c_range (tuple[float, float])
alpha_range (tuple[float, float])
beta_range (tuple[float, float])
gamma_range (tuple[float, float])
- Return type:
list[int]
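The fixed tolerance matters when a lattice length stored as float32 lands just outside a bound. A minimal sketch, assuming the 1e-4 tolerance widens both ends of every range; select_by_lattice and in_range are hypothetical helpers.

```python
TOLERANCE = 1e-4  # fixed, independent of the width of the range


def in_range(value, bounds, tol=TOLERANCE):
    low, high = bounds
    return low - tol <= value <= high + tol


def select_by_lattice(abcs, angles, ranges):
    """ranges holds six (min, max) tuples: a, b, c, alpha, beta, gamma."""
    hits = []
    for index, (abc, ang) in enumerate(zip(abcs, angles)):
        values = (*abc, *ang)
        if all(in_range(v, r) for v, r in zip(values, ranges)):
            hits.append(index)
    return hits


abcs = [(4.00003, 4.0, 4.0), (4.001, 4.0, 4.0)]
angles = [(90.0, 90.0, 90.0), (90.0, 90.0, 90.0)]
ranges = [(3.9, 4.0), (3.9, 4.1), (3.9, 4.1), (89.0, 91.0), (89.0, 91.0), (89.0, 91.0)]
selected = select_by_lattice(abcs, angles, ranges)
```

The first structure exceeds the upper bound on a by 3e-5 and is still selected; the second exceeds it by 1e-3 and is not.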
- get_selected_structures()
Return the selected structures in the order of their raw index.
- Return type:
list[Structure]
- export_selected_xyz(save_file_path)
Write the currently selected structures to save_file_path.
- Parameters:
save_file_path (str | Path)
- Return type:
None
- export_selected_npy(save_path)
Export selected structures as a DeepMD-style deepmd/npy dataset.
- Parameters:
save_path (str | Path)
- Return type:
None
- export_active_xyz(save_file_path)
Write active (non-removed) structures to save_file_path.
- Parameters:
save_file_path (str | Path)
- Return type:
None
- export_active_npy(save_path)
Export active (non-removed) structures as a DeepMD-style deepmd/npy dataset.
- Parameters:
save_path (str | Path)
- Return type:
None
- export_removed_xyz(save_file_path)
Write removed structures (if any) to save_file_path.
- Parameters:
save_file_path (str | Path)
- Return type:
None
- export_removed_npy(save_path)
Export removed structures as a DeepMD-style deepmd/npy dataset.
- Parameters:
save_path (str | Path)
- Return type:
None
- export_current_npy(save_path, index)
Export a single structure as a DeepMD-style deepmd/npy dataset.
- Parameters:
save_path (str | Path)
index (int)
- Return type:
None
- export_model_extxyz(save_path)
Export active and removed structures into the save_path folder as extxyz.
- Parameters:
save_path (str | Path)
- Return type:
None
- export_model_npy(save_path)
Export active and removed structures into the save_path folder as deepmd/npy.
- Parameters:
save_path (str | Path)
- Return type:
None
- export_model_xyz(save_path)
Export active and removed structures into the save_path folder.
- Parameters:
save_path (str | Path)
- Return type:
None
- get_atoms(index)
Return the ASE atoms object for the original index.
- Parameters:
index (int)
- remove(i)
Remove the structure i across all datasets.
- Parameters:
i (int)
- Return type:
None
- property is_revoke: bool
Return True if any structures have been removed.
- revoke()
Undo the most recent removal across structures and datasets.
- Return type:
None
- delete_selected()
Remove and clear all currently selected structures.
- iter_non_physical_structure_indices(radius_coefficient)
Yield progress increments while collecting non-physical structures.
- Parameters:
radius_coefficient (float)
- consume_non_physical_structure_indices()
Return and clear indices collected by the non-physical scan.
- Return type:
list[int]
- iter_unbalanced_force_indices(threshold)
Yield progress units while collecting structures with non-zero net force.
- Parameters:
threshold (float) – Maximum allowed magnitude of the summed force vector ΣF. Structures whose net force exceeds this value are recorded for later selection.
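The net-force test sums the per-atom forces of a structure and compares the magnitude of the result with the threshold. A minimal sketch without the progress reporting; net_force_magnitude and find_unbalanced are hypothetical helpers.

```python
import math


def net_force_magnitude(forces):
    """Magnitude of the summed force vector for one structure."""
    net = [sum(force[axis] for force in forces) for axis in range(3)]
    return math.sqrt(sum(component * component for component in net))


def find_unbalanced(frames, threshold):
    """Indices of structures whose net force exceeds the threshold."""
    return [
        index
        for index, forces in enumerate(frames)
        if net_force_magnitude(forces) > threshold
    ]


frames = [
    [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],  # forces cancel
    [(1.0, 0.0, 0.0), (-0.5, 0.0, 0.0)],  # residual of 0.5 along x
]
flagged = find_unbalanced(frames, threshold=0.01)
```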
- consume_unbalanced_force_indices()
Return and clear indices collected by the net-force scan.
- Return type:
list[int]
- discover_atomic_numeric_fields(scope=DistributionScope.ACTIVE)
Discover available numeric fields for distribution analysis.
- Parameters:
scope (str | DistributionScope, default="active") – Structure subset to inspect. selected inspects only selected active structures.
- Return type:
list[FieldSpec]
- iter_distribution_analysis(request=None)
Build distribution statistics for selected numeric fields.
Notes
Results are cached by (structure.version, request, force_mode) and stored for retrieval via get_distribution_analysis().
- Parameters:
request (DistributionRequest | Mapping[str, Any] | None)
- get_distribution_analysis()
Return the last computed distribution-analysis payload.
- Return type:
dict[str, Any]
- resolve_distribution_bin_indices(analysis_id, metric_key, series_key, bin_index)
Resolve structure indices represented by a histogram bin.
- Parameters:
analysis_id (int)
metric_key (str)
series_key (str)
bin_index (int)
- Return type:
list[int]
- iter_dataset_summary(group_by=SearchType.TAG)
Aggregate dataset-wide statistics for use in summary dialogs.
Notes
This generator yields a progress unit after each structure so that callers can drive a progress dialog. Results are cached on the instance and later returned by get_dataset_summary().
- Parameters:
group_by (SearchType, default=SearchType.TAG) – Attribute used for grouping the distribution table. TAG uses Structure.tag (Config_type), while FORMULA uses Structure.formula.
- get_dataset_summary()
Return the most recently computed dataset summary.
- Return type:
dict[str, Any]
- sparse_descriptor_selection(n_samples, distance, restrict_to_selection=False)
Return FPS-selected structure indices and whether they should be deselected.
- Parameters:
n_samples (int)
distance (float)
restrict_to_selection (bool)
- Return type:
tuple[list[int], bool]
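Farthest point sampling (FPS) repeatedly picks the point that is farthest from everything selected so far. A minimal sketch on plain coordinate tuples, with two assumptions: sampling starts from the first point, and distance acts as a minimum-separation cutoff that stops sampling early. farthest_point_sampling is a hypothetical helper, not the library's sampler.

```python
import math


def farthest_point_sampling(points, n_samples, min_distance=0.0):
    """Return indices of up to n_samples mutually distant points."""
    if not points or n_samples <= 0:
        return []
    selected = [0]  # assumption: seed with the first point
    # Distance from every point to its nearest already-selected point.
    nearest = [math.dist(p, points[0]) for p in points]
    while len(selected) < n_samples:
        candidate = max(range(len(points)), key=nearest.__getitem__)
        if nearest[candidate] <= min_distance:
            break  # every remaining point is too close to the selection
        selected.append(candidate)
        nearest = [
            min(d, math.dist(p, points[candidate]))
            for d, p in zip(nearest, points)
        ]
    return selected


descriptors = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (0.1, 0.0)]
picked = farthest_point_sampling(descriptors, n_samples=3)
sparse = farthest_point_sampling(descriptors, n_samples=4, min_distance=2.0)
```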
- sparse_point_selection(n_samples, distance, descriptor_source='reduced', restrict_to_selection=False, training_path=None, sampling_mode='count', r2_threshold=0.9)
Delegate sparse sampling to the sampler helper.
- Parameters:
n_samples (int)
distance (float)
descriptor_source (str)
restrict_to_selection (bool)
training_path (str | None)
sampling_mode (str)
r2_threshold (float)
- Return type:
tuple[list[int], bool]
- export_descriptor_data(path)
Write descriptor values for the current selection to path.
- Parameters:
path (str | Path)
- Return type:
None
- get_editable_structure_tags()
Return the editable tags for currently selected structures.
- Return type:
set[str]
- update_structure_metadata(remove_tags, new_tag_info, rename_map=None)
Apply metadata removals, additions, and key renames to the selected structures.
- Parameters:
remove_tags (Iterable[str])
new_tag_info (Mapping[str, str])
rename_map (Mapping[str, str] | None)
- Return type:
None
- iter_shift_energy_baseline(group_patterns, alignment_mode, max_generations, population_size, convergence_tol, reference_indices=None, precomputed_baseline=None, baseline_store=None, source_summary=None)
Shift dataset energies and yield progress units for UI hooks.
- Parameters:
group_patterns (Sequence[str])
alignment_mode (str)
max_generations (int)
population_size (int)
convergence_tol (float)
reference_indices (Sequence[int] | None)
baseline_store (dict | None)
source_summary (dict | None)
- apply_dft_d3_correction(mode, functional, cutoff, cutoff_cn)
Apply DFT-D3 corrections and synchronise dependent datasets.
- Parameters:
mode (int)
functional (str)
cutoff (float)
cutoff_cn (float)
- Return type:
None
- class NepTrainKit.core.io.StructureSyncRule(dataset_attr, target, collector, precondition=<function StructureSyncRule.<lambda>>, dtype=None)
Bases: object
Declarative instruction that synchronises structure attributes into datasets.
- Parameters:
dataset_attr (str)
target (str | slice | Callable[[Any], Any])
collector (Callable[[ResultData, Any, ndarray | None], tuple[ndarray, ndarray[tuple[Any, ...], dtype[Any]]]])
precondition (Callable[[ResultData], bool])
dtype (Any)
- dataset_attr: str
- target: str | slice | Callable[[Any], Any]
- collector: Callable[[ResultData, Any, ndarray | None], tuple[ndarray, ndarray[tuple[Any, ...], dtype[Any]]]]
- precondition()
- dtype: Any = None
- apply(result_data, structure_indices=None)
Execute the rule on result_data if the precondition passes.
- Parameters:
result_data (ResultData)
structure_indices (ndarray | None)
- Return type:
None
- class NepTrainKit.core.io.NepPlotData(data_list, **kwargs)
Bases: NepData
Two-column plot helper that separates NEP predictions from references.
- Parameters:
data_list (Sequence[Any] | ndarray[tuple[Any, ...], dtype[Any]])
kwargs (Any)
- property x: ndarray[tuple[Any, ...], dtype[Any]]
Flattened NEP predictions suitable for scatter plots.
- property y: ndarray[tuple[Any, ...], dtype[Any]]
Flattened reference values.
- property structure_index: ndarray[tuple[Any, ...], dtype[int32]]
Map each flattened point back to its parent structure index.
- class NepTrainKit.core.io.StructureData(data_list, group_list=1, index_list=None, **kwargs)
Bases: NepData
Utility mixin for structure-level queries.
- Parameters:
data_list (Sequence[Any] | ndarray[tuple[Any, ...], dtype[Any]])
group_list (int | Sequence[int])
index_list (Sequence[int] | ndarray[tuple[Any, ...], dtype[Any]] | None)
kwargs (Any)
- has_completer_cache(search_type=None, max_items=50000)
Return True if a completer cache exists for search_type and max_items.
- Parameters:
search_type (SearchType | str | None)
max_items (int)
- Return type:
bool
- ensure_completer_cache(max_items=50000)
Build and cache completer mappings for tag/formula/elements.
Notes
Designed to run in a background thread (e.g. dataset load thread).
Results are stored as dict[SearchType, dict[str,int]] and can be fed directly into ConfigTypeSearchLineEdit.setCompleterKeyWord(…).
- Parameters:
max_items (int)
- Return type:
None
- get_completer_cache(search_type=None, max_items=50000)
Return cached completer mapping for search_type; builds it if needed.
- Parameters:
search_type (SearchType | str | None)
max_items (int)
- Return type:
dict[str, int]
- get_all_config(search_type=None)
Return structure metadata used for filtering.
- Parameters:
search_type (SearchType, optional) – Metadata selector. Defaults to SearchType.TAG.
- Returns:
Value per active structure matching search_type.
- Return type:
list[str]
- search_config(config, search_type)
Return structure indices whose metadata match config.
- Parameters:
config (str) – Regular expression used for matching.
search_type (SearchType) – Attribute family to inspect.
- Returns:
Structure indices satisfying the pattern; empty on failure.
- Return type:
list[int]
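The regular-expression search with an empty result on failure can be sketched in a few lines. Two assumptions: the pattern may match anywhere in the value (re.search rather than a full match), and "failure" means an invalid pattern. search_tags is a hypothetical helper operating on a plain list of tags.

```python
import re


def search_tags(tags, pattern):
    """Return indices of tags matching the regex; empty list if it is invalid."""
    try:
        regex = re.compile(pattern)
    except re.error:
        return []
    return [index for index, tag in enumerate(tags) if regex.search(tag)]


tags = ["bulk_Fe", "surface_Fe", "bulk_O"]
bulk = search_tags(tags, r"^bulk")
broken = search_tags(tags, "(")  # unbalanced parenthesis is not a valid regex
```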
- class NepTrainKit.core.io.NepTrainResultData(*args, **kwargs)
Bases: ResultData
Result loader for NEP training outputs with energy, force, stress, and virial datasets.
The loader normalises NEP predictions into plot-ready datasets and registers synchronisation rules used by the UI.
Examples
>>> from NepTrainKit.core.io import NepTrainResultData
>>> # Load the xyz file
>>> result_dataset = NepTrainResultData.from_path(r"D:/Desktop/dataset3635-addD3/train.xyz")
>>> result_dataset.load()
>>> print(result_dataset)
>>> # Select structures at indices 0 and 10
>>> result_dataset.select([0, 10])
>>> print(result_dataset)
>>> # Delete the selected structures
>>> result_dataset.delete_selected()
>>> print(result_dataset)
>>> # Get the indices of the 10 points with the largest energy error
>>> index = result_dataset.energy.get_max_error_index(10)
>>> # Select the 10 points with the largest energy error and delete them
>>> result_dataset.select(index)
>>> result_dataset.delete_selected()
>>> print(result_dataset)
>>> # Revoke the last deletion
>>> result_dataset.revoke()
>>> # Perform farthest point sampling (normal global sampling)
>>> index, reverse = result_dataset.sparse_descriptor_selection(100, 0.001, False)
>>> # Perform sampling within a region (select the first 300 structures)
>>> index = result_dataset.select_structures_by_index(":300")
>>> result_dataset.select(index)
>>> index, reverse = result_dataset.sparse_descriptor_selection(100, 0.001, True)
>>> # Uncheck or inverse select based on the reverse flag
>>> if reverse:
...     result_dataset.uncheck(index)
... else:
...     result_dataset.select(index)
...     result_dataset.inverse_select()
>>> print(result_dataset)
- Parameters:
nep_txt_path (Path | str)
data_xyz_path (Path | str)
energy_out_path (Path | str)
force_out_path (Path | str)
stress_out_path (Path | str)
virial_out_path (Path | str)
descriptor_path (Path | str)
charge_out_path (Path | str | None)
bec_out_path (Path | str | None)
charge_model (bool | None)
spin_force_out_path (Path | str | None)
- STRUCTURE_SYNC_RULES = {'energy': StructureSyncRule(dataset_attr='energy', target='x_cols', collector=<function collect_energy_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None), 'force': StructureSyncRule(dataset_attr='force', target='x_cols', collector=<function collect_force_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None), 'stress': StructureSyncRule(dataset_attr='stress', target='x_cols', collector=<function collect_stress_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None), 'virial': StructureSyncRule(dataset_attr='virial', target='x_cols', collector=<function collect_virial_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None)}
- property datasets
Return datasets exposed to the UI in display order.
- property energy
Return the per-structure energy dataset.
- property force
Return the force dataset respecting per-atom settings.
- property stress
Return the stress dataset derived from predicted virials.
- property virial
Return the per-structure virial dataset.
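The stress property is described as being derived from predicted virials. One common conversion is sketched below; whether NepTrainKit uses exactly this convention is not stated here. The sketch assumes the virial is given per atom in eV/atom, that stress equals the total virial divided by the cell volume in Å^3, and that the result is wanted in GPa. The function name `virial_to_stress` is hypothetical, and sign conventions differ between codes.

```python
import numpy as np

# 1 eV/Å^3 expressed in GPa.
EV_PER_A3_TO_GPA = 160.21766208


def virial_to_stress(virial_per_atom, n_atoms, volume):
    """Convert per-atom virial components (eV/atom) to stress (GPa).

    Hypothetical helper. Assumes stress = (virial per atom * atom count) / volume,
    with `volume` in Å^3. No sign flip is applied.
    """
    virial_per_atom = np.asarray(virial_per_atom, dtype=float)
    return virial_per_atom * n_atoms / volume * EV_PER_A3_TO_GPA


# Made-up numbers: six virial components for an 8-atom cell.
stress = virial_to_stress(
    [0.01, 0.02, 0.0, 0.0, 0.0, 0.0], n_atoms=8, volume=160.21766208
)
```

With the volume chosen equal to the conversion factor, the first component reduces to 0.01 * 8 = 0.08 GPa, which makes the arithmetic easy to check by hand.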
- property bec
Return the per-atom Born effective charge dataset when available.
- property spin_force
Return the magnetic force dataset when available.
- classmethod from_path(path, model_type=0, *, structures=None, nep_txt_path=None)
Create an instance from a NEP result directory.
- Parameters:
path (PathLike) – Directory containing NEP outputs and descriptors.
model_type (int, optional) – NEP model type hint used to select descriptor fallbacks.
structures (list[Structure], optional) – Pre-loaded structures to attach instead of reading from disk.
nep_txt_path (Path | str | None)
- Returns:
Configured loader bound to the resolved directory.
- Return type:
- class NepTrainKit.core.io.NepPolarizabilityResultData(*args, **kwargs)
Bases: ResultData
Result loader for NEP polarizability evaluations.
- Parameters:
nep_txt_path (Path | str)
data_xyz_path (Path | str)
polarizability_out_path (Path | str)
descriptor_path (Path | str)
- property datasets
Return the polarizability datasets in display order.
- property polarizability_diagonal
Return the diagonal polarizability dataset.
- property polarizability_no_diagonal
Return the off-diagonal polarizability dataset.
- property descriptor
Return the descriptor dataset associated with the polarizability run.
- classmethod from_path(path, *, structures=None)
Create a polarizability loader from a NEP dataset directory.
- Parameters:
path (PathLike) – Directory containing NEP outputs.
structures (list[Structure], optional) – Pre-loaded structures to attach instead of reading from disk.
- Returns:
Configured loader bound to the resolved directory.
- Return type:
- class NepTrainKit.core.io.NepDipoleResultData(*args, **kwargs)
Bases: ResultData
Result loader for NEP dipole predictions.
- Parameters:
nep_txt_path (Path | str)
data_xyz_path (Path | str)
dipole_out_path (Path | str)
descriptor_path (Path | str)
- property datasets
Return the dipole datasets in display order.
- property dipole
Return the dipole dataset.
- property descriptor
Return the descriptor dataset associated with the dipole run.
- classmethod from_path(path, *, structures=None)
Create a dipole loader from a NEP dataset directory.
- Parameters:
path (PathLike) – Directory containing NEP outputs.
structures (list[Structure], optional) – Pre-loaded structures to attach instead of reading from disk.
- Returns:
Configured loader bound to the resolved directory.
- Return type:
- class NepTrainKit.core.io.TaceResultData(*args, **kwargs)
Bases: ResultData
Result loader for TACE EXTXYZ prediction files.
- Parameters:
nep_txt_path (Path | str)
data_xyz_path (Path | str)
descriptor_path (Path | str)
import_options (dict[str, Any] | None)
- property datasets: list[NepPlotData]
Return the plot datasets exposed by the subclass.
- property energy: NepPlotData
- property force: NepPlotData
- property virial: NepPlotData
- property mforce: NepPlotData
- class NepTrainKit.core.io.DeepmdResultData(*args, **kwargs)
Bases: ResultData
Result loader that adapts DeepMD outputs to the ResultData interface.
The loader reads DeepMD numpy exports, normalises them, and exposes consistent NEP plot datasets for downstream visualisation.
- Parameters:
nep_txt_path (Path | str)
data_xyz_path (Path | str)
energy_out_path (Path | str)
force_out_path (Path | str)
virial_out_path (Path | str)
descriptor_path (Path | str)
spin_out_path (Path | str | None)
- STRUCTURE_SYNC_RULES = {'energy': StructureSyncRule(dataset_attr='energy', target='x_cols', collector=<function collect_energy_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None), 'force': StructureSyncRule(dataset_attr='force', target='x_cols', collector=<function collect_force_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None), 'stress': StructureSyncRule(dataset_attr='stress', target='x_cols', collector=<function collect_stress_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None), 'virial': StructureSyncRule(dataset_attr='virial', target='x_cols', collector=<function collect_virial_sync>, precondition=<function StructureSyncRule.<lambda>>, dtype=None)}
- classmethod from_path(path, *, structures=None, nep_txt_path=None, model_type=None)
Create an instance from a DeepMD dataset directory.
- Parameters:
path (Path or str) – Directory that contains DeepMD set.* data and outputs.
structures (list[Structure], optional) – Pre-loaded structures to attach instead of reading from disk.
nep_txt_path (Path or str, optional) – Override the NEP model text file used for (re)calculation.
model_type (int, optional) – Ignored. Accepted for API compatibility with NEP loaders.
- Returns:
Configured loader pointing at the resolved dataset.
- Return type:
- load_structures()
Load structures from DeepMD numpy sets into the local dataset.
Notes
Recognises DeepMD set.* partitions and aggregates per-set arrays.
Respects the optional cancel_event for graceful cancellation.
Examples
>>> # Constructed via DeepmdResultData.from_path(...)
- property datasets
Return the datasets exposed to the UI in canonical order.
- property energy
Return the per-atom energy dataset.
- property force
Return the per-atom force dataset.
- property spin
Return the per-atom spin dataset.
- property virial
Return the per-atom virial dataset.
- export_model_xyz(save_path)
Export current and removed structures into dedicated directories.
- Parameters:
save_path (Path or str) – Destination directory that will receive export_good_model and export_remove_model folders.
- Return type:
None
- NepTrainKit.core.io.is_deepmd_path(folder)
Return True when folder looks like a DeepMD dataset directory.
- Parameters:
folder (str | Path)
- Return type:
bool
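What "looks like a DeepMD dataset directory" means is not spelled out for is_deepmd_path. The sketch below shows one plausible heuristic based on the usual DeepMD-kit layout (a `type.raw` file next to `set.*` subdirectories holding `coord.npy`). The function name `looks_like_deepmd_dir` and the exact checks are assumptions, not the library's actual rules.

```python
import tempfile
from pathlib import Path


def looks_like_deepmd_dir(folder):
    """Heuristic DeepMD layout check; hypothetical, not NepTrainKit's code."""
    folder = Path(folder)
    if not folder.is_dir():
        return False
    # DeepMD-kit systems normally carry a type.raw file at the top level.
    has_type_file = (folder / "type.raw").is_file()
    # Frames live in set.000, set.001, ... each with a coord.npy array.
    has_coord_set = any(
        (subdir / "coord.npy").is_file()
        for subdir in folder.glob("set.*")
        if subdir.is_dir()
    )
    return has_type_file and has_coord_set


# Build a throwaway directory to exercise both outcomes.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    empty_result = looks_like_deepmd_dir(root)
    (root / "type.raw").write_text("0 0 1\n")
    (root / "set.000").mkdir()
    (root / "set.000" / "coord.npy").write_bytes(b"")
    deepmd_result = looks_like_deepmd_dir(root)
```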
- NepTrainKit.core.io.load_result_data(path)
Load result data for path via the first matching loader.
- Parameters:
path (PathLike) – File or directory to be loaded.
- Returns:
The loaded dataset if any loader recognises path, else None.
- Return type:
ResultData or None
Examples
>>> dataset = load_result_data("./train.xyz")
>>> dataset.load()
>>> print(dataset)
- NepTrainKit.core.io.register_result_loader(loader)
Register a loader so that it participates in result discovery.
- Parameters:
loader (ResultLoader) – An instance of a concrete ResultLoader subclass.
- Returns:
The same loader instance, allowing use as a decorator.
- Return type:
ResultLoader
Examples
>>> from NepTrainKit.core.io.registry import DeepmdFolderLoader, NepModelTypeLoader
>>> register_result_loader(DeepmdFolderLoader())
>>> register_result_loader(
...     NepModelTypeLoader("nep_train", {0, 3},
...                        'NepTrainKit.core.io:NepTrainResultData')
... )
>>> register_result_loader(
...     NepModelTypeLoader("nep_dipole", {1},
...                        'NepTrainKit.core.io:NepDipoleResultData')
... )
>>> register_result_loader(
...     NepModelTypeLoader("nep_polar", {2},
...                        'NepTrainKit.core.io:NepPolarizabilityResultData')
... )
>>> register_result_loader(OtherLoader())
- NepTrainKit.core.io.matches_result_loader(path)
Return True if any registered loader recognises path.
- Parameters:
path (str or os.PathLike) – File or directory to be examined.
- Returns:
True if path is recognised by at least one loader.
- Return type:
bool
Examples
>>> matches_result_loader("./train.xyz")
True
- NepTrainKit.core.io.farthest_point_sampling(points, n_samples, min_dist=0.1, selected_data=None)
Greedy FPS with optional warm-start and minimum-distance constraint.
- Parameters:
points (numpy.ndarray) – Input point set of shape (N, D).
n_samples (int) – Maximum number of samples to select.
min_dist (float, default=0.1) – Minimum allowed distance to any already selected point.
selected_data (numpy.ndarray or None, optional) – Warm-start set with shape (M, D). If provided, selection respects the minimum distance from this set.
- Returns:
Indices of selected points.
- Return type:
list[int]
Examples
>>> import numpy as np
>>> P = np.random.rand(100, 3).astype(np.float32)
>>> idx = farthest_point_sampling(P, 5, min_dist=0.0)
>>> len(idx) <= 5
True
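For readers unfamiliar with the technique, the sketch below shows a generic greedy farthest point sampling loop with a minimum-distance cutoff and an optional warm-start set. It is an illustration under stated assumptions, not NepTrainKit's implementation: the name `fps_sketch`, seeding with the first point when no warm start is given, and stopping once the best candidate is closer than `min_dist` are all choices made here.

```python
import numpy as np


def fps_sketch(points, n_samples, min_dist=0.1, selected_data=None):
    """Greedy farthest point sampling (illustrative, hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    if len(points) == 0 or n_samples <= 0:
        return []
    selected = []
    if selected_data is not None and len(selected_data) > 0:
        warm = np.asarray(selected_data, dtype=float)
        # Distance from every point to its nearest warm-start point.
        diff = points[:, None, :] - warm[None, :, :]
        nearest = np.linalg.norm(diff, axis=2).min(axis=1)
    else:
        # Assumption: without a warm start, seed with the first point.
        selected.append(0)
        nearest = np.linalg.norm(points - points[0], axis=1)
    while len(selected) < n_samples:
        candidate = int(np.argmax(nearest))
        # Stop when nothing new is far enough away (or only duplicates remain).
        if nearest[candidate] <= 0.0 or nearest[candidate] < min_dist:
            break
        selected.append(candidate)
        dist = np.linalg.norm(points - points[candidate], axis=1)
        nearest = np.minimum(nearest, dist)
    return selected


# Four collinear points make the greedy choices easy to follow by hand.
line = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
plain = fps_sketch(line, 3, min_dist=0.5)
cutoff = fps_sketch(line, 3, min_dist=5.0)
warm_started = fps_sketch(line, 2, min_dist=0.5, selected_data=[[10.0, 0.0]])
```

The returned list can be shorter than `n_samples` when the distance cutoff is reached first, which matches the "maximum number of samples" wording of the parameter description.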