Random Occupancy

Group: Alloy | Class: RandomOccupancyCard

What the card does

Realize a target composition as discrete atomic occupations. The card reads a Comp(...) label or a manual composition, assigns species to lattice sites in Exact or Random mode, and outputs actual chemical configurations labeled Occ(...).

Typical use: place it after Composition Space Sampling to turn a planned target composition into concrete structures that can be run through DFT or NEP.

Example workflow

Scenario: Error jumps from 5 to 50 meV/atom between arrangements at one composition

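You ran Composition Space Sampling for a CoCrNi training set, covering compositions from the pure elements to equimolar. Each composition, however, produced only one occupation: for Co0.33Cr0.33Ni0.33, Cr always sits at the corners and Co always at the face centers. The model learns "this composition plus this particular arrangement" rather than "this composition". On another structure with the same composition but with Cr and Co positions swapped, the energy prediction error jumps from 5 meV/atom to 50 meV/atom.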

Diagnosis: Short-range chemical order at fixed composition strongly affects energy and forces. Generate several occupation variants for each target fraction so the model does not conflate composition with one arrangement.

Input: A batch of structures tagged Comp(Co=0.3333,Cr=0.3333,Ni=0.3333), produced by the upstream Composition Space Sampling card.

Objective: Generate five distinct occupations per target composition.

Parameters:

source = Auto (Comp tag)
mode = Exact (keeps counts fixed while changing arrangements)
samples = 5

Output: Five Occ(E) structures per input, with species counts matching Comp and distinct arrangements.

How to verify that training-set quality improved:

After labeling and retraining, predictions across arrangements at one composition should reflect physical differences without anomalous divergence.
Check several outputs: Exact counts should match the target integer allocation, while Random counts should approach it statistically.
Reduce samples when validation shows that energies are nearly insensitive to arrangement; increase it when short-range order matters strongly.
If no upstream Comp(...) label exists, switch to Manual and provide the composition explicitly.
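
The count check in the list above can be sketched as follows. This is a minimal illustration, not the card's own code: the helper names and the 108-site cell are hypothetical, and the symbol list stands in for whatever the structure object provides (with ASE, atoms.get_chemical_symbols()).

```python
from collections import Counter

def species_fractions(symbols):
    """Return the fraction of sites occupied by each species."""
    counts = Counter(symbols)
    total = len(symbols)
    return {el: n / total for el, n in counts.items()}

def max_fraction_error(symbols, target):
    """Largest absolute gap between realized and target fractions."""
    realized = species_fractions(symbols)
    elements = set(realized) | set(target)
    return max(abs(realized.get(el, 0.0) - target.get(el, 0.0)) for el in elements)

# Hypothetical 108-site cell realized at equiatomic CoCrNi.
symbols = ["Co"] * 36 + ["Cr"] * 36 + ["Ni"] * 36
target = {"Co": 1 / 3, "Cr": 1 / 3, "Ni": 1 / 3}
error = max_fraction_error(symbols, target)
```

For Exact output the error should stay below one site, that is 1/N for an N-site cell. For Random output it fluctuates from sample to sample and shrinks on average as the cell grows.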

When to add this card

Add it when:

An upstream Composition Space Sampling card or a manually defined target composition needs to be realized as concrete atomic occupations
Several arrangements at one composition are needed to cover short-range chemical order
A high-entropy alloy or solid solution requires joint composition-arrangement sampling

Do not add it when:

Atomic occupations should not change
A direct substitution rule is known; use Random Doping
The input already has realized occupations and no additional arrangement diversity is needed

Parameters

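Source (source)

str, default Auto (Comp tag). Auto reads the Comp(...) tag from the input structure's Config_type as the target composition, which suits placement after Composition Space Sampling. Manual reads a composition string from the manual field below.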
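
As an illustration of what Auto has to recover, the sketch below extracts a composition from a Comp(...) tag of the form shown in the example workflow. The function name and regular expression are assumptions for illustration; the card's actual parser is not documented here.

```python
import re

# Matches the text between "Comp(" and the next ")".
COMP_TAG = re.compile(r"Comp\(([^)]*)\)")

def read_comp_tag(config_type):
    """Extract {element: fraction} from the first Comp(...) tag, or None."""
    match = COMP_TAG.search(config_type)
    if match is None:
        return None
    composition = {}
    for item in match.group(1).split(","):
        if not item.strip():
            continue
        element, _, value = item.partition("=")
        composition[element.strip()] = float(value)
    return composition

composition = read_comp_tag("Comp(Co=0.3333,Cr=0.3333,Ni=0.3333)")
```

A None result corresponds to the case where no upstream tag exists and Manual is the appropriate source.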

Manual (manual)

str, empty by default; Manual only. Composition such as Co:0.333,Cr:0.333,Ni:0.334; fractions are normalized. Ge means Ge:1.0, while Ge,C gives both unit weight.
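
The parsing rules above (element:weight pairs, bare symbols counted as weight 1, normalization of the total) can be sketched like this. parse_manual is a hypothetical helper written for illustration, not the card's implementation.

```python
def parse_manual(text):
    """Parse 'Co:0.333,Cr:0.333,Ni:0.334' into normalized fractions."""
    weights = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue
        element, sep, value = item.partition(":")
        # A bare symbol such as "Ge" carries unit weight.
        weights[element.strip()] = float(value) if sep else 1.0
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("composition is empty or has zero total weight")
    return {el: w / total for el, w in weights.items()}

print(parse_manual("Ge"))    # {'Ge': 1.0}
print(parse_manual("Ge,C"))  # {'Ge': 0.5, 'C': 0.5}
```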

Mode (mode)

str, default Exact. Exact converts fractions to integer counts by floor-and-remainder allocation, giving fixed composition for comparisons. Random samples species probabilities and matches only statistically, which suits exploration.
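
A minimal sketch of the two modes, assuming a largest-remainder reading of "floor-and-remainder allocation". The function names, the tie-breaking rule, and the 32-site example are assumptions made for illustration.

```python
import math
import random

def exact_counts(fractions, n_sites):
    """Floor each ideal count, then give leftover sites to the largest remainders."""
    total = sum(fractions.values())
    ideal = {el: f / total * n_sites for el, f in fractions.items()}
    counts = {el: math.floor(x) for el, x in ideal.items()}
    leftover = n_sites - sum(counts.values())
    # Ties are broken by input order here; the card's rule may differ.
    by_remainder = sorted(ideal, key=lambda el: ideal[el] - counts[el], reverse=True)
    for el in by_remainder[:leftover]:
        counts[el] += 1
    return counts

def exact_occupation(fractions, n_sites, rng):
    """Fixed integer counts, shuffled over the sites."""
    counts = exact_counts(fractions, n_sites)
    symbols = [el for el, n in counts.items() for _ in range(n)]
    rng.shuffle(symbols)
    return symbols

def random_occupation(fractions, n_sites, rng):
    """Each site drawn independently; counts match only statistically."""
    elements = list(fractions)
    weights = [fractions[el] for el in elements]
    return rng.choices(elements, weights=weights, k=n_sites)

fractions = {"Co": 1 / 3, "Cr": 1 / 3, "Ni": 1 / 3}
rng = random.Random(0)
exact = exact_occupation(fractions, 32, rng)
sampled = random_occupation(fractions, 32, rng)
```

With 32 sites and three equal fractions, the ideal count is about 10.67 per species, so Exact yields two species with 11 sites and one with 10 in every sample. Random can return any split of the 32 sites.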

Samples (samples)

int, default 1. Occupation variants per input target, commonly 1-20. Total output equals upstream target count times this value; estimate before running.

Use Seed (use_seed)

bool, default false. Fix the random path. The card combines seed with the stable configuration ID to derive per-sample seeds.
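
One way such a per-sample seed could be derived is sketched below. The hashing scheme, the "cfg-0001" ID format, and the use of a sample index are all assumptions for illustration; the card's actual derivation is not specified here.

```python
import hashlib
import random

def sample_seed(base_seed, config_id, sample_index):
    """Derive a reproducible per-sample seed from stable inputs (hypothetical scheme)."""
    key = f"{base_seed}:{config_id}:{sample_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an unsigned integer seed.
    return int.from_bytes(digest[:8], "big")

# The same inputs always yield the same random stream.
rng_a = random.Random(sample_seed(42, "cfg-0001", 0))
rng_b = random.Random(sample_seed(42, "cfg-0001", 0))
```

The sketch uses a cryptographic hash rather than Python's built-in hash() because string hashes are salted per process by default and would not reproduce across runs.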

Seed (seed)

int, default 0. Different values generate different occupation distributions.

Active only when use_seed=True.

Recommended presets

Single realization per composition for a quick path check

{
  "class": "RandomOccupancyCard",
  "check_state": true,
  "source": "Auto (Comp tag)",
  "manual": "",
  "mode": "Exact",
  "samples": [1],
  "group_filter": "",
  "use_seed": false,
  "seed": [0]
}

Diverse occupations (five per composition for routine training)

{
  "class": "RandomOccupancyCard",
  "check_state": true,
  "source": "Auto (Comp tag)",
  "manual": "",
  "mode": "Exact",
  "samples": [5],
  "group_filter": "",
  "use_seed": true,
  "seed": [42]
}

High-diversity sublattice sampling (20 per composition, group A only)

{
  "class": "RandomOccupancyCard",
  "check_state": true,
  "source": "Auto (Comp tag)",
  "manual": "",
  "mode": "Random",
  "samples": [20],
  "group_filter": "A",
  "use_seed": true,
  "seed": [42]
}

Recommended combinations

Composition Space Sampling -> Random Occupancy: the standard alloy pipeline, from target composition to site occupation.
Group Label -> Random Occupancy: label a sublattice before restricting occupation.
Random Occupancy -> Atomic Perturb: add coordinate diversity after occupation; use relaxation for physical structures.

Common questions

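Missing target composition error. When the upstream input has no Comp(...) tag and manual is empty, the card stops and reports an error. Check the source setting, or switch to Manual and enter a composition. The card no longer passes the original structure through as a successful output.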

Species counts differ from the target. Random mode fluctuates statistically; use Exact for deterministic integer counts.

Output is much larger than expected. Count is input frames times samples; 500 targets with five samples produce 2,500 structures.
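
The estimate is a single multiplication and can be checked before running. The helper and its optional budget argument below are hypothetical conveniences, not part of the card.

```python
def estimate_output(n_targets, samples, budget=None):
    """Return n_targets * samples; raise if an optional budget is exceeded."""
    total = n_targets * samples
    if budget is not None and total > budget:
        raise ValueError(f"{total} structures exceed the budget of {budget}")
    return total

total = estimate_output(500, 5)
print(total)  # 2500
```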

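group_filter raises an error. Check that the input structure has atoms.arrays['group'] and that the label spelling matches exactly. Both a missing array and zero matches raise an error, so that an edit intended only for group A does not silently modify the whole structure.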
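
The two failure cases can be reproduced with a small pre-flight check. In this sketch a plain dict stands in for atoms.arrays, the function name is hypothetical, and treating an empty filter as "no restriction" is an assumption based on the presets.

```python
def select_group_sites(arrays, group_filter):
    """Return indices of sites whose group label equals group_filter exactly."""
    if not group_filter:
        return None  # assumed meaning: no restriction, all sites eligible
    if "group" not in arrays:
        raise KeyError("input structure has no 'group' array")
    hits = [i for i, label in enumerate(arrays["group"]) if label == group_filter]
    if not hits:
        raise ValueError(f"no site carries group label {group_filter!r}")
    return hits

arrays = {"group": ["A", "B", "A", "B"]}
sites = select_group_sites(arrays, "A")
```

Matching is case-sensitive in this sketch, so a filter of "a" against labels "A" counts as zero matches and raises.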

Output labels

Occ(E), Occ(R), Occ(E,s=...), or Occ(R,s=...). E means Exact and R means Random. Seeded output includes the seed for traceability.
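
For downstream filtering, the labels can be read back with a small pattern. This is a sketch under assumptions: the function name is hypothetical, and because the exact format of the value after s= is not specified here, it is returned as an uninterpreted string.

```python
import re

# Mode letter, then an optional ",s=<value>" part.
OCC_TAG = re.compile(r"Occ\((E|R)(?:,s=([^)]*))?\)")

def read_occ_tag(label):
    """Return {'mode': ..., 'seed': ...} for the first Occ(...) tag, or None."""
    match = OCC_TAG.search(label)
    if match is None:
        return None
    mode = {"E": "Exact", "R": "Random"}[match.group(1)]
    return {"mode": mode, "seed": match.group(2)}

unseeded = read_occ_tag("Occ(E)")
seeded = read_occ_tag("Occ(R,s=42)")  # 42 is an illustrative value
```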

Reproducibility

Enable use_seed and fix seed to reproduce occupations. Sample seeds derive from the stable configuration ID and base seed. Input order changes can affect output.