Performance: Dependence of Computational Time on wk_dim¶
Overview¶
One distinctive feature of QS³‑ED2 is the ability to cache Hamiltonian matrix elements in memory. This functionality was not present in the original QS³‑ED implementation. By storing matrix elements within the available memory, repeated evaluations of the Hamiltonian action can be avoided, which can significantly accelerate the Lanczos iterations.
In conventional exact diagonalization implementations, the action of the Hamiltonian on a basis state
is recomputed every time it is required. In QS³‑ED2, however, matrix elements can optionally be cached and reused, reducing the computational overhead of repeated Hamiltonian applications.
To illustrate the effect of this design, we investigate how the computation time depends on the
parameter wk_dim, which controls how many Hamiltonian matrix elements are stored in memory.
The benchmark is performed using the calculation conditions of Example 9 (cubic_sp_HB), except
that the calculation of local magnetization and correlation functions is disabled.
Benchmark model¶
The cubic_sp_HB example corresponds to an antiferromagnetic Heisenberg model on a cubic lattice
of size
with periodic boundary conditions.
The Hamiltonian is
The calculation is performed in a symmetry sector defined by the following quantum numbers.
Magnetization sector¶
The total magnetization is fixed to
where
is the saturation magnetization of a spin‑1/2 system with \(N\) sites, and
denotes the number of down spins.
In this benchmark
The dimension of the Hilbert space in this magnetization sector is therefore
Momentum sector¶
The calculation is performed at the Γ point
Reflection symmetries¶
The state belongs to the even‑parity sector with respect to reflections about the
- XY plane
- YZ plane
- ZX plane
which corresponds to the input condition
After applying all translational and reflection symmetries, the working Hilbert space becomes
corresponding to a reduction of approximately
compared with the original Hilbert space.
Optimal value of MNTE¶
Before tuning wk_dim, an appropriate value of MNTE must be specified.
MNTE represents the maximum number of Hamiltonian matrix elements generated when the Hamiltonian
acts on a single representative state.
For the present model it is straightforward to determine that
is optimal.
This can be understood by considering a configuration in which the
down spins are well separated.
When the Hamiltonian acts once,
- each spin can hop to its nearest neighbors
- the cubic lattice has coordination number
Therefore the number of off‑diagonal transitions is
In addition, the Hamiltonian contains the diagonal interaction term
Thus the total number of generated matrix elements becomes
Computational cost with and without caching¶
Let
denote the cost of applying the Hamiltonian once to a representative state.
The computational cost of evaluating
depends on whether the matrix elements are cached.
With matrix‑element caching¶
Without caching¶
The additional cost arises because, after generating a new configuration, the program must determine which representative state the configuration belongs to under the symmetry group.
This requires an additional search over the symmetry operations, producing the factor
Therefore, when sufficient memory is available, caching matrix elements is strongly recommended.
Cost of each computational stage¶
The computational complexity of the main stages of the calculation can be roughly estimated as follows.
Selection of representative states¶
Matrix‑element storage¶
Lanczos iterations¶
where
is the number of Lanczos iterations.
Since
and
the dominant computational cost becomes
Case 1: wk_dim = 0¶
Case 2: wk_dim = D¶
Expected speedup¶
The ratio between these two limits is approximately
For typical QS³ calculations the first term dominates, leading to a theoretical speedup of roughly
which corresponds to up to about two orders of magnitude.
Benchmark result¶
The measured dependence of the runtime on
is shown in the figure below.
The computation time decreases approximately linearly as a function of
For the present benchmark
- the runtime is reduced by approximately 17×
- the required memory increases by approximately 8×
This clearly demonstrates the effectiveness of the matrix‑element caching mechanism.
Practical recommendation¶
When sufficient memory is available, increasing wk_dim is an effective strategy for accelerating the
Lanczos iterations.
However, caution is required for models without U(1) symmetry, since
- the value of
MNTEtypically becomes larger - the memory consumption grows significantly
and the caching strategy may become memory‑limited.