Skip to content

Add support for far-field approximation of the 3D Green's function#3223

Open
oskooi wants to merge 3 commits into
NanoComp:masterfrom
oskooi:n2f_fft
Open

Add support for far-field approximation of the 3D Green's function#3223
oskooi wants to merge 3 commits into
NanoComp:masterfrom
oskooi:n2f_fft

Conversation

@oskooi

@oskooi oskooi commented Jun 1, 2026

Copy link
Copy Markdown
Collaborator

Closes #2269, #2463.

What the exact green3d computes

The existing function green3d (near2far.cpp:190-214) evaluates the full free-space Green's function for a point-dipole source at x0 observed at x. It computes three distance-dependent terms:

  • term 1: 1 - 1/(ikr) + 1/(ikr)² — radiating (1/r), intermediate (1/r²), and near-field (1/r³) contributions
  • term 2: (-1 + 3/(ikr) - 3/(ikr)²) · (p·r̂) — radial component with all three ranges
  • term 3: 1 - 1/(ikr) — curl term with radiating and intermediate contributions

Each cell requires computing r = |x - x0|, r̂ = (x-x0)/r, exp(ikr), and the complex divisions for these terms — all of which vary per source-observer pair.

What the new green3d_farfield does differently

The far-field approximation (near2far.cpp:136-182) exploits two simplifications valid when R = |x| >> |x0| and R >> λ:

1. Geometric approximation

  • r ≈ R - x̂·x0 (first-order Taylor expansion of distance)
  • r̂ ≈ x̂ (all source points subtend negligible angle)
  • 1/r ≈ 1/R (amplitude is constant across source points)
  • The only per-source-point variation is in the phase: exp(ikr) ≈ exp(ikR) · exp(ik x̂·x0)

2. Dropping near-field terms: With 1/(ikr) ➔ 0, the terms simplify to term ➔ 1, term2 ➔ -p·x̂, term3 ➔ 1. The resulting fields are purely transverse:

  • E is the transverse projection of the source dipole: t = p - (p·x̂)x̂
  • H is x̂ ⨯ p / Z (or vice versa for magnetic sources)

The function signature reflects these precomputations — instead of taking the far-field point x, it takes xhat (unit vector), k, impedance, and expfac_base = k·n/(4πR) · exp(i(kR + π/2)), all computed once per far-field point and reused across every source point on the near-field surface. Per source point, the work reduces to:

  1. One dot product x̂·x0 (3 multiplies)
  2. One std::polarcall for the phase exp(ik·(...))
  3. One complex multiply to get expfac
  4. Simple arithmetic for the transverse projection and cross product

Speedup: two levels

Single-point speedup (modest, ~2-3x): Whe calling get_farfield for individual points, the code in farfield_lowlevel (line 530-532) still loops over every source point but calls green3d_farfield instead of green3d. The savings come from avoiding per-pair norm/division/complex-division computations. This gives a roughly 2-3x speedup per point.

Grid speedup via FFT (massive, O(N_src / log N_src)): The big win is in get_farfields_array (lines 588-769). When far_field_approx=True in 3D without periodic replicas, the code activates an FFT-accelerated path:

  1. Preprocessing (once per near-field face): The DFT-accumulated source currents on each planar face are zero-padded (4x oversampling) and transformed via 2D FFT. Cost: O(N_src · log N_src) total across all faces.
  2. Per output point: Instead of summing over all N_src source points, the code looks up each face's FFT at the spatial frequency f = k·x̂·spacing·N/(2π) via bilinear interpolation (fft2d_interp), then applies the geometric phase correction and far-field formula. Cost: O(N_faces) per ouput point.

Total complexity comparison for a grid of N_out far-field points:

Method Cost
Exact (green3d) O(N_out ⨯ N_src)
Far-field + FFT O(N_src · log N_src + N_out ⨯ N_faces)

For a typical 3D problem (resolution 20, near-field box of side 4 ➔ ~6,400 points per face ⨯ 6 faces ≈ 38,400 source points, computing the far fields on a 100⨯100 grid):

  • Exact: 10,000 ⨯ 38,400 ≈ 384 million Green's function evaluations
  • FFT: 6 FFTs of size ~256⨯256 + 10,000 ⨯ 6 interpolations ≈ ~2.4 million operations

That's a ~160x speedup for this example, and it scales proportionally with N_src — larger near-field surfaces or higher resolution yield even greater speedups (easily 1000x+ for production-size problems).

Note: the <algorithm> header added in near2far.cpp is needed for two things used by the FFT-accelerated far-field path:

  1. std::swap at line 423 in fft1d_inplace — the bit-reversal permutation step.
  2. std::max at lines 672-673 — computing the zero-padded FFT sizes: next_pow2(std::max(oversample * cf.n1, 2))

There's also a pre-existing use of std::max at line 790 in the progress reporting, but that code predates this feature. The FFT code alone requires the include.

Comment thread src/near2far.cpp
std::complex<double> v = data[i + j + len / 2] * w;
data[i + j] = u + v;
data[i + j + len / 2] = u - v;
w *= wn;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unstable trigonometric recurrence (O(n) roundoff error).

Comment thread src/near2far.cpp
while (p < n)
p *= 2;
return p;
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a cute way to get the next power of 2 using some bit-twiddling, but this is fine. We don't care about performance here.

Comment thread src/near2far.cpp Outdated
Comment thread src/near2far.cpp
Comment thread src/near2far.cpp Outdated
Comment thread src/near2far.cpp
Comment thread src/near2far.cpp Outdated
Comment thread src/near2far.cpp Outdated
Comment on lines +676 to +679
for (size_t idx_dft = 0; idx_dft < (size_t)f->N; idx_dft++) {
int i1 = idx_dft / cf.n2;
int i2 = idx_dft % cf.n2;
arr[i1 * cf.nf2 + i2] = f->dft[Nfreq * idx_dft + fi];

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems better to loop over i1 and i2 and compute idx_dft from these than vice versa?

Comment thread src/near2far.cpp
EH1[3] += hf * tx;
EH1[4] += hf * ty;
EH1[5] += hf * tz;
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code seems like almost a copy-paste of green_farfield?

Comment thread src/near2far.cpp
for (int k = 0; k < 6; ++k) {
EH_[((k * 2 + 0) * N + idx) * Nfreq + i] = real(EH1[i * 6 + k]);
EH_[((k * 2 + 1) * N + idx) * Nfreq + i] = imag(EH1[i * 6 + k]);
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole block seems to be a line-for-line copy-paste of the code on line 808 … needs refactoring.

@oskooi oskooi Jun 7, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the right-hand side is different in the two blocks of code.

Comment thread src/near2far.cpp
return p;
}

static void fft1d_inplace(std::complex<double> *data, int n, int sign) {

@stevengj stevengj Jun 5, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we already link FFTW if it is present — might as well just call it rather than rewriting an FFT?

@oskooi

oskooi commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator Author

Empirically confirmed (via C++ instrumentation) that the FFT-accelerated path in get_farfields_array was never executing. The normal-detection heuristic assumed the surface normal had nd ≤ 1 grid points, but near2far DFT chunks are 2 grid points thick along their normal (the tangential field component is offset half a pixel on the Yee grid). So use_fft was always disabled, and every get_farfields(..., far_field_approx=True) call silently fell back to direct summation — which is why the old 1e-12 test passed trivially (both sides ran identical code).

The fix (src/near2far.cpp)

  1. Robust normal detection — pick the thinnest Cartesian direction as the normal; the other two are transverse.
  2. Per-slice handling — each normal layer is treated as a separate planar current sheet (its own face_pos), so the decomposition is exact for any rectangular chunk, including the edge slivers (78×2×2) that arise when a component (e.g. Hy) is tangential to two perpendicular faces. The scatter now uses actual grid indices (IVEC_LOOP_ILOC), independent of iteration order.
  3. Bicubic interpolation — replaced bilinear fft2d_interp with Catmull-Rom bicubic. Measured error scaling at the test geometry:
oversample bilinear bicubic
15% 2.1%
3.6% 0.23%
16× 1.0% 0.015%
  1. Settled on oversample=8 + bicubic (~0.1-0.2% error, comparable to the radiation-zone approximation's own O(L/R) ≈ 0.4% error) as the accuracy/memory sweet spot.

Measured speedup (FFT vs exact green3d)

FFT time is nearly constant (~2.5 s, dominated by the O(N_src log N_src) precompute) while direct scales linearly with output points:

grid speedup accuracy (FFT vs exact)
20×20 6.4× 0.11%
40×40 25× 0.11%
80×80 101× 0.11%

A notable finding: the per-point green3d_farfield simplification alone gives ~no speedup (direct-approx ≈ exact in wall-clock); essentially all the gain comes from the FFT restructuring.

Tests (python/tests/test_farfield_approx.py)

  • Updated test_farfield_approx_get_farfields_array tolerance from the trivially passing 1e-12 to 5e-3 (it now genuinely compares the FFT path vs direct summation).
  • Added test_farfield_approx_fft_grid_vs_exact — the missing coverage: a finite far-field patch spanning many fractional FFT bins, validated against both the exact near-to-far transform and per-point direct summation at 1e-2.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FFT acceleration of near2far

2 participants