Add support for far-field approximation of the 3D Green's function by oskooi · Pull Request #3223 · NanoComp/meep

oskooi · 2026-06-01T03:16:55Z

What the exact green3d computes

The existing function green3d (near2far.cpp:190-214) evaluates the full free-space Green's function for a point-dipole source at x0 observed at x. It computes three distance-dependent terms:

term 1: 1 - 1/(ikr) + 1/(ikr)² — radiating (1/r), intermediate (1/r²), and near-field (1/r³) contributions
term 2: (-1 + 3/(ikr) - 3/(ikr)²) · (p·r̂) — radial component with all three ranges
term 3: 1 - 1/(ikr) — curl term with radiating and intermediate contributions

Each cell requires computing r = |x - x0|, r̂ = (x-x0)/r, exp(ikr), and the complex divisions for these terms — all of which vary per source-observer pair.

What the new green3d_farfield does differently

The far-field approximation (near2far.cpp:136-182) exploits two simplifications valid when R = |x| >> |x0| and R >> λ:

1. Geometric approximation

r ≈ R - x̂·x0 (first-order Taylor expansion of distance)
r̂ ≈ x̂ (all source points subtend negligible angle)
1/r ≈ 1/R (amplitude is constant across source points)
The only per-source-point variation is in the phase: exp(ikr) ≈ exp(ikR) · exp(ik x̂·x0)

2. Dropping near-field terms: With 1/(ikr) ➔ 0, the terms simplify to term ➔ 1, term2 ➔ -p·x̂, term3 ➔ 1. The resulting fields are purely transverse:

E is the transverse projection of the source dipole: t = p - (p·x̂)x̂
H is x̂ ⨯ p / Z (or vice versa for magnetic sources)

The function signature reflects these precomputations — instead of taking the far-field point x, it takes xhat (unit vector), k, impedance, and expfac_base = k·n/(4πR) · exp(i(kR + π/2)), all computed once per far-field point and reused across every source point on the near-field surface. Per source point, the work reduces to:

One dot product x̂·x0 (3 multiplies)
One std::polarcall for the phase exp(ik·(...))
One complex multiply to get expfac
Simple arithmetic for the transverse projection and cross product

Speedup: two levels

Single-point speedup (modest, ~2-3x): Whe calling get_farfield for individual points, the code in farfield_lowlevel (line 530-532) still loops over every source point but calls green3d_farfield instead of green3d. The savings come from avoiding per-pair norm/division/complex-division computations. This gives a roughly 2-3x speedup per point.

Grid speedup via FFT (massive, O(N_src / log N_src)): The big win is in get_farfields_array (lines 588-769). When far_field_approx=True in 3D without periodic replicas, the code activates an FFT-accelerated path:

Preprocessing (once per near-field face): The DFT-accumulated source currents on each planar face are zero-padded (4x oversampling) and transformed via 2D FFT. Cost: O(N_src · log N_src) total across all faces.
Per output point: Instead of summing over all N_src source points, the code looks up each face's FFT at the spatial frequency f = k·x̂·spacing·N/(2π) via bilinear interpolation (fft2d_interp), then applies the geometric phase correction and far-field formula. Cost: O(N_faces) per ouput point.

Total complexity comparison for a grid of N_out far-field points:

Method	Cost
Exact (`green3d`)	O(N_out ⨯ N_src)
Far-field + FFT	O(N_src · log N_src + N_out ⨯ N_faces)

For a typical 3D problem (resolution 20, near-field box of side 4 ➔ ~6,400 points per face ⨯ 6 faces ≈ 38,400 source points, computing the far fields on a 100⨯100 grid):

Exact: 10,000 ⨯ 38,400 ≈ 384 million Green's function evaluations
FFT: 6 FFTs of size ~256⨯256 + 10,000 ⨯ 6 interpolations ≈ ~2.4 million operations

That's a ~160x speedup for this example, and it scales proportionally with N_src — larger near-field surfaces or higher resolution yield even greater speedups (easily 1000x+ for production-size problems).

Note: the <algorithm> header added in near2far.cpp is needed for two things used by the FFT-accelerated far-field path:

std::swap at line 423 in fft1d_inplace — the bit-reversal permutation step.
std::max at lines 672-673 — computing the zero-padded FFT sizes: next_pow2(std::max(oversample * cf.n1, 2))

There's also a pre-existing use of std::max at line 790 in the progress reporting, but that code predates this feature. The FFT code alone requires the include.

stevengj · 2026-06-05T20:02:23Z

+        std::complex<double> v = data[i + j + len / 2] * w;
+        data[i + j] = u + v;
+        data[i + j + len / 2] = u - v;
+        w *= wn;


Unstable trigonometric recurrence (O(n) roundoff error).

stevengj · 2026-06-05T20:06:34Z

+  while (p < n)
+    p *= 2;
+  return p;
+}


There is a cute way to get the next power of 2 using some bit-twiddling, but this is fine. We don't care about performance here.

stevengj · 2026-06-05T20:19:07Z

+        for (size_t idx_dft = 0; idx_dft < (size_t)f->N; idx_dft++) {
+          int i1 = idx_dft / cf.n2;
+          int i2 = idx_dft % cf.n2;
+          arr[i1 * cf.nf2 + i2] = f->dft[Nfreq * idx_dft + fi];


Seems better to loop over i1 and i2 and compute idx_dft from these than vice versa?

stevengj · 2026-06-05T20:21:00Z

+            EH1[3] += hf * tx;
+            EH1[4] += hf * ty;
+            EH1[5] += hf * tz;
+          }


This code seems like almost a copy-paste of green_farfield?

stevengj · 2026-06-05T20:25:00Z

+            for (int k = 0; k < 6; ++k) {
+              EH_[((k * 2 + 0) * N + idx) * Nfreq + i] = real(EH1[i * 6 + k]);
+              EH_[((k * 2 + 1) * N + idx) * Nfreq + i] = imag(EH1[i * 6 + k]);
+            }


This whole block seems to be a line-for-line copy-paste of the code on line 808 … needs refactoring.

Note that the right-hand side is different in the two blocks of code.

stevengj · 2026-06-05T20:25:36Z

+  return p;
+}
+
+static void fft1d_inplace(std::complex<double> *data, int n, int sign) {


Note that we already link FFTW if it is present — might as well just call it rather than rewriting an FFT?

…tion and add unit test for far-field patch with finite volume

oskooi · 2026-06-18T04:36:57Z

Empirically confirmed (via C++ instrumentation) that the FFT-accelerated path in get_farfields_array was never executing. The normal-detection heuristic assumed the surface normal had nd ≤ 1 grid points, but near2far DFT chunks are 2 grid points thick along their normal (the tangential field component is offset half a pixel on the Yee grid). So use_fft was always disabled, and every get_farfields(..., far_field_approx=True) call silently fell back to direct summation — which is why the old 1e-12 test passed trivially (both sides ran identical code).

The fix (src/near2far.cpp)

Robust normal detection — pick the thinnest Cartesian direction as the normal; the other two are transverse.
Per-slice handling — each normal layer is treated as a separate planar current sheet (its own face_pos), so the decomposition is exact for any rectangular chunk, including the edge slivers (78×2×2) that arise when a component (e.g. Hy) is tangential to two perpendicular faces. The scatter now uses actual grid indices (IVEC_LOOP_ILOC), independent of iteration order.
Bicubic interpolation — replaced bilinear fft2d_interp with Catmull-Rom bicubic. Measured error scaling at the test geometry:

oversample	bilinear	bicubic
4×	15%	2.1%
8×	3.6%	0.23%
16×	1.0%	0.015%

Settled on oversample=8 + bicubic (~0.1-0.2% error, comparable to the radiation-zone approximation's own O(L/R) ≈ 0.4% error) as the accuracy/memory sweet spot.

Measured speedup (FFT vs exact green3d)

FFT time is nearly constant (~2.5 s, dominated by the O(N_src log N_src) precompute) while direct scales linearly with output points:

grid	speedup	accuracy (FFT vs exact)
20×20	6.4×	0.11%
40×40	25×	0.11%
80×80	101×	0.11%

A notable finding: the per-point green3d_farfield simplification alone gives ~no speedup (direct-approx ≈ exact in wall-clock); essentially all the gain comes from the FFT restructuring.

Tests (python/tests/test_farfield_approx.py)

Updated test_farfield_approx_get_farfields_array tolerance from the trivially passing 1e-12 to 5e-3 (it now genuinely compares the FFT path vs direct summation).
Added test_farfield_approx_fft_grid_vs_exact — the missing coverage: a finite far-field patch spanning many fractional FFT bins, validated against both the exact near-to-far transform and per-point direct summation at 1e-2.

add support for far-field approximation of the 3D Green's function

76644ce

oskooi added the enhancement label Jun 1, 2026