Add support for far-field approximation of the 3D Green's function #3223
oskooi wants to merge 3 commits into
    std::complex<double> v = data[i + j + len / 2] * w;
    data[i + j] = u + v;
    data[i + j + len / 2] = u - v;
    w *= wn;
Unstable trigonometric recurrence (O(n) roundoff error).
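One common way to avoid the accumulated error of `w *= wn` is to evaluate each twiddle factor directly from its angle, for example by tabulating them once per stage. The following is only a sketch of that idea, not the code in this PR: `make_twiddles` and `butterfly_stage` are hypothetical names, and the loop structure is assumed to match a standard radix-2 stage.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): tabulate the len/2 twiddle factors
// exp(sign * 2*pi*i * j / len) by evaluating each angle directly, instead of
// accumulating them with the recurrence w *= wn.
std::vector<std::complex<double>> make_twiddles(std::size_t len, int sign) {
  const double pi = std::acos(-1.0);
  std::vector<std::complex<double>> w(len / 2);
  for (std::size_t j = 0; j < len / 2; ++j) {
    const double theta =
        sign * 2.0 * pi * static_cast<double>(j) / static_cast<double>(len);
    w[j] = std::polar(1.0, theta); // error does not grow with j
  }
  return w;
}

// Hypothetical radix-2 butterfly stage over blocks of size len,
// reading twiddles from the table rather than updating w in the loop.
void butterfly_stage(std::complex<double> *data, std::size_t n, std::size_t len,
                     const std::vector<std::complex<double>> &w) {
  for (std::size_t i = 0; i < n; i += len) {
    for (std::size_t j = 0; j < len / 2; ++j) {
      const std::complex<double> u = data[i + j];
      const std::complex<double> v = data[i + j + len / 2] * w[j];
      data[i + j] = u + v;
      data[i + j + len / 2] = u - v;
    }
  }
}
```

The table costs one `std::polar` call per twiddle, which is negligible next to the butterflies, and it can be reused across all blocks of a stage.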
      while (p < n)
        p *= 2;
      return p;
    }
There is a cute way to get the next power of 2 using some bit-twiddling, but this is fine. We don't care about performance here.
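For reference, the bit-twiddling variant alluded to here usually looks like the sketch below. It is not part of the PR: `next_pow2_bits` is a hypothetical name, and the 32-bit unsigned argument type is an assumption.

```cpp
#include <cstdint>

// Hypothetical alternative to the loop above: smallest power of two >= n.
// Returns 1 for n == 0; wraps to 0 if n exceeds 2^31 (no 32-bit result exists).
std::uint32_t next_pow2_bits(std::uint32_t n) {
  if (n == 0) return 1;
  --n;          // so that exact powers of two map to themselves
  n |= n >> 1;  // smear the highest set bit into every lower position
  n |= n >> 2;
  n |= n >> 4;
  n |= n >> 8;
  n |= n >> 16;
  return n + 1; // all-ones below the top bit, plus one, is the next power
}
```

As the comment says, the plain loop is perfectly adequate when this is called only a couple of times per FFT setup.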
    for (size_t idx_dft = 0; idx_dft < (size_t)f->N; idx_dft++) {
      int i1 = idx_dft / cf.n2;
      int i2 = idx_dft % cf.n2;
      arr[i1 * cf.nf2 + i2] = f->dft[Nfreq * idx_dft + fi];
Seems better to loop over i1 and i2 and compute idx_dft from these than vice versa?
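A sketch of the suggested loop order, written as a stand-alone function so it can be compiled in isolation. The function `pad_slice` and its parameters are hypothetical, and it assumes the DFT chunk holds `n1 * n2` points stored with `i2` varying fastest, which is what the division and modulo in the snippet imply.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone version of the copy: take frequency index fi from an
// n1 x n2 chunk (Nfreq values per point) and place it in the top-left corner of
// a zero-padded nf1 x nf2 array.
std::vector<std::complex<double>> pad_slice(
    const std::vector<std::complex<double>> &dft, std::size_t n1, std::size_t n2,
    std::size_t nf1, std::size_t nf2, std::size_t Nfreq, std::size_t fi) {
  std::vector<std::complex<double>> arr(nf1 * nf2); // value-initialized to zero
  for (std::size_t i1 = 0; i1 < n1; ++i1) {
    for (std::size_t i2 = 0; i2 < n2; ++i2) {
      const std::size_t idx_dft = i1 * n2 + i2; // no division or modulo needed
      arr[i1 * nf2 + i2] = dft[Nfreq * idx_dft + fi];
    }
  }
  return arr;
}
```

Looping over `i1` and `i2` makes the two-dimensional layout explicit and avoids an integer division per element.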
      EH1[3] += hf * tx;
      EH1[4] += hf * ty;
      EH1[5] += hf * tz;
    }
This code seems like almost a copy-paste of green_farfield?
    for (int k = 0; k < 6; ++k) {
      EH_[((k * 2 + 0) * N + idx) * Nfreq + i] = real(EH1[i * 6 + k]);
      EH_[((k * 2 + 1) * N + idx) * Nfreq + i] = imag(EH1[i * 6 + k]);
    }
This whole block seems to be a line-for-line copy-paste of the code on line 808 … needs refactoring.
Note that the right-hand side is different in the two blocks of code.
      return p;
    }

    static void fft1d_inplace(std::complex<double> *data, int n, int sign) {
Note that we already link FFTW if it is present — might as well just call it rather than rewriting an FFT?
…tion and add unit test for far-field patch with finite volume
Empirically confirmed (via C++ instrumentation) that the FFT-accelerated path in The fix (
Measured speedup (FFT vs exact FFT time is nearly constant (~2.5 s, dominated by the O(N_src log N_src) precompute) while direct scales linearly with output points:
A notable finding: the per-point Tests (
Closes #2269, #2463.
What the exact `green3d` computes

The existing function `green3d` (`near2far.cpp:190-214`) evaluates the full free-space Green's function for a point-dipole source at `x0` observed at `x`. It computes three distance-dependent terms:

- `1 - 1/(ikr) + 1/(ikr)²`: radiating (1/r), intermediate (1/r²), and near-field (1/r³) contributions
- `(-1 + 3/(ikr) - 3/(ikr)²) · (p·r̂)`: radial component with all three ranges
- `1 - 1/(ikr)`: curl term with radiating and intermediate contributions

Each cell requires computing `r = |x - x0|`, `r̂ = (x-x0)/r`, `exp(ikr)`, and the complex divisions for these terms, all of which vary per source-observer pair.

What the new `green3d_farfield` does differently

The far-field approximation (`near2far.cpp:136-182`) exploits two simplifications valid when `R = |x| >> |x0|` and `R >> λ`:

1. Geometric approximation
- `r ≈ R - x̂·x0` (first-order Taylor expansion of distance)
- `r̂ ≈ x̂` (all source points subtend negligible angle)
- `1/r ≈ 1/R` (amplitude is constant across source points)
- `exp(ikr) ≈ exp(ikR) · exp(-ik x̂·x0)` (the sign follows from `r ≈ R - x̂·x0`)

2. Dropping near-field terms: With 1/(ikr) ➔ 0, the three terms simplify to `term ➔ 1`, `term2 ➔ -p·x̂`, `term3 ➔ 1`. The resulting fields are purely transverse:

- `t = p - (p·x̂)x̂`
- `x̂ ⨯ p / Z` (or vice versa for magnetic sources)

The function signature reflects these precomputations: instead of taking the far-field point `x`, it takes `xhat` (unit vector), `k`, `impedance`, and `expfac_base = k·n/(4πR) · exp(i(kR + π/2))`, all computed once per far-field point and reused across every source point on the near-field surface. Per source point, the work reduces to:

- `x̂·x0` (3 multiplies)
- `std::polar` call for the phase `exp(ik·(...))`
- `expfac`

Speedup: two levels
Single-point speedup (modest, ~2-3x): When calling `get_farfield` for individual points, the code in `farfield_lowlevel` (lines 530-532) still loops over every source point but calls `green3d_farfield` instead of `green3d`. The savings come from avoiding per-pair norm/division/complex-division computations. This gives a roughly 2-3x speedup per point.

Grid speedup via FFT (massive, O(N_src / log N_src)): The big win is in `get_farfields_array` (lines 588-769). When `far_field_approx=True` in 3D without periodic replicas, the code activates an FFT-accelerated path:

`f = k·x̂·spacing·N/(2π)` via bilinear interpolation (`fft2d_interp`), then applies the geometric phase correction and far-field formula. Cost: O(N_faces) per output point.

Total complexity comparison for a grid of N_out far-field points:
green3d)

For a typical 3D problem (resolution 20, near-field box of side 4 ➔ ~6,400 points per face ⨯ 6 faces ≈ 38,400 source points, computing the far fields on a 100⨯100 grid):
That's a ~160x speedup for this example, and it scales proportionally with N_src — larger near-field surfaces or higher resolution yield even greater speedups (easily 1000x+ for production-size problems).
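The geometric approximation behind `green3d_farfield` can be checked numerically with a small stand-alone sketch. This is not the PR's code: `Vec3` and all function names below are hypothetical, and the sketch assumes the first-order expansion `r ≈ R - x̂·x0`, which gives the phase factor `exp(ikR) · exp(-ik x̂·x0)`.

```cpp
#include <cmath>
#include <complex>

// Hypothetical minimal 3-vector; Meep's own vec type is not used here.
struct Vec3 {
  double x, y, z;
};

double dot(const Vec3 &a, const Vec3 &b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
double norm(const Vec3 &a) { return std::sqrt(dot(a, a)); }

// Exact source-observer distance r = |x - x0|.
double exact_distance(const Vec3 &x, const Vec3 &x0) {
  return norm(Vec3{x.x - x0.x, x.y - x0.y, x.z - x0.z});
}

// First-order far-field distance R - xhat.x0, with R = |x| and xhat = x / R.
double farfield_distance(const Vec3 &x, const Vec3 &x0) {
  const double R = norm(x);
  const Vec3 xhat{x.x / R, x.y / R, x.z / R};
  return R - dot(xhat, x0);
}

// Exact phase factor exp(i k r).
std::complex<double> exact_phase(const Vec3 &x, const Vec3 &x0, double k) {
  return std::polar(1.0, k * exact_distance(x, x0));
}

// Far-field phase factor exp(i k R) * exp(-i k xhat.x0). Only the second
// factor depends on the source point x0; the first can be computed once.
std::complex<double> farfield_phase(const Vec3 &x, const Vec3 &x0, double k) {
  const double R = norm(x);
  const Vec3 xhat{x.x / R, x.y / R, x.z / R};
  return std::polar(1.0, k * R) * std::polar(1.0, -k * dot(xhat, x0));
}
```

For an observer at distance 1000 and a source point within about one unit of the origin, the two distances differ by roughly `|x0|²/(2R)`, which is the size of the second-order term that the approximation drops.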
Note: the `<algorithm>` header added in `near2far.cpp` is needed for two things used by the FFT-accelerated far-field path:

- `std::swap` at line 423 in `fft1d_inplace`: the bit-reversal permutation step.
- `std::max` at lines 672-673: computing the zero-padded FFT sizes, `next_pow2(std::max(oversample * cf.n1, 2))`.

There's also a pre-existing use of `std::max` at line 790 in the progress reporting, but that code predates this feature. The FFT code alone requires the include.