add SOCKS5 proxy support (-p/--proxy flag) #6
Adds a new --proxy CLI option that accepts socks5://host:port and routes HTTPS traffic through a SOCKS5 proxy.
On the JVM, the proxy path uses a raw Socket(Proxy) + SSLSocket wrapper rather than hato, because the JDK HttpClient silently drops SOCKS proxies from a ProxySelector. Tracked as OpenJDK JDK-8214516 (Open, P4, since 2018, no fix planned). Raw Socket(Proxy) bypasses this because it talks HTTP/1.1 directly.
On babashka, the proxy string is passed through to the underlying http-client. Native build is wired up with new reflection config and the socks URL protocol.
Includes:
- new SOCKS5 HTTP/1.1 client in src/r11y/lib/http.cljc with proper chunked transfer-encoding, gzip/deflate decoding, and SSL handshake
- new --proxy flag in src/r11y/core.clj, threaded through extract-content-from-url
- graal-config/reflect-config.json: add Proxy, Proxy$Type, ProxySelector, InetSocketAddress, SSLSocket, SSLSocketFactory, SSLContext
- build-native.sh: add socks to --enable-url-protocols
- README: document the --proxy option
- new tests: chunked decoder, gunzip, maybe-decode, CLI parse
Hey @alekseysotnikov - thank you for the PR. At first glance (very first glance), the extensive Also, wouldn't a system-level proxy be "just" transparently used? I'll take a deeper look this week.
dazld
left a comment
@alekseysotnikov - I've had a chance to look now.
Before looking directly at the PR does proxychains cover your use case?
If there is a reason it needs to be in-process, then here's what I'd want addressed before merging:
Security — two issues I'd consider blocking:
- TLS hostname verification is dropped. A raw SSLSocket from SSLSocketFactory validates the certificate chain but does not verify that the hostname matches the cert, unlike java.net.http, which does it automatically. As written, any certificate that chains to a trusted root is accepted for any host, which is a MITM hole, and it matters most for exactly this feature, since the traffic is going through an untrusted/foreign network. It needs an explicit: (let [params (.getSSLParameters ssl-sock)] (.setEndpointIdentificationAlgorithm params "HTTPS") (.setSSLParameters ssl-sock params))
- A malformed proxy string fails open. proxy->opts catches the parse error and returns nil, so --proxy <garbage> produces no marker and the request falls straight through to a direct connection. Someone proxying for privacy/geo reasons would silently leak the request to the target. It should fail closed (throw). Relatedly, the scheme isn't validated, so --proxy http://… is treated as SOCKS5.
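To make the two blocking points concrete, here is a minimal sketch of what I have in mind, not a drop-in patch. The namespace and the names parse-proxy and enable-hostname-verification! are hypothetical (they are not in the PR), and I'm assuming the only accepted form is socks5://host:port as the description says.

```clojure
(ns example.proxy-sketch ; hypothetical namespace, for illustration only
  (:import (java.net URI URISyntaxException)
           (javax.net.ssl SSLSocket)))

(defn parse-proxy
  "Parses a proxy string such as \"socks5://host:port\" into {:host .. :port ..}.
  Throws ex-info on anything it cannot fully validate, so the caller can
  never fall back to a direct connection by accident (fail closed)."
  [s]
  (let [^URI uri (try
                   (URI. (str s))
                   (catch URISyntaxException e
                     (throw (ex-info "Malformed proxy URL" {:proxy s} e))))
        scheme   (some-> (.getScheme uri) .toLowerCase)
        host     (.getHost uri)
        port     (.getPort uri)]
    ;; Reject http://, https://, socks4:// and scheme-less input explicitly.
    (when-not (= "socks5" scheme)
      (throw (ex-info "Unsupported proxy scheme, expected socks5"
                      {:proxy s :scheme scheme})))
    ;; URI reports a missing host as nil and a missing port as -1.
    (when (or (nil? host) (neg? port))
      (throw (ex-info "Proxy URL needs both host and port" {:proxy s})))
    {:host host :port port}))

(defn enable-hostname-verification!
  "Turns on HTTPS endpoint identification for a raw SSLSocket, so the
  handshake fails when the certificate does not match the peer host.
  Must be called before the handshake starts. Returns the socket."
  [^SSLSocket ssl-sock]
  (let [params (.getSSLParameters ssl-sock)]
    (.setEndpointIdentificationAlgorithm params "HTTPS")
    (.setSSLParameters ssl-sock params)
    ssl-sock))
```

With something like this, --proxy http://… and --proxy <garbage> both abort before any socket is opened, and the SSLSocket layered over the proxied socket gets enable-hostname-verification! applied before the handshake.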
There are also a couple of functional gaps worth a look: the SOCKS path doesn't appear to follow redirects (a 301/302 or http→https returns the redirect stub rather than content, which hato handles for you), and there's no read timeout set on the socket (only connect), so a stalled server hangs indefinitely.
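For the redirect and timeout gaps, something along these lines would probably do. Again, this is a sketch under assumptions: the namespace and function names are hypothetical, fetch-once stands in for whatever single-request function the SOCKS client exposes, and I'm assuming the response is a map with :status and lower-cased string header keys.

```clojure
(ns example.redirect-sketch ; hypothetical namespace, for illustration only
  (:import (java.net Socket URI)))

(def redirect-statuses #{301 302 303 307 308})

(defn redirect-target
  "Returns the absolute URL to follow for a redirect response, or nil when
  the response is not a redirect (or carries no Location header).
  Assumes `headers` is a map with lower-cased string keys."
  [url {:keys [status headers]}]
  (when-let [location (and (contains? redirect-statuses status)
                           (get headers "location"))]
    ;; Location may be relative, so resolve it against the request URL.
    (str (.resolve (URI. url) ^String location))))

(defn fetch-following-redirects
  "Calls `fetch-once` (hypothetical: url -> response map) and follows
  redirects, throwing once more than `max-hops` would be needed."
  [fetch-once url max-hops]
  (loop [url url, hops 0]
    (let [resp     (fetch-once url)
          next-url (redirect-target url resp)]
      (cond
        (nil? next-url)    resp
        (>= hops max-hops) (throw (ex-info "Too many redirects"
                                           {:url url :hops hops}))
        :else              (recur next-url (inc hops))))))

(defn set-read-timeout!
  "Sets SO_TIMEOUT so a blocking read on a stalled connection throws
  SocketTimeoutException instead of hanging forever. Returns the socket."
  [^Socket sock millis]
  (.setSoTimeout sock (int millis))
  sock)
```

The hop limit keeps a redirect loop from spinning forever, and the read timeout is separate from the connect timeout that is already set.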
Code placement: as in my earlier comment, I'd rather the #?(:clj (do …)) block (~250 lines of hand-rolled HTTP/1.1) live in its own r11y.lib.socks5 (plain .clj) namespace, pulled in via #?(:clj (:require …)). The fact that a do is needed there is itself a hint that this isn't quite the right shape; it shouldn't be necessary.
The cljc file shrinks to dispatch, the client becomes independently testable, and linting with clj-kondo is more straightforward.
fetch-url in html.clj would also be dead code with this approach — extract-content-from-url builds an inline closure that duplicates its opts logic.
Please do check if proxychains works for you - and if not, we can take a second pass at this.