
Validating Zscaler’s “Block Non‑RFC‑Compliant HTTP Traffic on HTTP/HTTPS Ports”

“Block Non‑RFC‑Compliant HTTP Traffic on HTTP/HTTPS Ports: Enable to allow the service to block traffic that isn’t compliant with Request for Comments (RFC)…” (help.zscaler.com)

“Block Non‑HTTP Traffic on HTTP/HTTPS Ports: Enable to restrict traffic on HTTP/HTTPS ports to HTTP traffic only.” (help.zscaler.com)

1  | Executive summary

Over two work‑weeks we set out to prove whether Zscaler Internet Access (ZIA) really enforces RFC‑conformant HTTP syntax when the “Block Non‑RFC‑Compliant HTTP Traffic” feature is enabled. The journey produced a 30‑case regression pack, four daily (batch) test rounds, several surprises, and a handful of actionable bugs escalated to the vendor.

2  | Why this matters

Modern web gateways promise deep‑layer protocol validation, but in practice attackers still smuggle requests by abusing edge‑case grammar. Enforcing only proper HTTP on ports 80/443 dramatically reduces the surface for:

classic request‑smuggling (CL / TE desynchronisation)
tunnel pivoting and protocol confusion attacks (SSH‑over‑443 etc.)
header‑injection tricks that bypass downstream WAFs

If ZIA really blocks malformed handshakes early, every downstream control benefits.

3  | Phase 0 — quick‑and‑dirty curl probes

Before writing the raw‑socket harness we validated ZIA’s setting with the native Windows curl.exe (v8.7.0) against https://httpbin.org. Twelve malformed requests were selected to cover the most common RFC violations an attacker can create with a single‑line command.

#	Malformation	Goal / RFC violation	Expected (feature ON)	Observed (baseline)
1	GET with body (-d)	Payload on GET has no defined semantics (RFC 7231 §4.3.1)	Block	200 OK
2	POST with invalid Content‑Type	Invalid media‑type token (RFC 7231 §3.1.1.1)	Block	200 OK
3	FAKEMETHOD verb	Undefined method token (RFC 7230 §3.1.1)	Block (405/501)	200 OK
4	Custom header containing raw CRLF	Break header grammar, probe injection	Block	200 OK‡
5	Malformed JSON body {invalid: json,}	Ensure backend sees parse error	N/A	200 OK
6	Empty Host: header	Mandatory authority missing (RFC 7230 §5.4)	Block	not executed†
7	HTTP/0.9 request	Obsolete version on modern port	Block / drop	200 OK
8	Transfer‑Encoding: chunked + Content‑Length	TE.CL smuggling precursor	Block	200 OK
9	Invalid Accept: */*; q=2.0	q‑value must be ≤ 1.0 (RFC 9110 §12.5.1)	Block	200 OK
10	PUT with Content‑Length: 10 but no body	Length mismatch	Block	200 OK
11	Duplicate Content‑Type headers	Conflicting value occurrences	Block	200 OK
12	GET with Transfer‑Encoding: chunked	Chunked framing on no‑body method	Block	200 OK

† Direct raw socket is mandatory; curl automatically re‑inserts a sane Host header when not using an explicit proxy.
‡ curl.exe strips raw control characters before sending; separate raw harness required for definitive test.

Outcome: none of the 11 requests that were actually sent was blocked (case 6 could not be executed with curl), confirming the need for deeper testing.
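To make the "Expected" column concrete, here is a minimal Python sketch of a checker that flags four of the violations from the table (cases 6, 8, 9 and 11) in a raw request head. The helper name `lint_request_head` and the finding labels are hypothetical. It shows what a strict parser could look for and makes no claim about how ZIA decides.

```python
import re

# qvalue grammar (RFC 9110 section 12.4.2): 0 to 1, at most three decimals.
QVALUE = re.compile(r"0(\.\d{0,3})?|1(\.0{0,3})?")


def lint_request_head(raw: bytes) -> list[str]:
    """Return findings for a raw HTTP/1.1 request head (hypothetical helper)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    headers = []
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    names = [name for name, _ in headers]

    findings = []
    if "transfer-encoding" in names and "content-length" in names:
        findings.append("TE+CL")
    if names.count("content-type") > 1:
        findings.append("duplicate Content-Type")
    for name, value in headers:
        if name == "host" and value == "":
            findings.append("empty Host")
        if name == "accept":
            # Each comma-separated media range may carry ";q=<weight>".
            for media_range in value.split(","):
                for param in media_range.split(";")[1:]:
                    key, _, weight = param.strip().partition("=")
                    if key == "q" and not QVALUE.fullmatch(weight):
                        findings.append("bad q-value")
    return findings
```

Running each probe through a checker like this before sending it documents which violation the request is meant to carry, independent of what curl later rewrites.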

4  | Phase 1 — vendor analysis

After escalation to the vendor, the Zscaler Resident Engineer (Nicolas) reran the tests, and the results replicate ours:

Following your request I have retested in my lab all the use cases and found the following:

- Normal request : echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\n\r\n" | nc perdu.com 80

- HTTP version invalid - Detected - echo -e "GRT / HTTP/0.9\r\nHost: perdu.com\r\n\r\n" | nc perdu.com 80

- Method invalid - Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\n\r\n" | nc perdu.com 80

- Host empty - Not detected - echo -e "GET / HTTP/1.1\r\nHost: \r\n\r\n" | nc perdu.com 80

- Body in the Header - Not Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\nContent-Length: 17\r\n\r\nThisShouldNotBeHere" | nc perdu.com 80

- Invalid User-Agent - Not Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\nUser-Agent: InvalidUserAgent/1.0\r\n\r\n" | nc perdu.com 80

- Empty User-Agent - Not Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\nUser-Agent: \r\n\r\n" | nc perdu.com 80

- Invalid Header - Not Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\n: InvalidHeaderValue\r\n\r\n" | nc perdu.com 80

I have opened a ticket (05642244) and this is the answer from support. Only the last case needs to be investigated further, as this is not expected.

"Block Non-RFC Compliant HTTP Traffic" feature typically targets HTTP requests that are so malformed that they: Don't adhere to the basic syntactic structure defined in HTTP RFCs (e.g., RFC 7230-7235 for HTTP/1.1). Could be used in certain attack vectors (e.g., HTTP Desync, request smuggling) if processed by a less robust downstream server. Represent clear violations of mandatory (MUST/REQUIRED) directives in the RFCs, especially regarding request line and essential headers.

It generally does not block: Violations of recommended (SHOULD/RECOMMENDED) directives, Semantically incorrect but syntactically valid requests and Uncommon but technically permissible constructs.

Regarding the test cases:
- Host empty : Why Not Detected: RFC 7230, Section 5.4 states: "A client MUST send a Host header field in all HTTP/1.1 request messages." While sending an empty value for the Host header is bad practice and will likely cause issues with the origin server (which would typically return a 400 Bad Request), the header line itself is present. The Zscaler feature might be checking for the presence of the Host header line, not necessarily the validity or non-emptiness of its value. Some interpretations focus on the complete absence of the Host: line as the primary violation.

- Body in the Header : Why Not Detected: RFC 7231, Section 4.3.1 (GET): "A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request."
This means it's not forbidden by the RFC, just that its meaning is undefined and servers might reject it. It's not a syntactic violation for a GET request to have a body, just very unusual and typically ignored or problematic for the server. The feature likely doesn't see this as "non-RFC compliant" from a strict protocol structure perspective.

- Invalid User-Agent : Why Not Detected: RFC 7231, Section 5.5.3 states: "A user agent SHOULD send a User-Agent header field in each request..." SHOULD means it's recommended but not mandatory. Furthermore, the content of the User-Agent string is not strictly defined or validated by the HTTP protocol itself (though there are conventions). An "invalid" User-Agent string is still a syntactically valid header.

- Empty User-Agent : Why Not Detected: Same as above. The header is present, its value is empty. This is RFC compliant.

- Invalid Header : This should be detected; we are checking why this is not the case. A header field name must be a token (RFC 7230, Section 3.2.6). A token cannot be empty. So, a header line that starts with a colon :\s*InvalidHeaderValue (implying an empty header name) is indeed non-RFC compliant.
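The token rule cited for the last case can be checked mechanically. The Python sketch below encodes the `tchar` set from RFC 7230 Section 3.2.6 and applies it to the field name of a header line. The function names are illustrative and not part of any Zscaler tooling.

```python
import re

# token = 1*tchar (RFC 7230 section 3.2.6); an empty name can never match.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def field_name(header_line: str) -> str:
    """Return everything before the first colon of a header line."""
    return header_line.partition(":")[0]


def has_valid_field_name(header_line: str) -> bool:
    """True when the header line starts with a syntactically valid token."""
    return TOKEN.fullmatch(field_name(header_line)) is not None
```

Applied to the vendor's test lines, `": InvalidHeaderValue"` fails because the name is empty, while `"User-Agent: "` passes: an empty value does not affect the validity of the name.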

5  | Phase 2 — building a deterministic regression pack

We switched to raw‑socket scripting so that every byte on the wire is ours, free from curl’s silent sanitation.

https://github.com/Ispanikas/small_scripts/blob/8d8c15945f3615fc630e00aadd399d9ec75dc215/Http-Regression-Pack.ps1
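For readers who want the idea without opening the script, the following Python sketch shows the general raw-socket approach. It is not the linked PowerShell harness. `build_request` and `send_raw` are hypothetical names, and the target host is whatever lab origin you choose.

```python
import socket


def build_request(method, target, version, headers, body=b"", eol=b"\r\n"):
    """Assemble request bytes exactly as given, with no normalisation."""
    lines = [f"{method} {target} {version}".encode("latin-1")]
    lines += [f"{name}: {value}".encode("latin-1") for name, value in headers]
    return eol.join(lines) + eol + eol + body


def send_raw(host, payload, port=80, timeout=5.0):
    """Send the payload over plain TCP and return the first response line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        try:
            data = sock.recv(4096)
        except OSError as exc:  # includes timeouts and resets
            return f"(error: {exc})"
    if not data:
        return "(connection closed)"
    return data.split(b"\r\n", 1)[0].decode("latin-1")
```

For example, `build_request("GET", "/", "HTTP/1.1", [("Host", "")])` produces the empty-Host request that curl refuses to send, and passing `eol=b"\n"` yields a lone-LF variant of any case.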

Key design choices

Concept	Rationale
Raw TCP, no TLS	Simplest path; ZIA still parses layer‑7 before forwarding.
30 cases / 4 “DAY” batches	Mirrors a QA sprint cadence while isolating functional areas.
CSV logging	Single artefact that joins timestamp, verdict line, and full payload.
Idempotent host variable	Change $TargetHost once to reroute traffic to any lab origin.
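As an illustration of the CSV logging choice, the Python sketch below writes one row per test case, joining timestamp, verdict line, and the full payload. The column order and the `log_result` name are assumptions, and the real log may use a different layout. Control characters are escaped so a payload containing CR or LF stays on a single CSV row.

```python
import csv
import io
from datetime import datetime, timezone


def log_result(stream, case, verdict, payload: bytes, when: datetime) -> None:
    """Append one row: timestamp, case name, verdict line, escaped payload."""
    # Escape CR/LF and other control bytes so the payload stays readable.
    printable = payload.decode("latin-1").encode("unicode_escape").decode("ascii")
    csv.writer(stream).writerow([when.isoformat(), case, verdict, printable])


# Usage with an in-memory buffer; a file opened with newline="" works the same.
buffer = io.StringIO()
log_result(
    buffer,
    "empty-host",
    "HTTP/1.1 400 Bad Request",
    b"GET / HTTP/1.1\r\nHost: \r\n\r\n",
    datetime(2025, 7, 10, 9, 12, tzinfo=timezone.utc),
)
```

Keeping the exact bytes next to the verdict means a surprising result can be replayed later without guessing what was sent.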

Batch themes

Test Batch (Day)	Focus area	# cases
1 – Baseline & vendor repro	Known malformed examples	8
2 – Host‑header edge cases	Missing/dup/oversize/fold Host	7
3 – TE / CL smuggling probes	Chunked v Content‑Length	7
4 – Exotic tokens & line format	Oversize start‑line, UTF‑8 method, lone LF, etc.	8

6  | Phase 3 — execution & data gathering

Environment — Windows 11 22H2 VM behind Zscaler Client Connector > Z‑Tunnel 2.0 (dataplane). Every test batch was executed twice: first through a plain‑HTTP path (TLS inspection Off), then through the organisation’s full SSL inspection stack (TLS inspection On). Each run takes < 20 s and appends to HttpMalformedTest‑log.csv.

Batch (Day)	Run timestamp (UTC+3)	Client Connector ver.	Z‑Tunnel mode	TLS inspection	Runtime (s)	CSV rows	Notes
1‑A (Jul 10)	2025‑07‑10 09:12	4.5.0.232	2.0	Off	17	8	Baseline without SSL inspection
1‑B (Jul 10)	2025‑07‑10 09:19	4.5.0.232	2.0	On	18	8	Same batch through MITM TLS
2‑A (Jul 11)	2025‑07‑11 09:05	4.5.0.232	2.0	Off	18	7	Host‑header focus
2‑B (Jul 11)	2025‑07‑11 09:13	4.5.0.232	2.0	On	19	7	Host‑header with SSL inspection
3‑A (Jul 14)	2025‑07‑14 08:58	4.5.0.232	2.0	Off	19	7	TE/CL smuggling
3‑B (Jul 14)	2025‑07‑14 09:07	4.5.0.232	2.0	On	20	7	TE/CL with SSL inspection
4‑A (Jul 15)	2025‑07‑15 09:02	4.5.0.232	2.0	Off	19	8	Exotic syntax
4‑B (Jul 15)	2025‑07‑15 09:11	4.5.0.232	2.0	On	20	8	Exotic syntax under SSL inspection

Column explanations

Run timestamp — start of script execution in Europe/Sofia (UTC+3).
Client Connector ver. — upgraded to the 4.5 release stream that introduces a new DPI library; pinned for consistency.
Z‑Tunnel mode — 2.0 enables per‑packet proxy; 1.0 would alter flow handling.
TLS inspection — each batch evaluated with ZIA SSL inspection Off and On to reveal parsing differences in clear‑text vs decrypted flows.
Runtime — wall‑clock seconds measured by PowerShell Measure‑Command.
CSV rows — one per test‑case; sanity‑check against batch cardinality.
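The row-count sanity check can be automated. This Python sketch compares the number of data rows in one run's CSV text against the expected batch size taken from the batch themes table. Whether the real log carries a header row is an assumption, so it is exposed as a parameter. `row_count_ok` is a hypothetical helper.

```python
import csv
import io

# Cases per batch, as listed in the batch themes table (30 in total).
EXPECTED_CASES = {1: 8, 2: 7, 3: 7, 4: 8}


def row_count_ok(batch: int, csv_text: str, header: bool = True) -> bool:
    """True when the CSV holds exactly one data row per case of the batch."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if header and rows:
        rows = rows[1:]  # drop the assumed header row
    return len(rows) == EXPECTED_CASES[batch]
```

A mismatch usually means a case crashed before logging or a run was appended twice, which is worth catching before reading any verdicts.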

7  | Phase 4 — results & analysis

Date (2025)	Batch	Highlights	Deviations / bugs
Jul 10	Baseline	HTTP/0.9 dropped; invalid method rejected.	Host: with empty value only hits origin (400), not ZIA edge. Empty header name not blocked.
Jul 11	Host	Empty/Missing/Duplicate Host blocked.	IPv6 literal blocked (403) although RFC‑compliant.
Jul 14	TE/CL	Classic TE/CL smuggling blocked.	CL underflow accepted → high‑risk desync vector.
Jul 15	Exotic	Oversize lines & non‑ASCII methods blocked.	Lone LF delimiter & tab‑prefixed method accepted. CR in header value drops connection without reason code.
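To see why the CL underflow finding is rated high-risk, consider a backend that trusts Content-Length. The Python sketch below assumes "CL underflow" means a declared length smaller than the bytes actually sent. It splits a byte stream the way such a backend would and returns the leftover bytes, which are then parsed as a second request. The function name is illustrative.

```python
def split_by_content_length(stream: bytes):
    """Split a stream into (first request, leftover) honouring Content-Length."""
    head, separator, rest = stream.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    # Only `length` body bytes belong to the first request.
    return head + separator + rest[:length], rest[length:]


smuggled = b"GET /admin HTTP/1.1\r\nHost: perdu.com\r\n\r\n"
stream = (
    b"POST / HTTP/1.1\r\nHost: perdu.com\r\nContent-Length: 4\r\n\r\nabcd"
    + smuggled
)
first, leftover = split_by_content_length(stream)
```

Here `leftover` equals the smuggled `GET /admin` request: if the gateway forwards the whole stream as one message, the origin sees two.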

8  | Security implications

The good — ZIA detects most obvious smuggling tricks out‑of‑the‑box.
The bad — leniency around stray LFs, header‑name grammar, and CL‑underflow suggests the HTTP parser still follows “be liberal in what you accept”. Attackers can chain those quirks to:

inject ghost requests after the gateway
bypass content filters by wrapping payload in tolerant syntax
open covert tunnels that masquerade as broken browsers

Real results:
https://github.com/Ispanikas/small_scripts/blob/130f96cc85f712e62cfc13eab0b3a2b1af9648e1/HttpMalformedTest-(httpbin)log.csv
https://github.com/Ispanikas/small_scripts/blob/130f96cc85f712e62cfc13eab0b3a2b1af9648e1/HttpMalformedTest-(perdu)log.csv

Author: Tsvetomir Bogdanov • Edited: July 29 2025
