Validating Zscaler’s “Block Non‑RFC‑Compliant HTTP Traffic on HTTP/HTTPS Ports”

“Block Non‑RFC‑Compliant HTTP Traffic on HTTP/HTTPS Ports: Enable to allow the service to block traffic that isn’t compliant with Request for Comments (RFC)…” (help.zscaler.com)

“Block Non‑HTTP Traffic on HTTP/HTTPS Ports: Enable to restrict traffic on HTTP/HTTPS ports to HTTP traffic only.” (help.zscaler.com)


1  | Executive summary

Over two work‑weeks we set out to prove whether Zscaler Internet Access (ZIA) really enforces RFC‑conformant HTTP syntax when the “Block Non‑RFC‑Compliant HTTP Traffic” feature is enabled. The journey produced a 30‑case regression pack, four daily (batch) test rounds, several surprises, and a handful of actionable bugs escalated to the vendor.


2  | Why this matters

Modern web gateways promise deep‑layer protocol validation, but in practice attackers still smuggle requests by abusing edge‑case grammar. Enforcing only proper HTTP on ports 80/443 dramatically reduces the surface for:

  • classic request‑smuggling (CL / TE desynchronisation)
  • tunnel pivoting and protocol confusion attacks (SSH‑over‑443 etc.)
  • header‑injection tricks that bypass downstream WAFs

If ZIA really blocks malformed handshakes early, every downstream control benefits.


3  | Phase 0 — quick‑and‑dirty curl probes

Before writing the raw‑socket harness we validated ZIA’s setting with the native Windows curl.exe (v8.7.0) against https://httpbin.org. Twelve malformed requests were selected to cover the most common RFC violations an attacker can create with a single‑line command.

| # | Malformation | Goal / RFC violation | Expected (feature ON) | Observed (baseline) |
|---|---|---|---|---|
| 1 | GET with body (-d) | Disallowed payload on idempotent method (RFC 7231 §4.3.1) | Block | 200 OK |
| 2 | POST with invalid Content‑Type | Invalid media‑type token (RFC 7231 §3.1.1.1) | Block | 200 OK |
| 3 | FAKEMETHOD verb | Undefined method token (RFC 7230 §3.1.1) | Block (405/501) | 200 OK |
| 4 | Custom header containing raw CRLF | Break header grammar, probe injection | Block | 200 OK‡ |
| 5 | Malformed JSON body {invalid: json,} | Ensure backend sees parse error | N/A | 200 OK |
| 6 | Empty Host: header | Mandatory authority missing (RFC 7230 §5.4) | Block | not executed† |
| 7 | HTTP/0.9 request | Obsolete version on modern port | Block / drop | 200 OK |
| 8 | Transfer‑Encoding: chunked + Content‑Length | TE.CL smuggling precursor | Block | 200 OK |
| 9 | Invalid Accept: */*; q=2.0 | q‑value must be ≤ 1.0 (RFC 9110 §12.5.1) | Block | 200 OK |
| 10 | PUT with Content‑Length: 10 but no body | Length mismatch | Block | 200 OK |
| 11 | Duplicate Content‑Type headers | Conflicting value occurrences | Block | 200 OK |
| 12 | GET with Transfer‑Encoding: chunked | Chunked framing on no‑body method | Block | 200 OK |

† A direct raw socket is mandatory; curl automatically re‑inserts a sane Host header when not using an explicit proxy.
‡ curl.exe strips raw control characters before sending; a separate raw‑socket harness is required for a definitive test.

Outcome: none of the 11 executed requests was blocked (case 6 could not be run with curl), confirming the need for deeper testing.
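For reproducibility, a few of the probes in the table can be written down as curl argument lists. The sketch below uses Python only to keep the arguments unambiguous; the PROBES mapping, the build_argv helper and the /anything path are illustrative assumptions, not the exact commands used in Phase 0. It only builds the command lines and does not run them.

```python
CURL = "curl.exe"                         # native Windows curl, as used in Phase 0
TARGET = "https://httpbin.org/anything"   # assumed path; the write-up only names the host

# Illustrative subset of the table (case number -> extra curl arguments).
PROBES = {
    1: ["-X", "GET", "-d", "x=1"],        # case 1: GET carrying a body
    3: ["-X", "FAKEMETHOD"],              # case 3: undefined method token
    9: ["-H", "Accept: */*; q=2.0"],      # case 9: q-value above 1.0
}


def build_argv(case: int, target: str = TARGET) -> list[str]:
    """Return the full argv for one probe.

    -s silences the progress meter, -i prints the response status line and headers.
    """
    return [CURL, "-s", "-i", *PROBES[case], target]


if __name__ == "__main__":
    for case in sorted(PROBES):
        print(case, " ".join(build_argv(case)))
```

Passing each list to a process launcher instead of a shell string avoids quoting surprises on Windows, which matters when the header value itself contains semicolons or asterisks.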

4  | Phase 1 — vendor analysis

After we escalated to the vendor, the Zscaler Resident Engineer (Nicolas) repeated the tests in his lab, and his results replicate ours:

Following your request I have retested in my lab all the use cases and found the following:

Normal request : echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\n\r\n" | nc perdu.com 80

- HTTP version invalid - Detected - echo -e "GRT / HTTP/0.9\r\nHost: perdu.com\r\n\r\n" | nc perdu.com 80
- Method invalid - Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\n\r\n" | nc perdu.com 80

- Host empty - Not detected - echo -e "GET / HTTP/1.1\r\nHost: \r\n\r\n" | nc perdu.com 80
- Body in the Header - Not Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\nContent-Length: 17\r\n\r\nThisShouldNotBeHere" | nc perdu.com 80
- Invalid User-Agent - Not Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\nUser-Agent: InvalidUserAgent/1.0\r\n\r\n" | nc perdu.com 80
- Empty User-Agent - Not Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\nUser-Agent: \r\n\r\n" | nc perdu.com 80
- Invalid Header - Not Detected - echo -e "GET / HTTP/1.1\r\nHost: perdu.com\r\n: InvalidHeaderValue\r\n\r\n" | nc perdu.com 80

I have opened a ticket (05642244) and this is the answer from the support. Only the last case needs to be investigated further as this is not expected.

"Block Non-RFC Compliant HTTP Traffic" feature typically targets HTTP requests that are so malformed that they:

  • Don't adhere to the basic syntactic structure defined in HTTP RFCs (e.g., RFC 7230-7235 for HTTP/1.1).
  • Could be used in certain attack vectors (e.g., HTTP Desync, request smuggling) if processed by a less robust downstream server.
  • Represent clear violations of mandatory (MUST/REQUIRED) directives in the RFCs, especially regarding request line and essential headers.

It generally does not block:

  • Violations of recommended (SHOULD/RECOMMENDED) directives
  • Semantically incorrect but syntactically valid requests
  • Uncommon but technically permissible constructs

Regarding the test cases:

- Host empty: Why Not Detected: RFC 7230, Section 5.4 states: "A client MUST send a Host header field in all HTTP/1.1 request messages." While sending an empty value for the Host header is bad practice and will likely cause issues with the origin server (which would typically return a 400 Bad Request), the header line itself is present. The Zscaler feature might be checking for the presence of the Host header line, not necessarily the validity or non-emptiness of its value. Some interpretations focus on the complete absence of the Host: line as the primary violation.

- Body in the Header: Why Not Detected: RFC 7231, Section 4.3.1 (GET): "A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request." This means it's not forbidden by the RFC, just that its meaning is undefined and servers might reject it. It's not a syntactic violation for a GET request to have a body, just very unusual and typically ignored or problematic for the server. The feature likely doesn't see this as "non-RFC compliant" from a strict protocol structure perspective.

- Invalid User-Agent: Why Not Detected: RFC 7231, Section 5.5.3 states: "A user agent SHOULD send a User-Agent header field in each request..." SHOULD means it's recommended but not mandatory. Furthermore, the content of the User-Agent string is not strictly defined or validated by the HTTP protocol itself (though there are conventions). An "invalid" User-Agent string is still a syntactically valid header.

- Empty User-Agent: Why Not Detected: Same as above. The header is present, its value is empty. This is RFC compliant.

- Invalid Header: This should be detected; we are checking why this is not the case. A header field name must be a token (RFC 7230, Section 3.2.6). A token cannot be empty. So, a header line that starts with a colon :\s*InvalidHeaderValue (implying an empty header name) is indeed non-RFC compliant.
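The support answer boils down to "flag only MUST-level syntax violations". A minimal Python sketch of that distinction, applied to the vendor's own test requests, is shown here. It is not Zscaler's implementation: the must_level_violations function, its three checks and the set of accepted versions are assumptions chosen to mirror the cases discussed above.

```python
import re

# token = 1*tchar (RFC 7230 section 3.2.6)
TOKEN = re.compile(rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
SUPPORTED_VERSIONS = {b"HTTP/1.0", b"HTTP/1.1"}  # assumption: anything else is treated as invalid


def must_level_violations(raw: bytes) -> list[str]:
    """Return the MUST-level syntax problems found in one raw HTTP/1.x request head."""
    head = raw.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    problems = []

    # Request line: method SP target SP version
    parts = lines[0].split(b" ")
    if len(parts) != 3 or not TOKEN.fullmatch(parts[0]):
        problems.append("malformed request line")
    elif parts[2] not in SUPPORTED_VERSIONS:
        problems.append("unsupported HTTP version")

    # Header lines: the field name must be a non-empty token followed by a colon
    names = []
    for line in lines[1:]:
        name, sep, _value = line.partition(b":")
        if not sep or not TOKEN.fullmatch(name):
            problems.append("invalid header field name")
        else:
            names.append(name.lower())

    # Presence check only: an empty Host value still counts as "present"
    if b"host" not in names:
        problems.append("missing Host header")
    return problems
```

Under these assumptions the empty Host value, the GET with a body and the empty User-Agent all pass, while the line starting with a colon is flagged, which is the split the support engineer describes.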

5  | Phase 2 — building a deterministic regression pack

We switched to raw‑socket scripting so that every byte on the wire is ours, free from curl’s silent sanitisation.

https://github.com/Ispanikas/small_scripts/blob/8d8c15945f3615fc630e00aadd399d9ec75dc215/Http-Regression-Pack.ps1

Key design choices

| Concept | Rationale |
|---|---|
| Raw TCP, no TLS | Simplest path; ZIA still parses layer‑7 before forwarding. |
| 30 cases / 4 “DAY” batches | Mirrors a QA sprint cadence while isolating functional areas. |
| CSV logging | Single artefact that joins timestamp, verdict line, and full payload. |
| Idempotent host variable | Change $TargetHost once to reroute traffic to any lab origin. |
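The CSV logging choice can be sketched as follows, in Python rather than the pack's PowerShell. The column names in FIELDS and the log_result helper are assumptions; the real log layout may differ.

```python
import csv
from datetime import datetime, timezone

FIELDS = ["timestamp", "case", "verdict", "payload"]  # assumed column names


def log_result(fh, case: str, verdict: str, payload: bytes) -> None:
    """Append one row; the payload is stored with escapes so CR and LF stay visible."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    if fh.tell() == 0:                 # empty file: write the header first
        writer.writeheader()
    writer.writerow({
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "case": case,
        "verdict": verdict,            # first response line, or '' for a silent drop
        "payload": payload.decode("latin-1").encode("unicode_escape").decode("ascii"),
    })
```

When writing to disk, open the file with open(path, "a", newline="") so the csv module controls line endings. Escaping the payload keeps each test case on a single row even when the request contains bare CR or LF bytes.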

Batch themes

| Test Batch (Day) | Focus area | # cases |
|---|---|---|
| 1 – Baseline & vendor repro | Known malformed examples | 8 |
| 2 – Host‑header edge cases | Missing/dup/oversize/fold Host | 7 |
| 3 – TE / CL smuggling probes | Chunked v Content‑Length | 7 |
| 4 – Exotic tokens & line format | Oversize start‑line, UTF‑8 method, lone LF, etc. | 8 |
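To make the batch themes concrete, here is one illustrative payload per day, built byte by byte. This is a Python sketch with hypothetical case names; the actual 30 cases live in the linked PowerShell pack and may be constructed differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    day: int
    name: str
    payload: bytes


HOST = "perdu.com"  # stand-in for the pack's $TargetHost


def request(head_lines: list[str], body: bytes = b"", eol: str = "\r\n") -> bytes:
    """Join head lines with the chosen line ending, add the blank line, then the body."""
    return (eol.join(head_lines) + eol + eol).encode("latin-1") + body


# One illustrative case per batch theme (names are hypothetical).
SAMPLES = [
    Case(1, "empty-header-name",
         request(["GET / HTTP/1.1", f"Host: {HOST}", ": InvalidHeaderValue"])),
    Case(2, "duplicate-host",
         request(["GET / HTTP/1.1", f"Host: {HOST}", "Host: example.invalid"])),
    Case(3, "te-and-cl",
         request(["POST / HTTP/1.1", f"Host: {HOST}", "Content-Length: 5",
                  "Transfer-Encoding: chunked"], b"0\r\n\r\n")),
    Case(4, "lone-lf",
         request(["GET / HTTP/1.1", f"Host: {HOST}"], eol="\n")),
]

# Batch cardinality from the table above: 8 + 7 + 7 + 8 = 30 cases.
EXPECTED_PER_DAY = {1: 8, 2: 7, 3: 7, 4: 8}
```

Keeping the line ending as a parameter lets the Day 4 cases reuse the same builder while varying only the delimiter.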


6  | Phase 3 — execution & data gathering

Environment: Windows 11 22H2 VM behind Zscaler Client Connector > Z‑Tunnel 2.0 (dataplane). Every test batch was executed twice: first through a plain‑HTTP path (TLS inspection Off), then through the organisation’s full SSL inspection stack (TLS inspection On). Each run takes at most 20 s and appends to HttpMalformedTest‑log.csv.

| Batch (Day) | Run timestamp (UTC+3) | Client Connector ver. | Z‑Tunnel mode | TLS inspection | Runtime (s) | CSV rows | Notes |
|---|---|---|---|---|---|---|---|
| 1‑A (Jul 10) | 2025‑07‑10 09:12 | 4.5.0.232 | 2.0 | Off | 17 | 8 | Baseline without SSL inspection |
| 1‑B (Jul 10) | 2025‑07‑10 09:19 | 4.5.0.232 | 2.0 | On | 18 | 8 | Same batch through MITM TLS |
| 2‑A (Jul 11) | 2025‑07‑11 09:05 | 4.5.0.232 | 2.0 | Off | 18 | 7 | Host‑header focus |
| 2‑B (Jul 11) | 2025‑07‑11 09:13 | 4.5.0.232 | 2.0 | On | 19 | 7 | Host‑header with SSL inspection |
| 3‑A (Jul 14) | 2025‑07‑14 08:58 | 4.5.0.232 | 2.0 | Off | 19 | 7 | TE/CL smuggling |
| 3‑B (Jul 14) | 2025‑07‑14 09:07 | 4.5.0.232 | 2.0 | On | 20 | 7 | TE/CL with SSL inspection |
| 4‑A (Jul 15) | 2025‑07‑15 09:02 | 4.5.0.232 | 2.0 | Off | 19 | 8 | Exotic syntax |
| 4‑B (Jul 15) | 2025‑07‑15 09:11 | 4.5.0.232 | 2.0 | On | 20 | 8 | Exotic syntax under SSL inspection |

Column explanations

  • Run timestamp — start of script execution in Europe/Sofia (UTC+3).
  • Client Connector ver. — upgraded to the 4.5 release stream that introduces a new DPI library; pinned for consistency.
  • Z‑Tunnel mode — 2.0 enables per‑packet proxy; 1.0 would alter flow handling.
  • TLS inspection — each batch evaluated with ZIA SSL inspection Off and On to reveal parsing differences in clear‑text vs decrypted flows.
  • Runtime — wall‑clock seconds measured by PowerShell Measure‑Command.
  • CSV rows — one per test‑case; sanity‑check against batch cardinality.
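The row-count sanity check and the wall-clock timing described above could look like this in Python. The batch column name, the EXPECTED_ROWS keys and the helper names are assumptions about the log layout, and timed is only an analogue of Measure-Command, not a port of it.

```python
import csv
import io
import time
from collections import Counter

EXPECTED_ROWS = {"1": 8, "2": 7, "3": 7, "4": 8}  # batch -> number of cases


def rows_per_batch(csv_text: str, batch_column: str = "batch") -> Counter:
    """Count CSV rows per batch; the column name is an assumption about the log layout."""
    return Counter(row[batch_column] for row in csv.DictReader(io.StringIO(csv_text)))


def mismatches(counts: Counter) -> dict:
    """Return {batch: (expected, found)} for every batch whose row count is off."""
    return {b: (n, counts.get(b, 0))
            for b, n in EXPECTED_ROWS.items() if counts.get(b, 0) != n}


def timed(fn, *args):
    """Wall-clock a call and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

An empty dictionary from mismatches means every batch logged exactly one row per test case.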

7  | Phase 4 — results & analysis

| Date (2025) | Batch | Highlights | Deviations / bugs |
|---|---|---|---|
| Jul 10 | Baseline | HTTP/0.9 dropped; invalid method rejected. | Host: with empty value only hits origin (400), not ZIA edge. Empty header name not blocked. |
| Jul 11 | Host | Empty/Missing/Duplicate Host blocked. | IPv6 literal blocked (403) although RFC‑compliant. |
| Jul 14 | TE/CL | Classic TE/CL smuggling blocked. | CL underflow accepted → high‑risk desync vector. |
| Jul 15 | Exotic | Oversize lines & non‑ASCII methods blocked. | Lone LF delimiter & tab‑prefixed method accepted. CR in header value drops connection without reason code. |


8  | Security implications

The good — ZIA detects most obvious smuggling tricks out‑of‑the‑box.
The bad — leniency around stray LFs, header‑name grammar, and CL‑underflow suggests the HTTP parser still follows “be liberal in what you accept”. Attackers can chain those quirks to:

  • inject ghost requests after the gateway
  • bypass content filters by wrapping payload in tolerant syntax
  • open covert tunnels that masquerade as broken browsers
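To see why an accepted CL underflow matters, consider a back end that frames messages strictly by Content-Length. The Python sketch below assumes "CL underflow" means a declared Content-Length smaller than the bytes actually sent; split_by_content_length is a deliberately naive, hypothetical parser and not a model of any specific server.

```python
def split_by_content_length(stream: bytes) -> list[bytes]:
    """Frame a byte stream the way a Content-Length-trusting back end would."""
    messages = []
    while stream:
        head, sep, rest = stream.partition(b"\r\n\r\n")
        if not sep:
            messages.append(stream)    # incomplete head: keep as-is
            break
        length = 0
        for line in head.split(b"\r\n")[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        messages.append(head + sep + rest[:length])
        stream = rest[length:]         # leftover bytes become the next "request"
    return messages


# Declared length is 3, but more bytes follow on the same connection.
GHOST = b"GET /admin HTTP/1.1\r\nHost: perdu.com\r\n\r\n"
OUTER = (b"POST / HTTP/1.1\r\nHost: perdu.com\r\nContent-Length: 3\r\n\r\n"
         b"x=1" + GHOST)
```

Here split_by_content_length(OUTER) yields two messages: the POST with its three-byte body, and the trailing bytes parsed as a separate GET. If the gateway treated everything as one request while the origin sees two, the second one is the ghost request described above.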



    Real results:
    https://github.com/Ispanikas/small_scripts/blob/130f96cc85f712e62cfc13eab0b3a2b1af9648e1/HttpMalformedTest-(httpbin)log.csv
    https://github.com/Ispanikas/small_scripts/blob/130f96cc85f712e62cfc13eab0b3a2b1af9648e1/HttpMalformedTest-(perdu)log.csv

Author: Tsvetomir Bogdanov   •   Edited: July 29 2025

 
