author | Luke Shumaker <lukeshu@sbcglobal.net> | 2016-09-30 18:54:11 -0400 |
---|---|---|
committer | Luke Shumaker <lukeshu@sbcglobal.net> | 2016-09-30 18:54:13 -0400 |
commit | d02bb64f70cb45bfbc4260f6ebc82d0dabb5f71c (patch) | |
tree | a60b333b3b328135b0f2cea15ae1c34258df1a22 /public/http-notes.html | |
parent | 6035c2082b29564eb1e6d956f9653d188362832a (diff) | |
parent | 36e8932c2d3d42f7651ab6aae7af62175ba172e1 (diff) |
make: add http
Diffstat (limited to 'public/http-notes.html')
-rw-r--r-- | public/http-notes.html | 70 |
1 files changed, 70 insertions, 0 deletions
diff --git a/public/http-notes.html b/public/http-notes.html new file mode 100644 index 0000000..6b6c1b2 --- /dev/null +++ b/public/http-notes.html @@ -0,0 +1,70 @@ +<!DOCTYPE html> +<html lang="en"> +<head> + <meta charset="utf-8"> + <title>Notes on subtleties of HTTP implementation — Luke Shumaker</title> + <link rel="stylesheet" href="assets/style.css"> + <link rel="alternate" type="application/atom+xml" href="./index.atom" name="web log entries"/> +</head> +<body> +<header><a href="/">Luke Shumaker</a> » <a href=/blog>blog</a> » http-notes</header> +<article> +<h1 id="notes-on-subtleties-of-http-implementation">Notes on subtleties of HTTP implementation</h1> +<h1 id="why-the-absolute-form-used-for-proxy-requests">Why the absolute-form is used for proxy requests</h1> +<p><a href="https://tools.ietf.org/html/rfc7230#section-5.3.2">RFC7230§5.3.2</a> says that a (non-CONNECT) request to an HTTP proxy should look like</p> +<pre><code>GET http://authority/path HTTP/1.1</code></pre> +<p>rather than the usual</p> +<pre><code>GET /path HTTP/1.1 +Host: authority</code></pre> +<p>And doesn't give a hint as to why the message syntax is different here.</p> +<p><a href="https://parsiya.net/blog/2016-07-28-thick-client-proxying---part-6-how-https-proxies-work/#3-1-1-why-not-use-the-host-header">A blog post by Parsia Hakimian</a> claims that the reason is that it's a legacy behavior inherited from HTTP/1.0, which had proxies, but not the Host header field. Which is mostly true. But we can also realize that the usual syntax does not allow specifying a URI scheme, which means that we cannot specify a transport. 
Sure, the only two HTTP transports we might expect to use today are TCP (scheme: http) and TLS (scheme: https), and TLS requires we use a CONNECT request to the proxy, meaning that the only option left is a TCP transport; but that is no reason to avoid building generality into the protocol.</p> +<h1 id="on-taking-short-cuts-based-on-early-header-field-values">On taking short-cuts based on early header field values</h1> +<p><a href="https://tools.ietf.org/html/rfc7230#section-3.2.2">RFC7230§3.2.2</a> says:</p> +<blockquote> +<pre><code>The order in which header fields with differing field names are +received is not significant. However, it is good practice to send +header fields that contain control data first, such as Host on +requests and Date on responses, so that implementations can decide +when not to handle a message as early as possible.</code></pre> +</blockquote> +<p>I took that as a notice that I can use the first Host or similar header to quickly route along to my sub-component before I've parsed the entire header field set.</p> +<p>However, it later states in <a href="https://tools.ietf.org/html/rfc7230#section-5.4">§5.4</a>:</p> +<blockquote> +<pre><code>A server MUST respond with a 400 (Bad Request) status code to any +HTTP/1.1 request message that lacks a Host header field and to any +request message that contains more than one Host header field or a +Host header field with an invalid field-value.</code></pre> +</blockquote> +<p>Which means that I must parse the entire header field set.</p> +<p>However, if I look a bit closer at §3.2.2, I see that this short-cut is only valid for deciding to <em>not handle</em> a message; if I am handling it, I cannot use this short-cut.</p> +<p>Except that if I decide not to handle a request based on the Host header field, the correct thing to do is to send a 404 status code. Which implies that I have parsed the remainder of the header field set to validate the message syntax. 
Oh no, what do I do?</p> +<p>Well, there are a number of "A server MUST respond with a XXX code if" rules that can all be triggered on the same request. So we get to choose which to use.</p> +<p>And fortunately for optimizing implementations, <a href="https://tools.ietf.org/html/rfc7230#section-3.2.5">§3.2.5</a> gave us:</p> +<blockquote> +<pre><code>A server that receives a ... set of fields, +larger than it wishes to process MUST respond with an appropriate 4xx +(Client Error) status code.</code></pre> +</blockquote> +<p>And since the header field set is longer than we want to process (since we want to short-cut processing), we are free to respond with whichever 4XX status code we like!</p> +<h1 id="on-normalizing-target-uris">On normalizing target URIs</h1> +<p>An implementer is tempted to normalize URIs all over the place, just for safety and sanitation. After all, <a href="https://tools.ietf.org/html/rfc3986#section-6.1">RFC3986§6.1</a> says it's safe!</p> +<p>Unfortunately, most URI normalizer implementations will normalize an empty path to "/". Which is not always safe; <a href="https://tools.ietf.org/html/rfc7230#section-2.7.3">RFC7230§2.7.3</a>, which defines this "equivalence", actually says:</p> +<blockquote> +<pre><code> When not being used in +absolute form as the request target of an OPTIONS request, an empty +path component is equivalent to an absolute path of "/", so the +normal form is to provide a path of "/" instead.</code></pre> +</blockquote> +<p>Which means we can't use the usual normalizer implementation if we are making an OPTIONS request!</p> +<p>Why is that? Well, if we turn to <a href="https://tools.ietf.org/html/rfc7230#section-5.3.4">§5.3.4</a>, we find the answer. 
One of the special cases for when the request target is not a URI, is that we may use "*" as the target for an OPTIONS request to request information about the origin server itself, rather than a resource on that server.</p> +<p>However, as discussed above, the target in a request to a proxy must be an absolute URI (and <a href="https://tools.ietf.org/html/rfc7230#section-5.3.2">§5.3.2</a> says that the origin server must also understand this syntax). So, we must define a way to map "*" to an absolute URI.</p> +<p>Naively, one might be tempted to use "/*" as the path. But that would make it impossible to have a resource actually named "/*". So, we must define a special case in the URI syntax that doesn't obstruct a real path.</p> +<p>If we didn't have this special case in the URI normalizer, and we handled the "/" path as the same as empty in the OPTIONS handler of the last proxy server, then it would be impossible to request OPTIONS for the "/" resource, as it would get translated into "*" and treated as OPTIONS for the entire server.</p> + +</article> +<footer> +<p>The content of this page is Copyright © 2016 <a href="mailto:lukeshu@sbcglobal.net">Luke Shumaker</a>.</p> +<p>This page is licensed under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA-3.0</a> license.</p> +</footer> +</body> +</html>