Andrii's Blog

The most efficient way to reuse an HTTP request body in Go

Sometimes, there is a need to read an HTTP body before sending it. A common pattern to solve this problem is using io.ReadAll to read the body. However, ReadAll consumes the body, leaving it empty when the request is sent. The often suggested solution is to replace the request body with the read bytes wrapped by NopCloser:

body, err := io.ReadAll(req.Body)
// ...
req.Body = io.NopCloser(bytes.NewBuffer(body))
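To see the pattern end to end, here is a minimal, self-contained sketch. The helper names peekBody and readTwice are made up for this illustration and are not part of the middleware discussed below; the sketch only shows that the body remains readable after it has been restored.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// peekBody (hypothetical helper) reads the whole request body and then
// restores it, so the request can still be sent afterwards.
func peekBody(req *http.Request) ([]byte, error) {
	if req.Body == nil {
		return nil, nil
	}
	body, err := io.ReadAll(req.Body)
	req.Body.Close()
	if err != nil {
		return nil, err
	}
	// Without this assignment the next reader would find an empty body.
	req.Body = io.NopCloser(bytes.NewBuffer(body))
	return body, nil
}

// readTwice returns what peekBody saw and what is left for the next reader.
func readTwice(payload string) (string, string, error) {
	req, err := http.NewRequest("POST", "http://example.com", strings.NewReader(payload))
	if err != nil {
		return "", "", err
	}
	peeked, err := peekBody(req)
	if err != nil {
		return "", "", err
	}
	remaining, err := io.ReadAll(req.Body)
	if err != nil {
		return "", "", err
	}
	return string(peeked), string(remaining), nil
}

func main() {
	peeked, remaining, err := readTwice("payload")
	if err != nil {
		panic(err)
	}
	fmt.Printf("peeked=%q remaining=%q\n", peeked, remaining)
}
```

Both values contain the full payload, which is the whole point of the NopCloser trick. The price is that the entire body is buffered in memory a second time.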

Recently, I was reviewing an HTTP client middleware that did exactly this: reading the request body and calculating an HMAC hash based on it. Here’s a little story about how I managed to improve the performance of the following function.

func (h HMACAuth) Authorize(w hash.Hash, req *http.Request, ts string) error {
	var body string

	if req.Body != nil {
		defer req.Body.Close()

		bodyBytes, err := io.ReadAll(req.Body)
		if err != nil {
			return fmt.Errorf("failed to read request body: %w", err)
		}
		body = string(bodyBytes)

		req.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))
	}

	_, err := w.Write([]byte(fmt.Sprintf("%s\n%s\n%s", req.URL.String(), body, ts)))
	if err != nil {
		return fmt.Errorf("failed to generate hmac auth token: %w", err)
	}

	// hmacHash := base64.StdEncoding.EncodeToString(w.Sum(nil))

	// req.Header.Set("X-Timestamp", ts)
	// req.Header.Set("Authorization", "HMAC "+hmacHash)

	return nil
}

I've commented out parts that are not relevant for the optimization.

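I started optimizing this function when I noticed that the body was being converted to a string. That conversion can be avoided, since w accepts []byte directly. In the original code the body is copied several times: once by string(bodyBytes), once by fmt.Sprintf, and once more when the result is converted back to []byte.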

Eliminating unnecessary string conversion

After removing the unnecessary conversion, the function became 25% faster and 43% more memory-efficient. Here is the improved version of the function:

func (h HMACAuth) Authorize(w hash.Hash, req *http.Request, ts string) error {
	_, err := w.Write([]byte(req.URL.String() + "\n"))
	if err != nil {
		return fmt.Errorf("failed to write url into hmac hash: %w", err)
	}

	if req.Body != nil {
		body, err := io.ReadAll(req.Body)
		if err != nil {
			return fmt.Errorf("failed to read request body: %w", err)
		}

		_, err = w.Write(body)
		if err != nil {
			return fmt.Errorf("failed to write body into hmac hash: %w", err)
		}

		req.Body.Close()
		req.Body = io.NopCloser(bytes.NewBuffer(body))
	}

	_, err = w.Write([]byte("\n" + ts))
	if err != nil {
		return fmt.Errorf("failed to write timestamp into hmac hash: %w", err)
	}

	// ... base64-encode w.Sum(nil) and set the headers as before ...

	return nil
}

In my opinion, eliminating the string conversion didn't make the code much less readable, and the performance boost was definitely worth it.
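Splitting one big write into three smaller ones is safe because a hash.Hash only sees a stream of bytes: the digest depends on the bytes written, not on how they are chunked. The sketch below checks this for the URL, body, and timestamp layout used above. The function names and sample values are invented for the illustration.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// signOnce mirrors the original approach: build one string, write it once.
func signOnce(key []byte, url, body, ts string) string {
	w := hmac.New(sha256.New, key)
	w.Write([]byte(fmt.Sprintf("%s\n%s\n%s", url, body, ts)))
	return hex.EncodeToString(w.Sum(nil))
}

// signInParts mirrors the improved approach: write each part separately
// and never convert the body to a string.
func signInParts(key []byte, url string, body []byte, ts string) string {
	w := hmac.New(sha256.New, key)
	w.Write([]byte(url + "\n"))
	w.Write(body)
	w.Write([]byte("\n" + ts))
	return hex.EncodeToString(w.Sum(nil))
}

// sameSignature reports whether both approaches produce the same digest.
func sameSignature() bool {
	key := []byte("secret")
	url, body, ts := "http://example.com", `{"a":1}`, "timestamp"
	return signOnce(key, url, body, ts) == signInParts(key, url, []byte(body), ts)
}

func main() {
	fmt.Println("signatures match:", sameSignature())
}
```

Since the signatures match, the refactoring changes only how many copies of the body are made, not what the server has to verify.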

Using TeeReader

Some answers on Stack Overflow suggested that io.TeeReader could improve the performance of code that reads the request body and then writes it into some writer. I tried it, but unfortunately, I didn't see any performance boost. The benchmark numbers were practically identical to the previous version of the code. That makes sense: ReadAll still has to allocate a buffer for the whole body, and TeeReader only moves the w.Write call into the read loop. However, the code has one fewer error check, so it might be a better option in some cases. Here is the updated portion of the code:

	if req.Body != nil {
		b := io.TeeReader(req.Body, w)
		body, err := io.ReadAll(b)
		if err != nil {
			return fmt.Errorf("failed to read request body: %w", err)
		}

		req.Body.Close()
		req.Body = io.NopCloser(bytes.NewBuffer(body))
	}
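For readers who have not used io.TeeReader before, here is a small standalone sketch of what it does: every chunk read through the tee is also written to the given writer. Plain SHA-256 is used instead of HMAC to keep the example short, and the function names are invented for this illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// teeDigest reads payload through a TeeReader that feeds a hasher.
// It returns the bytes that were read and the digest the hasher saw.
func teeDigest(payload string) (string, string, error) {
	h := sha256.New()
	tee := io.TeeReader(strings.NewReader(payload), h)
	body, err := io.ReadAll(tee) // still buffers the whole body in memory
	if err != nil {
		return "", "", err
	}
	return string(body), hex.EncodeToString(h.Sum(nil)), nil
}

// directDigest hashes the payload without any reader involved.
func directDigest(payload string) string {
	sum := sha256.Sum256([]byte(payload))
	return hex.EncodeToString(sum[:])
}

func main() {
	body, digest, err := teeDigest("hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(body, digest == directDigest("hello"))
}
```

The ReadAll call is the expensive part in both versions, which is consistent with the benchmark showing no real difference.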

Leveraging GetBody

Even though the performance boost was significant compared to the original code, I knew the code was not optimal yet, because we don't actually need to read the body into a temporary variable. Profiling confirmed that we spend a lot of CPU cycles copying body bytes into this temporary variable. But do we really need this copying just because there's no other way to write the request body into a Writer without consuming it? Actually, there is another way. The request exposes a GetBody function, which returns a new copy of the body, so you can read that copy and leave req.Body untouched. GetBody is nil on requests received by a server, and on the client side it is only populated when the body is known to be replayable; in that case it does exactly what we need.

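Why does it work? When we build a client request with http.NewRequest, the function inspects the concrete type of the body. If the body is a *bytes.Buffer, *bytes.Reader, or *strings.Reader, NewRequest "knows" that the bytes are already in memory and sets GetBody to a function that returns a fresh reader over the same bytes, so reading from it doesn't consume req.Body. For any other reader type, GetBody stays nil unless you set it yourself.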
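The behaviour is easy to check in isolation. The sketch below is only an illustration: opaqueReader is a made-up wrapper that hides the concrete reader type from http.NewRequest, and the function names are invented as well.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// opaqueReader (hypothetical) hides the concrete reader type, so
// http.NewRequest cannot tell that the body is replayable.
type opaqueReader struct{ io.Reader }

// getBodyAvailability reports whether GetBody is set for a
// *strings.Reader body and for an opaque io.Reader body.
func getBodyAvailability() (bool, bool, error) {
	known, err := http.NewRequest("POST", "http://example.com", strings.NewReader("data"))
	if err != nil {
		return false, false, err
	}
	opaque, err := http.NewRequest("POST", "http://example.com", opaqueReader{strings.NewReader("data")})
	if err != nil {
		return false, false, err
	}
	return known.GetBody != nil, opaque.GetBody != nil, nil
}

// readCopies reads two copies obtained from GetBody and then req.Body itself.
func readCopies(payload string) ([]string, error) {
	req, err := http.NewRequest("POST", "http://example.com", strings.NewReader(payload))
	if err != nil {
		return nil, err
	}
	var out []string
	for i := 0; i < 2; i++ {
		rc, err := req.GetBody()
		if err != nil {
			return nil, err
		}
		b, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		out = append(out, string(b))
	}
	// req.Body was never touched, so it still yields the full payload.
	b, err := io.ReadAll(req.Body)
	if err != nil {
		return nil, err
	}
	return append(out, string(b)), nil
}

func main() {
	known, opaque, err := getBodyAvailability()
	if err != nil {
		panic(err)
	}
	fmt.Println("GetBody set for *strings.Reader:", known)
	fmt.Println("GetBody set for opaque reader:", opaque)

	copies, err := readCopies("payload")
	if err != nil {
		panic(err)
	}
	fmt.Println(copies)
}
```

The second case matters in practice: code that calls GetBody unconditionally will panic on a nil function when the request was built from a streaming body such as a file or a pipe, so a nil check is worth having.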

Converting the original code to use GetBody made it 60% faster and 84% more memory-efficient. Here is the final function:

Final Code

func (h HMACAuth) Authorize(w hash.Hash, req *http.Request, ts string) error {
	_, err := w.Write([]byte(req.URL.String() + "\n"))
	if err != nil {
		return fmt.Errorf("failed to write url into hmac hash: %w", err)
	}

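	if req.Body != nil {
		// GetBody is only set when the body is replayable, for example
		// when http.NewRequest received a *bytes.Buffer, *bytes.Reader,
		// or *strings.Reader. Calling a nil GetBody would panic.
		if req.GetBody == nil {
			return fmt.Errorf("request body is not replayable: GetBody is nil")
		}

		body, err := req.GetBody()
		if err != nil {
			return fmt.Errorf("failed to get request body: %w", err)
		}

		// Stream the copy straight into the hash; req.Body stays untouched.
		_, err = io.Copy(w, body)
		body.Close()
		if err != nil {
			return fmt.Errorf("failed to write request body into hash: %w", err)
		}
	}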

	_, err = w.Write([]byte("\n" + ts))
	if err != nil {
		return fmt.Errorf("failed to write timestamp into hmac hash: %w", err)
	}

	hmacHash := base64.StdEncoding.EncodeToString(w.Sum(nil))

	req.Header.Set("X-Timestamp", ts)
	req.Header.Set("Authorization", "HMAC "+hmacHash)

	return nil
}

Benchmark Results

Benchmarking was performed on a 4 KB test payload.

goos: linux
goarch: amd64
cpu: 12th Gen Intel(R) Core(TM) i5-12500
BenchmarkHmacAuth/original-12               1497504    24047 ns/op    32520 B/op    29 allocs/op
BenchmarkHmacAuth/no_intermidiate_buf-12    2041945    18062 ns/op    18608 B/op    25 allocs/op
BenchmarkHmacAuth/teereader-12              1985746    17922 ns/op    18640 B/op    26 allocs/op
BenchmarkHmacAuth/GetBody-12                3595358     9788 ns/op     5280 B/op    19 allocs/op

Here is the benchmark function that I used to do the tests:

func BenchmarkHmacAuth(b *testing.B) {
	auth := HMACAuth{}

	// Apart from HMAC, I tested other hashers that are not included in
	// the blog post, hence this somewhat goofy test setup.
	hmacTester := func(impl func(hash.Hash, *http.Request, string) error) func(*http.Request) error {
		return func(r *http.Request) error {
			w := hmac.New(sha256.New, []byte("secret"))
			return impl(w, r, "timestamp")
		}
	}

	for testName, impl := range map[string]func(*http.Request) error{
		"original":            hmacTester(auth.AuthorizeOriginal),
		"no intermidiate buf": hmacTester(auth.AuthorizeNoIntermidiateBuf),
		"teereader":           hmacTester(auth.AuthorizeTeeReader),
		"GetBody":             hmacTester(auth.AuthorizeGetBody),
	} {
		b.Run(testName, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				r, err := http.NewRequest("POST", "http://example.com", strings.NewReader(bigBody))
				require.NoError(b, err)
				err = impl(r)
				require.NoError(b, err)
			}
		})
	}
}

Conclusion

By eliminating the unnecessary string conversion and ultimately leveraging the GetBody function, I made the HTTP client middleware about 60% faster while allocating about 84% less memory per call. io.TeeReader did not change the numbers, although it removed one error check. These changes improved the code's performance without hurting its readability.

This post should provide a comprehensive overview of the steps taken to optimize the HTTP request body handling in Go, showcasing both the thought process and the tangible benefits of each optimization step.