Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for streaming on recent Darwin platforms #24

Merged
merged 3 commits into from
Nov 17, 2023

Conversation

simonjbeaumont
Copy link
Collaborator

@simonjbeaumont simonjbeaumont commented Nov 15, 2023

Motivation

Recently, the ClientTransport protocol was updated to be in terms of HTTPBody, which is an AsyncSequence<ArraySlice<UInt8>>. Until now, the URLSession transport has buffered the request and response bodies in memory. This PR seeks to bring streaming support to recent Darwin platforms.

Modifications

On new enough Darwin platforms1, use URLSession delegate to perform bidirectional streaming with backpressure. Request body backpressure is provided by the use of StreamDelegate with a bound stream pair. Response body backpressure is provided by the use of AsyncBackpressuredStream which allows for a high and low watermark to suspend and resume the URLSession task.

In more detail:

  • Vendor internal AsyncBackpressuredStream implementation from SE-0406.
  • Add HTTPBodyOutputStreamBridge which provides the OutputStreamDelegate required to bridge the AsyncSequence-based HTTPBody to an OutputStream.
  • Add BidirectionalStreamingURLSessionDelegate which makes use of HTTPBodyOutputStreamBridge for streaming request bodies and also uses the delegate callbacks to bridge the response body to HTTPBody.
  • Add URLSession.bidirectionalStreamingRequest(...) async throws, which provides a high-level URLSession-like API for setting up the appropriate URLSession task and delegate.
  • Add URLSession.bufferedRequest(...) async throws to provide a similar interfaces on platforms that don't support the streaming APIs on which we depend.
  • Add internal enum URLSessionTransport.Configuration.Implementation to control whether to use buffered or streaming implementation.
  • Detect appropriate implementation depending on the platform.

Result

  • On new enough Darwin platforms1, streaming will be used.

Test Plan

  • Add a set of tests that run with both buffered and streaming implementation on platforms that support streaming.
  • Add a set of tests that only run on platforms that support streaming, to test request flows only possible with streaming.

However, it's worth noting that our CI only runs on Linux, so we won't be testing the majority of this new feature in CI.

Footnotes

  1. macOS 12, iOS 15, tvOS 15, watchOS 8 ↩ ↩2

Copy link
Contributor

@czechboy0 czechboy0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great on my first pass, left a few questions. I'll take one more look once the NIO changes landed and this is finalized.

@simonjbeaumont simonjbeaumont force-pushed the sb/streaming-support branch 7 times, most recently from 825d2b3 to daec73b Compare November 16, 2023 14:56
Copy link
Contributor

@czechboy0 czechboy0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok lgtm


enum Implementation {
case buffering
case streaming(requestBodyStreamBufferSize: Int, responseBodyStreamWatermarks: (low: Int, high: Int))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is to not expose requestBodyStreamBufferSize. It is an extremely confusing parameter that does not reflect buffering done in CFNetwork and Network framework. We could have gotten away with not using bounded stream pair if NSStream subclassing were easier in Swift, and it will likely be unnecessary in the future.

We'd like to retain the ability to fine tune these values on our platforms without needing developers to change their code, e.g. reducing the buffer size on low memory devices like Apple Watch, or even adjusting it dynamically based on memory pressure / app state. Most developers are not equipped to decide the best buffer size to use.

Would they be better as environment variables for advanced use cases that we don't need to commit as APIs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference is to not expose requestBodyStreamBufferSize. It is an extremely confusing parameter that does not reflect buffering done in CFNetwork and Network framework.
...
We'd like to retain the ability to fine tune these values on our platforms without needing developers to change their code, e.g. reducing the buffer size on low memory devices like Apple Watch, or even adjusting it dynamically based on memory pressure / app state. Most developers are not equipped to decide the best buffer size to use.

This is not exposed. This enum Implementation is an internal type, only used in the internal init(session:implementation:). The only public initializer is public init(session:) which calls internal init(session:implementation:) with Implementation.platformDefault, which has some hard-coded buffer sizes and is also internal, which could allow us to tweak on a per-platform basis.

We could have gotten away with not using bounded stream pair if NSStream subclassing were easier in Swift, and it will likely be unnecessary in the future.

Oh, that's interesting to know. I'll watch this space 👀

Would they be better as environment variables for advanced use cases that we don't need to commit as APIs?

Hadn't considered an environment variable based approach. I think we could incorporate that too, for advanced users.

One thing to call out here that we were considering exposing was a way to force buffering. Right now, when providing an HTTPBody with iterationBehavior == .single, the observed behaviour is different when faced with a HTTP redirect. Specifically, it will succeed when using buffering, but fail when using streaming, because we cannot rewind the request body stream.

We have other ideas on how we could solve this—e.g. wrapping the provided AsyncSequence in another that buffers the first N bytes and might have a better chance of replaying on redirect—but in general we may want to expose the ability to just turn off streaming.

Curious if you have any thoughts on this?

//cc @dnadoba @czechboy0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's start with the parameter being internal, and only make it public if a genuine use case comes up.

I'm not a fan of libraries checking environment variables, it makes them difficult to test and reason about.

if hasAlreadyIteratedRequestBody {
guard requestBody!.iterationBehavior == .multiple else {
debug("Task delegate: Cannot rewind request body, cancelling task")
task.cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning nil is sufficient, no need to do both

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I just tried removing task.cancel() here and testHTTPRedirect_singleIterationBehavior_fails never completes.

Without the task.cancel(), the needNewBodyStreamForTask callback is called in a loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🔨 semver/patch No public API change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants