
Add a Power of Two Choices algorithm implementation for Spring Cloud LoadBalancer #601

Open
OlgaMaciaszek opened this issue Sep 9, 2019 · 16 comments

@OlgaMaciaszek
Collaborator

Provide a Power of Two Choices (https://www.nginx.com/blog/nginx-power-of-two-choices-load-balancing-algorithm/) implementation for Spring Cloud LoadBalancer.

Original issue: #595
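
For reference, a minimal sketch of the core selection step - pick two instances at random and route to the less loaded one. The `Loaded` interface and its `load()` accessor are hypothetical placeholders for whatever load signal we end up using:

```java
import java.util.List;
import java.util.Random;

class PowerOfTwoChoices {

    interface Loaded {
        double load(); // hypothetical: current load reported by the instance
    }

    private final Random random = new Random();

    <T extends Loaded> T choose(List<T> instances) {
        if (instances.isEmpty()) {
            return null;
        }
        if (instances.size() == 1) {
            return instances.get(0);
        }
        // Pick two distinct instances at random.
        int first = random.nextInt(instances.size());
        int second = random.nextInt(instances.size() - 1);
        if (second >= first) {
            second++;
        }
        T a = instances.get(first);
        T b = instances.get(second);
        // Return the less loaded of the two.
        return a.load() <= b.load() ? a : b;
    }
}
```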

@OlgaMaciaszek OlgaMaciaszek added this to the 2.2.0.RC2 milestone Oct 29, 2019
@OlgaMaciaszek OlgaMaciaszek self-assigned this Oct 29, 2019
@ryanjbaxter ryanjbaxter removed this from the 2.2.0.RC2 milestone Nov 18, 2019
@OlgaMaciaszek
Collaborator Author

The research underlying this approach.

@OlgaMaciaszek
Collaborator Author

The use cases examined within the whitepaper (linked in the comment above) assume that the clients (in our case, possibly client-side load balancer instances) communicate directly with the servers to choose the best server to pass the request to.

The servers interact with the clients informing them about their current load. The communication between the clients and the servers is structured into rounds. The research includes the following (alternative) approaches:

  • As the clients send requests, the servers respond to each requesting client, informing it which position in the FIFO queue the client would get within that round. Based on that, each client picks the best server.
  • As the clients send requests, the servers either reject or accept them based on a preset load threshold - the clients keep retrying the requests in subsequent rounds until they get accepted.

As the paper demonstrates, these approaches give good results in terms of load distribution (and the resulting maximum loads relative to the numbers of requests and servers). However, they are communication-heavy, so the resulting communication costs have to be low enough for the approach to be profitable.

With that in mind, the research centres on the minimum number of servers each client should contact (and, thus, the lower communication cost) that would still give good results in terms of load distribution between the servers. The author proves that while there is a very substantial benefit from contacting two random servers and choosing the better of them over just randomly picking one server to send requests to, adding any additional servers to choose from leads to much less significant improvements.
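
To illustrate that effect, a rough simulation sketch (not from the paper; server and request counts are arbitrary) comparing the maximum load under purely random placement vs. two random choices:

```java
import java.util.Random;

class TwoChoicesSimulation {

    public static void main(String[] args) {
        int servers = 100;
        int requests = 100_000;
        Random random = new Random();

        int[] randomLoads = new int[servers];
        int[] twoChoiceLoads = new int[servers];

        for (int i = 0; i < requests; i++) {
            // Purely random placement.
            randomLoads[random.nextInt(servers)]++;

            // Two random choices, send to the currently less loaded one.
            int a = random.nextInt(servers);
            int b = random.nextInt(servers);
            twoChoiceLoads[twoChoiceLoads[a] <= twoChoiceLoads[b] ? a : b]++;
        }

        System.out.println("max load, random placement:   " + max(randomLoads));
        System.out.println("max load, two random choices: " + max(twoChoiceLoads));
    }

    private static int max(int[] loads) {
        int max = 0;
        for (int load : loads) {
            max = Math.max(max, load);
        }
        return max;
    }
}
```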

@OlgaMaciaszek
Collaborator Author

In the context of implementing this within the Spring Cloud LoadBalancer, there are some issues to consider. The implementation of the ReactiveLoadBalancer itself should not be problematic, given that we could use the data on the current servers' load stored in the ServiceInstance metadata (a sketch follows below the list). However, the ServiceInstanceListSupplier implementation, and how this information should be collected, should probably be discussed.

Some things that come to mind:

  1. While the algorithm centres around the scenario where clients interact directly with end-servers during load-balancing, the most used ServiceInstanceListSupplier that we have is the DiscoveryClientServiceInstanceListSupplier, which interacts with service discovery; we could approach this in various ways:
  • we could select the 2 services from SD and then communicate with them to get their load state and choose the best one for the final request - this approach requires an additional round of communication (as compared to the research), and might be too costly over HTTP, but it would probably be most in keeping with the PoTC approach
  • we could pass load info as service instance metadata while registering service instances in SD; then we could get this information directly from SD; the issue I see here is that the recommended ServiceInstanceListSupplier is the caching one - so given that load tends to change quite dynamically, we would have stale data; using a reactive, push-based ServiceInstanceListSupplier implementation instead might be a possible solution, but it could still generate unnecessarily heavy traffic if we update it on each load change on each of the servers; also, if we use this approach, we might as well just pick the best server rather than one of two random servers, as we have the full list there already
  2. Server instances would have to be instrumented to know their load and have an endpoint where that could be queried (possibly using Micrometer?)

  3. We would have a different, interesting scenario with Spring Cloud RSocket - there the Gateway/LB instances will actually have much more knowledge of the network topology, and we might be able to find the best instances based on that
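
As a starting point for the discussion, a sketch of the ReactiveLoadBalancer part only, assuming each ServiceInstance exposes a (hypothetical) "load" metadata entry kept up to date by whichever ServiceInstanceListSupplier we settle on; package locations and signatures follow the 2.2.x line:

```java
import java.util.List;
import java.util.Random;

import reactor.core.publisher.Mono;

import org.springframework.beans.factory.ObjectProvider;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.client.loadbalancer.reactive.DefaultResponse;
import org.springframework.cloud.client.loadbalancer.reactive.EmptyResponse;
import org.springframework.cloud.client.loadbalancer.reactive.Request;
import org.springframework.cloud.client.loadbalancer.reactive.Response;
import org.springframework.cloud.loadbalancer.core.ReactorServiceInstanceLoadBalancer;
import org.springframework.cloud.loadbalancer.core.ServiceInstanceListSupplier;

public class TwoChoicesLoadBalancer implements ReactorServiceInstanceLoadBalancer {

    private final ObjectProvider<ServiceInstanceListSupplier> supplierProvider;

    private final Random random = new Random();

    public TwoChoicesLoadBalancer(ObjectProvider<ServiceInstanceListSupplier> supplierProvider) {
        this.supplierProvider = supplierProvider;
    }

    @Override
    public Mono<Response<ServiceInstance>> choose(Request request) {
        ServiceInstanceListSupplier supplier = supplierProvider.getIfAvailable();
        return supplier.get().next().map(this::select);
    }

    private Response<ServiceInstance> select(List<ServiceInstance> instances) {
        if (instances.isEmpty()) {
            return new EmptyResponse();
        }
        if (instances.size() == 1) {
            return new DefaultResponse(instances.get(0));
        }
        // Two random choices; prefer the instance reporting the lower load.
        ServiceInstance first = instances.get(random.nextInt(instances.size()));
        ServiceInstance second = instances.get(random.nextInt(instances.size()));
        return new DefaultResponse(load(first) <= load(second) ? first : second);
    }

    private double load(ServiceInstance instance) {
        // Hypothetical metadata key; missing data is treated as unloaded.
        String load = instance.getMetadata().get("load");
        return load != null ? Double.parseDouble(load) : 0D;
    }
}
```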

@OlgaMaciaszek
Collaborator Author

I would like to start a discussion on the best implementation of PoTC. I have put some considerations in the comments above.

Please join the discussion and add any ideas and comments.

@spencergibb @ryanjbaxter @marcingrzejszczak @dsyer @Haybu @smaldini @nebhale @TYsewyn @elandau

@Haybu

Haybu commented Dec 4, 2019

2. Server instances would have to be instrumented to know their load and have an endpoint where that could be queried (possibly using Micrometer?)

In non-RSocket cases, and given that services are instrumented, two things come to mind to help a client with decision-making:

  • could a client consult a monitoring toolkit such as Prometheus, or its collected metrics somewhere via a Prometheus writer (to a time-series db, etc.)? the downside is that this happens at request time and could cause a slight delay
  • a Prometheus writer could write instrumented metrics to a stream broker (such as Kafka); leveraging Spring Cloud Stream, a client could act as a listener/sink and proactively maintain service instances' metrics (Kafka KStream/KTable) that could be consulted at the time of target service invocations (a rough sketch follows below)

I can see both options involve extra components :)
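
A rough illustration of the second option - a listener maintaining a local table of instance loads. It uses a plain Spring for Apache Kafka listener rather than Spring Cloud Stream to keep the sketch short, and the topic name and the `instanceId:load` payload format are made up:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class InstanceLoadTable {

    // instanceId -> last reported load
    private final Map<String, Double> loads = new ConcurrentHashMap<>();

    @KafkaListener(topics = "instance-load-metrics")
    public void onLoadReport(String message) {
        // Expected (hypothetical) payload: "<instanceId>:<load>"
        String[] parts = message.split(":");
        if (parts.length == 2) {
            loads.put(parts[0], Double.parseDouble(parts[1]));
        }
    }

    public double loadOf(String instanceId) {
        return loads.getOrDefault(instanceId, 0D);
    }
}
```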

@ojhughes

ojhughes commented Dec 5, 2019

As the paper demonstrates, these approaches give good results in terms of load distribution (and the resulting maximum loads relative to the numbers of requests and servers). However, they are communication-heavy, so the resulting communication costs have to be low enough for the approach to be profitable.

It could be possible to piggyback on top of the underlying service discovery systems, so that when a client requests registration information it also includes data about the service's load.

@OlgaMaciaszek
Collaborator Author

• could a client consult a monitoring toolkit such as Prometheus, or its collected metrics somewhere via a Prometheus writer (to a time-series db, etc.)? the downside is that this happens at request time and could cause a slight delay
• a Prometheus writer could write instrumented metrics to a stream broker (such as Kafka); leveraging Spring Cloud Stream, a client could act as a listener/sink and proactively maintain service instances' metrics (Kafka KStream/KTable) that could be consulted at the time of target service invocations

I like the second idea more because it would probably be more performant; although I think it's worth considering whether we could get the information just with Micrometer, without Prometheus, so that we require as few obligatory external components as possible; wdyt?

@dsyer
Contributor

dsyer commented Dec 6, 2019

Micrometer can only tell you about metrics for the local process, right? I also think Prometheus is a good idea, but overkill for the simplest use cases. I definitely wouldn't want it to be a mandatory dependency. We'll probably end up rewriting Eureka if we take this idea much further.

@OlgaMaciaszek
Collaborator Author

Micrometer can only tell you about metrics for the local process, right? I also think Prometheus is a good idea, but overkill for the simplest use cases. I definitely wouldn't want it to be a mandatory dependency. We'll probably end up rewriting Eureka if we take this idea much further.

That's true, but for this scenario, it could work:

we could select the 2 services from SD and then communicate with them to get their load state and choose the best one for the final request - this approach requires an additional round of communication (as compared to the research), and might be too costly over HTTP, but it would probably be most in keeping with the PoTC approach.

That being said, I'm also not convinced that this would be more efficient than using SD.

@jleibund

jleibund commented Dec 6, 2019

I've picked up tracking this from the ticket @elandau registered. Just a little input from our gRPC P2C impl that might apply here:

  1. we use SD (eureka for us but could be any) for the active list to choose from
  2. we do use metrics, connection state, etc when picking to include least pending requests on a connection/instance
  3. we track prior failures and quarantine those nodes (locally) as a tie breaker (if one is quarantined)
  4. we do not communicate with the two choices before choosing - there is some overhead mentioned, pathological cases, and diminishing returns over just metrics
  5. we implement exponential backoff in the LB

Our metrics internally are Spectator; I think Prometheus sounds good - but the provider would ideally be pluggable, and ditto for the metrics/status evaluation algorithm, depending on the metrics one has access to. Also, certainly leveraging pluggable SD would be helpful.

@ryanjbaxter
Contributor

I'm wondering if it makes sense that when a service sends its heartbeat data to the service discovery server, it also includes its current load data. This data could then be propagated to service discovery clients when they send their heartbeat delta and get updates about newly registered services. Yes, the cached data might be out of date, but we are already fine with the list of services being out of date as well.

@Haybu

Haybu commented Dec 11, 2019

A metrics-aware discovery service is optimal if we are fine with outdated cached instances; as @dsyer mentioned, it looks like a new, enhanced Eureka.

@dsyer
Contributor

dsyer commented Dec 12, 2019

I think I would be happy with local metrics to start with (we have statistics about requests we sent to each service instance from the current JVM at least). It might help if we design the API in such a way that it doesn't preclude a remote implementation. Shouldn't be too hard though.
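
A minimal sketch of that local-metrics idea - track in-flight requests per instance from the current JVM and use the count as the load signal when comparing two candidates. The names are illustrative, not an existing Spring Cloud API, and a remote implementation could sit behind the same accessor:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class LocalInstanceStats {

    private final Map<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

    // Call when a request is sent to the given instance.
    public void onRequestStart(String instanceId) {
        inFlight.computeIfAbsent(instanceId, id -> new AtomicInteger()).incrementAndGet();
    }

    // Call when the response (or failure) for that instance arrives.
    public void onRequestComplete(String instanceId) {
        AtomicInteger counter = inFlight.get(instanceId);
        if (counter != null) {
            counter.decrementAndGet();
        }
    }

    // Current number of outstanding requests - the "load" used when comparing two candidates.
    public int inFlightCount(String instanceId) {
        AtomicInteger counter = inFlight.get(instanceId);
        return counter != null ? counter.get() : 0;
    }
}
```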

@OlgaMaciaszek
Collaborator Author

we do not communicate with the two choices before choosing - there is some overhead mentioned, pathological cases, and diminishing returns over just metrics

It also makes sense to me to avoid that communication, since we are communicating with the SR anyway and can get the information from there; however, in this context, why would we do the 2-random-choices approach at all? It seems that the main benefit discussed in the paper is getting a good load distribution while reducing communication to only 2 instances (the scenario the algorithm concentrates on is one where we need to contact each instance to get its load status); if we can communicate with only one external app, i.e. the service registry, and get all that data in one round of communication, I'm not sure why we shouldn't just pick the best instance instead of randomly picking 2 and choosing one of them?

@TYsewyn
Contributor

TYsewyn commented Dec 18, 2019

I'm wondering if it makes sense that when a service sends its heartbeat data to the service discovery server it also includes its current load data.

That's a really good idea, but it won't work for all SRs unfortunately, e.g. Cloud Foundry or Kubernetes.

I'm also not sure what implications that would have for the heartbeats.
IIRC the default setting for most SD clients is to send a heartbeat every 30 seconds.
If the system is under high load that information would be out of date and invalid sub-second.

@jleibund

Olga - if the SR is always up to date and propagation delay is acceptable, then you could be right. Under real-world conditions with SR lag, outages, bursty traffic, etc., it was important for us that each instance can adapt for periods in isolation, so in that sense we may have a variation of the paper as strictly interpreted. The SR can suffer from production and replication issues, depending on its design, refresh characteristics, caching, etc.

My comments pertain to a gRPC implementation (not straight HTTP/servlets), so we have access to some additional information provided by the gRPC API that may or may not apply here. Our filtering flow looks approximately like this:

  1. use the SR to filter down to a list of active candidates and track this locally (re your comment, this could be the first limit to remove overly loaded servers, but we do not currently do this)
  2. for each candidate channel, we locally (per instance) track the count of failures, the count of backoffs (for an exponential backoff strategy), whether we are in the middle of retrying (we support retries), and the last failure time (for the backoff calculation)
  3. during the choice of two, pick two random candidates from the active candidates pool
  4. if one is quarantined (defined below), pick the other
  5. if both are quarantined or both are OK, then apply the least-loaded logic (defined below)

In the above, our least-loaded logic is based on the gRPC subchannel active count (basically the backlog of queued requests). So this (step 5, above) is a local load decision that can adapt and not suffer from issues with distributed SR caching/lag.

For the quarantine decision in step 4:

  1. if it's retrying, the candidate is quarantined for this decision
  2. if it's not in READY state (gRPC-specific) and the active count > 0, then it's quarantined
  3. if its tracked count of failures > 0, we apply the exponential backoff strategy based on the count of backoffs, the last failure time and the current time to decide if it's quarantined

The above is just food for thought. Our gRPC choice of 2 may have started as a strict reading of the research but has evolved over time to account for a number of failure situations we have encountered under high traffic volume. But it's one of many possible, valid implementations that could more strictly apply the choice-of-2 research as written or extend beyond a controlled research environment.
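
A rough sketch of the flow described above (not our actual code; the Candidate accessors are stand-ins for the channel/metrics state we track, and a non-empty candidate pool is assumed):

```java
import java.util.List;
import java.util.Random;

class ChoiceOfTwoWithQuarantine {

    interface Candidate {
        int activeCount();          // pending requests on the connection/instance
        boolean retrying();         // currently in the middle of a retry
        boolean ready();            // transport-level readiness (e.g. gRPC READY)
        int failureCount();         // tracked failures
        boolean inBackoffWindow();  // exponential backoff based on failures and last failure time
    }

    private final Random random = new Random();

    Candidate choose(List<Candidate> active) {
        Candidate a = active.get(random.nextInt(active.size()));
        Candidate b = active.get(random.nextInt(active.size()));
        boolean aQuarantined = quarantined(a);
        boolean bQuarantined = quarantined(b);
        // If exactly one is quarantined, pick the other; otherwise fall back to least loaded.
        if (aQuarantined != bQuarantined) {
            return aQuarantined ? b : a;
        }
        return a.activeCount() <= b.activeCount() ? a : b;
    }

    private boolean quarantined(Candidate c) {
        if (c.retrying()) {
            return true;
        }
        if (!c.ready() && c.activeCount() > 0) {
            return true;
        }
        return c.failureCount() > 0 && c.inBackoffWindow();
    }
}
```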
