Throttling

One area worth looking at when building a WCF service is service throttling. Throttling limits the number of sessions, calls, and service instances that can be active at any given time. Here is an example of how throttling is configured in a configuration file:

<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <behavior name="throttling">
        <serviceThrottling maxConcurrentCalls="10" maxConcurrentInstances="10" maxConcurrentSessions="10"/>
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
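
A named behavior like the one above only takes effect once a service refers to it. As a minimal sketch, assuming a hypothetical service class MyNamespace.MyService with a contract MyNamespace.IMyService exposed over netTcpBinding (none of these names come from the example above), the behavior could be attached through the behaviorConfiguration attribute:

```xml
<system.serviceModel>
  <services>
    <!-- The service name and contract are hypothetical placeholders. -->
    <!-- behaviorConfiguration must match the name of the behavior element ("throttling"). -->
    <service name="MyNamespace.MyService" behaviorConfiguration="throttling">
      <endpoint address="" binding="netTcpBinding" contract="MyNamespace.IMyService"/>
    </service>
  </services>
</system.serviceModel>
```

In a real configuration file the services and behaviors elements would sit side by side inside a single system.serviceModel element.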

Here is what the attributes in the serviceThrottling element represent:

  • maxConcurrentCalls: This value represents the number of messages that can be processed concurrently across all instances of your service at any given time. If ten clients access the service at the "same" time and each of these requests is being processed inside a service instance, then we have ten concurrent calls. Only calls that are currently being processed count against this value; pending calls do not.
  • maxConcurrentInstances: This value represents the number of service "instances" that can be in play at any time. When I say "number of instances," I am conceptually referring to how many instances of your service class are currently handling requests. Because of this, the number of instances also depends on the InstanceContextMode setting (a property of the ServiceBehavior attribute that you apply to your service class). InstanceContextMode can be set to PerCall, PerSession, or Single. If InstanceContextMode is set to Single, then it doesn't matter what you set maxConcurrentInstances to, because your service is a singleton and there will only ever be one instance. If InstanceContextMode is set to PerSession, then each client session gets its own service instance, so maxConcurrentInstances limits the number of instances in relation to the number of client sessions. If InstanceContextMode is set to PerCall, then a new service instance is created for every client request (regardless of which client sent it), so maxConcurrentInstances limits how many requests can have a live instance at the same time.
  • maxConcurrentSessions: This value represents the number of sessions that can be active across all instances of your service at any given time. When I say "active across all instances of your service" I could also say "against your service host," as one service host (an instance of the ServiceHost class) is created for a WCF service (which houses all the service instances).

Here is what I think is a good example of how this works in practice:

I could potentially have a service with an InstanceContextMode set to PerCall (meaning a new service instance is created for every call), and have the following throttling values:

  • maxConcurrentCalls: 1
  • maxConcurrentInstances: 40
  • maxConcurrentSessions: 1
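
Expressed as configuration, the values above might look like the following sketch. The behavior name perCallExample is a hypothetical placeholder, and the PerCall setting itself is applied in code on the service class rather than in this file:

```xml
<system.serviceModel>
  <behaviors>
    <serviceBehaviors>
      <!-- "perCallExample" is a hypothetical behavior name. -->
      <behavior name="perCallExample">
        <serviceThrottling maxConcurrentCalls="1" maxConcurrentInstances="40" maxConcurrentSessions="1"/>
      </behavior>
    </serviceBehaviors>
  </behaviors>
</system.serviceModel>
```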

Let's imagine I have 30 clients hitting my service at the same time. In this case the service can handle all the requests, but it might be a little slow because I have told my service via the throttling configuration that it is only to deal with 1 concurrent call at a time. Even though up to 40 instances are allowed, a PerCall service creates one instance per call, so only one instance is ever in use. This results in a "queue" of requests in my service that essentially get handled one at a time.

In practice, it seems as though maxConcurrentCalls and maxConcurrentInstances control how fast your service responds, while a maxConcurrentSessions value that is set too low can pretty much bring down your service because it gets overwhelmed with requests (I have seen this occur when hosting a service in a console application but haven't tried it when hosting in IIS). Also, keep in mind that a session is preserved until the client closes their connection to the service or until the timeout for their connection lapses. The binding timeout can be set via the configuration file on both the client and the server, and the shorter timeout wins. Because the timeout can be set in the server configuration file, setting it very low can protect your server from being taken down, provided a low timeout value works for your application.
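
As a rough sketch of the timeout idea, assuming the service uses netTcpBinding, the receiveTimeout attribute on the server-side binding controls how long a connection may sit idle before it is dropped. The binding name shortSession and the one-minute value are illustrative assumptions, not recommendations:

```xml
<system.serviceModel>
  <bindings>
    <netTcpBinding>
      <!-- "shortSession" is a hypothetical name; 00:01:00 (one minute) is an illustrative value. -->
      <binding name="shortSession" receiveTimeout="00:01:00"/>
    </netTcpBinding>
  </bindings>
</system.serviceModel>
```

An endpoint would opt in to this binding with bindingConfiguration="shortSession". Whether a short idle timeout is acceptable depends on how long your clients legitimately keep a connection open between calls.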