First, Because Math. Remember this.
Now on to the real subject: how to actually set up Distributed Cache and the Security Token Service in SharePoint with SAML (ADFS) and load balancing across a 6-tier farm. There is a literal ton of information on these services. The problem is that none of it is put together well enough to understand what is what and how it affects your farm, your users and your sleep. For what it’s worth, I will cite the appropriate people, as their information finally led me to the working configuration.
First up is Distributed Cache (DC). Make sure you have AppFabric CU5 or higher installed. It is not rolled up with SharePoint or Windows updates, so you have to get it yourself. Next, make sure you understand the reason behind DC: it saves the login token so that your users don’t have to log in each time. This applies to Forms and SAML Claims users. Then find out which servers need to run DC and AppFabric. Go to a WFE server and run Get-CacheHost (you might need to run Use-CacheCluster first). Then run
Get-SPServiceInstance | ? {($_.service.tostring()) -eq "SPDistributedCacheService Name=AppFabricCachingService"} | select Server, Status
Those commands are thanks to Samuel Betts at <http://blogs.msdn.com/b/sambetts/archive/2014/03/19/sharepoint-2013-distributed-cache-appfabric-troubleshooting.aspx>. I highly suggest you read that post before continuing reading this.
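The check described above can be sketched as one sequence. This is only a sketch: it assumes you are on a server that already has AppFabric and the SharePoint snap-in installed, and that your account has rights to the cache cluster.

```powershell
# Load the SharePoint snap-in if this is not the SharePoint Management Shell.
if (-not (Get-PSSnapin -Name Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue)) {
    Add-PSSnapin Microsoft.SharePoint.PowerShell
}

# Point this PowerShell session at the AppFabric cache cluster.
Use-CacheCluster

# List the hosts AppFabric thinks are in the cluster.
Get-CacheHost

# List the servers SharePoint thinks are running Distributed Cache.
Get-SPServiceInstance |
    Where-Object { $_.Service.ToString() -eq "SPDistributedCacheService Name=AppFabricCachingService" } |
    Select-Object Server, Status
```

If the two lists disagree about which servers are cache hosts, sort that out before touching any of the settings further down.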
Do not run DC on a search crawler. Bad things will happen, like the AppFabric service crashing and blowing out all the other cache databases. You have been warned.
Now there is a very important thing you need to do before continuing. Log in to each server that is running a WFE or Search Query and run Add-SPDistributedCacheServiceInstance, which adds the server to both AppFabric and SharePoint. Then run
Set-NetFirewallRule -DisplayName "File and Printer Sharing (echo request - ICMPv4-In)" -Enabled True
This information is from Sahil Malik <http://www.codemag.com/Article/1309021>. Why is it needed? Sahil describes it very well, and you can read his post.
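If you have more than a couple of servers, the firewall step can be pushed out and checked remotely. The sketch below assumes PowerShell remoting is enabled on every server; the server names are hypothetical placeholders, not anything from my farm.

```powershell
# Hypothetical server names; replace with your own WFE and Search Query servers.
$servers  = @('WFE01', 'WFE02', 'QUERY01')
$ruleName = 'File and Printer Sharing (echo request - ICMPv4-In)'

foreach ($server in $servers) {
    # Enable the ICMPv4 echo rule on the remote server and read back its state.
    # The display name can match more than one rule (one per firewall profile),
    # so the result is returned as a list of strings.
    $enabled = Invoke-Command -ComputerName $server -ScriptBlock {
        param($name)
        Set-NetFirewallRule -DisplayName $name -Enabled True
        Get-NetFirewallRule -DisplayName $name | ForEach-Object { "$($_.Enabled)" }
    } -ArgumentList $ruleName

    # A single ping confirms the server now answers echo requests.
    $pingOk = Test-Connection -ComputerName $server -Count 1 -Quiet

    "{0}: rule enabled = {1}, ping ok = {2}" -f $server, ($enabled -join ', '), $pingOk
}
```

Note that this only covers the firewall rule. Add-SPDistributedCacheServiceInstance still has to be run on each server itself, as described above.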
Now you are sort of set up, maybe. Have you configured your Trusted Identity Token Issuer? No? Go do that, then come back.
Now we need to set up DC and STS. First, there is a DC bug <http://habaneroconsulting.com/insights/SharePoint-2013-Distributed-Cache-Bug>. Read that post to get more information; however, some of its information is a little off, specifically on MaxConnectionsToServer. This property specifies how many connections are allowed to the cache server to check for a login, which sounds like you want it high to accommodate the number of users you have. However, Microsoft support informed us after three weeks that this value is limited by the number of processors you have. So keeping the default value of 2 is probably best. If you have a beefier machine with more processors, then raise it. If you set it higher than your processor count, you will get intermittent hangs and high memory usage.
RequestTimeout = # of milliseconds to wait to find the logon token. This is not your friend with SAML Claims when it is set too low. Setting it higher, to say 10 seconds, seems to be good. Basically you want to avoid falling back to the local cache, as this almost always forces a reauth.
The other thing to consider here is your load balancer. Do you use persistence or not? F5 says not to use persistence for SP2013. Supposedly DC works well enough; of course, if it did, this blog post wouldn’t exist, nor would the others I linked to. So I caution you that if you don’t use persistence, you had better get this right or you could fall into a bad reauth cycle.
The code below only needs to run once, though it wouldn’t hurt to check that each server received the update with the Get commands included in it. Then restart the service on each server.
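Since MaxConnectionsToServer is tied to processor count, it helps to look at what each WFE actually has before picking a number. A minimal sketch, assuming remote CIM/WinRM access to the servers; the server names are hypothetical.

```powershell
# Hypothetical WFE names; replace with your own.
$wfeServers = @('WFE01', 'WFE02')

foreach ($server in $wfeServers) {
    # Win32_ComputerSystem reports the logical processor count for the whole machine.
    $cs = Get-CimInstance -ClassName Win32_ComputerSystem -ComputerName $server

    [pscustomobject]@{
        Server            = $server
        LogicalProcessors = $cs.NumberOfLogicalProcessors
    }
}
```

On a single box you can also just check [Environment]::ProcessorCount. Either way, this only tells you the ceiling; which value to use under that ceiling is still your call.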
$timeout = 10000
$maxConnections = [Max # Processors per WFE]
$DLTC = Get-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache
$DLTC.RequestTimeout = $timeout
$DLTC.ChannelOpenTimeOut = $timeout
$DLTC.MaxConnectionsToServer = $maxConnections
Set-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache -DistributedCacheClientSettings $DLTC
Get-SPDistributedCacheClientSetting -ContainerType DistributedLogonTokenCache
$DLVSC = Get-SPDistributedCacheClientSetting -ContainerType DistributedViewStateCache
$DLVSC.ChannelOpenTimeOut = $timeout
$DLVSC.RequestTimeout = $timeout
$DLVSC.MaxConnectionsToServer = $maxConnections
Set-SPDistributedCacheClientSetting -ContainerType DistributedViewStateCache -DistributedCacheClientSettings $DLVSC
Get-SPDistributedCacheClientSetting -ContainerType DistributedViewStateCache
# This should run [Restart-Service -Name AppFabricCachingService] on each cache host
Restart-CacheCluster
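After the restart, a quick read-back of both containers shows whether the values stuck. This sketch only uses the Get cmdlet and the three properties set above; run it from the SharePoint Management Shell.

```powershell
# The two cache containers changed in the script above.
$containers = 'DistributedLogonTokenCache', 'DistributedViewStateCache'

foreach ($container in $containers) {
    $settings = Get-SPDistributedCacheClientSetting -ContainerType $container

    # Show only the three properties that were changed.
    [pscustomobject]@{
        Container              = $container
        RequestTimeout         = $settings.RequestTimeout
        ChannelOpenTimeOut     = $settings.ChannelOpenTimeOut
        MaxConnectionsToServer = $settings.MaxConnectionsToServer
    }
}
```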
Don’t use session cookies unless you have a documented security reason; otherwise your users will hate you. You can change it, but I highly suggest you don’t. Now on to more math. MaxServiceTokenCacheItems and MaxLogonTokenCacheItems set how many in-memory tokens the SP STS keeps, per server. You want this to match the MaxConnectionsToServer setting if your load balancer is running persistence. If not, you might need a higher value. The Service Token is used whenever your user hits Search Query or another service application. The LogonTokenCacheExpirationWindow is used for sliding sessions. Again, more math.
The code below only needs to run once on the farm. However, you will need to run iisreset on each server.
$maxTokens = [# of concurrent users / # of WFEs]
$sts = Get-SPSecurityTokenServiceConfig
$sts.UseSessionCookies = $false
$sts.MaxServiceTokenCacheItems = $maxTokens
$sts.MaxLogonTokenCacheItems = $maxTokens
$sts.LogonTokenCacheExpirationWindow = (New-TimeSpan -Seconds [MATH])
$sts.Update()
iisreset
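The $maxTokens math above is just concurrent users divided by WFEs. Here is a small sketch of that arithmetic; the function name and the sample numbers are made up for illustration, and rounding up is my assumption so that a fractional result never leaves you one token short.

```powershell
# Hypothetical helper: tokens per server = concurrent users / number of WFEs, rounded up.
function Get-MaxTokensPerServer {
    param(
        [Parameter(Mandatory)] [int] $ConcurrentUsers,
        [Parameter(Mandatory)] [ValidateRange(1, 1000)] [int] $WfeCount
    )

    [int][math]::Ceiling($ConcurrentUsers / $WfeCount)
}

# Example with made-up numbers: 2000 concurrent users spread over 4 WFEs.
$maxTokens = Get-MaxTokensPerServer -ConcurrentUsers 2000 -WfeCount 4
$maxTokens
```

Swap in your own user and server counts, then use the result for $maxTokens in the STS script.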
After I got this all configured and ran an 8-hour test, the farm performed great, just as we had originally expected. As always, your values may vary.
Now I need a beer and cigar. Seriously though I hope this helps someone and keeps the frustration down.