5 Easy Ways to Break Your SSO System — and How to Fix It

Single sign-on (SSO) implementations tend to be very stable — unless you make specific system changes without planning ahead.

Over the years, I’ve seen a handful of problems occur more often than any others. None of them is complicated or costly to fix, although they can be difficult to diagnose for those who don’t regularly deal with the intricacies of the Kerberos protocol and identity mapping.

Here are the top 5 causes of SSO disruption, in no particular order.

1. Upgraded Windows Domain controller

Here’s an oldie but goodie. And yes — I do still have a few customers who could be affected by this!

When Microsoft released Windows 2008 R2, not only did they make AES the preferred algorithm in place of DES, but they also disabled DES by default in the Kerberos encryption suite. On the other side, IBM i at the time did not include AES in its Kerberos encryption suite.

Microsoft provided configuration options for the domain controller to continue to accept DES and for Windows workstations to negotiate DES encryption. That was a bit of a pain, but we got through it, and IBM eventually added support for AES to the Kerberos encryption suite for the IBM i 6.x and 7.x releases.

It’s pretty easy to find and fix this problem. If you recently upgraded to Windows 2008 R2, then this is almost certainly the cause. Depending on the level of the IBM i OS you are running, the solution is simply to uncheck the “Use DES only” Windows user account option for each of the service principal user accounts. If you are running V5R3, or aren’t up to date on your CUM tapes for V5R3 or 6.1, you may have to install a PTF before unchecking the “Use DES only” user account option.
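If you would rather check the accounts in bulk than click through each one, the DES-only setting is visible as a bit in the account's userAccountControl attribute. Here is a minimal sketch in Python, assuming you have already exported the numeric userAccountControl value for each service principal account by whatever means you prefer. The function names are my own for illustration; the bit values are the ones Microsoft documents for this attribute.

```python
# Bit values documented by Microsoft for the userAccountControl attribute.
UAC_FLAGS = {
    0x0002: "ACCOUNTDISABLE",
    0x0200: "NORMAL_ACCOUNT",
    0x10000: "DONT_EXPIRE_PASSWORD",
    0x200000: "USE_DES_KEY_ONLY",
}

USE_DES_KEY_ONLY = 0x200000


def decode_uac(value: int) -> list[str]:
    """Return the names of the known flags that are set in `value`."""
    return [name for bit, name in UAC_FLAGS.items() if value & bit]


def uses_des_only(value: int) -> bool:
    """True when the account is restricted to DES encryption types."""
    return bool(value & USE_DES_KEY_ONLY)


def clear_des_only(value: int) -> int:
    """Return the value the attribute would have with the DES-only bit off."""
    return value & ~USE_DES_KEY_ONLY


if __name__ == "__main__":
    exported = 0x210200  # hypothetical value exported from AD
    print(decode_uac(exported), uses_des_only(exported))
```

This only reads and interprets the value; actually changing the account is still done in AD with your usual tools.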

2. Disabled Kerberos Service Principal userID in Active Directory (AD)

We run into this problem periodically. Out of nowhere a customer will start getting “Service Principal not found” errors. Most often, a Windows administrator unwittingly causes this problem: not knowing what one of the service principal user accounts is used for, they disable the account just to be on the safe side. While disabling unrecognized user accounts is not a bad practice, it breaks SSO for the particular service that depends on that account until the account is re-enabled.

Re-enabling the service principal user account fixes this problem.
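To find the culprit quickly, you can ask AD for accounts that both carry a service principal name and are disabled. The sketch below only builds the LDAP search filter; running it against your directory is left to whatever LDAP tool you already use. The function name and the service_class parameter are hypothetical. The OID is Active Directory's bitwise-AND matching rule, and 2 is the ACCOUNTDISABLE bit of userAccountControl.

```python
BIT_AND_RULE = "1.2.840.113556.1.4.803"  # AD bitwise-AND matching rule
ACCOUNTDISABLE = 0x0002                  # userAccountControl "disabled" bit


def disabled_spn_filter(service_class=None):
    """Build an LDAP filter for disabled accounts that hold an SPN.

    service_class optionally narrows the search to one service type,
    for example "krbsvr400". It must be plain letters and digits so it
    cannot inject LDAP filter syntax.
    """
    if service_class is None:
        spn = "*"
    elif service_class.isalnum():
        spn = f"{service_class}/*"
    else:
        raise ValueError("service_class must be alphanumeric")
    return (
        f"(&(servicePrincipalName={spn})"
        f"(userAccountControl:{BIT_AND_RULE}:={ACCOUNTDISABLE}))"
    )


if __name__ == "__main__":
    print(disabled_spn_filter("krbsvr400"))
```

Any account the filter returns is a candidate for the “Service Principal not found” error.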

3. Changed password in AD account OR keytab file — but not both

Once in a while, a well-meaning Windows administrator will find user accounts whose passwords have not been changed within the required password change interval. If the password of a Kerberos Service Principal Windows account is changed without also changing the password for the same Service Principal in the Kerberos keytab file on IBM i, then SSO for that particular service will break.

The solution to this problem is also quite easy. Just make sure that both passwords — the Windows account and the keytab entry for that particular service — are changed to be the same.

Note that when you change the Windows account password for a Service Principal, you need to change it using the KTPASS command, once you have agreed on the password that will be used both in AD and in the Kerberos keytab entry on the IBM i.
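Why does a mismatch break things so completely? Both sides derive the service's encryption key from the password, so different passwords mean different keys, and a ticket encrypted with one cannot be decrypted with the other. The sketch below illustrates only the first stage of the AES string-to-key function from RFC 3962 (PBKDF2 with HMAC-SHA1 and 4096 iterations); the real function adds a further derivation step, so these bytes are not the final Kerberos key. The salt shown is the RFC 4120 default (realm followed by the principal name components), and Active Directory may choose a different salt for a mapped account. The principal, realm and passwords are made up.

```python
import hashlib


def pbkdf2_stage(password: str, realm: str, principal: str,
                 iterations: int = 4096) -> bytes:
    """First stage of the RFC 3962 AES-256 string-to-key function.

    Assumes the default salt: the realm followed by the principal's
    name components with the "/" separators removed.
    """
    salt = realm + "".join(principal.split("/"))
    return hashlib.pbkdf2_hmac(
        "sha1", password.encode("utf-8"), salt.encode("utf-8"), iterations, 32
    )


if __name__ == "__main__":
    realm = "EXAMPLE.COM"                         # hypothetical realm
    principal = "krbsvr400/sysprod.example.com"   # hypothetical principal
    ad_side = pbkdf2_stage("NewPassw0rd", realm, principal)
    keytab_side = pbkdf2_stage("OldPassw0rd", realm, principal)
    print("keys match:", ad_side == keytab_side)
```

Once both sides are set to the same password, the two derivations agree again and SSO for that service recovers.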

4. LDAP server not available

Sometime in the very early days of IBM i release 7.1, several customers ran into a mysterious problem where the IBM Tivoli Directory Server (a.k.a. ITDS) would appear to start, only to fail almost immediately. The vast majority of IBM i shops do not use ITDS (i.e. the LDAP server) for anything other than Enterprise Identity Mapping (EIM). ITDS normally starts by default when the OS comes up. Once EIM is configured, most shops don’t ever think much about LDAP anymore, or its role in getting users logged in under the appropriate IBM i user profile after a successful Kerberos authentication.

Sometimes wiping out the LDAP configuration and starting from scratch fixes the problem. Sometimes it doesn’t. We had to get IBM support to look into and solve the “LDAP won’t start” problem for a couple of customers.

The problem didn’t seem to manifest itself in any one particular way. In one case, the client finally ended up fixing an entry in the system DB2 cross-reference tables. I have not run into the problem in the last couple of months, so perhaps a PTF or CUM tape contained the fix for this bug.
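Because the server can look started and then die moments later, a single check right after startup can mislead you. A rough way to spot this from any machine with Python is to probe the LDAP port a few times over a short period. This is only a sketch: it assumes the directory server listens on the standard LDAP port 389, the function names are my own, and an open port tells you the server is listening, not that EIM lookups are working.

```python
import socket
import time


def ldap_port_open(host, port=389, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name not found
        return False


def ldap_stays_up(host, port=389, checks=3, interval=5.0):
    """Probe the port `checks` times, `interval` seconds apart.

    Returns False as soon as one probe fails, which is how a server
    that starts and then ends almost immediately would show up.
    """
    for attempt in range(checks):
        if not ldap_port_open(host, port):
            return False
        if attempt < checks - 1:
            time.sleep(interval)
    return True


if __name__ == "__main__":
    # "sysprod.example.com" is a placeholder for your IBM i hostname.
    print(ldap_stays_up("sysprod.example.com"))
```

If the probe fails, the server job logs on the IBM i are the next place to look.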

5. Missing reverse lookup of target IBM i

I wrote about this one in detail in a blog post last month. When resolving the IP address, many Kerberos clients will do a forward lookup of the name of the system to which the client is to connect, followed by a reverse lookup. The reason for this is that many targets have multiple IP hostnames. One of them is generally configured in DNS as the primary name and the rest are considered aliases or secondary names.

Most Kerberos clients take advantage of a DNS behavior that returns a consistent IP address and hostname regardless of the hostname actually provided. When performing a reverse DNS lookup (i.e. retrieving a hostname from an IP address), DNS servers always return the primary IP hostname. So when a “Kerberized” client application attempts to connect and authenticate to a remote service, it will use the DNS server to find the address associated with the given hostname. It then uses the IP address provided by DNS and does a reverse lookup to find the primary hostname. When clients perform hostname resolution this way, Kerberos Service Principals only need to be defined for the primary hostname of the service – not every alias for that IP address.

One of the clients that does this is the PC5250 Telnet emulator.

Now consider high availability role swaps. Many shops will just change the hostname in DNS after a role swap: the hostname for the IP address that was associated with the backup system (e.g. “sysbackup”) is changed to the name of the production system (e.g. “sysprod”) and vice versa. If they do not also add the reverse lookup for these entries, Kerberos-enabled client applications will not be able to connect.

Again, the fix is simple. Add the reverse lookups after changing the hostnames and all will be well.
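You can verify the DNS entries after a role swap by walking through the same forward-then-reverse resolution the clients perform. The sketch below uses Python's standard socket module; the function name is my own, and the resolver arguments exist only so the logic can be exercised without touching real DNS. It is an approximation of what a Kerberized client does, not the code any particular client runs.

```python
import socket


def primary_hostname(name,
                     forward=socket.gethostbyname,
                     reverse=socket.gethostbyaddr):
    """Forward-resolve `name`, then reverse-resolve the resulting address.

    Returns the primary hostname from the reverse lookup, or None when
    the address has no reverse entry, which is the situation that stops
    Kerberos clients from connecting after a role swap.
    """
    address = forward(name)  # raises socket.gaierror if the name is unknown
    try:
        primary, _aliases, _addresses = reverse(address)
    except (socket.herror, socket.gaierror):
        return None
    return primary


if __name__ == "__main__":
    for host in ("sysprod", "sysbackup"):  # names from the example above
        try:
            print(host, "->", primary_hostname(host))
        except socket.gaierror:
            print(host, "-> forward lookup failed")
```

The name that comes back is the one a Service Principal must exist for. If you get None, or a name you did not expect, fix the reverse lookup before you go hunting for a Kerberos problem.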

Well, there they are. My top 5 SSO disrupters for IBM i and how to fix them.

Of course, if you are an SSO stat! customer, you can fix these and any other SSO problems you might run into by simply giving us a call. I haven’t kept formal statistics, but I believe we solve most SSO problems — when they do occur — in about 10 minutes. While it’s rare to encounter SSO problems once you are up and running, our expertise in understanding the protocols and how they are implemented on various systems goes a long way towards a quick fix.

If you would like an assessment of what’s required to get SSO up and running in your environment, just contact me.