Thursday, August 30, 2012

The perfect username

In almost any FIM project, the discussion on username generation and format comes up. The discussion has plenty of technical aspects, of course, but also plenty of political ones. Often, a process or algorithm for generating usernames is already in place, and when we implement FIM we need to stick to it.

I know that changing usernames in a big organization is a huge task and that it probably won’t be part of a FIM project. However, once the decision to implement a product like FIM has been made, I think it’s only fair to bring up the subject of usernames, spend a little time reevaluating the current algorithm for generating them, and ask whether it is still the right one for the organization.

Sometimes though, I’ve even been asked to come up with a suggestion or a new standard as part of the FIM project. This has actually happened throughout my latest projects and that made me think about my former colleague, Per Østergaard. He compiled some good points and thoughts on the subject – and he and I have had some good discussions on this on different occasions.

So I thought I’d share those thoughts and a suggestion for “the perfect username”.

Thoughts on the “problem”
Visiting lots of companies, I’ve seen a lot of different ways of naming users. Conventions for usernames could be things like –
  • Initials
  • Initials + “.” + Department
  • Department + Initials
  • First name + first character of last name
  • Numeric employee id
  • 3 letters (originating from the user’s name) + 3 digits

All the above conventions have their specific advantages and disadvantages. Now, from a pure Windows perspective, it doesn’t really matter, since a user can simply be renamed as long as the name is unique. But Windows is not the only system in the world (or in most infrastructures), and this fact presents some challenges, e.g. –
  • You cannot rename a user in SAP. You have to delete the old user, losing auditing information, and create a new one
  • Other systems require an id that stays unique forever
  • Renaming a user is a complex process, where home directory paths, profile directories, email addresses, mailboxes, Lotus Notes id files and such might or should be renamed as well
  • Some systems are limited in the number of characters you can use
  • Some systems are limited in the character set you can use.

The problem with renames can be solved by simply saying ‘no’ to such demands. But if your current username is based on a person’s name, there are likely to be some demands that cannot be ignored, or legal requirements that rule out the ‘no’: a gender transition, a divorce with bad consequences, a religious conversion, a witness protection program and such.
Using pure digits as usernames, e.g. an employee number, could also prove problematic. Numbers are not easy to manage: they are hard to remember for occasional users, they are easily mistyped, and employees might feel like they are just a number.

The ground rules for “the perfect username”
So, thinking about it, the ideal username convention should meet these demands –
  • Must not relate to the person’s name
  • Must not be a pure number
  • Parts of the id must be easy to remember
  • Must prevent unfortunate character sequences
  • Must never be subject to change
  • Must ‘never’ be reused (at least for several years depending on legal demands/policies)
  • Must be less than eight (8) characters
  • Must have sufficient value spread. The risk of a user picking or typing the wrong name can be reduced by randomizing some part of the name. If users are named U0001, U0002 etc., a mistake is much easier to make than with U1782, U8232.
  • Naturally, the convention must have far more values than current users to avoid reuse.
  • Only the digits 0-9 and the letters A-Z are used, except for the letters ‘I’ and ‘O’, which are left out to avoid confusion with the digits 1 and 0. The combination counts below are based on the remaining 24 letters.
None of the current standards I’ve come across so far honor all of these demands.
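
A few of these rules can be checked mechanically. Here is a minimal Python sketch of such a check. The function name, the messages and the constants are made up for illustration, and it only covers length, character set and the “not a pure number” rule. Rules like “never reused” or “not related to the person’s name” need data that a simple string check doesn’t have.

```python
import re

# Letters A-Z without 'I' and 'O' (easily confused with 1 and 0).
ALLOWED_LETTERS = "ABCDEFGHJKLMNPQRSTUVWXYZ"
ALLOWED_PATTERN = re.compile(f"[0-9{ALLOWED_LETTERS}]+")
MAX_LENGTH = 7  # "less than eight characters"


def rule_violations(username: str) -> list[str]:
    """Return the ground rules a candidate username breaks (empty list = OK)."""
    problems = []
    if len(username) > MAX_LENGTH:
        problems.append("too long")
    if username.isdigit():
        problems.append("pure number")
    # Strict check: lowercase letters, 'I', 'O' and punctuation are all rejected.
    if not ALLOWED_PATTERN.fullmatch(username):
        problems.append("invalid characters")
    return problems
```

With this sketch, a name like R1002T passes, while a plain employee number such as 0042 is flagged as a pure number.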

Here’s the “perfect” username algorithm
The following suggested standard is probably not going to be widely accepted. So, you should regard this as a good starting point for a discussion and feel free to use it when the discussion on usernames emerges (and it will now and again).

The basic idea is to use a person’s birthday. The year is excluded to avoid problems with people not wanting to reveal their age. Using the birthday makes remembering that part of the username simple (for most people). We always use the 2-digit month representation (01 for January etc.) and also a 2-digit day representation (09 for the 9th). Two letters must never follow each other directly, to avoid problematic combinations.
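
To make the idea concrete, here is a minimal Python sketch of a generator. Everything in it is an assumption for illustration: the function name, the pattern notation (L = random letter, M = 2-digit month, D = 2-digit day) and the `taken` set, which stands in for wherever you keep the names that have ever been issued. It is not how FIM does it; in a real project this logic would live in your provisioning code.

```python
import random
import string
from datetime import date

# A-Z without 'I' and 'O', which are easily confused with 1 and 0.
LETTERS = [c for c in string.ascii_uppercase if c not in "IO"]


def make_username(month, day, pattern, taken, rng=None, max_tries=1000):
    """Build a username from a birthday; `pattern` is a string of L, M and D."""
    # Reject unknown symbols and adjacent letters (letters must be kept apart).
    if "LL" in pattern or set(pattern) - set("LMD"):
        raise ValueError("pattern must use L, M, D with no adjacent letters")
    # Raises ValueError for impossible dates; 2000 is a leap year, so 29 Feb is fine.
    date(2000, month, day)
    rng = rng or random.SystemRandom()
    for _ in range(max_tries):
        candidate = "".join(
            f"{month:02d}" if symbol == "M"
            else f"{day:02d}" if symbol == "D"
            else rng.choice(LETTERS)
            for symbol in pattern
        )
        # `taken` should hold every name ever issued, so nothing is reused.
        if candidate not in taken:
            return candidate
    raise RuntimeError("no free username left for this date and pattern")
```

Calling `make_username(10, 2, "LMDL", taken=set())` would give something like R1002T for a person born on October 2nd. The random letters differ on every call, so the result has to be stored once and never regenerated.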

These “rules” present some base combinations depending on the size of the organization –
  • Combination 1 – for smaller organizations
    • Random letter + month + day
    • Total combinations will be 8.784 (366 dates)
    • Maximum of 24 combinations per date
    • For good spread, expect only 10 users per date
    • Will give 3.660 potential usernames
    • A user will only have to remember one letter to remember their username
    • Examples: R1002, T1225
  • Combination 2 – for medium organizations
    • Random letter + month + day + random letter
    • Total combinations will be 210.816 (366 dates)
    • Maximum of 576 combinations per date
    • For good spread, expect only 200 users per date
    • Will give 73.200 potential usernames
    • A user will only have to remember two letters to remember their username
    • Examples: R1002T, T1225X
  • Combination 3 – for large organizations
    • Random letter + month + random letter + day + random letter
    • Total combinations will be 5.059.584 (366 dates)
    • Maximum of 13.824 combinations per date
    • For good spread, expect only 4000 users per date
    • Will give 1.464.000 potential usernames
    • A user will only have to remember three letters to remember their username
    • Examples: R10Q02T, T12A25X
So, in effect, a user gets a username that is partly random and partly well-known. It never needs to change or be reused, and it’s relatively short. The chance of mistyping a single character in R1002T and ‘hitting’ another valid username is also considerably lower than with an algorithm based on sequential IDs or initials.
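
The numbers in the three combinations follow from simple arithmetic: 24 usable letters (A-Z without ‘I’ and ‘O’) per random position, multiplied by 366 possible dates. Here is a small Python sketch if you want to rerun the calculation with your own figures. The function and key names are made up for illustration.

```python
LETTER_CHOICES = 24  # A-Z without 'I' and 'O'
DATES = 366          # every calendar day, including 29 February


def capacity(random_letters: int, users_per_date: int) -> dict[str, int]:
    """Name space for a combination with the given number of random letters."""
    per_date = LETTER_CHOICES ** random_letters
    return {
        "per_date": per_date,                # maximum usernames on one birthday
        "total": per_date * DATES,           # total combinations
        "planned": users_per_date * DATES,   # usernames when aiming for good spread
    }


small = capacity(random_letters=1, users_per_date=10)
medium = capacity(random_letters=2, users_per_date=200)
large = capacity(random_letters=3, users_per_date=4000)
```

Note that the per-date figure is the real limit, since everybody who shares a birthday draws from the same pool of names.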

What do you think? Will it work? Thoughts, ideas and suggestions are most welcome. And let me know if you've implemented the suggestion above or a variation of it.

(Some of the text above is copied directly from the “thoughts” and writing of a former colleague of mine, Per Østergaard. I took the liberty of adjusting it a little for this blog.)
