abbreviate(names, minlength = 4, use.classes = T, dot = F)
THE ALGORITHM. The abbreviation algorithm does not simply truncate. It uses a threshold that controls which classes of characters may be dropped; as the threshold rises, it drops, in order: 1) non-printing characters and white space, 2) lower-case vowels, 3) lower-case consonants and punctuation, and finally 4) upper-case letters and special characters.
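The four character classes can be made concrete with a minimal R sketch that assigns each character the class at which the threshold makes it droppable. The function name `drop_class` is hypothetical, and two details are assumptions rather than documented behavior: only a, e, i, o and u count as vowels, and "punctuation" is taken to mean the POSIX `[:punct:]` class, with digits and everything else falling into class 4.

```r
# Hypothetical helper, not part of abbreviate(): map each character to the
# class at which the threshold makes it eligible for dropping.
#   1 = white space / non-printing, 2 = lower-case vowel,
#   3 = lower-case consonant or punctuation, 4 = upper case and anything else.
# Assumptions: vowels are a, e, i, o, u only; punctuation means [:punct:].
drop_class <- function(chars) {
  cls <- rep(4L, length(chars))                               # upper case, digits, other
  cls[chars %in% letters | grepl("^[[:punct:]]$", chars)] <- 3L
  cls[chars %in% c("a", "e", "i", "o", "u")] <- 2L            # vowels override consonants
  cls[grepl("^[[:space:][:cntrl:]]$", chars)] <- 1L           # white space, control chars
  cls
}

chars <- strsplit("New Jersey", "")[[1]]
setNames(drop_class(chars), chars)
```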
If use.classes is FALSE, the only distinction made is between white space and all other characters. Each string is broken up into words separated by white space. For a given value of the threshold, eligible letters are dropped from the end of each word, one more letter from each word on each iteration, until the desired minimum length is reached. At least one letter is kept from each word. If the abbreviation is still too long, the threshold is raised and the process is repeated.
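The word-wise dropping loop can be sketched in R as follows. This is an illustrative reconstruction from the description above, not the code behind `abbreviate()`; `abbrev_one` and `drop_class` are hypothetical names. The sketch assumes that white space is removed by splitting the name into words, that "at least one letter is kept" is satisfied by never dropping a word's first character, and that the length is checked after every single drop.

```r
# Hypothetical sketch of the threshold loop; NOT the library's abbreviate().
# Assumptions: vowels are a, e, i, o, u; punctuation means [:punct:];
# the first character of each word is never dropped.
drop_class <- function(chars, use.classes = TRUE) {
  # With use.classes = FALSE every non-space character is equally eligible.
  if (!use.classes) return(rep(2L, length(chars)))
  cls <- rep(4L, length(chars))                               # upper case / special
  cls[chars %in% letters | grepl("^[[:punct:]]$", chars)] <- 3L
  cls[chars %in% c("a", "e", "i", "o", "u")] <- 2L
  cls
}

abbrev_one <- function(name, minlength = 4, use.classes = TRUE) {
  if (nchar(name) <= minlength) return(name)                  # already short enough
  # Threshold 1 (white space) is handled by splitting into words.
  words <- strsplit(name, "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]
  chars <- lapply(words, function(w) strsplit(w, "")[[1]])
  cls   <- lapply(chars, drop_class, use.classes = use.classes)
  keep  <- lapply(chars, function(ch) rep(TRUE, length(ch)))
  kept_total <- function() sum(vapply(keep, sum, integer(1)))

  for (threshold in 2:4) {                                    # raise threshold as needed
    repeat {
      dropped <- FALSE
      for (i in seq_along(words)) {                           # one letter per word per pass
        if (kept_total() <= minlength) break
        cand <- which(keep[[i]] & cls[[i]] <= threshold)
        cand <- cand[cand > 1]                                # protect the first character
        if (length(cand) > 0) {
          keep[[i]][max(cand)] <- FALSE                       # drop the rightmost eligible one
          dropped <- TRUE
        }
      }
      if (kept_total() <= minlength || !dropped) break
    }
    if (kept_total() <= minlength) break
  }
  paste(unlist(Map(function(ch, k) ch[k], chars, keep)), collapse = "")
}

abbrev_one("New Jersey", minlength = 2)  # returns "NJ"
```

Under these assumptions the sketch reproduces several abbreviations shown in the examples (such as "Albm" for Alabama and "Cnnc" for Connecticut), but it is not guaranteed to match `abbreviate()` on every input.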
This algorithm may still not produce unique abbreviations. If it does not, minlength is increased and the algorithm is applied again, but only to those names not distinguished by the previous round. As a result, some abbreviations may be longer than the requested length, although the algorithm keeps the number of such over-length abbreviations as small as it can. (See the third example below.)
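The uniqueness rounds can be sketched in R as below. `make_unique` and `truncate_abbrev` are hypothetical names, and plain truncation stands in for the real abbreviation step only to keep the example short. Identical names are allowed to share an abbreviation, so a name counts as colliding only when a different name has the same one.

```r
# Hypothetical sketch of the uniqueness rounds; plain truncation is a
# stand-in for the real per-name abbreviation step.
truncate_abbrev <- function(x, minlength) substr(x, 1, minlength)

make_unique <- function(x, minlength = 4, abbrev_fun = truncate_abbrev) {
  ab <- abbrev_fun(x, minlength)
  # TRUE where some other, non-identical name shares the abbreviation;
  # identical names may share one.
  collides <- function(ab) {
    vapply(seq_along(x), function(i) any(ab == ab[i] & x != x[i]), logical(1))
  }
  len <- minlength
  clash <- collides(ab)
  while (any(clash) && len < max(nchar(x))) {
    len <- len + 1
    ab[clash] <- abbrev_fun(x[clash], len)   # re-abbreviate only the clashing names
    clash <- collides(ab)
  }
  ab
}

make_unique(c("Mississippi", "Missouri", "Massachusetts", "Maine", "Maine"), 2)
# "Missi" "Misso" "Mas" "Mai" "Mai"
```

Mississippi and Missouri need three extra rounds before they separate, while the duplicated "Maine" keeps one shared abbreviation. The `len < max(nchar(x))` guard ends the loop once truncation would return the full names; with a different `abbrev_fun`, that guard could stop the loop before every clash is resolved.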
The method assumes you want identical names to produce identical abbreviations. The result of all this tends to be abbreviations not quite like anything you've ever seen before, but usually fairly intuitive when the input names are English text.
abbreviate(state.name[1:10])
#   Alabama    Alaska   Arizona  Arkansas California  Colorado
#    "Albm"    "Alsk"    "Arzn"    "Arkn"     "Clfr"    "Clrd"
# Connecticut  Delaware   Florida   Georgia
#      "Cnnc"    "Dlwr"    "Flrd"    "Gerg"
abbreviate(state.name, 2)["New Jersey"]
# New Jersey
#       "NJ"
ab2 <- abbreviate(state.name, 2)
table(nchar(ab2))
#  2  3  4
# 32 15  3
ab2[nchar(ab2) == 4]
# Massachusetts   Mississippi      Missouri
#        "Mssc"        "Msss"        "Mssr"