I'm dreaming of a… support volunteer?
Monday, 15 September 2003 08:38
Last night, I dreamed that I was with rho; it was her birthday. We were wandering through a town and finally came to where she lived.
Why would one Dilbert strip generate 439 comments?
I don't think I want to read them.
Edit: OK, I had a peek. Judging by the lengths of threads (which were, fortunately, reduced to poster/subject pairs), it looked like a typical day on Slashdot. *sigh*
I thought I had this great idea for counting letter frequencies in Klingon.
You see, I thought that in order to count Klingon letters, I could ignore the multi-letter graphemes (is that the right word?) such as "tlh" and "gh" and simply count letter by (ASCII) letter and compensate afterwards.
My theory was that there are only three multi-letter graphemes in the traditional Latin-based orthography: ch, gh, tlh. Also, every "c" occurs only as part of "ch" and every "g" only as part of "gh". Hence, if you subtract one "h" for every "c" and "g" you've seen, any "h"s left will have to come from "tlh". Then you can subtract that many "t"s and "l"s so that the proper count for those letters can be found.
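A minimal sketch of that subtraction in Python, assuming (as this paragraph does) that ch, gh and tlh are the only multi-letter graphemes. The function name `naive_counts` is made up for illustration, and the result is only meaningful while that assumption holds.

```python
from collections import Counter

def naive_counts(text):
    """Count ASCII letter by ASCII letter, then compensate afterwards.

    Only valid under the assumption that ch, gh and tlh are the sole
    multi-letter graphemes. Case matters: "H" and "h" are different keys.
    """
    raw = Counter(c for c in text if c.isalpha() or c == "'")
    ch = raw.pop("c", 0)             # every "c" belongs to a "ch"
    gh = raw.pop("g", 0)             # every "g" belongs to a "gh" (the assumption)
    tlh = raw.pop("h", 0) - ch - gh  # any "h" left over must come from "tlh"
    raw["t"] -= tlh                  # remove the "t"s and "l"s used up by "tlh"
    raw["l"] -= tlh
    raw["ch"], raw["gh"], raw["tlh"] = ch, gh, tlh
    return {k: v for k, v in raw.items() if v > 0}

print(naive_counts("tlhutlh"))
```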
The weakness of this plan manifested itself when I had another look at the alphabet and saw "ng". That means that I cannot tell from the number of "g"s in the input data how many "gh"s and how many "ng"s there were. Also, I can't compensate by counting "n"s because that letter can also occur alone. Bummer.
So it looks as if the only correct way is to take the orthography into account and include a check along the lines of "if the letter is c, g, n, or t, look ahead to see whether more letters follow that contribute to the current grapheme". (Or, alternatively, use a regular expression that explicitly lists the multi-letter graphemes first, something like (tlh|ch|gh|ng|['abDeHIjlmnopqQrStuvwy]).)
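The regular-expression variant might look roughly like this in Python. This is a sketch, not a tested tool: the names `GRAPHEME` and `grapheme_counts` are made up here, and the `(?!h)` lookahead after "ng" is my own addition, on the assumption that a lowercase "h" never stands alone, so that "ngh" has to be read as n + gh rather than ng + h.

```python
import re
from collections import Counter

# Multi-letter graphemes come first so they win over single letters.
# "ng(?!h)" refuses to match "ng" when an "h" follows (assumed to be n + gh).
GRAPHEME = re.compile(r"tlh|ch|gh|ng(?!h)|[abDeHIjlmnopqQrStuvwy']")

def grapheme_counts(text):
    """Count Klingon graphemes; anything the pattern does not match is skipped."""
    return Counter(GRAPHEME.findall(text))

print(grapheme_counts("tlhIngan Hol"))
```

Characters that are not part of the alphabet (spaces, punctuation, stray lowercase "h") are silently ignored, which is convenient for counting but would hide typos in the input.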