Thursday, October 1, 2009
I's and J's and U's and V's
So one problem with Numen is that it doesn't recognize alternative spellings of the same word when it comes to I's and J's and U's and V's. As you know, the J and the U were not Classical Latin letters. There has been a lot of back-and-forth over the past 200 years -- some editors prefer the originals and some prefer the modern versions.
But how should Numen deal with this issue? Internally, the computer is more precise and less forgiving than a human, and so in order to provide highly sensitive and accurate searches, the data needs to be "normalized". For example, I recently normalized verbs for consistency by changing all deponent verbs into their active forms and simply marking them as deponent with a data flag. Now, when you search for a deponent verb, the flashcard still shows something like sequor but internally it's stored as sequo. The reasoning here is simple: deponent verbs, regardless of their dictionary form and traditional morphology, still have active participles and their imperfect/pluperfect subjunctives are still formed from active infinitives.
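To make the deponent example concrete, here is a minimal sketch of the idea in Python. The class and field names (Verb, stored_form, deponent, display_form) are made up for illustration and are not Numen's actual code. It also assumes the stored form is a first-person singular ending in -o, so it only covers the dictionary form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verb:
    """Hypothetical record: normalized active form plus a deponent flag."""
    stored_form: str        # normalized form, e.g. "sequo"
    deponent: bool = False  # True if the verb is deponent

    def display_form(self) -> str:
        """Return the dictionary form shown on the flashcard."""
        # Assumption: stored_form ends in -o, so the deponent
        # dictionary form is just that form plus -r.
        if self.deponent:
            return self.stored_form + "r"
        return self.stored_form


sequor = Verb("sequo", deponent=True)
amo = Verb("amo")
```

Here sequor.display_form() gives back the familiar sequor, while anything that needs the active-looking form can read stored_form directly.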
But what about the I's and J's? Those are easy. Convert all the J's to I's, and most Latin readers won't have a problem -- this has been the convention for quite some time now. But then what about the V's and U's? Should I collapse those into a single letter as well? Here the opposite is true: most Latinists would be mildly irritated by a form like uiuus (vivus).
The solution, similar to the one for the deponent problem, is to store everything internally with only I's and V's, but to show end users the contemporary I's, U's, and V's. That way, the computer can do accurate searches, and users still see the spellings they are used to.
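The approach above might look roughly like this in Python. This is only a sketch: the names search_key, index, and lookup are hypothetical, the word list is a toy example, and lowercasing the key is an extra assumption of mine. The point is that the folded key is used only for matching, while the stored spelling is what gets displayed.

```python
# Hypothetical sketch; none of these names come from Numen itself.
# Fold J into I and U into V so both conventions share one key.
_FOLD = str.maketrans({"j": "i", "J": "I", "u": "v", "U": "V"})


def search_key(word: str) -> str:
    """Return the internal comparison key for a word."""
    # Lowercasing is an added assumption so that case does not matter.
    return word.translate(_FOLD).lower()


# Toy data: the spellings users are used to seeing.
entries = ["vivus", "iam", "sequor"]

# Map each internal key to its display spelling.
index = {search_key(entry): entry for entry in entries}


def lookup(query: str):
    """Find an entry whatever I/J, U/V convention the query uses."""
    return index.get(search_key(query))
```

With this in place, lookup("uiuus") and lookup("vivus") both land on the same entry, and the user still sees vivus.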
So, in the coming weeks, Numen will undergo this under-the-hood transformation. For the most part, users will never even notice -- except in one area. Searching for uiuus will be the same as searching for vivus!