I've finished the preliminary data load from Lewis and Short's "A Latin Dictionary". For those interested, there are approximately 51,500 lemmata* (compared to 17,500 in the Lewis Elementary).
The data is not live for users yet, so you won't be able to see it. But I wanted to tell you about this great breakthrough that I've made. The data needs a little more massaging to be considered production-ready, but it's very, very clean so far. I spent several weeks mining the data, and my heuristic algorithms** are getting pretty smart. Just a few more tweaks!
The big news is how this is going to affect the dictionary. I plan on adding a smart option-box in the bottom right-hand corner that will do two things. 1) List which dictionaries are available for searching and 2) allow the user (that's you!) to change the order in which they are searched and turn them on or off. You can see my mockup of this concept to the right. (Incidentally, this is my first post with graphics!)
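To make the idea concrete, here is a minimal sketch in Python of how an ordered, toggleable list of dictionaries could drive a search. Everything in it (the `DictionaryOption` class, the `search` function, and the toy entries) is hypothetical and for illustration only; it is not the site's actual code.

```python
from dataclasses import dataclass


@dataclass
class DictionaryOption:
    """One row in the (hypothetical) option box."""
    name: str
    enabled: bool = True


def search(word, options, data):
    """Look up `word` in each enabled dictionary, in the user's chosen order.

    `options` is the ordered list from the option box; `data` maps a
    dictionary name to a plain {headword: definition} mapping.
    Returns a list of (dictionary name, definition) pairs.
    """
    results = []
    for opt in options:
        if not opt.enabled:
            continue  # the user switched this dictionary off
        definition = data.get(opt.name, {}).get(word)
        if definition is not None:
            results.append((opt.name, definition))
    return results


# Toy data, made up for this example.
data = {
    "Lewis Elementary": {"amo": "to love"},
    "Lewis and Short": {"amo": "to love, like", "Roma": "Rome"},
}

# The list order is the search order; the user could drag these around.
options = [
    DictionaryOption("Lewis and Short"),
    DictionaryOption("Lewis Elementary"),
]

hits = search("amo", options, data)
```

Reordering the `options` list changes which dictionary's entry comes first, and setting `enabled` to `False` removes that dictionary from the results entirely.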
The biggest benefit of the LNS (Lewis and Short) dictionary is that it contains three times the number of words. Granted, most of the additions are proper names and place names, but sometimes it's nice to know which people and places those crazy ancient authors are talking about.
Keep your eyes peeled. It will only be a matter of weeks until this new data is live!
*For those who don't have experience in the field of lexicography, a lemma is a "head word" ...
** Experience-based methods ...
Tuesday, July 20, 2010
Friday, July 2, 2010
Regular Updates
What's going on now? As always, work goes on in the background, but nothing big has changed. Regardless, I want to make sure the front page stays fresh, so this update is to let you know that I continue to make small improvements to the dictionary data and the paradigms that the Latin parsing engine runs on.
Mostly, I spend a good amount of time correcting bad data, fixing wonky definitions, etc. But from time to time I find an error in a paradigm (for instance, recently I found macrons on -unt verbs) and fix it. Just a few days ago I discovered that short-form third-declension adverbs like potenter don't parse properly; that's one of my current projects.
What's going on in the future? I'm still pulling data out of the big Lewis dictionary. I haven't loaded it yet because I'm happily discovering that the big Lewis dictionary has a wealth of information that can be extracted. The hard part, as it turns out, is extracting it. I don't know when the new data will be online, but rest assured I'm working on it often.
Until next time, feedback is always welcome!