09:24 <@Schuyler> mike_: one of the most interesting issues I've found so far is that of character encoding.
09:25 < mike_> non standard greek?
09:25 <@Schuyler> just character encoding in general.
09:25 < mike_> aren't there public domain fonts?
09:26 <@Schuyler> I mean the actual binary encoding of accented characters and so on. Unicode is the right way to go, but Unicode handling isn't mature across all development languages and platforms.
09:26 < mike_> surely digital scholars have already dealt with this
09:27 < mike_> if you want, i can connect you right away with people working in this space
09:28 <@Schuyler> well, this is a lark for me really, I actually have stuff I'm supposed to be working on.
09:28 < mike_> whenever, let me know...
09:28 < pere> Schuyler: I would recommend UTF-8 instead of Unicode, and ISO 10646 over Unicode.
09:29 <@Schuyler> I meant Unicode in general. Of course I'm attempting (I think unsuccessfully) to convert everything to UTF-8.
09:30 <@Schuyler> but dear lord this is slow. I need to find a better way to make use of pgsql's indexes.
09:33 <@Schuyler> another issue. The name "Petra" is mentioned once by Thucydides, but the gazetteer found about 15 places with that name in my area of interest, probably because the name undoubtedly means something like "rock". Which one was Thucydides referring to?
09:35 < mike_> in jordan probably
09:35 <@Schuyler> Actually, Jordan's not in my search area.
09:35 < mike_> famous big temple there
09:36 <@Schuyler> I rather foolishly left out Syria, Jordan, Lebanon, Israel and Palestine.
09:36 <@Schuyler> since I don't think anything important happened there during the war. But I'm less than halfway through the book, so I could be totally wrong.
09:36 < mike_> h'mm let me connect you with my friend archeologist. do i use the nocat address?
09:36 <@Schuyler> oh here's a good one.
09:37 <@Schuyler> the name "Apollo" is mentioned 15 times, no doubt in reference to the god, *but* of course the gazetteer finds it.
09:37 < sxpert-carpc> http://www.esitcom.org/gallery/ivar/dscn2232
09:38 <@Schuyler> ha. here's another good one. "polis".
09:39 <@Schuyler> in general I'm concerned about spelling also. "Attica" versus "Attika". A metaphone index would fix that, but it would also introduce more false positives.
09:43 <@Schuyler> however, useful things can be done in the mapping phase by looking at the relative occurrences of things
09:43 <@Schuyler> nevertheless I strongly expect that concerted human intervention will still be required to clean up the finished dataset.
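
The spelling trade-off raised at 09:39 can be sketched roughly as follows. This is not the real Metaphone algorithm and not anything taken from the log: `phonetic_key` and `build_index` are hypothetical helpers, and the folding rules (C/Q to K, PH to F, drop vowels) are simplifying assumptions chosen only to show how a phonetic index merges "Attica" with "Attika" while also producing false positives such as "Petra" and "Patara".

```python
import unicodedata
from collections import defaultdict


def phonetic_key(name: str) -> str:
    """Crude phonetic key (an illustrative stand-in, NOT real Metaphone)."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).upper()
    # Fold spellings that often alternate in transliterated Greek names
    # (assumed rules, chosen for illustration only).
    text = text.replace("PH", "F").replace("CH", "K")
    text = text.translate(str.maketrans("CQY", "KKI"))
    # Keep the first character; after that keep only consonants,
    # skipping a consonant that immediately repeats the previous letter.
    key = text[:1]
    prev = key
    for ch in text[1:]:
        if ch.isalpha() and ch not in "AEIOUH" and ch != prev:
            key += ch
        prev = ch
    return key


def build_index(names):
    """Group place names by phonetic key, preserving input order."""
    index = defaultdict(list)
    for name in names:
        index[phonetic_key(name)].append(name)
    return index


index = build_index(["Attica", "Attika", "Petra", "Patara"])
for key, names in index.items():
    print(key, names)
```

The two Attica spellings share a key, which is the intended fix, but Petra and Patara collide as well, which is the kind of false positive that still needs the human cleanup mentioned at 09:43.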