Earlier this year, without any fanfare, a truly great resource and archive relating to Afghanistan's recent past went offline. The AfghanistanNewsCenter wasn't very well known, and it doesn't seem to have done too much to publicise its activities. The founder - Fawad Ahmad Muslim - as far as I've been able to figure out, simply kept updating the site as more news came in.

Like many readers of this blog, I get sent (and occasionally read) a lot of PDFs. In fact, I did a quick search in DevonThink, and I am informed that I have 52,244 PDFs in my library. These are a mix of reports, archived copies of websites, scanned-and-OCRed photos and a thousand-and-one things in between.

Thus far, my workflow has been to read PDFs on my Mac. Any notes I took while reading the file were written up manually in separate files. I would laboriously copy and paste whatever text snippet or quotation I wanted to preserve along with its page reference. These would be fed into DevonThink's AI engine and magic would happen.

Now, post-Highlights-installation, my workflow is much less laborious. I can take highlights in-app, export all the quotations as separate text or HTML files and have DevonThink go do its thing without all the intermediary hassle. If you're a professional researcher or writer using DevonThink as your notes database - and quite frankly, if not, why not? - the Highlights app will probably please you.

Here's something that happens fairly often: I'll be reading something in a book and someone's name is mentioned. I'll think to myself that it'd be useful at this point to get a bit of extra information before I continue reading. I hop over to DevonThink to do a full-text search over all my databases. I let the search compute for a short while, but nothing comes up. I tweak the name slightly to see if a slightly different spelling brings more results. That works a bit better, but I have to tweak the spelling several times until I can really claim the search has been exhaustively performed.

Anyone who's done work in and on a place where a lot of material is generated without fixed spellings for transliteration will recognise this problem. In Afghanistan, this ranges from people's names - Muhammad, Mohammad, Muhammed, Mohammed etc - to place and province names - Kunduz, Konduz, Kondoz, Qonduz, Qhunduz etc.

DevonThink actually has a 'fuzzy search' option that you can toggle, but it isn't clear to me how it works or whether it's reliable as a replacement for a more systematic approach.

As I'm currently doing more and more work using Python, I was considering what my options would be for making my own fuzzy search emulator. My first thought was to be prescriptive about the various rules and transformations that happen when people make different spelling choices. The Kunduz example from above reveals that vowels are a key point of contention: the 'u' can also be spelt 'o'. The 'K' at the beginning could also, in certain circumstances, become 'Q' or 'Qh'. These various rules could then be coded in a system that would collect all the possible spelling variations of a particular string and then search the database for all the different variations.

Following a bit of duckduckgo-ing around, I've since learnt that there are quite extensive discussions of this problem, as well as a number of proposed approaches to solving it. One commonly referenced option is a Python package called 'FuzzyWuzzy'; it uses a mathematical metric called the Levenshtein distance to measure how similar two strings are. I imagine that there are many other possible metrics that one could use to detect how much two strings resemble one another.

I imagine the most accurate solution is a mixture of both approaches. You want something that is agnostic about content for situations where you don't have domain knowledge. (I happen to have read a lot of the materials relating to Afghanistan, so I know that these variations of names exist and that there is a single entity that unites the various spellings of Kunduz, for example). But you probably want to code in some common rules for things which come up often. (See this article, for example, on the confusion over spellings of Muslim names and how this leads to law enforcement mistakes). I may end up coding up a version that has high accuracy on Afghan names because it's a scenario in which I often find myself, but I'll have to explore the other more mathematically-driven options to see if I can find a happy medium.
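To make the "mixture of both approaches" idea from the fuzzy-search post a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the rule table contains just the two substitutions mentioned for Kunduz ('K' to 'Q' or 'Qh', 'u' to 'o'), the rules are applied to every matching letter rather than "in certain circumstances", and the names `RULES`, `spelling_variants`, `levenshtein` and `fuzzy_match` are hypothetical. It is not how DevonThink's fuzzy search or FuzzyWuzzy work internally; the edit distance is a hand-written version of the standard Levenshtein algorithm.

```python
from itertools import product

# Hypothetical rule table, based only on the Kunduz example:
# each letter maps to the spellings it might take in transliteration.
RULES = {
    "k": ["k", "q", "qh"],
    "u": ["u", "o"],
}


def spelling_variants(name, rules=RULES):
    """Return every spelling obtained by applying the rules to each letter."""
    options = [rules.get(ch, [ch]) for ch in name.lower()]
    return {"".join(parts) for parts in product(*options)}


def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query, candidate, max_distance=1):
    """Match if candidate is a rule-based variant or within max_distance edits."""
    q, c = query.lower(), candidate.lower()
    return c in spelling_variants(q) or levenshtein(q, c) <= max_distance


if __name__ == "__main__":
    print(sorted(spelling_variants("Kunduz")))
    print(levenshtein("muhammad", "mohammed"))
    print(fuzzy_match("Kunduz", "Qhunduz"))
```

The rule table catches spellings that are several edits apart but known to be the same place (such as 'Qhunduz'), while the distance threshold acts as the content-agnostic fallback for names with no rules yet. A real version would need a larger rule set and a threshold tuned to name length.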