Has anyone parsed Wiktionary? [closed]


I once downloaded a Wiktionary dump to gather words and definitions for Slavic languages. I used ElementTree to go through the XML file that makes up the dump. I would avoid scraping or crawling the site and instead download the XML dump that Wikimedia provides for Wiktionary. Go to the Wikimedia downloads page, look for the English Wiktionary dumps (enwiktionary), and open the most recent one. You'll probably want the pages-articles.xml.bz2 file, which is just the article content, with no history or comments. Parse it with whatever XML processing library you prefer in Python; I personally prefer ElementTree. Good luck.


Wiktionary runs on MediaWiki, which has an API.

One of the subpages for the API documentation is Client code, which lists some Python libraries.
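If you would rather skip the client libraries, a rough sketch using only the standard library might look like the following. The helper names and the User-Agent string are made up for illustration, and the response layout assumed in `extract_wikitext` is what the `query`/`revisions` module returns with `formatversion=2` as far as I know; check it against the API documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wiktionary.org/w/api.php"


def build_query_url(title):
    """Build a MediaWiki API URL asking for the current wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_wikitext(response):
    """Pull the wikitext out of a decoded JSON response (assumed layout)."""
    page = response["query"]["pages"][0]
    if page.get("missing"):
        return None  # no such entry
    return page["revisions"][0]["slots"]["main"]["content"]


def fetch_wikitext(title):
    """Fetch one entry over the network; returns None if the page is missing."""
    request = urllib.request.Request(
        build_query_url(title),
        # Placeholder User-Agent; replace it with one that identifies you.
        headers={"User-Agent": "wiktionary-example/0.1 (hypothetical contact)"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_wikitext(json.load(reply))
```

Calling `fetch_wikitext("voda")` should return the raw wiki markup for that entry. This is fine for a handful of words; for bulk work the dump is the better route.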


Wordnik has done a good job of parsing out definitions, etc., and they have a great API.

Like the others have mentioned, Wiktionary is a formatting disaster and was not built to be computer-readable.