TL;DR version: If you decide to stop reading midway through, you're probably not interested in the idea anyway, so you won't be missing out. It starts out with the basics then gets more detailed as you read.
People may think it's foolish, so let me explain. You may be familiar with the MSDN library.
I want to get a laptop (need funds), which means I'll be doing add-on programming on the road (yes, I'm that much of a dork - no offense to others). I won't always have internet access; however, the WoWWiki reference will still be a desirable resource.
Currently, I am setting up a portable offline MediaWiki for myself to access a static dump of WoWWiki. It's portable in the sense that it doesn't require any "install" on the running system, but it does require Windows (so it's not OS-portable). Also, when all is set up and ready to go, it consumes about 2GB.
Before I spend the time to make a "tutorial," I want to poll to see how many people would actually use it. I know it's an "off" idea, and kind of specific purpose, so I don't expect many people to vote for it.
Existing "reader" applications like WikiTaxi, BZReader and WikiFilter were not too friendly with WoWWiki's dump for some reason (most didn't even work at all with it). So I use a portable apache, php, mysql, and mediawiki solution.
Importing the XML dump into the SQL database can take up to 48 hours on a 2.6 GHz dual core. I have to look into whether I can legally distribute a SQL database backup that I make from the XML, which reduces the import to about 1 hour and greatly simplifies the initial set-up.
In case I can't distribute the SQL dump, I wrote a small C++ CLI program that reads the large XML from STDIN (allowing programs like bzip/gzip to be used in a pipe) and breaks it up into smaller (default 16MB) XML files that can then be imported. This allows for "pausing" so you don't need 48 consecutive CPU hours to import a whole XML dump.
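Just to illustrate the idea (this is only a bare-bones sketch, not the actual WoWWikiSplit source - the chunk-size constant and output file names are made up for illustration):

#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>

int main() {
    const std::size_t kChunkBytes = 16 * 1024 * 1024; // default split size (illustrative name)
    const std::string header = "<mediawiki>\n";  // simplified; the real dump header carries more attributes
    const std::string footer = "</mediawiki>\n";

    std::size_t written = kChunkBytes; // force opening the first output file
    int fileIndex = 0;
    std::ofstream out;
    std::string line, page;
    bool inPage = false;

    // Read the dump from STDIN so it can be piped from bzip2/gzip.
    while (std::getline(std::cin, line)) {
        if (line.find("<page>") != std::string::npos) inPage = true;
        if (inPage) page += line + "\n";
        if (line.find("</page>") == std::string::npos) continue;
        inPage = false;

        // Start a new ~16MB chunk once the current one has grown past the limit.
        if (written >= kChunkBytes) {
            if (out.is_open()) { out << footer; out.close(); }
            char name[64];
            std::snprintf(name, sizeof(name), "wowwiki_%03d.xml", ++fileIndex); // illustrative file name
            out.open(name);
            out << header;
            written = 0;
        }
        out << page;
        written += page.size();
        page.clear();
    }
    if (out.is_open()) out << footer;
    return 0;
}

Each of the resulting files can then be fed to MediaWiki's importDump script one at a time, which is where the "pausing" comes from.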
I'm currently running my CPU at 3.2 GHz to see if it goes a little faster, but I'm likely only going to shave a few hours off since most of the time is spent accessing the HDD. And the last time I attached a Dremel to the platters to overclock my hard drive, it didn't go over so well for data integrity - jkjk.
In the end, all of the text content and search features work - however, it's missing the images and the WoWWiki look. Functionally it's great, but fashionably it may be lacking - then again, you'd likely be reading it to make an add-on, not to take it to the prom.
AFAIK that's just a book. It's not really "searchable" and it lacks a certain kind of interactivity I crave... I hate books. It does come with the "companion site," but that requires internet access.
Plus, if we're talking portability without internet access, we're probably talking laptop on the road. A software dump would already be on the laptop; whereas with the book you'd need to have both handy. That is unless people are still programming add-ons with punch cards.
I lol'd IRL.
Maybe compiling it into a CHM file will reduce its size? I have no knowledge of how this is done or whether its searchability is sufficient for your (and your potential "client" base's) needs, but if it is, it could be worth looking into to avoid having to use a full webserver.
That's actually not a bad idea; I might want to look into it.
I'm currently modifying my CLI program that splits the XML dump up, so that it has an option to exclude User and User Talk pages, as well as an option to exclude all Talk pages. That should cut the number of articles down considerably, which in turn should speed up the import time and decrease the size.
The user pages are nice, but they're not really needed for an offline dump, I think. The Talk pages sometimes have good problem/solution discussions, but the majority of that information is likely useless to the general reader (good to have online, not really needed offline). However, since some might disagree about the necessity of that information, I'm leaving both as options.
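The filtering itself is basically just a title check before a page gets copied into the split output. Something like this (a minimal sketch, not the actual code - the function name and exact prefix list are illustrative):

#include <cstddef>
#include <string>
#include <vector>

// Returns true if a page should be dropped from the split output,
// based on the namespace prefix of its <title>.
bool shouldSkipPage(const std::string& title, bool dropUserPages, bool dropTalkPages) {
    std::vector<std::string> prefixes;
    if (dropUserPages) {
        prefixes.push_back("User:");
        prefixes.push_back("User talk:");
    }
    if (dropTalkPages) {
        prefixes.push_back("Talk:");          // the per-namespace "... talk:" variants
        prefixes.push_back("WoWWiki talk:");  // would need to be listed here as well
    }
    for (std::size_t i = 0; i < prefixes.size(); ++i) {
        if (title.compare(0, prefixes[i].size(), prefixes[i]) == 0)
            return true; // excluded
    }
    return false;
}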
WoWWiki offers an XML dump? I've only found a page to export single pages, not a complete dump.
However, I would prefer something that doesn't require a massive import or a locally installed web server/SQL server. I think using XSLT to convert the content to static HTML might be faster than the SQL import (I say might... the converted files could also be shared, if allowed), and static HTML is okay for an offline wiki.
The conversion could be done through a .NET application which - if done right - will run anywhere Mono runs.
I don't know whether creating a CHM can be automated, or what you need to do it, but that could be a step after the HTML generation, so at least you'd have the HTML files even if you don't have the tools to generate the CHM.
Really though, you should talk to Clad and Kael; maybe an offline dump of the book's API docs is something they're willing to do.
That's a great book, but of course, its API reference is now ~slightly~ out of date (though by and large, it's still quite handy for meat-and-potatoes WoW addon programming).
Their companion site is apparently going through an update... heck, I submitted a couple updated descriptions myself.
In theory, if you had an easy way to freshen it, and only included the wow API, events, and other addon-authoring-specific information, a portable version would be pretty useful.
For my money, I just try never to travel anywhere without Internet access. I only stay in hotels with High Speed net access, and for those really horribly out-of-the-way situations, I tether my cell phone. Honestly, if I'm somewhere so far afield that I can't connect (hotel, hotspot, other non-mentioned methods), I'm probably more concerned with something far more important, like how many of my traveling companions I'm going to have to kill, cook, and eat before we are rescued. :p
--Size and performance issue
I decided to try using MyISAM tables instead of the default InnoDB for the MySQL database. The result is a faster, smaller database. Now it is just about 900MB with search index done - about 700MB without search index. This is also with using my program to filter out the "talk" and "user" pages during import (about a 33% reduction).
--But isn't InnoDB better than MyISAM for a wiki?
Technically it is; however, the advantages are really for supporting many concurrent users reading and writing at the same time. For your offline wiki, you probably won't be doing any writing, and if you do, you won't be editing multiple pages at the same time (maybe consecutively, but that's different).
Now that I'm using MyISAM, the import time from the XML is only about 12 hours - I know, "only." However, if I do that work once and make a SQL dump to distribute, the import is only about 15 MINUTES!! Lol, that was kind of exciting to see. The not-so-exciting part: I don't know if I have the legal rights to distribute the SQL dump. I contacted a WoWWiki admin who referred me to the Wikia community. I am now awaiting a response from them.
--Performance on a flash drive
The initial set-up requires a lot of writes, and should therefore be done on a hard drive, both for speed and to prevent undue wear on the flash memory. However, once it is all set up, it can easily be copied to a flash drive, and the read performance is not bad at all. You do see a difference between HDD and flash reads, but it is both understandable and not really appreciable (maybe a second or two longer on some page loads).
--I don't want to locally install a Web Server!
I am using a portable package for that, "MoWeS" by CH Software. It contains functioning Apache, MySQL, PHP5, and MediaWiki applications; however, they are "portable apps" in that there is no "permanent" install. They are designed to run in the folder they are extracted to without attaching in any way to the host. You can even move the folder around at will (while the servers are down), and they'll still work next time. This is especially good for flash memory, where the volume letter may not always be the same.
Additionally, the package is set up with "default" settings. There are very few changes needed to make it work for the WoWWiki dump. It's mostly foolproof - but I'm sure somebody will mess up.
--User level restrictions
None of the code requires administrative privileges. You may get firewall warnings; however, all the "internet" access is done through the localhost loopback interface, so even if a firewall blocks the programs, the loopback interface should continue to function. The reason the firewall prompts come up is that Apache and MySQL each create a socket for listening, and any time a listening socket is opened, a good firewall will let you know about it - however, a good firewall shouldn't affect the localhost loopback interface (I'm not sure if blocking it is even possible). I actually recommend blocking the programs from WAN access, since it's not needed.
--I'm opening ports!?
MoWeS comes with an option (default on) to only allow access through the localhost interface. Additionally, you can safely and easily block both the Apache server and the MySQL server with your firewall. The only inbound access will be through the local loopback interface.
Because it's easy for me to do, I know the procedure, and I prefer to maintain MediaWiki's search functionality.
If you'd like to give some insight into how to convert the XML to HTML - in a way that keeps the formatting (and supports the ParserFunctions extension), and that's fast and small - that'd be great. I don't know if you've downloaded the XML dump already; it's about 300MB uncompressed, with about 164,000 articles. The static HTML dump will probably be larger than 300MB (probably not by too much) because of the extra HTML markup.
I'm not really sure if CHM is a good approach for such a large database.
BTW, if I do get permission to redistribute the SQL dump (I imagine I should; the Encyclopodia guy [haven't kept up, dunno if he still does] used to distribute processed Wikipedia databases), then the import process is only about 15 minutes. That should be much faster than an HTML approach, as a static HTML dump requires parsing each and every article, whereas MediaWiki's dynamic parsing only parses pages as you view them (and then optionally stores them in a cache).
Also, about my tone: I really do apologize if I sound angry - I'm really not in the mood to be replying (not because of you or this). I really am curious whether you have any insight into a fast HTML approach or anything else.
I assumed that the article content in the XML uses XML-like markup - but it makes sense that it doesn't, if these dumps are mainly meant for import into MediaWiki. As for the parsing time, I would include only the API articles, because the others aren't very relevant for coding. For static HTML, instead of search functionality there could be a table of contents (or a CHM, which includes search functionality).
The fact that the XML contains MediaWiki markup increases the work that has to be done, so now I think the offline MediaWiki is easier ;)
I hope you'll get the permission to redistribute the SQL dump.
PS: Do you want to include every article of WoWWiki, or the API articles only? If the former, it would be nice if there were an "API only" dump/option.
The wiki XML dumps only use XML as the "storage" method, not for the article markup. There's a page tag for each page, a title tag, etc., but the actual "meat" of the articles is still in the wiki markup.
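Roughly, a page in the dump looks like this (trimmed down; the title and id values here are just placeholders):

<page>
  <title>API GetItemInfo</title>
  <id>12345</id>
  <revision>
    <id>67890</id>
    <text xml:space="preserve">The article body goes here, still in plain
[[wiki markup]] with {{templates}} - not HTML, and not any special XML.</text>
  </revision>
</page>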
It appears I will have the rights to distribute the SQL dump, as long as I reference (and include a copy of) the GNU Free Documentation License. Compressed, the SQL dump is about 100MB (before importing) - I'll probably use gzip or bzip2 so it can be easily piped to MySQL.
Unfortunately there's no real way of distinguishing API-related pages programmatically (most are labeled "API", but some important pages, like Events, aren't). The best I can do is try to remove other distinguishable non-API pages. Currently I have filters for user pages, talk pages, quest pages, and image pages (no image files are included in the dump, so there's no need to keep the descriptions). With these four filters, it drops to about half the articles - but I haven't had time yet to see how it affects the size of the database.
I will probably just supply two SQL dumps: the full version and the "slim" version. And if it's popular enough, maybe I'll work on slimming it down even more down the road.
I added more exclusion filters; currently they are: User pages, Talk pages, Image pages, Quest pages, NPC pages and Item pages - as most of that info will not be needed offline. The import from XML takes about 2.5 hours with all those filters, as there are about 75% fewer articles. However, I'll still provide the even faster SQL dump for the "slim" database - that's about a 1 minute import (instead of the 2.5 hours from XML).
For the people who prefer the XML import (slower):
I optimized the code of WoWWikiSplit (my C++ program for filtering the dump) for speed. I hard-coded the search and comparison keys instead of using the generic strnicmp, strlwr, and strstr library functions. Among other optimizations, this reduced the CPU time (not including disk I/O) for running WoWWikiSplit over a whole database with all filters from about 5 minutes to about 2 seconds. When you actually do an import using WoWWikiSplit with all filters on a WoWWiki XML dump, it'll take about 2.5 hours - about 2-5 minutes of that will be CPU and disk I/O time due to WoWWikiSplit itself; the rest is MediaWiki's importDump script. I think that ratio should be acceptable given the benefit WoWWikiSplit gives.
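To show the flavor of the "hard-coded key" trick (a minimal sketch under assumed conditions, not the actual code - the function name is made up, and it assumes the caller already knows enough bytes remain in the buffer):

#include <cstring>

// Case-insensitive test for the literal "<title>User" at position p.
// The tag itself is lower case in the dump, so only the namespace letters
// need case folding - no strlwr pass over the whole buffer, and no
// generic strnicmp/strstr call.
inline bool isUserTitle(const char* p) {
    if (std::memcmp(p, "<title>", 7) != 0) return false;
    p += 7;
    return (p[0] == 'U' || p[0] == 'u') &&
           (p[1] == 'S' || p[1] == 's') &&
           (p[2] == 'E' || p[2] == 'e') &&
           (p[3] == 'R' || p[3] == 'r');
}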
Additionally, WoWWikiSplit gets the "Split" part of its name by splitting the import into sections; the default is about 16MB per section. If it needs 19 splits, you can have it do 1-7 for now, then do the rest another time. Between each split it gives you a 5 second "ctrl+c safe" window. Hitting ctrl+c at that point will gracefully close WoWWikiSplit at a point from which it can be safely resumed later. That will all be explained in the procedure.
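The idea behind the "ctrl+c safe" window is just a flag set by a signal handler that is only honored at a chunk boundary, so a half-written chunk never gets left behind. A rough sketch (again just illustrative, with made-up names and the actual chunk work left out):

#include <chrono>
#include <csignal>
#include <cstdio>
#include <thread>

static volatile std::sig_atomic_t g_stopRequested = 0;

void onInterrupt(int) { g_stopRequested = 1; }  // only set a flag; do the real work outside the handler

int main() {
    std::signal(SIGINT, onInterrupt);
    const int totalChunks = 19;  // e.g. a dump that needs 19 splits
    for (int chunk = 1; chunk <= totalChunks; ++chunk) {
        // ... write out / import one ~16MB chunk here ...
        std::printf("Chunk %d done. Ctrl+C within 5 seconds to stop safely.\n", chunk);
        for (int s = 0; s < 5 && !g_stopRequested; ++s)
            std::this_thread::sleep_for(std::chrono::seconds(1));
        if (g_stopRequested) {
            std::printf("Stopping after chunk %d; resume from chunk %d later.\n", chunk, chunk + 1);
            return 0;
        }
    }
    return 0;
}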
The source code for WoWWikiSplit will be supplied along with the compiled binary when it's time.
This slim database takes up about 300MB after everything is set up. I think things are looking better and better on the practicality front.