so i have a dataminer based mostly on libpt's dataminer. it scrapes wowhead to get data.
it seems that the data i get using the "getpage()" function is not the same as the data i get if i use my browser and view source. namely, the missing data is the data i'm hoping to get.
i believe they've shared data with norganna. not sure if they're into sharing with others or what. since they get scraped anyways, you'd think a low bandwidth version of their data would save everybody a ton of trouble.
i'm wondering if my problem is related to them trying to detect my browser so they can format appropriately (like for a phone or something). the data i'm not getting is the tabbed columns of things like milling data (view a pigment and then collect which herbs it comes from).
i don't know if libpt's data scanner still serves its own purposes, but the code it uses to get data from wowhead doesn't work for me anymore. it used to.
basically, i know squat about how one goes about collecting data from a web site. so i just copied the basic code from the libpt miner. i'm basically calling "getpage()" which takes a url and gives you a big text string representation of the html page located at that address.
but the results from getpage() don't contain that info. it USED to, but i'm thinking maybe there's some token being fed to the server that indicates the browser i'm using, to help it format the page. that's the only explanation i can come up with for why one way of collecting the data fails where the other succeeds.
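one way to test the browser-detection theory is to send the request with an explicit browser-like User-Agent header and see whether the Listview data comes back. i don't know what libpt's getpage() does internally, so this is just a python sketch of a stand-in. the `getpage` name here is hypothetical, and the User-Agent string is an arbitrary example of a desktop browser, not something wowhead is known to require.

```python
import urllib.request

# Hypothetical example value: any realistic desktop-browser User-Agent string.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Build a GET request that identifies itself as a desktop browser."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": user_agent,
            "Accept": "text/html,application/xhtml+xml",
        },
    )


def getpage(url, user_agent=BROWSER_UA, timeout=30):
    """Hypothetical stand-in for libpt's getpage(): return the page body as text."""
    req = build_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Decode using the charset the server declares, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

to compare, fetch the same url twice, e.g. `getpage("http://www.wowhead.com/item=39340")` and `getpage("http://www.wowhead.com/item=39340", user_agent="Python-urllib")`, and diff the results. if only the browser-like one has the tabbed data, the header is the likely culprit.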
just tried again to see if maybe it was a change on wowhead's end... no luck.
i get data for the page, but if i copy and paste that data to a file (x.html) and then open that file, i get a "tooltip" page instead of the normal webpage with all the extra columns and such.
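instead of saving to x.html and opening it by hand, a quick check in code can tell the two kinds of response apart. this sketch assumes the full item page embeds its tabbed tables as javascript calls like `new Listview({...})` and that the "tooltip" page has none; that's based on what the page source looked like, not on any documented wowhead format. the function names are made up for illustration.

```python
import re

# Assumption: the full page contains "new Listview(" calls; the tooltip page doesn't.
LISTVIEW_RE = re.compile(r"new\s+Listview\s*\(")


def count_listviews(html):
    """Return how many Listview constructor calls appear in the page source."""
    return len(LISTVIEW_RE.findall(html))


def looks_like_full_page(html):
    """Heuristic: treat the page as the full version if it has any Listview."""
    return count_listviews(html) > 0
```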
i'll see about setting up irc, but i'm not really "at my computer" at the moment.
edit: maybe i'll try it on my mac. i'm thinking it might be related to different installed libs and interactions with the web...
anybody familiar with the libpt dataminer code?
What exactly is your problem with the datamining?
i'm querying by itemID. so i call getpage("http://www.wowhead.com/item=39340")
so if you look at the html source of the page in a browser, there's a group of "listview" tables ("milled-from", for example).
http://www.wowhead.com/item=39340
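to see which of those tables a given response actually contains, you can build the url from the itemID and pull out each Listview's id. this is a sketch assuming each call looks roughly like `new Listview({template: 'item', id: 'milled-from', ...})`, with the id quoted in single or double quotes; `item_url` and `listview_ids` are hypothetical helper names.

```python
import re

LISTVIEW_RE = re.compile(r"new\s+Listview\s*\(")
# Assumption: the table name appears as  id: 'milled-from'  inside the call.
ID_RE = re.compile(r"\bid\s*:\s*['\"]([\w-]+)['\"]")


def item_url(item_id):
    """Build the wowhead item url used in this thread."""
    return f"http://www.wowhead.com/item={item_id}"


def listview_ids(html):
    """Return the id of each Listview, searching only up to the next Listview."""
    starts = [m.end() for m in LISTVIEW_RE.finditer(html)]
    ids = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(html)
        m = ID_RE.search(html, start, end)
        if m:
            ids.append(m.group(1))
    return ids
```

limiting each search to the text before the next Listview keeps a call without an id from "borrowing" the id of the following table.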
what do you get as the return from the call?
so you used the latest libpt's "getpage()" function on that itemID and it gave you a page that included the "milled-from" data?
hmm...
btw: hop on IRC (#wowace on irc.freenode.net) if you can, easier talking than via forums :)
namely, this is the data i need that's missing from what getpage() returns:
this along with all the other Listview blocks...
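once a response does include the Listview blocks, you'll want the raw text of a specific one (say "milled-from") to parse further. this sketch grabs the `{...}` argument of the Listview with a given id by counting braces while skipping over quoted strings. it assumes the same `new Listview({...})` shape as above, and it's a heuristic: it doesn't understand javascript comments or regex literals. `listview_block` is a hypothetical name.

```python
import re

LISTVIEW_RE = re.compile(r"new\s+Listview\s*\(")


def _matching_brace(text, open_idx):
    """Return the index of the '}' matching the '{' at open_idx, or -1.

    Braces inside single- or double-quoted strings are ignored.
    """
    depth = 0
    quote = None
    i = open_idx
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 2  # skip the escaped character
                continue
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1


def listview_block(html, wanted_id):
    """Return the raw '{...}' argument of the Listview with id wanted_id, or None."""
    id_re = re.compile(r"\bid\s*:\s*['\"]" + re.escape(wanted_id) + r"['\"]")
    for m in LISTVIEW_RE.finditer(html):
        brace = html.find("{", m.end())
        if brace == -1:
            break
        end = _matching_brace(html, brace)
        if end == -1:
            break
        block = html[brace:end + 1]
        if id_re.search(block):
            return block
    return None
```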
also, the wowhead data now includes detailed info on stack counts for mills and prospects. woot.