Web Data Parser Update: It's much more powerful!
December 18, 2008
If you saw the demo video for my initial beta version of Web Data Parser and left feedback, thank you very much!
In case you're not sure what Web Data Parser is, it's a powerful tool that lets you extract the data from any table on any web page and save it in a useful format (CSV for spreadsheets, TSV for databases, and as an HTML table). It will also extract the links and link text from ANY section of any page you choose.
I've added a number of great features since the initial beta video. They are:
- The ability to automate submitting forms and extracting data from the resulting page.
- The ability to select the table columns you want to export.
- The ability to save the data filtering settings you select so you can easily reuse them later.
- The ability to have WDP filter duplicate records out of the data when appending data to a file.
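For readers curious what filtering duplicates on append can look like, here is a minimal Python sketch. It is not WDP's actual code: the function name `append_unique_rows` and the rule that a duplicate means "every field matches exactly" are assumptions made for illustration.

```python
import csv
import os


def append_unique_rows(path, new_rows):
    """Append rows to a CSV file, skipping rows that are already present.

    Assumption: two records are duplicates when all fields match as text.
    Returns the number of rows actually written.
    """
    seen = set()
    if os.path.exists(path):
        # Load the existing records so new ones can be compared against them.
        with open(path, newline="") as f:
            seen = {tuple(row) for row in csv.reader(f)}

    added = 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for row in new_rows:
            key = tuple(str(value) for value in row)
            if key not in seen:
                writer.writerow(key)
                seen.add(key)  # also catches duplicates within new_rows
                added += 1
    return added
```

Calling it twice with overlapping data leaves each record in the file only once.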
Ready to see these powerful features in action? Then click here to view the new video!
I hope to make the tool available next week, but at the latest it should be out the first week in January.
Please be sure to come back and leave your thoughts and ideas in a comment.
Comments
59 Responses to “Web Data Parser Update: It's much more powerful!”
Awesome Jon!
When are we going to have it in our hands to test with? I have an idea. This is definitely a cool app.
Buddy
Jon,
you are still on the ball. This utility would have helped me big time when writing my SEO self-help how to guide because I could have more easily converted ranking results from the web into graphs and charts. Obviously there are many other applications but i thought of this first. good stuff.
Wow, that automation piece looks real good, Jon. Nicely done.
Oh, and delete that first comment, wrong url
Jon,
Thanks for such a great tool. I have a site where I need to pull data, and the pages change from day to day as new data is added, but the pages are sequential, so I could create a long list of URLs to check. If I automated the process, could this program remember where it ran into a bad URL (meaning that page hasn't been created yet) and then return to see if it exists later?
thanks,
Kevin
Jon, WebDataParser is brilliant! Wish I'd thought of it.
It appears that WebDataParser is Windows-only, is that correct?
It appears that WebDataParser is Windows-only, is that correct?
Yes, it's Windows only.
Looks like you've got a winner there! And sure good to see how you implemented so many viewers' suggestions.
So now to the crucial question: when do you anticipate the launch date? And how much will it cost? As if I expected you to answer that last one here, heh…
But seriously: if you don't make it unduly expensive I'd really expect it to sell like hot cakes - just wait till the likes of CNet and even Slashdot catch on.
Keep up the excellent work!
I can't wait to use it. It will help with my bookkeeping so much.
I'm anxiously awaiting the release of this great tool. I suggested a feature last time but I was cryptic about it. Did you get my meaning?
It would be the ability to extract questions from web pages.
Along with everything else this does, that function would take it over the top.
So, when are we going to get our hands on this?
Jon,…
It keeps getting better and better! Will number columns and tables in (.pdf)'s be gatherable as well?
Thanks,
Skip…
Dr SEO Services
guaranteed seo services
Kudos Jon, for client interactive specification + fast turnaround of constructive feedback into implementation. Heap powerful medicine.
This round of improvements has transformed a neat "give away" grade utility into a professional tool worth paying for. In fact, it looks good enough already to freeze for v1.0 pre-release to selected clients.
Rather than needlessly complicate with marginal functionality, discover what (if anything) more is demanded by folks that actually USE the tool. Put another way, not all feedback is created equal!
I suspect the integration of WDP with complementary tools will turn out to be of more practical benefit than further raw functionality.
That looks great. Can it be used to access data on secure web pages I have to login to?
Among other things I previously wrote
How will you keep the IP address that is scraping the data from getting banned by the source site?
It would also be nice if it could do PDF documents.
Additional comments
it would be nice to have pre-established filters for your drop down box for
a) youtube
b) amazon
c) ebay
d) meta titles only
e) email addresses
f) phone numbers
This is a fantastic tool. Some responders will tend to use a lot of tek-ese to review your product. Let me just say that even a near beginner could benefit. GOOD JOB!
My gosh, Jonathan! This is absolutely awesome! I missed the first video so was I WOWed with this latest video. I can hardly wait to get my hands on it. Please keep programming as fast as you can. This is going to save me HOURS every week!
Jeanette
Great Tool. Perhaps I have done something wrong. The video for the original appears as the updated video as well. doesn't matter if I click the link from the email or from the blog site the video is the same.
Please let me know how to preview the newest video.
Donald
I did watch it clear through…
I too missed the first video for some reason but am sure glad I got to see this one! What a great tool and most creative of you to do it and then implement the suggestions. I look forward to the launch
Harry
Hi Jon,
Great Utility.
Would it be possible to automate the automations? By this I mean run them at predetermined times, for example, no. 1 runs at 1:00am, no. 2 runs after that, then no. 3, etc.
I have 5 or 6 sets of data I want to collect on a regular basis and it would be good if it was all ready in Excel when I want to analyse it.
Does that make sense?
Thanks
Jim
Will It Crawl Multiple Pages? Like start on one result page and go through 100+ pages with same extraction filter?
Jonathan you are a legend!!
great video, look forward to using the software when it comes out.
Jonathan,
This is really looking to be very useful.
Does / Can the automation include username & password entries so as to automate that aspect too - like Roboform, but specific to the specific extracts of the moment?
Looking forward to next week's release…
Meantime, may I wish you & your loved ones, and everyone on this website, a very Merry Christmas & a Wonderful & Prosperous New Year.
Looks like a really good product.
I'd like to test it to see how it deals with different types of output though as the devil is always in the detail with scraping.
How does it deal with less structured output?
(ie. content not in a straight table)
I've tried to find software that will download an entire forum of threads; however, none actually works when it comes to password-protected forums. Would be cool to find something like that.
Great stuff Jon. You really take your customers' requests seriously. This is a very good way to do business and I commend you highly.
Christmas wishes to you and your family.
Kind regards,
Barry
WOW!! This tool has gone from a pretty darn good tool to a fantastic tool. I was able to see the video by using a different browser. I am using a Mac and the original browser was Firefox 3.05. No matter what link I clicked on I got the old video. I switched to Safari 3.2.1 and voilà, it worked just fine. So anybody out there who is having the same problem with viewing and is using a Mac and Firefox, switch to Safari and it will work…
Jon, you didn't show an example of the software bringing in images and the accompanying text, it looks like it wouldn't be a problem. The example as suggested by the post by "Rhonda Morin" where she had the dressing table images and accompanying text and links is what I am referring to. I have assumed that if it is a table that it will get it, but rather than assume I will ask if it truly can get these types of tables?
Jon, as usual you have come up with a particularly useful piece of software and made it into a full blown extremely labor reducing, desirable, money making machine. If there were a prize for the most useful software producer of the year, I would certainly vote for you to win.
Donald
Since it is not available for us to test on our own application needs, I wonder if you could indicate which, if any, of these sites this utility would NOT work on. The reason I ask is that these are less traditional data forms.
www.digitalcamerareview.com/deals/
www.yellowpages.com for example phoenix Insurance agents
www.hotels.com for example Phoenix hotels
movies.msn.com for example search for Will Smith
Thanks,
Hi John, This will be really useful, when will this be out in market.
Madhan.
Jon,
Dude, you've done it again! I don't know how you always come up with this stuff in your head…
Have you thought about having the resulting data automatically exported into WebCompAnalysis or even IAW?
If that would bog down and run too slow you could have it choose from 5-15 keywords and have them automatically open up WCA and run an analysis, or you could even have an advanced/upgraded WebDataParser that has WCA or IAW (or both) incorporated into it, so that once people have their information from WordTracker and Google it would automatically be imported into these other programs and have them running in the background or something like that.
It would add a heck of a lot of value and you could price it well enough that if the customer bought the Triple Deluxe WebDataParser they would save a certain percentage by buying the deluxe version over buying them individually.
Just some thoughts…
For forms with captchas, perhaps you can create a popup window to insert the captchas, and put the process on hold. Enter captcha, press enter, and everything runs again. I'm sure you can identify which field is for captcha and what's not.
I wonder if the form automation part can also automate drop-down menu selection. If that works out, it's gonna kick butt.
Super and great software btw, can't wait for it to be out.
Hi Jon,
You seriously inspire me mate. This looks like a great tool. How would it handle tableless sites that are done in CSS? Thanks for creating such awesome tools, and showing us HOW to use them, and others to make money.
Best,
Dallas
This software is looking great, Jon.
There's so many things this is going to save me a ton of time with. Looking forward to the release.
It would be awesome if you could analyze multiple pages in succession. Still looks like a winner.
Do you have an eta on availability?
I see possibilities for gathering data for investment analysis.
Loved the new features - auto work flow was great,
Suggestion: I require a feature to also grab images from a web page/site - any chance of this feature being added - I am currently looking at website scrapers at the moment
This software is powerful and inspiring - can't wait for its release
Wow, huge improvement Jon… Way to Go!!!
I can't wait to take it for a test drive
Tracy
Hi,
Jon. Quite brilliant! You never cease to amaze me. One quick idea:
Could you have a 'search for this type of form' and then when found 'insert data from here into relevant boxes'. So instead of just repeating the same exercise on one site, it could go off and find other examples of the same form? Could be quite powerful.
Please let me know. Thanks, Allen
PS Would love to be a beta tester if needed…
Hey Jon,
Now I'm interested, you have a very nifty and useful application there. I wish I could have had it ages ago, it would have saved me dozens of hours in the last year alone, I can't imagine going without it in 2009!
Bring it on, please keep me posted.
Phil
Hello Jonathan,
Themelis Cuiper here from
Amsterdam The Netherlands.
We know each other from the other forums.
As a professional SEO data analyst and Oracle Database
Developer I must say you made a practical little tool that is more
handy in use than all the big brothers in this genre.
You are THAT close to making it into a smart business-
intelligence tool that combines data into new data
(like Dapper and the Kapow robots).
I know it is a data-parser but you are a few steps away to
build it into " The Marketers Mashup Toolkit"
To make that happen i recommend that you take Jim's
suggestion : automate the automation !!
Just another iterative round the block and you have the perfect
mashup tool.
You could let this run serially, on time and ready signals,
or as multiple instances to go parallel:
an automated preset, crawling multiple pages as set,
that can be set on time, taking or giving data from or to
another preset (for the same or other webpage as defined in this
second preset).
Now you can build sets of data.
You can put blocks before or after the other one.
You get diverse scenarios: A1+A2=B1, B1+C1=D1
or any other combinations
let me write that out for you.
same webpage url
preset1                preset2                result=preset3
Webpage1CSV1  -feeds-  Webpage1CSV2  -feeds-  0CSV1

other webpage url
preset3                preset4                result=preset5
0CSV1         -feeds-  Webpage2CSV1  -feeds-  0CSV2
and indeed: treat your other program like WDP or other output CSV as a plugin preset.
How to make that practical with just a little code:
You can keep the structured in-data logic by putting in
a master-detail relationship with primary key and
foreign key, real simple, by building a counter.
The idea is to do this WITHOUT a database engine, and to
build in (bake in) the logic in plain text files,
the way early dBase used to do.
Add 2 extra columns to make a unique primary key easier for
each CSV.
Put the row number into an extra column of your CSV1 as
primary key, then build your CSV2 with the same row number in an
extra column as foreign key.
table1                      table2
PK    R  Name   Transp      PK    Color  Weight  FK1   R  FK2   R
CSV1  1  Smith  Car         CSV2  red    1ton    CSV1  1  CSV6  9
CSV1  2  Jones  Bike        CSV2  green  200gr   CSV1  2  CSV6  1
CSV1  3  Cook   Plane       CSV2  blue   10ton   CSV1  3  CSV6  7
table3
table4
table5
table6
So now you keep the logic that
4 seats (table6 CSV6 9) are an instance
of the red car of Mr. Smith.
CSV2 was appended later when preset6 was made, looping
through the records on rownum (row count, as we have no database).
For output CSVs from other programs that lack the unique ID:
on import -> give 2 new columns with
unique ID (filename and row count).
If you append to an existing file, ID = filename and lastrow+1.
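A minimal Python sketch of that file-name-plus-row-count ID, for anyone who wants to try the idea. The function name `append_with_ids` is hypothetical, and the sketch assumes the CSV has no header row, so the row count equals the last ID used.

```python
import csv
import os


def append_with_ids(path, rows):
    """Append rows to a CSV, prefixing each with (file name, row number).

    Assumption: the file has no header line, so counting existing lines
    gives the last row number; new rows continue at lastrow + 1.
    Returns the row number of the last row in the file.
    """
    name = os.path.splitext(os.path.basename(path))[0]  # e.g. "CSV1"
    last = 0
    if os.path.exists(path):
        with open(path, newline="") as f:
            last = sum(1 for _ in csv.reader(f))

    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            last += 1
            writer.writerow([name, last, *row])
    return last
```

A second file can then store the pair (file name, row number) in two extra columns of its own to act as the foreign key.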
another way to make this more clean is
with intersection tables
             table1
               |
               |
              /|\
   +-----------------------+
   |  intersection table   |
   +-----------------------+
     \|/      \|/      \|/
      |        |        |
      |        |        |
   table2   table3   table4
      |
      |
     /|\
   +----------+
   | intersec |  3 5
   |    2     |  1 1
   +----------+  1 8
      \|/
       |
       |
    table5
maybe a SQL export
writeto file %tablename%.sql;
create table %tablename%;
loop
  writeto file
  insert into %tablename% values (%field1%, %field2%, %field_n%);
end loop
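The export loop above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `rows_to_sql` is a hypothetical helper, every column is declared as TEXT, and the table and column names are assumed to come from a trusted source.

```python
def rows_to_sql(table, columns, rows):
    """Return SQL text: one CREATE TABLE statement plus one INSERT per row.

    Assumptions: all columns are TEXT, and table/column names are trusted
    (they are wrapped in double quotes but not otherwise escaped).
    """
    def quote(value):
        # SQL string literal: single quotes inside the value are doubled.
        return "'" + str(value).replace("'", "''") + "'"

    column_defs = ", ".join(f'"{name}" TEXT' for name in columns)
    statements = [f'CREATE TABLE "{table}" ({column_defs});']
    for row in rows:
        values = ", ".join(quote(value) for value in row)
        statements.append(f'INSERT INTO "{table}" VALUES ({values});')
    return "\n".join(statements)
```

The returned text can be written to a file named after the table and loaded into any SQL database later.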
Recap of the suggestion.
the automation preset you have put in extra :
login, jobschedule, crawl multiple pages,
let output be input selectable on fields to build new preset,
let the standard browser DNS be changed into port :8118
to work with Tor Union proxy or any other to set proxy portnr.
Just my 2c to you to help you get some creative ideas as
my teacher (the Toad we called him) said
" Themelis there is not only but 1 ways,
but get off yours to go many or you never see Rome".
And then i answered the Toad:
"Yes Sir, but Frank says I did it My way".
Johnathan !
I must say Kudo's Kudo's Kudo's to you
for this sweet program, i love it I love it! Great potential.
Themelis Cuiper
Great improvement. Good job!
This is seriously good. (I wasn't able to watch your first video as it froze up at 3.08, I think). As I use datafeeds a lot I find myself extracting data from tables a lot of the time from sites like Wikipedia and About. It's a pain to do that. This will save me many hours every month.
Nice one!
Thanks so much for this - you've saved me literally hours of work. When's your next update planned? Thanks,
I love the automation feature. When will we have the chance to use it ourselves? And will you be offering any introductory pricing?
I can see many applications of this tool.
The last video looked good, but it just keeps getting better. I don't know how you do it! You already have so many awesome products, but you just don't stop. I bet in two years you'll have an empire, kind of like the one in Star Wars but less evil and no lightsabers.
This is a very interesting and useful tool you've created. It will be very useful for many applications.
Hi, I am a Houston realtor and I have been following your blog for sometime now but this is the first time I have commented. I just wanted to say thank you for the help you have given me as it has assisted in bringing me to the second page on a couple of keywords. I would like to build on this and I can see a couple of ideas coming forth using this tool. I appreciate everything you put up for your readers. Keep up the good work!
Hmmm… great ideas… These are things I need to spec out and build for my site. Never enough time for all of this building!
Absolutely awesome Jon! My head is spinning with all the potential uses for this. Cannot wait to get my hands on it, give it a good workout, and give you further feedback. Great work.
Logan
1. Please test and release your code for OSX Leopard Macs besides PCs.
2. I need to have a script that would use your utility to a) log into a subscription site every day with a defined account and password for their login form, b) extract the specified filtered tables, and c) export them to a newly named file for each day.
3. I can use Apple scheduler and Apple Script to do it, if your program can be driven easily from a script. Having to manually highlight in order to select the table data would be a drawback.
That is a big reask the same question? Will you be having a presale sale on this program? It sounds great.
Interesting. I've been looking at my Google Analytics results when doing keyword research to know how people are finding me and using those keywords for more traffic. Anyway, Analytics has shown me results close to what you mention - the largest portion of my traffic comes from Google, a significantly lower amount from Yahoo. However, when I search my keywords on Google vs Yahoo, many times they show up on the first page in top positions on Yahoo. Now I think I understand why…thanks for sharing.
Very good information. I really like the help you provide. i look forward to reading more from you in the future
I am currently in the process of building a brand new store. I cannot tell you how much time and effort your Web Parser is going to save me. It came along at precisely that right time for me!
We are going to give it a try and see how it helps us. I think we can really benefit with it.
Thanks for an awesome product!
This looks like a great tool. Will definitely save us time and energy.
You come up with some awesome ideas.
All the best,
Eren
Awesome. I never knew such things before. Gonna make a good use of this data