Hi mark92, welcome to the forums!
When you post code, please use "code" tags. Put [code] at the start of the code, and [/code] at the end: that way the code will be formatted and readable when it appears on a web page.
---
Is there some reason why you don't use some existing code to properly parse the returned page content? Any robust alternative would seem to involve a lot of reading of the specifications for HTML/CSS/JS, and even then you would have to contend with the nonstandard content all over the internet.
But, if you want to hack about with the returned text - and this *is* a reasonable exercise in using regex and the Java String methods - then you have already been given a good answer at codecall.net: download the file, parse out the links with regex and/or indexOf(), and download their content. Your best bet would be to engage with the poster in that other thread and *ask* about anything that is unclear.
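For what it's worth, here is a rough sketch of that approach, not a finished crawler. The class name LinkExtractor, the method names and the href pattern are all my own assumptions and are not taken from the codecall.net thread. The pattern is deliberately naive: it only looks for quoted href attributes and will be fooled by plenty of real-world markup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractor {
    // Naive pattern (an assumption for this sketch): href="..." or href='...',
    // case-insensitive. Group 1 is the quote character, group 2 the link text.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Regex version: collect every quoted href value in document order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    // indexOf() version: simpler, but only sees lower-case href=" with double quotes.
    static List<String> extractLinksWithIndexOf(String html) {
        List<String> links = new ArrayList<>();
        String marker = "href=\"";
        int start = html.indexOf(marker);
        while (start != -1) {
            int valueStart = start + marker.length();
            int valueEnd = html.indexOf('"', valueStart);
            if (valueEnd == -1) {
                break; // unterminated attribute, give up
            }
            links.add(html.substring(valueStart, valueEnd));
            start = html.indexOf(marker, valueEnd + 1);
        }
        return links;
    }

    // Download a page as text; assumes the content is UTF-8.
    static String download(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java LinkExtractor <url>");
            return;
        }
        String page = download(args[0]);
        for (String link : extractLinks(page)) {
            System.out.println(link);
        }
    }
}
```

Run it with a URL as the only argument and it prints the links it finds. To fetch their content you would call download() again on each one, but note that relative links (like "page2.html") have to be resolved against the page's address first, and this sketch does not do that.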
If, in any case, you want to have the discussion in multiple places then it is only polite to provide links at each site to the others so that everybody taking part in the discussion knows what else is being said. Bear in mind that many people will not respond to cross posts for fear of wasting their time contributing to something that has been dealt with elsewhere.
Also at codecall.net