Yes, I want to filter the links that contain "html".
Yes, it just prints out every link!
Okay, I don't want to confuse you, so I will show the program without the regular expression, along with an example of how it works.
import org.htmlparser.util.*;
import java.util.Iterator;
import org.htmlparser.*;
import org.htmlparser.tags.*;
import org.htmlparser.filters.*;
import java.util.HashSet;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Hash
{
    // Unused in this version (no regular expression yet).
    static Matcher m;
    static String st;

    // Collects every link on the page at s whose URL contains "http:".
    public static HashSet<String> visit(String s)
    {
        HashSet<String> s1 = new HashSet<String>();
        try
        {
            Parser parser1 = new Parser(s);
            NodeList list1 = parser1.parse(new LinkStringFilter("http:"));
            for (int i = 0; i < list1.size(); i++)
            {
                String link = ((LinkTag) list1.elementAt(i)).extractLink();
                s1.add(link);
            }
            return s1;
        }
        catch (Exception e)
        {
            // If the page cannot be fetched or parsed, return an empty set.
            return new HashSet<String>();
        }
    }
}
This is the output of the program:
So all I want to do is add a regular expression step to the code, so that it filters the links the program has collected.
That is what you saw in my failed attempt above: I tried to use a regex, but all it did was print every link instead of filtering them.
I hope that makes more sense to you now!
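
For illustration, here is a minimal sketch of the kind of filtering step described above. It assumes the links are already in a set of strings, like the one visit returns. The names LinkFilter and filterLinks are hypothetical; they are not part of the original program or of htmlparser, and only java.util.regex is used. The example URLs are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class LinkFilter {
    // Hypothetical helper: keeps only the links in which the regex
    // matches somewhere in the string.
    static Set<String> filterLinks(Set<String> links, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Set<String> kept = new HashSet<>();
        for (String link : links) {
            // find() searches anywhere in the link;
            // matches() would require the whole link to match the regex.
            if (pattern.matcher(link).find()) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample links standing in for the collected set.
        Set<String> links = new HashSet<>(Arrays.asList(
                "http://example.com/index.html",
                "http://example.com/logo.png"));
        System.out.println(filterLinks(links, "html"));
    }
}
```

With the two sample links, only the one containing "html" is kept. In the real program the call would probably look like LinkFilter.filterLinks(Hash.visit(url), "html"), where url is whatever page address is being crawled.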