Simple Hashsets and regular expression problem !

**burningflower** · March 10th, 2011, 08:56 AM

Hey people,

I am trying to write a program that gathers URLs from a web page using HTML Parser and then filters them.

The part that gathers the URLs is done; they are stored in a HashSet.

But I am finding it hard to take the links that are in the HashSet and filter them with a regular expression!

I just want to take the hyperlinks and filter them!

I am sure this is a simple problem, but Java is not my speciality, so I am slow when it comes to this.

Here is the program so far!

 
import org.htmlparser.util.*;
import java.util.Iterator;
import org.htmlparser.*;
import org.htmlparser.tags.*;
import org.htmlparser.filters.*;
import java.util.HashSet;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Hash
{
	static Matcher m;
	static String st;

	public static HashSet<String> visit (String s)
	{
		HashSet<String> s1 = new HashSet();
		try
		{
			Parser parser1 = new Parser(s);
			NodeList list1 = parser1.parse(new LinkStringFilter("http:"));

			for (int i = 0; i < list1.size(); i++)
			{
				String st = ((LinkTag) (list1.elementAt(i))).extractLink();
				s1.add(st);

				Iterator iter = s1.iterator();
				while (iter.hasNext())
				{
					String str = (String) iter.next();
					Pattern pattern = Pattern.compile("html");
					m = pattern.matcher(str);
				}
				if (m.find())
				{
					System.out.println(m);
				}
			}
			return s1;
		}
		catch (Exception e)
		{
			return new HashSet();
		}
	}
}

I tried to add a regular expression, but it prints out all of the links instead of only the links with "html" in them.

I hope someone can help me!

**copeg** · March 10th, 2011, 10:19 AM

I'm not sure I understand what you are asking. Are you trying to get just the links that end in ".html"? Does your code print out every link? Your code is quite difficult to read with the non-preserved tabbing and excess spaces, and I would recommend posting an SSCCE with a hardcoded example that demonstrates the issue you are having.
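For reference, a hardcoded example of the kind suggested here might look like the sketch below. It is only an illustration: the class name `LinkRegexDemo`, the helper methods, and the `example.com` links are made up for this example and are not from the original program. It shows how "contains html" and "ends in .html" are two different regular expressions when used with `Matcher.find()`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; the names and URLs below are assumptions for illustration.
class LinkRegexDemo {
    // Unanchored pattern: find() succeeds if "html" occurs anywhere in the link.
    static final Pattern CONTAINS_HTML = Pattern.compile("html");
    // Anchored pattern: the link must end with the literal suffix ".html".
    static final Pattern ENDS_WITH_HTML = Pattern.compile("\\.html$");

    static boolean containsHtml(String link) {
        return CONTAINS_HTML.matcher(link).find();
    }

    static boolean endsWithHtml(String link) {
        return ENDS_WITH_HTML.matcher(link).find();
    }

    public static void main(String[] args) {
        // Hardcoded sample links, so no network access or HTML parsing is needed.
        List<String> links = Arrays.asList(
                "http://example.com/index.html",
                "http://example.com/html/sort.java",
                "http://example.com/oxo1.class");
        for (String link : links) {
            System.out.println(link
                    + " contains=" + containsHtml(link)
                    + " endsWith=" + endsWithHtml(link));
        }
    }
}
```

The second sample link contains "html" in its path but does not end in ".html", so the two checks disagree on it. Which pattern is right depends on which links you actually want to keep.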

**burningflower** · March 10th, 2011, 01:43 PM

Yes, I want to filter the links that contain "html".
Yes, it just prints out every link!

Okay, I don't want to confuse you, so I will show the program without the regular expression and an example of how it works.

import org.htmlparser.util.*;
import org.htmlparser.*;
import org.htmlparser.tags.*;
import org.htmlparser.filters.*;
import java.util.HashSet;

class Hash
{
	// Collects every link on the page at address s whose text contains "http:".
	public static HashSet<String> visit (String s)
	{
		HashSet<String> s1 = new HashSet<String>();
		try
		{
			Parser parser1 = new Parser(s);
			NodeList list1 = parser1.parse(new LinkStringFilter("http:"));

			for (int i = 0; i < list1.size(); i++)
			{
				String st = ((LinkTag) (list1.elementAt(i))).extractLink();
				s1.add(st);
			}
			return s1;
		}
		catch (Exception e)
		{
			// On any parsing or network error, return an empty set.
			return new HashSet<String>();
		}
	}
}

This is the output of the program:

Week 3 - Revision
CIS228/CIS229 2nd Year Programming
Week 1 Revision and intro to Eclipse
Week 3 - Revision
CIS228/CIS229 2nd Year Programming
Week 1 Revision and intro to Eclipse
http://sebastian.doc.gold.ac.uk/cis229/sort.java
http://sebastian.doc.gold.ac.uk/cis229/oxo1.class
http://sebastian.doc.gold.ac.uk/cis2...tsMissing.java
Sebastian Danicic
Computing, Goldsmiths, University of London

So all I want to do is add regular expression filtering to the code, so that it can filter all the links that the program has collected.

That is where you saw my failed attempt above, where I tried to use a regex but all it did was print all the links instead of filtering them.

I hope that makes more sense to you now!
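Two things stand out in the first attempt: the `if (m.find())` check sits outside the `while` loop, so it only ever tests the last link the iterator returned, and nothing is ever removed from `s1`, so the method still returns every link. One possible way to do the filtering is to leave `visit` as it is and filter its result in a separate step. The sketch below is only a suggestion under that assumption: the class `LinkFilter` and the method `filterLinks` are hypothetical names, and the third sample link is invented so that something matches.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the original program.
class LinkFilter {
    // Returns a new set holding only the links in which the regex is found.
    // The input set is left unchanged.
    static Set<String> filterLinks(Set<String> links, String regex) {
        Pattern pattern = Pattern.compile(regex); // compile once, reuse for every link
        Set<String> matching = new HashSet<String>();
        for (String link : links) {
            // find() looks for the pattern anywhere in the link;
            // the check happens inside the loop, once per link.
            if (pattern.matcher(link).find()) {
                matching.add(link);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        Set<String> links = new HashSet<String>();
        links.add("http://sebastian.doc.gold.ac.uk/cis229/sort.java");
        links.add("http://sebastian.doc.gold.ac.uk/cis229/oxo1.class");
        links.add("http://example.com/notes/week3.html"); // invented link for the demo

        // Only the link containing "html" is printed.
        for (String link : filterLinks(links, "html")) {
            System.out.println(link);
        }
    }
}
```

With the real program, the set returned by `Hash.visit(...)` could be passed in place of the hardcoded one, for example `filterLinks(Hash.visit(pageAddress), "html")`, where `pageAddress` stands for whatever page you are parsing.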
