Thread: Possible regex problem?

  1. #1
    Junior Member
    Join Date
    Oct 2009
    Posts
    26
    Thanks
    5
    Thanked 2 Times in 2 Posts

    Default Possible regex problem?

    OK, before I explain, I want to mention that the purpose of this program is to create a class containing general-use methods for the projects in my Computer Science class. The particular method I'm having trouble with takes an input String (such as a line of text the user enters) and replaces every non-letter character (~!@#$%^&* and such) with a new String that the user or programmer chooses.

    For example: the user is asked to enter a line of text with symbols in it (!~@#$%^&*, etc.) and then chooses what to replace every non-letter character with. So if they want to replace them with a blank space, it would do this:

    Input: This*^ is a&test!@sente~nce
    Output: This   is a test  sente nce

    Or if they want something other than a blank space, like an underscore:

    Input: This*^ is a&test!@sente~nce
    Output: This___is_a_test__sente_nce

    Now, my program works fine up to this point. However, when I get to the point where I want to change a [ into, for example, a blank space, I get this error:

    Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 0
    [
    ^
    	at java.util.regex.Pattern.error(Pattern.java:1713)
    	at java.util.regex.Pattern.clazz(Pattern.java:2254)
    	at java.util.regex.Pattern.sequence(Pattern.java:1818)
    	at java.util.regex.Pattern.expr(Pattern.java:1752)
    	at java.util.regex.Pattern.compile(Pattern.java:1460)
    	at java.util.regex.Pattern.<init>(Pattern.java:1133)
    	at java.util.regex.Pattern.compile(Pattern.java:823)
    	at java.lang.String.replaceAll(String.java:2189)
    	at CommonUse.replaceChars(CommonUse.java:68)
    	at Test.main(Test.java:9)

    CommonUse is the name of the class which contains replaceChars(), obviously, and the Test class is just that, a class to test my method.
    So, at line 68 in my CommonUse class:
    input = input.replaceAll("[", replacement);

    I have looked through the Java API docs on the regex (regular-expression) constructs and think this has something to do with Groups and Capturing, since the [ opens but never closes. I've tried messing around with escape characters like \\[ but that doesn't help. And after reading about Groups and Capturing, I tried all kinds of combos using ?: with no luck. I don't fully understand how regex works, other than the basic escape sequences for output.

    So, can anyone help me figure out how to change any [ into another character/string? Also, I believe I'll have the same problem with characters like ], {, }, <, >, etc., so will one method apply to all of them?

    Just for reference, here are some bits of code:

    CommonUse class:
    // Returns a new String of the parameter (input) after replacing all non-letter characters
    	// ( ~!@#$%^&*()_+`-=[]\{}|;':",./<>?) with the string/character of choice defined in the second
    	// parameter (replacement).
    	public static String replaceChars(String input, String replacement) {
    		input = input.replaceAll("0", replacement);
    		input = input.replaceAll("1", replacement);
    		input = input.replaceAll("2", replacement);
    		input = input.replaceAll("3", replacement);
    		input = input.replaceAll("4", replacement);
    		input = input.replaceAll("5", replacement);
    		input = input.replaceAll("6", replacement);
    		input = input.replaceAll("7", replacement);
    		input = input.replaceAll("8", replacement);
    		input = input.replaceAll("9", replacement);
     
    		input = input.replaceAll(" ", replacement);
    		input = input.replaceAll("~", replacement);
    		input = input.replaceAll("!", replacement);
    		input = input.replaceAll("@", replacement);
    		input = input.replaceAll("#", replacement);
    		input = input.replaceAll("\\$", replacement);
    		input = input.replaceAll("%", replacement);
    		input = input.replaceAll("\\^", replacement);
    		input = input.replaceAll("&", replacement);
    		input = input.replaceAll("\\*", replacement);
    		input = input.replaceAll("\\(", replacement);
    		input = input.replaceAll("\\)", replacement);
    		input = input.replaceAll("_", replacement);
    		input = input.replaceAll("\\+", replacement);
    		input = input.replaceAll("\\`", replacement);
    		input = input.replaceAll("\\-", replacement);
    		input = input.replaceAll("\\=", replacement);
    		input = input.replaceAll("[", replacement);
     
    		/* Commented out because I want to debug the [ problem first
    		input = input.replaceAll("]", replacement);
    		input = input.replaceAll("\\", replacement);
    		input = input.replaceAll("{", replacement);
    		input = input.replaceAll("}", replacement);
    		input = input.replaceAll("\\|", replacement);
    		input = input.replaceAll(";", replacement);
    		input = input.replaceAll("'", replacement);
    		input = input.replaceAll(":", replacement);
    		input = input.replaceAll("\"", replacement);
    		input = input.replaceAll(",", replacement);
    		input = input.replaceAll(".", replacement);
    		input = input.replaceAll("/", replacement);
    		input = input.replaceAll("<", replacement);
    		input = input.replaceAll(">", replacement);
    		input = input.replaceAll("\\?", replacement);
    		*/
    		return input;
    	}

    Test class:
    public class Test {
    	public static void main(String[] args) {
    		// Working with my own test string before trying user input
    		String example = "this is a test ~!@#$%^& of  the *()_+ pro`-=[ gram";
     
    		System.out.println(example);
    		example = CommonUse.replaceChars(example, " ");
    		System.out.println(example);
    	}
    }

    Any help is appreciated! Thanks!


    EDIT: I managed to get it to work by using the Unicode escape sequence for [:
    input = input.replaceAll("\\u005B", replacement);

    So I suppose I just need to find the Unicode for all of those characters I have problems with? Is there a better way? And I'd really appreciate it if someone could explain to me WHY "\\u005B" works but "\\[" doesn't.

    Thanks again!
    Last edited by Bill_H; October 24th, 2009 at 02:16 AM.


  2. #2
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,895
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Default Re: Possible regex problem?

    Use regex quoting so that the characters are taken literally instead of being parsed as regex syntax. A quoted section starts with \Q and ends with \E (but since the pattern is passed as a Java string literal, write \\Q to start and \\E to end).
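
    A minimal sketch of the quoting suggestion above, assuming the standard java.util.regex behaviour in which an unescaped [ opens a character class. The class name LiteralReplaceDemo and the helper replaceLiteral are hypothetical names made up for illustration; Pattern.quote and Matcher.quoteReplacement are existing JDK methods that build the quoted forms for you.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; neither the class nor the helper comes from the thread.
class LiteralReplaceDemo {

    // Replaces every occurrence of target, treating both arguments as plain text.
    static String replaceLiteral(String input, String target, String replacement) {
        // Pattern.quote makes regex metacharacters such as [ or . match literally.
        // Matcher.quoteReplacement keeps $ and \ in the replacement from being special.
        return input.replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "pro`-=[ gram";

        // Manual quoting with \Q ... \E, written as \\Q and \\E in a Java string literal.
        System.out.println(text.replaceAll("\\Q[\\E", "_"));

        // A backslash escape is another way to ask for a literal [.
        System.out.println(text.replaceAll("\\[", "_"));

        // Let the library build the quoted pattern.
        System.out.println(replaceLiteral(text, "[", "_"));

        // String.replace does no regex parsing at all.
        System.out.println(text.replace("[", "_"));
    }
}
```

    All four calls target the same literal [ character. When no regex features are needed, String.replace is probably the simplest choice, since nothing has to be escaped.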

  3. The Following User Says Thank You to helloworld922 For This Useful Post:

    Bill_H (October 24th, 2009)

  4. #3
    Junior Member
    Join Date
    Oct 2009
    Posts
    26
    Thanks
    5
    Thanked 2 Times in 2 Posts

    Default Re: Possible regex problem?

    Quote Originally Posted by helloworld922 View Post
    Use regex quoting so that the characters are taken literally instead of being parsed as regex syntax. A quoted section starts with \Q and ends with \E (but since the pattern is passed as a Java string literal, write \\Q to start and \\E to end).
    Thanks! Learn something new every day

  5. #4
    Java kindergarten chronoz13's Avatar
    Join Date
    Mar 2009
    Location
    Philippines
    Posts
    659
    Thanks
    177
    Thanked 30 Times in 28 Posts

    Default Re: Possible regex problem?

    Your program is interesting to me. If you don't mind, could you post the whole thing so that we (I mean me) can learn from it as well? If that's OK?

  6. #5
    Junior Member
    Join Date
    Oct 2009
    Posts
    26
    Thanks
    5
    Thanked 2 Times in 2 Posts

    Default Re: Possible regex problem?

    Quote Originally Posted by chronoz13 View Post
    Your program is interesting to me. If you don't mind, could you post the whole thing so that we (I mean me) can learn from it as well? If that's OK?
    Sure! At the moment I've only got two methods, the one I was asking about in this thread, and you may notice the other one from my other thread, chronoz13. At the time of posting this, I have altered the code slightly after receiving help, so the doAgain() method does work. The replaceChars() method is also complete and fully tested.

    import java.util.Scanner;
     
    public class CommonUse {
     
    	// Asks if user wants to repeat something. Parameter (output) is defined by programmer when calling
    	// this method based on the wording they want to use for the method. Example: "Would you like to
    	// retry?" or "Would you like to play again?"
    	public static boolean doAgain(String output) {
    		boolean repeat = true;
    		char response;
    		Scanner keyboard = new Scanner(System.in);
     
    		System.out.println(output);
    		System.out.println("Y or y = Yes\nN or n = No");
    		response = keyboard.nextLine().charAt(0);
     
    		if ((response == 'Y') || (response == 'y'))
    			repeat = true;
    		else if ((response == 'N') || (response == 'n'))
    			repeat = false;
    		else {
    			System.out.println("Incorrect character entered. Please try again.\n");
    			return doAgain(output);
    		}
    		return repeat;
    	}
     
    	// Returns a new String of the parameter (input) after replacing all non-letter characters
    	// (0123456789 ~!@#$%^&*()_+`-=[]\{}|;':",./<>?) with the string/character of choice defined
    	// in the second parameter (replacement).
    	public static String replaceChars(String input, String replacement) {
    		input = input.replaceAll("0", replacement);
    		input = input.replaceAll("1", replacement);
    		input = input.replaceAll("2", replacement);
    		input = input.replaceAll("3", replacement);
    		input = input.replaceAll("4", replacement);
    		input = input.replaceAll("5", replacement);
    		input = input.replaceAll("6", replacement);
    		input = input.replaceAll("7", replacement);
    		input = input.replaceAll("8", replacement);
    		input = input.replaceAll("9", replacement);
     
    		input = input.replaceAll(" ", replacement);
    		input = input.replaceAll("~", replacement);
    		input = input.replaceAll("!", replacement);
    		input = input.replaceAll("@", replacement);
    		input = input.replaceAll("#", replacement);
    		input = input.replaceAll("\\$", replacement);
    		input = input.replaceAll("%", replacement);
    		input = input.replaceAll("\\^", replacement);
    		input = input.replaceAll("&", replacement);
    		input = input.replaceAll("\\*", replacement);
    		input = input.replaceAll("\\(", replacement);
    		input = input.replaceAll("\\)", replacement);
    		input = input.replaceAll("_", replacement);
    		input = input.replaceAll("\\+", replacement);
    		input = input.replaceAll("\\`", replacement);
    		input = input.replaceAll("\\-", replacement);
    		input = input.replaceAll("\\=", replacement);
    		input = input.replaceAll("\\Q[\\E", replacement);
    		input = input.replaceAll("]", replacement);
    		input = input.replaceAll("\\Q\\\\E", replacement);
    		input = input.replaceAll("\\Q{\\E", replacement);
    		input = input.replaceAll("}", replacement);
    		input = input.replaceAll("\\|", replacement);
    		input = input.replaceAll(";", replacement);
    		input = input.replaceAll("'", replacement);
    		input = input.replaceAll(":", replacement);
    		input = input.replaceAll("\"", replacement);
    		input = input.replaceAll(",", replacement);
    		input = input.replaceAll("\\Q.\\E", replacement);
    		input = input.replaceAll("/", replacement);
    		input = input.replaceAll("<", replacement);
    		input = input.replaceAll(">", replacement);
    		input = input.replaceAll("\\?", replacement);
     
    		return input;
    	}
    }

    Also note that, for now, I only include symbols that can be easily typed on a US keyboard. Accented letters and other symbols which require Unicode or some other form of input are not accounted for. The reason is that if I really need to replace a particular character/symbol, I can easily add a single replaceAll() line to a program rather than calling this entire method for one or two instances. Also, I do plan on extending the method to allow the programmer/user to choose which characters to omit when replacing. For example, a third parameter might be the character/symbol to omit, so the following:
    System.out.println(CommonUse.replaceChars("This#is a tes&t", "_", " "));

    would give replaceChars() the text "This#is a tes&t", telling it to replace all non-letter characters with underscores "_", but any blank spaces " " which appear in the text will not be replaced. Thus the output will be:
    This_is a tes_t
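
    One way the planned three-parameter version could look, as a rough sketch only. It swaps the chain of replaceAll() calls for a single character loop, so no regex escaping is needed. The class name ReplaceCharsSketch and the parameter name keep are assumptions, and Character.isLetter() also accepts accented letters, which goes a little beyond the US-keyboard list above.

```java
// Hypothetical sketch; the class name and the third parameter (keep) are assumptions.
class ReplaceCharsSketch {

    // Replaces each character that is neither a letter nor listed in keep
    // with the replacement string.
    static String replaceChars(String input, String replacement, String keep) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetter(c) || keep.indexOf(c) >= 0) {
                out.append(c);           // letters and exempted characters pass through
            } else {
                out.append(replacement); // everything else is swapped out
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Mirrors the call described above: underscores for non-letters, spaces kept.
        System.out.println(replaceChars("This#is a tes&t", "_", " "));
    }
}
```

    Passing an empty string as keep gives the two-parameter behaviour, so the existing method could simply delegate to this one.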

    Feel free to alter the code how you want!
    Last edited by Bill_H; October 24th, 2009 at 11:00 AM.

  7. The Following User Says Thank You to Bill_H For This Useful Post:

    chronoz13 (October 25th, 2009)

  8. #6
    Java kindergarten chronoz13's Avatar
    Join Date
    Mar 2009
    Location
    Philippines
    Posts
    659
    Thanks
    177
    Thanked 30 Times in 28 Posts

    Default Re: Possible regex problem?

    I just want to see (and keep) the latest change that you made to your code, and I'll study it.
    Thanks, friend!

    Quote Originally Posted by Bill_H View Post
    Sure! At the moment I've only got two methods, the one I was asking about in this thread, and you may notice the other one from my other thread, chronoz13. At the time of posting this, I have altered the code slightly after receiving help, so the doAgain() method does work. The replaceChars() method is also complete and fully tested.
    Ahh, so that's why you need a char input and not a String one. Sorry I didn't help on your other post.
    Last edited by chronoz13; October 25th, 2009 at 12:17 AM.

  9. #7
    Junior Member
    Join Date
    Oct 2009
    Posts
    26
    Thanks
    5
    Thanked 2 Times in 2 Posts

    Default Re: Possible regex problem?

    Quote Originally Posted by chronoz13 View Post
    I just want to see (and keep) the latest change that you made to your code, and I'll study it.
    Thanks, friend!

    Ahh, so that's why you need a char input and not a String one. Sorry I didn't help on your other post.
    That's alright. A String would also work, but in this case I believe a single-character input limits the number of possible errors and outcomes. In theory a String will work too, as long as you use the equalsIgnoreCase() method.
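
    For comparison, a minimal sketch of what a String-based doAgain() might look like with equalsIgnoreCase(). Passing the Scanner in as a parameter, looping instead of recursing, and the class name DoAgainSketch are all assumptions made for illustration, not the code from the post above.

```java
import java.util.Scanner;

// Hypothetical String-based variant of doAgain(); not the version posted earlier.
class DoAgainSketch {

    // Keeps asking until the user answers y or n (in either case).
    // The Scanner parameter is an assumption that makes the method easy to test.
    static boolean doAgain(String output, Scanner keyboard) {
        while (true) {
            System.out.println(output);
            System.out.println("Y or y = Yes\nN or n = No");
            String response = keyboard.nextLine().trim();

            if (response.equalsIgnoreCase("y")) {
                return true;
            }
            if (response.equalsIgnoreCase("n")) {
                return false;
            }
            // An empty line lands here too, instead of failing in charAt(0).
            System.out.println("Incorrect input entered. Please try again.\n");
        }
    }

    public static void main(String[] args) {
        System.out.println(doAgain("Would you like to retry?", new Scanner(System.in)));
    }
}
```

    Note that nextLine() still throws NoSuchElementException if the input ends before a valid answer arrives, so this sketch assumes an interactive console.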

    And you're welcome!
