Thread: finding unescaped XML characters

  1. #1
    Junior Member
    Join Date
    Nov 2010
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    finding unescaped XML characters

    Hi,

    I want to find the unescaped characters in some XML input so that my code can replace them with their escape sequences.

    For instance, my XML input contains both escaped and unescaped & characters. I want to find only the unescaped & characters and replace them with '&amp;', i.e. their escape sequence.

    I thought of using pattern matching in Java to solve this problem, and found that the pattern &[a-z]+; finds all the escape sequences in the XML input.

    However, I am not able to create a pattern that finds the unescaped & characters in the XML input.

    Please advise.

    Regards,
    Diptee


  2. #2
    Junior Member
    Join Date
    Nov 2010
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: finding unescaped XML characters

    Sorry, the pattern I used for matching the escape sequences is &[\\w]+;

  3. #3
    mmm.. coffee JavaPF
    Join Date
    May 2008
    Location
    United Kingdom
    Posts
    3,336
    My Mood
    Mellow
    Thanks
    258
    Thanked 294 Times in 227 Posts
    Blog Entries
    4

    Re: finding unescaped XML characters

    Quote: Originally posted by diptee
    Sorry, the pattern I used for matching the escape sequences is &[\\w]+;
    http://www.javaprogrammingforums.com...explained.html

    Please show me the code you have and an example XML input.
    Please use [highlight=Java] code [/highlight] tags when posting your code.

  4. #4
    Junior Member
    Join Date
    Nov 2010
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: finding unescaped XML characters

    The code I have used is below:
    /*
     * Created on 10/11/2008
     *
     */

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class SearchAndReplaceUnescapedAmpersand
    {

        public static void main(String[] args)
        {
            String str = "&li; &";
            // This pattern finds all the escape sequences. I need to get the
            // pattern right so that it finds the unescaped & chars instead.
            Pattern escSeq = Pattern.compile("&[\\w]+;", Pattern.CASE_INSENSITIVE);
            Matcher matcher = escSeq.matcher(str);
            while (matcher.find()) {
                // System.out is used here because System.console() returns null
                // when no console is attached (e.g. when run from an IDE).
                System.out.printf("I found the text \"%s\" starting at index %d and ending at index %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
            }
            System.out.println("Before: " + str);
            //str = matcher.replaceAll("& ");
            System.out.println("After: " + str);
        }
    }
    The XML input containing the unescaped & char would look like this:
    <Q-ENV:Attachment>
            <Q-ENV:Content-Type>application/zip</Q-ENV:Content-Type>
            <Q-ENV:Message-ID>urn:x-commerceone:package:com:commerceone:INV-1020464.01 & .02-attachment.zip</Q-ENV:Message-ID>
            <Q-ENV:Encoding>base64</Q-ENV:Encoding>
    The XML input might also include escaped & chars, but not in the above location.

    Regards,
    Diptee
    Last edited by copeg; November 16th, 2010 at 11:20 PM. Reason: Code and Highlight tags to remove smilies

  5. #5
    Administrator copeg
    Join Date
    Oct 2009
    Location
    US
    Posts
    5,318
    Thanks
    181
    Thanked 833 Times in 772 Posts
    Blog Entries
    5

    Re: finding unescaped XML characters

    I've edited your post to place the code in formatting tags. In the future, it helps to wrap code in [highlight=java]code goes here[/highlight] or [code]code goes here[/code] so that character sequences in it are not parsed into smilies. You might consider using negation to exclude already-escaped values. This won't work as written, but it may get you started: "&[^;]{2,6}[^;]"

  6. #6
    Junior Member
    Join Date
    Nov 2010
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: finding unescaped XML characters

    Hi, I am fairly new to Java patterns, so can you please explain what the pattern you suggested, &[^;]{2,6}[^;], would achieve?

    All I understand is that it will find the & characters that are not followed by a ;

    Please correct me if I am wrong here.

    I tried the pattern you provided, but it didn't help.

    Thanks for letting me know the correct way to post code.

  7. #7
    Administrator copeg
    Join Date
    Oct 2009
    Location
    US
    Posts
    5,318
    Thanks
    181
    Thanked 833 Times in 772 Posts
    Blog Entries
    5

    Re: finding unescaped XML characters

    Quote: Originally posted by diptee
    Hi, I am fairly new to Java patterns, so can you please explain what the pattern you suggested, &[^;]{2,6}[^;], would achieve?
    All I understand is that it will find the & characters that are not followed by a ;
    Please correct me if I am wrong here.
    I tried the pattern you provided, but it didn't help.
    I told you it wouldn't work, but I thought it might lead you in the right direction. Basically, I understood your question to be that you want to replace every bare '&' with '&amp;' but leave the other '&escapetext;' style escaped values alone. Just doing a replace on every & obviously won't work, because it would also rewrite the ampersands that begin existing escapes. The regular expression I posted was an attempt to look for ampersands that are not followed by the typical escape pattern. If it doesn't make much sense, I suggest looking through Regular Expressions for great info on regular expressions.
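To make the behaviour of the suggested pattern concrete, here is a small sketch that runs it against a few inputs. Only the pattern string comes from the thread. The class name `SuggestedPatternDemo` and the helper `firstMatch` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern string is taken from the thread.
class SuggestedPatternDemo {
    // '&', then 2 to 6 characters that are not ';', then one more non-';' character.
    static final Pattern SUGGESTED = Pattern.compile("&[^;]{2,6}[^;]");

    // Returns the first substring the pattern matches, or null if nothing matches.
    static String firstMatch(String input) {
        Matcher m = SUGGESTED.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = { "&lt;", "&amp;", "INV-1020464.01 & .02-attachment.zip" };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + firstMatch(s));
        }
    }
}
```

With these inputs, "&lt;" is not matched, but "&amp;" is partly matched as "&amp", and the bare ampersand in the attachment name is matched together with the text after it ("& .02-at"). The character classes consume the characters that follow the ampersand, so a replace based on this pattern would both touch some escapes and eat surrounding text, which fits the remark that it was only meant as a starting point.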

  8. #8
    Junior Member
    Join Date
    Nov 2010
    Posts
    5
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: finding unescaped XML characters

    Yeah, I started out using negations in my regex but couldn't really achieve what I needed. I will go through the link you provided and get back if required.

    Thanks
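For readers who land on this thread later: one way to express "an & that does not start an escape sequence" is a negative lookahead, which checks what follows the ampersand without consuming it. The sketch below is not from any of the posters. The class and method names are hypothetical, and it assumes that anything shaped like &name;, &#123; or &#x1F; is an intended escape.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, offered as a sketch rather than a definitive fix.
class EscapeBareAmpersands {
    // Matches '&' only when it is NOT followed by a named reference (amp;),
    // a decimal reference (#38;) or a hexadecimal reference (#x26;).
    // The lookahead (?! ... ) inspects the following text without consuming it.
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces every bare '&' with "&amp;" and leaves existing escapes alone.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String input = "INV-1020464.01 & .02-attachment.zip &lt; &amp; &#38;";
        System.out.println("Before: " + input);
        System.out.println("After:  " + escapeBareAmpersands(input));
    }
}
```

Because the lookahead consumes nothing, only the ampersand itself is replaced. Text such as "AT&T;" would still be treated as an escape, and CDATA sections or comments are not handled, so this is a pre-processing heuristic for malformed input, not a substitute for producing well-formed XML at the source.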
