I want to decode data marked ENCODING=QUOTED-PRINTABLE;CHARSET=UTF-8 into plain text.
For example, =0D=0A turns into \n and so on, but I could not find a standard Java method that does this automatically...
What format is the source, and what format do you want the output in? You wrote: "For example =0D=0A turns into \n".
The value of '\n' is 0x0A and '\r' is 0x0D, so =0D=0A is "\r\n", not just "\n".
The source is a vCard contact file. Normally each entry occupies a single line, but an entry marked ENCODING=QUOTED-PRINTABLE can span several lines: a line that ends with the '=' character continues on the next line. For example:
BEGIN:VCARD
VERSION:2.1
N:Gump;Forrest
FN:Forrest Gump
ORG:Bubba Gump Shrimp Co.
TITLE:Shrimp Man
TEL;WORK;VOICE:(111) 555-1212
TEL;HOME;VOICE:(404) 555-1212
ADR;WORK:;;100 Waters Edge;Baytown;LA;30314;United States of America
LABEL;WORK;ENCODING=QUOTED-PRINTABLE:100 Waters Edge=0D=0ABaytown, LA 30314=0D=0AUnited States of America
ADR;HOME:;;42 Plantation St.;Baytown;LA;30314;United States of America
LABEL;HOME;ENCODING=QUOTED-PRINTABLE:42 Plantation St.=0D=0ABaytown, LA 30314=0D=0AUnited States of America
EMAIL;PREF;INTERNET:forrestgump@example.com
REV:20080424T195243Z
END:VCARD
http://en.wikipedia.org/wiki/VCard
Many characters can be encoded this way, not just CRLF but also '{', '=', and so on.
Last edited by efluvio; June 5th, 2011 at 12:17 PM.
Sorry, I don't understand what input you need to convert.
Can you post some simple examples without all the other stuff?
Show the input String and the output String you want.
It looks like you want to convert the String "=0D" to the String "\r".
Last edited by Norm; June 5th, 2011 at 12:24 PM.
Yes, but for all the escape sequences, not just that one.
Here is a webpage that does the conversion online, but I would like to do it in my own program: Encode/Decode Quoted Printable - Webatic
Last edited by efluvio; August 21st, 2012 at 01:15 PM.
If you change every = to %, a URL decoding class may be able to convert the hex representation of each character to the character itself.
Look at the URLDecoder class.
Last edited by Norm; June 5th, 2011 at 01:02 PM.
It doesn't work if the text contains characters that are not supposed to appear in a URL:
Exception in thread "main" java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "*%"
    at java.net.URLDecoder.decode(Unknown Source)
    at java.net.URLDecoder.decode(Unknown Source)
Last edited by efluvio; June 13th, 2011 at 04:26 PM.
Can you write a simple program, with a String to decode, that demonstrates the problem?
Since % is a special character to the decoder, you'll need to change it to something else before decoding and then change it back to % afterwards. Or change it to %25 and let the decoder change it back.
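A minimal sketch of that workaround, in case it helps. The class and method names are made up for illustration. It assumes the input is a single quoted-printable value with no soft line breaks (no '=' at the end of a line) and no malformed escapes; otherwise URLDecoder will still throw IllegalArgumentException. Note that '+' also has to be protected, because URLDecoder turns it into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UrlDecoderWorkaround {

    // Hypothetical helper: decodes =XX escapes by borrowing URLDecoder.
    static String decode(String quotedPrintable) {
        String escaped = quotedPrintable
                .replace("%", "%25")  // protect literal percent signs first
                .replace("+", "%2B")  // URLDecoder would turn '+' into a space
                .replace("=", "%");   // now every =XX becomes %XX
        try {
            return URLDecoder.decode(escaped, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the decoded form of a value containing '%', '+' and an escape.
        System.out.println(decode("50% off=21 a+b"));
    }
}
```

The order of the replacements matters: the literal % must be escaped before = is turned into %, or the two could no longer be told apart.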
I've solved it using the org.apache.commons.codec package from Apache Commons. The Java source files are available for download:
Codec - Home
It contains the method org.apache.commons.codec.net.QuotedPrintableCodec.decodeQuotedPrintable, which decodes quoted-printable data into bytes that can then be read as UTF-8.
Here readString is a String read from a file (the encoded text itself is plain ASCII):
String decodedString = new String(org.apache.commons.codec.net.QuotedPrintableCodec.decodeQuotedPrintable(readString.getBytes("US-ASCII")), "UTF-8");
readString must contain the whole encoded value on a single line. If the encoded value spans several lines (each one ending with '='), first remove each trailing '=' and append the next line, then pass the result to the decoder.
Of course, if the original text uses a charset other than UTF-8, the appropriate one must be selected instead.
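For anyone who wants to see the steps spelled out, here is a rough hand-written sketch in plain Java. It is not the Commons Codec implementation, and the class and method names are invented for the example. It assumes the input is ASCII, that a soft line break is a '=' directly followed by CRLF or LF, and that the decoded bytes are UTF-8. An '=' that is not followed by two hex digits is copied through unchanged.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class QuotedPrintableSketch {

    // Joins continuation lines: a '=' right before a line ending means
    // the value continues on the next line.
    static String unfold(String text) {
        return text.replace("=\r\n", "").replace("=\n", "");
    }

    // Turns each =XX escape into one byte, then reads the bytes as UTF-8.
    static String decode(String text) {
        String joined = unfold(text);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < joined.length(); i++) {
            char c = joined.charAt(i);
            if (c == '=' && i + 2 < joined.length()) {
                int hi = Character.digit(joined.charAt(i + 1), 16);
                int lo = Character.digit(joined.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo); // two hex digits form one byte
                    i += 2;                    // skip the digits just consumed
                    continue;
                }
            }
            out.write(c); // assumption: every other character is plain ASCII
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A LABEL value split over two lines with a soft line break.
        String label = "100 Waters Edge=0D=0ABaytown, LA 30314=0D=\n"
                + "=0AUnited States of America";
        System.out.println(decode(label));
    }
}
```

The bytes are collected first and converted to a String only at the end, because one UTF-8 character can be spread over several =XX escapes (for example =C3=A9).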
The complete UTF-8 character table with its Unicode and Hexadecimal values is shown here:
http://www.utf8-chartable.de/
Also, the output file must be saved as UTF-8, because decoded quoted-printable text can contain characters that cannot be represented in ASCII.
Likewise, the input charset must be specified when reading the input file.
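A small sketch of reading and writing with explicit charsets, assuming Java 7 or later for java.nio.file. The class name, method names and the file name in main are placeholders, not anything from my actual program.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class CharsetIoSketch {

    // Reads the whole file using the charset the caller names explicitly,
    // instead of relying on the platform default.
    static String read(Path file, Charset charset) {
        try {
            return new String(Files.readAllBytes(file), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Always writes UTF-8, so decoded non-ASCII characters survive.
    static void writeUtf8(Path file, String text) {
        try {
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder file in the temp directory.
        Path out = Paths.get(System.getProperty("java.io.tmpdir"), "decoded-sketch.txt");
        writeUtf8(out, "caf\u00e9");
        System.out.println(read(out, StandardCharsets.UTF_8));
    }
}
```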
Last edited by efluvio; June 25th, 2011 at 05:36 PM.