Regular Expression in Java for HTML pages

**pari89** · February 18th, 2014, 04:49 AM

hi,

I need to parse an html web page to extract specific information from the tags in Java. For example,

Species Strain </td>

I need to look for the Strain info (Strain is variable in length) in the page. The whole web page is stored as a huge string. I need a regular expression that can help me identify all the Species and retrieve their corresponding strain info.

Does anyone have a clue how to do this, or can anyone suggest some clever string manipulation methods in Java?

Thank you.

**GregBrannon** · February 18th, 2014, 06:26 AM

Welcome to the Forum! Please read this topic to learn how to post code correctly and other useful tips for newcomers.

What have you tried?

**andbin** · February 18th, 2014, 08:15 AM

Originally Posted by pari89

I need to parse an html web page to extract specific information from the tags in Java. For example,

Species Strain </td>

I need to look for the Strain info (Strain is variable in length) in the page. The whole web page is stored as a huge string.

In general you have two options: the (possibly heavy) use of regular expressions (or other String functions) or the use of a specialized library like jsoup (an HTML parser, search on the web).

In either case, you need to know the precise rules for finding and extracting the text you want. For example: is "xyz" (what you want) always at the end of the <td> tag? Is there always a tag before what you want? Is it possible that there is a (or other) instead of ? Do you want only the last word or all the words at the end? Do you care about surrounding spaces?
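To give an idea of the regular expression route, here is a minimal sketch using only `java.util.regex`. The tags in the example above did not come through, so the markup here is purely an assumption: each cell is taken to look like `<td><b>Species name</b> strain text </td>`. The class name `StrainExtractor`, the method `extractStrains`, and the sample input are all hypothetical; the pattern would need adjusting once the real rules are known.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrainExtractor {

    // ASSUMPTION: a cell looks like <td><b>Species name</b> strain text </td>.
    // Group 1 captures the species, group 2 the strain; the \s* around each
    // group trims surrounding whitespace. DOTALL lets a cell span several lines.
    private static final Pattern CELL = Pattern.compile(
            "<td[^>]*>\\s*<b>\\s*(.*?)\\s*</b>\\s*(.*?)\\s*</td>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Scans the whole page string and returns species -> strain,
    // in the order the cells appear in the page.
    public static Map<String, String> extractStrains(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = CELL.matcher(html);
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input that follows the assumed markup.
        String html = "<table><tr>"
                + "<td><b>Escherichia coli</b> K-12 MG1655 </td>"
                + "<td><b>Bacillus subtilis</b> 168</td>"
                + "</tr></table>";
        extractStrains(html).forEach((species, strain) ->
                System.out.println(species + " -> " + strain));
    }
}
```

The reluctant `(.*?)` groups stop at the first closing tag they meet, which is what lets the strain be of variable length. A pattern like this is brittle: nested tags, attributes on `<b>`, or a different tag order will make it miss cells, and a map keeps only the last strain if the same species appears twice. If the page is not that regular, an HTML parser is likely the safer choice.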
