Originally Posted by pari89
I need to parse an HTML web page in Java to extract specific information from its tags. For example,
<b>Species </b> Strain </td>
I need to look for the Strain info (Strain is variable in length) in the page. The whole web page is stored as a huge string.
In general, you have two options: (possibly heavy) use of regular expressions (or other String methods), or a specialized library such as jsoup (an HTML parser; search the web for it).
In either case, you have to know the precise rules for finding and extracting the text you want. For example: is "xyz" (what you want) always at the end of the <td> tag? Is there always a <b> tag before what you want? Could there be an <i> (or some other tag) instead of <b>? Do you want only the last word, or all the words at the end? Do you care about surrounding spaces?
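To make the regular expression option concrete, here is a minimal sketch using only java.util.regex. It assumes one particular set of answers to the questions above: the label is always "<b>Species </b>", the value is everything between that label and the closing </td>, and surrounding whitespace should be dropped. The class name StrainExtractor, the method extractStrain and the sample row are made up for illustration; if your rules differ (an <i> instead of <b>, only the last word wanted, and so on), the pattern has to change accordingly.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrainExtractor {
    // Assumed rule: the value follows "<b>Species </b>" and runs up to the next "</td>".
    // DOTALL lets the value span line breaks; CASE_INSENSITIVE tolerates <B> / </TD>.
    private static final Pattern STRAIN = Pattern.compile(
            "<b>\\s*Species\\s*</b>\\s*(.*?)\\s*</td>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the first matching value without surrounding whitespace, or empty if none is found.
    static Optional<String> extractStrain(String html) {
        Matcher m = STRAIN.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical sample; in your case this would be the whole page string.
        String page = "<tr><td><b>Species </b> Escherichia coli K-12 </td></tr>";
        System.out.println(extractStrain(page).orElse("(not found)"));
    }
}
```

The lazy group (.*?) stops at the first </td> after the label, so a value of any length is captured. If the page has several such cells, calling m.find() in a loop would collect them all. For markup that is less regular than this, a parser such as jsoup is likely to be the more robust choice.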