I'm a new Computer Science student and I have been given a task to scan the contents of a website's source code, use delimiters to extract all the hyperlinks from it, and display them. We haven't been told anything about how to do this, so after some looking around online this is what I have so far:
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.MalformedURLException;
    import java.net.URL;
    import java.util.Scanner;

    public class HyperlinkMain {
        public static void main(String[] args) {
            try {
                Scanner in = new Scanner(System.in);
                String urlText = in.next(); // the address typed by the user
                URL website = new URL(urlText);
                BufferedReader input = new BufferedReader(new InputStreamReader(website.openStream()));
                String inputLine;
                while ((inputLine = input.readLine()) != null) {
                    // Process each line.
                    System.out.println(inputLine);
                }
                input.close(); // release the network stream
                in.close();
            } catch (MalformedURLException me) {
                System.out.println(me);
            } catch (IOException ioe) {
                System.out.println(ioe);
            }
        }
    }
So my program can read each line of a website's source code and display it, but I want it to extract each word from the source code rather than each line. I've looked around online, but I don't really understand how it's done, and I keep getting errors when I use input.read();
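A minimal sketch of reading word by word, assuming "word" means any run of non-whitespace characters. `BufferedReader.read()` returns a single character as an `int`, not a word, so instead a `java.util.Scanner` is wrapped around the page's `Reader`: its default delimiter is whitespace, so each `next()` call returns one token. The class name `WordScanner`, the UTF-8 charset, and taking the address from the first command-line argument are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class WordScanner {

    // Collects every whitespace-separated token ("word") from the source.
    // Scanner's default delimiter is whitespace, so next() yields one token per call.
    // Note: closing the Scanner also closes the underlying Reader.
    public static List<String> words(Reader source) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the page address is the first command-line argument
        // and the page is UTF-8 encoded.
        URL website = URI.create(args[0]).toURL();
        Reader reader = new InputStreamReader(website.openStream(), StandardCharsets.UTF_8);
        for (String word : words(reader)) { // words() closes the reader when done
            System.out.println(word);
        }
    }
}
```

Note that whitespace-separated words do not line up with HTML tags: `<a href="x.html">Home</a>` comes back as the two tokens `<a` and `href="x.html">Home</a>`, so extracting links still needs a delimiter aimed at the link syntax itself.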
Could anyone help me understand how to make it extract each word from the source code? Any help would be highly appreciated.
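For the original task of pulling out hyperlinks with delimiters, one possible sketch treats the text `href="` as the delimiter: splitting the page there (here with `String.split` and a regular expression) leaves every chunk after the first starting with a link target, which ends at the next double quote. This is a simplification, not an HTML parser: it assumes links are written as `href="..."` with double quotes (letter case and spaces around `=` are tolerated), so single-quoted or unquoted attributes are missed, and an `href="` inside a comment or script would also be picked up. The class name `HrefExtractor` and the UTF-8 charset are hypothetical choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class HrefExtractor {

    // Delimiter: href=" in any letter case, with optional whitespace around '='.
    private static final String HREF_DELIMITER = "(?i)href\\s*=\\s*\"";

    // Returns the target of every double-quoted href attribute in the text.
    public static List<String> links(String html) {
        List<String> result = new ArrayList<>();
        String[] chunks = html.split(HREF_DELIMITER);
        // chunks[0] is the text before the first delimiter, so it never starts with a link.
        for (int i = 1; i < chunks.length; i++) {
            int end = chunks[i].indexOf('"'); // the closing quote ends the link
            if (end >= 0) {
                result.add(chunks[i].substring(0, end));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: address from the first argument, page encoded as UTF-8.
        URL website = URI.create(args[0]).toURL();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(website.openStream(), StandardCharsets.UTF_8))) {
            String html = in.lines().collect(Collectors.joining("\n"));
            for (String link : links(html)) {
                System.out.println(link);
            }
        }
    }
}
```

Run it as `java HrefExtractor https://example.com`. Relative links such as `/about.html` are printed as written in the page, not resolved against the page address.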