Hi all,
I wanted to know: what's the best way to extract text from a website and then load it into a database?
At present I am using a web crawler to get the text from the web. Is there any other way of doing this?
You can see an example here.
Thanks for the example.
But I need to crawl text not just from the given URL, but also from the URLs within a given website (URL).
Well, once you can get the text from one URL, you can parse that text, search it for further links, and then open a URL connection to fetch the contents of each link found. You have to do this recursively; see the sketch below.
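As a rough idea of what that recursion could look like, here is a minimal sketch. It uses only plain java.net classes and a simple regex to pull href links out of the raw HTML (an HTML parser such as jsoup would be more robust); the class name, depth limit, and start URL are just placeholders.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleCrawler {

    // Rough match for href="http..." attributes in the raw HTML
    private static final Pattern LINK_PATTERN =
            Pattern.compile("href=\"(http[^\"]+)\"", Pattern.CASE_INSENSITIVE);

    private final Set<String> visited = new HashSet<>();

    public void crawl(String pageUrl, int depth) {
        // Stop when the depth limit is reached or the page was already seen
        if (depth <= 0 || !visited.add(pageUrl)) {
            return;
        }
        try {
            String html = download(pageUrl);
            System.out.println("Fetched " + pageUrl + " (" + html.length() + " chars)");

            // Find further links on this page and crawl them recursively
            Matcher m = LINK_PATTERN.matcher(html);
            while (m.find()) {
                crawl(m.group(1), depth - 1);
            }
        } catch (Exception e) {
            System.err.println("Skipping " + pageUrl + ": " + e.getMessage());
        }
    }

    private String download(String pageUrl) throws Exception {
        URLConnection conn = new URL(pageUrl).openConnection();
        conn.setRequestProperty("User-Agent", "Mozilla/5.0");
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Start URL and depth are placeholders
        new SimpleCrawler().crawl("http://www.example.com", 2);
    }
}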
How about this in the code snippets forum:
http://www.javaprogrammingforums.com...bsite-url.html
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class GrabHTML {

    public static void Connect() throws Exception {
        // Set the URL of the page to grab
        URL url = new URL("http://websitehere.com");
        URLConnection spoof = url.openConnection();

        // Spoof the connection so we look like a web browser
        spoof.setRequestProperty("User-Agent",
                "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)");

        BufferedReader in = new BufferedReader(new InputStreamReader(spoof.getInputStream()));
        String strLine;

        // Loop through every line in the source
        while ((strLine = in.readLine()) != null) {
            // Print each line to the console
            System.out.println(strLine);
        }
        in.close();
        System.out.println("End of page.");
    }

    public static void main(String[] args) {
        try {
            // Call the Connect method
            Connect();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
It will grab the HTML source of a web page. You can then process it as you wish.
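To cover the other half of the original question (loading the extracted text into a database), a minimal JDBC sketch might look like the following. It assumes a MySQL database named crawler with a pages(url, content) table already created; the connection URL, credentials, and table name are placeholders to adjust for your own setup.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PageStore {

    public static void savePage(String url, String text) throws Exception {
        // Connection details are placeholders; change them for your own database
        String jdbcUrl = "jdbc:mysql://localhost:3306/crawler";
        try (Connection conn = DriverManager.getConnection(jdbcUrl, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO pages (url, content) VALUES (?, ?)")) {
            ps.setString(1, url);
            ps.setString(2, text);
            ps.executeUpdate();
        }
    }
}

You could call savePage(url, html) from inside the crawl loop instead of printing each line to the console.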
Please use [highlight=Java] code [/highlight] tags when posting your code.