

Thread: Web Crawling is too Slow!

  1. #1
    Junior Member
    Join Date
    Sep 2010
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Web Crawling is too Slow!

    Hello Sir,

    My code (pasted below) works properly, but at a very slow rate. Can you please suggest any changes (if any are needed) on the programming side to make it run faster? Thanks in advance.
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.InputStreamReader;
    import java.net.Authenticator;
    import java.net.InetSocketAddress;
    import java.net.PasswordAuthentication;
    import java.net.Proxy;
    import java.net.URL;
    import java.net.URLConnection;

    // Supplies the proxy credentials when the proxy asks for authentication.
    class ProxyAuthenticator extends Authenticator {

        private final String user;
        private final String password;

        public ProxyAuthenticator(String user, String password) {
            this.user = user;
            this.password = password;
        }

        @Override
        protected PasswordAuthentication getPasswordAuthentication() {
            return new PasswordAuthentication(user, password.toCharArray());
        }
    }

    public class CrawlWeb1 {

        public static void main(String[] args) {
            try {
                // <IPAddr>, <Port>, <username> and <password> are placeholders for the proxy settings.
                Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(<IPAddr>, <Port>));
                Authenticator.setDefault(new ProxyAuthenticator(<username>, <password>));
                String surl = "http://www.timesofindia.com/";
                URL asksearch = new URL(surl);
                URLConnection yc = asksearch.openConnection(proxy);
                // try-with-resources closes both streams even if reading fails.
                try (BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
                     BufferedWriter out = new BufferedWriter(new FileWriter("timeofindia.html"))) {
                    String inputLine;
                    while ((inputLine = in.readLine()) != null) {
                        out.write(inputLine);
                        out.newLine(); // readLine() strips the line terminator, so restore it
                    }
                }
                System.out.println("Crawling Done...! URL : " + surl);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    Last edited by copeg; September 20th, 2010 at 08:39 AM.


  2. #2
    Super Moderator helloworld922's Avatar
    Join Date
    Jun 2009
    Posts
    2,895
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Re: Web Crawling is too Slow!

    That is essentially as fast as that code is going to be (other than possibly a few insignificant optimizations).

    The only way I can see this code running faster is if your network speed gets faster, or if the website you're connecting to decides to allocate more bandwidth for you. Whichever is the slower of the two will limit what your actual download speed is.

  3. #3
    Super Moderator Json's Avatar
    Join Date
    Jul 2009
    Location
    Warrington, United Kingdom
    Posts
    1,274
    My Mood
    Happy
    Thanks
    70
    Thanked 156 Times in 152 Posts

    Re: Web Crawling is too Slow!

    In what way is your crawl slow? Where exactly does the code stop for a while?

    Another reason for the URL fetch to be slow could be that the server you are connecting to is very slow in handling requests.
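
    One way to see where the time goes is to time each phase of a single fetch separately. The following is only a rough sketch, not a tuned benchmark: the class name FetchTimer, the method timeFetch and the 10-second timeouts are made up for illustration, and it connects directly rather than through the proxy used in the original post (passing the proxy to openConnection would be the obvious change).

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper: times the phases of one fetch with System.nanoTime().
class FetchTimer {

    // Returns { connect ms, wait-for-first-byte ms, rest-of-body ms, total bytes read }.
    static long[] timeFetch(URL url) throws Exception {
        long t0 = System.nanoTime();
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(10_000); // assumed values, adjust as needed
        conn.setReadTimeout(10_000);
        conn.connect();                 // opens the connection to the server
        long t1 = System.nanoTime();

        try (InputStream in = conn.getInputStream()) {
            // For HTTP this phase covers sending the request and waiting
            // for the server to start answering.
            int first = in.read();
            long t2 = System.nanoTime();

            long bytes = (first == -1) ? 0 : 1;
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                bytes += n;
            }
            long t3 = System.nanoTime();

            return new long[] {
                (t1 - t0) / 1_000_000,
                (t2 - t1) / 1_000_000,
                (t3 - t2) / 1_000_000,
                bytes
            };
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java FetchTimer <url>");
            return;
        }
        long[] r = timeFetch(new URL(args[0]));
        System.out.printf("connect: %d ms, first byte: %d ms, body: %d ms, %d bytes%n",
                r[0], r[1], r[2], r[3]);
    }
}
```

    A large first number points at the connection setup, a large second number at a slow server response, and a large third number at limited bandwidth for the download itself.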

    // Json

  4. #4
    Junior Member
    Join Date
    Sep 2010
    Posts
    2
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Re: Web Crawling is too Slow!

    Thank you both. My network is very slow nowadays! I just want to confirm that nothing can be done from the programming side.

  5. #5
    mmm.. coffee JavaPF's Avatar
    Join Date
    May 2008
    Location
    United Kingdom
    Posts
    3,336
    My Mood
    Mellow
    Thanks
    258
    Thanked 294 Times in 227 Posts
    Blog Entries
    4

    Re: Web Crawling is too Slow!

    I don't really think much can be done from the programming side. The code looks OK to me. I think the speed will totally depend on the internet connection and server response time.
    Please use [highlight=Java] code [/highlight] tags when posting your code.

  6. #6
    Super Moderator Json's Avatar
    Join Date
    Jul 2009
    Location
    Warrington, United Kingdom
    Posts
    1,274
    My Mood
    Happy
    Thanks
    70
    Thanked 156 Times in 152 Posts

    Re: Web Crawling is too Slow!

    Another thing which can be a MAJOR factor is the DNS lookup, but from a programming/code perspective there's probably not much you can do without getting into a lower level.
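
    To check whether the DNS lookup is actually a factor, the name resolution can be timed on its own. This is a minimal sketch under some assumptions: DnsTimer and timeLookupMillis are illustrative names, and when the request goes through an HTTP proxy (as in the original post) the target host name is generally resolved by the proxy, so this measurement mostly matters for direct connections.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper: measures how long one host name resolution takes.
class DnsTimer {

    // Resolves the host and returns the elapsed time in milliseconds.
    static long timeLookupMillis(String host) throws UnknownHostException {
        long start = System.nanoTime();
        InetAddress.getByName(host);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws UnknownHostException {
        String host = (args.length > 0) ? args[0] : "localhost";
        // The JVM caches successful lookups, so the second call is
        // usually much faster than the first.
        System.out.println("first lookup:  " + timeLookupMillis(host) + " ms");
        System.out.println("second lookup: " + timeLookupMillis(host) + " ms");
    }
}
```

    If the first lookup takes a noticeable share of the total fetch time, the delay is in name resolution rather than in the download code.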

    // Json
