Salam. My final year project is a Java web crawler and I'm confused about how to start. Please help me. Which algorithm is best for a crawler?
Consider this course. It teaches the basics of programming in Python, using a web crawler as the example.
But Python is a different language, and I have to build the web crawler in Java. I can't work out where to start.
Doesn't matter. They will show you an algorithm and explain it, so that you know where to start. A crawler is a trivial project that can be implemented in just one simple function, and if you know the material you were supposed to learn this year, you'll definitely be able to do it.
OK, thanks sir. Can you send me a useful link where I can get useful information, or tell me which algorithm is best for the crawler?
One of my university teachers said to use a specific website that I want to access data from, but other teachers said that's not possible. And I just want relevant data, not other stuff from the other websites.
Relevance can be achieved by applying a filtering function during graph traversal: treat pages as nodes and links as edges, and only follow the links that the filter accepts.
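To make that concrete, here is a minimal sketch of a breadth-first traversal that only follows links accepted by a filter. The class name FilteredTraversal, the sameHost rule and the example.com URLs are made up for illustration, and the link graph is an in-memory map rather than real pages, so treat it as one possible shape rather than the way to do it.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Predicate;

class FilteredTraversal {

    // Breadth-first traversal over a link graph that only follows
    // URLs accepted by the filter.
    static List<String> traverse(String start,
                                 Map<String, List<String>> links,
                                 Predicate<String> filter) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String url = queue.remove();
            order.add(url);
            for (String next : links.getOrDefault(url, List.of())) {
                // seen.add returns false when the URL was already queued or visited
                if (filter.test(next) && seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        return order;
    }

    // Hypothetical relevance rule: stay on a single host.
    static Predicate<String> sameHost(String host) {
        return url -> {
            try {
                return host.equalsIgnoreCase(URI.create(url).getHost());
            } catch (IllegalArgumentException e) {
                return false; // malformed URL: treat as not relevant
            }
        };
    }

    public static void main(String[] args) {
        // Made-up link graph standing in for real pages.
        Map<String, List<String>> links = Map.of(
            "http://example.com/", List.of("http://example.com/a", "http://other.org/x"),
            "http://example.com/a", List.of("http://example.com/", "http://example.com/b"));
        System.out.println(traverse("http://example.com/", links, sameHost("example.com")));
    }
}
```

The filter is just a Predicate, so a keyword or path check can be swapped in for the host check without touching the traversal.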
Hi,
You can write a crawler in Java with the help of existing APIs. You can use the algorithm below to start your Java crawler final year project:

while (list of unvisited URLs is not empty) {
    take URL from list
    fetch content
    record whatever it is you want to about the content
    if content is HTML {
        parse out URLs from links
        foreach URL {
            if it matches your rules
               and it's not already in either the visited or unvisited list
                add it to the unvisited list
        }
    }
}
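The loop above could be sketched in Java roughly as follows. This is only an illustration under some assumptions: the class name SimpleCrawler, the fetcher parameter and the maxPages limit are invented here, the fetcher is passed in as a function so the example runs without a network (a real one might wrap java.net.http.HttpClient), and links are pulled out with a naive regular expression that only sees double-quoted absolute URLs. A proper HTML parser would be more robust.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleCrawler {

    // Naive link extractor: only matches double-quoted absolute http(s) URLs.
    private static final Pattern HREF =
        Pattern.compile("href=\"(http[^\"]+)\"", Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // fetcher returns the page content, or null if the page is unavailable.
    // rules decides which discovered URLs are worth visiting.
    static Map<String, String> crawl(String seed,
                                     Function<String, String> fetcher,
                                     Predicate<String> rules,
                                     int maxPages) {
        Map<String, String> recorded = new LinkedHashMap<>(); // url -> content
        Set<String> known = new HashSet<>();                  // visited + unvisited
        Queue<String> unvisited = new ArrayDeque<>();
        known.add(seed);
        unvisited.add(seed);

        while (!unvisited.isEmpty() && recorded.size() < maxPages) {
            String url = unvisited.remove();          // take URL from list
            String content = fetcher.apply(url);      // fetch content
            if (content == null) {
                continue;
            }
            recorded.put(url, content);               // record what you want
            // No explicit "is HTML" check: non-HTML text simply yields no links.
            for (String link : extractLinks(content)) {
                if (rules.test(link) && known.add(link)) {
                    unvisited.add(link);
                }
            }
        }
        return recorded;
    }

    public static void main(String[] args) {
        // Made-up pages standing in for the web.
        Map<String, String> pages = Map.of(
            "http://a.test/", "<a href=\"http://a.test/1\">one</a> <a href=\"http://b.test/\">b</a>",
            "http://a.test/1", "<a href=\"http://a.test/\">home</a>");
        Map<String, String> result =
            crawl("http://a.test/", pages::get, u -> u.startsWith("http://a.test/"), 10);
        System.out.println(result.keySet());
    }
}
```

A single set of known URLs covers both the "visited" and "unvisited" checks, since a URL enters it the moment it is queued. The page limit is there so a sketch like this cannot run forever on a real site.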
Please, please, please, don't resurrect dead threads. Check the date the thread was started AND the date of the last post. If they're not within the past 2 or 3 weeks, the thread isn't relevant anymore.
Closing this one.