Multithreading is new to me, and to practice I decided to make my web crawler a bit faster.
I reimplemented it with an ExecutorService thread pool and an ArrayBlockingQueue that holds the links still to be crawled, but the crawl runs at the same rate as the previous single-threaded version, even with 10 threads.
I'm not yet aware of all the pitfalls that come with multithreading, so any help would be great.
Here is a shortened version of my code:

    // For reference (not actual code)
    ArrayBlockingQueue<String> links = new ArrayBlockingQueue<String>(10000);
    ExecutorService executor = Executors.newFixedThreadPool(10);
    TreeSet<String> foundLinks = new TreeSet<String>();

    // Actual code
    public void actionPerformed(ActionEvent event) {
        links.clear();
        foundLinks.clear();
        links.add(addressField.getText());
        foundLinks.add(addressField.getText());
        int activeThreads = 0;
        while (!links.isEmpty()
                || (activeThreads = ((ThreadPoolExecutor) executor).getActiveCount()) != 0) {
            System.out.println(activeThreads);
            try {
                executor.submit(new LinkSearch(links.take()));
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

Here is what a LinkSearch is:

    class LinkSearch implements Runnable {
        String link;

        public LinkSearch(String link) {
            this.link = link;
        }

        public void run() {
            searchLinks(link);
        }
    }

    // The method searchLinks(String link) is below:
    public void searchLinks(String link) {
        try {
            URL url = new URL(link);
            String src = getSource(url);
            src = src.toLowerCase();
            int index = 0;
            while ((index = src.indexOf("<a", index)) != -1) {
                if ((index = src.indexOf("href", index)) == -1)
                    break;
                if ((index = src.indexOf("=", index)) == -1)
                    break;
                index++;
                String remaining = src.substring(index);
                StringTokenizer st = new StringTokenizer(remaining, "\t\n\r\">#");
                String strLink = st.nextToken();
                URL urlLink;
                try {
                    urlLink = new URL(url, strLink);
                    strLink = urlLink.toString();
                } catch (MalformedURLException e) {
                    continue;
                }
                if (urlLink.getProtocol().compareTo("http") != 0)
                    break;
                if (isHTML(strLink)) {
                    if (strLink.contains(addressField.getText())
                            && linkNotFound(strLink)
                            && exists(strLink)) {
                        System.out.println(strLink);
                        links.add(strLink);
                        addValidLink(strLink);
                        pageOrganizer.add(new ButtonLink(strLink));
                        addSource(src);
                        repaint();
                    }
                }
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
            return;
        }
    }

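For anyone who wants to reproduce the dispatch pattern without the Swing and network code, a minimal self-contained sketch might look like the following. Everything site-specific in it is an assumption made for illustration: `CrawlSketch`, the in-memory `SITE` map, and `fetchLinks` (a stand-in for `getSource` plus parsing that simulates latency with `Thread.sleep`) are hypothetical names, not part of the real crawler. It also deviates from the code above in three labeled ways: it uses a concurrent set instead of `TreeSet`, `poll` with a timeout instead of `take`, and a `pending` counter instead of `getActiveCount()` to decide when the crawl is finished.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CrawlSketch {
    // Hypothetical in-memory "site": page -> links found on that page.
    static final Map<String, List<String>> SITE = Map.of(
            "/", List.of("/a", "/b", "/c"),
            "/a", List.of("/b", "/d"),
            "/b", List.of("/", "/e"),
            "/c", List.of(),
            "/d", List.of("/e"),
            "/e", List.of());

    // Stand-in for getSource(url) plus parsing; the sleep simulates network latency.
    static List<String> fetchLinks(String page) throws InterruptedException {
        Thread.sleep(50);
        return SITE.getOrDefault(page, List.of());
    }

    static Set<String> crawl(String start, int threads) throws InterruptedException {
        BlockingQueue<String> links = new ArrayBlockingQueue<>(10000);
        Set<String> found = ConcurrentHashMap.newKeySet(); // safe for concurrent add()
        AtomicInteger pending = new AtomicInteger();       // pages queued or in progress
        ExecutorService executor = Executors.newFixedThreadPool(threads);

        found.add(start);
        pending.incrementAndGet();
        links.add(start);

        while (pending.get() > 0) {
            // poll with a timeout (not take) so the loop can re-check pending
            String next = links.poll(10, TimeUnit.MILLISECONDS);
            if (next == null) {
                continue;
            }
            executor.submit(() -> {
                try {
                    for (String link : fetchLinks(next)) {
                        if (found.add(link)) {          // true only on first sighting
                            pending.incrementAndGet();  // count the child before queuing it
                            links.add(link);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    pending.decrementAndGet();          // this page is finished
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(5, TimeUnit.SECONDS);
        return found;
    }

    public static void main(String[] args) throws InterruptedException {
        long t0 = System.nanoTime();
        Set<String> found = crawl("/", 10);
        long ms = (System.nanoTime() - t0) / 1_000_000;
        System.out.println(found.size() + " pages in " + ms + " ms");
    }
}
```

Changing the thread count passed to `crawl` and comparing the printed times gives a rough check of whether the dispatch loop itself scales. If this sketch speeds up with more threads while the real crawler does not, the bottleneck is more likely in the helper methods (for example `getSource` or `exists`) than in the pool and queue setup.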
Hopefully it is all understandable/readable.