I have been trying to implement a parallel Depth First Search in Java for undirected graph. The approach I decided to implement uses one global stack which is shared by all the threads and n local stacks where n is the number of threads. Each thread stores the nodes of its sub-tree in its local stack. Initially the global stack contains the root of the tree and only one thread gets access to it while the other threads are waiting to be woken up by the working thread. The working thread retrieves and processes the root from the global stack, adds one successor to its local stack then pushes the rest of the successors, if they exist, to the global stack to be processed by other threads and wakes up all the waiting threads. All the other threads follow the same approach (i.e. when threads get a node from the global stack they push one successor to their local stack and the rest to the global stack then start accessing their local stack until it becomes empty).
The problem is it doesn't speed up. Will give me any suggestions on how to improve it? Thank you in advance.
(I attached my code)
DFSearch_v2.zip