Hi everyone
I'm currently working on a program that will read in a file, break it up into sentences, check for duplicates (and print them to an outfile) and pull out specific sentences. I was trying to have the sentences saved as a String and then inputting each String into an Array (using ".", "!", etc. as cut off points so the program knows when the sentence is finished). For the duplicates, I was planning on counting the frequency of each sentence and print out the ones that occur more thna once. I was doing it this way as I also have to count the frequency of each word and pring the 100 words that occur more often, so I want to reuse as much code as possible.
This is what I have so far
import java.io.*; import java.util.*; public class four { public static String[] Array = new String[5000]; public static int[] Frequency = new int[500000]; public static void main(String[] args) throws Exception { BufferedReader br = new BufferedReader(new FileReader("X:\\Desktop\\book.txt")); StringBuilder sb = new StringBuilder(); BufferedWriter bw = new BufferedWriter(new FileWriter("X:\\Desktop\\new.txt")); String line = br.readLine(); while (line != null) { sb.append(line + "\n"); line = br.readLine(); } br.close(); String text = sb.toString(); getSentences(text); int nbWords = 0; int Sentences = 0; } public static int getSentences(String word) { int i = 0; String abc = ""; int result = 0; char[] chars = word.toCharArray(); for(Character c : chars) { abc += c; if(c == '.' || c == '!' || c == '?' || c == ']' /*|| c == 'k'*/) { Array[i] = abc; abc=""; } i++; if (Array[i].isFull()==true) { writeline(Array[i]); } for (int n=0; n<Array.length; n++) { for (int j=0; j<Array.length; j++) { if (Array[n]==Array[j]) { Frequency[n]++; } } } for (int p=0; p<Array.length; p++) { if (Frequency [p] > 1) System.out.println(Array[p] = "=" + Frequency[p]); } System.out.println(Array[23] + Array [189] + Array[590] + Array[690] + Array [847]); return result; } } }
Any input anyone could give would be greatly appreciated! I usually use Perl so I'm out of my league with all this Java.