Hello, I had an online test at a company and didn't manage to complete it within the time limit they gave me. This was the test:
Problem:
1) You have a bunch of log files in:
C:\CloudShareCodeChallenge\Challenge
Sample data with expected results (below) in C:\CloudShareCodeChallenge\Sample
2) Each file may contain two types of records, user records and payment records;
you must process all files
3) User records are in the form of UR,<User Identifier>,<First name>,<Last name>
User records are guaranteed to be unique
4) Payment records are in the form of PR,<Payment Identifier>,<User Identifier>,<Amount>
There is a bug in the payments processor, and the same payment MAY appear more than once
(hence *double booking*)
You should only take into account one payment per payment id.
5) Important notes:
You may assume all records are correct (parsing-wise): exactly one comma between fields, no decimal
point in the amount, and no missing fields, so you may use split etc. for that.
Files are not ordered, so depending on how you read the files you may get a payment record
before the corresponding user record
Each file name contains a two-letter field, e.g. records_M2_2TFA7.log; only payment records starting
with those two characters will be in the file.
Payments with the same prefix may span more than one file.
All log files must be processed
Keep it clean and simple
If you feel you must add a comment, rewrite the code to be self-explanatory instead.
All the clues/notes above are important
6) Your tasks:
Provide the list of the top 10 paying users: Full name and amount
Provide the most common and least common first names with count
Note: There may be more than one name
Provide the number of *double booking* payments:
Either in thousandths of a percent of the total number of payments, to 3 decimal places:
for 5% you would show 50.000;
for 0.2345% you would show 2.345.
Or as a fraction: 12,3435 / 100,000,000
Bonus Questions:
How would you write the generator for the above problem?
What would you take into consideration, and what would be the challenges?
Sample data processing results:
Payments Ratio = 45.898 (235 / 5120)
Most common name(s): Hugh: 2
Most common name(s): Julio: 2
Most common name(s): Nelson: 2
Most common name(s): Sofia: 2
Most common name(s): Zelma: 2
Least common name(s): Annabelle: 1
Least common name(s): Chandra: 1
Least common name(s): Clayton: 1
Least common name(s): Cody: 1
Least common name(s): Gay: 1
Least common name(s): Guy: 1
Least common name(s): Jerri: 1
Least common name(s): Jessie: 1
Least common name(s): Julianne: 1
Least common name(s): Serena: 1
Top paying users:
Nelson Sensabaugh : 42669
Hugh Maginnis : 39375
Annabelle Glade : 30252
Jerri Bartee : 26393
Chandra Bottorff : 21221
Julianne Deller : 19624
Zelma Lubinsky : 17051
Nelson Caicedo : 16191
Sofia Lucena : 13779
Guy Gratton : 12150
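One thing I did manage to verify afterwards is the arithmetic in the sample output: 235 duplicate payments out of 5,120 total payments gives 235 / 5120 × 1000 ≈ 45.898, so the "Payments Ratio" seems to be (duplicates / total payments) × 1000, printed to three decimal places. A minimal check in Java (the variable names are mine):

```java
public class RatioCheck {
    public static void main(String[] args) {
        long duplicates = 235;     // counts taken from the sample output above
        long totalPayments = 5120;
        double ratio = duplicates * 1000.0 / totalPayments;
        System.out.printf("Payments Ratio = %.3f (%d / %d)%n",
                ratio, duplicates, totalPayments);
        // prints: Payments Ratio = 45.898 (235 / 5120)
    }
}
```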
There were 545 files, each with approximately 260,000 lines. A line can be a payment record, which looks like this: PR,U3VT2406XY5AC2,I8MKX,53 (of course the payment identifier, user identifier, and amount differ from line to line).
Or it can be a user record, like this one: UR,B7R1W,Nelson,Hornyak (again, the user identifier, first name, and last name differ from line to line).
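After the test was over I tried to sketch the core pass over the files. This is only my after-the-fact guess (the class and field names are mine), assuming that keeping every distinct payment id in a HashSet is acceptable:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Stream;

public class LogProcessor {
    private final Set<String> seenPaymentIds = new HashSet<>();      // for de-duplication
    private final Map<String, Long> totalByUserId = new HashMap<>(); // user id -> sum paid
    private final Map<String, String> fullNameByUserId = new HashMap<>();
    private long totalPayments;
    private long duplicatePayments;

    public void processDirectory(Path dir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.log")) {
            for (Path file : files) {
                try (Stream<String> lines = Files.lines(file)) {
                    lines.forEach(this::processLine);
                }
            }
        }
    }

    private void processLine(String line) {
        String[] f = line.split(",");
        if (f[0].equals("UR")) {
            // UR,<user id>,<first name>,<last name>
            fullNameByUserId.put(f[1], f[2] + " " + f[3]);
        } else {
            // PR,<payment id>,<user id>,<amount>
            totalPayments++;
            if (seenPaymentIds.add(f[1])) {
                totalByUserId.merge(f[2], Long.parseLong(f[3]), Long::sum);
            } else {
                duplicatePayments++; // same payment id seen again: a *double booking*
            }
        }
    }

    public static void main(String[] args) throws IOException {
        LogProcessor processor = new LogProcessor();
        processor.processDirectory(Paths.get("C:\\CloudShareCodeChallenge\\Challenge"));
        System.out.println(processor.duplicatePayments + " / " + processor.totalPayments);
    }
}
```

Because a payment record may arrive before its user record, nothing is resolved to names until all files have been read.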
During the test itself they gave me an Eclipse environment and said to do it in Java (regular Java programming), and I didn't know what to use: do I make a HashMap with more than 100 million entries? Do I use something else entirely? Which algorithms and data structures fit here, and how? I just graduated, so unfortunately I didn't pass, but I really want to understand how to do this properly.
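My main remaining doubt was memory, but I don't think a single 100-million-entry map is actually needed: the user map holds one entry per user, and only the HashSet of distinct payment ids could get huge (545 files × ~260,000 lines ≈ 140 million records). I suspect the note about the two-letter prefix is the real hint: since each file only contains payment ids with one prefix, and a prefix may span several files, you could process the files grouped by prefix and clear the seen-ids set between groups, keeping memory bounded. The reporting itself is then cheap; here is my sketch of it, building on the maps from the snippet above (the method names are mine, and I'm assuming first names contain no spaces):

```java
import java.util.Collections;
import java.util.Map;
import java.util.stream.Collectors;

public class Reports {
    // Top 10 paying users: sort per-user totals descending, print "<full name> : <amount>".
    static void printTopPayers(Map<String, Long> totalByUserId,
                               Map<String, String> fullNameByUserId) {
        System.out.println("Top paying users:");
        totalByUserId.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.println(
                        fullNameByUserId.get(e.getKey()) + " : " + e.getValue()));
    }

    // Most and least common first names with counts; all ties are printed,
    // matching the "there may be more than one name" note.
    static void printNameExtremes(Map<String, String> fullNameByUserId) {
        Map<String, Long> countByFirstName = fullNameByUserId.values().stream()
                .collect(Collectors.groupingBy(n -> n.split(" ")[0], Collectors.counting()));
        long max = Collections.max(countByFirstName.values());
        long min = Collections.min(countByFirstName.values());
        countByFirstName.forEach((name, count) -> {
            if (count == max) System.out.println("Most common name(s): " + name + ": " + count);
        });
        countByFirstName.forEach((name, count) -> {
            if (count == min) System.out.println("Least common name(s): " + name + ": " + count);
        });
    }
}
```

Is this roughly the kind of approach the test was looking for, or is there a smarter way?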