OK, this is somewhat hard to explain, so bear with me. I have an array of objects that each contain a Name and a Number. The Names are dates (month and year). The Numbers need to be averaged, but subject to some conditions. For each month (January through December) I need to average two numbers together. However, not every month of every year is present, not all months have data, the two most recent years must be the ones averaged, and I can't include values that would give misleading averages (unless those are my only options).
My array objects will look something like this:
Apr-06 2,476
May-06 2,201
Jun-06 1,783
Jul-06 2,048
Aug-06 1,557
Sep-06 1,533
Oct-06 2,614
Nov-06 2,804
Dec-06 2,951
Jan-07 3,644
Feb-07 3,250
Mar-07 3,279
Apr-07 3,007
May-07 3,273
Jun-07 2,340
Jul-07 2,276
Aug-07 1,819
Sep-07 1,519
Oct-07 1,921
Nov-07 1,983
Dec-07 2,200
Jan-08 2,398
Feb-08 2,604
Mar-08 2,664
Apr-08 1,930
May-08 1,316
Jun-08 1,105
Jul-08 1,090
Aug-08 593
Sep-08
Oct-08
Nov-08
Dec-08
Jan-09
Feb-09
Mar-09
Apr-09
May-09
Jun-09
Jul-09
Aug-09
Sep-09
Oct-09
Nov-09
Dec-09 827
Jan-10 1,539
Feb-10 1,607
Mar-10 1,823
So for this array, I would want to do the following averages:
Jan: Jan-10 and Jan-08
Feb: Feb-10 and Feb-08
Mar: Mar-10 and Mar-08
Apr: Apr-08 and Apr-07
May: May-08 and May-07
Jun: Jun-08 and Jun-07
Jul: Jul-08 and Jul-07
Aug: Aug-07 and Aug-06
Sep: Sep-07 and Sep-06
Oct: Oct-07 and Oct-06
Nov: Nov-07 and Nov-06
Dec: Dec-07 and Dec-06
Notice that I don't want to average in Aug-08 and Dec-09, because they would make the averages misleading (they are well below the trend of the rest of the data). Also, I don't want to average in Sep-08 through Nov-09, because those months have no data.
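To make the selection rule above concrete, here is a minimal Python sketch. It assumes the data is a list of `(label, value)` pairs with `None` for months without data, that two-digit years mean 20xx, and that the set of misleading labels is already known. The names `monthly_averages` and `excluded` are made up for illustration.

```python
from collections import defaultdict

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def monthly_averages(series, excluded=frozenset()):
    """Average the two most recent usable values for each calendar month.

    series   -- list of (label, value) pairs, e.g. ("Apr-06", 2476);
                value is None when the month has no data (assumed shape)
    excluded -- labels already judged misleading (hypothetical input)
    """
    by_month = defaultdict(list)
    for label, value in series:
        if value is None:
            continue  # month is present but has no data
        month, yy = label.split("-")
        by_month[month].append((2000 + int(yy), value, label in excluded))

    result = {}
    for month in MONTHS:
        entries = sorted(by_month.get(month, []), reverse=True)  # newest first
        trusted = [v for _, v, bad in entries if not bad]
        # Fall back to the misleading values only when there are
        # fewer than two trusted ones ("unless those are my only options").
        pool = trusted if len(trusted) >= 2 else [v for _, v, _ in entries]
        picked = pool[:2]
        if picked:
            result[month] = sum(picked) / len(picked)
    return result
```

With the August values above and `excluded={"Aug-08"}`, this averages Aug-07 and Aug-06. What it does not do is decide which labels belong in `excluded`, which is the part I am stuck on.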
Skipping the months that are missing and the months without data is not much of a hassle. But I can't think of a way to determine whether a value is too small to include in the averaging. This is because each data series can have medians below 1000, above 7000, or possibly higher or lower still. Also, just because a value is below a certain point for the series doesn't mean that it will make the averages misleading. For instance, looking at the data, January's values are usually much higher than September's. We don't want January's high values to cause September's values to be excluded for being too low.
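One naive starting point, sketched in Python under stated assumptions, is to compare each value only against the other years' values for the same calendar month, so January never influences September. The function name `flag_low_values` and the cutoff `min_ratio=0.4` are assumptions made for illustration; the cutoff in particular is a guess that happens to suit the sample data and is not a derived constant.

```python
from collections import defaultdict
from statistics import median

def flag_low_values(series, min_ratio=0.4):
    """Return labels whose value is far below the same month in other years.

    series    -- list of (label, value) pairs; value is None when missing
    min_ratio -- assumed cutoff: flag a value below min_ratio times the
                 median of the other years' values for that month
    """
    by_month = defaultdict(list)
    for label, value in series:
        if value is not None:
            month = label.split("-")[0]
            by_month[month].append((label, value))

    flagged = set()
    for entries in by_month.values():
        for label, value in entries:
            others = [v for other, v in entries if other != label]
            # With fewer than two other years there is nothing
            # reliable to compare against, so leave the value alone.
            if len(others) >= 2 and value < min_ratio * median(others):
                flagged.add(label)
    return flagged
```

On the August values above, Aug-08 (593) is compared with the median of 1,557 and 1,819, which is 1,688, and falls below 40% of it, so it is flagged. With only two or three years per month this is a crude test, and a fixed ratio may not suit every series.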
Can anyone help me think of a process to determine whether a value is inconsistent with the other data for the same month?