
Hi All,
I'm not sure how to describe this other than by providing an example, so here goes:
I want a PHP function that will be given an array containing numbers, for example:
array(3.5, 3.6, 3.4, 8.1, 2.5)
Rather than just calculating an average of the numbers in the array, I want it to first filter out the numbers that are much higher or lower than the rest. For example, the function would remove 8.1 from the array, then calculate the average of the numbers that are left (and return the result).
Does anybody have an idea of how to code this?
Thanks,
Richard
Define "much higher or lower".
I'm not sure...
Let's say ±10%.
Then take the average, add and subtract 10%, and remove all numbers that fall outside that range.
Code: 
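$array = array(3.5, 3.6, 3.4, 8.1, 2.5);

// Plain arithmetic mean.
function average($a) {
    return array_sum($a) / count($a);
}

$av = average($array);
$up_av = $av * 1.1;    // 10% above the average
$low_av = $av * 0.9;   // 10% below the average

// Keep only the values strictly inside the +/-10% band.
$new_array = array();
foreach ($array as $num) {
    if (($num < $up_av) && ($num > $low_av)) {
        $new_array[] = $num;
    }
}

// Guard: with this sample data the band is 3.798 to 4.642, so no value
// survives and average() would otherwise divide by zero.
$new_av = count($new_array) ? average($new_array) : null;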

Hm, I'd be tempted to go with the upper and lower quartiles, then discard the outliers. Erm... maybe sort, count and averaging would help there.
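A minimal sketch of the quartile idea, assuming the common "Tukey's fences" rule: anything more than 1.5 × the interquartile range (Q3 − Q1) below Q1 or above Q3 counts as an outlier. The 1.5 multiplier, the linear-interpolation quartile method, and the function names `quantile` and `iqr_average` are illustrative assumptions, not something posted in the thread.

```php
<?php
// Hypothetical helper: value at fraction $p (0..1) of an already sorted array,
// interpolating linearly between neighbouring positions.
function quantile(array $sorted, float $p): float {
    $pos = $p * (count($sorted) - 1);
    $lo = (int) floor($pos);
    $hi = (int) ceil($pos);
    return $sorted[$lo] + ($pos - $lo) * ($sorted[$hi] - $sorted[$lo]);
}

// Hypothetical: drop values outside [Q1 - k*IQR, Q3 + k*IQR], then average the rest.
// Assumes a non-empty array.
function iqr_average(array $values, float $k = 1.5): float {
    sort($values, SORT_NUMERIC);
    $q1 = quantile($values, 0.25);
    $q3 = quantile($values, 0.75);
    $iqr = $q3 - $q1;
    $kept = array_filter($values, function ($v) use ($q1, $q3, $iqr, $k) {
        return $v >= $q1 - $k * $iqr && $v <= $q3 + $k * $iqr;
    });
    if (count($kept) === 0) {
        // Safety net: fall back to the plain mean if nothing survives.
        return array_sum($values) / count($values);
    }
    return array_sum($kept) / count($kept);
}
```

On the original example, array(3.5, 3.6, 3.4, 8.1, 2.5), Q1 is 3.4 and Q3 is 3.6, so the fences sit at 3.1 and 3.9; both 8.1 and 2.5 are dropped and the result is 3.5. Because the fences scale with the spread of the middle half of the data, this rule does not need a fixed percentage.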
RVEC, I'm not sure your function would always work. Let's say I had the values:
array(50, 50, 50, 250)
The average would be 100, so all the values would be outside the ±10% range.
But thanks for trying.
Where does this problem come from?
richard270384 wrote: RVEC, I'm not sure your function would always work. Let's say I had the values:
array(50, 50, 50, 250)
The average would be 100, so all the values would be outside the ±10% range.
But thanks for trying.
True, but you said you wanted ±10%, so I wrote it for that. If you want it to keep at least half the results, I could write it to do that.
Edit: please tell me exactly what numbers you might expect and what the output should be. The more you tell me, the better the script will be.
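One possible reading of "keep at least half the results" is sketched below: sort the values, find the run of ceil(n/2) consecutive values with the smallest range (the tightest-clustered half), and average that run. Which half to keep was never specified in the thread, so this rule and the name `shortest_half_average` are assumptions.

```php
<?php
// Hypothetical: average the "shortest half", i.e. the window of ceil(n/2)
// consecutive sorted values whose highest and lowest differ the least.
// Assumes a non-empty array.
function shortest_half_average(array $values): float {
    sort($values, SORT_NUMERIC);
    $n = count($values);
    $h = (int) ceil($n / 2);   // always keep at least half of the values
    $best = 0;                 // start index of the tightest window so far
    for ($i = 1; $i + $h - 1 < $n; $i++) {
        $range = $values[$i + $h - 1] - $values[$i];
        if ($range < $values[$best + $h - 1] - $values[$best]) {
            $best = $i;
        }
    }
    $window = array_slice($values, $best, $h);
    return array_sum($window) / $h;
}
```

For array(50, 50, 50, 250) this keeps two of the 50s and returns 50, so the case that broke the ±10% version is handled without choosing any percentage.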
It's for my music database site.
The same song can appear on multiple albums and will no doubt run for a different length of time on each album, but the lengths will generally be similar. From time to time, though (for example, on an acoustic version of a song), the song could be substantially longer.
On the song page of my site, I would like at some point in the future to display the approximate length of the song.
If I were to just calculate an average of all the song occurrences on albums, it wouldn't always work because of the occasional "extended length" occurrences.
I want a function that I would submit the track lengths to, which would then filter out the outlying track times and calculate an average of the numbers that are left. The function would return the resulting average.
I don't have any real test data at the moment, but an example could be where the track appears say on 5 albums:
Album 1  Length 5:01 (301 seconds)
Album 2  Length 3:56 (236 seconds)
Album 3  Length 3:52 (232 seconds)
Album 4  Length 3:37 (217 seconds)
Album 5  Length 4:03 (243 seconds)
I would submit the track lengths in seconds, so I would send an array to the function that looks like this:
array(301, 236, 232, 217, 243)
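Since the lengths are shown as m:ss but submitted in seconds, a pair of small conversion helpers may be handy on both sides of the averaging. This is a sketch; `to_seconds` and `to_mmss` are hypothetical names, and the input is assumed to always be in "m:ss" form.

```php
<?php
// Hypothetical: "5:01" -> 301
function to_seconds(string $mmss): int {
    list($m, $s) = explode(':', $mmss);
    return (int) $m * 60 + (int) $s;
}

// Hypothetical: 245.8 -> "4:06" (rounded to the nearest whole second)
function to_mmss(float $seconds): string {
    $total = (int) round($seconds);
    return sprintf('%d:%02d', intdiv($total, 60), $total % 60);
}

$lengths = array_map('to_seconds', array('5:01', '3:56', '3:52', '3:37', '4:03'));
// $lengths is now array(301, 236, 232, 217, 243)
echo to_mmss(array_sum($lengths) / count($lengths)), "\n";   // plain mean, prints 4:06
```

Rounding to whole seconds before formatting keeps the displayed value consistent with how the lengths were entered.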
In this particular example, I would want the 301 to be filtered out by the function before calculating an average. If the 301 is not removed, the average is 246 seconds (about 4:06), which is not very accurate. If the 301 is removed, the average is 232 seconds (3:52), which is a lot more accurate (though it could still be improved).
I think it comes back to statistics, where we talk about medians and things like that, but when I did that stuff at school (a LONG time ago) I didn't think it was useful, so I never paid attention.
There could always be more than one outlying number, and there is no fixed number of occurrences we would want to exclude from the array before calculating the average.
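Following the hint about medians, one sketch is to measure the ±10% band from the median instead of the mean: the median barely moves when one or two long versions are added, so it makes a steadier reference point. Reusing the 10% figure from earlier in the thread, and the names `median` and `median_band_average`, are assumptions for illustration.

```php
<?php
// Median of a non-empty array of numbers.
function median(array $values): float {
    sort($values, SORT_NUMERIC);
    $n = count($values);
    $mid = intdiv($n, 2);
    return $n % 2 ? $values[$mid] : ($values[$mid - 1] + $values[$mid]) / 2;
}

// Hypothetical: keep only values within $tolerance (default 10%) of the median,
// then average them. Any number of outliers on either side can be removed.
function median_band_average(array $values, float $tolerance = 0.10): float {
    $m = median($values);
    $kept = array_filter($values, function ($v) use ($m, $tolerance) {
        return abs($v - $m) <= $tolerance * $m;
    });
    if (count($kept) === 0) {
        return $m;   // e.g. two very different values: fall back to the median
    }
    return array_sum($kept) / count($kept);
}
```

For array(301, 236, 232, 217, 243) the median is 236 and the band runs from 212.4 to 259.6, so only 301 is dropped and the result is 232. For array(50, 50, 50, 250) it returns 50.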
I hope that makes sense. Does it sound too hard? I thought it would be fairly easy for somebody who knew the right functions to use, but maybe I was wrong.
OK, so you've defined the variance as ±10%. Can you also define the point from which that 10% should be measured?
Tricky problem.
What I'd do, is this. It's not exactly what you said, but I think it should have the desired effect.
Sort the values from lowest to highest. If there are more than 2 values, check the difference between the lowest and the highest. If it's more than 20% of the average (roughly 10% either side of it), remove the first and the last value. Repeat.
After this stops (only 1 or 2 values left, or all values approximately the same), just calculate the average.
Or you could do this:
Sort the values from lowest to highest and take the average. If there is more than 1 value, check both the lowest and the highest value. Take the one that's the furthest away from your average, and if it's more than 10% away, remove the value and repeat.
Then calculate the average.
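The second method might look like this as a sketch. The name `peel_average` is hypothetical, and it assumes positive values such as durations, since the 10% limit is taken as a fraction of the current average.

```php
<?php
// Hypothetical: repeatedly drop whichever extreme (lowest or highest) lies
// furthest from the current average, while it is more than $tolerance away.
function peel_average(array $values, float $tolerance = 0.10): float {
    sort($values, SORT_NUMERIC);
    while (count($values) > 1) {
        $avg = array_sum($values) / count($values);
        $low = $avg - $values[0];                     // distance of the lowest value
        $high = $values[count($values) - 1] - $avg;   // distance of the highest value
        if (max($low, $high) <= $tolerance * $avg) {
            break;                                    // both extremes are close enough
        }
        if ($high >= $low) {
            array_pop($values);                       // the highest value is the worst outlier
        } else {
            array_shift($values);                     // the lowest value is the worst outlier
        }
    }
    return array_sum($values) / count($values);
}
```

On array(301, 236, 232, 217, 243) the first pass removes 301, which is 55.2 s above the 245.8 s average; the remaining four average 232 with both extremes within 10%, so the function returns 232. Unlike the first method, it removes only one value per pass, so a single long version does not also cost the shortest one.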
I think Stubru Freak has a good idea there; it would look like this:
Code: 
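// Plain mean of $a, rounded to $b decimal places.
function average($a, $b = 13) {
    $return = array_sum($a) / count($a);
    return round($return, $b);
}

// Stubru Freak's first method: sort, then, while more than 2 values remain
// and the lowest and highest differ by more than 20% of the current
// average, drop both of them. Finally return the rounded average.
function average2($array, $b = 1) {
    sort($array, SORT_NUMERIC);
    $i = count($array) - 1;                    // index of the highest value
    while ($i > 1 && ($array[$i] - $array[0]) > 0.2 * average($array)) {
        $array = array_slice($array, 1, -1);   // remove the lowest and highest
        $i -= 2;
    }
    return average($array, $b);
}

$array = array(3.5, 3.6, 3.4, 8.1, 2.5);
echo average2($array);    // prints 3.5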

You can reuse the last two lines as often as you like to calculate averages. average2() also takes an optional second parameter, $b, giving the number of decimal places to round to (default 1).
Thanks heaps, rvec and Stubru.
That function will do the job. Very much appreciated.
I never thought of doing it that way.
