Hello,

I have a few transaction types with different risk scores on a scale from 0 to 10 (the scores and transaction counts are listed in the first code block below).

I want to calculate an overall risk score using a weighted average or a similar method.

However, when I use the number of transactions as the weight, the weights of the riskiest transaction types (type_1 and type_3) become negligible (~0.04% and ~0.08% respectively).

I don't want to underestimate the riskiest transaction types, 1 and 3, so I came up with the idea of weighting by the base-10 logarithm of the number of transactions instead of the raw counts (see the log calculations further down).

With this log transformation and the new weights, the risky types 1 and 3 contribute significantly to the result (19.7% and 20.7%), while types 2 and 4 still contribute the most (29.5% and 30.2%) because of their transaction counts. So everything looks fine with this method.

However, I have difficulty justifying, on statistical grounds, the use of a log transformation in a weighted-average calculation. Does my method make any sense statistically?

Risk score and number of transactions per type:

```
Trans_type_1_risk = 8 and Number_of_trans_type_1 = 1.000.000 (~0.04% of Total_nbr_of_transactions)
Trans_type_2_risk = 4 and Number_of_trans_type_2 = 1.000.000.000 (~40% of Total_nbr_of_transactions)
Trans_type_3_risk = 9 and Number_of_trans_type_3 = 2.000.000 (~0.08% of Total_nbr_of_transactions)
Trans_type_4_risk = 5 and Number_of_trans_type_4 = 1.500.000.000 (~60% of Total_nbr_of_transactions)
```

Weighted average with the number of transactions as weights:

`Overall_risk = (Trans_type_1_risk * Nbr_of_trans_type_1 + Trans_type_2_risk * Nbr_of_trans_type_2 + Trans_type_3_risk * Nbr_of_trans_type_3 + Trans_type_4_risk * Nbr_of_trans_type_4) / Total_nbr_of_trans`
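For reference, here is a minimal Python sketch of this count-weighted average, using the risk scores and transaction counts listed above. The dictionary layout and the helper name `weighted_average` are my own illustrative choices, not part of any existing code.

```python
# Risk score (0-10) and number of transactions per type, as listed in the post.
risk = {"type_1": 8, "type_2": 4, "type_3": 9, "type_4": 5}
count = {
    "type_1": 1_000_000,
    "type_2": 1_000_000_000,
    "type_3": 2_000_000,
    "type_4": 1_500_000_000,
}


def weighted_average(values, weights):
    """Return sum(value * weight) / sum(weight) over matching keys."""
    total_weight = sum(weights.values())
    return sum(values[k] * weights[k] for k in values) / total_weight


# Share of each type in the total number of transactions.
total = sum(count.values())
share = {k: n / total for k, n in count.items()}
for name, s in share.items():
    print(f"{name}: weight = {s:.2%}")

overall_risk = weighted_average(risk, count)
print(f"overall risk = {overall_risk:.3f}")
```

With these counts the result is roughly 4.6, which is almost entirely determined by types 2 and 4. That is the effect I am trying to avoid.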

Log-based weights (base-10 logarithm of the number of transactions):

```
log10(Nbr_of_trans_type_1) = log10(1.000.000) = 6 which is ~19.7% of (6+9+6.3+9.2) so the new Weight_1 becomes 19.7%
log10(Nbr_of_trans_type_2) = log10(1.000.000.000) = 9 which is ~29.5% of (6+9+6.3+9.2) so the new Weight_2 becomes 29.5%
log10(Nbr_of_trans_type_3) = log10(2.000.000) = 6.3 which is ~20.7% of (6+9+6.3+9.2) so the new Weight_3 becomes 20.7%
log10(Nbr_of_trans_type_4) = log10(1.500.000.000) = 9.2 which is ~30.2% of (6+9+6.3+9.2) so the new Weight_4 becomes 30.2%
```

Overall risk with the log-based weights:

`Overall_risk = (8 * 6 + 4 * 9 + 9 * 6.3 + 5 * 9.2) / 30.5 ≈ 6.12`
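The log-weighted calculation can be sketched in Python as follows. This only reproduces the arithmetic of my approach. It assumes base-10 logarithms and normalizes the log counts so that the weights sum to 1. The variable names are illustrative.

```python
import math

# Risk score (0-10) and number of transactions per type, as listed in the post.
risk = {"type_1": 8, "type_2": 4, "type_3": 9, "type_4": 5}
count = {
    "type_1": 1_000_000,
    "type_2": 1_000_000_000,
    "type_3": 2_000_000,
    "type_4": 1_500_000_000,
}

# Replace each raw count by its base-10 logarithm.
log_count = {k: math.log10(n) for k, n in count.items()}

# Normalize so the weights sum to 1.
log_total = sum(log_count.values())
weight = {k: v / log_total for k, v in log_count.items()}

for name, w in weight.items():
    print(f"{name}: log10(count) = {log_count[name]:.2f}, weight = {w:.1%}")

# Weighted average of the risk scores using the log-based weights.
overall_risk = sum(risk[k] * weight[k] for k in risk)
print(f"overall risk = {overall_risk:.3f}")
```

With unrounded logarithms the overall risk comes out at roughly 6.12. Note that the weights depend on the base only through the normalization, so any other base gives the same weights. The result does change if the counts are rescaled (for example, counted in thousands), which may be relevant to whether the method can be justified.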
