# Quantifying Predictive Accuracy: From RMSE and MAE to Correlation Coefficients
In data science, finance, and scientific research, we often need to evaluate how accurate a model's predictions are. Whether forecasting stock prices, sales figures, or temperatures, we need objective, quantitative standards to measure the discrepancy between predicted and actual values. This article examines several core evaluation metrics (MSE, RMSE, MAE, and the Pearson correlation coefficient) and then looks at a finance-specific application: calculating and interpreting these errors in basis points (bp).
I. Definition and Interpretation of Core Metrics
1. MSE (Mean Squared Error)
MSE measures the average of the squares of the differences between predicted and actual values.
- Formula: MSE = (1/n) * Σ(Actual Value - Predicted Value)²
- Interpretation: MSE amplifies the penalty for larger errors through the squaring term. This means a single large error will have a disproportionate impact on the MSE. Its units are the square of the original data's units, which can sometimes be difficult to interpret directly.
2. RMSE (Root Mean Squared Error)
RMSE is the square root of the MSE and is a more commonly used metric than MSE.
- Formula: RMSE = √MSE
- Interpretation: RMSE restores the error to the same units as the original data, making it more interpretable. Because it inherits the squaring property of MSE, it is also sensitive to outliers. On the same data, RMSE is never smaller than MAE; the two are equal only when every error has the same magnitude.
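As a minimal sketch of the two formulas above in plain Python (the function names `mse` and `rmse` and the sample numbers are illustrative assumptions, not taken from any particular library):

```python
import math

def mse(actual, predicted):
    """Mean of the squared differences (actual - predicted)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and equal in length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Square root of MSE, expressed in the units of the original data."""
    return math.sqrt(mse(actual, predicted))

# Illustrative data: the errors are -1, 0, 1 and -4.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 20.0]

print(mse(actual, predicted))             # 4.5
print(round(rmse(actual, predicted), 4))  # 2.1213
```

In this example the single error of -4 contributes 16 of the 18 units of total squared error, which is the amplification effect described above.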
3. MAE (Mean Absolute Error)
MAE measures the average of the absolute differences between predicted and actual values.
- Formula: MAE = (1/n) * Σ|Actual Value - Predicted Value|
- Interpretation: MAE is the average of the absolute errors, which gives a very intuitive result: "on average, how far off is each prediction?" It is less sensitive to outliers than MSE and RMSE.
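The difference in outlier sensitivity can be seen with a small experiment. The sketch below uses made-up data and hypothetical helper names (`mae`, `rmse`); it compares a prediction set where every error is 1 with one where a single prediction misses by 20.

```python
import math

def mae(actual, predicted):
    """Mean of the absolute differences |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root of the mean squared difference."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

actual = [100.0] * 5
clean = [101.0, 99.0, 101.0, 99.0, 101.0]          # every error has size 1
with_outlier = [101.0, 99.0, 101.0, 99.0, 120.0]   # one error of size 20

print(mae(actual, clean), rmse(actual, clean))   # 1.0 1.0
print(mae(actual, with_outlier))                 # 4.8
print(round(rmse(actual, with_outlier), 2))      # 8.99
```

With uniform errors the two metrics coincide. Once the outlier is introduced, MAE rises to 4.8 while RMSE rises to roughly 8.99, so the same bad prediction moves RMSE much further.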
4. Pearson Correlation Coefficient
Unlike the first three metrics, the Pearson correlation coefficient measures the linear relationship between predicted and actual values, not the direct error.
- Formula: r = Cov(Actual, Predicted) / (σ_Actual × σ_Predicted), i.e., the covariance of the two variables divided by the product of their standard deviations.
- Range: Between -1 and 1.
  - 1: Perfect positive correlation.
  - -1: Perfect negative correlation.
  - 0: No linear correlation.
- Interpretation: A high correlation coefficient means the predicted values closely follow the trend of the actual values, but this does not mean the predictions are accurate. Even if all predictions are systematically biased high or low, the correlation coefficient can still be high.
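The point about systematic bias can be checked with a short sketch. The `pearson` function below is a hand-written illustration of the covariance-over-standard-deviations ratio, and the data are invented: every prediction is exactly 10 units too high.

```python
import math

def pearson(x, y):
    """Pearson r: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    # The 1/n factors in the covariance and the standard deviations cancel.
    return cov / (spread_x * spread_y)

actual = [1.0, 2.0, 3.0, 4.0, 5.0]
biased = [a + 10.0 for a in actual]   # every prediction is 10 too high

r = pearson(actual, biased)
bias_mae = sum(abs(a - p) for a, p in zip(actual, biased)) / len(actual)

print(round(r, 6))   # 1.0
print(bias_mae)      # 10.0
```

The correlation is perfect even though every prediction is off by 10, which is why correlation is usually read alongside an error metric rather than in place of one.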
II. Special Application in Finance: Error Calculation in Basis Points (bp)
In finance, especially in forecasting interest rates, bond yields, and credit spreads, minute differences can signify significant risk or return. Therefore, the industry standard is to use Basis Points (bp) as the unit.
- Basis Point Definition: 1 basis point equals 0.01% (one hundredth of a percentage point), i.e., 1 bp = 0.0001 in decimal terms.
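Because rates may be stored either as decimals (0.035) or as percent figures (3.5), the conversion factor depends on the input convention. A minimal sketch, with hypothetical helper names chosen for illustration:

```python
BP_PER_DECIMAL_UNIT = 10_000    # 1 bp = 0.0001 as a decimal fraction
BP_PER_PERCENTAGE_POINT = 100   # 1 bp = 0.01 percentage points

def decimal_to_bp(value):
    """Convert a rate difference expressed as a decimal into bp."""
    return value * BP_PER_DECIMAL_UNIT

def percentage_points_to_bp(value):
    """Convert a rate difference expressed in percentage points into bp."""
    return value * BP_PER_PERCENTAGE_POINT

# A 0.25 percentage-point move (0.0025 as a decimal) is 25 bp either way.
print(round(percentage_points_to_bp(0.25), 6))  # 25.0
print(round(decimal_to_bp(0.0025), 6))          # 25.0
```

Mixing up the two factors shifts every result by a factor of 100, so it is worth making the input convention explicit in function names or comments.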
Converting error metrics to bp greatly enhances the readability and business value of the results. The following concrete example compares calculations based on numerical values and bp.
Scenario: Predicting the yield to maturity of a bond.
- Actual Values: 3.50%, 3.50%
- Predicted Values: 3.45%, 3.60%
| Step | Calculation Item | Value-Based Calculation (Unit: percentage points) | Basis Points Calculation (Unit: bp) | Explanation & Relationship |
|---|---|---|---|---|
| 1 | Individual Error | Error₁ = 0.05; Error₂ = -0.10 | Error₁ = 5 bp; Error₂ = -10 bp | Core Conversion: Error (bp) = Error (percentage points) × 100, or equivalently Error (decimal) × 10,000 |
| 2 | MAE | (0.05 + 0.10) / 2 = 0.075 | (5 + 10) / 2 = 7.5 bp | MAE (bp) = MAE (percentage points) × 100 |
| 3 | MSE | (0.0025 + 0.0100) / 2 = 0.00625 | (25 + 100) / 2 = 62.5 bp² | MSE (bp²) = MSE (percentage points²) × 100² |
| 4 | RMSE | √0.00625 ≈ 0.07906 | √62.5 ≈ 7.906 bp | RMSE (bp) = RMSE (percentage points) × 100 |
Business Interpretation:
- In reports, stating that "our yield prediction model has an average error of 7.5 bp" is more professional and intuitive than saying "the average error is 0.075%," which leaves it unclear whether the figure is an absolute or a relative difference.
- RMSE (bp) and MAE (bp) are the most frequently cited error metrics in financial model validation because their units are intuitive and they account for different error characteristics. MSE (bp²), due to its difficult-to-interpret units, is typically only used as an intermediate step in calculating RMSE.
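The bond-yield scenario can be recomputed directly in basis points. The sketch below assumes yields are quoted in percent, so each error is multiplied by 100 (one percentage point is 100 bp) before aggregating; the variable names are illustrative.

```python
import math

# Yields quoted in percent, taken from the scenario above.
actual_pct = [3.50, 3.50]
predicted_pct = [3.45, 3.60]

# One percentage point is 100 bp, so convert each error before aggregating.
errors_bp = [(a - p) * 100 for a, p in zip(actual_pct, predicted_pct)]

mae_bp = sum(abs(e) for e in errors_bp) / len(errors_bp)
mse_bp2 = sum(e ** 2 for e in errors_bp) / len(errors_bp)
rmse_bp = math.sqrt(mse_bp2)

# Rounding hides floating-point noise from the subtraction.
print([round(e, 4) for e in errors_bp])  # [5.0, -10.0]
print(round(mae_bp, 4))                  # 7.5
print(round(mse_bp2, 4))                 # 62.5
print(round(rmse_bp, 4))                 # 7.9057
```

Converting the individual errors first and aggregating afterwards gives the same MAE and RMSE as scaling the percentage-point results by 100, while MSE scales by 100².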
III. How to Choose the Right Metric?
- Focus on Large Errors: If large mistakes are far more costly than small ones in your business context (e.g., risk control), use RMSE, as it severely penalizes large deviations.
- Treat All Errors Equally: If all errors, regardless of size, should be treated equally (e.g., cost forecasting), MAE is a fairer choice.
- Understand Trend Over Precision: If you want to assess whether the model tracks the direction of data movement rather than absolute precision (e.g., determining stock price trends), the Pearson Correlation Coefficient is a suitable tool. Because it ignores systematic bias, pair it with an error metric whenever accuracy also matters.
- Financial Industry Standard: In scenarios involving interest rates, spreads, etc., be sure to use RMSE (bp) and MAE (bp) as the final reporting metrics.
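The choice between MAE and RMSE can change which model looks better. The following sketch uses two invented models on invented data: Model A is usually close but has one large miss, while Model B is consistently off by a moderate amount.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

actual = [50.0] * 5
model_a = [51.0, 49.0, 51.0, 49.0, 56.0]   # four misses of 1, one miss of 6
model_b = [52.5, 47.5, 52.5, 47.5, 52.5]   # every miss is exactly 2.5

print(mae(actual, model_a), mae(actual, model_b))   # 2.0 2.5
print(round(rmse(actual, model_a), 3))              # 2.828
print(round(rmse(actual, model_b), 3))              # 2.5
```

Model A wins on MAE (2.0 against 2.5) but loses on RMSE (about 2.83 against 2.5). Which ranking matters depends on whether the single large miss is disproportionately costly in the business context.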