Do you calculate linearity properly?

Imagine the following situation. You are testing the linearity of an analytical method, let’s say HPLC. The linearity coefficient is R = 0.999, which gives a coefficient of determination of R² ≈ 0.998. An ideal situation. The criteria are met, the validation is successful. Now all that’s left is to write the report and go home, right? Not necessarily. There may be a hidden gremlin in our data that is not visible at first glance. Residual analysis will help us find it.

Residual analysis in linearity determination

Linearity and regression model fit

In 2023, the ICH adopted a revised guideline, ICH Q2(R2), Validation of Analytical Procedures. One of the changes concerned the determination of linearity. The revised guideline takes into account biological methods such as ELISA and cytotoxicity assays. These methods are often based on non-linear calibration models, so linearity in the strict sense cannot be demonstrated for them.

For this reason, linearity has been replaced by regression model fitting in the new ICH Q2. This does not change the fact that the term linearity can still be used for methods that are linear in nature.

How does linear regression work?

To determine the concentration of an unknown sample, a calibration curve is constructed. Several solutions of known concentration are analysed and, based on the results obtained, the signal is plotted against the concentration and a straight line is fitted. Of course, the measured values may not lie exactly on the line. The line is therefore fitted by regression (not interpolation) so that it runs between the measured points. Its course is not arbitrary: it is chosen so that the sum of the squared differences between the measured signals and the line is as small as possible (the least squares method).
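The least squares idea can be sketched in a few lines of Python. This is only an illustration, not the routine used by any particular chromatography software: the helper names `fit_line` and `sum_squared_errors` are made up for this example, and the numbers are the five-point data set used in the residual analysis section of this article, written with decimal points.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    a = sxy / sxx              # slope
    b = y_mean - a * x_mean    # intercept
    return a, b


def sum_squared_errors(x, y, a, b):
    """Sum of squared differences between measured and fitted signals."""
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))


# Data set from this article (decimal points instead of commas).
concentration = [50, 75, 100, 125, 150]
signal = [110.2483571, 159.9308678, 210.3238443, 260.7615149, 309.8829233]

a, b = fit_line(concentration, signal)
best = sum_squared_errors(concentration, signal, a, b)

# Nudging the slope or the intercept away from the fitted values
# always increases the sum of squared errors.
for da, db in [(0.01, 0.0), (-0.01, 0.0), (0.0, 0.5), (0.0, -0.5)]:
    assert sum_squared_errors(concentration, signal, a + da, b + db) > best

print(f"slope a = {a:.6f}, intercept b = {b:.6f}, SSE = {best:.6f}")
```

The loop at the end is the point of the example: the fitted line is the one for which no small change of slope or intercept gives a smaller sum of squares.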

Linearity coefficient and coefficient of determination

When determining linearity and reading results, we may encounter two values: the linearity coefficient (R) and the coefficient of determination (R²). How do they differ?

Linearity coefficient (R) – identical to Pearson’s correlation coefficient. It indicates the strength and direction of the linear relationship between the data. The closer the absolute value is to 1, the stronger the correlation.

Coefficient of determination (R²) – for a straight-line model with an intercept, this is the square of the R value. It indicates the fit of the regression model, i.e. what fraction of the variation in the signal is explained by the concentration.
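As a rough illustration of how the two values are related, the sketch below computes Pearson’s R by hand and squares it. The function name `pearson_r` is chosen for this example only, and the numbers are the five-point data set from the residual analysis section of this article, written with decimal points.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between x and y."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


concentration = [50, 75, 100, 125, 150]
signal = [110.2483571, 159.9308678, 210.3238443, 260.7615149, 309.8829233]

r = pearson_r(concentration, signal)
r_squared = r ** 2  # valid for a straight-line model with an intercept

print(f"R = {r:.6f}, R^2 = {r_squared:.6f}")
```

Because R lies between 0 and 1 for a rising calibration line, R² is always slightly smaller than R. An R of 0.999, for example, corresponds to an R² of about 0.998.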

OK, suppose our data looks perfect: linearity coefficient R = 0.9999, coefficient of determination R² = 0.9998. On paper, the model fits almost perfectly. Perhaps even the accuracy study would not reveal any problems. However, it is still possible that we are using the wrong model, and that with the right one our method would be even more accurate than we think.

Residual analysis – is the model truly linear?

Residual analysis is one of the stages of regression model evaluation. It involves examining the differences (residuals) between the observed (measured) values and the values predicted by the regression model: e = y − ŷ. The residuals (e) represent model errors. Their pattern on a plot of residuals against concentration can indicate whether the model is properly fitted.

Analysing the plot of residuals against concentration can help with the following:

Verification of model correctness
Verification of variance homogeneity (homoscedasticity)
Identification of outliers

To explain residual analysis, we will use the following data set:

Concentration	Signal
50	110,2483571
75	159,9308678
100	210,3238443
125	260,7615149
150	309,8829233

The signal represents the measured values. To calculate the residuals, we first need the expected values, which are calculated from the equation of the regression line, y = ax + b.

Concentration	Signal (y)	Expected values (ŷ)
50	110,2483571	110,2095456
75	159,9308678	160,2195235
100	210,3238443	210,2295015
125	260,7615149	260,2394794
150	309,8829233	310,2494574

Then we determine the residuals as the difference between the measured values and the expected values.

Concentration	Signal (y)	Expected values (ŷ)	Residuals (e)
50	110,2483571	110,2095456	0,038811
75	159,9308678	160,2195235	-0,28866
100	210,3238443	210,2295015	0,094343
125	260,7615149	260,2394794	0,522035
150	309,8829233	310,2494574	-0,36653
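The table above can be reproduced with a short script. This is a minimal sketch assuming an ordinary least squares fit of y = ax + b; the variable names are chosen for this example, and the values are entered with decimal points instead of decimal commas.

```python
concentration = [50, 75, 100, 125, 150]
signal = [110.2483571, 159.9308678, 210.3238443, 260.7615149, 309.8829233]

# Least squares estimates of slope (a) and intercept (b).
n = len(concentration)
x_mean = sum(concentration) / n
y_mean = sum(signal) / n
sxx = sum((x - x_mean) ** 2 for x in concentration)
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(concentration, signal))
a = sxy / sxx
b = y_mean - a * x_mean

# Expected values from the line, then residuals e = y - y_hat.
expected = [a * x + b for x in concentration]
residuals = [y - y_hat for y, y_hat in zip(signal, expected)]

print("Concentration  Signal (y)  Expected  Residual (e)")
for x, y, y_hat, e in zip(concentration, signal, expected, residuals):
    print(f"{x:>13}  {y:>10.4f}  {y_hat:>8.4f}  {e:>12.6f}")
```

For a least squares line with an intercept, the residuals always sum to zero (up to rounding), which is a quick sanity check on a spreadsheet calculation.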

Residual analysis plot (linear regression)

In general:

The residual values should be evenly distributed on both sides of the horizontal line 0.
No clear trend or shape indicates a good fit for the linear model.
Points lying far from 0 compared with the rest can be identified as potential outliers.

Case study

To demonstrate the true power of residual analysis, I will use three data sets.

	Case 1		Case 2		Case 3
Concentration	Signal	Residuals	Signal	Residuals	Signal	Residuals
50	110,2483571	0,0388115	114,9531726	2,345937031	109,80000	-0,440000
75	159,9308678	-0,288655683	161,5658426	-1,031487151	160,40000	0,400000
100	210,3238443	0,094342782	210,1534869	-2,433936904	209,20000	-0,560000
125	260,7615149	0,522035486	261,1561051	-1,421412863	261,20000	1,680000
150	309,8829233	-0,366534085	315,108512	2,540899887	308,20000	-1,080000

The residual plots for these cases are as follows:

What can we conclude from the graphs?

Case 1:

The points are evenly distributed on both sides of the value 0. There is no clear trend in the position of the points. This indicates a good fit for the linear model.

Case 2:

The points form a characteristic parabolic (U-shaped) pattern. This shape indicates that the relationship between signal and concentration is non-linear. In most such cases, the relationship is quadratic (a second-degree function). In this case, the use of a linear model is incorrect despite the high R² value.
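The contrast between a high R² and a clearly patterned residual plot can be checked numerically. The sketch below fits a straight line to the Case 2 signals and prints R² together with the sign of each residual. The helper name `fit_and_score` is made up for this example.

```python
def fit_and_score(x, y):
    """Least squares fit of y = a*x + b; returns (R squared, residuals)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = y_mean - a * x_mean
    residuals = [yi - (a * xi + b) for xi, yi in zip(x, y)]
    ss_res = sum(e ** 2 for e in residuals)       # unexplained variation
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)  # total variation
    return 1 - ss_res / ss_tot, residuals


concentration = [50, 75, 100, 125, 150]
case2_signal = [114.9531726, 161.5658426, 210.1534869, 261.1561051, 315.108512]

r_squared, residuals = fit_and_score(concentration, case2_signal)
signs = ["+" if e > 0 else "-" for e in residuals]

print(f"R^2 = {r_squared:.5f}")
print("residual signs from lowest to highest concentration:", " ".join(signs))
```

The R² stays above 0.999, yet the residuals are positive at both ends of the range and negative in the middle. A sign pattern like this, rather than a random mix of plus and minus, is what the parabolic shape looks like in numbers.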

Case 3:

The residuals move away from 0 as the concentration increases (funnel shape). This indicates heteroscedasticity, i.e. the variance of the signal is not constant across the concentration range. In this case, weighted regression should be used.
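A weighted fit can be sketched as follows. Note the assumptions: the weights 1/x² are just one commonly used choice and are not prescribed by this article or by ICH Q2, the right weighting scheme has to be justified for the specific method, and the function name `fit_weighted_line` is made up for this example.

```python
def fit_weighted_line(x, y, w):
    """Weighted least squares fit of y = a*x + b; returns (a, b)."""
    sw = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    a = sxy / sxx
    b = y_mean - a * x_mean
    return a, b


concentration = [50, 75, 100, 125, 150]
case3_signal = [109.8, 160.4, 209.2, 261.2, 308.2]

# Assumption for illustration: weights of 1/x^2, so that low
# concentrations count for more than high ones.
weights = [1 / x ** 2 for x in concentration]

a_w, b_w = fit_weighted_line(concentration, case3_signal, weights)
# Equal weights reduce the weighted fit to ordinary least squares.
a_o, b_o = fit_weighted_line(concentration, case3_signal, [1.0] * len(concentration))

print(f"ordinary: y = {a_o:.4f}x + {b_o:.4f}")
print(f"weighted: y = {a_w:.4f}x + {b_w:.4f}")
```

With equal weights the function gives the same line as ordinary least squares, so the two results can be compared directly to see how much the weighting shifts the slope and intercept.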

Summary

Based on the above data, it is quite clear that the values of R or R² alone can be misleading. When determining linearity, it is worth taking a closer look at the data and assessing whether the calibration model we have used is actually the best one and is statistically justified.

Unfortunately, in most cases, the software used to operate the equipment and analyse the results will not perform this analysis. It is therefore worth using dedicated statistical tools such as Statistica or Minitab, or simply a spreadsheet, to assess whether the response is actually linear.

Sources:

ICH Q2

pogotowiestatystyczne.pl

sixsigmadsi.com
