
Do you calculate linearity properly?

Imagine the following situation. You are testing the linearity of an analytical method, let’s say HPLC. The correlation coefficient R and the coefficient of determination R² both come out above 0.999. An ideal situation: the criteria are met and the validation is successful. Now all that’s left is to write the report and go home. Not so fast. There may be a hidden gremlin in our data that is not visible at first glance, and residual analysis will help us find it.

Residual analysis in linearity determination

Linearity and regression model fit

In 2023, the ICH published a revised guideline, ICH Q2(R2), Validation of Analytical Procedures. One of the changes concerned the determination of linearity. The new guideline takes into account biological methods such as ELISA and cytotoxicity tests. These methods are often based on non-linear calibration models, so linearity in the strict sense cannot be demonstrated for them.

For this reason, linearity has been replaced by regression model fitting in the new ICH Q2. This does not change the fact that the term linearity can still be used for methods that are linear in nature.

How does linear regression work?

To determine a concentration, a calibration curve is constructed. Several solutions of known concentration are analysed and, based on the results, the signal is plotted against the concentration. The measured points will rarely lie exactly on a straight line, so a regression line is fitted that runs between them. Its course is not arbitrary: it is chosen so that the sum of the squared differences between the measured signals and the line is as small as possible (the least-squares method).
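Written out, the least-squares idea looks as follows. This is only a sketch in assumed notation: x_i is the concentration and y_i the signal of the i-th calibration solution, n is the number of solutions, and a and b are the slope and intercept of the line y = ax + b.

```latex
\min_{a,\,b} \sum_{i=1}^{n} \bigl( y_i - (a x_i + b) \bigr)^2
\quad\Longrightarrow\quad
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x}
```

Here \bar{x} and \bar{y} are the mean concentration and the mean signal, so the fitted line always passes through the point (\bar{x}, \bar{y}).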

Linearity coefficient and coefficient of determination

When determining linearity and reading results, we may encounter two values: the linearity coefficient (R), more commonly called the correlation coefficient, and the coefficient of determination (R²). How do they differ?

Linearity coefficient (R) – identical to Pearson’s correlation coefficient. It indicates how strong the linear relationship between the data is. The closer its absolute value is to 1, the stronger the correlation.

Coefficient of determination (R²) – For a straight-line fit, this is the square of the R value. It indicates how well the regression model fits, i.e. what fraction of the variation in the signal is explained by the concentration.
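The relationship between the two values can be checked with a few lines of Python. This is a minimal sketch, not a validated tool: the function names `pearson_r` and `r_squared` and the five data points are invented for illustration. It computes R from the Pearson formula and R² from the residuals of the fitted line, and shows that for a straight-line fit R² equals R squared.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line y = a*x + b."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Invented illustration data (not taken from the article)
conc = [1.0, 2.0, 3.0, 4.0, 5.0]
signal = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(conc, signal)
r2 = r_squared(conc, signal)
print(f"R = {r:.5f}, R^2 = {r2:.5f}, R*R = {r * r:.5f}")
```

One consequence: whenever R is below 1, R² is slightly smaller than R, so R = 0.999 corresponds to R² of about 0.998.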

Suppose our data look perfect: both R and R² are above 0.999. On paper, the model fits almost perfectly, and perhaps even the accuracy study would not reveal any problems. However, it is still possible that we are using the wrong calibration model, and that with the right one our method would be even more accurate than we think.

Residual analysis – is the model truly linear?

Residual analysis is one of the stages of regression model evaluation. It involves examining the differences (residuals) between the observed (measured) values and the values predicted by the regression model. The residuals (e) represent the model errors. Their pattern on a plot of residuals against concentration can indicate whether the model is properly fitted.

Analysis of the graph showing the relationship between concentration and residual values can help to determine the following:

  1. Verification of model correctness
  2. Verification of variance homogeneity (homoscedasticity)
  3. Identification of outliers

To explain residual analysis, we will use the following data set:

Concentration    Signal
50               110.2483571
75               159.9308678
100              210.3238443
125              260.7615149
150              309.8829233
Linear calibration curve

The signal represents the measured values. To calculate the residuals, we first need the expected values, which are read from the equation of the fitted line, y = ax + b.

Concentration    Signal (y)     Expected values (ŷ)
50               110.2483571    110.2095456
75               159.9308678    160.2195235
100              210.3238443    210.2295015
125              260.7615149    260.2394794
150              309.8829233    310.2494574
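The expected values can be reproduced with an ordinary least-squares fit. The sketch below is one way to do it in plain Python; the helper name `fit_line` is made up for this example, and the data are the five calibration points from the article.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    a = sxy / sxx            # slope
    b = mean_y - a * mean_x  # intercept
    return a, b

# Calibration data from the article
concentration = [50, 75, 100, 125, 150]
signal = [110.2483571, 159.9308678, 210.3238443, 260.7615149, 309.8829233]

a, b = fit_line(concentration, signal)
expected = [a * x + b for x in concentration]

print(f"slope a = {a:.6f}, intercept b = {b:.6f}")
for x, y_hat in zip(concentration, expected):
    print(f"{x:>5}  {y_hat:.7f}")
```

A spreadsheet gives the same numbers with its built-in slope and intercept functions; the code only makes the arithmetic explicit.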

Then we calculate the residuals as the differences between the measured values and the expected values, e = y − ŷ.

Concentration    Signal (y)     Expected values (ŷ)    Residuals (e)
50               110.2483571    110.2095456             0.038811
75               159.9308678    160.2195235            -0.28866
100              210.3238443    210.2295015             0.094343
125              260.7615149    260.2394794             0.522035
150              309.8829233    310.2494574            -0.36653
Residual analysis plot (linear regression)
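The residual column is a plain subtraction, which the short sketch below repeats using the signal and expected values listed in the article. The variable names are chosen for this example only. It also prints the sum of the residuals, which for a least-squares line with an intercept should be zero apart from rounding.

```python
# Measured and expected values from the article's table
concentration = [50, 75, 100, 125, 150]
signal = [110.2483571, 159.9308678, 210.3238443, 260.7615149, 309.8829233]
expected = [110.2095456, 160.2195235, 210.2295015, 260.2394794, 310.2494574]

# Residual e = y - y_hat for every calibration level
residuals = [y - y_hat for y, y_hat in zip(signal, expected)]

for x, e in zip(concentration, residuals):
    side = "above" if e > 0 else "below"
    print(f"{x:>5}  e = {e:+.6f}  ({side} the line)")

# Should be close to zero; a large value would suggest a calculation error
print(f"sum of residuals = {sum(residuals):+.7f}")
```

Plotting `residuals` against `concentration` in any charting tool gives the residual plot discussed here.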

In general:

  • The residuals should be scattered evenly on both sides of the horizontal zero line.
  • The absence of a clear trend or shape indicates a good fit for the linear model.
  • Outliers show up as points lying far from the rest.

Case study

To demonstrate the true power of residual analysis, I will use three data sets.

                 Case 1                        Case 2                        Case 3
Concentration    Signal         Residuals      Signal         Residuals      Signal       Residuals
50               110.2483571     0.0388115     114.9531726     2.345937031   109.80000    -0.440000
75               159.9308678    -0.288655683   161.5658426    -1.031487151   160.40000     0.400000
100              210.3238443     0.094342782   210.1534869    -2.433936904   209.20000    -0.560000
125              260.7615149     0.522035486   261.1561051    -1.421412863   261.20000     1.680000
150              309.8829233    -0.366534085   315.108512      2.540899887   308.20000    -1.080000
Linear regression (case 1)
Linear regression (case 2)
Linear regression (case 3)
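To see how similar the three cases look when judged by R² alone, the following sketch fits a straight line to each data set from the article and reports R² together with the residuals. The helper names `fit_line` and `analyse` are invented for this example.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def analyse(x, y):
    """Return (R^2, residuals) for a straight-line fit to the data."""
    a, b = fit_line(x, y)
    residuals = [yi - (a * xi + b) for xi, yi in zip(x, y)]
    mean_y = sum(y) / len(y)
    ss_res = sum(e ** 2 for e in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot, residuals

concentration = [50, 75, 100, 125, 150]
cases = {
    "Case 1": [110.2483571, 159.9308678, 210.3238443, 260.7615149, 309.8829233],
    "Case 2": [114.9531726, 161.5658426, 210.1534869, 261.1561051, 315.108512],
    "Case 3": [109.8, 160.4, 209.2, 261.2, 308.2],
}

results = {name: analyse(concentration, y) for name, y in cases.items()}
for name, (r2, residuals) in results.items():
    print(f"{name}: R^2 = {r2:.5f}, residuals = {[round(e, 3) for e in residuals]}")
```

On these data all three fits give R² above 0.999, so a typical acceptance criterion would pass every case. The differences only become visible in the residuals.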

The residual plots for these cases are as follows:

Residual analysis (linear)
Residual analysis (polynomial)
Residual analysis (heteroscedasticity)

What can we conclude from the graphs?

Case 1:

The points are evenly distributed on both sides of the value 0. There is no clear trend in the position of the points. This indicates a good fit for the linear model.

Case 2:

The points form a characteristic parabolic shape. This shape indicates that the relationship between signal and concentration is non-linear; most often it is quadratic (a second-degree polynomial). In this case, the use of a linear model is incorrect despite the high R² value.
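One way to follow up on a parabolic residual pattern is to fit a second-degree polynomial and compare the residual sum of squares with that of the straight line. The sketch below does this for the Case 2 data in plain Python. The helper names `solve` and `polyfit_ss_res` are made up for this example, and the choice of a quadratic model is an assumption suggested by the shape of the plot, not something the software decides.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol

def polyfit_ss_res(x, y, degree):
    """Least-squares polynomial fit; returns the residual sum of squares."""
    mean_x = sum(x) / len(x)
    u = [float(xi - mean_x) for xi in x]  # centring keeps the equations well conditioned
    size = degree + 1
    normal = [[sum(ui ** (i + j) for ui in u) for j in range(size)] for i in range(size)]
    rhs = [sum(yi * ui ** i for ui, yi in zip(u, y)) for i in range(size)]
    coeffs = solve(normal, rhs)
    fitted = [sum(c * ui ** k for k, c in enumerate(coeffs)) for ui in u]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Case 2 data from the article
concentration = [50, 75, 100, 125, 150]
case2 = [114.9531726, 161.5658426, 210.1534869, 261.1561051, 315.108512]

ss_linear = polyfit_ss_res(concentration, case2, 1)
ss_quadratic = polyfit_ss_res(concentration, case2, 2)
print(f"SS_res linear    = {ss_linear:.4f}")
print(f"SS_res quadratic = {ss_quadratic:.4f}")
```

A higher-degree polynomial never fits worse than a lower-degree one, so a smaller sum of squares is not proof on its own. Here the drop is very large and agrees with the parabolic residual pattern, which together support the quadratic model.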

Case 3:

The residuals move further from 0 as the concentration increases (funnel shape). This indicates heteroscedasticity, i.e. the scatter of the signal is not constant across the range. In this case, weighted regression should be used.
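A weighted fit can be sketched as follows. The article does not say which weights to use; the 1/x² weighting below is an assumption, chosen because it is one common option in calibration work, and the helper name `fit_weighted_line` is invented for this example. With equal weights the function reduces to the ordinary least-squares fit.

```python
def fit_weighted_line(x, y, weights):
    """Weighted least-squares fit of y = a*x + b; returns (a, b)."""
    sw = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / sw  # weighted means
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxy = sum(w * (xi - mean_x) * (yi - mean_y) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Case 3 data from the article
concentration = [50, 75, 100, 125, 150]
case3 = [109.8, 160.4, 209.2, 261.2, 308.2]

# Equal weights: identical to the ordinary (unweighted) fit
a_ols, b_ols = fit_weighted_line(concentration, case3, [1.0] * 5)

# Assumed weighting scheme 1/x^2: low concentrations count for more
weights = [1.0 / x ** 2 for x in concentration]
a_wls, b_wls = fit_weighted_line(concentration, case3, weights)

print(f"unweighted: a = {a_ols:.4f}, b = {b_ols:.4f}")
print(f"weighted:   a = {a_wls:.4f}, b = {b_wls:.4f}")
```

The weighting reduces the influence of the noisier high-concentration points on the slope and intercept. Which weighting is appropriate for a given method should be justified from its own data, for example from replicate measurements at each level.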

Summary

Based on the above data, it is quite clear that the values of R or R² can be misleading. When determining linearity, it is worth taking a closer look at the data and assessing whether the calibration model we have used is actually the best one and is statistically justified.

Unfortunately, in most cases, the software used to operate the equipment and analyse the results will not perform this analysis. It is therefore worth using dedicated statistical tools such as Statistica or Minitab, or simply a spreadsheet, to assess whether the response is actually linear.

Sources:

ICH Q2

pogotowiestatystyczne.pl

sixsigmadsi.com
