Is Your Process Data Spread Smooth or Chunky?
Just the Facts
DOFPro Team
Engineering process data often consist of a dependent (measured) variable recorded as a function of an independent (set) variable, e.g., pressure as a function of temperature, or flow rate as a function of time:
\[y = f(x)\]
Because measurement uncertainty and/or process variability are always present, each measured value deviates from the true value by an error \(\varepsilon_i\):
\[y_i = y_{i,\mathrm{true}} + \varepsilon_i = f(x_i) + \varepsilon_i\]
The question is how to determine \(f(x)\).
The human eye is very good at judging whether a plot of data is a straight line, or nearly one. Several common functions plot as straight lines after a change of coordinates, typically from linear to logarithmic scales:
Original Function | Linear Form | Coordinate Scale for Linear Plot |
---|---|---|
\(y = b x + a\) | \(y = b x + a\) | \(x\) – linear, \(y\) – linear |
\(y = ae^{bx}\) | \(\ln y = \ln a + b x\) | \(x\) – linear, \(y\) – log (semi-log) |
\(y = a \cdot 10^{bx}\) | \(\log_{10} y = \log_{10} a + b x\) | \(x\) – linear, \(y\) – log (semi-log) |
\(y = ax^b\) | \(\log y = \log a + b\log x\) | \(x\) – log, \(y\) – log (log-log) |
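As a minimal sketch of how the table is used, assuming Python and a plain ordinary least-squares line fit: transform the data into the listed coordinates, fit a straight line, then back-transform the intercept to recover \(a\). The helper names `fit_line` and `fit_linearized` and the form labels (`"linear"`, `"exp"`, `"exp10"`, `"power"`) are hypothetical, chosen for illustration only.

```python
import math


def fit_line(u, v):
    """Ordinary least-squares slope and intercept for v = slope * u + intercept."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    slope = s_uv / s_uu
    return slope, v_bar - slope * u_bar


def fit_linearized(form, x, y):
    """Fit one of the table's forms with a straight line in transformed coordinates.

    Returns (a, b). Log forms require all y (and, for "power", all x) to be positive.
    """
    if form == "linear":    # y = b x + a, linear axes
        b, a = fit_line(x, y)
    elif form == "exp":     # ln y = ln a + b x, semi-log axes
        b, ln_a = fit_line(x, [math.log(v) for v in y])
        a = math.exp(ln_a)
    elif form == "exp10":   # log10 y = log10 a + b x, semi-log axes
        b, log_a = fit_line(x, [math.log10(v) for v in y])
        a = 10 ** log_a
    elif form == "power":   # ln y = ln a + b ln x, log-log axes
        b, ln_a = fit_line([math.log(u) for u in x], [math.log(v) for v in y])
        a = math.exp(ln_a)
    else:
        raise ValueError(f"unknown form: {form!r}")
    return a, b


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [2.0 * math.exp(0.5 * x) for x in xs]  # noise-free synthetic data
    print(fit_linearized("exp", xs, ys))  # approximately (2.0, 0.5)
```

With noise-free data the back-transformed parameters are recovered to rounding error; with real data the fit is only as good as the straight-line fit in the transformed coordinates.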
Strictly speaking, a linear function is one that satisfies
\(f(c_1 x_1 + c_2 x_2) = c_1 f(x_1) + c_2 f(x_2)\) for all \(c_1\) and \(c_2\).
For a straight line, \(y=bx + a\), so in general
\(f(c_1 x_1 + c_2 x_2) \ne c_1 f(x_1) + c_2 f(x_2)\) unless \(a \equiv 0\).
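Expanding both sides for \(f(x) = bx + a\) shows where the mismatch comes from:
\[
\begin{aligned}
f(c_1 x_1 + c_2 x_2) &= b\,(c_1 x_1 + c_2 x_2) + a \\
c_1 f(x_1) + c_2 f(x_2) &= b\,(c_1 x_1 + c_2 x_2) + (c_1 + c_2)\,a
\end{aligned}
\]
The two sides differ by \((1 - c_1 - c_2)\,a\), which vanishes for every choice of \(c_1\) and \(c_2\) only when \(a = 0\).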
\(x\) | \(y\) |
---|---|
1 | 9.44 |
2 | 13.24 |
3 | 16.32 |
4 | 19.77 |
5 | 22.08 |
6 | 24.15 |
7 | 27.52 |
8 | 30.99 |
9 | 33.33 |
10 | 37.83 |
\(x\) | \(y\) |
---|---|
1 | 1.38E+02 |
2 | 2.74E+03 |
3 | 5.42E+04 |
4 | 1.09E+06 |
5 | 2.38E+07 |
6 | 4.71E+08 |
7 | 9.39E+09 |
8 | 1.86E+11 |
9 | 3.74E+12 |
10 | 7.12E+13 |
\(x\) | \(y\) |
---|---|
1 | 6.706 |
2 | 75.22 |
3 | 146.2 |
4 | 504.7 |
5 | 868.2 |
6 | 1456 |
7 | 2349 |
8 | 3910 |
9 | 5377 |
10 | 7336 |
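A minimal sketch, assuming Python and assuming the three tables are meant to be read, in order, as a straight line, an exponential, and a power function, fits each data set with a straight line on the axes that should straighten it. `fit_line` is a hypothetical helper name; these are ordinary least-squares fits in transformed coordinates, so they carry the transformed-error caveat.

```python
import math


def fit_line(u, v):
    """Ordinary least-squares slope and intercept for v = slope * u + intercept."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uu = sum((ui - u_bar) ** 2 for ui in u)
    s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    slope = s_uv / s_uu
    return slope, v_bar - slope * u_bar


x = list(range(1, 11))
y1 = [9.44, 13.24, 16.32, 19.77, 22.08, 24.15, 27.52, 30.99, 33.33, 37.83]
y2 = [1.38e2, 2.74e3, 5.42e4, 1.09e6, 2.38e7, 4.71e8, 9.39e9, 1.86e11, 3.74e12, 7.12e13]
y3 = [6.706, 75.22, 146.2, 504.7, 868.2, 1456, 2349, 3910, 5377, 7336]

# First table, linear axes: y = b x + a
b1, a1 = fit_line(x, y1)

# Second table, semi-log axes: ln y = ln a + b x
b2, ln_a2 = fit_line(x, [math.log(v) for v in y2])
a2 = math.exp(ln_a2)

# Third table, log-log axes: ln y = ln a + b ln x
b3, ln_a3 = fit_line([math.log(u) for u in x], [math.log(v) for v in y3])
a3 = math.exp(ln_a3)

for label, a, b in [("linear", a1, b1), ("semi-log", a2, b2), ("log-log", a3, b3)]:
    print(f"{label:8s}  a = {a:.4g}  b = {b:.4f}")
```

With these three tables, all three fitted slopes come out close to 3.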
A subtlety of fitting a transformed equation is that you have also transformed the errors, so least squares no longer minimizes the errors of the original function. In other words, fitting
\(\log y = \log a +b\log x\)
is different from fitting
\(y = ax^b\)
With a spreadsheet or math software, you can instead fit the original function directly by nonlinear least squares, which avoids that issue.
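A first-order expansion, assuming each error is small compared with \(f(x_i)\) and all values are positive, shows what taking logarithms does to the errors in \(y_i = f(x_i) + \varepsilon_i\):
\[
\ln y_i = \ln\!\left(f(x_i) + \varepsilon_i\right) = \ln f(x_i) + \ln\!\left(1 + \frac{\varepsilon_i}{f(x_i)}\right) \approx \ln f(x_i) + \frac{\varepsilon_i}{f(x_i)}
\]
So a straight-line fit in log coordinates approximately minimizes \(\sum \left(\varepsilon_i / f(x_i)\right)^2\), a sum of squared relative errors, whereas fitting \(y = ax^b\) directly minimizes \(\sum \varepsilon_i^2\). Points with small \(y\) therefore carry relatively more weight in the transformed fit.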
For the above function and your data set, you would guess values for \(a_\mathrm{fit}\) and \(b_\mathrm{fit}\), then calculate for each data pair
\[y_{i,\mathrm{fit}} = a_\mathrm{fit} x_i^{b_\mathrm{fit}}\]
\[\varepsilon_i = y_i - y_{i,\mathrm{fit}}\]
\[\mathrm{SSE} = \sum_i \varepsilon_i^2\]
Then, by hook or by crook (or by using Solver), you would minimize SSE by varying \(a_\mathrm{fit}\) and \(b_\mathrm{fit}\).
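The procedure above can also be sketched in Python instead of Solver. This is a minimal sketch under stated assumptions: it fits the third data table to \(y = ax^b\), and rather than varying \(a_\mathrm{fit}\) and \(b_\mathrm{fit}\) together it uses a swapped-in shortcut. For a fixed \(b\) the model is linear in \(a\), so the SSE-minimizing value has the closed form \(a = \sum y_i x_i^b / \sum x_i^{2b}\), and a golden-section search over \(b\) does the rest. It assumes SSE has a single minimum for \(b\) between 0.5 and 6; the helper names are hypothetical.

```python
import math

# Third data table from the text, assumed here to follow y = a x^b.
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [6.706, 75.22, 146.2, 504.7, 868.2, 1456, 2349, 3910, 5377, 7336]


def sse(a, b, xs=X, ys=Y):
    """Sum of squared errors on the original axes, with eps_i = y_i - a * x_i**b."""
    return sum((y - a * x ** b) ** 2 for x, y in zip(xs, ys))


def best_a(b, xs=X, ys=Y):
    """For fixed b, SSE is quadratic in a; setting dSSE/da = 0 gives a closed form."""
    num = sum(y * x ** b for x, y in zip(xs, ys))
    den = sum(x ** (2 * b) for x in xs)
    return num / den


def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # about 0.618
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc < fd:  # minimum lies in [lo, d]; old c becomes the new d
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:        # minimum lies in [c, hi]; old d becomes the new c
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2


def fit_power_nls(xs=X, ys=Y, b_lo=0.5, b_hi=6.0):
    """Nonlinear least squares for y = a x^b, minimizing SSE on untransformed axes."""
    def profile(b):
        # SSE with the best a for this trial b, leaving a 1-D search over b
        return sse(best_a(b, xs, ys), b, xs, ys)

    b_fit = golden_section_min(profile, b_lo, b_hi)
    return best_a(b_fit, xs, ys), b_fit


if __name__ == "__main__":
    a_fit, b_fit = fit_power_nls()
    print(f"a_fit = {a_fit:.4g}, b_fit = {b_fit:.4f}, SSE = {sse(a_fit, b_fit):.4g}")
```

Because this fit minimizes SSE on the original axes, its SSE can be no larger than that of the \(a\) and \(b\) read off a log-log fit of the same data, even though the two parameter sets generally differ.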
Thanks for watching!
The Full Story companion video is linked in the upper left, and the next video in the series is linked in the upper right. To learn more about Chemical and Thermal Processes, visit the website linked in the description to find the previous and following videos in this series.
The DOFPro Team