Prediction Using Fitted Functions Can Be Very Wrong

I have made several posts about sea levels and salt marsh in Kirby le Soken.

This post was prompted by a report about an earlier attempt to recreate salt marsh by breaching a sea wall at Tollesbury (google Tollesbury_Final_report_2008.pdf), and in particular by these three graphs of sedimentation rates.

[Figures 2.2, 2.3 and 2.4 from the Tollesbury report: sedimentation graphs]

As these are graphs of sedimentation rates – what is sedimentation?

The water around the coast of East Anglia is murky as there is lots of mud floating in it. Some of this mud may settle on the ground, or salt marsh. The rate at which it is deposited is called the sedimentation rate.

Sedimentation causes salt marsh to rise vertically and is crucial for the salt marsh to survive when sea levels are rising. For the salt marsh to stay above sea level (apart from high water at spring tides), the rate of sedimentation has to be at least as great as the rate at which the sea level is rising. When sea levels rise faster than the sedimentation rate, the salt marsh will eventually be submerged and die.

So it’s to be expected that a report on recreating salt marsh contains a discussion and graphs of sedimentation rate.

What caught my eye was that whoever wrote the report chose to fit a straight line to the first two graphs and this quadratic equation to the third:

 y = -14.85 + 0.07650 times day - 0.000006 times day^2

The part of the equation "+ 0.07650 times day" means that the line on the graph will start by going upwards, and the part "- 0.000006 times day^2" means that it will eventually turn downwards and fall back to zero.
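As a rough check on where the curve turns, here is a small Python sketch that takes the three coefficients exactly as printed above (they may well be rounded in the report) and works out the day on which the quadratic peaks and the day on which it falls back to zero. The variable names are my own.

```python
import math

# Coefficients as printed in the fitted equation (possibly rounded in the report).
# y = c + b * day + a * day^2
a, b, c = -0.000006, 0.07650, -14.85

# The curve peaks where its slope, b + 2 * a * day, is zero.
peak_day = -b / (2 * a)
peak_value = c + b * peak_day + a * peak_day**2

# The curve crosses zero at the two roots of the quadratic.
root_term = math.sqrt(b**2 - 4 * a * c)
first_zero, last_zero = sorted([(-b + root_term) / (2 * a),
                                (-b - root_term) / (2 * a)])

print(f"peak on day {peak_day:.0f} (about {peak_day / 365.25:.1f} years)")
print(f"back to zero on day {last_zero:.0f} (about {last_zero / 365.25:.1f} years)")
```

With the coefficients as printed, the peak comes at about day 6,375 (roughly 17 years) and the curve returns to zero at about day 12,550 (roughly 34 years).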

I wondered if this was what the author had intended to say.

By the way, when we say "fit an equation" we mean letting the computer juggle the numbers around until it finds the values that give the smallest total distance from the data points to the line as it goes past them. (Strictly, most fitting programs add up the squares of the vertical distances, which is why the method is called least squares.)

So we might have started with  y = 1 + 2 times day - 3 times day^2 and the computer tries different values instead of 1, 2 and 3 until it gets the values which give the shortest overall distance – the best fit.
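Here is a minimal sketch of that idea in Python. The data points are made up for illustration and are not taken from the report. It uses NumPy's polyfit, which chooses the values that minimise the sum of the squared vertical distances, and compares the result with the arbitrary starting guess of 1, 2 and -3.

```python
import numpy as np

# Made-up example data, NOT taken from the Tollesbury report.
day = np.array([0.0, 100.0, 200.0, 300.0, 400.0, 500.0])
y = np.array([1.0, 8.5, 14.0, 19.5, 22.0, 25.5])

def total_squared_distance(coeffs, day, y):
    """Sum of squared vertical distances from the data points to the curve."""
    c0, c1, c2 = coeffs
    predicted = c0 + c1 * day + c2 * day**2
    return float(np.sum((y - predicted) ** 2))

# An arbitrary starting guess, like the "1, 2 and 3" in the text.
guess = (1.0, 2.0, -3.0)

# np.polyfit returns the highest power first, so reverse to get (c0, c1, c2).
best = tuple(np.polyfit(day, y, deg=2)[::-1])

print("guess:   ", total_squared_distance(guess, day, y))
print("best fit:", total_squared_distance(best, day, y))
```

The best fit should give a far smaller total than the guess, and nudging any of the three best-fit values in either direction should make the total larger again.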

It is important to realise that fitting an equation to some data only really gives us information over the range of the data. In particular, making predictions outside this range can give very misleading results, unless there is a strong reason to believe that a particular equation should hold over a wider range.

Perhaps some examples will help:

  • The stock market goes up for a few weeks or months and people assume it will go up for ever.
  • We have a few hot years in a row and people assume that it will go on getting hotter for ever.
  • There is a run of gloomy economic news and people assume that things will go on getting worse and worse.

An advantage of living by the sea (or playing with a yoyo) is you get used to the idea of things going up and down.

Here are the graphs of 6 different types of functions.  I have used a different range of x values to best show the features of each function.

[Figure: graphs of six different types of function]

I am sure you would agree they all look different, but they all start from the point (0,0) and go upwards. Here is a graph with all 6 functions on it, where x only goes from 0 to 1.

[Figure: all six functions plotted for x from 0 to 1]

Although you can see 6 different lines, the different shapes are not nearly so obvious. Also, at the bottom left hand side of the graph (x=0 to x=0.2) the lines of all 6 functions are pretty much the same. How easy is it to look at this part of the graph and decide which line corresponds to which of the different graphs above?

[Figure: all six functions plotted for x from 0 to 0.001]

This graph, where x only goes up to 0.001, still has all 6 functions on it but, to me at least, it looks like they are all drawing the same line.
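The same effect can be seen in numbers rather than pictures. The sketch below uses six functions of my own choosing (they are an assumption and may not be the same six as in the graphs), each starting at (0,0) and going upwards, and measures how far apart they are at x = 1, x = 0.2 and x = 0.001.

```python
import math

# Six hypothetical functions; the ones in the original graphs may differ.
functions = {
    "straight line": lambda x: x,
    "quadratic": lambda x: x + x**2,
    "exponential": lambda x: math.exp(x) - 1.0,
    "log": lambda x: math.log(1.0 + x),
    "sine": lambda x: math.sin(x),
    "tanh": lambda x: math.tanh(x),
}

def relative_spread(x):
    """Gap between the largest and smallest value at x, as a fraction of x."""
    values = [f(x) for f in functions.values()]
    return (max(values) - min(values)) / x

for x in (1.0, 0.2, 0.001):
    print(f"x = {x}: spread is {100 * relative_spread(x):.2f}% of x")
```

The closer x gets to zero, the smaller the gap between the functions becomes relative to the values themselves, which is why they look like a single line on the last graph.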

Hopefully this has shown (or at least given an indication) that being able to fit a function to some data does not give you any certainty that the function will describe the data outside the range you have fitted. Perhaps the data is better described by a different function which behaves the same over the range you have fitted but very differently outside this range.
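To make this concrete, here is a hedged sketch with invented numbers. It assumes the "true" behaviour is a log-shaped curve that keeps rising ever more slowly (the curve, its scale and the 1000 day observation window are all my assumptions, not the report's data), fits a quadratic to the observed window only, and then compares the two well outside that window.

```python
import numpy as np

# Hypothetical "true" process: growth that keeps slowing but never turns down.
def true_curve(day):
    return 50.0 * np.log1p(day / 1000.0)

# Pretend we only observed the first 1000 days.
observed_days = np.linspace(0.0, 1000.0, 21)
observed_y = true_curve(observed_days)

# Fit a quadratic to the observed range only.
quadratic = np.poly1d(np.polyfit(observed_days, observed_y, deg=2))

# Worst disagreement between the fit and the data inside the fitted range.
inside_error = float(np.max(np.abs(quadratic(observed_days) - observed_y)))
print(f"largest error inside the fitted range: {inside_error:.2f}")

# Compare the two curves at the edge of, and well outside, the fitted range.
for day in (1000.0, 3000.0, 10000.0):
    print(f"day {day:>6.0f}: true {float(true_curve(day)):7.1f}, "
          f"quadratic {float(quadratic(day)):8.1f}")
```

Inside the observed window the quadratic stays close to the data, so the fit looks perfectly respectable. Far outside it the quadratic turns over and goes negative while the curve it was fitted to is still rising.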

Returning to the sedimentation graphs, the data points seem to turn down on all three graphs: a little in fig 2.2, a bit more in fig 2.3 and even more in fig 2.4. But even in fig 2.4, should you be confident that the sedimentation rate will eventually drop to zero? Isn’t it just as likely that the sedimentation rate slows towards a constant level? This sort of shape is described by a “log” function; an example is the middle row of the right hand column in the graphs of 6 functions.

