Comparing CMIP5 & observations

[Given recent interest in previous comparisons of CMIP5 simulations and observations of global mean surface air temperature, this is now a permanent page which will be incrementally updated as more data accumulates]

The figure below shows a comparison of CMIP5 simulations & observations of global mean surface air temperature, using a 1986-2005 reference period, and is an updated version of Figure 11.25 from IPCC AR5. The HadCRUT4.4 observations are shown in black with their 5-95% uncertainty. Several other observational datasets are shown in blue.

The grey shading shows the CMIP5 5-95% range for historical (pre-2005) & all future forcing pathways (RCPs, post-2005); the grey lines show the min-max range. The red hatching indicates the IPCC AR5 assessed likely (>66%) range for the 2016-2035 period. The UK Met Office forecast for 2015 is shown by the green error bar.

There are several possible explanations for why the observations are at the lower end of the CMIP5 range. First, internal climate variability can cause temperatures to temporarily rise faster or slower than expected. Second, the radiative forcings used after 2005 come from the RCPs rather than from observations; given that there have been some small volcanic eruptions and a dip in solar activity since then, this has likely caused some of the apparent discrepancy. Third, the real world may have a climate sensitivity towards the lower end of the CMIP5 range. Finally, the exact position of the observations within the CMIP5 range depends slightly on the reference period chosen. A combination of some of these factors is likely responsible.
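The reference-period point can be illustrated numerically. The sketch below uses an entirely synthetic annual series (an invented linear trend plus noise, not real HadCRUT4 or CMIP5 values) to show that re-baselining shifts a series by a constant offset, which changes where it appears to sit within a model envelope:

```python
import numpy as np

# Illustrative only: synthetic annual anomalies, not real observational data.
rng = np.random.default_rng(0)
years = np.arange(1961, 2015)
trend = 0.015 * (years - years[0])                  # ~0.15 C/decade warming
series = trend + rng.normal(0.0, 0.1, years.size)   # add some "internal variability"

def rebaseline(values, years, start, end):
    """Express a time series as anomalies relative to a chosen reference period."""
    mask = (years >= start) & (years <= end)
    return values - values[mask].mean()

a_8605 = rebaseline(series, years, 1986, 2005)  # 1986-2005 baseline (as in the figure)
a_6190 = rebaseline(series, years, 1961, 1990)  # older WMO-style baseline

# The two versions differ only by a constant vertical offset, so the apparent
# position of observations relative to a model envelope depends on the baseline.
offset = (a_6190 - a_8605).mean()
print(round(offset, 3))
```

With a warming trend, the 1961-1990 mean is lower than the 1986-2005 mean, so the earlier baseline shifts the whole series upward by roughly the trend times the gap between the two period midpoints.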

Note also that, as the HadCRUT4.4 dataset has gaps over the Arctic, it is likely to slightly underestimate the true recent global temperature anomaly. And, in this version of the figure, the CMIP5 simulations are NOT masked to the HadCRUT4.4 observational coverage, unlike some previous examples on this blog. Finally, recent work has established that using air temperatures from the models produces faster warming than a blended combination of air & sea temperatures, as used in the observational estimates.

Updated version of IPCC AR5 Figure 11.25 with HadCRUT4.4 (black) global temperature time-series. The CMIP5 model projections are shown relative to 1986-2005 (light grey). The red hatching is the IPCC AR5 assessed likely range for global temperatures in the 2016-2035 period. The blue lines represent other observational datasets (Cowtan & Way, NASA GISTEMP, NOAA NCDC, ERA-Interim, BEST).

2nd September 2015: Updated figure to use HadCRUT4.4 up to July 2015 and added link to Cowtan et al. (2015).
5th June 2015: Updated using data from HadCRUT4.3 up to April 2015, and the new NOAA dataset.
2nd February 2015: Cowtan & Way 2014 data added
26th January 2015: Entire page updated for 2014 temperatures
27th January 2014: Page created.

33 thoughts on “Comparing CMIP5 & observations”

  1. Your update of Fig 11.25 should use the Cowtan and Way uncertainty envelope as this data series is now widely preferred to HadCRUT4 for the evaluation of recent temperature trends, for obvious reasons. HadCRUT4 should be shown as a dotted line without the uncertainty envelope, to emphasize its deprecation.

    1. Hi Deep Climate,

      I don’t think the Cowtan and Way uncertainty envelope covers all the uncertainties that the HadCRUT4 envelope covers.

      They have a reduced coverage uncertainty due to the kriging, but as far as I know (happy to be corrected, I know their data set is in a state of continual improvement), they don’t propagate all the uncertainties in the HadCRUT4 gridded data through their analysis so they’re likely to underestimate the total uncertainty.


      1. Would downloading the global mean from KNMI also give the uncertainty envelopes? I did download KNMI and calculated mean and envelope for RCP4.5. But I had to download all model series to do that. Or am I missing something here?

        Perhaps you could post the consolidated CMIP5 data as you provided to Schmidt et al. Or better yet just post the data as charted (i.e. CMIP5 model mean with uncertainty envelopes, assessed likely range).

        Also, AR4 has a very good feature where data to reproduce each chart was made available. Is that available for AR5?

  2. Hi Deep Climate,

    The plotted 5-95% future ranges are here.

    The red hatched box corners are:
    [2016: 0.133 0.503]
    [2035: 0.267 1.097]

    I don’t think there is such a data feature for AR5. Where is the AR4 one?


    1. I misremembered a bit – it’s not for all figures.

      However AR4 WG1 Fig 10.5 (scenario projections) can be reproduced from GCM model means found here:

      I found these easier to work with than KNMI, because they are exactly as used in AR4 (can’t get KNMI to match up exactly for various reasons), and also are already merged with 20th century hindcasts.

      I don’t think a similar page for AR5 GCM mean output exists yet.

    2. Oh and many thanks for that – I’ll use it to replace the RCP4.5 I have.

      I would like the historical envelope too please (at least from say 1980 so as to cover the 1986-2005 baseline period).

      Nice to have would be min-max and an ensemble mean or median, but don’t bother if you don’t have these easily at hand. Thanks again!

  3. Ed,
    Please stick to using HadCRUT4. Cowtan & Way are SkS activists and it is far from clear that their GMST time series is to be preferred to HadCRUT4 even over shortish timescales. And their co-authorship of a recent paper (Cawley et al. 2015) that multiplied by a factor it should have divided by, thereby wrongly strengthening their argument that TCR had been underestimated in another study, does not inspire confidence in the reliability and impartiality of their temperature dataset.

    For a full spatial coverage comparison, I think using NCDC MLOST and UAH and RSS TLT data alongside HadCRUT4 would be preferable. Ideally one would compare the TLT data with model projections having a similar vertical weighting profile – do you know if any such data has been produced for CMIP5 models, at grid-cell or global resolution?

    1. Cowtan & Way are SkS activists
      So what? You’re associated with the Global Warming Policy Foundation. Do you think people should discount what you do because of that? I don’t, but it appears that you think they should.

      And their co-authorship of a recent paper (Cawley et al. 2015) that multiplied by a factor it should have divided by,
      Oh come, that’s pathetic. It was a stupid mistake. For someone who works in this field, it should be patently obvious that Loehle’s calculation was complete and utter nonsense. That Cawley et al. made a silly mistake when discussing what would happen if Loehle had used all external forcings, rather than CO2 only, doesn’t change that Loehle’s calculation was garbage. That you would focus on Cawley et al.’s silly mistake, while ignoring Loehle’s calculation completely, does you no favours. Maybe people shouldn’t ignore your work because of your association with the GWPF, but should ignore it because of your very obvious and explicit biases. Maybe consider that before harping on about the impartiality of others.

    2. Nic Lewis,
      I could hardly be described as an activist. You’re welcome to go through all my postings at SkS and point out instances where I have pushed a particular policy agenda. I am an Inuk researcher from northern Canada who has devoted his life to studying the cryosphere, with a side interest in some climate-related items – certainly not an activist. I write for Skeptical Science on issues related to the Polar Regions and my aim is to help better inform people on these areas that are often discussed in the popular media.

      Secondly, and more importantly, you have yet to demonstrate any issues with Cowtan and Way (2014) and yet you repeatedly berate it. You have been asked multiple times to defend your views but have not put forward any credible technical reasoning. Now you have resorted to attacking the credibility of the authors because you are not able to attack the work from a scientific perspective.

      You’re welcome to your opinions, of course, but you will be asked to defend them when they’re not based on fact. Once again – I am very open to having a discussion with you on the technical merits of the CW2014 record. Will you avoid the topic – as you have each and every time I asked for such a discussion?

    3. “For a full spatial coverage comparison, I think using NCDC MLOST and UAH and RSS TLT data alongside HadCRUT4 would be preferable. Ideally one would compare the TLT data with model projections having a similar vertical weighting profile – do you know if any such data has been produced for CMIP5 models, at grid-cell or global resolution?”

      Well Nic, NCDC’s MLOST is actually more susceptible to coverage bias than HadCRUT4 because its coverage in the northern regions is even poorer. You can see that very clearly in this document:

      Aside from having less coverage, there is also evidence that the GHCN automated homogenization algorithm is downweighting several Arctic stations because they’re warming rapidly in contrast with the boreal cooling over Eurasia. This has been verified by some of the people at GHCN who we contacted on the issue.

      As for the remote sensing datasets – I am unsure which record is preferable, but when you look at the disagreement between the RSS, UAH and STAR groups it is clear that more effort is needed to reconcile these differences. Secondly, the satellite series will undoubtedly miss some of the Arctic warming because it is most intense near the surface as a result of ice feedbacks. This is something that analyses such as that of Simmons and Poli (2014) have picked up, and it is also an area where the use of UAH in our Hybrid approach could potentially underestimate some warming.

      All these issues aside – using CW2014 or BEST with your approach to climate sensitivity raises the numbers by ~10% and you have yet to provide a technical argument why these two records should be excluded.

  4. “That you would focus on Cawley et al.’s silly mistake, while ignoring Loehle’s calculation completely, does you no favours.”

    Far from ignoring the shortcomings in Loehle’s method, I wrote in my original comment on Cawley et al 2015 at The Blackboard:

    “Some of the points Cawley et al. make seem valid criticisms of the paper that it is in response to – Loehle (2014): A minimal model for estimating climate sensitivity (LS14; paywalled). I’m not very keen on the LS14 model for global temperature changes over the instrumental period, on Cawley et al.’s revised version thereof, or on their alternative “minimal” model. They are all cycle-based curve-fitting approaches, without what I would regard as a properly justified physical basis.”

    The shortcomings in Loehle’s model have absolutely no relevance to my comment here, as you must surely realise.


    That all five authors of Cawley et al could overlook such a basic and gross error is very worrying. A combination of confirmation bias and carelessness seems the most likely explanation to me. As I wrote, that does not inspire confidence in the Cowtan & Way temperature dataset, whatever its merits may be.

    I don’t discount what people associated with SkS produce, but I do scrutinise it carefully. As this case shows, peer review cannot be relied upon to pick up even obvious, gross errors.


    1. “I’m not very keen on the LS14 model for global temperature changes over the instrumental period, on Cawley et al.’s revised version thereof, or on their alternative “minimal” model. They are all cycle-based curve-fitting approaches, without what I would regard as a properly justified physical basis.”

      Oh, and this is rather over-stated (I was going to say nonsense, but I’m trying to rein this in slightly :-) ). The model in Cawley et al. is essentially an energy balance model with a lag that attempts to take into account that the system doesn’t respond instantly to changes in forcing, and with a term that mimics internal variability using the ENSO index. It is a step up on a simple energy balance approach. You’re right about LS14, though. That is just curve fitting.

      1. Hi Nic & ATTP,
        Please try and keep the discussion scientific! I have edited and snipped some comments from both of you which strayed too far off topic. All papers should be judged on their merits rather than author lists.

        I have retained HadCRUT4 as the primary global dataset as they use no interpolation/extrapolation, and included the other major datasets for completeness. All sit inside the HadCRUT4 uncertainties (using this reference period).


  5. New reader Ed and very much enjoying the science you’re addressing here. First comment – I share Nic Lewis’s view that Cowtan and Way dataset should not be used because C&W have recently been shown to be practitioners of poor quality science and so it would be prudent to be sceptical of all their science, at least for the time being.
    C&W were authors (with others at SkS) of a paper that was researched, written, presumably checked, and then published, containing a substantive and integral error in workings/calculations that rendered their published conclusions wholly unsupportable. Nic Lewis noted the error at Lucia’s Blackboard. Bishop Hill picked it up which is how it came to my attention.
    Robert Way’s twitter feed had 4 tweets about the published paper on 15 Nov 2014 that read:

    “Last year, this rubbish (WB – he’s referring here to the Loehle paper) was published in an out-of-topic journal that contained the author on its advisory board 1/4
    Beyond the 4-month timeline from receipt to published, clear statistical errors showed it had not been scrutinized enough during review 2/4
    Reimplementing their analysis showed a number of methodological flaws and assumptions to the point where a response was necessary 3/4
    Enter Cawley et al (2015) who show how fundamentally flawed Loehle (2013) was in reality #climate (4/4)”


    Judith Curry has posted before about her commenters (denizens) giving Way a break because he is young but she’s noted he has choices to make about how he wants to conduct his science career. Based on his election to be involved with SkS, his unnecessary sharpness in comments to Steve McIntyre at ClimateAudit and his tweets, I am of the opinion Way is leaning warmy in the manner of other unreliable warmy scientists a la Mann, Schmidt, Trenberth et al, and as a consequence I should be skeptical of all of the science with which he is involved and I should remain so until he and his co-authors publish a corrigendum acknowledging their error and correcting it.

    I think it’s reasonable to take the following position:
    1. the C&W dataset was created by scientists who have at least once publicly concluded x when their own workings show y, and the scientists did not even realise their workings showed y before they went to press (whatever the reason they engaged in a low quality science activity).
    2. it is reasonable to consider that if they have done it once then they may have done it twice, i.e. made a substantive error with their dataset.
    Ignoring their dataset at this time is defensible, and probably even prudent.
    My two cents.

    1. Hi WB – welcome to CLB!
      Am sure we all agree that mistakes do happen – I have published an erratum in one of my own papers and I believe a correction to the Cawley paper is happening.

      I prefer to discuss scientific issues here – the data and methodology are available for CW14 so it could be independently checked if someone wants to. C&W have also been very open in pointing out that 2014 is second warmest in their dataset. And the results are well within the HadCRUT4 uncertainties.

      I see no reason to assume there is a mistake in CW14 until shown otherwise – their involvement with SkS doesn’t matter to me. In the same way, I don’t assume that Nic has made an error because he is involved with GWPF.


  6. Thanks Ed, I do like the science and I try my hardest but I am a commercial lawyer in the private sector (IT industry) so I view the practice of climate science through that legal professional prism.

    I don’t actually agree with you that ‘mistakes do happen’ because to me that’s an incomplete statement. I think you mean ‘mistakes do happen but the people who made those mistakes in piece of work B should not have an unfavourable inference drawn against them about piece of work A, C, K etc so long as they issue a corrigendum, or perhaps have their work retracted’.

    I think that’s what you mean. And I think that is the common approach in academia/govt sector public service, which is, after all, where climate science is practised. But here in the private sector ‘mistakes do happen’ is called ‘failure’ and we most definitely draw unfavourable inferences about the people who fail and about their works – all of their works.

    You are only as good as your last case/deal/contract/transaction.

    We put workers on performance management if they’re not reaching their KPIs, we sack them if they still can’t reach their KPIs. We do that because at the end of the line we’re vulnerable to getting sued if we deliver an inferior defective product or a lousy service. That threat of litigation tends to concentrate our minds on not delivering inferior/defective goods and services. Hence we rid ourselves of underperforming staff.

    In climate science (i.e academia) there never seems to be any adverse consequence to mistakes. Your attitude of presuming C&W work is good until it is proven bad is collegiate and generous. I can’t share it cos it’s just not how private sector folks work – I presume everything C&W do, have done and ever will do will be mistaken until it is proven right by a trusted source.

    Who I trust is quite another matter but I’ll start with Steve McIntyre and throw you into the pot alongside Judy Curry 😉

  7. Hi Ed,
    First comment here, but I’ve been reading your excellent blog for a while.
    You have earlier made a good point of comparing apples to apples (as much as possible), eg Hadcrut4 vs masked CMIP5.
    I have been wondering, is it fully right to compare observational global surface temperature indices with CMIP5, since the indices are composites made of roughly 71% SST and 29% land 2 m air temperatures, while the CMIP5 values are 100% 2 m air temperatures?

    I have played with data from KNMI Climate Explorer and made a CMIP5 index that should be equivalent to the global temperature indices. I downloaded CMIP5 RCP8.5 means for tos (temperature of ocean surface) and land-masked tas (2 m air temperature) and simply made an index as 0.71·tos + 0.29·tas.
    For comparison I chose HadCRUT4 kriging (Cowtan & Way) since it has the ambition of full global coverage.

    I also applied a 60-year base period (1951-2010) for anomalies. The reason is that there might be 60-year cycles involved that affect the surface temperatures by altering the Nino/Nina distribution. For example 1910-1940 was a warm period, 1940-1970 cool, 1970-2000 warm, etc.
    The use of a 1986-2005 base may cause bias, since it consists of 15 warm and 5 cold years, which risks pushing down the anomalies of the surface index. I avoid this by using a 60-year base, a full cycle with hopefully balanced warmth and cold.
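    For concreteness, the blending and re-baselining described above can be sketched like this (a minimal illustration: the two linear trends are made up and stand in for the KNMI Climate Explorer downloads, while the 0.71/0.29 weights and the 1951-2010 base period are from the comment above):

```python
import numpy as np

# Synthetic stand-ins for the KNMI series: tos = global-mean SST,
# tas = land-masked 2 m air temperature. Values are invented, not CMIP5 output.
years = np.arange(1951, 2016)
tos = 0.010 * (years - 1951)   # ocean surface warms more slowly
tas = 0.015 * (years - 1951)   # land 2 m air temperature warms faster

# Blend with the approximate 71/29 ocean/land area split.
blended = 0.71 * tos + 0.29 * tas

def anomalies(values, years, start=1951, end=2010):
    """Anomalies relative to a 60-year base period, as in the comment above."""
    mask = (years >= start) & (years <= end)
    return values - values[mask].mean()

blended_anom = anomalies(blended, years)
# The blended index sits between the pure-ocean and pure-land trends,
# and its mean over the base period is zero by construction.
```

    By construction the blended index warms more slowly than a pure 2 m air temperature index whenever tas trends exceed tos trends, which is the point of the comparison.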

    Well, the result is here:

    The observations fit the model index very well. I did not bother to make standard confidence intervals; instead I simply put a “fair” interval of +/- 0.2 °C around the CMIP5 means. I have also included the average anomaly for 2015 so far.

    Other comparisons, apple-to-apple, fit equally well, eg:
    Crutem4 kriging (C&W) vs land mask CMIP5 tas
    GISTEMP dTs vs global CMIP5 tas
    ERSST v4 vs CMIP5 tos

    If I have made some fundamental errors, please correct me.
    I am just a layman with climate science interest… :-)

    1. Hi Olof,

      You make an excellent point, and a very nice graphical demonstration. How much difference does the use of land+ocean simulated data make in your data, compared to just the global 2m temperature?

      Coincidentally, a paper is about to be submitted by a team of climate scientists on this exact topic. There are some subtleties with defining such a land+ocean global index to exactly compare with the observations because of how the models represent coastlines, and because of a changing sea ice distribution, but your approach is a good first step.

      Watch this space for our estimates of how much difference we think it makes!


  8. Ed, the difference between my composite global index and the standard global 100% 2 m tas is currently (2015) about 0.1 °C, and the difference is increasing. Here is the graph:

    The CMIP5 tos (green) has a clearly lower trend than ocean mask 2 m tas (yellow). This makes sense in a heating world since water has a higher heat capacity than air. As a result the composite global index (brown) runs lower than the standard global 2 m tas (blue). By coincidence the global index is very similar to the ocean mask 2 m tas.

    I don’t know in detail how the KNMI explorer masking handles sea ice and seasonal variation, which could introduce some error in this kind of crude estimate.
    However, I am looking forward to a scientific paper and a good blog post on this interesting issue…

