Tuesday, 21 February 2017

Politics is not rational

Hillary Clinton lost the presidential election because people are not rational. Except for racists and millionaires, it would have been in everyone's best interest to vote for Clinton. But we are not rational; we do not always look after our best interests. Real humans are not homo economicus.

That includes me. As a scientist it is my job to keep a cool head. I hope you will excuse me for thinking I do my job reasonably well. I like to see myself as rational, but naturally I am not. Learning about the ultimatum game in particular shocked my self-perception.

It is a very simple and pure economic game. Reducing a problem to its essence like this has the elegance my inner physicist loves. In the ultimatum game, two players must divide a sum of money. The first player has to propose a certain division. The second player can accept this division or reject it. If the offer is rejected, neither player receives any money. In its purest form, the experiment is played only once and anonymously with players that do not know each other.

Time for a short thinking pause: What would you do? How much would you offer as player one? Below which percentage would you reject the offer?

Initially, I wondered why economists would play this game. Surely player one would offer 50/50 and player two would accept. But that was my irrational side and my missing economic education. A good economist would expect that player two would accept any non-zero offer, because it is better to get something than nothing, and that player one will thus make the smallest possible offer. Reality is in between. Many people offer 50%, but many also do not. These offers below 50% are, however, also regularly rejected. Player two is apparently willing to hurt himself to punish unfair behavior. This game and many variations and similar games lead to the conclusion: humans are not purely selfish, but have a sense of fairness.
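The two predictions can be written down as a tiny payoff function. This is only an illustrative sketch: the function name `play_ultimatum`, the pot of 100 and the rejection threshold of 30 are assumptions chosen for the example, not values taken from the experiments.

```python
def play_ultimatum(pot, offer, min_acceptable):
    """Return (payoff_player_one, payoff_player_two) for a single round.

    pot            -- total sum to divide
    offer          -- amount player one offers to player two
    min_acceptable -- smallest offer player two accepts (hypothetical threshold)
    """
    if not 0 <= offer <= pot:
        raise ValueError("offer must be between 0 and the pot")
    if offer >= min_acceptable:
        return pot - offer, offer   # accepted: the pot is split as proposed
    return 0, 0                     # rejected: nobody gets anything


# Textbook "rational" responder: accepts any non-zero offer.
print(play_ultimatum(100, 1, min_acceptable=1))    # (99, 1)

# Responder with a sense of fairness (the threshold is an assumed value).
print(play_ultimatum(100, 1, min_acceptable=30))   # (0, 0)
print(play_ultimatum(100, 40, min_acceptable=30))  # (60, 40)
```

In the second call both players lose: player two gives up a small gain to deny player one a large one, which is the self-hurting punishment seen in the experiments.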

As a student of variability, for me the key aspect of the ultimatum game is its non-linearity. You either get something or nothing. In case of nonlinear processes, such as radiation flowing through clouds, variability is important. A smooth cloud field reflects more solar radiation than a bumpy cloud field with the same amount of water. The variability of the cloud water is important because the flow of radiation through clouds is a non-linear process.
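The cloud example is an instance of a general property of concave functions: the average of the function values is smaller than the function of the average. A minimal sketch, assuming a toy saturating reflectance curve (the function `toy_reflectance` and its constant are made up for illustration and are not a radiative transfer model):

```python
def toy_reflectance(water):
    """Hypothetical saturating curve: more cloud water reflects more, with diminishing returns."""
    return water / (water + 10.0)


def mean(values):
    return sum(values) / len(values)


smooth = [20.0, 20.0, 20.0, 20.0]   # uniform cloud field
bumpy = [5.0, 35.0, 5.0, 35.0]      # same mean water, but variable

assert mean(smooth) == mean(bumpy)  # both fields hold the same amount of water

r_smooth = mean([toy_reflectance(w) for w in smooth])
r_bumpy = mean([toy_reflectance(w) for w in bumpy])
print(r_smooth > r_bumpy)  # True
```

The thin parts of the bumpy field lose more reflection than the thick parts gain, so the variable field reflects less on average.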

By sometimes rejecting low offers, player two gets better offers from player one. This is especially clear when the game is played multiple times with the same players. In the beginning quite large offers are rejected to entice larger offers later in the game. How humans evolved a sense of fairness to be able to also benefit from this in one-off games is not yet understood. Fairness is surprising because a cartoon version of evolutionary theory would predict that altruism is only possible among kin. But the empirical evidence clearly shows that fairness belongs to being human. (Just like competition.)

Knowledge will come only if economics can be reoriented to the study of man as he is and the economic system as it actually exists.
Ronald Coase

Fairness is but one emotion that is not rational, not "productive". It offers some protection against unfairness, such as wages going lower and lower. Offering and accepting jobs are yes-no decisions under uncertainty for both parties. If there is one term that is often used in labor conflicts it is "unfair wages" or "unfair labor conditions". All the while economists wonder why unemployment is higher than the frictional unemployment of rational actors and blame anything but their faulty assumptions.

Anger is also not productive, but fear of anger forces the haves to make better offers to the have-nots. Amok runs are not productive, mass shootings are not productive, suicide attacks are not productive. I would venture that independent of the proclaimed rationalizations, they signal a lack of justice and fairness.

The American election was also seen as unfair by many. The two parties had both selected historically unpopular candidates. Had the historically unpopular Trump not run, Clinton would have been the least popular candidate since polling started on this question. The main reason to vote was to not get the other candidate.

With both candidates and parties so unpopular, and with the historical unpopularity ratings of Congress and Washington, the enormous partisan tribalism in America is surprising. The main pride of both tribes seems to be that they are at least not members of the other tribe. The lizard people have managed to pit the population against each other, while they loot the country and drag the world down. Do help me in the comments how "they" did this.

Many felt the election was a trap. In such a case one can expect irrational behavior. Or as Michael Moore elegantly said: Trump is the human Molotov cocktail they could throw through the window of the establishment. I am afraid the voters will find it was the window of their own house.

One mistake the Democratic establishment made in their support for Clinton was to expect rational behavior. They learned about economics and its political counterpart, public choice theory. Both theories assume rational behavior. The Democratic establishment assumed that the working class had no other options than to vote for them because the Republicans would make their lives even worse.

Nic Smith, a self-described "white trash hillbilly from the holler" from coal country, on Trump voters: They are desperate to believe in something.

In a rational world the establishment would be right and player two would take the non-zero Clinton offer. In the real world people are fed up with being treated unfairly and seeing inequality and corruption jointly grow for decades. In the real world having to choose the lesser evil, election after election, over and over again, makes it ever more likely the voters will sulk. That the Democratic establishment had just put up their middle finger to half of their party during the primaries likely also did not help put people in a more rational mood.

Last year's presidential election was an extreme example, but a two-party system invariably means that many people do not feel represented and are dissatisfied. A transferable vote would do a lot to fix this and would give voters the possibility to vote for their candidate of choice without losing their vote.

A two-party system is also much more prone to corruption. A large part of the politicians will be in safe districts and do not have to fear the wrath of their voters. Where the voters do have some choice, the corporations only have to convince politician D that they will also bribe politician R and both can do so with impunity.

A corrupt two-party system is not much better than a one-party system. In a representative democracy with more than two parties there would be real competition and the voters could vote for another politician.

What can we do to break this ultimatum game? The rhetoric and tribalism in America are unique. Humans are social animals and our group is important to us, but the US tribalism is beyond normal. For example, 34% of Trump voters being willing to say Trump's inauguration was the biggest ever is not normal.

Tribalism and emotions are not good for clear thinking and need to be fought. The only thing we can change is how we act ourselves; we should try to avoid unnecessarily antagonizing people. When you have to say something bad about the corrupt Republican politicians in Washington, make clear you mean them and do not use the term Republicans, which also means every single member of the group, most of whom also reject corruption.

I am only talking about who you address. Please stand your ground, there is no need to keep on moving our position in the direction of corrupt unreasonable politics. That only signals you do not believe in your ideas. If there is one thing frustrating about US politics it is weak corporate Democrats continually moving in the direction of ever more corrupt Republican politicians in the name of appeasement and in reality because they have the same donors.

Given the lack of a real choice one can also not blame the voters for every character error of their candidate and for all policies. For fashion icon Ken Bone the election was a choice between his personal benefit as coal worker and the greater good. Many Trump voters voted for Obama before. Some people say they voted Trump expecting him not to be able to execute his racist plans because they are unconstitutional. That may be a rationalization and for me Trump's overt racism would be a deal breaker, but not all of his voters are automatically bigots, even if many clearly are.

Darkness cannot drive out darkness; only light can do that. Hate cannot drive out hate; only love can do that.
Martin Luther King, Jr

Most people simply voted for the party they always voted for. There are people who have their health insurance via the Affordable Care Act who voted Republican and are likely to lose coverage. They thought the Republicans would not do something as barbaric as repealing the ACA without replacement. Thousands of people will die every year when that happens, but the repeal means that billionaires will have to pay less for healthcare and they own the Republican politicians, so I am less optimistic they will not do it.

Do not go around calling every Trump voter a personalized Donald Trump, make them an offer they cannot refuse. Especially the Democratic establishment should stop blaming everyone but themselves for not voting for their inevitable candidate. Rather than scolding their voters, they should make the left an offer they cannot refuse.

That offer would be a non-corrupt candidate. That would be an offer Democrat and Republican voters alike would find it hard to refuse. It is, unfortunately, the one compromise the Democratic establishment is least willing to make. The people in power are in power because they are good at selling out to corporations.

This video gives a good overview of the corruption in America and how it impacts normal people via politics and the media. Since corruption became worse the workers no longer shared in the increases in productivity and the politicians respond to the wishes of the donor class and not the working class. Readers from the USA may think political corruption is normal because it slowly and imperceptibly grew, but in its enormity it is not normal. It was much better before the 1970s and it is much better in other advanced nations.

Fortunately several initiatives have sprung up after the Trump election debacle and after Sanders showing it is possible to campaign for the presidency without taking donor money. As an offspring of the Sanders campaign Our Revolution will run a large number of candidates under one political and organizational platform. Similar, but very clear in their wish to primary and get rid of corporate Democrats, are the Justice Democrats.

The non-partisan group Brand New Congress also wants to help (Tea Party) Republicans that do not accept money into Congress. I would love to see more of this on the Republican side. In Europe conservative parties are conservative, but not corrupt and not bat shit crazy. They are people you can have an adult conversation with and negotiate. They may prioritize the environment less, but do not childishly claim climate change does not exist. Getting non-corrupt Republicans into office may even be worth the time of US liberals.

The group 314 Action (inspired by π) works to get more Science, Technology, Engineering and Math (STEM) people into politics. If you love money and power, science is the weirdest career choice you can make. Thus I would expect the scientists that run for office to be mostly clean. The climate "debate" shows that nearly all climatologists are not touched by corporate corruption, while there are strong incentives for coal and oil companies to bribe them.

Let's work to end corporate rule, get the corporations out of politics and send them back to take care of the economy.

Following The Ninth: In The Footsteps of Beethoven's Final Symphony.

Related reading

The big lesson of Trump's first 2 weeks: resistance works

The magazine Correspondent: This is how we can fight Donald Trump’s attack on democracy. Focuses on how to change the media, which has become more pressing in the Age of Trump

Chris Hedges: We Are All Deplorables. "My relatives in Maine are deplorables. I cannot write on their behalf. I can write in their defense. ... I see the Christian right as a serious threat to an open society. But I do not hate those who desperately cling to this emotional life raft"

Thomas Frank in The Guardian: How the Democrats could win again, if they wanted

CNN Money: U.S. inequality keeps getting uglier

David Roberts of Vox: Everything mattered: lessons from 2016's bizarre presidential election - WTF just happened?

Political Polarization in the American Public - How Increasing Ideological Uniformity and Partisan Antipathy Affect Politics, Compromise and Everyday Life

North Carolina is no longer classified as a democracy by Andrew Reynolds, Professor of Political Science at the University of North Carolina at Chapel Hill.

A law professor's warning: we are closer to oligopoly than at any point in 100 years. Economically. The political power of the corporations is also increasing

The first days inside Trump’s White House: Fury, tumult and a reboot. "Trump has been resentful, even furious, at what he views as the media’s failure to reflect the magnitude of his achievements, and he feels demoralized that the public’s perception of his presidency so far does not necessarily align with his own sense of accomplishment."

An important piece for poll nerds by Nate Silver: Why Polls Differ On Trump’s Popularity?

Variable Variability: The ultimatum game, a key experiment showing intrinsic fairness and altruism among strangers

* Photo at the top, Be Human, is by ModernDope and has a creative commons CC BY-SA 2.0 license.

Sunday, 5 February 2017

David Rose's alternative reality in the Daily Mail

Peek-a-boo! Joanna Krupa shows off her stunning figure in see-through mesh dress over black underwear
Bottoms up! Perrie Edwards sizzles in plunging leotard as Little Mix flaunt their enviable figures in skimpy one-pieces
Bum's the word! Lottie Moss flaunts her pert derriere in a skimpy thong as she strips off for steamy selfie

Sorry about those titles. They provide the fitting context right next to a similarly racy Mail on Sunday piece by David Rose: "Exposed: How world leaders were duped into investing billions over manipulated global warming data". Another article on that "pause" thingy that mitigation skeptics do their best to pretend not to understand. For people in the fortunate circumstances not to know what the Daily Mail is, this video provides some context about this tabloid "newspaper".

[UPDATE: David Rose's source says in an interview with E&E News on Tuesday: “The issue here is not an issue of tampering with data”. So I guess you can skip this post, except if you get pleasure out of seeing the English language being maltreated. But do watch the Daily Mail video below.

See also this article on the void left by the Daily Mail after fact checking. I am sure all integrity™-waving climate "skeptics" will condemn David Rose and never listen to him again.]

You can see this "pause" in the graph below of the global mean temperature. Can you find it? Well you have to think those last two years away and then start the period exactly in that large temperature peak you see in 1998. It is not actually a thing, it is a consequence of cherry picking a period to get a politically convenient answer (for David Rose's pay masters).

In 2013 Boyin Huang of NOAA and his colleagues created an improved sea surface dataset called ERSST.v4. No one cared about this new analysis. Normal good science. One of the "scandals" Rose uncovered was that NOAA is drafting an article on ERSST.v5.

But this post is unfortunately about nearly nothing, about the minimal changes in the top panel of the graph below. I feel the important panel is the lower one. It shows that in the raw data the globe seems to warm more. This is because before WWII many measurements were performed with buckets and the water in the bucket would cool a little due to evaporation before reading the thermometer. Scientists naturally make corrections for such problems (homogenization) and that helps make a more accurate assessment of how much the world actually warmed.

But Rose is obsessed with the top panel. I made the graph extra large, so that you can see the differences. The thick black line shows the new assessment (ERSST.v4) and the thin red line the previously estimated global temperature signal (ERSST.v3). Differences are mostly less than 0.05°C, both warmer and cooler. The "problem" is the minute change at the right end of the curves.

The mitigation skeptical movement was not happy when a paper in Science in 2015, Karl and colleagues (2015), pointed out that due to this update the "pause" is gone, even if you use the bad statistics the mitigation skeptics like. As I have said for many years now about political activists claiming this "pause" is highly important: if your political case depends on such minute changes, your political case is fragile.

In the meantime, a recent article in Science Advances by Zeke Hausfather and colleagues (2016) shows evidence that the updated dataset (ERSSTv4) is indeed better than the previous version (ERSSTv3b). They do so by comparing the ERSST dataset, which comes from a large number of data sources, with data that comes from only one source (buoys, satellites (CCI) or ARGO). These single-source datasets are shorter, but without trend uncertainties due to the combination of sources. The plot below shows that the ERSSTv4 update improves the fit with the other datasets.

The trend change over the cherry-picked "pause" period was mostly due to the changes in the sea surface temperature of ERSST. Rose makes a lot of noise about the land data, where the update was inconsequential. As indicated in Karl and colleagues (2015) this was a beta-version dataset. The raw data was published; that is the data of the International Surface Temperature Initiative (ISTI) and the homogenization method was published. The homogenization method works well; I checked myself.

The dataset itself is not published yet. Just applying a known method to a known dataset is not a scientific paper. Too boring.

So for the paper NOAA put a lot of work into estimating the uncertainty due to the homogenization method. When developing a homogenization method you have to make many choices. For example, inhomogeneities are found by comparing one candidate station with multiple nearby reference stations. There are settings for how many stations to use and for how nearby the reference stations need to be. NOAA studied which of these settings are most important with a nifty new statistical method. These settings were varied to study how much influence that has. I look forward to reading the final paper. I guess Rose will not read it and stick to his role as suggestive interpreter of interpreters.
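The comparison of a candidate station with nearby reference stations can be sketched as follows. This is a deliberately simple illustration and not NOAA's algorithm: the function names, the assumption of a single break and the synthetic numbers are all made up for the example.

```python
def difference_series(candidate, references):
    """Candidate minus the mean of the reference stations, year by year."""
    return [c - sum(r) / len(r) for c, r in zip(candidate, zip(*references))]


def find_break(diff):
    """Return (index, jump) of the split that maximizes the shift in the mean.

    Assumes at most one break; real methods handle several breaks
    and test whether a jump is statistically significant.
    """
    best_index, best_jump = None, 0.0
    for i in range(1, len(diff)):
        before = sum(diff[:i]) / i
        after = sum(diff[i:]) / (len(diff) - i)
        if abs(after - before) > abs(best_jump):
            best_index, best_jump = i, after - before
    return best_index, best_jump


# Synthetic regional climate signal shared by all stations (made-up numbers).
regional = [0.0, 0.2, -0.1, 0.3, 0.1, 0.4, 0.2, 0.5, 0.3, 0.6]
references = [[t + 1.0 for t in regional], [t - 0.5 for t in regional]]
# The candidate station is relocated in year 6 and reads 0.8 degrees cooler afterwards.
candidate = [t + 2.0 - (0.8 if year >= 6 else 0.0) for year, t in enumerate(regional)]

index, jump = find_break(difference_series(candidate, references))
print(index, round(jump, 2))  # 6 -0.8
```

Subtracting the reference mean removes the regional climate signal that all stations share, so what remains is the non-climatic jump of the candidate station. How many references to use and how far away they may be are exactly the kind of settings mentioned above.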

The update of NOAA's land data will probably remove a precious conspiracy of the mitigation skeptical movement. While, as shown above, the adjustments reduce our estimate for the warming of the entire world, the adjustments make the estimate for the warming over land larger. Mitigation skeptics like to show the adjustments for land data only to suggest that evil scientists are making global warming bigger.

This is no longer the case. A recommendable overview paper by Philip Jones, The Reliability of Global and Hemispheric Surface Temperature Records, analyzed the new NOAA dataset. The results for land are shown below. The new ISTI raw data dataset shows more warming than the previous NOAA raw data dataset. As a consequence the homogenization now does not change the global mean appreciably any more to arrive at about the same answer after homogenization; compare NOAA uncorrected (yellow line) with NOAA (red; homogenized).

The main reason for the smaller warming in the old NOAA raw data was that this smaller dataset contained a higher percentage of airport stations. That is because airports report their data very reliably in near real time. Many of these airport stations were in cities before and cities are warmer than airports due to the urban heat island effect. Such relocations thus typically cause cooling jumps that are not related to global warming and are removed by homogenization.

So we have quite some irony here.
Still Rose sees a scandal in these minute updates and dubs it Climategate 2; I thought we were already at 3 or 4. In his typical racy style he calls data "wrong", "rogue", "biased". Knowing that data is never perfect is why scientists do their best to assess the quality of the data, remove problems and make sure that the quality is good enough to make a certain statement. In return people like David Rose simultaneously pontificate about uncertainty monsters and assume data is perfect and then get the vapors when updates are needed.

Rose gets some suggestive quotes from an apparently disgruntled retired NOAA employee. The quotes themselves seem to be likely inconsequential procedural complaints; the corresponding insinuations seem to come from Rose.

I thought journalism had a rule that claims by a source need to be confirmed by at least a second source. I am missing any confirmation.

While Rose presents the employee as an expert on the topic, I have never heard of him. Peter Thorne, who worked at NOAA, confirms that the employee did not work with surface station data himself. He has a decent publication record, mainly on satellite climate datasets of clouds, humidity and radiation. Ironically, I keep using that word, he also has papers about the homogenization of his datasets, while homogenization is treated by the mitigation skeptical movement as the work of the devil. I am sure they are willing to forgive him his past transgressions this time.

It sounds as if he made a set of procedures for his climate satellite data, which he really liked, and wanted other groups in NOAA to use it as well. He was frustrated when others did not give enough priority to updating their existing procedures to his.

For David Rose this is naturally mostly about politics and in his fantasies the Paris climate treaty would not have existed without the Karl and colleagues (2015) paper. I know that "pause" thingy is important for the Anglo-American mitigation skeptical movement, but let me assure Rose that the rest of the world considers all the evidence and does not make politics based on single papers.

[UPDATE: Some days you gotta love journalism: a journalist asked several of the diplomats who worked for years on the Paris climate treaty, they gave the answer you would expect: Contested NOAA paper had no influence on Paris climate deal. The answers still give an interesting insight into the sausage making. What is actually politically important.]

David Rose thus ends:
Has there been an unexpected pause in global warming? If so, is the world less sensitive to carbon dioxide than climate computer models suggest?
No, there never was an "unexpected pause." Even if there were, such a minute change is not important for the climate sensitivity. Most methods do not use the historical warming for that and those that do consider the full warming of about 1°C since the 19th century and not only short periods with unreliable, noisy short-term trends.

David Rose:
And does this mean that truly dangerous global warming is less imminent, and that politicians’ repeated calls for immediate ‘urgent action’ to curb emissions are exaggerated?
No, but thanks for asking.

Post Scriptum. Sorry that I cannot talk about all errors in the article of David Rose, if only because in most cases he does not present clear evidence and because this post would be unbearably long. The articles of Peter Thorne and Zeke Hausfather are mostly complementary on the history and regulations at NOAA and on the validation of NOAA's results, respectively.

Related information

2 weeks later. The nailing New York Times interviewed several former colleagues of NOAA retiree Bates: How an Interoffice Spat Erupted Into a Climate-Change Furor. "He’s retaliating. It’s like grade school ... At that meeting, Dr. Bates shouted that Ms. McGuirk was not trustworthy and belonged in jail, according to an internal log ..." Lock her up, lock her up, ...

Wednesday. The NOAA retiree now says: "The Science paper would have been fine had it simply had a disclaimer at the bottom saying that it was citing research, not operational, data for its land-surface temperatures." To me it was always clear it was research data, otherwise they would have cited a data paper and named the dataset. How a culture clash at NOAA led to a flap over a high-profile warming pause study

Tuesday. A balanced article from the New York Times: Was Data Manipulated in a Widely Cited 2015 Climate Study? Steve Bloom: "How "Climategate" should have been covered." Even better if mass media would not have to cover office politics on archival standards fabricated into a fake scandal.

Also on Tuesday, an interview of E&E News: 'Whistleblower' says protocol was breached but no data fraud: The disgruntled NOAA retiree: "The issue here is not an issue of tampering with data".

Associated Press: Major global warming study again questioned, again defended. "The study has been reproduced independently of Karl et al — that's the ultimate platinum test of whether a study is to be believed or not," McNutt said. "And this study has passed." Marcia McNutt, who was editor of Science at the time the paper was published and is now president of the National Academy of Sciences.

Daily Mail’s Misleading Claims on Climate Change. If I were David Rose I would give back my journalism diploma after this, but I guess he will not.

Monday. I hope I am not starting to bore people by saying that Ars Technica has the best science reporting on the world wide web. This time again. Plus inside scoop suggesting all of this is mainly petty office politics. Sad.

Sunday. Factcheck: Mail on Sunday’s ‘astonishing evidence’ about global temperature rise. Zeke Hausfather wrote a very complementary response, pointing out many problems of the Daily Mail piece that I had to skip. Zeke works at the Berkeley Earth Surface Temperature project, which produces one of the main global temperature datasets.

Sunday. Peter Thorne, climatology professor in Ireland, former NOAA employee and leader of the International Surface Temperature Initiative: On the Mail on Sunday article on Karl et al., 2015.

Phil Plait (Bad Astronomy) — "Together these show that Rose is, as usual, grossly exaggerating the death of global warming" — on the science and the politics of the Daily Mail piece: Sorry, climate change deniers, but the global warming 'pause' still never happened

You can download the future NOAA land dataset (GHCNv4-beta) and the land dataset used by Karl and colleagues (2015), h/t Zeke Hausfather.

The most accessible article on the topic rightly emphasizes the industrial production of doubt for political reasons: Mail on Sunday launches the first salvo in the latest war against climate scientists.

A well-readable older article on the study that showed that ERSST.v4 was an improvement: NOAA challenged the global warming ‘pause.’ Now new research says the agency was right.

One should not even have to answer the question, but: No, U.S. climate scientists didn't trick the world into adopting the Paris deal. A good complete overview at medium level.

Even fact checker Snopes sadly wasted its precious time: Did NOAA Scientists Manipulate Climate Change Data?
A tabloid used testimony from a single scientist to paint an excruciatingly technical matter as a worldwide conspiracy.

Carbon Brief Guest post by Peter Thorne on the upcoming ERSSTv5 dataset, currently under peer review: Why NOAA updates its sea surface temperature record.

Monday, 30 January 2017

With some programming skills you can compute global mean temperatures yourself

This is a guest post by citizen scientist Ron Roeland (not his real name, but I like alliteration for some reason). Being an actually sceptical person, he decided to compute the global mean land temperature from station observations himself. He could reproduce the results of the main scientific groups that compute this signal and, new for me, while studying the data noticed how important the relocation of temperature stations to airports is for the NOAA GHCNv3 dataset. (The headers in the post are mine.)

This post does not pretend to present a rigorous analysis of the global temperature record; instead, it intends to show how easy it is for someone with basic programming/math skills to debunk claims that NASA and NOAA have manipulated temperature data to produce their global-average temperature results, i.e. claims like these:

From C3 Headlines: By utilizing questionable adjustments based on even more questionable assumptions, NOAA managed to produce an entirely fabricated increase in the global warming trend from 1998 to 2012.

From a blogger on the Hill: There’s going to have to be a massive effort to pick apart failing climate models and questionably-adjusted data.

From Climate Depot: Over the past decade, NASA and NOAA have continuously altered the temperature record to cool the past and warm the present. Their claims are straight out Orwell's 1984, and have nothing to do with science'

The routine

Some time ago, after reading all kinds of claims (like the ones above) about how NASA and NOAA had improperly adjusted temperature data to produce their global-average temperature results, I decided to take a crack at the data myself.

I coded up a straightforward baselining/gridding/averaging routine that is quite simple and “dumbed down” in comparison to the NASA and NOAA algorithms. Below is a complete description of the algorithm I coded up.
  1. Using GHCN v3 monthly-average data, compute 1951-1980 monthly baseline temperatures for all GHCN stations. If a station has 15 or more valid temperatures in any given month for the 1951-1980 baseline period, retain that monthly baseline value; otherwise drop that station/month from the computations. Stations with no valid monthly baseline periods are completely excluded from the computations.
  2. For all stations and months where valid baseline temperature estimates were computed per (1) above, subtract the respective baseline temperatures from all of the station monthly temperatures to produce monthly temperature anomalies for the years 1880-2015.
  3. Set up a global gridding scheme to perform area-weighting. To keep things really simple, and to minimize the number of empty grid-cells, I selected large grid-cell sizes (20 degrees x 20 degrees at the Equator). I also opted to recalculate the grid-cell latitude dimensions as one goes north/south of the equator in order to keep the grid-cell areas as nearly constant as possible. I did this to keep the grid-cell areas from shrinking (per the latitude cosines) in order to minimize the number of empty grid cells.
  4. In each grid-cell, compute the average (over all stations in the grid-cell) of the monthly temperature anomalies to produce a single time-series of average temperature anomalies for each month (years 1880 through 2015).
  5. Compute global average monthly temperature anomalies by averaging together all the grid-cell monthly average anomalies, weighted by the grid-cell areas (again, for years 1880 through 2015).
  6. Compute global-average annual anomalies for years 1880 through 2015 by averaging together the global monthly anomalies for each year.
The algorithm does not involve any station data adjustments (obviously!) or temperature interpolation operations. It’s a pretty basic number-crunching procedure that uses straightforward math plus a wee bit of trigonometry (for computing latitude/longitude grid-cell areas).
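
The "wee bit of trigonometry" can be sketched in a few lines of Python. This is only an illustration, not the code used for this post: the function names, the spherical-Earth radius, and the choice to keep 20-degree latitude bands while varying the number of cells per band are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius


def cell_area_km2(lat_south, lat_north, lon_width_deg):
    """Area of a latitude/longitude box on a sphere."""
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * math.radians(lon_width_deg) * band


def cells_in_band(lat_south, lat_north, equator_size_deg=20.0):
    """Number of cells in a latitude band so that each cell has roughly
    the area of a 20 x 20 degree cell at the Equator (hypothetical rule)."""
    target = cell_area_km2(0.0, equator_size_deg, equator_size_deg)
    band_area = cell_area_km2(lat_south, lat_north, 360.0)
    return max(1, round(band_area / target))


if __name__ == "__main__":
    for south in range(-90, 90, 20):
        n = cells_in_band(south, south + 20)
        area = cell_area_km2(south, south + 20, 360.0 / n)
        print(f"{south:4d} to {south + 20:4d}: {n:2d} cells of {area:,.0f} km^2")
```

With this rule the polar bands contain only a few wide cells, which is one way to keep cell areas from shrinking with the cosine of the latitude.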

For me, the most complicated part of the algorithm implementation was managing the variable data record lengths and data gaps (monthly and annual) in the station data -- basically, the “data housekeeping” stuff. Fortunately, modern development libraries such as the C++ Standard Template Library make this less of a chore than it used to be.

Why this routine?

People unfamiliar with global temperature computational methods sometimes ask: “Why not simply average the temperature station data to compute global-average estimates? Why bother with the baselining and gridding described above?”

We could get away with straight averaging of the temperature data if it were not for the two problems described below.

Problem 1: Temperature stations have varying record lengths. The majority of stations do not have continuous data records that go all the way back to 1880 (the beginning of the NASA/GISS global temperature calculations). Even stations with data going back to 1880 have gaps in their records -- there are missing months or even years.

Problem 2: Temperature stations are not evenly distributed over the Earth’s surface. Some regions, like the continental USA and western Europe, have very dense networks of stations. Other regions, like the African continent, have very sparse station networks.

As a result of problem 1, we have a mix of temperature stations that changes from year to year. If we were simply to average the absolute temperature data from all those stations, the final global-average results would be significantly skewed from year to year due to the changing mix of stations from one year to the next.

Fortunately, the solution for this complication is quite straightforward: the baselining and anomaly-averaging procedure described above. For those who are already familiar with this procedure, please bear with me while I illustrate how it works with a simple scenario constructed from simulated data.

Let’s consider a very simple scenario where the full 1880-2016 temperature history for a particular region is contained in data reported by two temperature stations, one of which is located on a hilltop and the other located on a nearby valley floor. The hilltop and valley floor locations have identical long-term temperature trends, but the hilltop location is consistently about 1 degree C cooler than the valley floor location. The hilltop temperature station has a temperature record starting in 1880 and ending in 1990. The valley floor station has a temperature record beginning in 1930 and ending in 2016.

Figure 1 below shows the simulated temperature time-series for these two hypothetical stations. Both time-series were constructed by superimposing random noise on the same linear trend, with the valley-floor station time-series having a constant offset temperature 1 degree C more than that of the hilltop station time-series. The simulated time-series for the hilltop station (red) begins in 1880 and continues to 1990. The simulated valley floor station temperature (blue) data begins in 1930 and runs to 2016. As can be seen during their period of overlap (1930-1990), the simulated valley-floor temperature data runs about 1 degree warmer than the simulated hilltop temperature data.

Figure 1: Simulated Hilltop Station Data (red) and Valley Floor Station Data (blue)

If we were to attempt to construct a complete 1880-2016 temperature history for this region by computing a straight average of the hilltop and valley floor data, we would obtain the results seen in Figure 2 below.

Figure 2: Straight Average of Valley Floor Station Data and Hilltop Station Data

The effects of the changing mix of stations (hilltop vs. valley floor) on the average temperature results can clearly be seen in Figure 2. A large temperature jump is seen at 1930, where the warmer valley floor data begins, and a second temperature jump is seen at 1990 where the cooler hilltop data ends. These temperature jumps obviously do not represent actual temperature increases for that particular region; instead, they are artifacts introduced by the changes in the mix of stations in 1930 and 1990.

An accurate reconstruction of the regional temperature history computed from these two temperature time-series obviously should show the warming trend seen in the hilltop and valley floor data over the entire 1880-2016 time period. That is clearly not the case here. Much of the apparent warming seen in Figure 2 is a consequence of the changing mix of stations.

Now, let’s modify the processing a bit by subtracting the (standard NASA/GISS) 1951-1980 hilltop baseline average temperature from the hilltop temperature data and the 1951-1980 valley floor baseline average temperature from the valley floor temperature data. This procedure produces the temperature anomalies for the hilltop and valley floor stations. Then for each year, compute the average of the station anomalies for the 1880-2016 time period.

This is the baselining and anomaly-averaging procedure that is used by NASA/GISS, NOAA, and other organizations to produce their global-average temperature results.

When this baselining and anomaly-averaging procedure is applied to the simulated temperature station data, it produces the results that can be viewed in Figure 3 below.

Figure 3: Average of Valley Floor Station Anomalies and Hilltop Station Anomalies

In Figure 3, the temperature jumps associated with the beginning of the valley floor data record and the end of the hilltop data record have been removed, clearly revealing the underlying temperature trend shared by the two temperature time-series.

Also note that although neither of my simulated temperature stations has a full 1880-2016 temperature record, we were still able to compute a complete reconstruction for the 1880-2016 time period because there was enough overlap between the station records to allow us to “align” them via baselining.
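
The hilltop/valley scenario is easy to reproduce. The sketch below is a minimal Python version under stated assumptions: the trend of 0.01 degrees C per year, the noise level of 0.1 degrees C, the absolute temperatures and all function names are made up for illustration and are not the values behind the figures.

```python
import random
from statistics import mean

random.seed(0)
TREND = 0.01  # degrees C per year; hypothetical value


def simulate(first, last, offset, noise_sd=0.1):
    """Linear trend plus Gaussian noise for the years first..last."""
    return {y: offset + TREND * (y - 1880) + random.gauss(0, noise_sd)
            for y in range(first, last + 1)}


def to_anomalies(series, base_start=1951, base_end=1980):
    """Subtract the station's own baseline-period average."""
    baseline = mean(series[y] for y in range(base_start, base_end + 1))
    return {y: t - baseline for y, t in series.items()}


def yearly_average(series_list):
    """Average, per year, over whichever stations report that year."""
    years = sorted(set().union(*series_list))
    return {y: mean(s[y] for s in series_list if y in s) for y in years}


def jump(avg, year, window=10):
    """Mean of the `window` years starting at `year` minus the mean before."""
    after = mean(avg[y] for y in range(year, year + window))
    before = mean(avg[y] for y in range(year - window, year))
    return after - before


hilltop = simulate(1880, 1990, offset=10.0)  # about 1 degree cooler
valley = simulate(1930, 2016, offset=11.0)

straight = yearly_average([hilltop, valley])
anomaly = yearly_average([to_anomalies(hilltop), to_anomalies(valley)])

if __name__ == "__main__":
    for year in (1930, 1991):
        print(year, round(jump(straight, year), 2), round(jump(anomaly, year), 2))
```

In the straight average the decade-to-decade change around 1930 and 1990 is dominated by the half-degree artifact from the changing station mix; in the anomaly average only the small change expected from the underlying trend remains.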

The second problem, the non-uniform distribution of temperature stations, can clearly be seen in Figure 4 below. That figure shows all GHCNv3 temperature stations that have data records beginning in 1900 or earlier and continuing to the present time.

Figure 4: Long-Record GHCN Station Distribution

As one can see, the stations are highly concentrated in the continental USA and western Europe; Africa and South America, in contrast, have very sparse coverage. A straight unweighted average of the data from all the stations shown in the above image would result in temperature changes in the continental USA and western Europe “swamping out” temperature changes in South America and Africa in the final global average calculations.

That is the problem that gridding solves. The averaging procedure using grid-cells is performed in two steps. First, the temperature time-series for all stations in each grid-cell are averaged together to produce a single time-series per grid-cell. Then all the grid-cell time-series are averaged together to construct the final global-average temperature results (note: in the final average, the grid-cell time-series are weighted according to the size of each grid-cell). This eliminates the problem where areas on the Earth with very dense networks of stations are over-weighted in the global average relative to areas where the station coverage is more sparse.
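
The two-step average can be sketched as follows. This is a simplified illustration, not the routine used for this post: it uses a fixed 20 x 20 degree latitude/longitude grid with area weights rather than the nearly equal-area cells described above, and the function names and station values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def grid_cell(lat, lon, size_deg=20.0):
    """Row/column index of the cell containing a station (fixed lat/lon grid)."""
    n_lat = int(180 / size_deg)
    n_lon = int(360 / size_deg)
    row = min(int((lat + 90.0) // size_deg), n_lat - 1)
    col = min(int((lon + 180.0) // size_deg), n_lon - 1)
    return row, col


def cell_weight(cell, size_deg=20.0):
    """Relative cell area: proportional to the difference in sin(latitude)."""
    south = -90.0 + cell[0] * size_deg
    north = south + size_deg
    return math.sin(math.radians(north)) - math.sin(math.radians(south))


def gridded_average(stations):
    """stations: list of (lat, lon, anomaly) tuples for a single month."""
    by_cell = defaultdict(list)
    for lat, lon, anom in stations:
        by_cell[grid_cell(lat, lon)].append(anom)           # step 1: group per cell
    weighted = sum(cell_weight(c) * mean(v) for c, v in by_cell.items())
    return weighted / sum(cell_weight(c) for c in by_cell)  # step 2: area-weighted mean


# Made-up example: three stations crowded into one cell, one station elsewhere.
stations = [(40.0, -100.0, 1.0), (41.0, -99.0, 1.0), (42.0, -98.0, 1.0),
            (45.0, 60.0, 0.0)]

if __name__ == "__main__":
    print("straight:", mean(a for _, _, a in stations))
    print("gridded: ", gridded_average(stations))
```

The straight average is pulled towards the dense cell, while the gridded average gives both cells the same say because they cover the same area.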

Now, some have argued that the sparse coverage of certain regions of the Earth invalidates the global-average temperature computations. But it turns out that the NASA/GISS warming trend can be confirmed even with a very sparse sampling of the Earth’s surface temperatures. (In fact, the NASA/GISS warming trend can be replicated very closely with data from as few as 30 temperature stations scattered around the world.)

Real-world results

Now that we are done with the preliminaries, let’s look at some real-world results. Let’s start off by taking a look at how my simple “dumbed-down” gridding/averaging algorithm compares with the NASA/GISS algorithm when it is used to process the same GHCNv3 adjusted data that NASA/GISS uses. To see how my algorithm compares with the NASA/GISS algorithm, take a look at Figure 5 below, where the output of my algorithm is plotted directly against the NASA/GISS “Global Mean Estimates based on Land Data only” results.

(Note: All references to NASA/GISS global temperature results in this post refer specifically to the NASA/GISS “Global Mean Estimates based on Land Data only” results. Those results can be viewed on the NASA/GISS web-site; scroll down to view the “Global Mean Estimates based on Land Data only” graph).

Figure 5: Adjusted Data, All Stations: My Simple Gridding/Averaging (blue) vs. NASA/GISS (red)

In spite of its rudimentary nature, my algorithm produces results that match the NASA/GISS results quite closely. According to the R-squared statistic I calculated (seen in the upper-left corner of Figure 5), I got 98% of the NASA/GISS answer with only a tiny fraction of the effort!

But what happens when we use unadjusted GHCNv3 data? Well, let’s go ahead and compare the output of my algorithm with the NASA/GISS algorithm when my algorithm is used to process the unadjusted GHCNv3 data. Figure 6 below shows a plot of my unadjusted global temperature results vs. the NASA/GISS results (remember that NASA/GISS uses adjusted GHCNv3 data).

Figure 6: Unadjusted Data, All Stations: My Simple Gridding/Averaging (green) vs. NASA/GISS (red)

My “all stations” unadjusted data results show a warming trend that lines up very closely with the NASA/GISS warming trend from 1960 to 2016, with my results as well as the NASA/GISS results showing record high temperatures for 2016. However, my results do show a visible warm-bias relative to the NASA/GISS results prior to 1950 or so. This is the basis of the accusations that NOAA and NASA “cooled the past (and warmed the present)” to exaggerate the global warming trend.

Now, why do my unadjusted data results show that pre-1950 “warm bias” relative to the NASA/GISS results? Well, this excerpt from NOAA’s GHCN FAQ provides some clues:
Why are there more cold (negative) step changes than warm (positive) step changes in the historical land surface air temperature records represented in the GHCN v3 dataset?

The reason for the larger number of cold step changes is not completely clear, but they may be due in part to systematic changes in station locations from city centers to cooler airport locations that occurred in many parts of the world from the 1930s to through the 1960s.
Because the GHCNv3 metadata contains an airport designator field for every temperature station, it was quite easy for me to modify my program to exclude all the “airport” stations from the computations. So let’s exclude all of the “airport” station data and see what we get. Figure 7 below shows my unadjusted data results vs. the NASA/GISS results when all “airport” stations are excluded from my computations.

Figure 7: Unadjusted Data, Airports Excluded (green) vs. NASA/GISS (red)

There is a very visible reduction in the bias between my unadjusted results and the NASA results (especially prior to 1950 or so) when airport stations are excluded from my unadjusted data processing. This is quite consistent with the notion that many of the stations currently located at airports were moved to their current locations from city centers at some point during their history.

Now just for fun, let’s look at what happens when we do the reverse and exclude non-airport stations (i.e. process only the airport stations). Figure 8 shows what we get when we process unadjusted data exclusively from “airport” stations.

Figure 8: Unadjusted Data, Airports Only (green) vs. NASA/GISS (red)

Well, look at that! The pre-1950 bias between my unadjusted data results and the NASA/GISS results really jumps out. And take note of another interesting thing about the plot -- in spite of the fact that I processed only “airport” stations, the green “airports only” temperature curve goes all the way back to 1880, decades prior to the existence of airplanes (or airports)! It is only reasonable to conclude that those “airport” stations must have been moved at some point in their history.

Now, for a bit more fun, let’s drill down a little further into the data and process only airport stations that also have temperature data records going back to 1903 (the year that the Wright Brothers first successfully flew an airplane) or earlier.

When I drilled down into the data, I found over 400 “airport” temperature stations with data going back to 1903 or earlier. And when I computed global-average temperature estimates from just those stations, this is what I got (Figure 9):

Figure 9: Unadjusted Data, Airport Stations with pre-1903 Data (green) vs. NASA/GISS (red)

OK, that looks pretty much like the previous temperature plot, except that my results are “noisier” due to the fact that I processed data from fewer temperature stations.

And for even more fun, let’s look at the results we get when we process data exclusively from non-airport stations with data going back to 1903 or earlier:

Figure 10: Unadjusted Data, Non-Airport Stations with pre-1903 Data (green) vs. NASA/GISS (red)

When only non-airport stations are processed, the pre-1950 “eyeball estimate” bias between my unadjusted data temperature curve and the NASA/GISS temperature curve is sharply reduced.

The results seen in the above plots are entirely consistent with the notion that the movement of large numbers of temperature stations from city centers to cooler outlying airport locations during the middle of the 20th Century is responsible for much of the bias seen between the unadjusted and adjusted GHCNv3 global-average temperature results.

It is quite reasonable to conclude, based on the results presented here, that one major reason for the bias seen between the GHCNv3 unadjusted and adjusted data results is the presence of corrections for those station moves in the adjusted data (corrections that are obviously absent from the unadjusted data). Those corrections remove the contaminating effects of station moves and permit more accurate estimates of global surface temperature increases over time.

Take-home lessons (in no particular order):

  1. Even a very simple global temperature algorithm can reproduce the NASA/GISS results very closely. This really is a case where you can get 98% of the answer (per my R-squared statistic) with less than 1% of the effort.
  2. NOAA’s GHCNv3 monthly data repository contains everything an independent “citizen scientist” needs (data and documentation) to conduct his/her own investigation of the global land station temperature data.
  3. A direct comparison of unadjusted data results (all GHCN stations) vs. the NASA/GISS adjusted data temperature curves reveals only modest differences between the two temperature curves, especially for the past 6 decades. Furthermore, my unadjusted and the NASA/GISS adjusted results show nearly identical (and record) temperatures for 2016. If NASA and NOAA were adjusting data to exaggerate the amount of planetary warming, they sure went to an awful lot of trouble and effort to produce only a small overall increase in warming in the land station data.
  4. Eliminating all “airport” stations from the processing significantly reduced the bias between my unadjusted data results and the NASA/GISS results. It is therefore reasonable to conclude that a large share of the modest bias between my GHCN v3 unadjusted results and the NASA/GISS adjusted data results is the result of corrections for station moves from urban centers to outlying airports (corrections present in the adjusted data, but not in the unadjusted data).
  5. Simply excluding “airport” stations likely eliminates many stations that were always located at airports (and never moved) and also fails to eliminate stations that were moved out from city centers to non-airport locations. So it is not a comprehensive evaluation of the impacts of station moves. However, it is a very easy “first step” analysis exercise to perform; even this incomplete “first step” analysis produces results that are strongly consistent with the hypothesis that corrections for station moves are likely the dominant reason for the pre-1950 bias seen between the adjusted and unadjusted GHCN global temperature results. Remember that many urban stations were also moved from city centers to non-airport locations during the mid-20th century. Unfortunately, those station moves are not recorded in the simple summary metadata files supplied with the GHCNv3 monthly data. An analysis of NOAA’s more detailed metadata would be required to identify those stations and perform a more complete analysis of the impacts of station moves. However, that is outside of the scope of this simple project.
  6. For someone who has the requisite math and programming skills, confirming the results presented here should not be very hard at all. Skeptics should try it some time. Provided that those skeptics are willing and able to accept results that contradict their original views about temperature data adjustments, they could have a lot of fun taking on a project like this.

Related reading

Also the Clear Climate Code project was able to reproduce the results of NASA-GISS. Berkeley Earth made a high-level independent analysis and confirmed previous results. Also (non-climate) scientist Nick Stokes (Moyhu) computed his own temperature signal: TempLS, which also fits well.

The global warming conspiracy would be huge. Not only the 7 global datasets, but also national datasets from so many groups show clear warming.

Just the facts, homogenization adjustments reduce global warming.

Why raw temperatures show too little global warming.

Irrigation and paint as reasons for a cooling bias.

Temperature trend biases due to urbanization and siting quality changes.

Temperature bias from the village heat island

Cooling moves of urban stations. From cities to airports or simply to outside a city or village.

The transition to automatic weather stations. We’d better study it now. It may be a cooling bias.

Changes in screen design leading to temperature trend biases.

Early global warming

Cranberry picking short-term temperature trends

How climatology treats sceptics

Monday, 16 January 2017

Cranberry picking short-term temperature trends

Photo of cranberry fields

Monckton is a heavy user of this disingenuous "technique" and should thus know better: you cannot get any trend, but people like Monckton unfortunately do have much leeway to deceive the population. This post will show that political activists can nearly always pick a politically correct period to get a short-term trend that is smaller than the long-term trend. After this careful selection they can pretend to be shocked that scientists did not tell them about this slowdown in warming.

Traditionally this strategy to pick only the data you like is called "cherry picking". It is such a deplorable deceptive strategy that "cherry picking" sounds too nice to me. I would suggest calling it "cranberry picking", under the assumption that people only eat cranberries when the burning while peeing is worse. Another good new name could be "wishful picking."

In a previous post, I showed that the uncertainty of short-term trends is huge, probably much larger than you think; the uncertainty monster can only stomach a few short-term trends for breakfast. Because of this large uncertainty the influence of cranberry picking is probably also larger than you think. Even I was surprised by the calculations. I hope the uncertainty monster does not upset his stomach; he does not get the uncertainties he needs to thrive.

Uncertainty monster made of papers

Size of short-term temperature fluctuations

To get some realistic numbers we first need to know how large the fluctuations around the long-term trend are. Thus let's first have a look at the size of these fluctuations in two surface temperature and two tropospheric temperature datasets:
  • the surface temperature of Berkeley Earth (formerly known as BEST),
  • the surface temperature of NASA-GISS: GISTEMP,
  • the satellite Temperature of the Total Troposphere (TTT) of Remote Sensing Systems (RSS),
  • the satellite Temperature of the Lower Troposphere (TLT version 6 beta) of the University of Alabama in Huntsville (UAH).
The four graphs below have two panels. The top panel shows the yearly average temperature anomalies over time as red dots. The Berkeley Earth data series starts earlier, but I only use data starting in 1880 because earlier data is too sparse and may thus not show actual climatic changes in the global mean temperature. For both surface temperature datasets the Second World War is removed because its values are not reliable. The long-term trend is estimated using a LOESS smoother and shown as a blue line.

The lower panel shows the deviations from the long-term trend as red dots. The standard deviation of these fluctuations over the full period is written in red. The graphs for the surface temperature also give the standard deviation of the deviations over the shorter satellite period, written in blue for comparison with the satellite data. The period does not make much difference.

Both tropospheric datasets have fluctuations with a typical size (standard deviation) of 0.14 °C. The standard deviation of the surface datasets varies a little depending on the dataset or period. For the rest of this post I will use 0.086 °C as a typical value for the surface temperature.

The tropospheric temperature clearly shows more short-term variability. This mainly comes from El Nino, which has a stronger influence on the temperature high up in the air than on the surface temperature. This larger noise level gives the impression that the trend in the tropospheric temperature is smaller, but the trend in the RSS dataset is actually about the same as the surface trend; see below.

The trend in the preliminary UAHv6 temperature is currently lower than all others. Please note that the changes from the previous version of UAH to the recent one are large and that the previous version of UAH showed more (recent) warming* and about the same trend as the other datasets.

Uncertainty of short-term trends

Already without cranberry picking short-term trends are problematic because of the strong influence of short-term fluctuations. While an average value computed over 10 years of data is only 3 times as uncertain as a 100-year average, the uncertainty of a 10-year trend is 32 times as large as that of a 100-year trend.**
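
These two factors follow from the textbook formulas for Gaussian white noise: the standard error of a mean of N values scales as 1/sqrt(N), and the standard error of a least-squares trend through N equally spaced values scales as 1/sqrt(N(N²-1)/12). A small Python check (the function names are just for this example):

```python
import math


def sd_of_mean(sigma, n):
    """Standard error of the mean of n independent values with std. dev. sigma."""
    return sigma / math.sqrt(n)


def sd_of_trend(sigma, n):
    """Standard error of the least-squares slope through n yearly values
    of white noise (time steps of one year)."""
    return sigma / math.sqrt(n * (n ** 2 - 1) / 12)


ratio_mean = sd_of_mean(1.0, 10) / sd_of_mean(1.0, 100)     # about 3.2
ratio_trend = sd_of_trend(1.0, 10) / sd_of_trend(1.0, 100)  # about 31.8

if __name__ == "__main__":
    print(f"mean:  {ratio_mean:.1f} times as uncertain")
    print(f"trend: {ratio_trend:.1f} times as uncertain")
```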

To study how accurate a trend is you can generate random numbers and compute their trend. On average this trend will be zero, but due to the short-term fluctuations any individual realization will have some trend. By repeating this procedure often you can study how much the trend varies due to the short-term fluctuations, how uncertain the trend is, or more positively formulated: what the confidence interval of the trend is. See my previous post for details. I have done this for the graph below; for the satellite temperatures the random numbers have a standard deviation of 0.14 °C, for the surface temperatures 0.086 °C.

The graph below shows the confidence interval of the trends, which is two times the standard deviation of 10,000 trends computed from 10,000 series of random numbers. A 10-year trend of the satellite temperatures, which may sound like a decent period, has a whopping uncertainty of 3 °C per century.*** This means that with no long-term trend the short-term trend will vary between -3 °C and +3 °C per century for 95% of the cases and for the other 5% even more. That is the uncertainty from the fluctuations alone; there are additional uncertainties due to changes in the orbit, the local time the satellite observes, calibration and so on.
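
The experiment is simple enough to redo in a few lines. The sketch below is not the downloadable code of this post, just a minimal standard-library Python version of the same idea; the function names and the random seed are arbitrary choices.

```python
import random
from statistics import stdev


def ols_slope(values):
    """Least-squares trend per time step of equally spaced values."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def trend_interval(n_years, sigma, n_series=10_000, seed=1):
    """Two standard deviations of the trends of pure noise, in deg C per century."""
    rng = random.Random(seed)
    slopes = [ols_slope([rng.gauss(0.0, sigma) for _ in range(n_years)])
              for _ in range(n_series)]
    return 2 * stdev(slopes) * 100  # per year -> per century


if __name__ == "__main__":
    print("satellite, 10 years:", round(trend_interval(10, 0.14), 1))
    print("surface,   10 years:", round(trend_interval(10, 0.086), 1))
```

For the satellite noise level this gives roughly 3 °C per century for a 10-year trend, in line with the number quoted above.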

Cherry picking the begin year

To look at the influence of cranberry picking, I generated series of 30 values, computed all possible trends between 10 and 30 years and selected the smallest trend. The confidence intervals of these cranberry picked satellite temperature trends are shown below in red. For comparison the intervals for trends without cranberry picking, like above, are shown in blue. To show both cases clearly in the same graph, I have shifted both bars a little away from each other.

The situation is similar for the surface temperature trends. However, because the data is less noisy, the confidence intervals of the trends are smaller; see below.

While the short-term trends without cranberry picking have a huge uncertainty, on average they are zero. With cranberry picking the average trends are clearly negative, especially for shorter trends, showing the strong influence of selecting a specific period. Without cranberry picking half of the trends are below zero, with cranberry picking 88% of the trends are negative.
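
A minimal Python sketch of this kind of experiment is given below. It is an illustration under assumptions, not the code behind the graphs: every candidate period ends at the last value, only the begin year is picked, and the exact percentage it prints depends on those choices and on the random seed.

```python
import random


def ols_slope(values):
    """Least-squares trend per time step of equally spaced values."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def picked_trend(series, min_len=10):
    """Smallest trend over all periods of at least min_len values
    that end at the last value (only the begin year is picked)."""
    return min(ols_slope(series[start:])
               for start in range(len(series) - min_len + 1))


def fraction_negative(n_series=1000, n_years=30, sigma=0.14, seed=1):
    """Share of trendless noise series for which a negative trend can be picked."""
    rng = random.Random(seed)
    picked = [picked_trend([rng.gauss(0.0, sigma) for _ in range(n_years)])
              for _ in range(n_series)]
    return sum(trend < 0 for trend in picked) / n_series


if __name__ == "__main__":
    print(f"negative after picking the begin year: {fraction_negative():.0%}")
```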

Cherry picking the period

For some, the record temperatures of the last two years are not a sign that they were wrong to see a "hiatus". Some claim that there was something like a "pause" or a "slowdown" since 1998, but that it recently stopped. This claim gives even more freedom for cranberry picking. Now also the end year is cranberry picked. To see how bad this is, I again generated noise and selected the period lasting at least 10 years with the lowest trend and ending this year, or one year earlier or two years earlier.

The graphs below compare the range of trends you can get with cranberry picking the begin and end year in green with "only" cranberry picking the begin year like before in red. With double cranberry picking 96% of the trends are negative and the trends are going down even more. (Mitigation skeptics often use this "technique" by showing an older plot, when the newer plot would not be as "effective".)
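
The same kind of sketch can be extended to double picking. Again this is only an illustration with assumed details (series of 30 values, satellite noise level, arbitrary seed and function names), so the percentages it prints will not match the graphs exactly.

```python
import random


def ols_slope(values):
    """Least-squares trend per time step of equally spaced values."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def picked_trend(series, min_len=10, max_trim=0):
    """Smallest trend over periods of at least min_len values that end at the
    last value or up to max_trim values earlier."""
    n = len(series)
    return min(ols_slope(series[start:end])
               for end in range(n - max_trim, n + 1)
               for start in range(end - min_len + 1))


def fractions_negative(n_series=1000, n_years=30, sigma=0.14, seed=1):
    """Share of negative picked trends: (begin year only, begin and end year)."""
    rng = random.Random(seed)
    single = double = 0
    for _ in range(n_series):
        series = [rng.gauss(0.0, sigma) for _ in range(n_years)]
        single += picked_trend(series) < 0
        double += picked_trend(series, max_trim=2) < 0
    return single / n_series, double / n_series


if __name__ == "__main__":
    begin_only, begin_and_end = fractions_negative()
    print(f"begin year picked:        {begin_only:.0%} negative")
    print(f"begin and end year picked: {begin_and_end:.0%} negative")
```

Because the double-picked periods include all the begin-year-only periods, the second percentage can never be lower than the first.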

A negative trend in the above examples of random numbers without any trend would be comparable to a real dataset where a short-term trend is below the long-term trend. Thus by selecting the "right" period, political activists can nearly always claim that scientists talking about the long-term trend are exaggerating because they do not look at this highly interesting short period.

In the US political practice the cranberry picking will be worse. Activists will not only pick a period of their political liking, but also the dataset, variable, region, depth, season, or resolution that produces a graph that can be misinterpreted. The more degrees of freedom, the stronger the influence of cranberry picking.


There are a few things you can do to protect yourself against making spurious judgements.

1. Use large datasets. You can see in the plots above that the influence of cranberry picking is much smaller for the longer trends. For a 30-year period the difference between the blue confidence intervals for a typical 30-year period and the red confidence intervals for a cranberry picked 30-year period is small. Had I generated series of 50 random numbers rather than 30 numbers, this would likely have shown a larger effect of cranberry picking on 30-year trends, but still a lot smaller than on 10-year trends.

2. Only make statistical tests for relationships you expect to exist. This limits your freedom and the chance that one of the many possible statistical tests is spuriously significant. If you make 100 statistical tests of pure noise, 5 of them will on average be spuriously significant.

There was no physical reason for global warming to stop or slow down after 1998. No one computed the trend since 1998 because they had a reason to expect a change. They computed it because their eyes had seen something; that makes the trend test cranberry picking by definition. The absence of a reason should have made people very careful. The more so because there was a good reason to expect spurious results starting in a large El Nino year.

3. Study the reasons for the relationship you found. Even if I had wrongly seen the statistical evidence for a trend decrease as credible, I would not have made a big point of it before I had understood the reason for this trend change. In the "hiatus" case the situation was even reversed: it was clear from the beginning that most of the fluctuations that gave the appearance of a "hiatus" in the eyes of some were due to El Nino. Thus there was a perfectly fine physical reason not to claim that there was a change in the trend.

There is currently a strong decline in global sea ice extent. Before I cry wolf, accuse scientists of fraud and understating the seriousness of climate change, I would like to understand why this decline happened.

4. Use the right statistical test. People have compared the trend before 1998 and after 1998 and their uncertainties. These trend uncertainties are not valid for cherry picked periods. In this case, the right test would have been one for a trend change at an unknown position/year. There was no physical reason to expect a real trend change in 1998, thus the statistical test should take into account that the actual reason you make the test is that your eye sampled all possible years.

Against activists doing these kinds of things we cannot do much, except trying to inform their readers how deceptive this strategy is. For example by linking to this post. Hint, hint.
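
The remark in point 2, that about 5 out of 100 tests on pure noise come out spuriously significant, can be checked numerically. The Python sketch below is an illustration with assumed details: it tests the trend of 30-value white noise series with a known noise level at the 5% level (|z| > 1.96); the function names and the seed are arbitrary.

```python
import math
import random


def slope_z_score(values, sigma):
    """Least-squares trend divided by its standard error for white noise
    with known standard deviation sigma."""
    n = len(values)
    t_mean = (n - 1) / 2
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * v for t, v in enumerate(values)) / sxx
    return slope / (sigma / math.sqrt(sxx))


def spurious_fraction(n_tests=20_000, n_years=30, sigma=0.14, seed=1):
    """Share of trendless noise series whose trend tests as 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        series = [rng.gauss(0.0, sigma) for _ in range(n_years)]
        hits += abs(slope_z_score(series, sigma)) > 1.96
    return hits / n_tests


if __name__ == "__main__":
    print(f"spuriously significant: {spurious_fraction():.1%}")
```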

Let me leave you with a classic Potholer54 video delicately mocking Monckton's cranberry picking to get politically convenient global cooling and melting ice trends.

Related reading

Richard Telford on the Monckton/McKitrick definition of a "hiatus", which nearly always gives you one: Recipe for a hiatus

Tamino: Cherry p

Statistically significant trends - Short-term temperature trends are more uncertain than you probably think

How can the pause be both ‘false’ and caused by something?

Atmospheric warming hiatus: The peculiar debate about the 2% of the 2%

Temperature trend over last 15 years is twice as large as previously thought because much warming was over Arctic where we have few measurements

Why raw temperatures show too little global warming

* The common baseline period of UAH5.6 and UAH6.0 is 1981-2010.

** These uncertainties are for Gaussian white noise.

*** I like the unit °C per century for trends even if the period of the trend is shorter. You get rounder numbers and it is easier to compare the trends to the warming we have seen in the last century and expect to see in the next one.

**** The code to compute the graphs of this post can be downloaded here.

***** Photo of cranberry field by mrbanjo1138 used under a Creative Commons Attribution-NonCommercial-NoDerivs 2.0 Generic (CC BY-NC-ND 2.0) license.

Sunday, 8 January 2017

Much ado about NOAAthing

I know NOAAthing.

This post is about nothing. Nearly nothing. But when I found this title I had to write it.

Once upon a time in America there were some political activists who claimed that global warming had stopped. These were the moderate voices, with many people in this movement saying that an ice age is just around the corner. Others said global warming paused, hiatused or slowed down. I feel that good statistics has always shown this idea to be complete rubbish (Foster and Abraham, 2015; Lewandowsky et al., 2016), but at least in 2017 it should be clear that it is nothing, nothing whatsoever. It is interpreting noise. More kindly: interpreting variability, mostly El Nino variability.

Even if you disingenuously cherry-pick 1998, the hot El Nino year, as the first year of your trend to get a smaller trend, the short-term trend is about the same size as the long-term trend now that 2016 is another hot El Nino year to balance out the first crime. Zeke Hausfather tweeted to the graph below: "You keep using that word, "pause". I do not think it means what you think it means." #CulturalReference

In 2013 Boyin Huang of NOAA and his colleagues created an improved sea surface dataset called ERSST.v4. No one cared about this new analysis. Normal good science.

Thomas Karl of NOAA and his colleagues showed what the update means for the global temperature (ocean and land). The interesting part is the lower panel. It shows that the adjustments make global warming smaller by about 0.2°C. Climate data scientists naturally knew this and I blogged about this before, but I think the Karl paper was the first time this was shown in the scientific literature. (The adjustments are normally shown for the individual land or ocean datasets.)

But this post is unfortunately about nearly nothing, about the minimal changes in the top panel of the graph below. I made the graph extra large, so that you can see the differences. The thick black line shows the new assessment (ERSST.v4) and the thin red line the previous estimated global temperature signal (ERSST.v3). Differences are mostly less than 0.05°C, both warmer and cooler. The "problem" is the minute change at the right end of the curves.

The new paper by Zeke Hausfather and colleagues now shows evidence that the updated dataset (ERSSTv4) is indeed better than the previous version (ERSSTv3b). It is a beautifully done study of high technical quality. They do so by comparing the ERSST dataset, which comes from a large number of data sources, with data that comes from only one source (buoys, satellites (CCI) or ARGO). These single-source datasets are shorter, but without trend uncertainties due to the combination of sources.

The recent trend of HadSST also seems to be too small and, to a lesser extent, so does that of COBE-SST. This problem with HadSST was known, but not published yet. The warm bias of ships that measure SST at their engine room intake has been getting smaller over the last decade. The reason for this is not yet clear. The main contender seems to be that the fleet has become more actively managed and (typically warm) bad measurements have been discontinued.

Also ERSST uses ship data, but it gives them a much smaller weight compared to the buoy data. That makes this problem less visible in ERSST. Prepare for a small warming update for recent temperatures once this problem is better understood and corrected for. And prepare for the predictable cries of the mitigation skeptical movement and their political puppets.

Karl and colleagues showed that, as a consequence of the minimal changes in ERSST, a trend that starts in 1998 is statistically significant. In the graph below you can see in the left global panel that the old version of ERSST (circles) had a 90% confidence interval (vertical line) that includes zero (not statistically significantly different from zero), while the confidence interval of the updated dataset did not (statistically significant).

Did I mention that such a cherry-picked begin year is a very bad idea? The right statistical test is one for a trend change at an unknown year. This test provides no evidence whatsoever for a recent trend change.

That the trend in Karl and colleagues was statistically significant should thus not have mattered: Nothing could be worse than defining a "hiatus" period as one where the confidence interval of a trend includes zero. However, this is the definition public speaker Christopher Monckton uses for his blog posts at Watts Up With That, a large blog of the mitigation skeptical movement. Short-term trends are very uncertain; their uncertainty increases very fast the shorter the period is. Thus if your period is short enough, you will find a trend whose confidence interval includes zero.
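To give a feeling for how fast the uncertainty grows, here is a minimal sketch. It assumes annual values with independent Gaussian (white) noise, for which the standard error of a least-squares trend has a simple closed form. The function name and the noise level of 0.1°C are my own illustrative choices, not values from any of the papers. Real temperature noise is autocorrelated, which makes the uncertainties larger than this.

```python
import math


def trend_standard_error(sigma: float, n_years: int) -> float:
    """Standard error of a least-squares trend through n_years equally
    spaced annual values with independent Gaussian noise of standard
    deviation sigma. The result is in units of sigma per year.
    """
    if n_years < 2:
        raise ValueError("need at least 2 points to fit a trend")
    # For t = 0, 1, ..., n-1: sum((t - mean(t))**2) = n * (n**2 - 1) / 12
    sum_sq_time = n_years * (n_years**2 - 1) / 12.0
    return sigma / math.sqrt(sum_sq_time)


SIGMA = 0.1  # assumed year-to-year noise in degrees C, illustration only

for n in (10, 15, 30, 60):
    se_per_century = 100.0 * trend_standard_error(SIGMA, n)
    print(f"{n:3d} years: standard error {se_per_century:.2f} degrees C per century")
```

For long series the standard error scales roughly with the period length to the power -3/2, so halving the period makes the trend uncertainty nearly three times as large.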

You should not do this kind of statistical test in the first place because of the inevitable cherry picking of the period, but if you want to statistically test whether the long-term trend suddenly dropped, the test should have the long-term trend as null-hypothesis. This is the 21st century, we understand the physics of man-made global warming, we know it should be warming, it would be enormously surprising and without any explanation if "global warming had stopped". Thus continued warming is the thing that should be disproven, not a flat trend line. Good luck doing so for such short periods given how enormously uncertain short-term trends are.
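A minimal sketch of what testing against the long-term trend looks like, as opposed to testing against zero. The data are synthetic and the long-term trend value of 0.017°C per year is an assumption for illustration only. The helper names are hypothetical, and the standard error again assumes white noise, so it understates the real uncertainty.

```python
import math


def ols_trend(years, temps):
    """Return (slope, standard_error) of an ordinary least-squares line.

    The standard error assumes independent residuals (white noise).
    """
    n = len(years)
    if n < 3:
        raise ValueError("need at least 3 points")
    mean_t = sum(years) / n
    mean_y = sum(temps) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, temps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    rss = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(years, temps))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return slope, standard_error


def z_against_null(slope, standard_error, null_slope):
    """How many standard errors the fitted slope lies from the null slope.

    Requires standard_error > 0, i.e. data that are not perfectly linear.
    """
    return (slope - null_slope) / standard_error


# Synthetic 15-year series: a trend of 0.010 per year plus alternating noise.
years = list(range(1998, 2013))
temps = [0.010 * (y - 1998) + (0.1 if i % 2 == 0 else -0.1)
         for i, y in enumerate(years)]

LONG_TERM_TREND = 0.017  # assumed value in degrees C per year, illustration only

slope, se = ols_trend(years, temps)
print("z against a flat line:      ", round(z_against_null(slope, se, 0.0), 2))
print("z against the long-term trend:", round(z_against_null(slope, se, LONG_TERM_TREND), 2))
```

If the absolute value of the second number stays below about 2, the short period gives no reason to reject continued warming at the long-term rate, whatever the first number says.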

The large uncertainty also means that cherry picking a specific period to get a low trend has a large impact. I will show this numerically in an upcoming post. The methods to compute a confidence interval are for a randomly selected period, not for a period that was selected to have a low trend.
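As a rough illustration of the selection effect (not the analysis of the upcoming post), the sketch below generates synthetic series with a constant trend plus white noise and compares the 15-year trend of a randomly chosen period with the lowest 15-year trend found anywhere in the series. All parameter values and function names are assumptions made for this example.

```python
import random


def ols_slope(values):
    """Least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    mean_t = (n - 1) / 2.0
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(values))
    return sxy / sxx


def window_trends(series, window):
    """Trends of all consecutive periods of the given length."""
    return [ols_slope(series[start:start + window])
            for start in range(len(series) - window + 1)]


def simulate(n_series=200, n_years=50, window=15,
             trend=0.017, sigma=0.1, seed=1):
    """Return (mean trend of a random period, mean trend of the
    cherry-picked lowest period) over many synthetic series."""
    rng = random.Random(seed)
    random_pick, cherry_pick = [], []
    for _ in range(n_series):
        series = [trend * t + rng.gauss(0.0, sigma) for t in range(n_years)]
        trends = window_trends(series, window)
        random_pick.append(rng.choice(trends))
        cherry_pick.append(min(trends))
    return (sum(random_pick) / n_series, sum(cherry_pick) / n_series)


mean_random, mean_cherry = simulate()
print(f"random period:        {100 * mean_random:.2f} degrees C per century")
print(f"cherry-picked period: {100 * mean_cherry:.2f} degrees C per century")
```

The randomly chosen periods recover the true trend on average, while the cherry-picked periods are biased low even though the underlying trend never changed.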

Concluding, we have something that does not exist, but which was made into a major talking point of the mitigation skeptical movement. This movement put their credibility on fluctuations that produced a minor short-term trend change that was not statistically significant. The deviation was also so small that it put an unfounded confidence in the perfection of the data.

The inevitable happened and small corrections needed to be made to the data. After this even disingenuous cherry-picking and bad statistics were no longer enough to support the talking point. As a consequence Lamar Smith of TX21 abused his Washington power to punish politically inconvenient science. Science that was confirmed this week. This should all have been politically irrelevant because the statistics were wrong all along. It is politically irrelevant by now because the new El Nino produced record temperatures in 2016 and even cherry picking 1998 as begin year is no longer enough.

"Much Ado About Nothing is generally considered one of Shakespeare's best comedies because it combines elements of mistaken identities, love, robust hilarity with more serious meditations on honour, shame, and court politics."
(Yes, I get my culture from Wikipedia)

To end on a positive note, if you are interested in sea surface temperature and its uncertainties, we just published a review paper in the Bulletin of the American Meteorological Society: "A call for new approaches to quantifying biases in observations of sea-surface temperature." This focuses on ideas for future research and how the SST community can make it easier for others to join the field and work on improving the data.

Another good review paper on the quality of SST observations is: "Effects of instrumentation changes on sea surface temperature measured in situ" and also the homepage of HadSST is quite informative. For more information on the three main sea surface temperature datasets follow these links: ERSSTv4, HadSST3 and COBE-SST. Thanks to John Kennedy for suggesting the links in this paragraph.

Do watch the clear video below where Zeke Hausfather explains the study and why he thinks recent ocean warming used to be underestimated.

Related reading

The op-ed by the authors Kevin Cowtan and Zeke Hausfather is probably the best article on the study: Political Investigation Is Not the Way to Scientific Truth. Independent replication is the key to verification; trolling through scientists' emails looking for out-of-context "gotcha" statements isn't.

Scott K. Johnson in Ars Technica (a reading recommendation for science geeks by itself): New analysis shows Lamar Smith’s accusations on climate data are wrong. It wasn't a political plot—temperatures really did get warmer.

Phil Plait (Bad Astronomy) naturally has a clear explanation of the study and the ensuing political harassment: New Study Confirms Sea Surface Temperatures Are Warming Faster Than Previously Thought

The take of the UK MetOffice, producers of HadSST, on the new study and the differences found for HadSST: The challenge of taking the temperature of the world’s oceans

Hotwhopper is your explainer if you like your stories with a little snark: The winner is NOAA - for global sea surface temperature

Hotwhopper follow-up: Dumb as: Anthony Watts complains Hausfather17 authors didn't use FUTURE data. With such a response to the study it is unreasonable to complain about snark in the response.

The Christian Science Monitor gives a good non-technical summary: Debunking the myth of climate change 'hiatus': Where did it come from?

I guess it is hard for a journalist to write that the topic is not important. Chris Mooney at the Washington Post claims Karl and colleagues is important: NOAA challenged the global warming ‘pause.’ Now new research says the agency was right.

Climate Denial Crock of the Week with Peter Sinclair: New Study Shows (Again): Deniers Wrong, NOAA Scientists Right. Quotes from several articles and has good explainer videos.

Global Warming ‘Hiatus’ Wasn’t, Second Study Confirms

The Guardian blog by John Abraham: New study confirms NOAA finding of faster global warming

Atmospheric warming hiatus: The peculiar debate about the 2% of the 2%

No! Ah! Part II. The return of the uncertainty monster

How can the pause be both ‘false’ and caused by something?


Grant Foster and John Abraham, 2015: Lack of evidence for a slowdown in global temperature. US CLIVAR Variations, Summer 2015, 13, No. 3.

Zeke Hausfather, Kevin Cowtan, David C. Clarke, Peter Jacobs, Mark Richardson, Robert Rohde, 2017: Assessing recent warming using instrumentally homogeneous sea surface temperature records. Science Advances, 04 Jan 2017.

Boyin Huang, Viva F. Banzon, Eric Freeman, Jay Lawrimore, Wei Liu, Thomas C. Peterson, Thomas M. Smith, Peter W. Thorne, Scott D. Woodruff, and Huai-Min Zhang, 2015: Extended Reconstructed Sea Surface Temperature Version 4 (ERSST.v4). Part I: Upgrades and Intercomparisons. Journal of Climate, 28, pp. 911–930, doi: 10.1175/JCLI-D-14-00006.1.

Thomas R. Karl, Anthony Arguez, Boyin Huang, Jay H. Lawrimore, James R. McMahon, Matthew J. Menne, Thomas C. Peterson, Russell S. Vose, Huai-Min Zhang, 2015: Possible artifacts of data biases in the recent global surface warming hiatus. Science. doi: 10.1126/science.aaa5632.

Lewandowsky, S., J. Risbey, and N. Oreskes, 2016: The “Pause” in Global Warming: Turning a Routine Fluctuation into a Problem for Science. Bull. Amer. Meteor. Soc., 97, 723–733, doi: 10.1175/BAMS-D-14-00106.1.