The unemployment numbers for the United States came out yesterday and they seemed internally inconsistent. Payrolls increased by only 36,000 (a significant disappointment, since payrolls need to increase by 200,000 or more to make a dent in unemployment) but the unemployment rate dropped from 9.4% to 9%. The news, though, gives me a chance to talk about one of my favorite topics: sampling, statistics and standard error.
Staying on the unemployment numbers, it is worth examining how they are computed. The best source is the Bureau of Labor Statistics (BLS), which provides details on how it computes the numbers. What makes the unemployment numbers interesting is that the two numbers (payrolls and unemployment rate) are based upon different samples.
Unemployment rate: Here is the description of how the BLS computes this number:
http://www.bls.gov/cps/cps_htgm.htm
As the BLS points out, it uses a sample of 60,000 households, translating into about 110,000 individuals, to estimate the number of employed and unemployed people in the nation and computes the rate based upon that sample. It uses interviewers to classify these individuals into three groups: those who are employed, those who are unemployed, and those who are not part of the work force (not employed, but not looking for work either).
Unemployment rate = Unemployed / (Employed + Unemployed)
Those who are not looking for work don't get counted.
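To make the arithmetic concrete, here is a minimal sketch in Python using made-up counts (they are not BLS figures): classify a sample into the three groups and apply the formula above. Notice how people who stop looking for work drop out of the denominator, pulling the rate down even when no new jobs are created.

```python
# A minimal sketch with made-up counts (not actual BLS figures) to show how
# the rate is computed and why workforce dropouts shrink the denominator.

def unemployment_rate(employed, unemployed):
    """Unemployed / (Employed + Unemployed); people not in the labor force are excluded."""
    return unemployed / (employed + unemployed)

# Hypothetical month 1 classification of a household sample:
employed, unemployed, not_in_labor_force = 58_000, 6_000, 46_000
print(f"Month 1 rate: {unemployment_rate(employed, unemployed):.1%}")  # about 9.4%

# If 260 of the unemployed stop looking for work, they move out of the labor
# force; the rate falls even though not a single new job was created.
dropouts = 260
print(f"Month 2 rate: {unemployment_rate(employed, unemployed - dropouts):.1%}")  # about 9.0%
```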
Payroll: The BLS describes how it computes payroll numbers here:
http://www.bls.gov/bls/empsitquickguide.htm
This is a survey of 140,000 businesses, covering roughly 440,000 work sites, with adjustments for new businesses starting up and existing businesses closing down.
(Kudos to the BLS for transparency. They do an excellent job describing what they do as well as how and why they do it.)
In the last month, if the survey is to be believed, almost half a million people decided to leave the workforce, shrinking the denominator; hence the abrupt drop in the unemployment rate. Since few people believe that a change this big could have occurred in January, we are seeing questions being raised and answers offered:
1. Is the sample size large enough?
Absolutely. As samples go, these are both huge. For contrast, the typical sample size for the polls we see around presidential elections is 1,000-2,000.
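To see why these samples count as huge, here is a rough back-of-the-envelope comparison, treating each survey as a simple random sample (the actual survey designs are more complex, so the real standard errors will differ): the approximate 95% margin of error around a proportion of 9% for a 1,500-person poll versus a 110,000-person household sample.

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion, treating the sample
    as a simple random sample (the real survey designs are more complex)."""
    return z * sqrt(p * (1 - p) / n)

p = 0.09  # a proportion in the neighborhood of a 9% unemployment rate
for label, n in [("typical election poll", 1_500),
                 ("household survey (individuals)", 110_000)]:
    print(f"{label:32s} n = {n:>7,}  margin of error: ±{margin_of_error(p, n):.2%}")
```

Even under these simplified assumptions, the household survey's margin of error comes out at roughly a tenth of the election poll's.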
2. Is there sampling bias?
This is a concern with any test based upon a sample. If the sample is not representative of the population, the results cannot be generalized. Thus, if the sample of households used by the Labor department has a disproportionate number of college graduates, people from the Mountain States, or people between the ages of 45 and 60, the results will be skewed. Unless I see evidence to the contrary, I will continue to believe that the Labor department has unbiased and competent statisticians on its staff to ensure that there is no sampling bias.
3. How much sampling noise is there?
Even with a large sample size and no sampling bias, the results from a sample will have a standard error, i.e., a range around the estimated number. That standard error will be a function of the volatility in the underlying data. In periods like the last two and a half years, when the labor market has been in tumult, there is every reason to believe that the unemployment rate is being estimated with more error than in more stable periods. There can be other sources of noise as well. One culprit being pointed to is the weather, with some economists claiming that the terrible weather has affected employment in some sectors (such as construction). This may explain the payroll data and the employed/unemployed numbers, but unless it also had an impact on data gathering, I don't see how it explains the surge in the number of people who have left the employment pool.
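One way to get a feel for pure sampling noise is a toy simulation: assume, purely for illustration, a fixed true unemployment rate of 9% and a labor-force sample of about 64,000 people drawn as a simple random sample each month, and watch how much the estimated rate jitters with no change at all in the underlying economy. This ignores the actual survey design, so it likely understates the real standard error, but it gives a sense of scale.

```python
import random

# Toy simulation, not the CPS design: treat the labor-force portion of the
# sample as a simple random draw of n people each month from a population
# with a fixed true unemployment rate. The n of 64,000 is an assumption
# (roughly the labor-force share of a 110,000-person sample).
random.seed(1)
true_rate, n, months = 0.09, 64_000, 12

estimates = [sum(random.random() < true_rate for _ in range(n)) / n
             for _ in range(months)]

print("estimated rates:", " ".join(f"{e:.2%}" for e in estimates))
print("largest month-to-month swing:",
      f"{max(abs(a - b) for a, b in zip(estimates, estimates[1:])):.2%}")
```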
4. What post-sampling adjustments are being made to the number that may affect the reported number?
The Labor department does not report the raw numbers that it gets from its survey.
* It makes seasonal adjustments to reflect the "normal" ups and downs in employment. In December, for instance, it adjusts the employment numbers to reflect the increase in part-time jobs before Christmas; thus, an increase of 100,000 jobs in the raw data might be seasonally adjusted down to an increase of only 20,000 jobs in the report (a toy version of this adjustment is sketched after this list). To the extent that the seasonal adjustments are incorrect or imprecise, they can cause the reported numbers to be volatile.
* The other number that the Labor department reports is revisions to previous months' estimates. Presumably, some of the respondents being interviewed provide information that leads to a reassessment of both employment data (into employed, unemployed and not in the work force) and payroll data in prior months.
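As referenced above, here is a toy version of a seasonal adjustment, using a simple additive correction and made-up numbers. The BLS's actual procedure is a model-based seasonal adjustment; the sketch is only meant to show why an error in the estimated seasonal swing feeds directly into the reported change.

```python
# A toy additive adjustment with made-up numbers -- not the BLS's actual
# model-based procedure -- showing how an error in the estimated "normal"
# seasonal swing feeds straight into the reported change.
normal_december_gain = 80_000   # hypothetical average December bump from holiday hiring
raw_december_gain = 100_000     # hypothetical raw change measured this December

adjusted_gain = raw_december_gain - normal_december_gain
print(f"Raw gain: {raw_december_gain:,}  ->  seasonally adjusted gain: {adjusted_gain:,}")
# If the 80,000 estimate of the normal swing is off by 30,000, the reported
# (adjusted) number is off by the same 30,000.
```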
5. Can I trust the results from the sample?
Any estimate that comes from a sample has to be viewed as such: an estimate and not a fact. I believe that the employment picture is improving but it is doing so slowly. I would not be surprised to see the number of payroll jobs increase by a lot more in February but see the unemployment rate go up with it.
As a general rule, these are the questions you should ask about most assertions that you see made about economics, health and general culture. As access to data improves and the number of data-snoopers multiplies, we are bombarded every day with more, and often contradictory, findings: workers are becoming more productive (or is it less productive?)... drink more wine for good health (or is it stop drinking altogether?)... Take a deep breath and resolve to do the following on the next statistic or study that you encounter:
1. Check for bias in the source of the study. A study by a gun-control group claiming that guns increase violence should be viewed with just as much skepticism as a study by the NRA claiming that improving access to guns makes you safer.
2. Do not overreact to any single statistic (or study). It may just reflect statistical noise. The problem will be magnified if you have small samples and are measuring a variable that is volatile or difficult to measure.
3. Look for confirmation in independent studies or assessments. With unemployment, for instance, the government does take multiple shots at getting it right. You have the unemployment claims that are estimated every Thursday, the payroll numbers and the unemployment rates. You also have estimates from private sources: ADP estimates the number of jobs created each month by private businesses and reports it just before the government reports the unemployment rate. I will feel more sanguine about US employment when I see all the numbers start moving in the same direction.
4. Statistical significance does not always equate to real-world significance. There are a lot of findings that are statistically significant yet matter very little in the real world. This is especially so with large-sample studies, where small changes in a variable can be statistically significant.
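A quick sketch of that last point: a hypothetical difference of 0.05 percentage points between two proportions is negligible in practical terms, yet a standard two-proportion z-test will flag it as significant once the samples get large enough.

```python
from math import sqrt

def two_proportion_z(p1, n1, p2, n2):
    """Pooled z-statistic for the difference between two sample proportions."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)
    return (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# A hypothetical difference of 0.05 percentage points -- trivial in practice.
p1, p2 = 0.0900, 0.0905
for n in (10_000, 5_000_000):
    z = two_proportion_z(p1, n, p2, n)
    verdict = "statistically significant" if abs(z) > 1.96 else "not significant"
    print(f"n = {n:>9,} per group: z = {z:+.2f}  ({verdict})")
```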
I know it is asking too much of reporters and researchers to be transparent: to report sample sizes, standard errors and any information that may reflect their biases. On my part, I will try to do better on this dimension. However, as consumers, we need to be more skeptical about data and wary about generalizations.