Friday, September 22, 2017

Misclassification in Binary Choice Models

Several years ago I wrote a number of posts about Logit and Probit models, and the Linear Probability Model (LPM). One of those posts (also, see here) dealt with the problems that arise if you mis-classify the dependent variable in such models. That is, in the binary case, some of your "zeroes" should be "ones", and/or vice versa.

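In a conventional linear regression model, measurement errors in the dependent variable are not a big deal: as long as they have a zero mean and are uncorrelated with the regressors, they just inflate the error variance. However, the situation is quite different with Logit, Probit, and the LPM.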

This issue is taken up in detail in an excellent, recent, paper by Meyer and Mittag (2017), and I commend their paper to you.

To give you an indication of what those authors have to say, this is from their Introduction:
".....the literature has established that misclassification is pervasive and affects estimates, but not how it affects them or what can still be done with contaminated data. This paper characterizes the consequences of misclassification of the dependent variable in binary choice models and assesses whether substantive conclusions can still be drawn from the observed data and if so, which methods to do so work well. We first present a closed form solution for the bias in the linear probability model that allows for simple corrections. For non-linear binary choice models such as the Probit model, we decompose the asymptotic bias into four components. We derive closed form expressions for three bias components and an equation that determines the fourth component. The formulas imply that if misclassification is conditionally random, only the probabilities of misclassification are required to obtain the exact bias in the linear probability model and an approximation in the Probit model. If misclassification is related to the covariates, additional information on this relation is required to assess the (asymptotic) bias, but the results still imply a tendency for the bias to be in the opposite direction of the sign of the coefficient."
This paper includes a wealth of information, including some practical guidelines for practitioners.
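To make the LPM case concrete, here is a small simulation sketch. It is my own illustration, not code from the paper, and every number in it (the true coefficients, the misclassification rates a0 and a1, the sample size) is an assumption chosen for the example. If misclassification is unrelated to x, with P(0 recorded as 1) = a0 and P(1 recorded as 0) = a1, then E[y_obs | x] = a0 + (1 - a0 - a1) P(y = 1 | x), so the LPM slope should shrink by the factor (1 - a0 - a1). For contrast, the last few lines add mean-zero noise to a continuous dependent variable, where the slope should be left essentially unchanged.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(12345)   # fixed seed so the run is reproducible
n = 200_000
beta0, beta1 = 0.2, 0.5      # assumed true LPM: P(y = 1 | x) = 0.2 + 0.5 x
a0, a1 = 0.10, 0.20          # assumed P(0 recorded as 1) and P(1 recorded as 0)

x = [rng.random() for _ in range(n)]   # x ~ U(0, 1), so probabilities stay in [0.2, 0.7]
y_true = [1 if rng.random() < beta0 + beta1 * xi else 0 for xi in x]

# Misclassify each outcome at random, independently of x
y_obs = []
for yi in y_true:
    flip_prob = a1 if yi == 1 else a0
    y_obs.append(1 - yi if rng.random() < flip_prob else yi)

slope_true = ols_slope(x, y_true)
slope_obs = ols_slope(x, y_obs)
slope_implied = (1 - a0 - a1) * beta1   # attenuation implied by the formula above

print(f"LPM slope, correctly classified y : {slope_true:.3f}")
print(f"LPM slope, misclassified y        : {slope_obs:.3f}")
print(f"(1 - a0 - a1) * beta1             : {slope_implied:.3f}")

# Contrast: additive, mean-zero measurement error in a continuous y
y_lin = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
y_lin_noisy = [yi + rng.gauss(0.0, 1.0) for yi in y_lin]
slope_lin = ols_slope(x, y_lin)
slope_lin_noisy = ols_slope(x, y_lin_noisy)
print(f"Linear model slope, clean y       : {slope_lin:.3f}")
print(f"Linear model slope, noisy y       : {slope_lin_noisy:.3f}")
```

The point of the comparison is that the misclassified slope settles near the attenuated value rather than near beta1, whereas the noisy linear regression still centres on its true slope.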


Meyer, B. D. and N. Mittag, 2017. Misclassification in binary choice models. Journal of Econometrics, 200, 295-311.

© 2017, David E. Giles

Wednesday, September 20, 2017

Monte Carlo Simulations & the "SimDesign" Package in R

Past posts on this blog have included several relating to Monte Carlo simulation - e.g., see here, here, and here.

Recently I came across a great article by Matthew Sigal and Philip Chalmers in the Journal of Statistics Education. It's titled, "Play it Again: Teaching Statistics With Monte Carlo Simulation", and the full reference appears below.

The authors provide a really nice introduction to basic Monte Carlo simulation, using R. In particular, they contrast using a "for loop" approach, with using the "SimDesign" R package (Chalmers, 2017). 

Here's the abstract of their paper:
"Monte Carlo simulations (MCSs) provide important information about statistical phenomena that would be impossible to assess otherwise. This article introduces MCS methods and their applications to research and statistical pedagogy using a novel software package for the R Project for Statistical Computing constructed to lessen the often steep learning curve when organizing simulation code. A primary goal of this article is to demonstrate how well-suited MCS designs are to classroom demonstrations, and how they provide a hands-on method for students to become acquainted with complex statistical concepts. In this article, essential programming aspects for writing MCS code in R are overviewed, multiple applied examples with relevant code are provided, and the benefits of using a generate–analyze–summarize coding structure over the typical “for-loop” strategy are discussed."
The SimDesign package provides an efficient and safe template for setting up pretty much any Monte Carlo experiment that you're likely to want to conduct. It's really impressive, and I'm looking forward to experimenting with it.

The Sigal-Chalmers paper includes helpful examples, with the associated R code and output. It would be superfluous for me to add that here.

Needless to say, the SimDesign package is just as useful for simulations in econometrics as it is for those dealing with straight statistics problems. Try it out for yourself!
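If you'd like a feel for the generate-analyze-summarize idea before opening R, here is a rough sketch of that structure in Python. To be clear, this is not SimDesign code and it doesn't mimic that package's API: the function names, the design (three sample sizes), and the experiment itself (comparing the sample mean and median as estimators of a Normal mean) are all my own assumptions, chosen just to show how the three steps separate.

```python
import random
import statistics


def generate(condition, rng):
    """Draw one sample for a given design condition."""
    return [rng.gauss(condition["mu"], 1.0) for _ in range(condition["n"])]


def analyse(data):
    """Compute the estimators being compared on one sample."""
    return {"mean": statistics.fmean(data), "median": statistics.median(data)}


def summarise(results, condition):
    """Collapse the replications into bias and RMSE for each estimator."""
    summary = {}
    for name in results[0]:
        errors = [r[name] - condition["mu"] for r in results]
        summary[name] = {
            "bias": statistics.fmean(errors),
            "rmse": statistics.fmean([e * e for e in errors]) ** 0.5,
        }
    return summary


def run_simulation(design, replications, seed):
    """Apply generate -> analyse -> summarise to every row of the design."""
    rng = random.Random(seed)
    table = []
    for condition in design:
        results = [analyse(generate(condition, rng)) for _ in range(replications)]
        table.append((condition, summarise(results, condition)))
    return table


# Hypothetical design: one row per sample size, true mean fixed at zero
design = [{"n": n, "mu": 0.0} for n in (10, 50, 200)]
table = run_simulation(design, replications=2000, seed=2017)

for condition, summary in table:
    print(
        f"n = {condition['n']:3d}  "
        f"RMSE(mean) = {summary['mean']['rmse']:.4f}  "
        f"RMSE(median) = {summary['median']['rmse']:.4f}"
    )
```

Compared with one big "for loop", the only thing that changes from one experiment to the next is the body of the three small functions, and that makes mistakes easier to spot.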


Chalmers, R. P., 2017. SimDesign: Structure for Organizing Monte Carlo Simulation Designs, R package version 1.7.

Sigal, M. J. and R. P. Chalmers, 2016. Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24, 136-156.


Sunday, September 10, 2017

Econometrics Reading List for September

A little belatedly, here is my September reading list:
  • Benjamin, D. J. et al., 2017. Redefine statistical significance. Pre-print.
  • Jiang, B., G. Athanasopoulos, R. J. Hyndman, A. Panagiotelis, and F. Vahid, 2017. Macroeconomic forecasting for Australia using a large number of predictors. Working Paper 2/17, Department of Econometrics and Business Statistics, Monash University.
  • Knaeble, B. and S. Dutter, 2017. Reversals of least-square estimates and model-invariant estimation for directions of unique effects. The American Statistician, 71, 97-105.
  • Moiseev, N. A., 2017. Forecasting time series of economic processes by model averaging across data frames of various lengths. Journal of Statistical Computation and Simulation, 87, 3111-3131.
  • Stewart, K. G., 2017. Normalized CES supply systems: Replication of Klump, McAdam and Willman (2007). Journal of Applied Econometrics, in press.
  • Tsai, A. C., M. Liou, M. Simak, and P. E. Cheng, 2017. On hyperbolic transformations to normality. Computational Statistics and Data Analysis, 115, 250-266.


Monday, July 31, 2017

My August Reading List

Here are some suggestions for you:
  • Calzolari, G., 2017. Econometrics exams and round numbers: Use or misuse of indirect estimation methods? Communications in Statistics - Simulation and Computation, in press.
  • Chakraborti, S., F. Jardim, & E. Epprecht, 2017. Higher order moments using the survival function: The alternative expectation formula. American Statistician, in press.
  • Clarke, J. A., 2017. Model averaging OLS and 2SLS: An application of the WALS procedure. Econometrics Working Paper EWP1701, Department of Economics, University of Victoria.
  • Hotelling, H., 1940. The teaching of statistics. Annals of Mathematical Statistics, 11, 457-470.
  • Knaeble, B. & S. Dutter, 2017. Reversals of least-square estimates and model-invariant estimation for directions of unique effects. American Statistician, 71, 97-105.
  • Megerdichian, A., 2017. Further results on interpreting coefficients in regressions with a logarithmic dependent variable. Journal of Econometric Methods, in press.


Wednesday, July 12, 2017

The Bandwidth for the KPSS Test

Recently, I received an email from a follower of this blog, who asked:
"May I know what is the difference between the bandwidth of Newey-West and Andrews for the KPSS test. It is because when I test the variable with Newey-West, it is I(2), but then I switch the bandwidth to Andrews, it becomes I(1)."
First of all, it's worth noting that the unit root and stationarity tests that we commonly use can be very sensitive to the way in which they're constructed and applied. An obvious example arises with the choice of the maximum lag length when we're using the Augmented Dickey-Fuller test. Another example would be the treatment of the drift and trend components when using that test. So, the situation that's mentioned in the email above is not unusual, in general terms.
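To see how much the bandwidth can matter, here is a minimal sketch that computes the level-stationarity KPSS statistic by hand for several fixed bandwidths, using a Bartlett-kernel estimate of the long-run variance. The simulated AR(1) series, its coefficient of 0.8, the sample size, and the particular bandwidths are all assumptions made for the illustration. Data-based selection rules, such as those associated with Newey-West and Andrews, will generally pick different bandwidths from one another, and that alone can change the outcome of the test. The value 0.463 is the asymptotic 5% critical value for the level-stationarity case tabulated by Kwiatkowski et al. (1992).

```python
import random


def kpss_level_stat(y, bandwidth):
    """KPSS statistic for level stationarity with a Bartlett-kernel long-run variance."""
    n = len(y)
    mean = sum(y) / n
    e = [v - mean for v in y]   # residuals from a regression on a constant

    # Sum of squared partial sums of the residuals
    partial, sum_sq = 0.0, 0.0
    for v in e:
        partial += v
        sum_sq += partial * partial

    # Long-run variance: gamma_0 + 2 * sum of Bartlett-weighted autocovariances
    lrv = sum(v * v for v in e) / n
    for j in range(1, bandwidth + 1):
        weight = 1.0 - j / (bandwidth + 1.0)
        gamma_j = sum(e[t] * e[t - j] for t in range(j, n)) / n
        lrv += 2.0 * weight * gamma_j

    return sum_sq / (n * n * lrv)


# Assumed data: a stationary but persistent AR(1) series
rng = random.Random(2017)
n = 300
y = [0.0]
for _ in range(n - 1):
    y.append(0.8 * y[-1] + rng.gauss(0.0, 1.0))

CRIT_5PCT = 0.463   # asymptotic 5% critical value, level-stationarity case
stats = {bw: kpss_level_stat(y, bw) for bw in (0, 2, 4, 8, 12)}
for bw, stat in stats.items():
    verdict = "reject" if stat > CRIT_5PCT else "do not reject"
    print(f"bandwidth {bw:2d}: KPSS = {stat:.3f} -> {verdict} stationarity at 5%")
```

With positively autocorrelated data, a larger bandwidth usually raises the long-run variance estimate and so lowers the statistic, which is why two bandwidth rules can lead to different conclusions about the order of integration.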

Now, let's look at the specific question that's been raised here.

Saturday, July 1, 2017

Canada Day Reading List

I was tempted to offer you a list of 150 items, but I thought better of it!

  • Hamilton, J. D., 2017. Why you should never use the Hodrick-Prescott filter. Mimeo., Department of Economics, UC San Diego.
  • Jin, H. and S. Zhang, 2017. Spurious regression between long memory series due to mis-specified structural breaks. Communications in Statistics - Simulation and Computation, in press.
  • Kiviet, J. F., 2016. Testing the impossible: Identifying exclusion restrictions. Discussion Paper 2016/03, Amsterdam School of Economics, University of Amsterdam.
  • Lenz, G. and A. Sahn, 2017. Achieving statistical significance with covariates. BITSS Preprint. (H/T Arthur Charpentier)
  • Sephton, P., 2017. Finite sample critical values of the generalized KPSS test. Computational Economics, 50, 161-172.

Monday, June 26, 2017

Recent Developments in Cointegration

Recently, I posted about a special issue of the journal, Econometrics, devoted to "Unit Roots and Structural Breaks".

Another recent special issue of that journal will be of equal interest to readers of this blog. Katerina Juselius has guest-edited an issue titled, "Recent Developments in Cointegration". The papers published so far in this issue are, of course, open-access. Check them out!


Sunday, June 25, 2017

Instrumental Variables & the Frisch-Waugh-Lovell Theorem

The so-called Frisch-Waugh-Lovell (FWL) Theorem is a standard result that we meet in pretty much any introductory grad. course in econometrics.

The theorem is so-named because (i) in the very first volume of Econometrica Frisch and Waugh (1933) established it in the particular context of "de-trending" time-series data; and (ii) Lovell (1963) demonstrated that the same result establishes the equivalence of "seasonally adjusting" time-series data (in a particular way), and including seasonal dummy variables in an OLS regression model. (Also, see Lovell, 2008.)

We'll take a look at the statement of the FWL Theorem in a moment. First, though, it's important to note that it's purely an algebraic/geometric result. Although it arises in the context of regression analysis, it has no statistical content, per se.

What's not generally recognized, however, is that the FWL Theorem doesn't rely on the geometry of OLS. In fact, it relies on the geometry of the Instrumental Variables (IV) estimator - of which OLS is a special case, of course. (OLS is just IV in the just-identified case, with the regressors being used as their own instruments.)

Implicitly, this was shown in an old paper of mine (Giles, 1984) where I extended Lovell's analysis to the context of IV estimation. However, in that paper I didn't spell out the generality of the FWL-IV result.

Let's take a look at all of this.
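As a starting point, the familiar OLS version of the result is easy to check numerically. The sketch below is my own illustration, with made-up data and hypothetical helper functions (solve, ols, residuals). It covers only the OLS case, not the IV generalization. It regresses y on a constant, x1 and x2, and then compares the coefficient on x1 with the one obtained after first "partialling out" the constant and x2 from both y and x1.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """OLS coefficients from regressing y on a list of regressor columns."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(p * q for p, q in zip(columns[i], y)) for i in range(k)]
    return solve(xtx, xty)


def residuals(columns, y):
    """Residuals from the OLS regression of y on the given columns."""
    b = ols(columns, y)
    return [yi - sum(bj * col[t] for bj, col in zip(b, columns))
            for t, yi in enumerate(y)]


# Made-up data: y depends on a constant, x1 and x2, with x1 and x2 correlated
rng = random.Random(1933)
n = 200
const = [1.0] * n
x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x1 = [0.5 * v + rng.gauss(0.0, 1.0) for v in x2]
y = [1.0 + 2.0 * u - 1.0 * v + rng.gauss(0.0, 1.0) for u, v in zip(x1, x2)]

# (a) Full regression: coefficient on x1
b1_full = ols([const, x1, x2], y)[1]

# (b) FWL: partial the constant and x2 out of y and x1, then regress residual on residual
y_resid = residuals([const, x2], y)
x1_resid = residuals([const, x2], x1)
b1_fwl = ols([x1_resid], y_resid)[0]

print(f"coefficient on x1, full regression : {b1_full:.10f}")
print(f"coefficient on x1, FWL two-step    : {b1_fwl:.10f}")
```

The two numbers agree up to rounding error whatever data you feed in, which is just the point made above: the result is algebraic, not statistical.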

Friday, June 23, 2017

Unit Roots & Structural Breaks

The open-access journal, Econometrics (of which I'm happy to be an Editorial Board member), has recently published a special issue on the topic of "Unit Roots and Structural Breaks". 

This issue is guest-edited by Pierre Perron, and it includes eight really terrific papers. You can find the special issue here.


Wednesday, June 7, 2017

Marc Bellemare on "How to Publish in Academic Journals"

If you don't follow Marc Bellemare's blog, you should do.

And if you read only one other blog post this week, it should be this one from Marc, titled, "How to Publish in Academic Journals". Read his slides that are linked in the post.

Great advice that is totally applicable to anyone doing research in econometrics - theory or applied.
