Thursday, December 5, 2013

Econometrics and "Big Data"

In this age of "big data" there's a whole new language that econometricians need to learn. Its origins are somewhat diverse: the fields of statistics, data mining, machine learning, and that nebulous area called "data science".

What do you know about such things as:
  • Decision trees 
  • Support vector machines
  • Neural nets 
  • Deep learning
  • Classification and regression trees
  • Random forests
  • Penalized regression (e.g., the lasso, LARS, and the elastic net)
  • Boosting
  • Bagging
  • Spike-and-slab regression?

Probably not enough!

If you want some motivation to rectify things, a recent paper by Hal Varian will do the trick. It's titled, "Big Data: New Tricks for Econometrics", and you can download it from here. Hal provides an extremely readable introduction to several of these topics.

He also offers a valuable piece of advice:
"I believe that these methods have a lot to offer and should be more widely known and used by economists. In fact, my standard advice to graduate students these days is 'go to the computer science department and take a class in machine learning'."
Interestingly, my son (a computer science graduate) "audited" my classes on Bayesian econometrics when he was taking machine learning courses. He assured me that this was worthwhile, and I think he meant it! Apparently there's potential for synergies in both directions.


© 2013, David E. Giles