Last modified: 28 March 2017

#### Abstract

Econometric discrete choice analysis constitutes the underlying framework for analyzing demand for a variety of consumer commodities and services. For many decades, the discrete choice model employed was the multinomial logit (MNL) model (McFadden, 1974), which assumes a *single composite* random utility error term that is independently and identically distributed (IID) across alternatives with a Gumbel (or Type I extreme-value) distribution. However, over the past two decades, it has become much more commonplace to acknowledge unobserved heterogeneity in taste sensitivity to variables, as well as to accommodate non-IID kernel error terms across alternatives. A general approach to do so is to use a multivariate normal kernel mixed with an appropriately distributed random coefficients vector, which we label the mixed multinomial probit (or mixed MNP) model.
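As a sketch in generic notation (the symbols below are illustrative assumptions and are not taken from the paper), the utility that individual $q$ derives from alternative $i$ in such a mixed model can be written as:

$$
U_{qi} = \boldsymbol{\beta}_q^{\top}\mathbf{x}_{qi} + \varepsilon_{qi}, \qquad
\boldsymbol{\varepsilon}_q = (\varepsilon_{q1}, \dots, \varepsilon_{qI})^{\top} \sim \mathrm{MVN}(\mathbf{0}, \boldsymbol{\Lambda}), \qquad
\boldsymbol{\beta}_q \sim G(\cdot),
$$

where $\mathbf{x}_{qi}$ is a vector of observed attributes, $\boldsymbol{\Lambda}$ is a general (non-IID) kernel covariance matrix, $G$ is the mixing distribution of the random coefficients, and the individual chooses the alternative with the highest utility. The MNL model corresponds to the special case of a fixed $\boldsymbol{\beta}$ and IID Gumbel errors, which gives the familiar closed form:

$$
P_q(i) = \frac{\exp(\boldsymbol{\beta}^{\top}\mathbf{x}_{qi})}{\sum_{j=1}^{I} \exp(\boldsymbol{\beta}^{\top}\mathbf{x}_{qj})}.
$$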

An important consideration in the random multivariate mixing (random coefficients) distribution is to explicitly specify it in a way that is consistent with theoretical notions. In fact, the ability to do so is critical to the observation made by McFadden and Train (2000) that the mixed model (whether with an extreme value kernel or an MNP kernel) is capable of approximating any random utility maximization model. For example, an analyst may want to specify a naturally bounded distribution (such as a log-normal distribution or a Rayleigh distribution) for the negatives of the cost and time coefficients in a travel choice model, so that the coefficients themselves are strictly negative. Indeed, several studies (for example, Hensher *et al*., 2005, and Torres *et al*., 2011) have underscored the potentially serious misspecification consequences (in terms of theoretical considerations, data fit, and trade-off evaluations) of using an unbounded distribution (specifically the normal distribution). A further issue with using an unbounded distribution that straddles the zero value for the cost coefficient is that it leads to a breakdown of the willingness to pay (WTP) calculations (see Daly *et al*., 2011).
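The WTP breakdown can be seen from a short argument, given here in generic notation that is assumed for illustration rather than taken from the paper. WTP for an attribute with coefficient $\beta_k$ is the ratio $\beta_k / \beta_c$, where $\beta_c$ is the cost coefficient. If the density $f$ of $\beta_c$ is continuous and strictly positive at zero (as it is for any normal distribution), then for a small enough $\delta > 0$ and $m = \min_{|b| \le \delta} f(b) > 0$:

$$
\mathrm{E}\!\left[\frac{1}{|\beta_c|}\right]
= \int_{-\infty}^{\infty} \frac{f(b)}{|b|}\, db
\;\ge\; m \int_{-\delta}^{\delta} \frac{1}{|b|}\, db
= \infty .
$$

The reciprocal of the cost coefficient therefore has no finite mean, and moments of the WTP ratio generally fail to exist. A distribution confined to the half-line, with density that vanishes quickly enough near zero, avoids this problem.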

In this paper, we propose a mixed multinomial probit model that accommodates a general covariance structure for the kernel error terms as well as a very flexible continuous parametric multivariate structure for unobserved individual heterogeneity. The latter is introduced using a Gaussian copula approach that ties different continuous univariate mixing distributions into a joint multivariate distribution. The individual univariate mixing distributions can be bounded or unbounded, allowing the incorporation of theoretical considerations that require specific coefficients to span only the half-line. In addition, our proposed approach includes the case of independence across specific coefficients, allows a flexible and wide range of dependence across coefficients, and is easy to work with. The model is estimated using a combination of the maximum simulated likelihood (MSL) technique (to accommodate the non-normal random coefficients) and Bhat’s MACML inference approach (to accommodate all the normal random coefficients as well as the kernel normal error structure; see Bhat, 2011). To our knowledge, this is the first time that a copula-based mixed MNP model has been proposed in the literature, along with an associated hybrid MSL-MACML inference approach. The hybrid approach is ideally suited for the case when there are few non-normally distributed coefficients (so that the MSL simulation does not involve very high dimensions) and many normally distributed coefficients (so that the MACML computational accuracy and efficiency can be realized). For the non-normal coefficients, the use of univariate distributions that have a closed-form inverse cumulative distribution function facilitates quick estimation. Of these, we would particularly like to highlight our consideration of the power log-normal distribution, which has not been considered earlier in discrete choice models. The advantage of this distribution relative to other distributions on the half-line (including the log-normal) is that it can both allow for substantial heterogeneity (large variance parameter) and also ensure that the skewed tail is relatively thin, which helps convergence.
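The Gaussian copula construction and the role of a closed-form inverse can be illustrated with a minimal Python sketch. It assumes the standard-form power log-normal CDF, $F(x) = 1 - \Phi(-\ln x / \sigma)^{p}$ for $x > 0$, which may differ from the parameterization used in the paper. The function names, the two-coefficient setup, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf() and inv_cdf()


def power_lognormal_cdf(x, p, sigma):
    """Assumed standard-form CDF: 1 - Phi(-ln(x) / sigma) ** p, for x > 0."""
    return 1.0 - STD_NORMAL.cdf(-math.log(x) / sigma) ** p


def power_lognormal_inv_cdf(u, p, sigma):
    """Closed-form inverse of the CDF above, for 0 < u < 1."""
    return math.exp(-sigma * STD_NORMAL.inv_cdf((1.0 - u) ** (1.0 / p)))


def draw_coefficients(rho, rng, p=2.0, sigma=1.0, mu_time=-0.5, sd_time=0.3):
    """Draw one (cost, time) coefficient pair tied together by a Gaussian copula.

    All default parameter values are illustrative assumptions.
    """
    # Step 1: correlated standard normals (2x2 Cholesky factor written out by hand).
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z_cost = e1
    z_time = rho * e1 + math.sqrt(1.0 - rho ** 2) * e2
    # Step 2: map the cost normal to a uniform, clipped away from 0 and 1.
    u_cost = min(max(STD_NORMAL.cdf(z_cost), 1e-12), 1.0 - 1e-12)
    # Step 3: invert the target margin; the minus sign keeps the cost coefficient below zero.
    cost = -power_lognormal_inv_cdf(u_cost, p, sigma)
    # A normal margin needs no inversion: scale and shift the copula normal directly.
    time = mu_time + sd_time * z_time
    return cost, time


rng = random.Random(2017)
draws = [draw_coefficients(rho=0.5, rng=rng) for _ in range(1000)]
```

Because the inverse CDF is available in closed form, each simulation draw costs only one normal CDF evaluation and one normal quantile evaluation. Note that the sign flip on the cost margin reverses the direction of dependence relative to `rho`, since the cost coefficient is decreasing in its underlying normal variate.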

We demonstrate the effectiveness of our inference approach through simulation exercises as well as an empirical application. The simulations involve cross-sectional choice data with a sample size of 3000, and two configurations of three random coefficients. The first includes two power log-normal coefficients and one normal coefficient, while the second considers one each of power log-normal, exponential, and normal coefficients. Overall, the simulation results indicate that the proposed method allows for accurate parameter recovery. Further, the asymptotic standard errors from the method closely reflect the finite sample standard deviations. One finding, however, is that the copula parameters characterizing the dependence between pairs of univariate margins are more difficult to recover, especially between pairs of non-normal univariate margins. Also, the simulation results suggest that distributions with very long tails (such as the exponential and log-normal) make it particularly difficult to recover variance parameters and the corresponding copula parameters of dependence with other margins. However, even in these cases, the method performs quite well.

In the empirical application, we focus on repeated choice commute travel mode stated preference data collected in Austin, Texas. The results reiterate the importance of the power log-normal distribution as a strong contender (and alternative) to the traditional log-normal distribution and other bounded distributions for the travel cost coefficient. Additionally, data fit statistics indicate that a normal distribution for the travel time coefficient, which allows the possibility of positive utilities of travel time, does better than bounded specifications for this coefficient, at least in the current empirical context with short-term daily travel mode choice data.

*Keywords*: copula, heterogeneity, MACML, multinomial probit, choice modeling.

**References**

Bhat, C.R. (2011). The maximum approximate composite marginal likelihood (MACML) estimation of multinomial probit-based unordered response choice models. *Transportation Research Part B*, 45(7), 923-939.

Daly, A., Hess, S., Train, K. (2011). Assuring finite moments for willingness to pay in random coefficient models. *Transportation*, 39(1), 19-31.

Hensher, D.A., Rose, J.M., Greene, W.H. (2005). *Applied Choice Analysis: A Primer*. Cambridge University Press, Cambridge, U.K.

McFadden, D. (1974). The measurement of urban travel demand. *Journal of Public Economics*, 3(4), 303-328.

McFadden, D., Train, K. (2000). Mixed MNL models for discrete response. *Journal of Applied Econometrics*, 15(5), 447-470.

Torres, C., Hanley, N., Riera, A. (2011). How wrong can you be? Implications of incorrect utility function specification for welfare measurement in choice experiments. *Journal of Environmental Economics and Management*, 62(1), 111-121.