ESSAYS ON INCOME INEQUALITY AND PUBLIC POLICY

By

Christopher Fowler

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Economics – Doctor of Philosophy

2020

ABSTRACT

Chapter one determines the properties of the optimal tax function when there is rent-seeking in the labor market. Rent-seeking in the labor market refers to unproductive effort expended in order to increase compensation. When high skill workers expend rent-seeking effort, low skill workers face reduced wages because firms, in a competitive market, face a zero-profit condition. Firms are able to respond to rent-seeking by increasing the number of high skill workers hired, which reduces their productivity and wages. The government's optimal tax function increases marginal and average taxes on high skill workers, while low skill workers face lower marginal and average tax rates. The government, therefore, wishes to redistribute income primarily through post-tax income rather than through manipulating the distribution of pre-tax income.

Chapter two looks at the effect of both intensive and extensive margin labor supply on the optimal tax function. The model combines a static search labor market model with a classical labor supply model. Because the two models are combined, the optimal tax function balances incentives for working more hours against incentives for searching for work. The tax function provides insight into how the government should balance redistribution and efficiency when workers can potentially be unemployed for long periods of time. The resulting tax function has higher marginal tax rates than the Mirrlees (1971) model. This increase is due to the government's ability to decrease the wages of workers, which increases the general equilibrium probability of employment for workers.
Chapter three investigates the effect of uneven internal migration by skill on income inequality in local labor markets. Migrants moving within the US are more educated than workers who stay in their local labor market. We would therefore expect income inequality to decrease in locations that experience more migration. However, we do not see this effect. This chapter investigates the phenomenon using data from the American Community Survey (ACS). The ACS is a yearly representative sample of the US population that records information on income, education, and migration patterns. To causally estimate the effect of differing rates of migration by skill, a shift-share instrument is constructed. This instrument creates a predicted amount of migration based on historical migration patterns. The instrument seems to work well and does not appear to be correlated with labor demand shocks. The main result is that income inequality increases when more college educated workers move than non-college educated workers.

ACKNOWLEDGEMENTS

A number of people deserve acknowledgement for helping me complete this dissertation. First, I would like to express my sincere gratitude to my main advisor, Jay Wilson, for his guidance over the past 4 years. He provided me direction when I was stuck on a problem and he helped me understand some of the not-so-obvious ideas in optimal tax theory. Oren Ziv provided excellent guidance and support for my first chapter. Without his help, I would not have gone on the job market this past fall. Carl Davidson helped to greatly improve my second chapter with his knowledge of search labor markets. He also helped me write my tax theory chapters for a broader audience than just other tax theorists. Finally, Matt Grossmann helped give me the perspective of a non-economist reading my work. I would also like to thank Steven Haider and Todd Elder for their help in navigating the doctoral program.
Ron Fisher helped me immensely with teaching my class in the HRLR department and with completing the North Dakota project together. Being Ron's teaching assistant was always a pleasure as well. Mike Conlin helped me improve my job market paper and navigate the job market itself. I also thank all the economics department staff (Lori Jean Nichols, Margaret Lynch, Belen Feight, and Jay Feight) who make the department run.

I have been lucky to have some great friends who supported me for the last 6 years. The Berkey Basement crew of Riley Acton, Hannah Gabrielle, Luke Watson, Gabrielle Pepin, Bethany Lemont, and Cody Orr helped keep me on track. My writing group of Anicca Cox, Alyssa Harben, and Kathy Kim provided great writing advice and helped me become a better writer. And big thanks to my Macalester College crew, who always help each other out even if we are three continents apart.

Finally, I want to thank my family. Deepest thanks to my parents for always talking with me and helping me become an adult. I could not have gotten through the last six years without my partner, Leah Billingham. You helped me get through the tough spots and you were always willing to listen to me. Finally, I have to thank my cat, Minerva.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1 OPTIMAL TAXATION AND RENT-SEEKING
  1.1 Introduction
  1.2 Literature Review
  1.3 Rent-Seeking Model
  1.4 Nonlinear Taxation
    1.4.1 Numerical Results
  1.5 Conclusion
CHAPTER 2 OPTIMAL TAXATION AND SEARCH WITH INTENSIVE MARGIN LABOR SUPPLY
  2.1 Introduction
  2.2 Past Literature
  2.3 Model
    2.3.1 Individual's Problem
    2.3.2 Firm's Problem
    2.3.3 Matching and Bargaining
  2.4 Elasticity Rule for the Top Marginal Rate
  2.5 Three Type Second Best Equilibrium
    2.5.1 Discrete Mirrlees Model
    2.5.2 Top Two Types Search
    2.5.3 A Numerical Example
  2.6 Constrained First Best Equilibrium
  2.7 Decentralizing the Continuous Direct Mechanism
    2.7.1 Continuous Type Mirrlees Model
    2.7.2 Incentive Compatibility
    2.7.3 Solving the Mechanism
    2.7.4 Implementing the Mechanism
    2.7.5 Analytical Properties of the Tax Function
  2.8 Conclusion
CHAPTER 3 BRAIN DRAIN IN THE US: THE EFFECT OF UNEVEN MIGRATION ON INCOME INEQUALITY
  3.1 Introduction
  3.2 Theoretical Model
  3.3 Model Specification
  3.4 Data
  3.5 Results
    3.5.1 The Shift-Share Instrument
    3.5.2 First Stage
    3.5.3 Changes to Local Income Inequality
    3.5.4 Assessing the Plausibility of the Supply Shift
    3.5.5 Extending the Theoretical Model to Incorporate the Results
    3.5.6 Robustness Checks
  3.6 Conclusion
APPENDICES
  APPENDIX A APPENDIX TO CHAPTER 1
  APPENDIX B APPENDIX TO CHAPTER 3
BIBLIOGRAPHY

LIST OF TABLES

Table 1.1: Parameter Values and Function Forms
Table 1.2: Numerically Calculated Tax Rates
Table 2.1: Top marginal tax rates that vary with the elasticities. The income share of output is $z = y_\theta / (\theta h_\theta)$ and is assumed to be constant over $\theta$ above the threshold. The table assumes that $a = 2.0$ and $g = 0$. Both of these assumptions are well within the standard range of acceptable values for both parameters.
Table 2.2: Three type model parameter values.
Table 2.3: Equilibrium tax and transfer system.
Table 3.1: Reduced Form Estimates of the Change in the Skill Premium
Table 3.2: Exogeneity Test
Table 3.3: Base-Year Test
Table 3.4: First Stage Estimates
Table 3.5: Change in Skill Premium
Table 3.6: Change in High Skill Income
Table 3.7: Change in Low Skill Income
Table 3.8: Migration Bias and Incomes by Occupation
Table 3.9: Change in Income by Percentile
Table 3.10: Change in Skill Premium with Different Skill Definitions
Table B.1: Summary statistics
Table B.2: Movers versus Stayers, summary statistics
Table B.3: Change in High Skill Unemployment
Table B.4: Change in Low Skill Unemployment
Table B.5: Relative Employment and Relative Migration Responses
Table B.6: Migration Responses by Skill

LIST OF FIGURES

Figure 2.1: Timing of decisions by government, firms, and individuals.
Figure 3.1: Kernel density of age for migrants and stayers.
Figure A.1: The black areas represent parameter values for which an equilibrium exists.
Figure B.1: PUMAs, MIGPUMAs, and CZs. MIGPUMA 1 is PUMA 1, 2, 3, and 4 combined.
Figure B.2: National migration rates by state and CZ, broken down by year. The graph is not broken down by education group.
Figure B.3: Spatial distribution of the skill premium across CZs.

CHAPTER 1

OPTIMAL TAXATION AND RENT-SEEKING

1.1 Introduction

Income inequality has risen dramatically in the past three decades in most developed countries. The share of income accruing to the bottom 50% of US households declined by 8% from 1980 to 2014 (Piketty et al., 2016). At the same time, the top one percent of earners has seen a roughly equal rise in its share of national income. The rise in top incomes has not been met with an increase in GDP growth (Piketty et al., 2014). Recent data reconciling tax returns and national accounts data suggest that little of the increase in pre-tax income inequality is offset by government policy.

The original purpose of labor income taxation, going back to Mirrlees (1971), is to redistribute from high income individuals to low income individuals in the most efficient way possible. The extent of redistribution is limited by the disincentive effects of imposing the tax. Many optimal tax models use a labor market where wages are fixed, exogenous, and equal to marginal products. However, in richer models of the labor market, wages can be endogenous and change with the behavior of workers (or firms). Workers, particularly high skill ones, have bargaining power over their wages.

One way the bargaining power of high skill workers can manifest is through rent-seeking in the labor market. Rent-seeking in the labor market occurs when individuals exert unproductive effort to increase the private return to labor. Increasing the private return to labor for one type of worker harms other types of workers. To the workers hurt by rent-seeking, the effect is equivalent to a negative externality.
Workers helped by rent-seeking, however, know their actions will increase their return to labor; hence, rent-seeking is unlike a traditional externality. Since rent-seeking is similar to an externality, the income tax has a corrective role as well as a redistributive role. The income tax balances three different goals: efficiency of those who are helped by rent-seeking, efficiency of those who are hurt by rent-seeking, and redistribution from high income to low income. The government cannot perfectly correct the rent-seeking because it must trade off these goals against one another, and any correction harms the workers who engage in rent-seeking. By increasing the marginal tax on high types, the government redistributes income from high to low types and corrects the externality, but it hurts the high types by decreasing labor supply and wages.

Rent-seeking can have two manifestations, depending on who is hurt by the rent-seeking and their relation to those doing the rent-seeking. Vertical rent-seeking occurs when workers with higher (lower) attributes, such as productivity or bargaining power, impose a negative externality on workers with lower (higher) attributes (Piketty et al., 2014; Lawson, 2015). Horizontal rent-seeking occurs when workers of similar attributes impose a negative externality on each other (Rothschild and Scheuer, 2016). With horizontal rent-seeking, individuals who appear similar to the government can make very different responses to changes in the tax function. The different responses come about because individuals differ in unobservable decisions, such as the type of labor effort.

Workers who engage in rent-seeking are able to increase their incomes by expending effort to increase their return to labor. In the model laid out in this chapter, high skill workers choose how to divide their effort between productive labor and rent-seeking effort. The firm's zero profit condition determines the wage of the low skill worker, who takes this wage as given.
High skill workers' rent-seeking hurts low skill workers through lower wages; otherwise, all firms would exit the market. Firms can counteract high skill workers' rent-seeking by hiring more high skill workers, which decreases the marginal product of high skill workers and reduces rent-seeking effort. Firms are constrained by being utility takers in the market; they must give high skill workers the equilibrium utility.

Rent-seeking causes the government to tax high skill workers more than in the Stiglitz (1982) endogenous wage economy.[1] The government increases the marginal and average tax rates on the high types in order to reduce the amount of rent-seeking effort. With higher marginal tax rates, the government is not using the tax system to reduce pre-tax income inequality. Rather, the government redistributes post-tax income to low skill workers.

[1] The Stiglitz (1982) model is similar to the Mirrlees (1971) model except that wages are endogenous.

1.2 Literature Review

The literature on taxation and rent-seeking implements rent-seeking as wasteful effort that increases the return to labor; hence, wages are endogenous and not all workers are price-takers. Stiglitz (1982) begins the literature on taxation with endogenous wages.[2] He calculates the optimal tax in a model with two types of workers and endogenous wages based on a constant returns to scale production function. The ratio of low to high skilled workers' labor supply determines the wages of each worker type. The optimal tax function depends on the degree of substitutability of high skill to low skill labor. If high and low skilled labor are perfect substitutes, then the optimal marginal income tax rate is zero for the high skilled worker. But if the two types of labor are imperfectly substitutable, then the optimal top marginal rate is less than zero. The intuition is that the government uses marginal tax rates to change the pre-tax incomes of both types. With imperfect substitutability, the government can increase high skill labor supply with a negative marginal tax rate. Higher high skill labor supply decreases high skill wages and increases low skill wages, which creates redistribution through the pre-tax income distribution.

[2] Allen (1982) characterizes the optimal linear tax function.

A simple way to implement the rent-seeking externality is to use a pollution externality where rent-seeking effort creates a reduction in everyone else's return to labor. Rothschild and Scheuer (2016) use a two sector Roy model where one sector has workers engaging in rent-seeking. Their conception of rent-seeking is akin to a pollution externality (Piketty et al., 2014). Workers in the rent-seeking sector are productive, but their labor also causes the return to labor in both sectors to decrease, similar in spirit to rent-seeking. The Pigouvian correction is complicated because of opportunities for switching sectors. Taxing labor effort reduces labor supplied to both sectors but by different amounts. The amount varies depending on the change in relative returns to labor between the two sectors. If the decrease in the return to effort in the traditional sector is greater than the decrease in the return to effort in the rent-seeking sector, then there will be a shift in effort towards the rent-seeking sector. In response, marginal tax rates should decrease in order to reduce the relative amount of labor supplied to the rent-seeking sector. Without being able to observe the worker's sector choice, the government cannot perfectly correct the externality and can potentially create more rent-seeking with increases in the marginal tax rate, since the return to the traditional sector is more elastic with respect to the marginal tax rate.
The only cases in which the optimal correction with perfect targeting is the same as the optimal correction without perfect targeting occur when either the relative returns between sectors are fixed or the effort ratio is fixed (the elasticity of substitution between sectors is zero).

The pollution externality can be put into a continuous screening model to derive the optimal nonlinear tax results. The imperfect targeting correction, where sector is unobservable to the government, differs from the perfect targeting correction, where sector is observable, whenever relative returns between the two sectors are not fixed. The imperfect correction follows the two-step method outlined in Kopczuk (2003). The Pigouvian correction is applied and then the optimal nonlinear tax rule is applied to the corrected income. When a majority of the high skill workers are in the rent-seeking sector, the marginal tax rate can be decreasing in skill at the top end of the distribution (Rothschild and Scheuer, 2016).

Rothschild and Scheuer's model is a combination of both vertical and horizontal rent-seeking where most of the problems come from the horizontal aspects. Workers with the same total income might have very different skills in the two sectors. An increase in the marginal tax rate could cause one of the workers to switch sector, but the government does not know which worker will switch and which sector the worker started in. There is some degree of vertical rent-seeking in the Rothschild and Scheuer model, but vertical rent-seeking is not the main concern in the model.

Most of the unproductive labor efforts in the economy are likely from high income earners (Rothschild and Scheuer, 2016; Piketty et al., 2014; Boadway, 2015). Therefore, we should look at the ways in which high income earners can increase their wages above marginal product. Piketty et al. (2014) focus on the effect of bargaining by high income individuals.
There are a few ways that wages can differ from marginal products in Piketty et al. (2014). First, search and matching can be inefficient in the sense that the Hosios (1990) condition is not met. Second, for many high income occupations, marginal products are unknown or are only known to be in an interval. High skill workers have wider latitude to negotiate higher wages because the firm is unsure of the output that a high skill worker produces.[3] Their model uses a reduced form model of bargaining where high skilled workers are able to exert effort in order to increase their return to labor. The bargaining model can rationalize both reasons for compensation above output. The government can observe the aggregate amount of income that is generated by rent-seeking but cannot observe individual rent-seeking. Since the extra income earned by rent-seeking is observable by the government, the optimal tax function is able to perfectly remove this income and redistribute it to the rest of the workers by increasing the demogrant.

[3] Piketty et al. (2014) note that few companies would be willing to go without a CFO for an extended period of time in order to figure out the marginal product of a CFO. The kind of experimentation needed to determine marginal product is not present when the occupation is unique within the firm.

Lawson (2015) looks at how managers can hire more workers than optimal in order to decrease the wages of workers while increasing the wages of the managers. Taxes on managers reduce their labor supply and increase the wage of the workers. The extra employment has a rent-seeking effect. The computed marginal tax rate in Lawson's model contains a kink at the point where workers become managers. It is not clear how such a distinction should be handled in a model of continuous types.

In Piketty et al. (2014) and Rothschild and Scheuer (2016), wages differ from marginal product for at least one type of worker. How the firm handles this difference between wages and productivities is left unresolved in all three papers. Lawson (2015) has wages equaling marginal product, but managers are able to decrease worker marginal product and raise their own marginal product by over-hiring.

My paper contributes to the literature by taking into account how firms handle the increase in wages of some types at the expense of other types. By making sure that the firm's zero profit condition is maintained, I am able to model how bargaining by high skilled workers affects low skilled workers. The zero profit condition constrains the amount of wage dispersion in the economy and forces workers that are adversely affected by rent-seeking to change their behavior. The firm must account for the change in behavior by both types. With the firm maintaining zero profit, the government is given another avenue to redistribute income, because increasing taxes on those who rent-seek causes within-firm redistribution of income towards workers who are hurt by rent-seeking. These effects are missing from the rent-seeking literature.

1.3 Rent-Seeking Model

The economy contains two types of workers, high and low skill. Both types of workers choose intensive margin labor supply and both types always work some positive amount of hours. High skill workers, denoted by the number 2, are able to determine their own wage through rent-seeking. Since type 2 workers are able to choose their own wage, type 1 workers' wage will differ from their marginal product. Let $w_1$ and $w_2$ denote the marginal productivities of the two types of worker, and let $\tilde{w}_1$ and $\tilde{w}_2$ denote the actual wages paid by the firm to each type of worker. High skill workers are able to control their own wage while low skill workers are wage takers. The main determinant of the wages is the ratio of high skill to low skill labor, henceforth called the labor ratio.
Firms are utility takers in the high skill labor market, which means that no single firm can increase the labor ratio above the equilibrium value in order to increase profits. If a single firm tries to increase the labor ratio, then all of its workers will go to other firms. The zero profit condition determines the wage of the low skill worker given the labor ratio and the high skill wage.

Key to the equilibrium is that firms are constrained by the equilibrium labor ratio, and this constraint is at least just binding. Firms want to increase high-skilled labor demand because the high-skilled wage will decrease by more than the decrease in high-skilled marginal product, which increases profit. A single firm cannot violate the equilibrium labor ratio constraint because it must give high skill workers at least as much utility as the other firms are giving.

Low skilled workers have the standard labor-leisure problem. The function $v$, which measures the disutility of labor supply, is strictly increasing and strictly convex. Each low skill worker solves

$$ \max_{l_1} \; \tilde{w}_1 l_1 - T(\tilde{w}_1 l_1) - v(l_1) \tag{1.1} $$

The low skill worker's first-order condition is

$$ \tilde{w}_1 \left(1 - T'(\tilde{w}_1 l_1)\right) = v'(l_1) \tag{1.2} $$

Due to rent-seeking by the high types, $\tilde{w}_1$ is less than the marginal product $w_1$. Since $\tilde{w}_1$ is less than $w_1$, the low skilled worker will supply less labor in equilibrium.

High skilled workers have a different utility maximization problem. Each high skilled worker is able to increase their return to labor by the percentage $d$; this is the rent-seeking effort. High skilled workers can choose both labor supply and rent-seeking effort. Their utility maximization problem is

$$ \max_{l_2, d} \; w_2 l_2 (1 + d) - T\left(w_2 l_2 (1 + d)\right) - v(l_2 + \alpha d) \tag{1.3} $$

where $\alpha$ measures the relative cost of rent-seeking effort compared to labor. The underlying productivity is $w_2$. The amount of rent-seeking is directly connected to the difference between the productivity and the wage paid to workers:
$$ d = \frac{\tilde{w}_2}{w_2} - 1 \tag{1.4} $$

Then the high skilled worker's true utility maximization problem is

$$ \max_{l_2, \tilde{w}_2} \; \tilde{w}_2 l_2 - T(\tilde{w}_2 l_2) - v\left(l_2 + \alpha\left(\frac{\tilde{w}_2}{w_2} - 1\right)\right) \tag{1.5} $$

Utility maximization by high skilled workers implies the following relation:

$$ \frac{\tilde{w}_2}{l_2} = \frac{w_2}{\alpha} \tag{1.6} $$

Therefore, wage and labor supply, $\tilde{w}_2$ and $l_2$, are both functions of the marginal product of the high type. Since marginal products are determined by the ratio of high-skilled to low-skilled labor, wage and labor supply are also functions of the labor ratio. When $\alpha$ is high, the amount of rent-seeking is low compared to labor supply. Also, a higher marginal product increases the amount of rent-seeking because the relative value of rent-seeking is higher.

The firm chooses labor demand for each type of worker. The production function exhibits constant returns to scale but diminishing returns to each effective labor type:

$$ G\left(\theta_1 l_1 f(\theta_1), \; \theta_2 l_2 f(\theta_2)\right) \tag{1.7} $$

The firm takes the number of firms and the population of each type as fixed, so choosing labor demand for that firm is equivalent to choosing total labor demand. Since the production function exhibits constant returns to scale, the representative firm chooses the ratio of effective labor supply, $L = \frac{\theta_2 l_2 f(\theta_2)}{\theta_1 l_1 f(\theta_1)}$. Therefore, the firm solves the problem

$$ \max_{L} \; g(L) - \tilde{w}_1 - \tilde{w}_2 L \tag{1.8} $$

$$ \text{s.t.} \quad L \le L^* \tag{1.9} $$

where $g(L) = G\left(1, \frac{\theta_2 L_2}{\theta_1 L_1}\right)$. The firm is constrained by the equilibrium labor ratio, denoted by $L^*$. No single firm can raise $L$ above the equilibrium $L$ without losing all of its high-skilled workers. Each firm is a high-type utility-taker; the equilibrium labor ratio constraint is equivalent to a constraint on high skill utility. No firm can raise $L$ above the equilibrium amount and still offer the high types the same utility as all other firms.

High skilled workers are able to choose their wage, but each firm knows how high skilled workers will react to changes in the productivity of high skilled workers.
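As a numerical illustration of relation (1.6), the sketch below solves the high skilled worker's problem (1.5) under assumptions that are not part of the chapter: quadratic disutility $v(x) = x^2/2$, no taxes ($T = 0$), and the illustrative values $w_2 = 1.5$ and $\alpha = 1$. The function names are hypothetical. Under these assumptions the objective is strictly concave only when $w_2 < 2\alpha$; the closed form comes from solving the two first-order conditions, and a coarse grid search is included as an independent check.

```python
# Sketch under ASSUMED functional forms (not from the chapter):
# v(x) = x**2 / 2, T = 0, and illustrative values w2 = 1.5, alpha = 1.0.


def utility(l2, w_tilde, w2, alpha):
    """Objective in (1.5) with T = 0 and v(x) = x**2 / 2."""
    effort = l2 + alpha * (w_tilde / w2 - 1.0)  # labor plus rent-seeking effort
    return w_tilde * l2 - 0.5 * effort ** 2


def solve_high_type(w2, alpha):
    """Closed-form interior optimum from the two first-order conditions.

    FOC in l2:       w_tilde = v'(effort) = effort
    FOC in w_tilde:  l2 = (alpha / w2) * effort
    The quadratic objective is strictly concave only if 0 < w2 < 2 * alpha.
    """
    if not 0.0 < w2 < 2.0 * alpha:
        raise ValueError("no interior optimum: need 0 < w2 < 2 * alpha")
    effort = alpha * w2 / (2.0 * alpha - w2)
    w_tilde = effort
    l2 = alpha * effort / w2
    return l2, w_tilde


def grid_argmax(w2, alpha, step=0.05, l_max=4.0, w_max=6.0):
    """Brute-force maximizer of the same objective on a coarse grid."""
    best_u, best_l2, best_w = float("-inf"), None, None
    for i in range(int(round(l_max / step)) + 1):
        for j in range(int(round(w_max / step)) + 1):
            l2, w_tilde = i * step, j * step
            u = utility(l2, w_tilde, w2, alpha)
            if u > best_u:
                best_u, best_l2, best_w = u, l2, w_tilde
    return best_l2, best_w


if __name__ == "__main__":
    w2, alpha = 1.5, 1.0
    l2, w_tilde = solve_high_type(w2, alpha)
    d = w_tilde / w2 - 1.0  # rent-seeking effort from (1.4)
    print("closed form:", l2, w_tilde, d)
    print("wage-to-labor ratio:", w_tilde / l2, "vs w2/alpha:", w2 / alpha)
    print("grid search:", grid_argmax(w2, alpha))
```

With these illustrative values the closed form gives $l_2 = 2$ and $\tilde{w}_2 = 3$, so $d = 1$ and the ratio $\tilde{w}_2 / l_2$ equals $w_2 / \alpha = 1.5$, as (1.6) requires. Because the factor $1 - T'$ multiplies both first-order conditions of (1.5), the ratio itself does not depend on the assumed tax function.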
The firm's first-order condition is

$$ g'(L) = \tilde{w}_2 + L \frac{\partial \tilde{w}_2}{\partial L} - \beta \tag{1.10} $$

where $\beta$ is the multiplier on the firm's equilibrium labor ratio constraint. The second term on the right-hand side of (1.10) is the change in the wage as a result of high skilled workers re-optimizing because the wage rate has changed. Firms know that increasing $L$ will decrease the high skill wage more than the marginal product and hence increase profits. Without the constraint, the firm over-hires high-skill workers in order to decrease their wage.

Condition (1.10) also implies that an equilibrium may fail to exist. For some parameter values, the firm might find that it always wants to decrease $L$ from the equilibrium $L^*$. Hence, there is no value of $L^*$ that causes an equilibrium. Section ?? derives the condition for which there is no equilibrium.

The productivity of each high-skilled worker depends on the equilibrium labor of each type of worker. Individual labor supply of the high type, $l_2$, is a decreasing function of the aggregate labor ratio $L$. The function is decreasing because increases in $L$ decrease the marginal product, $w_2$, which causes individual labor supply to decrease. If each firm increases the demand for high-skilled labor, then the chosen wage will decrease:

$$ \frac{\partial \tilde{w}_2}{\partial L} = \frac{1}{\alpha}\left(g''(L)\, l_2(L) + g'(L) \frac{\partial l_2}{\partial L}\right) = \frac{g''(L)}{\alpha}\left(l_2 + g'(L)\,\theta_2\right) < 0 \tag{1.11} $$

The change in individual labor supply in response to a change in aggregate labor supply is $\frac{\partial l_2}{\partial L} = \frac{\partial l_2}{\partial w_2} \frac{\partial w_2}{\partial L}$. Using the high type's maximization problem, $\frac{\partial l_2}{\partial w_2} = \theta_2$.[4] And the change in productivity $w_2$ in response to a change in $L$ is the second-order derivative of production.

[4] See the appendix for the derivation.

Firms know that increasing the labor demand for high skilled workers will decrease the productivity of high-skilled workers. With rent-seeking, a decrease in the productivity of high-skilled workers will increase the gap between the wage that high-skilled workers are paid and the productivity of those workers. The gap increases because $l_2$ has gone up, which makes $L$ increase and also makes rent-seeking more valuable. A decrease in $l_1$, however, makes $L$ go up but does not make rent-seeking more valuable because $l_2$ has remained constant. The change in the wage with respect to a change in $L$ is always larger than the change in marginal product. That is,

$$ \frac{\partial \tilde{w}_2}{\partial L} - \frac{\partial w_2}{\partial L} = \frac{g''(L)}{\alpha}\left(l_2 + w_2 \theta_2 - \alpha\right) = \frac{g''(L)}{\alpha}\left(\alpha\left(\frac{\tilde{w}_2}{w_2} - 1\right) + w_2 \theta_2\right) \tag{1.12} $$

which is always less than zero because $g$ is concave.

Let $f(\theta_i)$ be the proportion of the population of type $i$. The equilibrium conditions are:

$$ l_1 f(\theta_1) = L_1 \quad \text{and} \quad l_2 f(\theta_2) = L_2 \tag{1.13} $$

$$ v'(l_1) = g(L) - \tilde{w}_2 L \tag{1.14} $$

$$ v'\left(\frac{L_2}{f(\theta_2)} + \alpha\left(\frac{\tilde{w}_2}{w_2} - 1\right)\right) = \tilde{w}_2 \tag{1.15} $$

$$ \tilde{w}_2 = v'\left(\alpha\left(\frac{2\tilde{w}_2}{w_2} - 1\right)\right) \tag{1.16} $$

$$ \tilde{w}_1 = g(L) - \tilde{w}_2 L \tag{1.17} $$

There are six unknown quantities determined in equilibrium, $\{l_1, L_1, l_2, L_2, \tilde{w}_1, \tilde{w}_2\}$. The conditions for equilibrium begin with the aggregate labor supply conditions, (1.13). Low types' labor supply is determined by (1.14), which equates the disutility of labor with the return to labor. Similarly, the first-order condition for the high type's labor supply gives a labor supply equation, (1.15), while (1.10) gives a labor demand equation. The equilibrium high-type wage is determined by the first-order condition on $\tilde{w}_2$ in the high type's problem; combining that condition with (1.6) gives (1.16). The zero profit condition, (1.17), gives the low type's wage. In equilibrium, the low type workers are paid less than their marginal product. Both types' marginal products are determined by the equilibrium labor ratio $L$. In this equilibrium, both types of workers are maximizing utilities and the firms are maximizing profits.

1.4 Nonlinear Taxation

The government raises revenue using a nonlinear tax function.
The tax function is set so that high and low skill workers do not work or consume the same amount. Instead of calculating the tax function explicitly, an equivalent direct mechanism chooses post-tax income and labor supply. A direct mechanism where workers truthfully report their type is implementable using a decentralized system of taxes and transfers (Dierker and Haller, 1990). Then, the optimal amount of rent-seeking is determined by the labor supply, through equation (1.6).

Definition 1.4.1. A direct mechanism is a set of functions

\[ l : \{\theta_1, \theta_2\} \to [0, \infty) \tag{1.18} \]
\[ c : \{\theta_1, \theta_2\} \to [0, \infty) \tag{1.19} \]

that map worker type to a labor supply and a post-tax income.

The mechanism has workers report their type; they then work and receive consumption according to their reported type. A mechanism is truth-telling, or incentive compatible, when each worker reports their true type. The post-tax income of the low type is $c_1 = \tilde{w}_1 l_1 - T(\tilde{w}_1 l_1)$. A low type's utility is calculated according to the indirect utility function,

\[ u_1(c_1, l_1) = c_1 - v(l_1) \tag{1.20} \]

High type utility is simplified using (1.6) to eliminate the rent-seeking term (because the utility-maximizing rent-seeking effort is a function of labor supply). The high type indirect utility function is now

\[ u_2(c_2, l_2) = c_2 - v(2l_2 - \alpha) \tag{1.21} \]

where (1.21) uses (1.6) to calculate the optimal wage as a function of labor supplied. The high type's mimicking indirect utility is

\[ u_2(c_1, l_2^m) = c_1 - v(2l_2^m - \alpha) \tag{1.22} \]

The wages, for both types, are endogenous, which changes the mimicking labor supply, $l_2^m$. Since the government only observes pre-tax income, the labor supply and wage of the mimicking high type must yield the low type's allocated pre-tax income. This constrains the high type's ability to mimic the low type:

\[ \max_{\tilde{w}_2^m,\, l_2^m} \; \tilde{w}_2^m l_2^m - T(\tilde{w}_2^m l_2^m) - v\left( l_2^m + \alpha\left( \frac{\tilde{w}_2^m}{w_2} - 1 \right) \right) \tag{1.23} \]
\[ \text{s.t.} \quad \tilde{w}_2^m l_2^m = \tilde{w}_1 l_1 \tag{1.24} \]

The mimicking wage $\tilde{w}_2^m$ is the optimal wage for the mimicking labor supply. Solving the high type's problem gives an identical relation between wages and labor supply as in (1.6). Then, the optimal mimicking labor supply is

\[ l_2^m = \left( \frac{\alpha \tilde{w}_1 l_1}{\tilde{w}_2^m} \right)^{1/2} \equiv A(l_1, l_2) \tag{1.25} \]

where increasing the high type's labor supply increases the mimicking labor supply as well, $A_2 > 0$.^5 The reason is that high type productivity decreases, which decreases the optimal amount of rent-seeking. Therefore, it takes more labor supply to reach the income constraint. Since $A_2 > 0$, the government can reduce the mimicking utility by increasing $l_2$ through decreasing the marginal tax rate on the high type.

^5 Where $A_2 = \frac{\partial A}{\partial l_2}$. Note that for some production functions (like Cobb-Douglas), $A_1 = 0$.

The government aggregates utilities through an increasing and concave social welfare function, $\Phi(\cdot)$. With this social welfare function, the government's objective function is

\[ W(c_1, c_2, l_1, l_2) = \Phi(u_1(c_1, l_1)) f(\theta_1) + \Phi(u_2(c_2, l_2)) f(\theta_2) \tag{1.26} \]

where $u_1$ and $u_2$ are the indirect utility functions defined in (1.20) and (1.21). The government is constrained by the need to raise revenue, $R$. Here, total production must be at least as great as government revenue plus total consumption:

\[ G(\theta_1 l_1 f(\theta_1), \theta_2 l_2 f(\theta_2)) - c_1 f(\theta_1) - c_2 f(\theta_2) \geq R \tag{1.27} \]

Also constraining the government's ability to redistribute is the incentive compatibility (IC) constraint. Rent-seeking is still available to the high type when mimicking. The IC constraint is

\[ u_2(c_2, l_2) \geq u_2(c_1, A(l_1, l_2)) \tag{1.28} \]

Rent-seeking means that taking the low type's allocation will decrease the disutility from labor. The government's problem is to maximize its objective function subject to the government budget constraint and the IC constraint:

\[ \max_{l_1, l_2, c_1, c_2} \; \Phi(u_1(c_1, l_1)) f(\theta_1) + \Phi(u_2(c_2, l_2)) f(\theta_2) \tag{1.29} \]
\[ \text{s.t.} \quad G(\theta_1 l_1 f(\theta_1), \theta_2 l_2 f(\theta_2)) - c_1 f(\theta_1) - c_2 f(\theta_2) \geq R \]
\[ \phantom{\text{s.t.} \quad} u_2(c_2, l_2) \geq u_2(c_1, A(l_1, l_2)) \]

Let $\gamma$ be the multiplier on the government budget constraint and $\lambda$ the multiplier on the IC constraint. The resulting first-order conditions are

\[ \frac{\partial \mathcal{L}}{\partial l_1} = \Phi'(u_1) v'(l_1) f(\theta_1) - \gamma\theta_1 G_1(\theta_1 l_1 f(\theta_1), \theta_2 l_2 f(\theta_2)) + \lambda\, 2 v'(2A(l_1, l_2) - \alpha) A_1(l_1, l_2) = 0 \tag{1.30} \]
\[ \frac{\partial \mathcal{L}}{\partial l_2} = \Phi'(u_2) v'(2l_2 - \alpha) f(\theta_2) - \gamma\frac{\theta_2}{2} G_2(\theta_1 l_1 f(\theta_1), \theta_2 l_2 f(\theta_2)) + \lambda\left( v'(2l_2 - \alpha) - v'(2A(l_1, l_2) - \alpha) A_2(l_1, l_2) \right) = 0 \tag{1.31} \]
\[ \frac{\partial \mathcal{L}}{\partial c_1} = \Phi'(u_1) f(\theta_1) - \gamma f(\theta_1) - \lambda = 0 \tag{1.32} \]
\[ \frac{\partial \mathcal{L}}{\partial c_2} = \Phi'(u_2) f(\theta_2) - \gamma f(\theta_2) + \lambda = 0 \tag{1.33} \]

The marginal tax rate for finite types is defined as one minus the slope of the utility function at the optimal allocation.^6 Applying (1.33) to (1.31) gives

\[ \frac{v'(2l_2 - \alpha)}{w_2} = \frac{1}{2} + \frac{(1 - \Phi'(u_2)/\gamma)\, v'(2A(l_1, l_2) - \alpha) A_2(l_1, l_2)}{w_2} \tag{1.34} \]

The left-hand side is not the traditional marginal tax expression,^7 because productivity, not the wage, is in the denominator. The correct term is the ratio of the marginal disutility of total effort to the wage:

\[ \tau_2^R = 1 - \frac{v'(2l_2 - \alpha)}{\tilde{w}_2} = \frac{2\tilde{w}_2 - w_2 + 2(1 - \Phi'(u_2)/\gamma)\, v'(2A(l_1, l_2) - \alpha) A_2(l_1, l_2)}{2\tilde{w}_2} \tag{1.35} \]

With $A_2(\cdot, \cdot) > 0$, the marginal tax rate could be positive.^8

The natural comparison is not the Mirrlees (1971) zero distortion result but the Stiglitz (1982) negative top marginal rate result. Stiglitz (1982) found a negative top rate because the government could use the marginal tax rate to decrease the wage rate of the high type by making the high type work more. The incentive compatibility constraint is relaxed when the top marginal tax rate is negative. This is another example of the government using the tax system to influence the distribution of pre-tax income.
The main Stiglitz (1982) result is that the top marginal rate is

\[ \tau_2^S = -\frac{(1 - \Phi'(u_2)/\gamma)\, v'(A^S(l_1^S, l_2^S))\, A_2^S(l_1^S, l_2^S)}{w_2^S} \tag{1.36} \]

where the $S$ superscript denotes the equilibrium values in the Stiglitz (1982) economy. The top tax rate in the Stiglitz economy is non-positive. When labor is perfectly substitutable across skill, $A_2^S = 0$ because wages are constant, and we get the no-distortion-at-the-top result of Mirrlees (1971). If labor is less than perfectly substitutable, then $A_2^S > 0$ and the top marginal tax rate is negative. When labor is not perfectly substitutable, the government can adjust the pre-tax income distribution in favor of redistribution by making high skill workers exert more effort. Increasing effort will decrease the high skill wage and increase the low skill wage (which is why the IC constraint is relaxed).

^6 This is because for finite types, the optimal mechanism is a set of functions whose derivatives are discontinuous at the optimal allocation. Hence, the implied optimal tax function also has a discontinuous first derivative.
^7 The traditional equality is $v'(l) = \tilde{w}(1 - \tau)$ where $\tau$ is the marginal tax rate.
^8 In the Mirrlees (1971) model where $w_2 = \tilde{w}_2$ and productivity is fixed, $A_2 = 0$, which gives the no-distortion-at-the-top result.

Theorem 1.4.1. The marginal tax rate on the high type is greater when there is strictly positive rent-seeking effort.

Proof. The difference between the rent-seeking marginal rate and the non-rent-seeking marginal rate is

\[ \tau_2^R - \tau_2^S = \frac{2\tilde{w}_2 - w_2}{2\tilde{w}_2} + \frac{(1 - \Phi'(u_2)/\gamma)\, v'(2A(l_1, l_2) - \alpha) A_2(l_1, l_2)}{\tilde{w}_2} + \frac{(1 - \Phi'(u_2)/\gamma)\, v'(A^S(l_1^S, l_2^S))\, A_2^S(l_1^S, l_2^S)}{w_2^S} > 0 \tag{1.37} \]

where $\tau_2^R$ is the rent-seeking top marginal tax rate. All three terms are positive since $A_2 > 0$ and $\tilde{w}_2 > w_2$. The difference is positive because higher taxes reduce rent-seeking effort.
Hence, the government can reduce high skill wages by increasing the marginal tax rate on high skill workers.

Rent-seeking causes the marginal tax rate to approximate the Stiglitz (1982) marginal tax rate when the mimicking labor supply is highly responsive to the marginal rate. With a highly responsive labor supply, an increase in the marginal rate causes a large increase in labor supply by all high types. Larger high type labor supply then reduces wage inequality. Hence, the government is better able to redistribute income through the pre-tax distribution of income rather than the post-tax distribution. Having a large marginal rate does not increase the marginal product of the high type and so the effect of rent-seeking is not exacerbated.

The marginal tax rate is unambiguously positive when $\Phi'(u_2)/\gamma \to 1$, that is, when the marginal social welfare of the high types is approximately equal to the marginal cost of public funds. Then the marginal tax rate is bounded from below by 1/2:

\[ \frac{v'(2l_2 - \alpha)}{\tilde{w}_2} = \frac{w_2}{2\tilde{w}_2} < \frac{1}{2} \implies \tau_2 > \frac{1}{2} \tag{1.38} \]

We can also determine a similar expression for the marginal tax rate for the low type. I combine (1.30) and (1.32) to get

\[ \Phi'(u_1)\left( v'(l_1) - 2v'(2A(l_1, l_2) - \alpha) A_1(l_1, l_2) \right) = \gamma\left( w_1 - 2v'(2A(l_1, l_2) - \alpha) A_1(l_1, l_2) \right) \tag{1.39} \]

which can be reduced using the definition of $g$ to the equation

\[ \frac{v'(l_1)}{\tilde{w}_1} = \frac{w_1}{\tilde{w}_1 g_1} + \frac{g_1 - 1}{\tilde{w}_1 g_1}\, 2v'(2A(l_1, l_2) - \alpha) A_1(l_1, l_2) \tag{1.40} \]

If $g_1 \to 1$ or $A_1 = 0$, then the marginal tax rate is negative.^9 When $g_1 \to 1$, we get the following formula:

\[ \tau_1 = 1 - \frac{w_1}{\tilde{w}_1} < 0 \tag{1.41} \]

The marginal tax rate is negative because the low types' wage is less than their productivity.

The government has two levers to create redistribution. Average tax rates are the direct way to redistribute. But because marginal taxes can change the wages of both types of workers, the government can also use marginal tax rates to change the pre-tax distribution of income.
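The two limiting formulas, (1.38) and (1.41), are simple enough to evaluate by hand. The following is a minimal sketch in Python; the wage and productivity numbers are hypothetical and are chosen only so that the high type is paid more than its productivity and the low type less, as in the rent-seeking equilibrium. The function names are illustrative and do not come from the chapter.

```python
def top_rate_limit(w2, w2_tilde):
    """High-type marginal rate implied by (1.38): 1 - w2 / (2 * w2_tilde)."""
    return 1.0 - w2 / (2.0 * w2_tilde)


def low_rate_limit(w1, w1_tilde):
    """Low-type marginal rate from (1.41): 1 - w1 / w1_tilde."""
    return 1.0 - w1 / w1_tilde


# Hypothetical values (assumptions, not results from the chapter):
# the high type's wage exceeds its productivity, the low type's falls short.
w2, w2_tilde = 1.0, 1.2
w1, w1_tilde = 1.0, 0.9

tau2 = top_rate_limit(w2, w2_tilde)
tau1 = low_rate_limit(w1, w1_tilde)
print(f"tau_2 = {tau2:.3f}, tau_1 = {tau1:.3f}")
```

Whenever the high-type wage exceeds productivity the first rate lies above 1/2, and whenever the low-type wage is below productivity the second rate is negative, which is the sign pattern described in the text.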
The negative tax rate on low types is used to increase the labor supply of low types, which increases the wage of high types. Increasing the marginal tax rate on high types will decrease rent-seeking effort. By reducing rent-seeking effort, the government is redistributing pre-tax income through the firm via the zero-profit condition. Adding rent-seeking to the model changes the government's approach to redistribution through pre-tax income compared to the Stiglitz (1982) model.

^9 When production is Cobb-Douglas or linear, $A_1 = 0$ and so the marginal tax rate is negative for the low type.

1.4.1 Numerical Results

I now numerically solve the government's optimal tax problem in equation (1.29).^10 All of the results are compared to the Stiglitz (1982) results with endogenous wages since that is the appropriate benchmark model. In order to complete the numerical results, I assume functional forms and parameter values. Due to the restrictiveness of the two-type model, I cannot calibrate many parameters to a real economy. Parameter values are also restricted by the fact that an equilibrium is not guaranteed in the rent-seeking model. Table 1.1 shows the parameter values and functional forms chosen in the numerical analysis.

Table 1.1: Parameter Values and Functional Forms

Parameter                  Value
$f(\theta_1)$              0.9
$f(\theta_2)$              0.1
$(\theta_1, \theta_2)$     (1, 1.55)
$R$                        0
$v(x)$                     $x^4$
$G(x, y)$                  $x^{0.35} y^{0.65}$
$\Phi(x)$                  $x$ if low type and $0.2x$ if high type

The disutility of effort function, $v$, is set so that the elasticity of labor supply corresponds to an elasticity of taxable income of 0.33. This value is in the range of acceptable estimates discussed in Saez et al. (2012). The production function is Cobb-Douglas with low skill workers having a smaller exponent (0.35) than high skill workers (0.65). Finally, there is no revenue requirement for the government. Hence, the government is only redistributing income and not funding some unspecified public spending. There are two reasons for setting $R = 0$.
First, rent-seeking is a type of redistribution within the firm by high-skill workers. We are interested in how the optimal tax function responds to redistribution via rent-seeking. Second, as $\alpha$ increases, the amount of production in the economy will also change absent any taxes. If $R = 1$, then the amount of taxation necessary to fund $R = 1$ changes based on the value of $\alpha$. Hence, when $R$ is fixed but $\alpha$ is changing, the size of the economy is changing but the amount of government spending is not.

^10 I use MATLAB's fmincon method, which solves (1.29) directly. Solving the maximization problem directly is more stable than solving the first-order conditions of the problem. In fact, some of MATLAB's nonlinear equation solvers use minimization methods to find a solution.

Table 1.2: Numerically Calculated Tax Rates

Quantity                          α = 0.35    α = 0.4    Stiglitz Model
Marginal tax rate, high skill      0.551       0.506      -0.550
Marginal tax rate, low skill      -0.166       0.059       0.080
Average tax rate, high skill       0.718       0.697       0.413
Average tax rate, low skill       -1.883      -1.339      -1.154
Rent-seeking                       0.726       0.569       0
Marginal tax to rent-seeking       1.52        1.86        -

Table 1.2 shows the numerically calculated marginal and average tax rates. The main result is that average and marginal tax rates are higher for high skill workers compared to the Stiglitz model while the opposite is true for low skill workers. This result follows the logic that the government wants to tax away rent-seeking effort and redistribute to low skill workers. Rent-seeking is not fully taxed away because the government cannot perfectly observe the rent-seeking behavior. The government does not want to distort the labor supply of high skill workers but it must in order to reduce rent-seeking. By reducing rent-seeking, the government is using the marginal tax rate to redistribute pre-tax income.
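The numbers in Table 1.2 rest on the functional forms in Table 1.1, and the stated elasticity of 0.33 can be checked directly from $v(x) = x^4$. The sketch below is in Python rather than the MATLAB used for the chapter's results, and it makes a simplifying assumption that is not the chapter's full model: a worker without rent-seeking chooses labor to maximize net wage times labor minus $v(l)$, so the first-order condition is $4l^3$ equal to the net wage. The function names are hypothetical.

```python
import math


def labor_supply(net_wage):
    """Labor supply from the FOC 4 * l**3 = net_wage (assumes no rent-seeking)."""
    return (net_wage / 4.0) ** (1.0 / 3.0)


def labor_supply_elasticity(net_wage, step=1e-6):
    """Numerical log-log derivative of labor supply with respect to the net wage."""
    lo = labor_supply(net_wage * (1.0 - step))
    hi = labor_supply(net_wage * (1.0 + step))
    return (math.log(hi) - math.log(lo)) / (math.log(1.0 + step) - math.log(1.0 - step))


def production(x, y):
    """Cobb-Douglas production from Table 1.1: G(x, y) = x**0.35 * y**0.65."""
    return x ** 0.35 * y ** 0.65


elasticity = labor_supply_elasticity(1.0)
print(f"labor supply elasticity = {elasticity:.4f}")
```

Under this assumption the elasticity is 1/3 at every net wage, which matches the 0.33 target. The production function has constant returns to scale because its exponents sum to one, so doubling both inputs doubles output.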
The last row of Table 1.2 shows the difference between the rent-seeking marginal tax rate and the Stiglitz rate divided by the total amount of rent-seeking.^11 This ratio is increasing as $\alpha$ increases. The marginal tax to rent-seeking ratio gives a rule of thumb for increasing the marginal rate. If the amount of rent-seeking for a particular income level is thought to be, for example, 5%, then the marginal tax rate should be increased by between 1.5 × 0.05 and 1.86 × 0.05, that is, by roughly 7.5 to 9.3 percentage points. The ratio changes mainly with changes in $\alpha$ and not dramatically with other parameters.

^11 The formula is $(\tau_2^{RS} - \tau_2^S)/d$ where $d$ is the percentage increase in income due to rent-seeking.

Average tax rates have received little attention in the literature.^12 High skill workers face much higher average tax rates than in the Stiglitz model. This shows that the government is trying to redistribute income away from high skill workers, undoing the redistribution of income within the firm. Low skill workers have low average tax rates, which confirms the idea that the government is redistributing towards low skill workers. This sheds more light on the Piketty et al. (2014) result of an increased top marginal tax rate. The government is not just taxing away the rent-seeking income; it is also trying to increase the post-tax income of low skill workers. Piketty et al. (2014) can only show that the top marginal tax rate has increased in the presence of rent-seeking.

While the government wants to tax away rent-seeking effort, it is constrained by two forces. First, increasing the marginal rate facing high skill workers reduces productive labor effort, which the government does not want to reduce. Second, by reducing productive labor of high skill workers, the relative amount of high skill labor decreases, which increases rent-seeking effort. Therefore, marginal tax rates are not a useful tool for the government in reducing rent-seeking effort.
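The marginal tax to rent-seeking ratio in the last row of Table 1.2 can be recomputed from the other rows using the footnote formula, the gap between the rent-seeking and Stiglitz top marginal rates divided by the amount of rent-seeking. The sketch below is in Python (the chapter's own computations use MATLAB); the numbers are copied from Table 1.2, while the function names and the 5% rent-seeking share are illustrative.

```python
# Values copied from Table 1.2, keyed by the label of the alpha column.
TOP_MARGINAL_RATE = {"0.35": 0.551, "0.4": 0.506}
RENT_SEEKING = {"0.35": 0.726, "0.4": 0.569}
STIGLITZ_TOP_MARGINAL_RATE = -0.550


def tax_to_rent_seeking_ratio(alpha_label):
    """(tau_RS - tau_S) / d, with d the amount of rent-seeking."""
    gap = TOP_MARGINAL_RATE[alpha_label] - STIGLITZ_TOP_MARGINAL_RATE
    return gap / RENT_SEEKING[alpha_label]


def rule_of_thumb_increase(alpha_label, rent_seeking_share):
    """Suggested increase in the marginal rate for a given rent-seeking share."""
    return tax_to_rent_seeking_ratio(alpha_label) * rent_seeking_share


for label in ("0.35", "0.4"):
    ratio = tax_to_rent_seeking_ratio(label)
    bump = rule_of_thumb_increase(label, 0.05)
    print(f"alpha = {label}: ratio = {ratio:.2f}, increase for 5% rent-seeking = {bump:.3f}")
```

Rounded to two decimals, the recomputed ratios reproduce the 1.52 and 1.86 entries in the table, and the implied increases for a 5% rent-seeking share are about 0.076 and 0.093.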
1.5 Conclusion

This chapter aimed to fix a number of the flaws in the literature on rent-seeking in the labor market. The two main flaws concern features that the literature has not respected:

1. Firms in perfectly competitive markets must maintain zero profits in equilibrium.
2. Firms can react to high skill workers rent-seeking.

This chapter maintains zero profits by having the low skill wage decrease in response to increased rent-seeking. And firms can react to rent-seeking by hiring more high skill workers. By hiring more skilled workers, the firm decreases the productivity of each high skill worker, reducing the incentive to rent-seek. At the same time, low skill workers see increased productivity and, therefore, increased pay.

^12 One reason is that the perturbation method of solving for the top marginal tax rate cannot say anything about average tax rates.

The government's tax problem is more complex than the classical Mirrlees (1971) or Stiglitz (1982) models. Since the government must directly choose labor supply, both rent-seeking and the zero profit condition change in response. This complexity precludes easily interpretable analytical solutions.

Numerically solving the optimal tax problem adds a few new results to the literature. First, the low skill marginal tax rate is lower than the Stiglitz (1982) no-rent-seeking benchmark. The high skill marginal tax rate is higher, as Piketty et al. (2014) find in their model of rent-seeking. Since Piketty et al. (2014) are only able to calculate the marginal tax rate for the high skill workers, my results add that low skill workers face a lower marginal tax rate. Second, average tax rates for high skill workers are increased while the opposite is true for low skill workers. Only by numerically solving for the optimal tax function can average tax rates be calculated. The changes to average tax rates show that the government is trying to redistribute away from high skill workers by decreasing the post-tax income of high skill workers.
There are some limitations to the model analyzed in this chapter. First, there are more than two types of workers in the world, so extending the model to a continuum of types would be a significant achievement. However, Piketty et al. (2014) in truth only have a two type model and are only able to calculate the marginal tax rate for the highest type. Rothschild and Scheuer (2016) use a continuum of types but they do not allow rent-seekers to know that rent-seeking effort will increase their wage. Extending the model to a continuum of types is a challenge because when multiple types can rent-seek, it is not clear how the rent-seeking should propagate to other workers. Second, firms are able to react to rent-seeking but that ability to react is shut off in equilibrium. This happens because there is no unemployment mechanism, so all workers must be employed in the model. In equilibrium, all workers are hired, which shuts down the firm's ability to react.

While this chapter has some clear limitations, it does add to the literature in important ways. The model attempts to create an environment where the idea of rent-seeking closely follows the choice problems faced by workers and firms. This connection has not necessarily been very tight in the literature. This chapter has added to the literature by making that connection between idea and model as tight as possible.

CHAPTER 2
OPTIMAL TAXATION AND SEARCH WITH INTENSIVE MARGIN LABOR SUPPLY

2.1 Introduction

Redistribution over the long-run focuses solely on redistributing income across people who are regularly working. Redistribution over the medium-run, however, includes redistributing income between working and non-working individuals. This chapter investigates how to optimally redistribute income through taxing income over the long-run. Specifically, it asks whether adding an extensive margin significantly changes the government's optimal tax function. Workers can be both voluntarily and involuntarily unemployed.
Long-run unemployment could be caused by increased automation of jobs (Acemoglu and Restrepo, 2018) or, more recently, by lengthy stay-at-home orders. Redistribution is constrained by the need to provide incentives for higher skilled workers to search for work and exert effort once successfully matched with an employer.

Individuals who choose not to search for work are voluntarily unemployed. Not searching is an equilibrium response to the supply and demand for vacancies; when there are few vacancies, not searching is optimal behavior. Involuntarily unemployed individuals are searchers who fail to match with a firm. Involuntary unemployment is seen as a condition that the government should provide insurance against. No feasible technology exists to accurately determine search effort by an individual. The government is therefore forced to offer the same unemployment benefit to both types of unemployed workers. With a single unemployment benefit, the government faces two informational constraints that reduce possible redistribution. The first constraint is a no-mimicking condition where higher skill workers should not want to mimic lower skill workers. Such a no-mimicking condition reduces the amount of redistribution from high skill to low skill workers because high skill workers will find it optimal to mimic a low skill worker if redistribution is too high. Second, increasing the unemployment insurance amount will increase the number of voluntarily unemployed individuals. With fewer employed workers, the government cannot fund as much redistribution. The constraint is on redistribution from employed workers to unemployed individuals. This second constraint is not explicitly stated in the government's problem but it is present and affects how the government chooses the optimal income tax function as well as the unemployment benefit.

The model combines two different static labor market models: the traditional intensive labor supply model and the search/matching model.
Individuals make labor supply decisions on both the intensive and extensive margins. Firms post vacancies until expected profits are zero. Searching workers and firms are stochastically matched and the two parties bargain over the surplus created by the successful match. Those individuals who fail to be matched are involuntarily unemployed and receive the same unemployment benefit as the individuals who do not search.

With equilibrium involuntary unemployment, the government has three methods of redistribution. First, the government is redistributing income from high skilled individuals to low skilled individuals using average taxes. Second, the government is redistributing between workers and non-workers in the form of an unemployment benefit. Third, the government can use marginal tax rates to increase the probability of employment. The three goals are not complementary. Any income redistributed from high to low skilled individuals is income that cannot fund the unemployment benefit. And increasing the probability of employment reduces the need for a generous unemployment benefit. The government has a few ways to achieve its goals for redistribution. Average tax rates redistribute income from high skilled workers to low skilled workers. The unemployment benefit redistributes from the employed to the unemployed. Less obvious redistribution tools are the probability that a searching individual will find a job and the number of individuals induced to search. By increasing marginal tax rates on lower types, the government can make mimicking less likely and increase the probability of employment for the low type worker.

The main finding of the paper is that the government decreases wages and hours worked in order to increase the unemployment benefit and the probability of employment. Since the lower skilled workers have lower incomes, the government must also increase the number of non-searchers in the economy.
The intuition behind these results stems from the government's desire to redistribute income. While the government wants to redistribute from those with high income to those who have low income or are unemployed, the government faces two different informational constraints. First, the traditional information rent must be paid to the higher skilled individuals in order to get them to report their type truthfully (that is, work the correct number of hours and bargain for the appropriate wages). Increasing the probability of employment for a low type relaxes the no-mimicking constraint for higher types. The government is able to increase the probability of employment by increasing marginal tax rates. Second, the government faces a constraint at the bottom of the income distribution. By pushing down wages and hours, the government makes searching less beneficial for low skilled workers. The increased unemployment benefit redistributes income towards the voluntarily unemployed workers.

The numerical results from the three type model highlight how the government is redistributing. First, the government uses positive marginal tax rates to increase the probability of employment. The incentive compatibility constraint ensures the expected surplus from searching for a job is at least as good as not searching for a job. Therefore, increasing the probability of employment is an indirect way to redistribute income and it comes at negative cost to the government. Second, as the numerical example in section 5 clearly shows, increasing the probability of employment also relaxes the downward incentive compatibility constraint on higher types. The marginal tax rate increases the surplus generated by the firm and this increase is larger for the truth-telling type than for a higher type mimicking the truth-telling type. This relaxes the incentive compatibility constraint, reducing the information rent paid to the higher types.
2.2 Past Literature

There are a number of papers that study income taxation with a search labor market. I will focus on the literature that employs a static search model, because static labor market models show the extent to which tax policy should be used for redistribution rather than counter-cyclical policy (Boadway and Tremblay, 2013). There is also a related literature that adds an extensive margin to the Mirrlees (1971) framework. These models are generally difficult to solve completely because the type space is two-dimensional. I will cover the main results of those papers as well.

An early seminal paper is Boone and Bovenberg (2002). They model the labor market with a continuum of identical workers who decide how much effort to exert when searching for vacancies posted by firms. When there is free entry, which implies perfectly elastic labor demand, workers bear the entire burden of taxation. The optimal tax function is progressive (in the sense that average taxes are increasing in wages) if and only if workers have more bargaining power than is efficient under the Hosios (1990) condition.^1 Without a revenue requirement, the tax code is able to correct inefficiencies due to search externalities. If workers have too much bargaining power, then increasing average taxes decreases the return to bargaining harder for the worker. Hence, the effective bargaining power of the worker is reduced by creating a more progressive tax function. When there is a positive revenue requirement and free entry of firms, the optimal tax function places all of the burden of taxation on the workers. The reason for placing all of the burden on workers is related to the production efficiency result of Diamond and Mirrlees (1971). By taxing the workers only, the government is not changing market tightness.
If the government did tax the profits of the firms, then market tightness would change and would become inefficient.

^1 There is no revenue requirement here. The tax code simply exists to correct the inefficiency generated by searching.

Shapiro (2004) uses a model of exogenous search effort and heterogeneous cost of participation in the labor market to analyze the effect of a linear income tax. He finds that the government wants to increase participation in the labor market. However, the government cannot increase participation through the use of linear taxes. The result that linear taxes cannot increase participation in the labor market is driven by two assumptions that Shapiro makes. First, firms and workers do not bargain over the match-generated surplus. Without bargaining over surplus, the government cannot induce workers or firms to change wages, which would change participation decisions. Second, workers have no intensive search decision and firms have no active role in the market.^2 With such a restrictive choice set for workers and firms, the government has little ability to change the equilibrium outcomes in the model.

The closest papers to Mirrlees (1971), that is, with a continuum of productivities that are unobservable by the government, are Hungerbuhler et al. (2006) and Lehmann et al. (2011). Both papers include static matching between workers and firms. In Hungerbuhler et al. (2006), individuals choose to participate in the labor market based on whether expected labor income is greater than the unemployment benefit. Because there are only differences in productivity across individuals, there is a cutoff productivity where everyone below the cutoff will not search for vacancies. In Lehmann et al. (2011), individuals also have heterogeneous cost of entering the labor market. Each productivity level has some non-participants as well as participants who failed to find a job.
The main result of both papers is that the government can decrease the wage, by increasing the marginal tax rate, which shifts the labor demand curve upward. By shifting labor demand upward, the government increases the number of employed workers. The authors call this effect the “wage-cum-labor demand”. One difference between the two papers is that when there is non-participation at the top of the skill distribution, the optimal marginal tax rate is strictly positive. This result happens in Lehmann et al. (2011) because the heterogeneous cost of entering the labor market causes some non-participation even at the top of the productivity distribution.

^2 Firms are posting vacancies but make no decision on how many vacancies to post.

A number of other papers have directly added an extensive margin to the Mirrlees (1971) model. Saez (2002) discusses an extensive margin model added to a finite type Mirrlees model. The focus of Saez's paper is on the lower end of the income distribution because that is where the extensive margin decision is made. The main result is that an earned income tax credit is preferred over a negative income tax. There should also be a small guaranteed income for low ability workers. Building on Saez's work, Jacquet et al. (2013) look at a continuous, two-dimensional type version of Saez's model. In the complicated type space, each skill level has a distribution of fixed costs of working. In equilibrium, a nonzero percentage of all types will not work because their fixed cost of working is too high. Their formulation of the government's tax problem is carefully specified in order to separate the participation tax and the unemployment benefit. Scheuer and Werning (2016) place the Jacquet et al. (2013) model into a unified framework for analyzing nonlinear income taxation.

In both papers, wages are different from productivities because of Nash bargaining between the worker and the firm. However, wages are shaded down even without taxes.
When adding an intensive labor supply decision, bargaining presents a problem because the Nash bargaining problem is now a constrained optimization problem. The constraint is the worker's first-order condition for utility maximization with respect to labor supply. By adding the constraint, the bargained-over wage and the bargained-over labor supply are the values that ensure the worker is maximizing utility.

One important point about search models is the efficiency of the search process. The standard efficiency condition in Hosios (1990) relies on only wages being decided through Nash bargaining and on any deviation from the efficiency condition being perfectly observed by the government. Both of these assumptions fail to hold in the model presented in section 6. A natural question then is how much government intervention in the search process helps or hurts search efficiency. There is not really an answer in the literature because there does not seem to be a Hosios condition for search models with hours worked as an additional bargaining control variable. Arseneau and Chugh (2006) state that even with the standard Hosios condition for wage-only Nash bargaining, the introduction of hours worked can break search efficiency. At no point in their paper do they create a Hosios condition.

2.3 Model

The labor market model below has two important features that combine extensive and intensive margins of labor supply:

1. Stochastic matching between potential workers and firms
2. Nash bargaining between matched workers and firms over pre-tax income and hours worked

The first feature is used in Hungerbuhler et al. (2006) to calculate the optimal tax function with just wage bargaining.
The second feature is used in Mirrlees (1971) to calculate the optimal tax function with intensive margin labor supply decisions.³ With extensive labor supply, any increase in average taxes will increase the amount of unemployment benefits possible and raise the type of the individual who is indifferent between searching and not searching. With intensive labor supply, any increase in marginal tax rates will decrease intensive labor supply. When there is matching between workers and firms, the government can increase the marginal tax rate in order to increase labor demand, which manifests as an increase in the probability of employment. Hungerbuhler et al. (2006) show how the government trades off between increasing employment and increasing wages. Adding hours worked complicates the analysis because increasing hours worked also increases the probability of being successfully matched but must be compensated with higher wages.

³ See Saez and Piketty (2013) for a review of income taxation in a classical labor supply framework.

[Figure 2.1: Timing of decisions by government, firms, and individuals. Stages in order: the government sets (T, b); firms post vacancies and individuals decide to search; matching occurs; bargaining over income and hours worked; government transfers occur.]

The government is still trading off between efficiency and redistribution, much like in Mirrlees (1971). The trade-off is more complicated because new margins of individual decisions create more constraints on the tools the government uses to redistribute. The government’s main tools of redistribution are the probability of finding a job, the unemployment benefit, and average tax rates.

For the rest of the paper, “individuals” denotes agents who have yet to decide whether to search or not. “Workers” are individuals who have been successfully matched with a vacancy. “Unemployed” denotes individuals who have either not searched or were unsuccessful in the search market.
Timing of the model is as follows:

1. The government posts a tax function $T : \mathbb{R}_+ \to \mathbb{R}$ and an unemployment benefit $b \in \mathbb{R}$.
2. Firms open vacancies and individuals decide whether to search for firms or not.
3. Matching occurs. Nash bargaining over wages occurs and workers commit to supply a specific amount of labor. Intensive labor supply depends on the negotiated wage. Labor supply is observed by the firm.
4. Government transfers to workers and the unemployed occur.

In section 2.7, I will describe the direct mechanism that implements the above game. The problem above maps to a direct mechanism where the government directly chooses pre-tax income, expected surplus from a successful match, and hours worked. Such a mapping relies on the fact that the level of the tax function determines the post-tax income distribution and the marginal tax function determines the pre-tax income distribution.

2.3.1 Individual’s Problem

A type $\theta$ individual has utility

\[ u(c, h; \theta) = c_\theta - v(h_\theta) \tag{2.1} \]

where $c_\theta$ is the consumption of the agent and $h_\theta$ is the amount of labor supplied to the firm if the worker searches and is successfully matched with a vacancy. The function $v(\cdot)$ is a strictly increasing, strictly convex function representing the disutility of hours worked. There are two possible budget constraints, depending on whether the individual was matched with a firm or not. The employed budget constraint is

\[ c_\theta = y_\theta - T(y_\theta) \tag{2.2} \]

where $y_\theta$ is the pre-tax income of the worker, which will be determined through Nash bargaining. If the individual is unemployed, the budget constraint is

\[ c_\theta = b \tag{2.3} \]

where $b$ is an unemployment benefit, and this benefit can depend on search behavior. Searchers who are successfully matched with a firm do not receive the unemployment benefit. The unemployment benefit modeled here is different from existing unemployment insurance (UI) in the real world. In the model, the benefit is constant across income while UI changes depending on the previous job held.
In one sense, the model unemployment benefit is not different from UI in that all workers search once and hold no previous income with which to benchmark the unemployment benefit. However, then $b$ is no longer an unemployment benefit but compensation for failing to successfully match with a firm. In another sense, there might be less reason to tie unemployment benefits to past income when the unemployed person has been out of a job for several years.

Additionally, the individual who does not search gets some utility from not searching equal to $d$. Therefore, utility for not searching is

\[ u(b, 0; \theta) = b + d \tag{2.4} \]

Thus, search is costly because it decreases utility by $d$. If there were no cost to searching ($d = 0$), then every individual would search and the pool of unemployed individuals would only include failed searchers.

2.3.2 Firm’s Problem

Each firm posts vacancies in a labor market for a particular type, $\theta$. Labor markets are perfectly segregated by type and there is no spillover across labor markets. Therefore, we can isolate the firm’s decision-making into the decision of how many vacancies to post in a specific labor market. The firm’s profit from a type $\theta$ match is

\[ \pi_\theta = \theta h_\theta - y_\theta - \kappa_\theta \tag{2.5} \]

where $\kappa_\theta$ is the cost of posting a type $\theta$ vacancy. The firm gets $\theta$ units of output per hour worked by a type $\theta$ worker. Crucial in (2.5) is that the pre-tax income for a type $\theta$ worker must be lower than the total product $\theta h_\theta$ of the worker. If we assume that $\theta = y_\theta / h_\theta$ as in the Mirrlees model, the firm will lose money on that type of worker and will never post a vacancy. Firms will post more vacancies in the type $\theta$ search market if the government can depress the pre-tax income for a type $\theta$ worker or if the government can get a type $\theta$ worker to supply more hours worked.
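The payoffs in (2.1) through (2.5) can be collected in a short sketch. This is an illustration under stated assumptions rather than part of the model: the quadratic disutility `v(h) = h**2 / 2`, the function names, and the treatment of the employment probability as a given input are all choices made here. The search rule compares the expected utility of searching with the non-search utility `b + d`.

```python
def disutility(hours: float) -> float:
    """Illustrative v(h) = h**2 / 2 (an assumption): increasing and convex for h > 0."""
    return 0.5 * hours ** 2


def employed_utility(income: float, tax: float, hours: float) -> float:
    """Utility (2.1) under the employed budget constraint (2.2), c = y - T(y)."""
    return income - tax - disutility(hours)


def non_search_utility(benefit: float, d: float) -> float:
    """Utility (2.4) of an individual who does not search: b + d."""
    return benefit + d


def expected_search_utility(prob: float, income: float, tax: float,
                            hours: float, benefit: float) -> float:
    """Employed utility with probability prob, the benefit b from (2.3) otherwise."""
    return prob * employed_utility(income, tax, hours) + (1.0 - prob) * benefit


def chooses_to_search(prob: float, income: float, tax: float,
                      hours: float, benefit: float, d: float) -> bool:
    """Search when the expected utility of searching is at least b + d."""
    searching = expected_search_utility(prob, income, tax, hours, benefit)
    return searching >= non_search_utility(benefit, d)


def firm_profit(theta: float, hours: float, income: float, vacancy_cost: float) -> float:
    """Profit (2.5) from a filled type-theta vacancy."""
    return theta * hours - income - vacancy_cost


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise the functions.
    print(chooses_to_search(prob=0.8, income=2.4, tax=0.3, hours=1.5, benefit=0.2, d=0.1))
    print(firm_profit(theta=2.0, hours=1.5, income=2.4, vacancy_cost=0.25))
```

Raising `d` or lowering `prob` in the example makes not searching more attractive, which is the sense in which search is costly in the text.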
2.3.3 Matching and Bargaining

The matching technology follows a Cobb-Douglas functional form with

\[ M_\theta = A (U_\theta)^{\gamma} (V_\theta)^{1-\gamma} \tag{2.6} \]

where $\gamma$ is the worker’s bargaining power, $U_\theta$ is the number of type $\theta$ individuals searching, and $V_\theta$ is the number of type $\theta$ vacancies posted. The matching function has constant returns to scale, which is the empirically preferred scale property from Petrongolo and Pissarides (2001). Define the tightness of the labor market by $m_\theta = V_\theta / U_\theta$. The probability that a type $\theta$ searcher is employed is

\[ P_\theta = \frac{M_\theta}{U_\theta} = A m_\theta^{1-\gamma} \tag{2.7} \]

The probability that a firm will fill a vacancy is

\[ \frac{M_\theta}{V_\theta} = A m_\theta^{-\gamma} \tag{2.8} \]

From the equation above, we can derive the expected return for posting a vacancy

\[ E[\pi_\theta] = A m_\theta^{-\gamma} (\theta h_\theta - y_\theta) - \kappa_\theta \tag{2.9} \]

Free entry gives the zero-profit condition of

\[ m_\theta^{\gamma} = A \left( \frac{\theta h_\theta - y_\theta}{\kappa_\theta} \right) \tag{2.10} \]

The zero-profit condition determines the probability of a match because it pins down the market tightness. Therefore, the probability of type $\theta$ successfully being matched is

\[ P_\theta(y_\theta, h_\theta) = A^{1/\gamma} \left( \frac{\theta h_\theta - y_\theta}{\kappa_\theta} \right)^{(1-\gamma)/\gamma} \tag{2.11} \]

The probability of type $\theta$ being employed is increasing in hours worked, $h_\theta$, as long as $\theta - y_\theta / h_\theta > 0$. When $\theta - y_\theta / h_\theta > \kappa_\theta$, the firm generates positive profits from the type $\theta$ worker and thus the firm wants to hire more of that type of worker.

Static search implies that individuals who search and fail to be matched are forever out of a job. Clearly, we do not observe such phenomena frequently for individuals of sufficiently high productivity. A more realistic interpretation of $P_\theta$ is that it pins down the average probability of being employed over a long stretch of time.

The Nash bargaining problem is to maximize the Nash product by choosing pre-tax income and hours worked,

\[ \max_{y_\theta, h_\theta} \; (y_\theta - T(y_\theta) - v(h_\theta) - b)^{\gamma} (\theta h_\theta - y_\theta)^{1-\gamma} \tag{2.12} \]

where $\gamma$ is the bargaining power of the worker.
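The chain from the zero-profit condition (2.10) to the employment probability (2.11) can be checked numerically. The sketch below is only an illustration: the function names and the parameter values in the example are hypothetical, and nothing here caps the result at one, so the caller is assumed to pick parameters for which the probabilities are valid.

```python
def market_tightness(A: float, gamma: float, theta: float,
                     hours: float, income: float, kappa: float) -> float:
    """Tightness m implied by free entry (2.10): m**gamma = A * (theta*h - y) / kappa."""
    match_surplus = theta * hours - income
    if match_surplus <= 0:
        raise ValueError("the firm posts no vacancy when theta*h - y <= 0")
    return (A * match_surplus / kappa) ** (1.0 / gamma)


def employment_probability(A: float, gamma: float, tightness: float) -> float:
    """Searcher's matching probability (2.7): A * m**(1 - gamma)."""
    return A * tightness ** (1.0 - gamma)


def vacancy_fill_probability(A: float, gamma: float, tightness: float) -> float:
    """Firm's matching probability (2.8): A * m**(-gamma)."""
    return A * tightness ** (-gamma)


def employment_probability_closed_form(A: float, gamma: float, theta: float,
                                       hours: float, income: float, kappa: float) -> float:
    """Closed form (2.11): A**(1/gamma) * ((theta*h - y) / kappa)**((1 - gamma)/gamma)."""
    return A ** (1.0 / gamma) * ((theta * hours - income) / kappa) ** ((1.0 - gamma) / gamma)


if __name__ == "__main__":
    # Hypothetical parameter values.
    A, gamma, theta, hours, income, kappa = 0.5, 0.5, 2.0, 1.0, 1.5, 0.25
    m = market_tightness(A, gamma, theta, hours, income, kappa)
    print(employment_probability(A, gamma, m))
    print(employment_probability_closed_form(A, gamma, theta, hours, income, kappa))
    # Expected profit (2.9) should be zero at the free-entry tightness.
    print(vacancy_fill_probability(A, gamma, m) * (theta * hours - income) - kappa)
```

Holding income fixed and raising `hours` increases the match surplus and therefore the employment probability, which is the comparative static described after (2.11).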
Since the firm has already posted the vacancy at the time of matching, dissolution of the match results in the unemployed individual receiving $b$ and the firm receiving nothing. The participation constraints are internalized by the government (or social planner) because the expected surplus will be greater than zero.⁴

We can transform the objective function of the Nash product into the expected surplus of the worker. The transformation uses only increasing transformations, which means the pre-tax income and hours worked that maximize the Nash product also maximize the expected surplus,

\[ S_\theta = (y_\theta - T(y_\theta) - v(h_\theta) - b) \, A^{1/\gamma} \left( \frac{\theta h_\theta - y_\theta}{\kappa_\theta} \right)^{\frac{1-\gamma}{\gamma}} \tag{2.13} \]

The transformed equation is useful for understanding the government’s problem. Labor demand is downward sloping, reflected by the $P_\theta$ term on the right. Increasing the pre-tax income increases the utility of the successful searcher but also reduces the probability of being employed. Intensive margin labor supply is not perfectly inelastic, unlike in most search models. Increasing intensive labor supply unambiguously increases the expected income of the searcher by increasing the probability of a successful match. However, there is a convex disutility of hours worked that restricts the amount of hours worked by any searcher. So increasing hours worked must be compensated with either an increased probability of a match or higher post-tax income.

2.4 Elasticity Rule for the Top Marginal Rate

When the distribution of productivities is bounded from above, the marginal tax rate at the right end point of the distribution is zero. This is because increasing the marginal tax rate does not increase revenue but does reduce labor supply. If the productivity distribution is unbounded and has a thick tail, then the asymptotic top rate will be greater than zero.⁵ Incomes in the US and many other developed countries follow a Pareto distribution in the tail (Saez, 2001; Atkinson et al., 2011).
Such a distribution does have a thick right tail and so the optimal asymptotic marginal tax rate will be above zero. Also, as Saez (2001) notes, numerical simulations of the marginal tax function for a bounded distribution of types only come close to zero at the very end of the right tail. For most of the top 10% of types, the marginal tax rate is significantly above zero.

⁴ See section 5.

⁵ The log-normal distribution converges to zero too quickly as skill goes to infinity. This means that if skills are distributed log-normal then the top tax rate is zero. The distribution of top incomes follows a Pareto distribution, which has a sufficiently large tail to ensure that the asymptotic tax rate is not zero (Saez, 2001; Saez and Piketty, 2013).

Since the marginal tax rate is constant for high incomes (for the highest tax bracket), we can use a perturbation method to derive the optimal top tax rate.⁶ Saez (2001) and Saez and Piketty (2013) (among others) use this perturbation method to give a simple formula for the top marginal tax rate in terms of a single elasticity as well as a distribution parameter and the marginal social value of income.

The perturbation method in Saez (2001) maximizes the government’s welfare function subject to a revenue constraint by varying the top marginal tax rate. Two effects appear: a mechanical effect on revenue and a behavioral effect on labor supply. The mechanical effect captures the effect that an increase in revenue, due to an increase in the top marginal tax rate, has on government welfare. The behavioral effect captures the reduction in labor supply due to a higher top marginal rate. The optimal rate occurs when the behavioral effect equals the mechanical effect. That is, the marginal benefit of increased revenue equals the marginal cost of decreased labor supply. This section uses the perturbation method for the derivation of the top marginal rate as in Piketty et al. (2014) because there are multiple margins of worker choice.
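For reference, the benchmark that this perturbation argument delivers in the standard model without income effects can be written compactly. The block below is a sketch of that well-known rule, not a result of this chapter; the symbol $e$ for the elasticity of taxable income and $y^*$ for the bracket cutoff are labels chosen here.

```latex
% Mechanical and behavioral effects per taxpayer above the cutoff y^*,
% with y_m the average income above the cutoff:
\[
  M = (1-g)\,(y_m - y^*), \qquad
  B = \frac{\tau}{1-\tau}\, e\, y_m .
\]
% Setting M = B and defining a = y_m / (y_m - y^*) gives the top rate:
\[
  \tau^* = \frac{1-g}{1-g + a\,e}.
\]
```

With $a = 2$, $g = 0$, and $e = 0.2$, for example, this gives $\tau^* = 1/1.4 \approx 0.71$, which is the kind of value reported in the “Saez Rule” column of Table 2.1.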
In the more complicated search labor market, the behavioral effect has two parts. The first effect is the standard labor supply effect, similar to Saez (2001). The second effect is a general equilibrium effect: individual responses to the marginal tax rate change the probability of employment.

⁶ We can use the perturbation method because there is no higher bracket.

I define the top bracket using a cutoff productivity $\theta^*$; this value corresponds to the income cutoff for the top tax bracket.⁷ The government’s objective function combines the expected social welfare of those above the cutoff skill $\theta^*$ and the revenue generated from taxation. All revenue earned by the government from workers below the cutoff is represented by $T(y_{\theta^*})$, which is assumed to be optimally set. The government chooses the top marginal tax rate $\tau$, holding $T(y_{\theta^*})$ fixed. Let $\lambda$ be the multiplier on the government’s budget constraint. The government’s objective function is

\[ W(\tau) = \int_{\theta^*}^{\infty} \Big( \Phi(S_\theta + b) + \lambda \big( T(y_{\theta^*}) + \tau (y_\theta - y_{\theta^*}) \big) P_\theta(y_\theta, h_\theta) \Big) f(\theta) \, d\theta \tag{2.14} \]

⁷ Like Saez (2001) and Piketty et al. (2014), I assume that a change in the top marginal rate produces a negligible number of workers crossing the cutoff.

The first term in the integral measures the social welfare of the expected utility of individuals, not just workers, above the cutoff, because $S_\theta + b = (y_\theta - T(y_\theta) - v(h_\theta)) P_\theta + (1 - P_\theta) b$ is the expected utility for a type $\theta$ individual. The last two terms in the integral of (2.14) are the tax revenue generated over the entire range of workers. All revenue generated is weighted by $P_\theta$ because while there are $f(\theta)$ individuals of type $\theta$, there are only $P_\theta f(\theta)$ workers of type $\theta$. The following theorem is the main result of this section:

Theorem 2.4.1.
The optimal top marginal tax rate is

\[ \tau^* = \frac{1 - g}{1 - g + a \bar{\varepsilon}_y + \bar{\eta}_y + \bar{\eta}_h} \tag{2.15} \]

where $g$ is the marginal social value of utility relative to the cost of raising a dollar of revenue, $a$ is the Pareto distribution’s tail parameter, $\bar{\varepsilon}_y$ is the average elasticity of taxable income, $\bar{\eta}_y$ is the general equilibrium change in employment probability with respect to a change in income, and $\bar{\eta}_h$ is the general equilibrium change in employment probability with respect to a change in hours worked.

The rest of this section will walk through the proof of theorem 2.4.1 and then look at its implications. As stated above, the derivation of the optimal $\tau$ will follow Saez (2001) by isolating a mechanical effect and a behavioral effect. The mechanical effect captures the increase in social utility due to increased government revenue.

Lemma 2.4.1. The mechanical effect of a change in the marginal tax rate is

\[ M = (1 - g)(y_m - y_{\theta^*}) \tag{2.16} \]

with $y_m$ measuring the average income above the cutoff.

Proof of lemma 2.4.1. The method for determining the optimal $\tau^*$ is to differentiate equation (2.14) and simplify. Doing this gives both the mechanical and behavioral effects. I will isolate just the mechanical effect in this proof and then refer back to equation (2.18) in the proof of the behavioral effect. Differentiate equation (2.14) with respect to $\tau$ to get

\begin{align}
\frac{\partial W}{\partial \tau} = {} & \int_{\theta^*}^{\infty} \Phi'(S_\theta + b) \left( \underbrace{\frac{\partial S_\theta}{\partial y_\theta}}_{=0} \frac{-\partial y_\theta}{\partial (1-\tau)} + \underbrace{\frac{\partial S_\theta}{\partial h_\theta}}_{=0} \frac{-\partial h_\theta}{\partial (1-\tau)} - (y_\theta - y_{\theta^*}) P_\theta \right) f(\theta) \, d\theta \notag \\
& + \lambda \int_{\theta^*}^{\infty} \left( (y_\theta - y_{\theta^*}) P_\theta - \tau \frac{\partial y_\theta}{\partial (1-\tau)} P_\theta - \tau (y_\theta - y_{\theta^*}) \frac{\partial P_\theta}{\partial y_\theta} \frac{\partial y_\theta}{\partial (1-\tau)} - \tau (y_\theta - y_{\theta^*}) \frac{\partial P_\theta}{\partial h_\theta} \frac{\partial h_\theta}{\partial (1-\tau)} \right) f(\theta) \, d\theta \tag{2.17}
\end{align}

This equation can then be reduced by applying the envelope theorem (the change in expected surplus with respect to $y$ and $h$ is zero). We get the equation

\begin{align}
\frac{\partial W}{\partial \tau} = {} & \int_{\theta^*}^{\infty} \big( (\lambda - \Phi'(\theta)) ((y_\theta - y_{\theta^*}) P_\theta) \big) f(\theta) \, d\theta \notag \\
& - \lambda \int_{\theta^*}^{\infty} \left( \tau \frac{\partial y_\theta}{\partial (1-\tau)} P_\theta + \tau (y_\theta - y_{\theta^*}) \frac{\partial P_\theta}{\partial y_\theta} \frac{\partial y_\theta}{\partial (1-\tau)} + \tau (y_\theta - y_{\theta^*}) \frac{\partial P_\theta}{\partial h_\theta} \frac{\partial h_\theta}{\partial (1-\tau)} \right) f(\theta) \, d\theta \tag{2.18}
\end{align}

The mechanical effect comes from the top line of (2.18). If we divide through by $\lambda$ (for the entire equation) then the top line becomes

\[ \int_{\theta^*}^{\infty} \left( \left( 1 - \frac{\Phi'(\theta)}{\lambda} \right) ((y_\theta - y_{\theta^*}) P_\theta) \right) f(\theta) \, d\theta \tag{2.19} \]

Let $g_\theta = \Phi'(\theta)/\lambda$ and $c(\theta) = P_\theta f(\theta)$. Then we get

\[ \int_{\theta^*}^{\infty} (1 - g_\theta)(y_\theta - y_{\theta^*}) c(\theta) \, d\theta \tag{2.20} \]

Finally, assume, as in Saez (2001), that $g_\theta = g$ for all $\theta > \theta^*$. That is, the marginal social utility is constant above the cutoff. Define the average income above the cutoff as (with $C(\theta)$ being the quasi-CDF of $c(\theta)$)⁸

\[ y_m = \frac{1}{1 - C(\theta^*)} \int_{\theta^*}^{\infty} y_\theta c(\theta) \, d\theta \tag{2.21} \]

⁸ I use the term quasi-CDF to note that $C(\cdot)$ is functioning like a CDF but it fails to integrate to one over its domain.

Then we get the mechanical effect of

\[ (1 - g)(y_m - y_{\theta^*}) \tag{2.22} \]

This mechanical effect is identical to the Saez (2001) mechanical effect, with $y_m$ being the average income above the cutoff and $y_{\theta^*}$ being the cutoff income. An increase in $\tau$ raises additional revenue from all income above the cutoff, $y_m - y_{\theta^*}$. Every worker, not just those above $\theta^*$, gets some of the increase in tax revenue because the demogrant, $T(0)$, can be decreased.⁹ But increasing revenue also has a welfare effect on the workers paying the extra tax, which is measured by $-g$. When $g_\theta = 0$, society places no weight on the utility of workers of type $\theta$. In the case where $g = 0$, the optimal top marginal rate is equal to the revenue maximizing rate.

Lemma 2.4.2. The behavioral effect of a change in the marginal tax rate is

\[ B = \frac{\tau}{1 - \tau} \big( \bar{\varepsilon}_y y_m + (\bar{\eta}_y + \bar{\eta}_h)(y_m - y_{\theta^*}) \big) \tag{2.23} \]

with $y_m$ being the average income above the cutoff, as defined in the proof of Lemma 2.4.1. The $\bar{\varepsilon}_y$ term measures the average percentage change in taxable income in response to a change in $1 - \tau$.
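The terms $\bar{\eta}_y$ and $\bar{\eta}_h$ measure the average percentage change in employment probability in response to a change in $1 - \tau$.

Proof of lemma 2.4.2. Using the second line of equation (2.18) with $\lambda$ removed (from the division by $\lambda$ in lemma 2.4.1), we get

\[ \int_{\theta^*}^{\infty} \left( \tau \frac{\partial y_\theta}{\partial (1-\tau)} P_\theta + \tau (y_\theta - y_{\theta^*}) \frac{\partial P_\theta}{\partial y_\theta} \frac{\partial y_\theta}{\partial (1-\tau)} + \tau (y_\theta - y_{\theta^*}) \frac{\partial P_\theta}{\partial h_\theta} \frac{\partial h_\theta}{\partial (1-\tau)} \right) f(\theta) \, d\theta \tag{2.24} \]

⁹ The assumption is that any additional revenue from an increase in the top marginal rate goes into the demogrant, not the unemployment benefit. Such an assumption is moot in models where $y = 0$ is treated as unemployment and $T(0)$ is treated as the unemployment benefit, such as Saez (2001), Saez (2002), and Piketty et al. (2014).

Define the following elasticities (the sign of each elasticity is included as well)

\begin{align}
\varepsilon_y(\theta) &= \frac{1 - \tau}{y_\theta} \frac{\partial y_\theta}{\partial (1-\tau)} > 0 \tag{2.25} \\
\varepsilon_h(\theta) &= \frac{1 - \tau}{h_\theta} \frac{\partial h_\theta}{\partial (1-\tau)} > 0 \tag{2.26} \\
\eta_y(\theta) &= \frac{\partial P_\theta}{\partial y_\theta} \frac{y_\theta}{P_\theta} < 0 \tag{2.27} \\
\eta_h(\theta) &= \frac{\partial P_\theta}{\partial h_\theta} \frac{h_\theta}{P_\theta} > 0 \tag{2.28}
\end{align}

Using these elasticities, we can simplify (2.24) to get

\[ \frac{\tau}{1 - \tau} \int_{\theta^*}^{\infty} \big( \varepsilon_y(\theta) (y_\theta + (y_\theta - y_{\theta^*}) \eta_y(\theta)) + (y_\theta - y_{\theta^*}) \eta_h(\theta) \varepsilon_h(\theta) \big) P_\theta f(\theta) \, d\theta \tag{2.29} \]

Averaging over all workers above the threshold productivity creates three average elasticities, defined as follows. The income weighted average income elasticity is

\[ \bar{\varepsilon}_y = \frac{1}{y_m (1 - C(\theta^*))} \int_{\theta^*}^{\infty} \varepsilon_y(\theta) y_\theta c(\theta) \, d\theta \tag{2.30} \]

The income above threshold weighted average employment elasticity with respect to income is

\[ \bar{\eta}_y = \frac{1}{(y_m - y_{\theta^*})(1 - C(\theta^*))} \int_{\theta^*}^{\infty} \eta_y(\theta) \varepsilon_y(\theta) (y_\theta - y_{\theta^*}) c(\theta) \, d\theta \tag{2.31} \]

And the income above threshold weighted average employment elasticity with respect to hours worked is

\[ \bar{\eta}_h = \frac{1}{(y_m - y_{\theta^*})(1 - C(\theta^*))} \int_{\theta^*}^{\infty} \eta_h(\theta) \varepsilon_h(\theta) (y_\theta - y_{\theta^*}) c(\theta) \, d\theta \tag{2.32} \]

Using these elasticities, we get the behavioral effect,

\[ \frac{\tau}{1 - \tau} \big( \bar{\varepsilon}_y y_m + (\bar{\eta}_y + \bar{\eta}_h)(y_m - y_{\theta^*}) \big) \tag{2.33} \]

The elasticity $\bar{\varepsilon}_y$ is the standard elasticity of taxable income (ETI) from Saez (2001) (for a discussion of empirical estimates of the ETI, see Saez et al. (2012)).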
This elasticity measures the individual response of pre-tax income to a change in the net-of-tax rate. Interpreting the ETI in the context of this model is different than in Saez et al. (2012) because of bargaining between the worker and firm. Saez et al. (2012) assume that the only margin of choice for the worker is how much to work. The current model, however, includes both an hours worked response and a wage (or income) response. Increasing the marginal tax rate does two things in the current model. First, workers want to supply fewer hours of work because leisure is relatively more attractive. This effect is in the Mirrlees model and forms the theoretical basis for the ETI in Saez et al. (2012). Second, workers see a decrease in their bargaining power as the marginal tax rate increases. This reduces the worker’s surplus but increases the probability of employment. With bargaining power decreasing, the interpretation of the ETI changes from a labor supply response to a combination of labor supply and bargaining power responses.

With lemmas 2.4.1 and 2.4.2, we can prove that the optimal top marginal tax rate is where the mechanical effect equals the behavioral effect.

Proof of theorem 2.4.1. Setting $M = B$ gives the equation

\[ (1 - g)(y_m - y_{\theta^*}) = \frac{\tau}{1 - \tau} \big( \bar{\varepsilon}_y y_m + (\bar{\eta}_y + \bar{\eta}_h)(y_m - y_{\theta^*}) \big) \tag{2.34} \]

Define $a = y_m / (y_m - y_{\theta^*})$, which is the formula for the tail parameter of the Pareto distribution. Then solving for $\tau$ in (2.34) gives the optimal $\tau^*$ of

\[ \tau^* = \frac{1 - g}{1 - g + a \bar{\varepsilon}_y + \bar{\eta}_y + \bar{\eta}_h} \tag{2.35} \]

The new part in the elasticity rule comes from the behavioral effect. New in the behavioral effect are the elasticities $\bar{\eta}_y$ and $\bar{\eta}_h$. Both measure the response of labor demand to changes in the net-of-tax rate. The elasticity $\bar{\eta}_y$ measures the response of labor demand to changes in the net-of-tax rate through changes in pre-tax income. Similarly, $\bar{\eta}_h$ measures the response of labor demand to changes in the net-of-tax rate through changes in hours worked.
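The step from (2.34) to (2.35) is short, and the algebra can be sketched as follows. The shorthand $D$ for the sum of elasticity terms is introduced here only for this sketch. The last line is a standard property of an exact Pareto tail; applying it to $y_m$, which the text defines with the employment-weighted density $c(\theta)$, is an approximation assumed here.

```latex
% Divide both sides of (2.34) by (y_m - y_{\theta^*}) and use a = y_m / (y_m - y_{\theta^*}):
\[
  1 - g = \frac{\tau}{1-\tau}\, D,
  \qquad D \equiv a\,\bar{\varepsilon}_y + \bar{\eta}_y + \bar{\eta}_h .
\]
% Multiply through by (1 - \tau) and collect the terms in \tau:
\[
  (1-g)(1-\tau) = \tau D
  \;\Longrightarrow\;
  1 - g = \tau\,(1 - g + D)
  \;\Longrightarrow\;
  \tau^* = \frac{1-g}{1-g+D}.
\]
% Why a is called the Pareto parameter: if income above y^* has density
% proportional to y^{-(1+a)} with a > 1, then
\[
  y_m = \mathbb{E}[\,y \mid y \ge y^*\,] = \frac{a}{a-1}\, y^*
  \;\Longrightarrow\;
  \frac{y_m}{y_m - y^*} = a .
\]
```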
Combined, $\bar{\eta}_y$ and $\bar{\eta}_h$ represent the total response of the equilibrium probability of employment to changes in the top marginal tax rate. These responses are not the same as the direct behavioral elasticity, $\bar{\varepsilon}_y$, but rather are a general equilibrium response. A change in $\eta_y(\theta)$ is the same for all type $\theta$ workers (and similarly for $\eta_h(\theta)$). When $y_\theta$ decreases due to an increase in the marginal tax rate, firm profits increase, which causes the firm to expand the number of vacancies posted. With more vacancies being posted, the probability of employment increases for all type $\theta$ workers. The opposite effect occurs for $h_\theta$. Profits decrease when $h_\theta$ decreases because each match produces less output. Firms respond by posting fewer vacancies.

The Pareto parameter $a$ is $y_m / (y_m - y_{\theta^*})$ and is between 1.5 and 2 for the US (Atkinson et al., 2011; Saez, 2001). This measures the thickness of the right tail of the income distribution, with lower numbers corresponding to a thicker right tail (or more income in the tail). Neither labor demand elasticity, $\bar{\eta}_y$ or $\bar{\eta}_h$, is weighted by the Pareto parameter. The reason is that even though these two elasticities are general equilibrium responses, their effects are localized to workers above the cutoff productivity. Localization comes from the assumption of labor markets that are perfectly segmented by productivity.¹⁰ Because $a > 1$, the ETI has more weight than the employment elasticities in determining the optimal marginal rate.

The impact of the labor demand elasticities is somewhat ambiguous, but there are reasons to think that changes in pre-tax income change labor demand more than changes in hours worked. First, in the model labor income is hours worked multiplied by the wage. Any change in hours worked will also be reflected by a change in income. Therefore, any change in the net-of-tax rate will have an effect on income at least as large as its effect on hours worked.
Second, changes in hours worked will probably not change the employment probability because product is not easily observed by the firm. For many high income workers, their marginal product is most likely not easily observable by the firm, so small changes in hours worked will not affect how many vacancies are posted by the firm. Putting both reasons together means the optimal top marginal rate is higher when there are labor demand effects from search.

¹⁰ Contrast this to Ales et al. (2017), who have a general equilibrium span-of-control effect that is weighted by the Pareto parameter because the span-of-control effect extends to all workers.

If we make a few assumptions about how incomes relate to output, then we can pin down the value for $\bar{\eta}_y + \bar{\eta}_h$. To do this, note that the elasticities have a simple form,

\begin{align}
\eta_y(\theta) &= -\left( \frac{1 - \gamma}{\gamma} \right) \frac{y_\theta \varepsilon_y}{\theta h_\theta - y_\theta} \tag{2.36} \\
\eta_h(\theta) &= \left( \frac{1 - \gamma}{\gamma} \right) \frac{\theta h_\theta \varepsilon_h}{\theta h_\theta - y_\theta} \tag{2.37}
\end{align}

which combined give

\[ \bar{\eta}_y + \bar{\eta}_h = \left( \frac{1 - \gamma}{\gamma} \right) \frac{\theta h_\theta \varepsilon_h - y_\theta \varepsilon_y}{\theta h_\theta - y_\theta} \tag{2.38} \]

Assume, as I will do in the simulations of the three-type mechanism, that $\gamma = 0.5$. Then let $z = y_\theta / \theta h_\theta$ be the share of output that is paid to workers above the cutoff $\theta$, and assume that this ratio is constant across types above the cutoff. This allows for a simplification of the sum of the elasticities to

\[ \bar{\eta}_y + \bar{\eta}_h = \frac{\bar{\varepsilon}_h - z \bar{\varepsilon}_y}{1 - z} \tag{2.39} \]

One remaining issue is that $\bar{\varepsilon}_h$ is unknown. However, since $y = wh$, the response of hours worked cannot be larger than the response of pre-tax income.

Table 2.1 reports the marginal tax rate calculated as in theorem 2.4.1. The table includes different values of both $z$ and the relationship between $\bar{\varepsilon}_h$ and $\bar{\varepsilon}_y$. When $\bar{\varepsilon}_h$ is large relative to $\bar{\varepsilon}_y$, hours worked is more responsive than wages to a change in the tax rate. The opposite is true when $\bar{\varepsilon}_h$ is small relative to $\bar{\varepsilon}_y$. The two tax rates show the two ends of the interval of plausible optimal tax rates for a given ETI and income share of output. Table 2.1 shows a few patterns for the optimal tax rate.
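The entries of Table 2.1 follow from combining (2.35) with (2.39). The sketch below recomputes them under the table’s stated assumptions ($\gamma = 0.5$, $a = 2$, $g = 0$); the function names and the `hours_ratio` argument, which stands for the assumed ratio of the hours elasticity to the ETI, are hypothetical labels introduced here.

```python
def employment_elasticity_sum(eti: float, hours_ratio: float, z: float) -> float:
    """Sum of employment elasticities from (2.39), valid for gamma = 0.5.

    eti is the average elasticity of taxable income, hours_ratio * eti is the
    assumed hours elasticity, and z is the income share of output.
    """
    hours_elasticity = hours_ratio * eti
    return (hours_elasticity - z * eti) / (1.0 - z)


def top_marginal_rate(eti: float, eta_sum: float = 0.0,
                      a: float = 2.0, g: float = 0.0) -> float:
    """Top rate (2.35); with eta_sum = 0 it collapses to the rule without search."""
    return (1.0 - g) / (1.0 - g + a * eti + eta_sum)


def table_row(eti: float) -> list:
    """One row of the table: z in (0.5, 0.7) crossed with hours ratios (0.8, 0.2),
    followed by the benchmark rate without employment effects."""
    row = []
    for z in (0.5, 0.7):
        for hours_ratio in (0.8, 0.2):
            eta_sum = employment_elasticity_sum(eti, hours_ratio, z)
            row.append(top_marginal_rate(eti, eta_sum))
    row.append(top_marginal_rate(eti))
    return row


if __name__ == "__main__":
    for eti in (0.1, 0.2, 0.3, 0.4):
        print(eti, ["%.3f" % rate for rate in table_row(eti)])
```

Because the employment elasticities enter the denominator of (2.35) additively, a negative `eta_sum` (hours much less responsive than income) raises the rate above the benchmark, while a positive one lowers it.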
First, when there is a higher income share of output, the optimal tax rate is higher. This comes from the ETI having a greater influence over the general equilibrium response to higher taxes. When every worker is making less due to a higher marginal tax rate, the probability of employment will increase, not only raising revenue but also redistributing welfare to the unemployed. Second, the tax rate is higher when hours worked has a small response relative to the ETI. Again, when hours worked is highly responsive, increasing the marginal tax rate will decrease the probability of employment. In general, most of the optimal tax rates are higher than the implied rate of Saez (2001).

Table 2.1: Top marginal tax rates that vary with the elasticities. The income share of output is $z = y_\theta / \theta h_\theta$ and is assumed to be constant over $\theta$ above the threshold. The table assumes that $a = 2.0$ and $g = 0$. Both of these assumptions are well within the standard range of acceptable values for both parameters.

          z = 0.5          z = 0.5          z = 0.7          z = 0.7
ε̄_y      ε̄_h = 0.8 ε̄_y   ε̄_h = 0.2 ε̄_y   ε̄_h = 0.8 ε̄_y   ε̄_h = 0.2 ε̄_y   Saez Rule
0.1       0.79             0.88             0.81             0.97             0.83
0.2       0.66             0.78             0.68             0.94             0.71
0.3       0.56             0.70             0.59             0.91             0.63
0.4       0.49             0.64             0.52             0.88             0.56

2.5 Three Type Second Best Equilibrium

In the second best, the government is unable to distinguish between types. Hidden type imposes two constraints on the government. First, within types that search, the allocation designated for that type should maximize expected utility. This is the incentive compatibility constraint that restricts redistribution from high skilled to low skilled workers. Without incentive compatibility, the mechanism is not implementable in a decentralized economy. Second, the government cannot observe search behavior. Therefore, the unemployment benefit is the same for non-searchers and unsuccessful searchers, $b_S = b_N = b$.

The government wishes to redistribute in two ways. There is redistribution from high to low skilled workers and from searchers to non-searchers.
As we will see, these two ways to redistribute are connected. Increasing the marginal tax rate redistributes from searchers to non-searchers by increasing the probability of employment. But increasing the marginal tax rate also increases the average tax rate for higher types, which redistributes from high types to low types.

For the rest of the section, assume that two types are searching and one type is not. (When we get to section 2.7, the mechanism will determine the types that search.) The three types are denoted by $\theta_3, \theta_2, \theta_1$ with $\theta_3 > \theta_2 > \theta_1$. Subscripts on other variables will denote the type.

In order to characterize the incentive compatibility constraint, we need to determine the expected surplus from mimicking another type. Define the expected surplus of type $i$ from mimicking type $j$ by

\[ S_i^j = (y_j - T(y_j) - v(h_j) - b) P_i(y_j, h_j) \tag{2.40} \]

We can simplify the expected mimicking surplus further by noting that ex-post surplus is $(y_j - T(y_j) - v(h_j) - b) = S_j / P_j(y_j, h_j)$. Then

\[ S_i^j = S_j \frac{P_i(y_j, h_j)}{P_j(y_j, h_j)} \tag{2.41} \]

The incentive compatibility constraint has the form

\[ S_i \geq S_i^j \tag{2.42} \]

When the mimicked type $j$ is not searching, $S_i^j = d$. With the mimicking surplus of non-searchers pinned down, we know that if a lower type is searching then all higher types must also search. Otherwise, the incentive compatibility constraint for the higher type would not hold. Therefore, only four incentive compatibility constraints are active.

The benchmark model is the Mirrlees model, which is discussed in the next subsection. After showing the Mirrlees model, I will derive the main analytical and numerical results of my current model.

2.5.1 Discrete Mirrlees Model

One goal of this paper is to compare the optimal tax function without search to the optimal tax function that incorporates search. That is, how does the government set tax rates when there is voluntary and involuntary unemployment? An advantage that this paper’s model has compared to Hungerbuhler et al.
(2006) is that intensive margin labor supply is included. This addition allows for the distinction between the optimal tax function with search and the optimal tax function without intensive margin labor supply.

This subsection will cover the two-type Mirrlees model where there is only an intensive margin of labor supply. Then, any differences between the results in this subsection and the next subsection are from the addition of search into the model. Define the type $i$ worker’s utility as

\[ U_\theta(y_i, h_i) = y_i - T(y_i) - v(h_i) \tag{2.43} \]

Since there is no search, and thus no bargaining, every worker is paid their marginal product, $\theta_i$. Hence, $y_i = \theta_i h_i$. Then we can remove $h_i$ and write utility as

\[ U_\theta(y_i) = y_i - T(y_i) - v\left( \frac{y_i}{\theta_i} \right) \tag{2.44} \]

Notice that the optimal labor-leisure choice occurs when

\[ 1 - T'(y_i) = \frac{v'(h_i)}{\theta_i} \tag{2.45} \]

This equation will be useful for characterizing the marginal tax rate implied by the solution to the government’s problem. Since $T(\cdot)$ has infinite dimension, we want to remove it from the utility function using post-tax income. The utility function using pre- and post-tax income is

\[ U_\theta(y_i, c_i) = c_i - v\left( \frac{y_i}{\theta_i} \right) \tag{2.46} \]

where $c_i = y_i - T(y_i)$ is the post-tax income. Here, the government can observe pre-tax income and it sets the tax function, which makes post-tax income also observable.

The government uses a direct mechanism to determine the optimal tax function (with $\theta_2$ being the higher type)

\[ y : \{\theta_1, \theta_2\} \to \mathbb{R}_+ \]
\[ c : \{\theta_1, \theta_2\} \to \mathbb{R}_+ \]

The incentive compatibility constraint is

\[ c_2 - v\left( \frac{y_2}{\theta_2} \right) \geq c_1 - v\left( \frac{y_1}{\theta_2} \right) \tag{2.47} \]

which states that the high type does not wish to mimic the low type. The government’s budget constraint is

\[ (y_1 - c_1) f_1 + (y_2 - c_2) f_2 \geq R \tag{2.48} \]

where $R$ is the revenue requirement and $f_i$ is the density of type $i$.

The government’s problem is to maximize social welfare, with $\Phi(U_i)$ representing the increasing and concave social utility function.
The government’s decision is constrained by the incentive compatibility and budget constraints. The government’s problem is

\begin{align}
\max_{y_1, c_1, y_2, c_2} \quad & \Phi\left( c_1 - v\left( \frac{y_1}{\theta_1} \right) \right) + \Phi\left( c_2 - v\left( \frac{y_2}{\theta_2} \right) \right) \notag \\
\text{s.t.} \quad & (y_1 - c_1) f_1 + (y_2 - c_2) f_2 \geq R \tag{2.49} \\
& c_2 - v\left( \frac{y_2}{\theta_2} \right) \geq c_1 - v\left( \frac{y_1}{\theta_2} \right) \notag
\end{align}

The main analytical result of the model is the well known no-distortion-at-the-top result,

\[ \frac{v'(h_2)}{\theta_2} = 1 \implies T'(y_2) = 0 \text{ from (2.45)} \tag{2.50} \]

The intuition is that the government receives no additional revenue from increasing the marginal tax rate of the high type, but a positive marginal tax rate distorts labor supply. Even though the top marginal tax rate is zero, the average tax rate of the high type is not zero and will be larger than the low type’s average tax rate. As long as $\Phi(\cdot)$ is concave, the government will redistribute income from the high type to the low type.

2.5.2 Top Two Types Search

The government’s full problem, with all possible incentive compatibility constraints, is

\begin{align*}
W_{II} = \max_{y_2, y_3, h_2, h_3, S_2, S_3, b} \quad & \Phi(d + b) f(\theta_1) + \Phi(S_2 + b) f(\theta_2) + \Phi(S_3 + b) f(\theta_3) \\
\text{s.t.} \quad & (P_2 (y_2 - v(h_2)) - S_2) f(\theta_2) + (P_3 (y_3 - v(h_3)) - S_3) f(\theta_3) = R + b \\
& S_3 \geq S_3^2 \\
& S_3 \geq d \\
& S_2 \geq S_2^3 \\
& S_2 \geq d \\
& d \geq S_1^2 \\
& d \geq S_1^3
\end{align*}

In order to reduce the number of constraints, we must make an assumption about the relationship between type and the cost of opening a vacancy.

Assumption 2.5.1. The ratio of productivity to vacancy cost is increasing in type. That is, $\theta_i / \kappa_{\theta_i} > \theta_j / \kappa_{\theta_j}$ for $i > j$.

With productivity increasing faster than vacancy cost, the probability of employment is increasing with type, holding hours worked and pre-tax income constant. With this assumption, $S_3 > S_2^3 > S_1^3$. If $S_3 = d$, then the other two incentive compatibility constraints are automatically satisfied. If $S_2 = d$ then $d \geq S_1^2$ must be true when assumption 2.5.1 holds. Similarly, $d \geq S_1^3$ must also be true. Therefore, the last two constraints can be ignored. Furthermore, if $S_3 \geq S_3^2$ and $S_2 \geq d$ hold then $S_3 \geq d$ also holds,

\[ S_3 \geq S_3^2 \geq S_2 \geq d \tag{2.51} \]

Finally, note that $S_3 = S_3^2$ will hold because any revenue-neutral reduction in $S_3$ that increases $b$ will increase welfare.¹¹

¹¹ This result hinges on a concave social utility function, $\Phi''(\cdot) < 0$.

The upward IC constraint for type 2 is of interest because the condition for it to be redundant will appear in the continuous type case as well. Since $S_3 = S_3^2$, we get

\[ \frac{y_3 - T(y_3) - v(h_3) - b}{y_2 - T(y_2) - v(h_2) - b} = \left( \frac{\theta_3 h_2 - y_2}{\theta_3 h_3 - y_3} \right)^{(1-\gamma)/\gamma} \tag{2.52} \]

Then $S_2 \geq S_2^3$ along with the previous equation implies

\[ \frac{\theta_2 h_2 - y_2}{\theta_2 h_3 - y_3} \geq \frac{\theta_3 h_2 - y_2}{\theta_3 h_3 - y_3} \tag{2.53} \]

Rearranging gives the following condition,

\[ \frac{y_3}{h_3} \geq \frac{y_2}{h_2} \tag{2.54} \]

That is, the implied wage must be weakly increasing in type. For the rest of the analysis, I will assume that this holds but check after solving the government’s problem numerically.

I can use the simplification of the IC constraints so that the expected mimicking surplus is a function of the mimicked type’s surplus and a ratio of probabilities of employment. The middle type’s surplus is pinned down at $S_2 = d$; any relaxation of the high type’s incentive compatibility constraint comes from a change in employment probabilities. The government’s problem, after simplification, is

\begin{align*}
W_{II} = \max_{y_2, y_3, h_2, h_3, S_2, S_3, b} \quad & \Phi(d + b) f(\theta_1) + \Phi(S_2 + b) f(\theta_2) + \Phi(S_3 + b) f(\theta_3) \\
\text{s.t.} \quad & (P_2 (y_2 - v(h_2)) - S_2) f(\theta_2) + (P_3 (y_3 - v(h_3)) - S_3) f(\theta_3) = R + b \\
& S_3 = S_2 \frac{P_3^2}{P_2} \\
& S_2 \geq d
\end{align*}

where, following (2.41), $P_3^2 = P_3(y_2, h_2)$ is the employment probability of type 3 at type 2’s allocation. As in the previous subsection, $\lambda$ is the multiplier on the government budget constraint, $\mu_2$ is the multiplier on type 2’s remaining incentive compatibility constraint, and $\mu_3$ is the multiplier on the high type’s incentive compatibility constraint. The first-order conditions
The first-order conditions are

$$\frac{\partial L_{II}}{\partial y_3} = \lambda f(\theta_3)\left(\frac{\partial P_3}{\partial y_3}(y_3 - v(h_3)) + P_3\right) = 0 \tag{2.55}$$

$$\frac{\partial L_{II}}{\partial h_3} = \lambda f(\theta_3)\left(\frac{\partial P_3}{\partial h_3}(y_3 - v(h_3)) - P_3 v'(h_3)\right) = 0 \tag{2.56}$$

$$\frac{\partial L_{II}}{\partial S_3} = (\Phi'(S_3 + b) - \lambda)f(\theta_3) - \mu_3 = 0 \tag{2.57}$$

$$\frac{\partial L_{II}}{\partial y_2} = \lambda f(\theta_2)\left(\frac{\partial P_2}{\partial y_2}(y_2 - v(h_2)) + P_2\right) + \frac{\mu_3 S_2}{(P_2)^2}\left(\frac{\partial P_3^2}{\partial y_2}P_2 - \frac{\partial P_2}{\partial y_2}P_3^2\right) = 0 \tag{2.58}$$

$$\frac{\partial L_{II}}{\partial h_2} = \lambda f(\theta_2)\left(\frac{\partial P_2}{\partial h_2}(y_2 - v(h_2)) - P_2 v'(h_2)\right) - \frac{\mu_3 S_2}{(P_2)^2}\left(\frac{\partial P_2}{\partial h_2}P_3^2 - \frac{\partial P_3^2}{\partial h_2}P_2\right) = 0 \tag{2.59}$$

$$\frac{\partial L_{II}}{\partial S_2} = (\Phi'(S_2 + b) - \lambda)f(\theta_2) - \mu_2 = 0 \tag{2.60}$$

$$\frac{\partial L_{II}}{\partial b} = \Phi'(d + b)f(\theta_1) + \Phi'(S_2 + b)f(\theta_2) + \Phi'(S_3 + b)f(\theta_3) - \lambda - \mu_3 P_3^2 = 0 \tag{2.61}$$

Immediately from the first-order conditions above, we get the following result.

Theorem 2.5.1. Given assumption 2.5.1, the marginal tax rate for type 3 is zero.

Proof. From the first-order condition on $y_3$, we have

$$\frac{\partial P_3}{\partial y_3}\frac{y_3}{P_3} - \frac{\partial P_3}{\partial y_3}\frac{v(h_3)}{P_3} = -1 \tag{2.62}$$

which reduces to

$$y_3 = \gamma\theta_3 h_3 + (1 - \gamma)v(h_3) \tag{2.63}$$

Then, from the first-order condition on $h_3$, we get

$$\left(\frac{1-\gamma}{\gamma}\right)\frac{\theta_3 y_3}{\theta_3 h_3 - y_3} - \left(\frac{1-\gamma}{\gamma}\right)\frac{\theta_3 v(h_3)}{\theta_3 h_3 - y_3} - v'(h_3) = 0 \tag{2.64}$$

Plugging in for $y_3$, we get

$$\frac{v'(h_3)}{\theta_3} = 1 \implies T'(y_3) = 0 \tag{2.65}$$

Using the government's first-order condition on $y_2$, we can determine that the government is using the tax system to change the probability of employment.

Theorem 2.5.2. Given assumption 2.5.1, the probability that a type 2 individual will be employed is greater in the second-best.

Proof.
The first-order condition on $y_2$ is

$$\lambda f(\theta_2)P_2\left(1 - \left(\frac{1-\gamma}{\gamma}\right)\frac{y_2 - v(h_2)}{\theta_2 h_2 - y_2}\right) - \mu_3\frac{S_2}{P_2}\left(\frac{1-\gamma}{\gamma}\right)\left(\frac{P_3^2}{\theta_3 h_2 - y_2} - \frac{P_3^2}{\theta_2 h_2 - y_2}\right) = 0 \tag{2.66}$$

Substituting $\mu_3 = (\Phi'(S_3 + b) - \lambda)f(\theta_3)$ from (2.57), this equation can be reduced to

$$(1 - \gamma)v(h_2) + \gamma\theta_2 h_2 - y_2 = \left(1 - \frac{\Phi'(S_3 + b)}{\lambda}\right)\frac{f(\theta_3)}{f(\theta_2)}(1 - \gamma)\frac{S_2 P_3^2}{(P_2)^2}\frac{(\theta_3 - \theta_2)h_2}{\theta_3 h_2 - y_2} > 0 \tag{2.67}$$

Therefore, $y_2 < (1 - \gamma)v(h_2) + \gamma\theta_2 h_2$, whereas in the first-best this relation held with equality. Hours worked are higher for a given income. Since the probability of employment is determined by the difference $\theta_2 h_2 - y_2$, the fact that $h_2$ is larger relative to the corresponding $y_2$ means that the difference has increased and so the probability of employment has increased.

Maintaining type 3's incentive compatibility constraint means the government decreases the expected ex-post surplus of the type 2 worker. But by decreasing the ex-post surplus, the government must increase the probability of employment for the type 2 searcher in order to keep ex-ante surplus constant. When the government increases the probability that a type 2 worker becomes employed, the direct effect is that the incentive compatibility constraint is relaxed.

2.5.3 A Numerical Example

To illustrate the results above, consider an example with the following parameters.

Parameter                          Value
$d$                                0.2
$(\theta_1, \theta_2, \theta_3)$   (1, 2, 5)
$f(\theta)$                        (0.1, 0.8, 0.1)
$A$                                1
$\gamma$                           0.5
$(\kappa_1, \kappa_2, \kappa_3)$   (1, 2, 4)
$R$                                0
$v(h)$                             $h^2$

Table 2.2: Three type model parameter values.

Then the first-best and second-best allocations were calculated. The results are

               First-Best    Second-Best with Search
$T'(y_3)$      0             0
$T'(y_2)$      0             0.06
$P_3$          0.78          0.78
$P_2$          0.250         0.279
$b$            0.16          0.12
$P_3^2/P_2$    3.50          3.02

Table 2.3: Equilibrium tax and transfer system.

There are three important points to make with the results in table 2.3. First, even a very small marginal tax can increase employment probability. In the example, raising type 2's marginal tax rate from 0 to 0.06 raises its probability of employment from 0.250 to 0.279, roughly 3 percentage points. Second, when the government cannot observe type, the unemployment benefit must be reduced in order for the middle type to supply effort. There is, effectively, a second informational constraint that reduces redistribution from searchers to non-searchers. Third, the last row shows that the incentive compatibility constraint is being relaxed by the government. By decreasing pre-tax income and hours worked, the government is able to increase the probability of employment for type 2 workers and lower the rent paid to the type 3 workers.

2.6 Constrained First Best Equilibrium

While the first best equilibrium is not interesting from a taxation perspective, solving the first best equilibrium sheds light on how the optimal tax function operates on the economy. We can look at the first-order conditions of both the first best and second best equilibria to see how they differ. Any differences are then attributable to the presence of the government's tax system. Here in the first best, the government can observe an individual's type and search behavior. Therefore, taxes will not be distortionary. However, the equilibrium is constrained due to the presence of inefficiencies caused by individuals searching for labor. While the government can perfectly observe search behavior, the Hosios (1990) condition does not hold when the government redistributes income.

Assume that the government's social welfare function, denoted by $W$, contains the aversion to inequality function $\Phi$ that is strictly concave. The concavity assumes that the government wants to redistribute from high types to low types. The government wishes to redistribute more as $\Phi$ becomes increasingly concave. Concavity of the social welfare function also determines the direction of the binding incentive compatibility constraints in the second-best problem.
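The first-best column of Table 2.3 can be reproduced directly from the parameters in Table 2.2. The sketch below is an illustration rather than the code used for the dissertation: it assumes the matching probability $P = A^{1/\gamma}((\theta h - y)/\kappa)^{(1-\gamma)/\gamma}$, the first-best conditions $v'(h) = \theta$ and $y = \gamma\theta h + (1-\gamma)v(h)$, and that the first-best equalizes $S_2 = S_3 = d$. All function and variable names are hypothetical.

```python
# Sketch: first-best column of Table 2.3 from the Table 2.2 parameters.
# Assumptions: matching probability form, first-best conditions, S2 = S3 = d.

A, GAMMA, D, R = 1.0, 0.5, 0.2, 0.0
THETA = {1: 1.0, 2: 2.0, 3: 5.0}
KAPPA = {1: 1.0, 2: 2.0, 3: 4.0}
F = {1: 0.1, 2: 0.8, 3: 0.1}


def v(h):
    """Disutility of hours, v(h) = h^2 as in Table 2.2."""
    return h ** 2


def match_prob(theta, kappa, y, h):
    """Probability of employment for productivity theta under contract (y, h)."""
    return A ** (1 / GAMMA) * ((theta * h - y) / kappa) ** ((1 - GAMMA) / GAMMA)


def first_best(theta):
    """First-best contract: v'(h) = 2h = theta, y = gamma*theta*h + (1-gamma)*v(h)."""
    h = theta / 2.0
    y = GAMMA * theta * h + (1 - GAMMA) * v(h)
    return y, h


y2, h2 = first_best(THETA[2])
y3, h3 = first_best(THETA[3])
P2 = match_prob(THETA[2], KAPPA[2], y2, h2)
P3 = match_prob(THETA[3], KAPPA[3], y3, h3)
# Type 3 accepting type 2's contract: own productivity and vacancy cost.
P32 = match_prob(THETA[3], KAPPA[3], y2, h2)
# Unemployment benefit from the budget constraint with S2 = S3 = d.
b = (P2 * (y2 - v(h2)) - D) * F[2] + (P3 * (y3 - v(h3)) - D) * F[3] - R

print(round(P3, 2), round(P2, 3), round(b, 2), round(P32 / P2, 2))
```

Under these assumptions the sketch gives $P_3 \approx 0.78$, $P_2 = 0.25$, $b \approx 0.16$, and $P_3^2/P_2 = 3.5$, matching the first-best column. The second-best column requires solving the constrained problem and is not attempted here.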
With a concave social welfare function, the downward incentive compatibility constraints are binding.

Define $Y_\theta(y_\theta, h_\theta) = y_\theta P_\theta(y_\theta, h_\theta)$ as the expected pre-tax income of a searcher. This expected income formulation will be useful for deriving the government's budget constraint.

The government has perfect information in the first-best so search activity is perfectly observed. Therefore, unemployment benefits depend on whether an individual has searched or not. Let $b_S$ denote the unemployment benefit if the individual searched but was unsuccessful and let $b_N$ denote the unemployment benefit for a non-searcher. Expected utility from searching is

$$P_\theta(y_\theta - T(y_\theta) - v(h_\theta)) + (1 - P_\theta)b_S = S_\theta + b_S \tag{2.68}$$

The government's objective function sums up the utility of the individuals who do not search and the utility of those who do search. For each type, utility is weighted by the $\Phi$ function. The social welfare function is

$$W = \Phi(d + b_N)F(\theta_c) + \int_{\theta_c}^{\bar\theta} \Phi(S_\theta + b_S)f(\theta)\,d\theta \tag{2.69}$$

Only four quantities in $W$ are manipulable by the government: the surplus, the two unemployment benefits, and the search decision cutoff. The government is trying to increase the expected surplus of all searchers.

In order to simplify the government budget constraint, define the worker's expected gross income as

$$Y_\theta(y_\theta, h_\theta) = y_\theta A^{1/\gamma}\left(\frac{\theta h_\theta - y_\theta}{\kappa_\theta}\right)^{(1-\gamma)/\gamma} \tag{2.70}$$

The government budget constraint, with revenues on the left-hand side and expenditures on the right-hand side, is

$$\int_{\theta_c}^{\bar\theta} P_\theta(y_\theta, h_\theta)T(y_\theta)f(\theta)\,d\theta = R + b_N F(\theta_c) + \int_{\theta_c}^{\bar\theta} (1 - P_\theta(y_\theta, h_\theta))b_S f(\theta)\,d\theta \tag{2.71}$$

The integral on the right-hand side counts all of the people who searched but failed. The government also has an exogenous revenue requirement that is not used for funding the unemployment benefit, denoted by $R$.
We can simplify the government budget constraint by using the definition of a worker's expected surplus,

$$P_\theta(y_\theta, h_\theta)T(y_\theta) = Y_\theta(y_\theta, h_\theta) - P_\theta(y_\theta, h_\theta)v(h_\theta) - P_\theta(y_\theta, h_\theta)b_S - S_\theta \tag{2.72}$$

which yields the government budget constraint

$$\int_{\theta_c}^{\bar\theta} (Y_\theta(y_\theta, h_\theta) - P_\theta(y_\theta, h_\theta)v(h_\theta) - S_\theta - b_S)f(\theta)\,d\theta = R + b_N F(\theta_c) \tag{2.73}$$

The middle term in the integral, $P_\theta(y_\theta, h_\theta)v(h_\theta)$, is not in the model of Hungerbuhler et al. (2006). This term shows that successful searching entails a fixed utility cost to the successful searcher that changes with $\theta$. By decreasing the pre-tax income, the government has increased the probability that a searcher incurs this fixed cost.

In the first-best equilibrium, the government can perfectly observe the type of every individual as well as the search decision. No incentive compatibility constraints are needed. Instead of directly choosing the tax function, the government chooses expected surplus, pre-tax income, unemployment benefit, and hours worked, and these quantities implicitly define the tax function. The government's problem is

$$\max_{y_\theta, h_\theta, \theta_c, S_\theta, b_S, b_N} \quad \Phi(d + b_N)F(\theta_c) + \int_{\theta_c}^{\bar\theta} \Phi(S_\theta + b_S)f(\theta)\,d\theta \tag{2.74}$$

$$\text{s.t.} \quad \int_{\theta_c}^{\bar\theta} (P_\theta(y_\theta, h_\theta)(y_\theta - v(h_\theta)) - S_\theta - b_S)f(\theta)\,d\theta = R + b_N F(\theta_c) \tag{2.75}$$

Define $\lambda$ as the multiplier on the government budget constraint. The government can perfectly observe the type of each individual so the choice of wages, hours, surplus, and search decision is specific for each type.
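The step from (2.71) to (2.73) is a type-by-type identity: net revenue from a searcher, $P_\theta T - (1 - P_\theta)b_S$, equals the integrand $Y_\theta - P_\theta v(h_\theta) - S_\theta - b_S$. A small numerical check, offered as a sketch with hypothetical function names and illustrative inputs, is:

```python
# Sketch: pointwise check that (2.71) and (2.73) agree for one searcher type.
# Names and the default A = 1, gamma = 0.5 are assumptions for illustration.

def match_prob(theta, kappa, y, h, A=1.0, gamma=0.5):
    """Matching probability A^(1/gamma) * ((theta*h - y)/kappa)^((1-gamma)/gamma)."""
    return A ** (1 / gamma) * ((theta * h - y) / kappa) ** ((1 - gamma) / gamma)


def per_type_budget_terms(theta, kappa, y, h, tax, b_s, v):
    """Return the per-type net revenue in (2.71) and the integrand of (2.73)."""
    P = match_prob(theta, kappa, y, h)
    S = P * (y - tax - v(h) - b_s)   # expected surplus, from (2.68)
    Y = y * P                        # expected gross income, (2.70)
    lhs = P * tax - (1 - P) * b_s    # taxes collected minus benefits to failed searchers
    rhs = Y - P * v(h) - S - b_s     # integrand of (2.73)
    return lhs, rhs


lhs, rhs = per_type_budget_terms(2.0, 2.0, 1.5, 1.0, 0.3, 0.1, lambda h: h ** 2)
print(abs(lhs - rhs) < 1e-12)
```

The two terms coincide for any admissible inputs because $S_\theta = P_\theta(y_\theta - T(y_\theta) - v(h_\theta) - b_S)$ by definition.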
The first-order conditions are (where $L$ is the Lagrangian)

$$\frac{\partial L}{\partial y_\theta} = \lambda x_\theta\left(\frac{\partial Y_\theta}{\partial y_\theta} - \frac{\partial P_\theta}{\partial y_\theta}v(h_\theta)\right)f(\theta) = 0 \quad \forall\theta \tag{2.76}$$

$$\frac{\partial L}{\partial h_\theta} = \lambda x_\theta\left(\frac{\partial Y_\theta}{\partial h_\theta} - \left(\frac{\partial P_\theta}{\partial h_\theta}v(h_\theta) + P_\theta v'(h_\theta)\right)\right)f(\theta) = 0 \quad \forall\theta \tag{2.77}$$

$$\frac{\partial L}{\partial S_\theta} = (x_\theta\Phi'(S_\theta + b_S) - \lambda x_\theta)f(\theta) = 0 \quad \forall\theta \tag{2.78}$$

$$\frac{\partial L}{\partial b_S} = \int_{\underline\theta}^{\bar\theta} x_\theta(\Phi'(S_\theta + b_S) - \lambda)f(\theta)\,d\theta = 0 \tag{2.79}$$

$$\frac{\partial L}{\partial b_N} = \int_{\underline\theta}^{\bar\theta} (1 - x_\theta)(\Phi'(d + b_N) - \lambda)f(\theta)\,d\theta = 0 \tag{2.80}$$

$$\frac{\partial L}{\partial \theta_c} = \left(\Phi(d + b_N) + \lambda b_N - \Phi(S_{\theta_c} + b_S) + \lambda\left(P_{\theta_c}(y_{\theta_c}, h_{\theta_c})(y_{\theta_c} - v(h_{\theta_c})) - S_{\theta_c} - b_S\right)\right)f(\theta_c) = 0 \tag{2.81}$$

where $x_\theta$ is an indicator function equaling one when $\theta \ge \theta_c$ (when the individual is searching). From (2.78), (2.79), and (2.80) we get

$$\lambda = \Phi'(S_\theta + b_S) = \Phi'(d + b_N) \tag{2.82}$$

which implies

$$S_\theta + b_S = d + b_N \quad \text{for all searchers} \tag{2.83}$$

The government is able to equalize expected utilities across all types and between searchers and non-searchers. This follows because the expected surplus plus the unemployment benefit for searchers equals the expected utility (not just the surplus) of the searcher. Using (2.83), which holds at the first-best optimum, we can simplify the search decision in (2.81) to

$$P_{\theta_c}(y_{\theta_c}, h_{\theta_c})(y_{\theta_c} - v(h_{\theta_c})) = d \tag{2.84}$$

This equation defines the constrained first-best cutoff type, denoted by $\theta_c^{FB}$. The cutoff type is indifferent between searching and not searching. In equilibrium, the difference between $b_N$ and $b_S$ is the expected surplus minus the utility gained from not searching. If the expected surplus from searching is greater than the utility gain from not searching then non-searchers are rewarded by having a larger unemployment benefit, $b_N > b_S$.

By adding an intensive margin, the government must take into account the utility loss from a successful search. Increasing the income to hours worked ratio (that is, the wage) has three effects on the searcher. First, the post-tax income increases if the search is successful.
Second, the probability of successful search decreases because labor demand has decreased. Third, the probability of incurring the utility cost of exerting effort on the job has decreased. We can simplify the first-order condition for $h$ because

$$\frac{\partial P_\theta}{\partial h_\theta} = A^{1/\gamma}\left(\frac{1-\gamma}{\gamma}\right)\frac{\theta}{\kappa_\theta}\left(\frac{\theta h_\theta - y_\theta}{\kappa_\theta}\right)^{-2+1/\gamma} \tag{2.85}$$

and

$$\frac{\partial Y_\theta}{\partial h_\theta} = Y(y_\theta, h_\theta)\left(\frac{1-\gamma}{\gamma}\right)\frac{\theta}{\theta h_\theta - y_\theta} \tag{2.86}$$

Thus we get the implicit hours worked equation

$$y_\theta = \gamma v'(h_\theta)h_\theta + (1 - \gamma)v(h_\theta) \tag{2.87}$$

Hours are set so that income per hour equals a weighted average of the marginal disutility of labor, $v'(h_\theta)$, and the average disutility of labor, $v(h_\theta)/h_\theta$. The average disutility comes from the fact that higher labor supply translates into higher probability of successful search (if $\theta > w_\theta$). With higher probability of successful search, there is a higher probability of incurring utility cost $v(h_\theta)$.

The government's objective is to equalize expected utilities across all individuals. But directly increasing post-tax income for a given type requires reducing pre-tax income through increasing the marginal tax rate. Since expected surplus is constant across types, there is a trade-off between increasing the level of taxation and increasing the marginal tax rate. This trade-off can be seen from solving the bargaining problem of a generic type-$\theta$ worker with first-order condition

$$\frac{\partial S_\theta}{\partial y_\theta} = \gamma(1 - T'(y_\theta)) - (y_\theta - T(y_\theta) - v(h_\theta) - b_S)(1 - \gamma)\frac{1}{\theta h_\theta - y_\theta} = 0 \tag{2.88}$$

And therefore, we get the following expression for post-tax income

$$y_\theta - T(y_\theta) = \frac{\gamma}{1 - \gamma}(\theta h_\theta - y_\theta)(1 - T'(y_\theta)) + v(h_\theta) + b_S \tag{2.89}$$

The government can change post-tax income through a levels change. But any change in post-tax income must be accompanied by a decrease in marginal taxes as well.

For a fixed level of pre- and post-tax income and hours worked, the government is trading off between increasing the level of taxes and unemployment benefit and increasing the marginal tax rate.
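The bargaining condition (2.88) can be illustrated with a linear tax schedule. The sketch below is an illustration only: it assumes $T(y) = t_0 + t_1 y$, holds hours fixed, and maximizes the expected surplus over $y$ by golden-section search. Under this assumed tax, (2.89) solves in closed form to $y = \gamma\theta h + (1-\gamma)(v(h) + b + t_0)/(1 - t_1)$, and the two answers can be compared. All names and parameter values are hypothetical.

```python
# Sketch: bargained income under an assumed linear tax T(y) = t0 + t1*y,
# compared with the closed form implied by (2.89). Hours are held fixed.
import math

A, GAMMA = 1.0, 0.5
THETA, KAPPA, H = 2.0, 2.0, 1.0
T0, T1, B = 0.1, 0.2, 0.1


def v(h):
    """Disutility of hours (quadratic form assumed)."""
    return h ** 2


def surplus(y):
    """Expected surplus: (post-tax income - v(h) - b) times the matching probability."""
    net = y - (T0 + T1 * y) - v(H) - B
    prob = A ** (1 / GAMMA) * ((THETA * H - y) / KAPPA) ** ((1 - GAMMA) / GAMMA)
    return net * prob


def golden_section_max(fn, lo, hi, tol=1e-9):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if fn(c) > fn(d):
            b = d
        else:
            a = c
    return (a + b) / 2


# Search between the income at which net surplus is zero and total output.
y_low = (v(H) + B + T0) / (1 - T1)
y_num = golden_section_max(surplus, y_low, THETA * H)
y_closed = GAMMA * THETA * H + (1 - GAMMA) * (v(H) + B + T0) / (1 - T1)
print(round(y_num, 4), round(y_closed, 4))
```

With these illustrative numbers both approaches give $y = 1.75$. Raising either the level $t_0$ or the benefit $b$ raises bargained pre-tax income in the closed form, so holding income fixed requires an offsetting change in $t_1$.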
If the unemployment benefit or the level of taxes must be increased then marginal tax rates must be decreased in order to keep incomes and hours constant. The marginal tax rate is determined by first solving for the expected surplus first-order condition with respect to hours worked,

$$\frac{\partial S_\theta}{\partial h_\theta} = (y_\theta - T(y_\theta) - v(h_\theta) - b_S)\left(\frac{1-\gamma}{\gamma}\right)\left(\frac{\theta P_\theta}{\theta h_\theta - y_\theta}\right) - v'(h_\theta)P_\theta = 0 \tag{2.90}$$

Combining the two first-order conditions of the bargaining problem, we get

$$T'(y_\theta) = 1 - \frac{v'(h_\theta)}{\theta} \tag{2.91}$$

which is the same as in other intensive-margin-only models. Taking the first-order condition for pre-tax income in the government's problem, (2.76), we get

$$1 - \left(\frac{1-\gamma}{\gamma}\right)\frac{y_\theta}{\theta h_\theta - y_\theta} + \left(\frac{1-\gamma}{\gamma}\right)\frac{v(h_\theta)}{\theta h_\theta - y_\theta} = 0 \tag{2.92}$$

which reduces to

$$y_\theta = \gamma\theta h_\theta + (1 - \gamma)v(h_\theta) \tag{2.93}$$

Similarly, for the government's first-order condition on hours worked, (2.77), we get

$$\left(\frac{1-\gamma}{\gamma}\right)\frac{\theta y_\theta}{\theta h_\theta - y_\theta} - \left(\frac{1-\gamma}{\gamma}\right)\frac{\theta v(h_\theta)}{\theta h_\theta - y_\theta} - v'(h_\theta) = 0 \tag{2.94}$$

Using the equilibrium equation of $y_\theta$ from the first-order condition on $y_\theta$, we get

$$\frac{v'(h_\theta)}{\theta} = 1 \tag{2.95}$$

That is, the marginal tax rate for all types is zero in the first-best. This is not surprising since the government perfectly observes type and can thus use lump sum taxes on all workers. However, the fact that marginal tax rates are zero does not imply that there is no redistribution. In fact, expected utility is perfectly redistributed across all types.

2.7 Decentralizing the Continuous Direct Mechanism

In the second best equilibrium, there is a continuum of types. The government cannot observe search behavior or the type of an individual. Not only is there the incentive problem in the Mirrlees model but the government cannot differentiate between failed searchers and non-searchers.
Unlike the three-type model, with a continuum, the government must choose the cutoff type that searches, as a part of the direct mechanism.12

12 The tax and transfer system that corresponds to the direct mechanism will create the proper incentives to make the cutoff type indifferent between searching and not searching.

This section will also address the problem of implementability of the mechanism in a decentralized economy with taxes. The mechanism outlined below is in a sense too direct. The government can choose the hours worked, which are not observable in a decentralized economy. I will show that the government does have sufficient ability to implement the optimal tax mechanism with just an income tax and unemployment benefit.

Definition 2.7.1. A direct mechanism is a set of functions

$$y : [\underline\theta, \bar\theta] \to \mathbb{R}_+, \qquad h : [\underline\theta, \bar\theta] \to \mathbb{R}_+, \qquad S : [\underline\theta, \bar\theta] \to \mathbb{R}_+$$

such that, along with the unemployment benefit $b$ and the probability of a successful match $P(\theta)$, the tax function is implicitly defined by $T(\theta) = y(\theta) - v(h_\theta) - b - S(\theta)P(\theta)^{-1}$.

The next subsection outlines the Mirrlees (1971) continuous type model. This model, with only an intensive margin of labor supply, presents a useful comparison to the model used in this paper. The zero top marginal tax rate result still holds in the continuous type case, under some added assumptions about the distribution of types.

2.7.1 Continuous Type Mirrlees Model

Just like in section 2.5, the Mirrlees (1971) model is used as a comparison. Wages are equal to marginal products and these marginal products are exogenously given. Therefore, utility is

$$U_\theta(h_\theta) = \theta h_\theta - T(\theta h_\theta) - v(h_\theta) \tag{2.96}$$

with $\theta$ being the marginal product of a type $\theta$ worker. Let $z_\theta = \theta h_\theta$ be the pre-tax income and $c_\theta = \theta h_\theta - T(\theta h_\theta)$ be the post-tax income. Then the indirect utility function is

$$\bar U_\theta(z_\theta, c_\theta) = c_\theta - v\left(\frac{z_\theta}{\theta}\right) \equiv \bar U_\theta \tag{2.97}$$

The government can observe pre- and post-tax income. Now, the government's objective function is

$$\int_{\underline\theta}^{\bar\theta} \Phi(\bar U_\theta)f(\theta)\,d\theta \tag{2.98}$$

Using the pre-tax and post-tax income simplifies the government's budget constraint to

$$\int_{\underline\theta}^{\bar\theta} (z_\theta - c_\theta)f(\theta)\,d\theta = R \tag{2.99}$$

where $R$ is the revenue requirement. Finally, the incentive compatibility constraint of the government is

$$\frac{d\bar U_\theta}{d\theta} = v'\left(\frac{z_\theta}{\theta}\right)\frac{z_\theta}{\theta^2} \tag{2.100}$$

Now that the problem has been defined, standard optimal control methods can be used to solve for $T(\cdot)$. Mirrlees (1971) used $\bar U_\theta$ as the state variable. Hence, (2.100) represents the law of motion of the state. The main result is no distortion at the top, that is, a zero top marginal tax rate, if either $\bar\theta < \infty$ or $f(\theta)$ converges to zero fast enough when $\theta \to \infty$. Also, with quasi-linear utility, there is no unemployment in the Mirrlees (1971) model.13

2.7.2 Incentive Compatibility

The direct mechanism alone does not imply individuals will truthfully report their type. Therefore, an incentive compatibility constraint is required. Define the following function

$$V(\theta, \tau) = (y_\tau - T(y_\tau) - v(h_\tau) - b)A^{1/\gamma}\left(\frac{\theta h_\tau - y_\tau}{\kappa_\theta}\right)^{\frac{1-\gamma}{\gamma}} \tag{2.101}$$

$V$ is the expected surplus a type $\theta$ individual receives for reporting type $\tau$. For a contract to be incentive compatible, a necessary condition is that $V(\theta, \theta)$ is a maximum over the reported type $\tau$. Therefore,

$$\frac{\partial V(\theta, \theta)}{\partial \tau} = 0 \tag{2.102}$$

Let $\dot y_\tau$ denote the derivative of $y$ with respect to $\tau$. Then $\partial V/\partial\tau$ is

$$\frac{\partial V(\theta, \tau)}{\partial \tau} = (\dot y_\tau - T'(y_\tau)\dot y_\tau - v'(h_\tau)\dot h_\tau)P_\theta + (y_\tau - T(y_\tau) - v(h_\tau) - b)P_\theta\left(\frac{1-\gamma}{\gamma}\right)\left(\frac{\theta\dot h_\tau - \dot y_\tau}{\theta h_\tau - y_\tau}\right) \tag{2.103}$$

When $\tau = \theta$, then the FOCs for $S_\theta$ with respect to $h_\theta$ and $y_\theta$ set (2.103) to zero.
$$\text{From } \frac{\partial S_\theta}{\partial y_\theta}: \quad \left((1 - T'(y_\theta)) - (y_\theta - T(y_\theta) - v(h_\theta) - b)\left(\frac{1-\gamma}{\gamma}\right)\frac{1}{\theta h_\theta - y_\theta}\right)\dot y_\tau = 0$$

$$\text{From } \frac{\partial S_\theta}{\partial h_\theta}: \quad \left(-v'(h_\theta) + (y_\theta - T(y_\theta) - v(h_\theta) - b)\left(\frac{1-\gamma}{\gamma}\right)\frac{\theta}{\theta h_\theta - y_\theta}\right)\dot h_\tau = 0$$

This confirms that expected surplus is maximized when the worker reports their true type.

A more relevant condition is how expected surplus increases with $\theta$. Given the expression for the expected surplus, the law of motion of the surplus (how $S_\theta$ changes with $\theta$) should correspond to the same function that maximizes $V(\theta, \tau)$. The first-order condition is

$$\frac{\partial V}{\partial \theta} = V(\theta, \tau)\left(\frac{1-\gamma}{\gamma}\right)\left(\frac{h_\tau}{\kappa_\theta} - \frac{(\theta h_\tau - y_\tau)\dot\kappa_\theta}{\kappa_\theta^2}\right)\frac{\kappa_\theta}{\theta h_\tau - y_\tau} = S_\theta\left(\frac{1-\gamma}{\gamma}\right)\left(\frac{h_\theta}{\theta h_\theta - y_\theta} - \frac{\dot\kappa_\theta}{\kappa_\theta}\right) \ \text{if } \tau = \theta \tag{2.104}$$

Using the envelope theorem, we get the following law of motion of $S_\theta$,

$$\dot S_\theta = \frac{\partial S_\theta}{\partial \theta} = S_\theta\left(\frac{1-\gamma}{\gamma}\right)\left(\frac{h_\theta}{\theta h_\theta - y_\theta} - \frac{\dot\kappa_\theta}{\kappa_\theta}\right) \tag{2.105}$$

13 There can be voluntary unemployment in the standard Mirrlees model as long as there is utility gained from not working or the utility function has a nonzero second cross-partial derivative.

Along the path of this law of motion, no worker has an incentive to deviate from their true type. This first-order condition does not encompass the full necessary conditions. An additional second-order condition is also necessary (Salanie, 2005). That condition is

$$\frac{\partial^2 V}{\partial\theta\,\partial\tau} = \left(\frac{1-\gamma}{\gamma}\right)\frac{\dot w_\tau}{(\theta - w_\tau)^2} > 0 \tag{2.106}$$

where $w_\tau = y_\tau/h_\tau$. This condition reduces to $y_\theta/h_\theta$ increasing in $\theta$ since all other terms are positive.

The necessary conditions (2.104) and (2.106) have not yet been shown to be sufficient. The following will show that these two conditions are indeed sufficient for incentive compatibility. For sufficiency, we need to show the following equation holds for all $\tau$,

$$V(\theta, \theta) - V(\theta, \tau) \ge 0 \tag{2.107}$$

That is, $\tau = \theta$ is the unique maximizing value for $V$ as a function of $\tau$ if (2.104) and (2.106) are satisfied.
By the mean value theorem, there exists a $\bar\tau$ such that

$$V(\theta, \theta) - V(\theta, \tau) = \frac{\partial V}{\partial \tau}(\theta, \bar\tau)(\theta - \tau) \tag{2.108}$$

Because $\dot w_\tau > 0$, $\frac{\partial^2 V}{\partial\tau\,\partial\theta}$ is positive and $\frac{\partial V}{\partial\tau}$ is increasing in $\theta$. Since

$$\frac{\partial V}{\partial \tau}(\bar\tau, \bar\tau) = 0 \tag{2.109}$$

is true by assumption, then

$$\frac{\partial V}{\partial \tau}(\theta, \bar\tau) > 0 \tag{2.110}$$

if $\theta > \bar\tau$, which implies $\theta > \tau$. And so, $V(\theta, \theta) - V(\theta, \tau) > 0$ when $\theta > \tau$. Likewise, $\frac{\partial V}{\partial\tau} < 0$ if $\theta < \bar\tau < \tau$. So $V(\theta, \theta) - V(\theta, \tau) > 0$ as well. Therefore, sufficiency is proven.

Now that the incentive compatibility conditions have been established, we can move on to the government's problem.

2.7.3 Solving the Mechanism

Solving the continuous type problem is similar to solving the three type problem in section 2.5. The structure of both problems is identical; the government maximizes social welfare subject to a budget constraint and an incentive constraint. A single incentive compatibility constraint has been replaced with the local IC constraint from the previous subsection. Now, however, the government can choose which type is indifferent between searching and not. This decision is represented by the choice of $\theta_c$, the cutoff type. Since expected surplus is increasing in $\theta$, all individuals with productivity above $\theta_c$ will search. The government's problem is similar to the first-best problem except that a local incentive compatibility constraint (2.105) is required.

$$\begin{aligned} \max_{\theta_c, b, y_\theta, h_\theta, S_\theta} \quad & \Phi(d + b)F(\theta_c) + \int_{\theta_c}^{\bar\theta} \Phi(S_\theta + b)f(\theta)\,d\theta \\ \text{s.t.} \quad & \int_{\theta_c}^{\bar\theta} (Y_\theta(y_\theta, h_\theta) - P_\theta(y_\theta, h_\theta)v(h_\theta) - S_\theta)f(\theta)\,d\theta - R - b = 0 \\ & \dot S_\theta = \frac{1-\gamma}{\gamma}\left(\frac{h_\theta}{\theta h_\theta - y_\theta} - \frac{\dot\kappa_\theta}{\kappa_\theta}\right)S_\theta \end{aligned} \tag{2.111}$$

There is a suppressed third constraint that $\dot w_\theta > 0$, which implies that $y_\theta$ is also increasing (that monotonicity holds).14 The integral constraint is similar to the first best government budget constraint but now $b_S = b_N = b$.

Let $\lambda$ be the multiplier on the government budget constraint and $q(\theta)$ be the costate variable. The state variables in this problem are $S_\theta$ and $q(\theta)$.
Knowing both of these variables for all $\theta$ allows us to calculate every other variable of interest.

14 This condition is traditionally left out of the problem.

The Hamiltonian is

$$H(\lambda, b, \theta_c) = \left(\Phi(S_\theta + b) + \lambda(Y_\theta(y_\theta, h_\theta) - P_\theta(y_\theta, h_\theta)v(h_\theta) - S_\theta)\right)f(\theta) + \frac{1-\gamma}{\gamma}q(\theta)S_\theta\left(\frac{h_\theta}{\theta h_\theta - y_\theta} - \frac{\dot\kappa_\theta}{\kappa_\theta}\right) \tag{2.112}$$

The first-order conditions are:

$$\lambda\left(\frac{\partial Y_\theta}{\partial y_\theta} - \frac{\partial P_\theta}{\partial y_\theta}v(h_\theta)\right)f(\theta) + q(\theta)S_\theta\frac{(1-\gamma)h_\theta}{\gamma(\theta h_\theta - y_\theta)^2} = 0 \tag{2.113}$$

$$\lambda\left(\frac{\partial Y_\theta}{\partial h_\theta} - \left(\frac{\partial P_\theta}{\partial h_\theta}v(h_\theta) + P_\theta v'(h_\theta)\right)\right)f(\theta) - q(\theta)S_\theta\frac{(1-\gamma)y_\theta}{\gamma(\theta h_\theta - y_\theta)^2} = 0 \tag{2.114}$$

When the probability of being employed decreases then searchers are less likely to incur the disutility from intensive labor supply. The first-order condition on the state variable $S_\theta$ is

$$-\dot q(\theta) = (\Phi'(S_\theta + b) - \lambda)f(\theta) + q(\theta)\frac{\dot S_\theta}{S_\theta} \tag{2.115}$$

There are two transversality conditions

$$q(\bar\theta) = 0 \tag{2.116}$$

$$q(\theta_c)(S_{\theta_c} - d) = 0 \tag{2.117}$$

The first-order condition (2.115) is difficult to use without simplification. This equation is a first-order linear differential equation. To simplify the system, define $Q_\theta = q(\theta)S_\theta$. Notice that in (2.113) and (2.114), $q(\theta)S_\theta$ is in only one term. Therefore, solving for $Q_\theta$ will help solve both first-order conditions.
Multiplying (2.115) by $S_\theta$ and rearranging gives

$$-(\dot q(\theta)S_\theta + q(\theta)\dot S_\theta) = (\Phi'(S_\theta + b) - \lambda)S_\theta f(\theta) \tag{2.118}$$

The left-hand side is equal to $-\dot Q_\theta$ so we get

$$-\dot Q_\theta = (\Phi'(S_\theta + b) - \lambda)S_\theta f(\theta) \tag{2.119}$$

From (2.119), we can integrate to get

$$-Q_\theta = \int_{\underline\theta}^{\theta} (\Phi'(S_t + b) - \lambda)S_t f(t)\,dt \tag{2.120}$$

Apply the transversality condition ($Q_{\bar\theta} = 0$) to get

$$\int_{\underline\theta}^{\theta} (\Phi'(S_t + b) - \lambda)S_t f(t)\,dt = -\int_{\theta}^{\bar\theta} (\Phi'(S_t + b) - \lambda)S_t f(t)\,dt \tag{2.121}$$

which implies that

$$Q_\theta = \int_{\theta}^{\bar\theta} (\Phi'(S_t + b) - \lambda)S_t f(t)\,dt \tag{2.122}$$

Then we can apply the form of $Q_\theta$ to the first-order condition (2.113) to get

$$\left(\frac{\partial Y_\theta}{\partial y_\theta} - \frac{\partial P_\theta}{\partial y_\theta}v(h_\theta)\right)f(\theta) = \frac{(1-\gamma)h_\theta}{\gamma(\theta h_\theta - y_\theta)^2}\int_{\theta}^{\bar\theta}\left(1 - \frac{\Phi'(S_t + b)}{\lambda}\right)S_t f(t)\,dt \tag{2.123}$$

The final two necessary conditions that define the equilibrium are for $b$ and $\theta_c$. Unlike the previous first-order conditions, these two variables are not functions of $\theta$. For the unemployment benefit,

$$\frac{\partial H}{\partial b} = \Phi'(d + b)F(\theta_c) + \int_{\theta_c}^{\bar\theta} \Phi'(S_\theta + b)f(\theta)\,d\theta = \lambda \tag{2.124}$$

And the cutoff on searching

$$H(\theta_c) \ge \Phi(d + b)f(\theta_c) \tag{2.125}$$

This first-order condition is not standard but it is associated with the well-known optimal starting time problems. The solution method for the cutoff is described in Leonard and Long (1992).

2.7.4 Implementing the Mechanism

One problem the government can encounter is that there are too few tax instruments to implement the direct mechanism in (2.111). At issue is whether the government can sufficiently change $h$ using only $\{T(\cdot), b\}$ in order to have hours worked in the decentralized equilibrium equal the hours worked in the mechanism. The answer is yes, the government can implement the direct mechanism using $\{T(\cdot), b\}$.

Lemma 2.7.1. The incentive compatible direct mechanism in (2.111) can be implemented in a decentralized economy without a tax (or subsidy) on $h$.

Proof.
Define the deviation expected surplus with a tax on $h$ as

$$\tilde V(\theta, \tau) = (y_\tau - T(y_\tau) - v(h_\tau) - \tilde T(h_\tau) - b)A^{1/\gamma}\left(\frac{\theta h_\tau - y_\tau}{\kappa_\theta}\right)^{\frac{1-\gamma}{\gamma}} \tag{2.126}$$

with $\tilde T(\cdot)$ being the new tax on $h$. The expected surplus is

$$\tilde S_\theta = (y_\theta - T(y_\theta) - v(h_\theta) - \tilde T(h_\theta) - b)A^{1/\gamma}\left(\frac{\theta h_\theta - y_\theta}{\kappa_\theta}\right)^{\frac{1-\gamma}{\gamma}} \tag{2.127}$$

In the decentralized equilibrium, a worker's deviation surplus is now $\tilde V(\theta, \tau)$ instead of $V(\theta, \tau)$. The change in deviation surplus must be equal to the mechanism's law of motion for the state variable $S_\theta$. Hence,

$$\frac{\partial \tilde V(\theta, \theta)}{\partial \theta} = \tilde S_\theta\left(\frac{1-\gamma}{\gamma}\right)\left(\frac{h_\theta}{\theta h_\theta - y_\theta} - \frac{\dot\kappa_\theta}{\kappa_\theta}\right) = \dot S_\theta \tag{2.128}$$

But the second equality only holds when $\tilde S_\theta = S_\theta$, which corresponds to $\tilde T(h) = 0$ for all $h$.

The intuition is that with a secret tax on hours worked, the mechanism's incentive compatible allocation deviates from the expected surplus maximizing decision that the worker would make. Therefore, in the presence of a tax (subsidy) on hours worked, the worker wants to work fewer (more) hours than the mechanism allocates.

Another reason to believe that the government can implement the mechanism in a decentralized economy is that any necessary tax on hours worked would have to be budget neutral. To see why, note that the government budget constraint in the mechanism is identical to the government budget constraint in the decentralized economy. If the tax on hours worked is revenue generating then the mechanism revenue is less than government expenses. As a result, the mechanism problem is not solved by the solution to the mechanism, but this violates the sufficiency conditions. That is, the problem is set up to satisfy the sufficiency conditions of a Hamiltonian problem but the solution found would not be a solution to the Hamiltonian problem, a contradiction. If, on the other hand, the tax on hours worked was revenue losing then the government budget constraint would be slack and $\lambda = 0$. In the next subsection, I show the first-order conditions.
Using these conditions shows that the optimal $y_\theta$ and $h_\theta$ are zero for all types. Every worker is unemployed because the probability of employment is zero. The Hamiltonian has no unique solution, which also violates the sufficiency conditions.

2.7.5 Analytical Properties of the Tax Function

When the government attempts to redistribute income, it has a few levers to choose from. First, there is the simple change in average taxes as in the Mirrlees (1971) model. Second, the unemployment benefit can be increased to redistribute from successful searchers to unsuccessful searchers and non-searchers. Third, the government can use marginal taxes to directly change the probability of employment for each type that searches. Finally, the government can adjust the cutoff productivity. The major change between the perfect information and imperfect information cases is that increasing non-searcher utility, by increasing $b$, comes at the cost of decreasing searchers' expected surplus $S_\theta$.

The cutoff productivity has potential to redistribute by increasing the number of individuals searching. If the government decreases $\theta_c$ to $\theta_c'$ then every individual with productivity in $[\theta_c', \theta_c]$ will increase their utility ($S_\theta > d$). Total surplus increases for searchers but there is some conflict with other levers. The government budget constraint has become greater than zero because the government is generating more revenue. However, surplus and the unemployment benefit must also increase to maintain budget balance. Therefore, unlike models with only an intensive margin, this model allows for more instruments for the government to use, but there can be conflict between the different instruments even though they are attempting to complete the same objective.

Even though we have added search to the Mirrlees (1971) framework, there still is no distortion at the top.

Theorem 2.7.1.
For individuals with $\theta = \bar\theta$, pre-tax income, intensive labor supply, and search success probability are equal to the first best.

Proof. From (2.116),

$$Q_{\bar\theta} = 0 \tag{2.129}$$

Thus, (2.113) is reduced to

$$\lambda\left(\frac{\partial Y_\theta}{\partial y_\theta} - \frac{\partial P_\theta}{\partial y_\theta}v(h_\theta)\right)f(\theta) = 0 \tag{2.130}$$

and (2.114) is

$$\lambda\left(\frac{\partial Y_\theta}{\partial h_\theta} - \left(\frac{\partial P_\theta}{\partial h_\theta}v(h_\theta) + P_\theta v'(h_\theta)\right)\right)f(\theta) = 0 \tag{2.131}$$

which are the same as the first-best first-order conditions on pre-tax income and hours worked for type $\bar\theta$.

The standard intuition holds for the current model. Having a non-zero marginal tax rate decreases utility for the top type with no revenue generated. Therefore, the marginal tax rate is zero, causing wages and hours to be undistorted in equilibrium.

A more important result is whether individuals with $\theta < \bar\theta$ have a different probability of employment. The government can change pre-tax income and hours in order to increase the number of individuals who are successful at searching (increasing extensive margin employment).

Theorem 2.7.2. For individuals with $\theta \in [\theta_c, \bar\theta)$, the probability of employment $P_\theta(y_\theta, h_\theta)$ is larger than in the first best.

Proof. We want to show that $Q_\theta < 0$ for $\theta \in [\theta_c, \bar\theta)$. For simplicity of notation, define

$$\Phi'(\theta) = \Phi'(\max[d, S_\theta] + b) \tag{2.132}$$

Then the first-order condition on $b$ is equal to

$$\int_{\underline\theta}^{\bar\theta} (\Phi'(\theta) - \lambda)f(\theta)\,d\theta = 0 \tag{2.133}$$

There must be a $\hat\theta$ such that

$$\Phi'(\hat\theta) = \lambda \tag{2.134}$$

since $\Phi'(\theta)$ is decreasing in $\theta$. And $\hat\theta > \theta_c$ must be true. Now,

$$Q_\theta = \int_{\theta}^{\bar\theta} (\Phi'(t) - \lambda)S_t f(t)\,dt \tag{2.135}$$

For $t > \hat\theta$, $\Phi'(t) < \lambda$ from the concavity of $\Phi(\cdot)$ and $S_t > S_{\hat\theta}$, so $Q_t < 0$ for $t > \hat\theta$. For $t < \hat\theta$, $\Phi'(t) > \lambda$ and $S_t < S_{\hat\theta}$.
Put together, this means that
\[ Q_\theta = \int_\theta^{\bar\theta} (\Phi'(t) - \lambda) S_t f(t)\, dt \tag{2.136} \]
And $Q_\theta$ is bounded by
\[ Q_\theta < S_{\hat\theta} \int_\theta^{\bar\theta} (\Phi'(t) - \lambda) f(t)\, dt \tag{2.137} \]
Combining this with the first-order condition for $b$ gives
\[ \lambda (1 - F(\theta)) = (1 - F(\theta)) \int_{\underline\theta}^{\bar\theta} \Phi'(t) f(t)\, dt \tag{2.138} \]
So the upper bound on $Q_\theta$ is
\[ Q_\theta < S_{\hat\theta} \left( \int_\theta^{\bar\theta} \Phi'(t) f(t)\, dt - (1 - F(\theta)) \int_{\underline\theta}^{\bar\theta} \Phi'(t) f(t)\, dt \right) < 0 \tag{2.139} \]
The bound is less than zero because $\Phi'(\cdot)$ is positive and decreasing.

The first-order conditions on $y_\theta$ and $h_\theta$ imply a similar relation. Using (2.113) and that $Q_\theta < 0$, we get that
\[ \frac{\partial Y_\theta}{\partial y_\theta} - \frac{\partial P_\theta}{\partial y_\theta} v(h_\theta) > 0 \tag{2.140} \]
Solving for these derivatives gives
\[ \gamma \theta h_\theta + (1 - \gamma) v(h_\theta) > y_\theta \tag{2.141} \]
And for the first-best,
\[ \gamma \theta h^*_\theta + (1 - \gamma) v(h^*_\theta) = y^*_\theta \tag{2.142} \]
Therefore, the difference between output per type $\theta$ worker and pre-tax income has increased.
\[ \theta h_\theta - y_\theta > (1 - \gamma)(\theta h_\theta - v(h_\theta)) \tag{2.143} \]
The match probability is determined by $\theta h_\theta - y_\theta$. When this difference expands, the match probability increases.

This is the main result of the paper and is consistent with previous optimal tax with search models (Hungerbuhler et al., 2006). Although the government has many tools to use in order to redistribute income, the best tool for the government is the probability of employment. The reason why increasing the probability of employment is the best redistributive tool is that increasing the employment probability entails increasing hours worked relative to pre-tax income. Increasing hours worked decreases the utility of the worker, making higher types less inclined to mimic, and increases the value of that type of worker to the firm. The exact same results happen when pre-tax income is decreased. However, by increasing the probability of employment, the government has slackened the incentive compatibility constraint.
By decreasing pre-tax income relative to hours worked, a mimicking higher type receives too low a probability of employment given the new levels of pre-tax income and hours worked. Increasing employment does not seem to be an artifact of shutting down intensive margin labor supply. With involuntary unemployment, the government has an incentive to increase the probability of employment through the tax code.

Theorem 2.7.3. The type that is indifferent between searching and not searching has weakly higher productivity in the second-best equilibrium compared to the first-best equilibrium. That is, $\theta_c \geq \theta^*$ with strict inequality if $\theta_c > \underline\theta$.

Proof. From the first-order condition on $\theta_c$, we have
\[ H(\theta_c) \geq \Phi(d + b) f(\theta_c) \tag{2.144} \]
\[ \implies Y(y_{\theta_c}, h_{\theta_c}) - P_{\theta_c} v(h_{\theta_c}) - S_{\theta_c} \geq -\frac{Q_{\theta_c}}{\lambda f(\theta_c)} \left( \frac{1 - \gamma}{\gamma} \right) \left( \frac{1}{\theta_c h_{\theta_c} - y_{\theta_c}} - \frac{\dot\kappa_{\theta_c}}{\kappa_{\theta_c}} \right) \tag{2.145} \]
Because $Q_\theta < 0$, the right-hand side is always positive. Thus
\[ Y(y_{\theta_c}, h_{\theta_c}) - P_{\theta_c} v(h_{\theta_c}) > S_{\theta_c} \tag{2.146} \]
\[ \implies Y(y^*_{\theta_c}, h^*_{\theta_c}) - P^*_{\theta_c} v(h^*_{\theta_c}) > Y(y_{\theta_c}, h_{\theta_c}) - P_{\theta_c} v(h_{\theta_c}) > S_{\theta_c} = d \tag{2.147} \]
where $y^*_{\theta_c}$ and $h^*_{\theta_c}$ are the first-best values of $y$ and $h$ at the cutoff type $\theta_c$. But at the first-best cutoff, $\theta^*$, we have $Y(y_{\theta^*}, h_{\theta^*}) - P_{\theta^*} v(h_{\theta^*}) = d$, so
\[ Y(y^*_{\theta_c}, h^*_{\theta_c}) - P^*_{\theta_c} v(h^*_{\theta_c}) > Y(y_{\theta^*}, h_{\theta^*}) - P_{\theta^*} v(h_{\theta^*}) \tag{2.148} \]
which implies that
\[ \theta_c > \theta^* \text{ if } \theta_c > \underline\theta \tag{2.149} \]

The intuition for why $\theta_c$ is higher than $\theta^*$ comes from the fact that $S_\theta = d$ at both cutoffs. If pre-tax income and hours worked are lower in the second-best equilibrium then the cutoff type must rise in order to maintain indifference at the cutoff.

Theorem 2.7.4. The unemployment benefit is larger than the in-work benefit. That is, $b > -T(y_{\theta_c})$.

Proof. From the proof of Theorem 2.7.3, we have
\[ Y_{\theta_c} > S_{\theta_c} + P_{\theta_c} v(h_{\theta_c}) \tag{2.150} \]
which implies that
\[ y_{\theta_c} P_{\theta_c} > (y_{\theta_c} - T(y_{\theta_c}) - b - v(h_{\theta_c})) P_{\theta_c} + P_{\theta_c} v(h_{\theta_c}) \tag{2.151} \]
which reduces to
\[ b > -T(y_{\theta_c}) \tag{2.152} \]

2.8 Conclusion

When the government acknowledges its ability to adjust the employment rate through taxation, marginal tax rates tend to rise.
Increasing the marginal tax rate decreases the bargaining power of workers, which decreases their income relative to their hours worked. While this makes workers worse off, it also increases the firm's value of a successful match. As matches increase in value, firms post more vacancies and therefore the employment rate increases. The government can generate positive social welfare not just from decreasing average tax rates but also from increasing employment. With only intensive labor supply, the government is just balancing the negative effect of decreasing hours worked and the positive effect of decreasing average tax rates. Now, with extensive labor supply, the government can also add the positive effect of increasing employment.

A major limitation of the model is the restriction of the search labor markets. Firms can only create vacancies that serve a single skill and each worker can only search in the labor market of their skill. Obviously, this perfect segmentation by skill is not observed in real labor markets. While this is a well known problem in search and tax models, there does not appear to be a solution. Even with this limitation, the model still provides insights into how the government can tax workers and redistribute to those with zero labor supply.

A second limitation is that the model does not generalize to a dynamic setting in a clear way. The main issue lies in how unemployment benefits work. Modern unemployment benefit systems work on a replacement rate of previous income. Clearly, the model presented in this chapter does not follow such a system. However, it is not clear that such a change to the model would change the marginal tax schedule. The most likely change is that it would increase marginal tax rates for high income earners. Since a fixed replacement rate for all workers would entail a large amount of expenditure on unemployed high skill workers, the government would want to increase the probability of employment for those workers.
This would further reinforce the main results of the chapter, that marginal tax rates can increase employment if bargained wages are responsive to changes in the marginal tax rate.

CHAPTER 3

BRAIN DRAIN IN THE US: THE EFFECT OF UNEVEN MIGRATION ON INCOME INEQUALITY

3.1 Introduction

In 2010, 2.9% of Miami's high skill population were new migrants and 2.1% of the previous year's low skill population were new migrants, while about an even percentage of each skill group left the city. Contrary to what we might think about local labor demand, the skill premium increased by 0.17 along with both the 90/10 ratio and the 75/25 ratio.1 Across the US, the average migrant is 6% more likely to be a college educated worker than a worker staying in their local labor market. However, changes in income inequality and gross migration into local labor markets are positively correlated. There are obvious potential confounding factors driving the positive relationship between gross migration and inequality, such as differential demand shocks by skill.

My paper looks at how changing the composition of internal migrants moving into or out of a local labor market affects local income inequality. I empirically test whether relatively more high skill in-migration or out-migration affects the skill premium. Using data from the 2005-2017 American Community Survey (ACS), I measure the changes to high skill and low skill local labor supply due to migration. Because changes in local income inequality and migration flows might be correlated with changes in labor demand, I use a shift-share instrument to isolate shifts in labor supply. The shift-share instrument creates a predicted amount of migration based on both current and historical migration patterns. Historical flows of migration are unlikely to be correlated with local labor demand shocks in the sample period that would potentially drive observed migration patterns.
With the instrument, I find that increasing the number of high skill workers relative to low skill workers through migration increases local income inequality.

1 The skill premium is defined here as the average high skill income divided by the average low skill income.

While estimating the effect of migration on income inequality is an empirical exercise, I present a theoretical model in section 3.2 in order to guide the empirical specification and to isolate the fundamental forces connecting income inequality and migration. I model local production using a constant elasticity of substitution (CES) production function where the only inputs are high and low skill workers. High skill workers are intrinsically more productive than low skill workers. The economy is made up of many locations and each location has a representative local firm that uses the CES production function as well as technology that is heterogeneous across locations. Solving for the wages of both high and low skill workers gives an expression for the skill premium. Local relative labor supply and the intrinsic skill of high skill workers determine the skill premium in each location. Using the expression for the skill premium, I derive an equation that relates the change in the skill premium to the amount of migration into and out of a location. The effect of migration into a location depends on the skill distribution of the migrants relative to the skill distribution of the incumbent workers.

The empirical specification, in section 3.3, describes a quantity I call the migration premium, which captures the effect of migration on labor supply. A measure of the skill distribution of migrants must relate that distribution to the skill distribution of incumbent workers. This is not a matter of controlling for the skill distribution of incumbents but of building that skill distribution directly into the migration premium.
With the incumbents' skill distribution embedded into the migration premium, the in-migration premium measures the shift in the relative labor supply due to migration into a location. When the in-migration premium is equal to one, the relative labor supply curve remains the same as in the previous period. When the in-migration premium is above one, the relative labor supply curve is shifted out. I also construct a similar measure for out-migration. The empirical results focus on the in-migration premium because this measure suffers less from attenuation bias than the out-migration premium.

Changes in the skill premium are caused by shifts in both demand and supply. I use a modified version of the Card (2001) immigrant enclave instrument to isolate a shift in labor supply due to migration. The instrument contains two parts. The first part is the past number of migrants moving between an origin and a destination location divided by the origin population. This generates a historical share of migration between origin and destination. The second part is the current amount of out-migration from the origin location to all destination locations. When I add up this predicted migration across all possible origin locations, I get a predicted amount of migration for the destination location. By using past shares rather than current shares, any serial correlation in labor demand shocks should dissipate, leaving the instrument uncorrelated with any current local labor demand shock (Goldsmith-Pinkham et al., 2018). The exogeneity of this instrument comes from using data on migration flows from the distant past, 1990 in my paper, to break the serial correlation.

I use two datasets (discussed in detail in sections 3.4 and 3.5): a primary dataset for estimation and a secondary dataset that constructs part of the instrument. The primary dataset is the 2005-2017 yearly ACS (Center for Economic and Policy Research, 2019), which is a 1% sample of the US population.
The ACS provides detailed, nationally representative information on individual income, education, and migration. I use commuting zones (CZs) as the level of geography since CZs correspond to local labor markets and cover the entire US. The ACS does not report CZ so each individual is placed in both their current CZ and the CZ they lived in the past year. This allows me to calculate both in-migration and out-migration by education (my proxy for skill). The secondary dataset is the 1990 US Census (Ruggles et al., 2019). The Census data is used to construct the instrument's historical migration shares. All of my estimates are at the CZ level.

In section 3.5, I estimate that a one percentage point increase in local relative labor supply due to in-migration causes a 0.42 point increase in the local skill premium.2 The increase in the local skill premium suggests that high skill migrants are complements to high skill incumbent workers. However, the effect of an in-migration premium increase on average high skill and low skill incomes reveals that high skill migrants are substitutes to both types of workers. Low skill workers see a larger decrease in average income than high skill workers.

2 The estimates for out-migration are statistically insignificant.

I have identified two potential reasons for low skill workers experiencing a decrease in average income. First, I find that relatively less mobile occupations tend to have lower pay. This is potentially due to occupational licensing requirements or the lack of other viable locations for a particular occupation. The correlation between occupational income and relative mobility is stronger for college educated workers. Second, skill requirements for job postings have increased since the beginning of the Great Recession (Hershbein and Kahn, 2018). The upskilling of jobs shifted the frontier between high skill and low skill occupations to lower levels of income.
Combine this with my finding that occupations that are relatively immobile are higher paying than relatively mobile jobs. Upskilling, together with people moving more when they are in low paying occupations, means that high skill migrants are more likely to be in competition with low skill stayers.

My paper contributes to two main literatures.3 First, there is the spatial inequality literature (Baum-Snow and Pavan, 2012; Combes et al., 2012; Moretti, 2013; Diamond, 2015; Farrokhi, 2018; Baum-Snow et al., 2018; Farrokhi and Jinkins, 2019). A common theme in this literature is the relationship between city size and wage inequality within cities (Baum-Snow and Pavan, 2012; Combes et al., 2012). Larger cities tend to have higher income inequality. My contribution to this literature is to explicitly estimate the effect of more high skill migration relative to the underlying relative labor supply.4

3 My focus here is on internal migration. Blau and Kahn (2015) survey the foreign migration and income inequality literature. Most, if not all, of the foreign migration literature does not focus on spatial differences in local income inequality. Foreign migration flows are considerably smaller than gross internal migration flows (although net internal flows are an order of magnitude smaller than the gross flows). See Lewis and Peri (2014) for a review of the literature on foreign immigration and its consequences on local economies.

4 Diamond (2015) and Baum-Snow et al. (2018) look at the effect of changes in the overall relative labor supply on income inequality. These changes include the effect of migration as well as the effect of labor force participation.

The second literature looks at the effect of migration on local areas using labor demand shocks (Notowidigdo, 2011; Yagan, 2014; Monras, 2015). None of these papers estimates effects on local inequality, although Notowidigdo (2011) does separate out high and low skill workers.
Notowidigdo (2011) looks at why low skill workers are less likely to move out of local labor markets that experience negative labor demand shocks. My contribution to this literature is to look at the effect of migration on local inequality using labor supply shocks. Using Chinese data, Xing (2014) shows some evidence that rural to urban migrants change the income distribution. The effect is driven by the fact that rural to urban migrants in China tend to be low skill compared to urban residents.

3.2 Theoretical Model

In this section, I develop a regression equation using a constant elasticity of substitution production function with only high and low skill workers as inputs. The two skill types are denoted by $i \in \{l, h\}$ with $i = h$ denoting the high skill type. The country contains $C$ locations with an individual location denoted by $c$. Each individual supplies one unit of labor inelastically and there is full employment in all locations. Individuals decide how much of the consumption good, $x_{ict}$, to consume and where to work.

Firms produce a single output good, $y_{ct}$, that is the same across all locations. The representative firm in each location aggregates both types of labor using a constant elasticity of substitution function. There are no frictions on goods trade within the country, so the price of output is set nationally and normalized to one. Local technology, measured by $A_c$, is heterogeneous across locations, reflecting that some locations are geographically more conducive to production. One unit of high skill labor generates $\theta > 1$ efficiency units of labor. The number of employed individuals of skill type $i$ in period $t$ is $L_{ict}$. The production function is
\[ F(L_{lct}, L_{hct}) = A_c \left( L_{lct}^{\rho} + \theta^{\rho} L_{hct}^{\rho} \right)^{\alpha/\rho} = y_{ct} \tag{3.1} \]
where $\alpha < 1$ means that labor has decreasing returns to scale. Labor is not perfectly substitutable across types.
The substitutability parameter, $\rho \in [0, 1)$, is restricted so that intrinsic ability has a logically consistent effect on the skill premium. Higher $\theta$ does not decrease the skill premium.

Local labor markets are perfectly competitive, so wages equal marginal products. Equilibrium labor supplies determine the marginal product in each location.
\[ w_{lct} = \alpha A_c L_{lct}^{\rho - 1} \left( L_{lct}^{\rho} + \theta^{\rho} L_{hct}^{\rho} \right)^{\frac{\alpha - \rho}{\rho}} \tag{3.2} \]
\[ w_{hct} = \alpha A_c \theta^{\rho} L_{hct}^{\rho - 1} \left( L_{lct}^{\rho} + \theta^{\rho} L_{hct}^{\rho} \right)^{\frac{\alpha - \rho}{\rho}} \tag{3.3} \]
Combining the wage equations for both skills gives the within-city skill premium,
\[ \frac{w_{hct}}{w_{lct}} = \theta^{\rho} \left( \frac{L_{lct}}{L_{hct}} \right)^{1 - \rho} \tag{3.4} \]
The skill premium is dependent on the intrinsic differences in skill and on relative labor supplies, regardless of whether the economy is in equilibrium or not. The resulting changes to the skill premium depend on how the equilibrium labor supply ratios change.

Migration across locations will determine any changes in the skill premium. The skill premium equation shows how the skill premium responds to differing amounts of migration by skill. If the initial labor supply of type $i$ in $t$ is $L_{ict}$, then the change in the skill premium when migration has occurred is
\[
\begin{aligned}
\Delta \left( \frac{w_{hc,t+1}}{w_{lc,t+1}} \right) &= \frac{w_{hc,t+1}}{w_{lc,t+1}} - \frac{w_{hct}}{w_{lct}} \\
&= \theta^{\rho} \left( \frac{L_{lc,t+1}}{L_{hc,t+1}} \right)^{1 - \rho} - \theta^{\rho} \left( \frac{L_{lct}}{L_{hct}} \right)^{1 - \rho} \\
&= \theta^{\rho} \left( \frac{L_{lct} - M^{out}_{lc,t+1} + M^{in}_{lc,t+1}}{L_{hct} - M^{out}_{hc,t+1} + M^{in}_{hc,t+1}} \right)^{1 - \rho} - \theta^{\rho} \left( \frac{L_{lct}}{L_{hct}} \right)^{1 - \rho}
\end{aligned}
\tag{3.5}
\]
where $M^{out}_{ic,t+1}$ is the total out-migration from $c$ by skill type $i$ between periods $t$ and $t + 1$. Similarly, $M^{in}_{ic,t+1}$ is the total in-migration. Let $S_{ic,t+1} = L_{ict} - M^{out}_{ic,t+1}$ be the number of workers staying in location $c$ between periods $t$ and $t + 1$. Then the change in the skill
Then the change in the skill 75 premium is rewritten, by pulling Slc,t+1/Shc,t+1 out, as (cid:32) (cid:33) (cid:32) (cid:33)1−ρ ∆ whc,t+1 wlc,t+1 = θρ Slc,t+1 Shc,t+1 1 + (cid:124) 1 + M in hc,t+1 Shc,t+1 M in lc,t+1 Slc,t+1 (cid:123)(cid:122)  (cid:125) In-Migration Premium ρ−1 (cid:19)1−ρ (cid:18) Llct Lhct − θρ (3.6) The in-migration premium captures the percentage change in the relative labor supply curve due to in-migration. If there was no in-migration or the amount of in-migration had the same skill distribution as the workers who stayed in c then the total change in the skill premium would be a function of the relative number of high skill stayers minus the relative number of original high skill workers. When this ratio is above one then the relative labor supply curve is being shifted out due to more high skill workers migrating into the location. The in-migration premium does not capture the effect of total relative labor supply shifts. Total shifts in labor supply also capture changes to labor force participation. The corresponding measure for out-migration is − θρ ρ−1 (cid:18) L1ct L2ct − θρ (cid:19)1−ρ (cid:18)L1ct L2ct (cid:19)1−ρ (3.7) (3.8) (cid:18) w2c,t+1 (cid:19) w1c,t+1 ∆ = θρ = θρ = θρ L2c,t+1 (cid:18)L1c,t+1 (cid:19)1−ρ − θρ (cid:32)L1c,t − M out L2ct − M out (cid:19)1−ρ (cid:18)R1c,t+1 L2ct 1c,t+1 + M in 2c,t+1 + M in 1c,t+1 (cid:19)1−ρ (cid:18) L1ct (cid:33)1−ρ  1 − M out (cid:124) (cid:125) (cid:123)(cid:122) 2c,t+1 R2c,t+1 1 − M1c,t+1 R1c,t+1 2c,t+1 R2c,t+1 Out-Migration Premium where Ric,t+1 = Lict + M in ic,t+1 is the population of skill i if there was no out-migration. When the out-migration premium is above one then there has been relatively more low skill out migration which shifts the relative labor supply curve out. When both migration premia are above one, the relative labor supply curve is shifting out. 
There are a number of different economic reasons for migration and these reasons can have different effectiveness across skill. Location $c$ is either more attractive for high skill migrants than for low skill migrants or the increase in income for low skill migrants is too small to overcome low skill workers' moving costs. An in-migration premium above one in a single location implies an out-migration premium below one in some other location. Some location's brain gain is another location's brain drain. The empirical results in section 3.5 will primarily use the in-migration premium measure because of data limitations. The out-migration premium has some attenuation bias due to how past location is measured, which I detail in section 3.4.

3.3 Model Specification

Any regression relating local inequality and migration needs to capture the shift in the relative supply curve. As in (3.6), changes in the skill premium depend on the skill distribution of migrants into or out of a CZ and the skill distribution of workers in the CZ. The main independent variable is the in-migration premium.
\[ \frac{1 + \dfrac{M^{in}_{2ct}}{S_{2ct}}}{1 + \dfrac{M^{in}_{1ct}}{S_{1ct}}} = Z^{in}_{ct} \tag{3.9} \]
I use $Z^{in}_{ct}$ to denote the in-migration premium. Without any migration into a CZ, changes in the skill premium are a function of the difference between the previous year's relative population and the relative amount of remaining workers. I also use the out-migration premium for some regressions.
\[ \frac{1 - \dfrac{M^{out}_{2ct}}{R_{2ct}}}{1 - \dfrac{M^{out}_{1ct}}{R_{1ct}}} = Z^{out}_{ct} \tag{3.10} \]
In the next section, I discuss why the out-migration premium is less reliable than the in-migration premium.

The main specification uses the in-migration premium to capture the effect of in-migration on the change in the skill premium,
\[ \Delta \frac{w^{H}_{ct}}{w^{L}_{ct}} = \alpha + \beta_1 Z^{in}_{ct} + \Delta X_{ct} + \gamma_t + \varepsilon_{ct} \tag{3.11} \]
where $\gamma_t$ is a year dummy and $\varepsilon_{ct}$ is the error term. Changes in CZ characteristics are captured by the $\Delta X_{ct}$ term.
The out-migration specification is similar:
\[ \Delta \frac{w^{H}_{ct}}{w^{L}_{ct}} = \alpha + \beta_1 Z^{out}_{ct} + \Delta X_{ct} + \gamma_t + \varepsilon_{ct} \tag{3.12} \]
The year dummies in both specifications are important to include because both the migration premium and, to a lesser extent, the skill premium exhibit procyclical behavior.

The coefficient $\beta_1$ shows whether high skill migrants function like complements or substitutes for high skill incumbent workers. With the definition in (3.9), $\beta_1$ can be interpreted similarly to an elasticity. A change in the migration premium from 1 to 1.01 translates into a $0.01 \times \beta_1$ change in the skill premium. Since the skill premium is defined as a ratio as well, if $\beta_1 = -0.5$ then a one percent increase in the relative labor supply curve due to in-migration decreases the skill premium by 0.005, that is, it decreases high skill income by 0.5 percent of low skill income.

Controlling for existing relative labor supply does not accurately capture the relationship between relative labor supply and the relative supply of migrants. Separating the migration premium into the effect of migrants and controlling for the skill distribution of the existing population will not correctly show the change in income inequality. For instance, suppose the relative supply of migrants is 1.05, meaning that there are 5% more high skill migrants than low skill migrants. If the relative labor supply of the remaining workers is 1.00 then the relative labor supply curve is shifting out due to migration. But if the relative labor supply of the remaining workers is 1.20 then the relative labor supply curve is shifting in. In the first case, the migration premium is greater than one, indicating that migration is increasing the relative supply of high skill workers. But in the second case, the migration premium is less than one, indicating the opposite effect. Even with a well identified estimate, the coefficient on the relative amount of migration would be misleading. Ultimately, whether migration changes inequality or not depends on the distribution of skills already existing in the location.
Without properly accounting for this existing distribution, any estimates of the effect of migration on inequality will not be interpretable.

The migration premium specification uses information on both in-migration and out-migration for both premia. In the in-migration premium, the denominator for each skill includes the amount of out-migration. If there is a large amount of out-migration then the number of stayers is reduced. Hence, in-migration has a larger impact if there is a large amount of out-migration. The question that regression equations (3.11) and (3.12) ask is whether having slightly more high skill migrants move in or move out changes local inequality.

3.4 Data

I use the ACS as the main dataset for estimation. Before I can measure migration, I have to assign workers to their current and past local labor markets. However, the ACS does not report geography that corresponds to a local labor market. I describe in this section how I create the local labor market geography and how I determine migration flows into and out of local labor markets. While geography is a problem in the ACS, the extensive information on migration, demographics, and socioeconomics makes the ACS one of the best datasets to use for studying migration (Monras, 2015).

The ACS is a 1% representative random sample of the US population. Each respondent gives information about income (in 2005 dollars), education, labor force status, demographics, and migration. The ACS records the previous year's location of all respondents. By having migration from the past year, I can see the effect of migration on the change in the skill premium.5 The ACS is representative of the US population, not the US labor force. Therefore, I clean the sample to make it representative of the US labor force. I drop from the sample individuals who were not working in the past year, were students, or were outside the ages of 20 to 65.
I define high skill workers as individuals with a college degree or more, while low skill workers possess a high school diploma or less. There is not a clear skill category to place individuals with some college into. Some workers with some college are closer to college educated workers, particularly those who dropped out of college after several years, while others are not. The gap in education between high and low skill workers creates a clear divide between the two groups. Including workers with some college can dramatically change the skill premium in some CZs depending on which skill group they are added to.

5 The US Census cannot be used to do this. The Census records migration only going 5 years back. But income is for the current year only. Hence, using successive Census samples would mean matching the 5 year migration flows with 10 year changes in the skill premium.

With a sample of the labor force, I now assign each individual to a current CZ and past CZ. CZs have not traditionally been used to study migration, partly because CZs are not recorded in the Census or in the ACS. The CZs must be constructed out of other levels of geography recorded by the US Census Bureau. I use Public Use Microdata Areas (PUMAs) to assign workers to labor markets. PUMAs are the smallest level of geography in the ACS that still abide by data confidentiality restrictions.6 These are small areas of land whose boundaries are defined at every census. PUMAs are made for statistical purposes rather than for economic purposes. However, PUMAs are reported for all ACS respondents and cover the entire US, which allows me to map PUMAs to CZs for each worker. The boundaries of a PUMA follow the boundaries of counties and census tracts.7 PUMAs are too small to be considered as labor markets. For example, Los Angeles County contains 69 PUMAs while commuting flows suggest that LA County comprises one labor market.
To study the effects of migration on local labor markets, I use CZs as a level of geography because a CZ best corresponds to a local labor market (VanHeuvelen, 2018). Based on the US Department of Agriculture definition, CZs function like a local labor market (Tolbert and Sizer, 1996). Individuals are highly likely to work within the set of counties that defines the CZ. The majority of workers outside of a CZ do not work in the CZ.

6 The ACS also records Metropolitan Statistical Area (MSA), state and country. Only MSAs can be considered as functioning like a labor market. MSAs are more akin to the idea of a city and so they do not cover the entire US territory.

7 The addition of census tract boundaries is what causes the problems described below.

Construction of CZs does not take into account the population of the CZ. This disregard for population allows for analysis of rural labor markets, which are often not studied due to data constraints. Low population locations, however, often contain no core economic area, such as a city.8 CZs emphasize the connection between counties that comes from heavy commuting flows between counties, particularly in locations near a core business area. In rural counties, this connection becomes potential commuting flows (Fowler et al., 2019). Changes to supply and demand in one county of a CZ are supposed to be felt in all other counties that comprise the CZ. Fowler et al. (2019) point out that in some rural CZs, the effect of a shift in supply or demand will not be felt by all counties in the CZ. While CZs are meant to function like a local labor market, the performance of CZs as labor markets deteriorates as population density falls. However, rural CZs are still able to capture some of the supply and demand shifts of labor markets, which makes these better than Metropolitan Statistical Areas (MSAs) or Core-Based Statistical Areas (CBSAs) for studying labor markets.
There are two hurdles to overcome for placing workers in their current and past CZ. The first hurdle is allocating workers into their current CZ when that worker's PUMA of residence crosses multiple CZs. This is solved using the crosswalk provided by Dorn (2009). The second hurdle is allocating workers into their past CZ if they have moved. I overcome this hurdle by constructing a crosswalk similar to Dorn (2009). Both problems stem from the data confidentiality rules that govern creation of the US Census as well as how PUMAs are defined.

8 A large number of papers use core measures of geography such as MSAs or Core-Based Statistical Areas (CBSAs). This level of geography puts the concept of a core economic area at the heart of the definition of geography. These areas, however, place less weight on whether people actually work in the core area and whether people live in the core.

Since no one-to-one mapping between PUMA and CZ exists, I split and reweight individuals whose PUMA is contained in more than one CZ. For example, if a PUMA is contained in two CZs then all individuals in the PUMA are allocated to both CZs. The new individual weights equal the percentage of the PUMA's population that is contained in the CZ. Dorn (2009) provides a crosswalk between CZs and PUMAs using this reweighting scheme based on county and PUMA population counts published by the US Census Bureau. If 80% of the population of a PUMA is contained in CZ1 and the rest in CZ2, each person in the PUMA is split into both CZs with 80% of their weight going into CZ1.9 The method outlined by Dorn (2009) has one downside. It requires that PUMAs crossing multiple CZs have roughly similar skill distributions in each CZ. The reweighting scheme using PUMAs does not take into account the skill composition at a lower level of geography because skill composition is not reported by the U.S. Census Bureau.
An extreme example of this downside is if all the high skill workers in a PUMA lived in one CZ and all the low skill workers lived in the other CZ. In this scenario, no reweighting would be necessary. But the current reweighting scheme would place high skill and low skill workers in both CZs. This issue is insurmountable with the ACS and publicly available Census data. However, such an example is quite extreme.

Previous year's location is complicated by data confidentiality as well. Solving this is nearly identical to allocating individuals to current CZs, except that I construct the crosswalk myself. Previous year's location is recorded as a MIGPUMA. All MIGPUMAs are collections of PUMAs and cover a larger area than a single PUMA. Allocating individuals into the CZ they lived in the previous year causes more splitting because a single MIGPUMA might cover many PUMAs. If a MIGPUMA covers four PUMAs and each PUMA contains counties in 2 CZs then each individual that previously lived in the MIGPUMA could have lived in up to eight different CZs. Once individuals are split into current CZ and previous CZ, there are slightly more than 44 million observations. I construct MIGPUMA to CZ crosswalks using a similar method to Dorn (2009). The difference is that my reweighting uses the percentage of the MIGPUMA's population that is contained in the CZ.[10]

After creating the new weights for each worker, I calculate and check the migration rates for all CZs. The total amount of migration each year is shown in Figure B.2. There is a clear decrease in both inter-state and inter-CZ migration in response to the Great Recession.

[9] The ACS weights individuals and households in order to create a nationally representative sample. Some other datasets do not have such a weighting scheme (VanHeuvelen, 2018).
[10] This probability relies on the correspondence between MIGPUMAs and counties, which is provided by the Missouri Census Data Center.
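The split-and-reweight allocation described above can be illustrated with a minimal sketch. It is not the actual crosswalk code: the function name, the CZ labels, and the population shares are hypothetical, and the sketch assumes that the current and previous allocations are independent, so that the new weight is the original weight times both population shares.

```python
from itertools import product


def split_and_reweight(person_weight, current_shares, previous_shares):
    """Split one respondent across (current CZ, previous CZ) pairs.

    current_shares maps each CZ to the share of the PUMA's population in it;
    previous_shares does the same for the MIGPUMA. Each should sum to one.
    """
    rows = []
    for (cz_now, share_now), (cz_prev, share_prev) in product(
        current_shares.items(), previous_shares.items()
    ):
        # Assumption: the two allocations are treated as independent, so the
        # new weight is the original weight times both population shares.
        rows.append((cz_now, cz_prev, person_weight * share_now * share_prev))
    return rows


# Hypothetical respondent with a survey weight of 100: the PUMA is 80% in CZ1
# and 20% in CZ2, and the MIGPUMA is split evenly between CZ2 and CZ3.
rows = split_and_reweight(
    100.0, {"CZ1": 0.8, "CZ2": 0.2}, {"CZ2": 0.5, "CZ3": 0.5}
)
for cz_now, cz_prev, weight in rows:
    print(cz_now, cz_prev, round(weight, 2))
```

The respondent becomes four observations whose weights still add up to the original weight, which is why the number of observations grows once individuals are split into both current and previous CZs.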
The decline in migration around the Great Recession fits with the analysis of Monras (2015), who finds that decreased migration is an optimal response to a negative labor demand shock. I check in-migration and out-migration flows against the IRS county-to-county flows for 2005-2010 using the data created by Hauer and Byars (2019).[11] The IRS reports the number of people who filed a tax return that moved between counties each year. I aggregate the county-to-county flows to the CZ level and measure the percent of population that migrated in and out.

Comparing the ACS and IRS data sets is not perfect. My ACS sample is of the population of US workers whereas the IRS sample is of all taxpayers. Retired individuals who earn an income are counted in the IRS data but not in my sample. Post-retirement migration is quite common, which will affect the comparison (Greenwood, 1997). On average, the weighting scheme undercounts migration by half a percentage point of the CZ population for in-migration and overcounts by nearly one percentage point for out-migration. The out-migration inaccuracy is driven mostly by a few outliers where the amount of out-migration is much larger in the ACS data than the IRS data. After removing the outliers, the discrepancies are due to the weighting system and the differences in population. The in-migration flows are close to what one would expect. The IRS sample should see more migration (because of the retirees) than the sample of workers.

With CZs constructed and CZ-to-CZ migration flows created, I can calculate summary statistics for CZs and for migrants and non-migrants. CZ averages of income, skill premium, and migration premium are shown in table B.1. Average incomes and the in-migration premium decrease around the Great Recession. The in-migration premium is consistently above one except between 2007 and 2010. This means the average CZ's relative labor supply curve is shifting out due to in-migration during the majority of the years covered in the data.
During the Great Recession and the beginning of the recovery, the average CZ experienced a small shift back in the relative labor supply curve due to in-migration. The migration premium is near one for most CZs. The average absolute change in the in-migration premium is nearly 1.01; I will use this size of change when discussing the regression coefficients. However, there are some CZs that experience a migration premium over 1.1 or under 0.9.

[11] The IRS data set here includes only the amount of migration between counties and no other information.

The review article by Greenwood (1997) shows that migrants are different than incumbent workers. Table B.2 shows that in the ACS, the point estimates for movers differ from those for stayers, although the variation within each group is large, so the two groups are not statistically different from one another. Many of the differences imply potentially smaller moving costs for movers or larger option-values for migrating to a new location. Movers tend to be younger, have slightly fewer kids, and are less likely to be married. All three of these characteristics imply lower moving costs. Curiously, movers report working half an hour more per week than stayers but movers are much less likely to hold a full time job for the year. This might point to some movers filling multiple part-time jobs or that movers with full-time jobs are working much more than stayers with full-time jobs. Movers are slightly less likely to be foreign born. These summary statistics show that movers are potentially different than incumbents, given the large difference in some of the point estimates.

3.5 Results

The migration premium measures the percentage change in the relative labor supply curve due to migration. I will focus on in-migration because the shift-share instrument is strongest for in-migration. Focusing on in-migration will quantify a brain gain effect for commuting zones.
That is, what is the change in income inequality when there is a change in the relative labor supply curve due to in-migration? Table 3.1 shows the reduced-form results. In panel A, increasing the relative amount of high skill migrants has little effect on the skill premium over the short run. The three year and four year changes in the skill premium show a significant negative effect on the skill premium. For every one percentage point increase in the relative labor supply curve due to in-migration, the skill premium decreases by a little over 0.2 percentage points. In panel B, the effect of out-migration is several orders of magnitude smaller. Calculating out-migration is less precise because MIGPUMAs cross many commuting zones, as described in section 3.4.

Table 3.1: Reduced Form Estimates of the Change in the Skill Premium

Panel A: In-Migration
                         Skill Premium
                         1 Year       2 Year       3 Year       1 Year
In-Migration Premium     -0.145*      -0.0560      -0.0007      -0.122
                         (0.0842)     (0.0844)     (0.0463)     (0.0838)
Constant                 0.131        0.0380       -0.0348      0.107
                         (0.0851)     (0.0851)     (0.0482)     (0.0841)
Observations             8,891        8,149        7,407        8,891
R-squared                0.0130       0.0214       0.0239       0.0221
Number of CZs            741          741          741          741
CZ controls              No           No           No           Yes

Panel B: Out-Migration
                         Skill Premium
                         1 Year       2 Year       3 Year       1 Year
In-Migration Premium     -0.000169    -8.96e-06    2.87e-09     -0.000185
                         (0.000962)   (6.11e-06)   (1.06e-07)   (0.000909)
Constant                 -0.0142      -0.0179*     -0.0355***   0.0147
                         (0.00954)    (0.0103)     (0.0103)     (0.00956)
Observations             8,891        8,149        7,407        8,891
R-squared                0.0127       0.0214       0.0239       0.0219
Number of CZs            741          741          741          741
CZ controls              No           No           No           Yes

Note: Each column uses the same dependent variable, the change in the skill premium. The controls used are the one-year changes in average age, percentage of women, and percent foreign-born. All controls are separated out by skill and are CZ averages. Standard errors are in parentheses and are clustered at the CZ level. *** p<0.01, ** p<0.05, * p<0.1
Splitting individuals across many CZs creates attenuation bias, which pushes the coefficients towards zero.

The results in table 3.1 are likely biased. If the country was in a spatial equilibrium where no worker wished to move in the recent past then there must have been a shock that moved the economy out of equilibrium. Such a shock is probably a change in labor demand (the data covers the Great Recession). This shock surely changed the wages of high and low skill workers. Hence, the OLS estimates in table 3.1 confound the shock that induced migration and the effect that migration has on the local economy. I want to separate out these two effects and focus on migration.

3.5.1 The Shift-Share Instrument

The typical Bartik instrument, or shift-share instrument, is used to create labor demand shocks (Blanchard and Katz, 1992). But the shift-share instrument can be used to study migration because it uses the general principle of predicting changes in current quantities by using the distant past. The changes in a local quantity can be predicted by national level changes and how that national change was allocated across CZs in the past. This method eliminates the serial correlation within a CZ that could be caused by shifts in local demand or supply. That is, if a labor demand shock is persistent in a particular location then moving further back in time removes the effect of the demand shock's persistence.

In order to create a shock to in-migration, I use the share of people born in a given state who now live in the current CZ from the 1990 Census. I use birth state because there is no information on PUMA or CZ of birth. Using birth state is beneficial because not every pair of CZs had workers moving between them in 1990. But every state did, so using state level variation allows for future flows between CZs to have positive weight.
There might have been zero migration between some CZs in Alabama and rural CZs in North Dakota in 1990 but the fracking boom in the mid-aughts might have opened up some migration. Using states as the origin of movers allows for the instrument to capture some of this new migration flow. However, using birth state has some downsides as well. Mainly, it is a different level of geography than the rest of the analysis, and I lose some variation in the instrument. Then the base share is multiplied by the total out-migrants from a given CZ in a given year of the ACS (2005-2017). Thus the shift-share instrument, $B_{ict}$, is

$$B_{ict} = \sum_{d \neq c} \nu_{i,s(d),c,1990} \cdot g_{i,d,t} \tag{3.13}$$

where $g_{i,d,t}$ is the share of the population of skill $i$ that out-migrated from CZ $d$, recorded in ACS year $t$, and $\nu_{i,s(d),c,1990}$ is the share of people of skill $i$ living in CZ $c$ who were born in state $s$ (where $s$ changes based on $d$) in the 1990 Census. This instrument is similar to the instrument used in Card (2001), who uses foreign country of origin instead of birth state.

Exogeneity in a Bartik-style instrument comes from the initial share being recorded so far in the past that it has no correlation with current local shocks. Serial correlation in shocks forces the share year to be quite far in the past. Constructing the share too far back creates a weak instruments problem where the share is irrelevant to current shocks. I will use two different testing methods to assess the validity of the instrument. First, I test whether the shares are correlated with national trends in the skill premium. Second, I use the base-year test method of Goldsmith-Pinkham et al. (2018) to test whether the instrument is correlated with known supply or demand shocks.

The independent variable used in some of the regressions is the migration premium, as defined in section 3.2. The instrument used is the ratio

$$I_{c,t} = \frac{B_{H,c,t}}{B_{L,c,t}} \tag{3.14}$$

In the first stages, I add up the high skill and low skill shocks separately over several years.
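To make the construction in (3.13) and (3.14) concrete, the following is a minimal sketch with made-up numbers. The function names, the dictionary layout, and the three toy CZs are hypothetical. The sketch also assumes that each origin CZ belongs to a single state and that multi-year versions of the ratio sum the high skill and low skill shocks separately before dividing.

```python
def shift_share(cz, skill, year, shares_1990, outmig_share, state_of):
    """B_ict from (3.13): predicted in-migration shock for destination `cz`.

    shares_1990[(skill, state, cz)] is nu: the 1990 share of residents of
        `cz` with that skill who were born in `state`.
    outmig_share[(skill, origin, year)] is g: the share of that skill group
        that out-migrated from CZ `origin` in `year`.
    state_of[origin] is s(d), the state of each origin CZ (assumed unique).
    """
    total = 0.0
    for origin, state in state_of.items():
        if origin == cz:
            continue  # the sum in (3.13) excludes the destination CZ itself
        nu = shares_1990.get((skill, state, cz), 0.0)
        g = outmig_share.get((skill, origin, year), 0.0)
        total += nu * g
    return total


def instrument_ratio(cz, year, shares_1990, outmig_share, state_of, n_years=1):
    """High over low skill shocks, each summed over the last n_years."""
    years = range(year - n_years + 1, year + 1)
    high = sum(
        shift_share(cz, "H", y, shares_1990, outmig_share, state_of)
        for y in years
    )
    low = sum(
        shift_share(cz, "L", y, shares_1990, outmig_share, state_of)
        for y in years
    )
    return high / low  # undefined if the low skill shock is zero


# Toy data: origins A (Alabama) and B (North Dakota), destination C.
state_of = {"A": "AL", "B": "ND", "C": "ND"}
shares_1990 = {
    ("H", "AL", "C"): 0.10, ("L", "AL", "C"): 0.05,
    ("H", "ND", "C"): 0.60, ("L", "ND", "C"): 0.50,
}
outmig_share = {
    ("H", "A", 2006): 0.04, ("L", "A", 2006): 0.02,
    ("H", "B", 2006): 0.03, ("L", "B", 2006): 0.03,
}
b_high = shift_share("C", "H", 2006, shares_1990, outmig_share, state_of)
ratio = instrument_ratio("C", 2006, shares_1990, outmig_share, state_of)
print(b_high, ratio)
```

In this toy case the ratio is above one, so the instrument predicts relatively more high skill than low skill in-migration into CZ C.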
Summing over two years, for example, gives the two year instrument

$$I^{2}_{c,t} = \frac{B_{H,c,t} + B_{H,c,t-1}}{B_{L,c,t} + B_{L,c,t-1}} \tag{3.15}$$

The three year, four year, and so on, instruments are constructed in a similar way to (3.15). Some regressions will just use $B_{H,c,t}$ or $B_{L,c,t}$ alone.

I can test the validity of the instrument by using the 1990 shares to predict the skill premium in each CZ. To do this, I construct the following predicted skill premium

$$G_{ct} = \sum_{s \neq s(c)} \left( \frac{\nu_{H,s,c,1990}}{\nu_{L,s,c,1990}} \right) \bar{sp}_{s,t} \tag{3.16}$$

where $\bar{sp}_{s,t}$ is the average skill premium for state $s$ in year $t$. The sum includes all of the states that do not contain CZ $c$. Hence, I am using the average skill premium of all other states to predict the skill premium in CZ $c$. The average skill premium of the other states is a weighted average using the shares $\nu_{i,s,c,1990}$. Then I run the regression

$$sp_{ct} = \beta_0 + \beta_1 G_{ct} + \upsilon_{ct} \tag{3.17}$$

where $\upsilon_{ct}$ is the error term.

Table 3.2: Exogeneity Test

                    Skill premium    Change in skill premium    Change in skill premium
G_ct                0.0000298        0.0050021                  0.0057539
                    (0.000266)       (0.0345056)                (0.0360415)
Constant            1.069153***      -0.0003356                 -0.0003355
                    (0.000458)       (0.000315)                 (0.000328)
Observations        9,633            8,892                      8,892
R-squared           0.000            0.000                      0.001
Number of CZs       741              741                        741
Fixed Effects       No               No                         Yes

Note: Each regression tests the exogeneity assumption of the shift-share instrument. Standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1

The results of regression (3.17) in table 3.2 show that the 1990 migration shares interacted with the skill premium of all other states do not influence the skill premium. The table shows some evidence that the exogeneity condition on the shares might hold. This means the shift-share instrument is valid in the sense that it is not picking up a national trend in the skill premium.

The second test is the Goldsmith-Pinkham et al. (2018) base-year test.
This test looks at whether the instrument is correlated with CZ characteristics that are associated with either labor supply or labor demand shocks. If the instrument is correlated with well-known labor supply shocks, then we can have some confidence that the instrument is capturing a shift in labor supply. However, if the instrument is also correlated with well-known labor demand shocks, then we might be worried about interpreting the results and the validity of the instrument. In picking the covariates, I follow Goldsmith-Pinkham et al. (2018) by picking variables that are traditionally associated with demand or supply shocks. For supply shocks, I use the percentage of foreign born (separated by skill) in a CZ. This is exactly what Card (2001) uses to construct his supply shock instrument. For demand shocks, I use the percentage of workers in each 2-digit occupation. This works similarly to the traditional Bartik instrument. Any significant occupation-skill category is then matched to whether or not that occupation was growing over the projected period.[12]

Table 3.3: Base-Year Test

                                                  Instrument     Instrument
Foreign-born, HS                                  5.268***
                                                  (0.426)
Foreign-born, LS                                  -0.947***
                                                  (0.207)
Occupation-HS: business and financial                            -0.0000789***
                                                                 (0.0000291)
Occupation-HS: physical, life, social science                    0.0000739**
                                                                 (0.0000347)
Occupation-HS: education and library                             0.0000667***
                                                                 (0.0000152)
Occupation-HS: arts, sports, media                               0.0000611**
                                                                 (0.0000299)
Occupation-HS: healthcare support                                -0.0003379***
                                                                 (0.0001038)
Observations                                      741            741
R-squared                                         0.19           0.43

Note: This shows the results of the base-year test for the in-migration shift-share instrument. Only significant occupations are shown and the only significant occupations are for high skill workers. Standard errors are in parentheses. *** p<0.01, ** p<0.05, * p<0.1

Table 3.3 shows the results for the in-migration shift-share instrument. First, the instrument is highly correlated with the percentage of foreign-born workers, for both high and low skill. This test adds significant confidence that the instrument is indeed capturing a labor supply shock. Second, in a separate regression, I regressed the instrument on all 2-digit occupation-skill groups. Only five of 46 possible occupation-skill groups had significant coefficients and these are the coefficients reported in table 3.3. All five of these occupations were growing faster than the average occupation, so these are occupations probably experiencing a demand shock. A few cautionary points are in order. First, we would expect two or three skill-occupation groups to be significant even by chance. Second, 13 occupations were growing at an above average rate. Hence, the instrument is not correlated with most of the fast growing occupations. Third, the instrument is not correlated with any low skill occupation-skill group. Put together, there is only weak evidence that the instrument is correlated with a labor demand shock. Now, I turn to determining whether the instrument is weak or not.

[12] All 5 of the significant occupation-skill groups are in occupations that were growing above average nationally. So this matches closely with the logic of the Bartik instrument.

3.5.2 First Stage

The first stage uses the shift-share migration shock instrument to predict the migration premium. Here, the instrument is the ratio of the high skill instrument over the low skill instrument from (3.14). The form of the instrument mimics the form of the move premium, which is a ratio of high skill migration to low skill migration. A major problem when using the shift-share instrument with migration is how far in the past to go with the shares. Using only a few years in the past is likely to create a strong instrument but it will probably still be affected by the serial correlation that the instrument is trying to break (Goldsmith-Pinkham et al., 2018).
Going very far into the past breaks the serial correlation but might create a weak instrument. The optimal distance to go balances these two concerns. It is impossible to know what that optimal distance is in practice. Using the 1990 Census gives at least 15 years between the shocks that created the 1990 migration patterns and the first year of the ACS sample. The weak instrument problem is testable using the first stage F-statistic rule of thumb (Stock and Yogo, 2005).

Table 3.4 shows the first stage regression with the shift-share ratio containing only one year of data. The dependent variable is the migration premium from the right-hand-side of (3.11). Column one only includes year dummies and the second column includes year dummies and CZ controls. Testing for weak instruments is done by looking at the F-statistic on the overall first stage regression. Stock and Yogo (2005) show that a good rule of thumb in the linear instrumental variables regression case is that the F-statistic should be greater than 10. Both in-migration specifications (with and without CZ controls) pass the weak instrument test.

Given the results in table 3.4, I am fairly confident that the shift-share instrument is strong for predicting in-migration. While table 3.2 provides some evidence that the instrument is valid, I will present more evidence in the next two subsections that the instrument is generating an exogenous shift in supply.

3.5.3 Changes to Local Income Inequality

The main measure of income inequality is the skill premium.[13] Local income inequality differs considerably across CZs but there are a few regional patterns. The coasts tend to have more inequality while the central plains tend to have less inequality. The map in figure B.3 shows the skill premium for the entire US in 2015.

Using the shift-share instrument, Panel A of table 3.5 shows the changes to the skill premium coming from in-migration. The estimates show that the skill premium is increasing with more high skill migrants.
This result points to high skill migrants being complementary to high skill incumbent workers. That is, increases in the relative supply curve due to migration help incumbent high skill workers relative to incumbent low skill workers. While the coefficients are not identical over each time horizon, they are statistically similar across all horizons. The estimate for all workers implies that increasing the relative labor supply curve by one percent due to in-migration will increase the skill premium by 0.421 percentage points. The last column looks at the change in the skill premium for incumbent workers only. Even though the coefficient is smaller, it is not statistically different from the full population estimate in the first column. A positive coefficient points to complementarity but may actually be hiding differences in substitutability across skill.

While the coefficients are statistically significant, they are small compared to the average skill premium. The average skill premium across CZs is nearly 2 while the average change due to the migration premium is 0.0042. Part of the reason for the small size is that the migration premium does not move much over one year. Places receiving a large number of high skill migrants still receive some low skill migrants. Hence, any shift in the relative labor supply curve due to migration is going to be smaller than just the flow of high skill migrants would suggest. However, places that consistently experience an in-migration premium above one over many years will see an economically significant increase in their skill premium.

[13] Results are similar for the 75-25 ratio.

Table 3.4: First Stage Estimates

Panel A: In-Migration
                    In-Migration Premium
                    (1)           (2)
Bartik Ratio        0.0145***     0.0145***
                    (0.00147)     (0.00153)
Constant            0.994***      0.989***
                    (0.00199)     (0.00192)
Observations        9,632         8,891
R-squared           0.0459        0.0473
Number of CZs       741           741
CZ controls         No            Yes

Panel B: Out-Migration
                    Out-Migration Premium
                    (1)           (2)
Bartik Ratio        -0.00834**    -0.00837**
                    (0.00391)     (0.00391)
Constant            1.001***      1.002***
                    (0.00401)     (0.00402)
Observations        8,670         8,670
R-squared           0.0014        0.032
Number of CZs       741           741
CZ controls         No            Yes

Note: Each column uses the same dependent variable, the move probability ratio, but differs in the number of years that the migration shock goes back. The 4 year shock includes the shocks for all previous four years added together. Regressions shown in the table do not include year dummies. Standard errors are in parentheses and are clustered at the CZ level. *** p<0.01, ** p<0.05, * p<0.1

Panel B of table 3.5 shows the effect of out-migration on the skill premium. None of the coefficients are statistically significant. Even if these coefficients were statistically different from zero, their magnitudes are considerably smaller than the in-migration coefficients. One reason for the insignificant results is that the out-migration instrument is much weaker than the in-migration instrument. A second problem is that out-migration is much noisier than in-migration due to the larger geography that the ACS uses to code past location. This creates attenuation bias in the estimator, which biases results toward zero. With both of these problems, it is not surprising that the out-migration results are not significant, unlike the in-migration results.

Understanding what drives the increase in the skill premium requires knowing which part of the income distribution is changing. Looking at average high skill and low skill incomes also helps unpack whether high skill migrants are complements to high skill workers or are substitutes to both skill types. Tables 3.6 and 3.7 look at the effect of a higher move probability ratio on average incomes by skill. These regressions are analogous to table 3.5 with only a change to the dependent variable, replacing the skill premium with average income by skill group.
Table 3.7 shows that the effect is mainly a large decrease in low skill incomes. The point estimate for the decrease in income is larger in absolute magnitude for low skill workers. Therefore, the skill premium is increasing because the percentage of income lost due to migration shifting the labor supply curve is much larger for low skill workers. The effect for the one year change is negative but insignificant for high skill workers. Once CZ controls are added, the magnitude of the point estimate is cut in half.

The decrease in low skill incomes could potentially be explained by some high skill workers filling vacancies normally filled by low skill workers. Hershbein and Kahn (2018) find that job postings tended to have increased skill requirements in cities hit hardest by the Great Recession. Both education and, to a lesser extent, experience requirements increased and these increases lasted up to 2015. There seems to be evidence that high skill migrants are substitutes to both low skill and high skill incumbent workers in a CZ.

Table 3.5: Change in Skill Premium

Panel A: In-Migration
                           Skill Premium
                           (1)          (2)          (3)
Migration Premium          0.421***     0.444***     0.379**
                           (0.152)      (0.154)      (0.151)
Constant                   -0.451***    -0.476***    -0.403***
                           (0.154)      (0.156)      (0.154)
Observations               8,891        8,891        8,891
Number of CZs              741          741          741
First-stage F-statistic    20.30        22.06        20.30
Incumbents only            No           No           Yes
CZ controls                No           Yes          No

Panel B: Out-Migration
                           Skill Premium
                           (1)          (2)          (3)
Migration Premium          -0.934*      -0.965*      -0.825*
                           (0.505)      (0.509)      (0.461)
Constant                   0.892        0.920*       0.790*
                           (0.495)      (0.499)      (0.452)
Observations               8,670        8,670        8,670
Number of CZs              741          741          741
First-stage F-statistic    2.56         3.41         2.56
Incumbents only            No           No           Yes
CZ controls                No           Yes          No

Note: Each column uses the same dependent variable, the change in the skill premium with only the length of the change differing. Hence, the 4 year column is measuring the 4 year change in the skill premium. Panel A measures the migration premium with in-migration while panel B measures the migration premium with out-migration. All regressions include year dummies. The controls used are the one-year changes in average age, percentage of women, and percent foreign-born. All controls are separated out by skill and are CZ averages. Standard errors are in parentheses and are clustered at the CZ level. *** p<0.01, ** p<0.05, * p<0.1

Just like the results in table 3.5, the results for average incomes are quite small when we scale by the average of the migration premium. The change in low skill income represents a 0.4 to 0.6 percentage point change in average income. These results are being scaled to one year's change in the migration premium. Many CZs that receive a high proportion of high skill migrants in one year will most likely receive many high skill migrants the next year. Over time, these small coefficients can add up to a sizable change in the skill premium and average incomes.

Table 3.6: Change in High Skill Income

                           Average High Skill Income
                           (1)          (2)          (3)
In-Migration Premium       -1,719       -920.7       -2,464
                           (4,203)      (4,241)      (4,275)
Constant                   3,313        2,474        4,266
                           (4,268)      (4,314)      (4,346)
Observations               8,891        8,891        8,891
Number of CZs              741          741          741
First-stage F-statistic    20.30        22.06        20.30
Incumbents only            No           No           Yes
Controls                   No           Yes          No

Note: Each column uses the same dependent variable, the change in average high skill income. All regressions include year dummies. The controls used are the one-year changes in average age, percentage of women, and percent foreign-born. All controls are separated out by skill and are CZ averages. Standard errors are in parentheses and are clustered at the CZ level. *** p<0.01, ** p<0.05, * p<0.1

Looking at how the average income of each skill group changes is helpful but does not decompose the income distribution enough. Table 3.9 looks at the changes in income at particular percentiles of the local income distribution. The only significant changes occur at the 25th and 38th percentiles.
Both of these percentile incomes decrease in response to a shift in the relative labor supply curve.

Changes at the 25th and 38th percentile seem to occur too low on the income distribution when the labor supply curve is shifting out. But these changes are in line with other documented evidence of changes in the labor market. First, as referenced earlier, the skill requirement of jobs increased during the Great Recession (Hershbein and Kahn, 2018). Some jobs that required only a high school diploma before 2007 would require a college degree by the 2010-2015 period. Second, Beaudry et al. (2014, 2016) show that demand for cognitive skills typically associated with college degrees has moved down the occupational ladder. Jobs with high cognitive skill content have fewer employment prospects and are paid less than in the past. Both the upskilling of previously low skill jobs and the decline of cognitive tasks create a knockdown effect (Beaudry et al., 2014). The knockdown effect pushes high skill workers into previously low skill occupations which, in turn, pushes low skill workers into lower paying jobs. As more high skill migrants enter a location, the knockdown effect increases.

Table 3.7: Change in Low Skill Income

                           Average Low Skill Income
                           (1)          (2)          (3)
In-Migration Premium       -9,633***    -9,530***    -9,426***
                           (1,952)      (1,992)      (1,952)
Constant                   10,935***    10,852***    10,759***
                           (1,985)      (2,025)      (1,987)
Observations               8,891        8,891        8,891
Number of CZs              741          741          741
First-stage F-statistic    20.30        22.06        20.30
Incumbents only            No           No           Yes
Controls                   No           Yes          No

Note: Each column uses the same dependent variable, the change in average low skill income. All regressions include year dummies. The controls used are the one-year changes in average age, percentage of women, and percent foreign-born. All controls are separated out by skill and are CZ averages. Standard errors are in parentheses and are clustered at the CZ level. *** p<0.01, ** p<0.05, * p<0.1
From table B.2, younger workers are more mobile, yet some occupations are also more mobile than others.[14] Occupation specific mobility barriers, like licensing, might bias some occupations towards immobility. For instance, lawyers must pass a state bar exam in order to practice law. There is some degree of reciprocity for these licenses but there is no state that gives full reciprocity. Therefore, mobility is highly restricted to movement within a state. The average salary for a college educated worker in the legal occupation is $139,500, which is much higher than the average salary of a college educated worker. The general point is that there might be negative correlation between occupation mobility and the average income for that occupation.

[14] There are not any significant differences in average age across two-digit SOC occupation code.

In order to test whether occupational mobility and income are connected, I look at whether a particular occupation has more in-migrants than the average. Let $M^{in}_{oict}$ be the number of in-migrants in occupation $o$ of skill $i$ in location $c \in C$ and $L_{oict}$ be the number of workers in that occupation. Then the relative mobility of occupation $o$ is measured by

$$D_{oi} = \frac{1}{T \times C} \sum_{c \in C} \sum_{t=2005}^{2017} \left( \frac{M^{in}_{oict}}{L_{oict}} - \frac{M^{in}_{ict}}{L_{ict}} \right) \tag{3.18}$$

where $T \times C$ is the total number of CZ-year observations. When $D_{oi} > 0$, occupation $o$ has more workers in-migrating than the average.[15] Then I look at the correlation between $D_{oi}$ and the average income for each occupation by skill in table 3.8. I find that for both high skill and low skill workers, occupational mobility and average income are negatively correlated. The size of the correlation is twice as strong for high skill workers (-0.448) as it is for low skill workers (-0.231).

We would expect to see zero correlation between occupational income and relative mobility if there are no occupational barriers to moving.
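The relative mobility measure in (3.18) and its correlation with income can be sketched as follows. This is only an illustration: the counts and incomes are made up, the function names are hypothetical, and the average is taken over the CZ-years in which the occupation has at least one worker, which is one possible reading of how CZs without workers in an occupation are omitted.

```python
from math import sqrt


def relative_mobility(cells):
    """D_oi from (3.18) for one occupation-skill group.

    cells holds one tuple per CZ-year:
    (in-migrants in occupation, workers in occupation,
     in-migrants of that skill, workers of that skill).
    CZ-years with no worker in the occupation are skipped (assumption).
    """
    gaps = [
        m_occ / l_occ - m_all / l_all
        for m_occ, l_occ, m_all, l_all in cells
        if l_occ > 0
    ]
    return sum(gaps) / len(gaps)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up counts for three occupations over two CZ-years each.
cells_by_occupation = {
    "legal": [(2, 100, 40, 1000), (1, 50, 30, 1000)],
    "sales": [(5, 100, 40, 1000), (3, 100, 30, 1000)],
    "food service": [(6, 100, 40, 1000), (0, 0, 30, 1000)],
}
# Made-up average incomes, not the values reported in table 3.8.
income = {"legal": 130000, "sales": 85000, "food service": 27000}

mobility = {
    occ: relative_mobility(cells) for occ, cells in cells_by_occupation.items()
}
r = pearson([mobility[o] for o in income], [income[o] for o in income])
print(mobility, r)
```

In the toy data the highest paid occupation has negative relative mobility and the lowest paid occupation has positive relative mobility, so the correlation is negative, which is the sign of the pattern described in the text.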
Workers are observed in their destination occupation, not their previous location occupation. There is a considerably large negative correlation between relative mobility and income for high skill workers. Since the ACS reports the occupation in the destination location, not the past location, workers are moving into the lower paying occupations, although they are not necessarily changing occupations while changing locations. It seems likely that there is a link between the mobility of each occupation and its income. Occupations that have significant barriers to mobility, like the legal profession, might also see their incomes increase due to the market power that the lack of mobility causes.

[15] Occupations are limited to the 2-digit SOC code. Even at this level of aggregation, not all CZs have workers in every occupation. Therefore, I omit CZs where there is no worker of that occupation.

Table 3.8: Migration Bias and Incomes by Occupation

                                                High Skill                Low Skill
Occupation                                      Rel. Mobility   Income    Rel. Mobility   Income
Management                                      -0.0070         114,675   -0.0059         60,235
Business and Financial                          -0.0013         86,996    -0.0049         49,002
Computer and Mathematical                       0.0065          88,780    0.0065          60,658
Architecture and Engineering                    0.0045          93,613    -0.0019         57,274
Life, Physical, and Social Science              0.0099          79,296    -0.0024         48,764
Community and Social Service                    -0.0041         49,655    0.0000026       32,977
Legal                                           -0.0054         134,097   -0.0066         49,264
Educational Instruction and Library             -0.0097         52,943    -0.0073         23,286
Arts, Design, Entertainment, Sports, Media      0.0099          61,099    0.0068          40,260
Healthcare Practitioners and Technical          -0.0011         102,654   -0.0038         39,328
Healthcare Support                              0.0030          40,165    -0.00094        24,744
Protective Service                              -0.0092         70,261    0.00052         39,174
Food Preparation and Serving Related            0.035           26,948    0.0098          19,031
Building and Grounds Cleaning                   -0.011          35,149    -0.0037         23,377
Personal Care and Service                       0.0052          30,567    -0.0024         20,733
Sales and Related                               -0.00054        86,132    0.0028          33,576
Office and Administrative Support               -0.0018         47,041    -0.0048         31,958
Farming, Fishing, and Forestry                  0.025           42,301    0.0092          24,409
Construction and Extraction                     -0.0093         56,797    0.0025          38,236
Installation, Maintenance, and Repair           -0.0070         58,672    -0.0039         43,977
Production                                      -0.0078         57,325    -0.0047         34,642
Transportation and Material Moving              -0.0032         57,524    -0.00080        34,012
Military Specific                               0.12            49,724    0.22            23,939
Correlation between Mobility and Income         -0.284                    -0.189

Note: Relative mobility measures the difference between the percentage of residents with a given occupation who are migrants and the percentage of all residents who are migrants. For example, if 2% of the residents working in the legal occupation in a location are migrants while 4% of total residents are migrants then the relative mobility will be -0.02. I have averaged over all CZ-year pairs to create the relative mobility for each occupation.

Two factors seem to be at work. First, migrants tend to be younger but more educated than workers remaining in a local labor market. Thus competition between migrants and staying workers is not occurring at the top end of the income distribution where experience and a college degree are required. Second, the occupations that have relatively more migration also pay less. Migrants are already starting at a lower point on the income distribution. Together, migrants are competing with either low paid college educated workers or highly paid workers who only have a high school degree.
These two statistics explain why migrants cause an increase in the skill premium.

Table 3.9: Change in Income by Percentile

Panel A: Change in Income at Select Percentiles

Percentile                  p13           p25            p38           p63          p75          p88
In-Migration Premium      609.629    -5365.035***   -3665.717**      643.982     4929.965    -1395.583
                        (1124.119)   (1331.815)     (1543.049)     (2289.25)   (4929.965)   (4644.619)
Constant                 -220.731     5516.411***    3738.543**     -771.058    -4932.823     1821.985
                        (1127.132)   (1336.953)     (1554.77)      (2300.046)  (3241.096)   (4653.988)
Observations                8,891         8,891          8,891         8,891        8,891        8,891
Number of czone               741           741            741           741          741          741

Note: All coefficients are estimated by 2SLS using the one year ratio of instruments; the first-stage F-statistic is 20.30. Standard errors are in parentheses and are clustered at the CZ level. All regressions include year dummies. *** p<0.01, ** p<0.05, * p<0.1

One identification issue is that the increase in the skill premium could point to the instrument picking up a relative demand shock, that is, a negative demand shock to both high and low skill workers but with a larger impact on low skill workers. This would explain why high skill and low skill average incomes are decreasing, but it would not necessarily explain why the effect is concentrated in the 25th to 38th percentiles of the income distribution. While there is not a single test to rule this out, the following subsection provides some evidence that this threat to identification is not present.

3.5.4 Assessing the Plausibility of the Supply Shift

Any effect on average incomes could also influence labor force participation and unemployment. Any change to labor force status could also reflect a change in local labor demand. If the instrument is also picking up negative local labor demand shocks, then we would expect to see increases in unemployment and non-participation.
In order to check whether there are changes in labor force status, I run the same regressions with the instrumented move probability ratio as the independent variable but with high skill and low skill unemployment as the dependent variables. I will also discuss a few tests of whether the relative supply shift is identified. Looking at unemployment as well as labor force participation and out-migration will help determine whether a relative supply shift is being captured or whether a relative demand shift is a better explanation of the results.

The main issue is whether both high skill and low skill workers experienced a negative demand shock. If low skill workers experienced a larger negative demand shock, then we would expect to see an increase in unemployment or exit from the local labor force. Tables B.3 and B.4 show no increase in the unemployment rate of high skill workers or low skill workers. Almost all of the point estimates are negative. The main result on employment seems to be that in-migration has no effect on unemployment. We can reject the idea that increases in relative labor supply due to in-migration are coming at the expense of already employed workers. If displacement were occurring, then we would see an increase in the unemployment rate. Somewhat puzzling is the lack of change in low skill unemployment. Only after two years' worth of migration is there a significant change in unemployment, and that change is downward. The significant effect disappears when looking at longer changes to the unemployment rate.

One important caveat to remember is that the data cover the recovery from the Great Recession. Businesses were most likely at their leanest in 2008 and 2009 because they had shed many workers in order to reduce costs. After 2009, businesses were expanding and thus willing to hire both high skill and low skill workers. Without year dummies, the coefficients on the one year change in the unemployment rates for both skill types are negative and highly significant.
This points to a national downward trend in unemployment that was caused by the recovery from the Great Recession.

These results on unemployment do not match Granato et al. (2015), who find that increased high skill migration increases unemployment disparities across regions. However, the two sets of methods are not directly comparable. My results suggest that, in the short run, migration-caused shifts in relative labor supply have little impact on local unemployment rates for both high and low skill workers. Granato et al. (2015) separate workers into more skill groups, which could also explain some of the discrepancies with my results.

The lack of change in unemployment provides some evidence that the instrument is not picking up a shift in relative labor demand. It is difficult to see how a positive demand shock could be driving the results since average incomes decrease. The main threat is if both skill types receive a negative shock to demand but low skill workers receive a relatively larger negative shock. We would expect a negative shock to demand to increase unemployment for at least one skill type, but this does not appear to be the case. In fact, the mostly negative coefficients point to a positive demand shock, but any positive demand shock would also increase average incomes, which is not observed.

Unemployment is not the only choice that displaced workers have in response to increased migration. Workers can remove themselves from the local labor market entirely, either by migrating out of the CZ or by not searching for work. These measures also test whether there is a negative demand shock.

The instrument predicts the amount of relative migration into a CZ. To see whether this is picking up a negative relative demand shock, I regress the change in the ratio of high skill to low skill employment on the instrument. The results are in column one of table B.5. As the instrument increases, the relative amount of employment increases.
We should expect the instrument to increase relative employment by a small amount because the instrument is predicting relative in-migration. But the coefficient is so small that the instrument is probably causing an increase in relative employment due to a relatively large decrease in low skill employment compared to high skill employment. This increase provides some evidence that negative shocks to both skills are not being captured by the instrument.

The second and third columns of table B.5 also provide evidence that the instrument is valid. Relative out-migration and relative not-in-labor-force are both unchanged in response to an increase in the lagged in-migration premium (not lagging yields similar coefficients). If there were a negative demand shock, then we would expect to see these measures significantly decrease. The point estimates are negative, however, which is slight evidence of a negative labor demand shock.

Finally, I estimate the out-migration response for both skill types in table B.6. Both coefficients are positive and significant, which suggests an out-migration response to the increase in relative labor supply. The coefficients are small, representing less than a one percent change in out-migration in response to a one percentage point increase in lagged relative labor supply. Because the change is so small, it is most likely a result of increased in-migration rather than a labor demand shock.

Putting all of the evidence together, there seems to be little reason to believe that the instrument is capturing negative demand shocks to each skill type. Unemployment and out-migration show no economically significant response to in-migration.

3.5.5 Extending the Theoretical Model to Incorporate the Results

The decrease in both high skill incomes and low skill incomes helps rule out a simple explanation for the increase in the skill premium.
In the theoretical model from section 3.2, the intrinsic skill difference between high and low skill workers was represented by a constant $\theta$. Crucially, $\theta$ did not vary across locations. There is considerable evidence that more productive workers tend to sort themselves into more productive firms (Combes et al., 2008). Hence, $\theta$ could potentially vary across CZs, with higher $\theta$ representing locations with more high productivity firms. If more productive firms are located in CZs that have positive net flows of migrants, then these areas will have an in-migration premium above one and a rising skill premium. For the skill premium to rise, either firms are becoming more productive for high skill workers (that is, $\theta_c$ is increasing) or high skill migrants are filling vacancies in above average productivity firms. In either case, high skill average incomes should be increasing as well. The results in table 3.6, however, show high skill incomes decreasing.

What if, however, $\theta_c$ is negatively correlated with the in-migration premium? Occupational barriers that decrease mobility for higher paying occupations, particularly for high skill workers, might also be more prevalent in locations with lower in-migration premia. Consider the case where $\theta_{co}$ is the intrinsic skill premium for occupation $o$ in location $c$. Decompose $\theta_{co}$ into two parts

$$\theta_{co} = \theta + \theta_o g_{oc} \qquad (3.19)$$

where $\theta$ is the multiplicative bonus that a worker receives for completing their college degree, $\theta_o$ is the premium for a college degree in occupation $o$, and $g_{oc}$ is the percentage of the labor force of location $c$ working in occupation $o$. Averaging across all occupations gives

$$\theta_c = \theta + \sum_{o \in O} \theta_o g_{oc} \qquad (3.20)$$

where $O$ is the set of all occupations. Then the covariance between $\theta_c$ and $Z_{ct}$ is

$$\begin{aligned}
\mathrm{Cov}[\theta_c, Z_{ct}] &= \mathrm{Cov}\left[Z_{ct},\ \theta + \sum_{o \in O} \theta_o g_{oc}\right] \\
&= \mathrm{Cov}\left[Z_{ct},\ \sum_{o \in O} \theta_o g_{oc}\right] \\
&= \sum_{o \in O} \mathrm{Cov}[Z_{ct}, \theta_o g_{oc}] \\
&= \sum_{o \in O} \theta_o \mathrm{Cov}[Z_{ct}, g_{oc}]
\end{aligned}$$

And so this covariance is negative if locations with a high in-migration premium have low employment in high $\theta_o$ occupations.

3.5.6 Robustness Checks

The results in table 3.5 still hold when I factor age into the definition of skill. The motivation for using age is that migrants are much younger than staying workers. As figure 3.1 shows, the modal age for a migrant is around 20 years younger than that of a stayer. Age, and thus experience, might be substitutable with education, making a high school degree holder at age 50 more skilled than a college degree holder at age 25.

Figure 3.1: Kernel density of age for migrants and stayers.

I factor in age by redefining high skill as having a college degree and an age of at least x years, or a high school degree and an age of at least y years, with y being much greater than x. I vary x and y in order to try to capture the effect of age. Unfortunately, age does little to change the main results.

In table 3.10, the results using the original definition of skill are compared with two different definitions of skill. Skew 1 corresponds to high skill being defined as workers with a college degree and an age of at least 25, or anyone with an age of at least 60 years. Skew 2 corresponds to high skill being defined as workers with a college degree and an age of at least 30, or anyone with an age of at least 55 years. With the change in the definition of skill, the migration premium changes as well as the skill premium.
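The Skew 1 and Skew 2 definitions of skill can be written as a small classification rule. The following is a minimal sketch: the function and variable names are hypothetical, and it assumes that the original definition depends only on holding a college degree, which is how high skill is described elsewhere in the chapter.

```python
# Age cutoffs (x, y) taken from the Skew 1 and Skew 2 descriptions in the text.
AGE_CUTOFFS = {"skew1": (25, 60), "skew2": (30, 55)}


def is_high_skill(has_college_degree, age, definition="original"):
    """Classify a worker as high skill under one of the three definitions."""
    if definition == "original":
        # Assumption: the original definition uses education only.
        return has_college_degree
    x, y = AGE_CUTOFFS[definition]
    # College graduates need at least x years of age; anyone aged y or more qualifies.
    return (has_college_degree and age >= x) or age >= y


# A 24-year-old college graduate is high skill only under the original definition.
young_graduate = [is_high_skill(True, 24, d) for d in ("original", "skew1", "skew2")]
# A 57-year-old without a college degree is high skill only under Skew 2.
older_worker = [is_high_skill(False, 57, d) for d in ("original", "skew1", "skew2")]
```

Under these rules the skewed definitions move young graduates out of the high skill group and older workers into it, which is why both the migration premium and the skill premium have to be recomputed.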
The first stages still pass the F-statistic rule of thumb (using the original instrument). All of the results for the skewed definitions of skill are within one standard error of the original estimates. Therefore, incorporating age into the definition of skill does little to change the main results.

Table 3.10: Change in Skill Premium with Different Skill Definitions

                                  In-Migration Skill Premium
                            Original      Skew 1       Skew 2
Migration Premium           0.421***      0.325*       0.544**
                            (0.152)      (0.191)      (0.231)
Constant                   -0.451***     -0.345       -0.548
                            (0.154)      (0.189)      (0.225)
Observations                  8,891        8,891        8,891
Number of czone                 741          741          741
First-stage F-statistic       20.30        18.97        13.20
CZ controls                      No           No           No

Note: All regressions include year dummies. Standard errors are in parentheses and are clustered at the CZ level. *** p<0.01, ** p<0.05, * p<0.1

3.6 Conclusion

The main question asked in the previous sections was what effect the skill composition of internal migration between CZs has on local income inequality. I used a shift-share instrument to isolate a shift in relative labor supply due to migration. The instrument creates the expected amount of migration if historical migration patterns held in the 2005-2017 data set. The instrument passes both the standard weak instruments rule of thumb and a type of exogeneity test.

I find that migration into a CZ that shifts the relative labor supply curve out will cause an increase of 0.42 in the skill premium. The increase in the skill premium is due to low skill workers experiencing a large decrease in their incomes while high skill workers see only a modest, statistically insignificant decrease in their incomes. Shifting the relative labor supply curve seems to have little effect on the unemployment rate or labor force participation for either skill type.

Further research into migration and inequality will need to use administrative data, much like Yagan (2014), Notowidigdo (2011), and Bartik (2017).
Administrative data that are not bound by confidentiality constraints will be better able to measure out-migration. And although in-migration is well documented in the ACS, administrative data that use a finer level of geography than the PUMA will improve the measurement of migration flows. Other papers analyzing migration with survey data have run into similar problems of choosing a geography and having low sample sizes in some rural areas (Kennan and Walker (2011) is one example).

This paper shows why migration might not be a powerful tool to reduce the effects of a recession.16 During recessions, migration tends to decrease, which Monras (2015) notes is an optimal response to negative labor demand shocks. Policies that increase migration, such as subsidies, might have adverse effects on labor markets receiving high or low skill workers. A migration subsidy would potentially negatively affect non-college educated workers in a desirable location even if college educated workers are the most likely to migrate.

I present two explanations for the increase in the skill premium (and the 75/25 ratio). First, occupations that are relatively more mobile also pay workers less on average. This correlation is much stronger for high skill workers. One reason for this correlation is that occupational licensing that gives certain occupations higher income also creates barriers to mobility (Schleicher, 2017). A second reason is that higher paid workers might be less likely to receive a better offer of employment in another location. Higher paid workers tend to be older, which could mean higher migration costs as well, leading to a lower probability of moving given an increase in income in another location.

Second, firms are able to adjust skill requirements for vacancies during recessions because the benefits of such restructuring are higher (Hershbein and Kahn, 2018).
Migration then becomes a useful option for college educated workers because younger workers can fill vacancies and gain experience even in a recession. Low skill workers are harmed because the jobs that the high skill migrants are taking have been upskilled. This creates a knock-on effect hurting many low skill workers. Hence, income inequality increases rather than decreases with more high skill migrants entering a local labor market.

16 Particularly, the decrease in average incomes for some workers.

Finally, the analysis in this chapter has been rather agnostic about why people are migrating. While the agnosticism is on purpose, I want to think through why people move and how that should shape our interpretation of the results. One obvious economic source of migration occurs when capital changes location. When this happens, in-migration increases into the areas that see an influx of capital. The results show that migrants are going to be younger and more educated. Hence, even if the capital influx is a headquarters, requiring educated and experienced workers, many of the resulting migrants are going to be educated but less experienced. And the competition margin between staying workers and migrants will be lower on the income distribution than expected.

APPENDICES

APPENDIX A

APPENDIX TO CHAPTER 1

A.1 Is Labor Supply Increasing in Productivity?

The FOC for $l_2$ on (??) is

$$w_2 \theta_2 (1 + d) = v'(l_2 + \alpha d) \qquad (A.1)$$

Differentiate this by $w_2$,

$$\theta_2 (1 + d) + w_2 \theta_2 \frac{\partial d}{\partial w_2} = v''(l_2 + \alpha d) \left( \frac{\partial l_2}{\partial w_2} + \alpha \frac{\partial d}{\partial w_2} \right) \qquad (A.2)$$

which gives an expression for the change in labor supply,

$$\frac{\partial l_2}{\partial w_2} = \frac{\theta_2 (1 + d)}{v''(l_2 + \alpha d)} + \frac{w_2 \theta_2 - \alpha v''(l_2 + \alpha d)}{v''(l_2 + \alpha d)} \frac{\partial d}{\partial w_2} \qquad (A.3)$$

The FOC for $d$ on (??) is

$$w_2 \theta_2 l_2 = \alpha v'(l_2 + \alpha d) \qquad (A.4)$$

Again, differentiate by $w_2$,

$$\theta_2 l_2 + w_2 \theta_2 \frac{\partial l_2}{\partial w_2} = \alpha v''(l_2 + \alpha d) \left( \frac{\partial l_2}{\partial w_2} + \alpha \frac{\partial d}{\partial w_2} \right) \qquad (A.5)$$

which gives

$$\frac{\partial d}{\partial w_2} = \frac{\theta_2 l_2 + (w_2 \theta_2 - \alpha v''(l_2 + \alpha d)) \frac{\partial l_2}{\partial w_2}}{\alpha^2 v''(l_2 + \alpha d)} \qquad (A.6)$$

Putting (A.6) into (A.3) gives

$$v''(l_2 + \alpha d) \frac{\partial l_2}{\partial w_2} = \theta_2 (1 + d) + (w_2 \theta_2 - \alpha v''(l_2 + \alpha d)) \frac{\theta_2 l_2 + (w_2 \theta_2 - \alpha v''(l_2 + \alpha d)) \frac{\partial l_2}{\partial w_2}}{\alpha^2 v''(l_2 + \alpha d)} \qquad (A.7)$$

Apply $\alpha (1 + d) = l_2$, which follows from dividing (A.4) by (A.1), to get

$$(2 \alpha v''(l_2 + \alpha d) - w_2 \theta_2) \frac{\partial l_2}{\partial w_2} = \theta_2 l_2 \qquad (A.8)$$

Differentiate the FOC on labor supply by $l_2$ to get $w_2 \theta_2 = \alpha v''(l_2 + \alpha d)$, which gives the result

$$\frac{\partial l_2}{\partial w_2} = \frac{l_2}{w_2} > 0 \qquad (A.9)$$

An increase in the productivity of high-type workers will increase labor supply. The elasticity of labor supply with respect to productivity is one.

A.2 Does an Equilibrium Always Exist?

Disequilibrium can occur when a firm's marginal product exceeds the equilibrium wage plus the change in the wage due to a change in labor supply. That is, (??) becomes an inequality

$$g'(L) > \tilde{w}_2 + \frac{\partial \tilde{w}_2}{\partial L} L \qquad (A.10)$$

At this point, each firm wants to increase demand for high type workers but is unable to because doing so will reduce the wage of the high skill workers. Workers at the firm that decreases $\tilde{w}_2$ will leave the firm. The firm is clearly not optimizing but it has no ability to decrease the number of high type workers employed. In essence, the equilibrium is under-determined because (??) only gives a bound on $L$. Using the other equilibrium conditions, (A.10) becomes

˜w2 − w2 < −g(cid:48)(cid:48)(L) α + g(cid:48)(L)θ2 α ˜w2 w2 (cid:18) (cid:19) (A.11)

The left hand side is the rent-seeking payment to high type workers, and the right hand side is proportional to the change in the marginal product of labor. Hence, no unique equilibrium exists if the rent-seeking payment is too small compared to the change in the marginal productivity of labor. It is difficult to assess when (A.11) is not true.
However, the parameter values used in the numerical results can be varied to give an indication of when an equilibrium exists.

Figure A.1: The black areas represent parameter values for which an equilibrium exists.

APPENDIX B

APPENDIX TO CHAPTER 3

B.1 Appendix

This appendix covers the data issues associated with calculating out-migration and contains many supplementary figures and tables used in the paper.

B.1.1 Where Do Movers Come From?

The ACS does not report the commuting zone for respondents. The lowest level of geography is the Public Use Microdata Area (PUMA), which is used to determine where a respondent is located. PUMAs are determined with each new census and are areas that encompass at least a hundred thousand people. Using PUMAs as the lowest level of geography has some advantages for the Census Bureau. First, PUMAs cover the entire US including the territories.1 Second, they are built using a collection of census tracts and counties.2 Third, PUMAs are (or should be) geographically contiguous.

1 I do not use the territories in my analysis.

2 There is not a one-to-one mapping between counties and PUMAs though.

For economics, PUMAs are a poor choice of geography because they do not have any relation to an economic concept. PUMAs are too large for neighborhoods and too small for local labor markets. For instance, Los Angeles county contains 69 PUMAs.

Dorn (2009) uses a simple method for aggregating PUMAs into CZs. The method splits individuals across CZs because CZs are collections of counties but PUMAs can cut across counties. To split individuals, Dorn (2009) uses the intersection of two probabilities to reweight individuals in a given PUMA-CZ pair. The first probability is the probability that you are in county c and PUMA j at time t. The second probability is the probability that you are in CZ k and county c at time t. Note that this second probability is either one or zero, depending on whether you are in a county that comprises CZ k.
Summing across all counties in the CZ gives the new weight. Let r denote the number of people in an area. Then, from Dorn (2009),

$$w_{jkt} = \sum_{c \in C} \frac{r_{jct}}{r_{jt}} \frac{r_{ckt}}{r_{ct}} \qquad (B.1)$$

where $C$ is the set of all counties comprising CZ $k$. These weights are contained in David Dorn's online PUMA to CZ crosswalk files. Each weight splits people in a PUMA across any intersecting CZs. PUMAs with a low number of people located in the CZ receive a low weight.

The problem is more complicated when you want to look at migration. The Census Bureau reports different PUMAs for individuals who migrated in the past year; these are called migration PUMAs (MIGPUMAs). MIGPUMAs are collections of PUMAs, which means that each PUMA maps into exactly one MIGPUMA. MIGPUMAs also change with every census. Two additional issues appear for MIGPUMAs. First, MIGPUMAs are larger than PUMAs, so a MIGPUMA could contain area that is outside of a single CZ. Second, data for $r_{jct}$ only exist for census years. The second problem is the main obstacle because I have to estimate $r_{jct}$ for off-census years. Luckily, county and PUMA population estimates are available for each year, and so I can estimate $r_{jct}$ for years where I do not have the data.

To see the problem more concretely, Figure B.1 shows the intersection of CZs, PUMAs, and MIGPUMAs. In the figure, there is one MIGPUMA that is the combination of PUMA 1, 2, 3, and 4. Even though CZ1 is a part of PUMA1 and PUMA2, mapping MIGPUMA1 into CZ1 requires splitting observations into CZ2 and CZ3 as well. Splitting a non-mover in PUMA1, by contrast, requires splitting the individual between CZ1 and CZ2.

To estimate the change in $r_{jct}$ each year, I assume that population growth for the part of county c in PUMA j is equal to the total county population growth rate. This assumption is harmless when the county is completely contained within the PUMA.
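The weighting scheme can be sketched in a few lines. This is an illustration under stated assumptions, not the code behind the crosswalk files: the function names, the dictionary layout, and the toy populations are hypothetical. It uses the fact that the CZ-county term in (B.1) equals one for counties inside the CZ, takes the PUMA population to be the sum of its county overlaps, and applies the growth assumption just stated by scaling each census-year overlap by the county's population growth since the census.

```python
def project_overlap(r_jcs, r_ct, r_cs):
    """Scale the census-year PUMA-county overlap by county population growth."""
    return r_jcs * r_ct / r_cs


def cz_weight(overlap_by_county, counties_in_cz):
    """Share of a PUMA's population living in counties that belong to the CZ.

    `overlap_by_county` maps county -> population of the PUMA-county overlap.
    Assumption: the PUMA population is the sum of these overlaps.
    """
    puma_population = sum(overlap_by_county.values())
    inside = sum(pop for county, pop in overlap_by_county.items()
                 if county in counties_in_cz)
    return inside / puma_population


# Toy PUMA that straddles county A (in CZ 1) and county B (in CZ 2).
census_overlap = {"A": 30_000, "B": 70_000}
county_census = {"A": 100_000, "B": 200_000}
county_now = {"A": 110_000, "B": 200_000}  # county A grew by 10%

overlap_now = {
    c: project_overlap(census_overlap[c], county_now[c], county_census[c])
    for c in census_overlap
}
weight_cz1 = cz_weight(overlap_now, {"A"})
weight_cz2 = cz_weight(overlap_now, {"B"})
```

In the toy example the projected overlaps are 33,000 and 70,000, so the weights for a person in this PUMA are 33,000/103,000 for CZ 1 and 70,000/103,000 for CZ 2, and they sum to one.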
For counties that are only partially contained in PUMA j, this assumption could lead to slight mismeasurement in the weighting scheme. I use equation (B.1) but replace $r_{jct}$ with

$$\hat{r}_{jct} = r_{jcs} \frac{r_{ct}}{r_{cs}} \qquad (B.2)$$

where $s$ is the most recent past census year. Once we have the new weights, we can easily solve the aggregation problem because we have the correct weighting scheme to split individuals.

Figure B.1: PUMAs, MIGPUMAs, and CZs. MIGPUMA 1 is PUMA 1, 2, 3, and 4 combined.

In order to generate these weights, I use a number of different datasets. County population estimates come from the US Census Bureau. The US Census Bureau does not report PUMA population estimates, but these can be calculated using the ACS. The Missouri Data Center's GEOCORR creates correlation tables between counties and PUMAs. These correlation tables give the percentage of the population in a given county that falls in a given PUMA. Each percentage is from the most recent past census. MIGPUMA to PUMA crosswalks are from IPUMS.

Figure B.2: National migration rates by state and CZ, broken down by year. The graph is not broken down by education group.

B.1.2 Results Tables and Figures

Table B.1: Summary statistics

Year    Skill Premium    Income HS    Income LS    In-Migration Premium
2005        1.944          83,212       35,110          1.00874
2006        1.929          82,482       34,638          1.00213
2007        1.926          83,929       35,022          0.9982
2008        1.909          82,356       34,727          0.9989
2009        1.990          81,693       32,811          0.9996
2010        1.989          79,983       32,262          0.9966
2011        1.963          79,719       31,958          1.00411
2012        1.977          78,802       32,105          1.00222
2013        1.949          79,664       32,434          1.00511
2014        1.908          81,249       33,114          1.00434
2015        1.921          81,285       33,298          1.00927
2016        1.958          82,680       33,846          1.00701
2017        1.932          85,026       35,382          1.00848

Note: All quantities are averaged over all 741 CZs.

Table B.2: Movers versus Stayers, summary statistics

                           Movers      Stayers
Age                         35.57        42.55
                          (11.62)      (12.05)
Kids                        0.656        0.711
                           (1.02)       (0.99)
Married                     0.424        0.583
                           (0.49)       (0.49)
Foreign-born                0.146        0.186
                          (0.353)      (0.389)
Female                      0.458        0.477
                          (0.498)      (0.499)
Veteran                     0.076        0.063
                          (0.264)      (0.242)
Hours worked per week        41.2         40.6
                           (11.6)       (11.1)
Full-time, full-year         0.56         0.71
                          (0.496)      (0.453)

Note: All quantities are averaged over all 13 years. Standard deviations are in parentheses below the means.

Table B.3: Change in High Skill Unemployment

                                     High Skill Unemployment
                            1 Year       2 Year       3 Year       4 Year
In-Migration Premium       -0.0139*     -0.00274      0.00153      0.00664
                          (0.00824)    (0.00613)    (0.00553)    (0.00520)
Constant                    0.0124       0.00132     -0.00329      0.000639
                          (0.00833)    (0.00625)    (0.00562)    (0.00528)
Observations                 8,891        8,149        7,407        6,666
Number of czone                741          741          741          741
First-stage F-statistic      14.93        14.66        14.96        14.63

Note: Each column uses the same dependent variable, the change in high skill unemployment. All regressions include year dummies. Standard errors are in parentheses and are clustered at the CZ level. *** p<0.01, ** p<0.05, * p<0.1

Table B.4: Change in Low Skill Unemployment

                                     Low Skill Unemployment
                            1 Year       2 Year       3 Year       4 Year
In-Migration Premium       -0.0270*     -0.0303***   -0.0182*     -0.00667
                           (0.0141)     (0.0107)    (0.00944)    (0.00887)
Constant                    0.0192       0.0248**     0.0125       0.0336***
                           (0.0142)     (0.0107)    (0.00941)    (0.00886)
Observations                 8,891        8,149        7,407        6,666
Number of czone                741          741          741          741
First-stage F-statistic      14.93        14.66        14.96        14.63

Note: Each column uses the same dependent variable, the change in low skill unemployment. All regressions include year dummies. Standard errors are in parentheses and are clustered at the CZ level.
*** p<0.01, ** p<0.05, * p<0.1

Table B.5: Relative Employment and Relative Migration Responses

VARIABLES                       HS/LS Employment   Out-Migration Premium   HS/LS NILF
                                      OLS                  2SLS               2SLS
Instrument                         0.00906***
                                  (0.00212)
Lagged In-Migration Premium                              -0.876             -0.0212
                                                         (0.849)            (0.219)
Constant                          -0.0136                 1.93**            -0.0265
                                  (0.00582)              (0.860)            (0.224)
Observations                        8,892                 8,891              8,891
Number of czone                       741                   741                741
Year dummies                          Yes                   Yes                Yes

Note: Standard errors are in parentheses and are clustered at the CZ level. *** p<0.01, ** p<0.05, * p<0.1

Table B.6: Migration Responses by Skill

VARIABLES                       HS Out-Migration    LS Out-Migration
Lagged In-Migration Premium        0.00519***          0.00305***
                                  (0.00139)           (0.00121)
Constant                          -0.00521***         -0.00305***
                                  (0.00140)           (0.00122)
Observations                        8,892               8,891
Number of czone                       741                 741
Year dummies                          Yes                 Yes

Note: Standard errors are in parentheses and are clustered at the CZ level. *** p<0.01, ** p<0.05, * p<0.1

Figure B.3: Spatial distribution of the skill premium across CZs.

BIBLIOGRAPHY

Daron Acemoglu and Pascual Restrepo. Artificial Intelligence, Automation, and Work, book 8. University of Chicago Press, January 2018. URL http://www.nber.org/chapters/c14027.
Laurence Ales, Antonio Andres Bellofatto, and Jessie Jiaxu Wang. Taxing atlas: Executive compensation, firm size and their impact on optimal top income tax rates. Review of Economic Dynamics, 26:62–90, October 2017.
Franklin Allen. Optimal linear income taxation with general equilibrium effects on wages. Journal of Public Economics, 17:135–143, 1982.
David Arseneau and Sanjay Chugh. Ramsey meets hosios: The optimal capital tax and labor market efficiency. International Finance Discussion Papers 870, Board of Governors of the Federal Reserve System, September 2006.
Anthony Atkinson, Thomas Piketty, and Emmanuel Saez. Top incomes in the long run of history. Journal of Economic Literature, 49(1):3–71, 2011.
Alexander Bartik. Worker adjustment to changes in labor demand: Evidence from longitudinal census data. Working Paper, January 2017.
Nathaniel Baum-Snow and Ronni Pavan. Understanding the city size wage gap. Review of Economic Studies, 79(1):88–127, 2012.
Nathaniel Baum-Snow, Matthew Freedman, and Ronni Pavan. Why has urban inequality increased? American Economic Journal: Applied Economics, 10(4):1–42, 2018.
Paul Beaudry, David A. Green, and Benjamin M. Sand. The declining fortunes of the young since 2000. American Economic Review: Papers and Proceedings, 104(5):381–386, May 2014.
Paul Beaudry, David A. Green, and Benjamin M. Sand. The great reversal in the demand for skill and cognitive tasks. Journal of Labor Economics, 34(1):S199–247, 2016.
Oliver Blanchard and Lawrence Katz. Regional evolutions. Brookings Papers on Economic Activity, 23(1):1–76, 1992.
Francine Blau and Lawrence Kahn. Immigration and the distribution of incomes. In Barry R. Chiswick and Paul W. Miller, editors, Handbook of the Economics of International Migration, pages 793–843. Elsevier, Amsterdam, 2015.
Robin Boadway. Tax policy for a rent-rich economy. Canadian Public Policy, 41(4):253–264, December 2015.
Robin Boadway and Jean-Francois Tremblay. Optimal income taxation and the labour market: An overview. CESifo Economic Studies, 59(1):93–148, March 2013.
Jan Boone and Lans Bovenberg. Optimal labour taxation and search. Journal of Public Economics, 85:53–97, 2002.
David Card. Immigrant inflows, native outflows, and the local labor market impacts of higher immigration. Journal of Labor Economics, 19(1):22–64, January 2001.
Center for Economic and Policy Research. Acs uniform extracts, version 1.4. Washington D.C., 2019.
Pierre-Philippe Combes, Gilles Duranton, and Laurent Gobillon. Spatial wage disparities: Sorting matters! Journal of Urban Economics, 63(2):723–742, March 2008.
Pierre-Philippe Combes, Gilles Duranton, Laurent Gobillon, and Sebastien Roux. Sorting and local wage and skill distributions in france. Regional Science and Urban Economics, 42(6):913–930, November 2012.
Peter Diamond and James Mirrlees. Optimal taxation and public production I: Production efficiency. American Economic Review, 61(1):8–27, March 1971.
Rebecca Diamond. The determinants and welfare implications of US workers’ diverging location choice by skill: 1980-2000. American Economic Review, 106(3):479–524, March 2016.
Egbert Dierker and Hans Haller. Tax systems and direct mechanisms in large finite economies. Journal of Economics, 52(2):99–116, 1990.
David Dorn. Essays on Inequality, Spatial Interaction, and the Demand for Skills. PhD thesis, University of St. Gallen, September 2009.
Farid Farrokhi. Skill, agglomeration, and inequality in the spatial economy. Technical report, Working Paper, 2018.
Farid Farrokhi and David Jinkins. Wage inequality and the location of cities. Journal of Urban Economics, 111:76–92, May 2019.
Christopher S. Fowler, Leif Jensen, and Danielle C. Rhubart. Assessing U.S. labor market delineations for containment, economic core, and wage correlation. Unpublished, August 2019.
Paul Goldsmith-Pinkham, Isaac Sorkin, and Henry Swift. Bartik instruments: What, when, why, and how. Working Paper 24408, National Bureau of Economic Research, July 2018.
Nadia Granato, Anette Haas, Silke Hamann, and Annekatrin Niebuhr. The impact of skill-specific migration on regional unemployment disparities in Germany. Journal of Regional Science, 55(4):513–539, September 2015.
Michael J. Greenwood. Internal migration in developed countries. In Mark Rosenzweig and Oded Stark, editors, Handbook of Population and Family Economics, volume 1B, chapter 12. Elsevier, 1997.
Mathew Hauer and James Byars. IRS county-to-county migration data, 1990-2010. Demographic Research, 40:1153–1166, May 2019.
Brad Hershbein and Lisa B. Kahn. Do recessions accelerate routine-based technological change? Evidence from vacancy postings. American Economic Review, 108(7):1737–1772, July 2018.
Arthur Hosios. On the efficiency of matching and related models of search and unemployment. Review of Economic Studies, 57:279–298, 1990.
Mathias Hungerbuhler, Etienne Lehmann, Alexis Parmentier, and Bruno Van Der Linden. Optimal redistributive taxation in a search equilibrium model. Review of Economic Studies, 73:743–767, 2006.
Laurence Jacquet, Etienne Lehmann, and Bruno Van Der Linden. Optimal redistributive taxation with both extensive and intensive responses. Journal of Economic Theory, 148:1770–1805, 2013.
John Kennan and James Walker. The effect of expected income on individual migration decisions. Econometrica, 79(1):211–251, January 2011.
Wojciech Kopczuk. A note on optimal taxation in the presence of externalities. Economics Letters, 80:81–86, 2003.
Nicholas Lawson. Taxing the job creators: Efficient progressive taxation with wage bargaining. Working Paper, 2015.
Etienne Lehmann, Alexis Parmentier, and Bruno Van Der Linden. Optimal income taxation with endogenous participation and search unemployment. Journal of Public Economics, 95:1523–1537, 2011.
Daniel Leonard and Ngo Van Long. Optimal Control Theory and Static Optimization in Economics. Cambridge University Press, Cambridge, 1992.
Ethan Lewis and Giovanni Peri. Immigration and the economy of cities and regions. In Gilles Duranton, J. Vernon Henderson, and William C. Strange, editors, Handbook of Regional and Urban Economics, volume 5, chapter 10, pages 625–685. Elsevier, 2014.
James Mirrlees. An exploration in the theory of optimum income taxation. Review of Economic Studies, 38:175–208, 1971.
Joan Monras. Economic shocks and internal migration. IZA Discussion Paper 8840, Institute for the Study of Labor, February 2015.
Enrico Moretti. Real wage inequality. American Economic Journal: Applied Economics, 5(1):65–103, 2013.
Matthew J. Notowidigdo. The incidence of local labor demand shocks. NBER Working Paper, 2011.
Barbara Petrongolo and Christopher Pissarides. Looking into the black box: A survey of the matching function. Journal of Economic Literature, 39(2):390–431, June 2001.
Thomas Piketty, Emmanuel Saez, and Stefanie Stantcheva. Optimal taxation of top labor incomes: A tale of three elasticities. American Economic Journal: Economic Policy, 6(1):230–271, 2014.
Thomas Piketty, Emmanuel Saez, and Gabriel Zucman. Distributional national accounts: Methods and estimates for the United States. Working Paper 22945, National Bureau of Economic Research, December 2016.
Casey Rothschild and Florian Scheuer. Optimal taxation with rent-seeking. Review of Economic Studies, 83:1225–1262, 2016.
Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas, and Matthew Sobek. IPUMS USA: Version 9.0 [dataset]. IPUMS, Minneapolis, MN, 2019.
Emmanuel Saez. Using elasticities to derive optimal income tax rates. Review of Economic Studies, 68:205–229, 2001.
Emmanuel Saez. Optimal income transfer programs: Intensive versus extensive labor supply responses. Quarterly Journal of Economics, 117:1039–1073, 2002.
Emmanuel Saez and Thomas Piketty. Optimal labor income taxation. In Alan J. Auerbach, Raj Chetty, Martin Feldstein, and Emmanuel Saez, editors, Handbook of Public Economics, volume 5, chapter 7, pages 391–474. Elsevier, 2013.
Emmanuel Saez, Joel Slemrod, and Seth H. Giertz. The elasticity of taxable income with respect to marginal tax rates: A critical review. Journal of Economic Literature, 50(1):3–50, March 2012.
Bernard Salanie. The Economics of Contracts: A Primer, 2nd Edition. MIT Press Books, 2005.
Florian Scheuer and Ivan Werning. Mirrlees meets Diamond-Mirrlees: Simplifying nonlinear income taxation. Working Paper 22076, National Bureau of Economic Research, March 2016.
David Schleicher. Stuck! The law and economics of residential stability. Yale Law Journal, 127(78):78–154, 2017.
Joel Shapiro. Income taxation in a frictional labor market. Journal of Public Economics, 88:465–479, 2004.
Joseph Stiglitz. Self-selection and Pareto efficient taxation. Journal of Public Economics, 17:213–240, 1982.
James Stock and Motohiro Yogo. Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, chapter Testing for Weak Instruments in Linear IV Regression. Cambridge University Press, 2005.
Charles M Tolbert and Molly Sizer. US commuting zones and labor market areas. ERS Staff Paper, 1996.
Tom VanHeuvelen. Recovering the missing middle: A mesocomparative analysis of within-group inequality, 1970-2011. The American Journal of Sociology, 123(4):1064–1116, January 2018.
Chunbing Xing. Migration, self-selection, and income distributions: Evidence from rural and urban China. Economics of Transition, 22(3):539–576, 2014.
Danny Yagan. Moving to opportunity? Migratory insurance over the Great Recession. Working Paper, 2014.