MACHINES, ANALYSTS, AND FINANCIAL MARKETS

By

Xinyu Wang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Business Administration—Finance, Doctor of Philosophy

2021

ABSTRACT

MACHINES, ANALYSTS, AND FINANCIAL MARKETS

By

Xinyu Wang

In the first essay of this dissertation, I provide evidence on the usefulness of machine learning techniques for predicting a firm's future earnings and its implied cost of capital. These techniques have the potential to offer significant marginal explanatory power over prior approaches, which rely on either linear models or analyst forecasts. I adopt a deep neural network approach that incorporates lagged and contemporaneous accounting variables to predict future earnings. My evidence demonstrates that this forecasting approach offers significant explanatory power that can improve on analyst forecasts. In addition, the deep learning approach outperforms linear models and displays substantially less bias than human analyst forecasts. When I turn to the implied cost of capital derived from my earnings forecast model, I find that my deep-learning-based estimates significantly outperform popular linear-model-based estimates. I argue that these findings have interesting implications for a variety of questions in finance and accounting.

In the second essay of this dissertation, I turn from machines to people and consider reputational spillovers from buy-side analysts to sell-side analysts after a financial misconduct event. I detect evidence of negative reputational spillovers in the form of diminished market reliance on recommendations by sell-side analysts after buy-side analysts from the same brokerage firm are associated with financial misconduct. This penalty is significantly related to the buy-side analyst's gender, suggesting that market participants condition their expectations regarding analyst behavior on analyst-specific characteristics.
This dissertation is dedicated to my parents, Dr. Gang Wang and Ms. Shuming Yu.

ACKNOWLEDGEMENTS

I would like to express my deepest gratitude to my adviser and dissertation committee chair, Professor Charles Hadlock, who has provided me with kind help and support throughout my PhD life. Dr. Charles Hadlock not only taught me how to do serious research but also offered guidance on study, career, and, more importantly, life. His personal charm lights up my way, and his optimistic attitude toward life has impressed me deeply. His encouragement motivates me to go further no matter what difficulties I meet. It is my great honor to have him as my adviser.

I would like to thank my committee members, Professor Andrei Simonov, Professor John (Xuefeng) Jiang, and Professor Nuri Ersahin. Without their persistent help, I could not have finished this dissertation and graduated from the PhD program.

I would like to thank my parents, Dr. Gang Wang and Ms. Shuming Yu. They have firmly and selflessly supported me through my PhD life, my studies, and my whole life. As parents, they built a healthy family environment for me, spared no effort to raise me, and taught me a philosophy of life and conduct that also influences my academic career. Without my parents, I could not have made the first achievement of my scholarly life. I feel very lucky to have been born into this family!

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER 1. The Implied Cost of Capital: A Deep Learning Approach
  1. INTRODUCTION
  2. LITERATURE REVIEW
    2.1. Implied Cost of Capital
    2.2. Deep Learning
  3. DATA AND EMPIRICAL METHODOLOGY
    3.1. Data
    3.2. Deep Learning Model
    3.3. The Benchmark: Cross-sectional Linear Regression
    3.4. Computing the ICCs
  4. RESULTS
    4.1. Descriptive Statistics of the Deep-Learning-Based and Linear-Regression-Based Earnings Forecasts
    4.2. Comparison of the Deep-Learning-Based and Linear-Model-Based Earnings Forecasts
    4.3. Performance of Deep Learning Model and Linear Regression on Estimating Composite ICC
    4.4. Comparison of the Individual Deep Learning Model-based ICCs and the Individual Linear Model-based ICCs
    4.5. Realized Returns, ICCs, and Firm Characteristics
  5. ADDITIONAL EVIDENCE
    5.1. Extra Predictive Power of Deep Learning on Earnings
    5.2. Comparison of Deep Learning and Analysts on Earnings Forecasts
    5.3. Deep Learning Model with the Common Variables in the Linear Model
  6. CONCLUSION
  BIBLIOGRAPHY
CHAPTER 2. Reputational Spillovers and Information Production by Analysts
  1. INTRODUCTION
  2. LITERATURE REVIEW
    2.1. Financial Misconduct
    2.2. Gender Difference
  3. DATA AND EMPIRICAL METHODOLOGY
    3.1. Data
    3.2. Empirical Methodology
  4. RESULTS
    4.1. Comparison of Average CAR Value Before and After Buy-Side Analysts' Misconducts
    4.2. Comparison in One Year Before and After the Misconduct Dates
    4.3. Comparison of Non-Misconduct Related Recommendations
  5. CONCLUSION AND FUTURE POTENTIAL DIRECTIONS
  BIBLIOGRAPHY

LIST OF TABLES

Table 1: Descriptive Statistics of the Variables in the Deep Learning Model.
Table 2: Descriptive Statistics of Estimated Earnings.
Table 3: Comparison of Deep Learning's Forecasts and Model's Forecasts on Earnings.
Table 4: The Predictability Power of Composite ICC on Future Returns, Based on Deep Learning and Linear Model.
Table 5: Individual ICCs and Realized Returns.
Table 6: Realized Returns, Composite ICCs of Deep Learning, and Firm Characteristics.
Table 7: Efficiency of Deep-Learning-Based Earnings Per Share (EPS).
Table 8: Comparison of Analysts' Forecasts and Deep Learning's Forecasts on Earnings.
Table 9: Comparison of Deep Learning and Linear Model, with the Same 6 Independent Variables.
Table 10: Comparison of Average CAR Value Before and After Buy-Side Misconducts.
Table 11: Comparison of Average CAR Value Before and After Buy-Side Misconducts, by Gender.
Table 12: Moving All Misconduct Dates 1 Year Earlier.
Table 13: Moving All Misconduct Dates 1 Year Later.
Table 14: Moving All Misconduct Dates 1 Year Earlier, by Gender.
Table 15: Moving All Misconduct Dates 1 Year Later, by Gender.
Table 16: Comparison of Average CAR Value of Brokerages Without Misconducts.
Table 17: Comparison of Average CAR Value of Brokerages Without Misconducts, by Gender.

LIST OF FIGURES

Figure 1: An Example of the Operation Process of a Single Neuron.
Figure 2: The Basic Structure of a Deep Learning Model.
Figure 3: An Example of the Financial Misconduct Report.

CHAPTER 1. The Implied Cost of Capital: A Deep Learning Approach

1.
INTRODUCTION

Estimating a firm's expected returns is a key issue in finance and accounting, and the implied cost of capital (ICC), as a popular proxy for expected returns, has been widely applied in the literature.[1] Prior researchers have largely relied on time-series estimates (e.g., Ball and Watts, 1972; Brooks and Buckmaster, 1976; Brown and Rozeff, 1978; Myers, Drake, Bradshaw, and Myers, 2012) and/or analyst earnings forecasts as inputs into the implied cost of capital (or expected returns) calculation. However, these approaches have some important limitations, including limited data availability on analyst forecasts and substantial noise when relying on time-series estimates. In a major step forward, Hou, van Dijk, and Zhang (2012) offer a model-based framework to estimate future earnings and a firm's corresponding implied cost of capital (or expected returns). This approach has proven quite popular and has been relied on in several recent studies.

In this paper, I follow the principles behind the Hou et al. (2012) approach by considering a model-based method for predicting earnings. Instead of relying on the classic linear regression model, I use deep learning techniques trained on common accounting information to offer a substantial improvement to the earnings prediction model. Since both simple regression models and the complicated thought processes of analysts appear to uncover important elements of a firm's earnings dynamics, my hope is that a computer trained to think in some ways like the unstructured brain of an analyst but with more discipline (e.g., no optimism bias) may offer the best of both worlds and therefore a substantially improved prediction ability.

Recent advances in deep learning, a subbranch of machine learning techniques in artificial intelligence, appear ideally suited to this task. In particular, the deep neural network approach simulates the activity of neurons in the human brain when reacting to informational inputs such as quantitative data, text, or imagery. While current models of this type are far less sophisticated than the human brain, they offer a potential compensating advantage when it comes to prediction in that various types of human subjectivity bias can be eliminated. Thus, the performance of these types of models in predicting earnings and estimating the associated implied cost of capital is ultimately an empirical question.

To conduct this analysis, I generate deep-learning-based earnings forecasts for up to five future years and calculate the implied cost of capital using these estimates of earnings. I show that my derived earnings forecasts do offer marginal explanatory power above and beyond analyst forecasts.

[1] Previous literature about implied cost of capital (ICC) estimation includes Gordon and Gordon (1997), Gebhardt, Lee, and Swaminathan (2001), Claus and Thomas (2001), Easton (2004), Ohlson and Juettner-Nauroth (2005), etc. Implied cost of capital has been treated as a good proxy for expected returns (Pástor, Sinha, and Swaminathan, 2008; Li, Ng, and Swaminathan, 2013), and it has been applied to study the relevance of expected returns to various types of risk (Lee, Ng, and Swaminathan, 2009; Chava and Purnanandam, 2010; Hwang, Lee, Lim, and Park, 2013; Dhaliwal, Judd, Serfling, and Shaikh, 2016; Kalev, Saxena, and Zolotoy, 2019), labor mobility (Donangelo, 2014), information asymmetry (El Ghoul, Guedhami, Ni, Pittman, and Saadi, 2013), political factors (Boubakri, Guedhami, Mishra, and Saffar, 2012; Boubakri, El Ghoul, and Saffar, 2014), option trading (Naiker, Navissi, and Truong, 2013), financial constraints (Campbell, Dhaliwal, and Schwartz, 2012), etc. A list of top finance and accounting journal papers using ICC as a dependent variable can be found in Lee, So, and Wang (2020).
Moreover, the deep-learning forecasts display substantially less bias than the human analyst forecasts. When I turn to predicting returns and the implied cost of capital from my earnings forecast model, I find that my deep-learning-derived estimates significantly outperform the linear-model-based estimates of Hou et al. (2012).

This study contributes to the literature in multiple ways. First, this is the first study that generates an implied cost of capital by an approach that has both model-based features and human-brain-based features. Prior studies do not allow these two types of features to be combined. Deep learning is a machine learning approach that is ideally suited to this task, as it is an artificial intelligence method for combining the powerful yet unstructured creative nature of human thinking with the discipline of computer-directed and structured optimization. Thus, one contribution of this study is to illustrate, in one specific context, the promise of this approach for addressing important issues in finance and accounting more generally.

A second contribution of this study is to offer a substantial step forward in more accurately estimating a firm's ICC. In particular, I show that 38 of the most common accounting items can be used by a deep neural network to generate earnings predictions that lead to an ICC that predicts returns substantially better than prior model-based approaches. Thus, any researcher interested in estimating an implied cost of capital would be well served by using the techniques explored in this study.

An additional contribution of this study is that I establish that the behavioral factors in human thinking that lead to biases in analyst predictions can be eliminated by deep-learning methods. While Hou et al. (2012) have established that linear regression models can also lead to bias-free predictions, I show that the data can be used much more richly and in a human-like way while still eliminating this flaw in true human thinking.
Thus, machine learning has the potential to richly and creatively process information while placing structure and discipline on the information aggregation process. Finally, my study shows that machine learning and deep learning can be used in prediction contexts related to firms' real activities rather than solely financial market predictions. While prior authors have explored deep learning models in the areas of asset pricing, portfolio management, and mortgage risk, the promise of this approach in predicting the outcome of a firm's real activities (earnings, investment, growth) has not been widely recognized. Hopefully this study will lead to additional work in which predicting the values of real variables is an important underlying economic issue.

The rest of the paper is organized as follows. Section 2 reviews the literature, while Section 3 introduces the data, models, and empirical methodology. In Section 4, I present the main results regarding the performance of the deep learning models, in both an absolute and a relative sense, in forecasting earnings and in estimating the implied cost of capital. Section 5 reports additional evidence and robustness checks, while Section 6 concludes.

2. LITERATURE REVIEW

2.1. Implied Cost of Capital

The expected return is a major topic in research related to firms' resource allocation and decision making. Since ex ante expected returns are unknown, researchers in the past used historical realized returns to forecast expected returns based on time-series models. However, academic research (e.g., Froot and Frankel, 1989; Elton, 1999) indicates that expected returns from time-series models do not work well, mainly because ex post realized returns are noisy. Moreover, for new firms with limited histories, it is hard to obtain a proper estimate of expected returns from an insufficient record of ex post realized returns.
Starting with Botosan (1997), researchers have widely used the implied cost of capital (ICC) as a new proxy for the expected return. The ICC is the discount rate that equates the share price to the present value of expected future cash flows. It appeals to researchers because of the present value relation involved as well as the use of forecasts of firms' future fundamentals. According to Wang (2017), more than 70 ICC papers were published in the top journals of finance and accounting from 1997 to 2016. In recent studies, researchers use ICCs as a proxy for expected returns to study the relevance of expected returns to regulations or acts (Ashbaugh-Skaife, Collins, Kinney, and Lafond, 2009; Dhaliwal, Krull, and Li, 2007), disclosure (Botosan, 1997; Botosan and Plumlee, 2002; Francis, Khurana, and Pereira, 2005b; Francis, Nanda, and Olsson, 2008), option trading (Naiker et al., 2013), risk (Chava and Purnanandam, 2010; Hwang et al., 2013; Dhaliwal et al., 2016), financial constraints (Campbell et al., 2012), information asymmetry (El Ghoul et al., 2013), auditor characteristics (Chen, Chen, Lobo, and Wang, 2011; Khurana and Raman, 2006; Krishnan, Li, and Wang, 2013), conservatism (García Lara, García Osma, and Penalva, 2011), tax (Dhaliwal, Krull, Li, and Moser, 2005; Dhaliwal, Heitzman, and Li, 2006; Goh, Lee, Lim, and Shevlin, 2016), and internal control weakness (Ogneva, Subramanyam, and Raghunandan, 2007). Since the ICC is an important proxy for expected returns in academic research, a good way of estimating it can materially affect research findings.

Most studies treat financial analysts' forecasted earnings as the expected future cash flows when calculating the ICC. However, analysts' forecasts contain not only objective information but also analysts' subjective judgments, which may bias the resulting ICC estimates.
Also, since financial analysts forecast earnings for only a limited number of firms, using analysts' forecasts may limit firm coverage. Motivated by concerns about bias, firm coverage, and mixed results on forecasting future realized returns, Hou et al. (2012) introduce an innovative method for estimating the ICC. They first use a cross-sectional model to estimate earnings and then calculate the ICC as the average of the ICCs from five commonly used estimation methods. By doing the cross-sectional estimation, they reduce the bias in earnings estimates relative to analysts' forecasts, increase the range of firms covered in the analysis, and improve the performance of the estimated ICCs in forecasting future realized returns. This cross-sectional, linear-model-based method has been widely used in recent studies.

2.2. Deep Learning

Machine learning is a set of algorithms that give machines the ability to learn by themselves without being explicitly instructed or programmed. The phrase "machine learning" was first used by Arthur Samuel in 1952. Over a long history of development, many machine learning techniques have been applied in the real world, including in business, health care, and security. Traditional machine learning techniques include Bayesian model averaging, random forests, nearest neighbors, support vector machines, and neural networks. Depending on the purpose, people rely on either linear machine learning algorithms (for instance, LASSO and ridge regression) or non-linear machine learning algorithms (for example, decision trees) for more complex tasks. Based on the characteristics of each algorithm, machine learning is used for pattern forecasting, regression, or classification. In recent years, artificial intelligence and machines have been applied in various areas, and automated valuations have started playing an important role in the capital market.
Researchers use machine learning and artificial intelligence techniques in areas such as asset pricing (Bianchi, Büchner, and Tamoni, 2019; Brogaard and Zareei, 2018; Ye and Zhang, 2019; Bloch, 2019), credit risk management (Altman, Marco, and Varetto, 1994; Kim and Sohn, 2004; Abdou, Alam, and Mulkeen, 2014; Khandani, Kim, and Lo, 2010; Bonelli, Figini, and Giovannini, 2017), exchange rate prediction (Peng and Albuquerque, 2019), and fraud detection (Bertomeu, Cheynel, Floyd, and Pan, 2019; Bao, Ke, Li, Yu, and Zhang, 2019). For example, Ding, Lev, Peng, Sun, and Vasarhelyi (2019) show that machine learning can improve managerial estimates: they use machine learning techniques to generate loss estimates using data from insurance companies and conclude that the machines' estimates are better than the managers' actual estimates.

Though machine learning techniques share common characteristics such as self-learning and updating ability, the operating principles behind them vary. Moving into the era of big data, people need to handle tasks with huge amounts of data and more complex analysis structures, and traditional machine learning algorithms face obstacles in application. For example, most of the time one must manually determine which features (or variables) to use before constructing a traditional machine learning model. This feature creation process, or feature engineering, generally depends on domain knowledge in the area where the model is applied. For example, a doctor diagnoses an illness based on symptoms, and the symptoms are the features used to make the judgment. But professional knowledge based on past experience may not always be a good source for finding the best features for a machine learning model, and human beings have limited time, ability, and energy to deal with huge amounts of data.
Thus, people began to consider algorithms that can not only complete the learning process of traditional machine learning but also perform feature engineering efficiently in place of human beings. To mimic complicated human behavior such as feature engineering, one naturally looks for an algorithm that closely simulates the operation of the brain. A neural network is a machine learning technique that borrows features from biological neurons, and a deep neural network, the deep learning algorithm discussed in this paper, is a more complicated version of the neural network: an advanced machine learning technique mimicking the operating principles of biological neurons and the human brain. Different from traditional neural networks, a deep neural network generally includes multiple hidden layers between an input layer and an output layer, which allows it to analyze input data in many more dimensions. With more layers and neurons, a deep neural network can transform information through a complicated network structure in a way that imitates human decision making. According to Zhou and Feng (2017), the layer-by-layer processing principle and representation learning ability are the keys to the success of deep neural networks across different areas. These two characteristics allow deep learning models to generate new features during the learning process without human involvement and ensure feature transformation within the model structure, which is more efficient than traditional machine learning algorithms. Besides, the multilayer structure and the choice of activation function at the neuron level provide a stronger ability to analyze big data. These advantages make deep learning a promising machine learning technique for handling big data, reducing human involvement, and providing better estimates once the model is well trained.
Thus, in recent years, deep learning has become one of the most widely applied machine learning algorithms in the real world, with applications ranging from Google's AlphaGo to detecting Covid-19 cases (Zheng, Deng, Fu, Zhou, Feng, Ma, Liu, and Wang, 2020). In business, however, based on the current literature, this technique imitating neurons and the human brain has been employed only in portfolio management (Heaton, Polson, and Witte, 2017), mortgage risk management (Sirignano, Sadhwani, and Giesecke, 2018), and asset pricing (Chen, Pelger, and Zhu, 2019; Messmer, 2017). This paper is the first to apply deep learning techniques to earnings forecasts and ICC estimation.

3. DATA AND EMPIRICAL METHODOLOGY

3.1. Data

The firm data come from Compustat and CRSP. The study includes firms listed on the NYSE, Amex, and Nasdaq with CRSP share codes of 10 or 11. All firms in the sample must be available in both Compustat and CRSP, and the data range from 1962 to 2018. When constructing the accounting variables for the deep learning model, I first rank the frequency of all accounting items in Compustat across the whole time period and select the seventy most frequent items. Next, I calculate the correlation between each pair of the seventy variables and, if a pair has a correlation above 0.9, drop the variable with the lower frequency. In the end, 38 Compustat variables remain in the sample. For the variables used in estimating the linear-model-based earnings and ICCs, I follow the definitions in Hou et al. (2012) and construct the variables for the pooled cross-sectional regressions. I also obtain analysts' forecasted earnings and firms' actual earnings from I/B/E/S.

3.2. Deep Learning Model

In this paper, the deep learning model I employ is a deep neural network. A deep neural network is a type of machine learning technique that imitates the operating process of the human brain.
Different from general neural networks, deep neural networks have more than two layers, meaning that there is at least one hidden layer between the input layer and the output layer, which leads to a more sophisticated estimation process. Once input data become available, the information is passed through the "neurons" on the different layers and finally reaches the output layer, which reports the output of the whole analysis process. With enough training data, a deep neural network can adjust itself and provide judgments on the final output, much like a human being.

As indicated in Amel-Zadeh, Calliess, Kaiser, and Roberts (2020), the deep learning approach is a model that maps independent variables to a predicted value with a set of parameters: ŷ = f(x; Θ). In particular, Θ represents a set of weights and biases, so the formula can be rewritten as ŷ = f(x; w, b), in which w represents the weights and b represents the biases. Learning is the process of finding the optimal values of w and b so that ŷ = f(x; w, b) is the best estimate of the actual value y. Thus, the general procedure is to use the training data to find the optimal values of w and b first and then to use the input data to calculate the forecasts.

In this paper, the number of inputs of a single unit (or neuron) is determined by the number of neurons on the previous layer. For instance, a single neuron on the first hidden layer has 38 input variables from the input layer of the model. Each neuron has a non-linear activation function, which combines the inputs and produces the neuron's output for the next layer. The process is shown in Figure 1. There are many common forms of activation functions (e.g., Sigmoid, Tanh, ReLU, etc., according to Zhou (2019)).
I use ReLU, a commonly used activation function:

a = max(0, z)    (1)

Thus, the output value of a single neuron i equals

a_i = max(0, Σ_{j=1}^{n} w_{ij} x_j + b_i)    (2)

where a_i is the output value of neuron i, w_{ij} is the weight between input j and neuron i, x_j is the value of input j, and b_i is the bias.

Figure 1: An Example of the Operation Process of a Single Neuron

Once a neuron produces an output value, that output becomes one of the inputs for the neurons on the next layer. Having discussed the principles behind single neurons, we turn to the full picture of the model shown in Figure 2. The deep learning model I apply in this research includes two hidden layers between the input layer and the output layer. The input layer consists of the 38 accounting items from financial statements. The information is passed through the "links" between neurons following the principles introduced in Figure 1, with each link associated with a specific weight. An estimated value ŷ is calculated at the output layer. The above procedure from inputs to ŷ is called "forward propagation." At the very beginning, when the deep neural network is an initial one, or "a blank brain," the weights are assigned randomly (e.g., all weights can take an equal value). During the training process, once ŷ is calculated using inputs from the training data, the machine applies mean squared error (MSE) as the loss function to compare the estimated value ŷ with the actual value y, and uses an optimizer to update the weights and biases in f(x; Θ). The optimizer used in this research is RMSprop, an optimizer similar to the gradient descent algorithm with momentum. After repeating the training process above, a deep learning model with optimal parameters is obtained, and earnings forecasts can be calculated from the model and new input data.
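To make the forward-propagation step concrete, the following is a minimal NumPy sketch of a network with two ReLU hidden layers feeding a linear output. The hidden-layer widths (64 and 32) and the random initialization are illustrative assumptions, not the trained parameters or architecture details of the study's actual model.

```python
import numpy as np

def relu(z):
    # Activation function from equation (1): a = max(0, z)
    return np.maximum(0.0, z)

def forward(x, params):
    """Forward propagation through the network.

    params is a list of (W, b) pairs; each neuron computes
    a_i = max(0, sum_j w_ij * x_j + b_i), as in equation (2).
    The final layer is linear, producing the earnings forecast y-hat.
    """
    a = x
    for W, b in params[:-1]:
        a = relu(W @ a + b)          # hidden layers use ReLU
    W_out, b_out = params[-1]
    return W_out @ a + b_out         # linear output layer

# Illustrative dimensions: 38 accounting inputs, two hidden layers, one output
rng = np.random.default_rng(0)
sizes = [38, 64, 32, 1]
params = [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=38)              # one firm-year of standardized inputs
y_hat = forward(x, params)           # scalar earnings forecast, shape (1,)
```

In the actual model, the weights would come from training against the MSE loss with the RMSprop optimizer rather than from random initialization; this sketch only illustrates how information flows from the 38 inputs to ŷ.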
Figure 2: The Basic Structure of a Deep Learning Model

For each year from 1968 to 2018, I use the data of the past 10 years as training data and construct five deep learning models for five earnings horizons (from one year ahead to five years ahead) for year t. To account for information availability, I apply a 3-month lag to financial reporting: I treat the accounting data (including the actual earnings in the training set) of firms with fiscal years ending between April of the previous year and March of the current year as the data for the current year. Thus, to estimate the models for year t, both the accounting items and the actual earnings from financial statements must be available before March of year t. I repeat this process to construct deep learning models for each year t on a 10-year rolling basis. After training the models for each year t, again considering information availability, I feed the accounting items of companies with fiscal years ending between April of year t − 1 and March of year t into the deep neural network models for year t to compute earnings forecasts for year t + λ (λ = 1 to 5). For each firm, in each year, I generate estimated earnings for up to five future years. Since the deep neural network models in this study require non-missing observations for all 38 independent variables, to maximize the coverage of firms in the estimation, I choose the 38 independent variables based on the frequency of available observations and the correlations among variables. Table 1 presents a list of the accounting items used in the deep learning model and their descriptive statistics.

Table 1: Descriptive Statistics of the Variables in the Deep Learning Model

3.3. The Benchmark: Cross-sectional Linear Regression

Following the steps introduced in Hou et al.
(2012), for each year t from 1968 to 2018, I use the past 10 years of data to run pooled cross-sectional linear regressions:

E_i,t+λ = α0 + α1·A_i,t + α2·D_i,t + α3·DD_i,t + α4·E_i,t + α5·NegE_i,t + α6·AC_i,t + ε_i,t+λ    (3)

where E_i,t+λ is firm i's earnings in year t + λ, for λ = 1, 2, 3, 4, or 5; A_i,t denotes the total assets of year t from Compustat; D_i,t is the dividend payment in year t; DD_i,t equals 1 if firm i is a dividend payer in year t and 0 otherwise; E_i,t is the earnings of year t; NegE_i,t equals 1 if firm i has negative earnings in year t and 0 otherwise; and AC_i,t represents accruals in year t. As with the deep learning model, a three-month lag is applied to account for accounting information availability. For each year t and each firm i, I multiply the accounting numbers of firms whose fiscal years end between April of year t − 1 and March of year t by the coefficients from the linear model for year t to calculate the forecasted earnings for year t + λ, λ = 1 to 5.

3.4. Computing the ICCs

The ICC is the discount rate that equates the security's share price to the sum of the present values of its future cash flows. The prior literature mainly uses five methods to calculate the ICC. Following Hou et al. (2012), I use the average of the ICCs calculated by the five most commonly used methods: Gordon and Gordon (1997), Gebhardt et al. (2001), Claus and Thomas (2001), Easton (2004), and Ohlson and Juettner-Nauroth (2005). I first calculate the five ICCs individually using the five years of forecasted earnings from both the deep learning model and the linear regression. The market equity and stock price at the end of each June are also used in the calculation.
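The pooled benchmark regression of Equation (3) and the three-month reporting-lag convention used throughout this section can be sketched as follows. This is an illustrative reimplementation with hypothetical function names, not the code used in the study.

```python
import numpy as np

def data_year(fy_end_year, fy_end_month):
    """Map a fiscal-year end to its 'data year' under the 3-month lag:
    fiscal years ending April of year t-1 through March of year t
    are treated as data for year t."""
    return fy_end_year + 1 if fy_end_month >= 4 else fy_end_year

def fit_hou_model(E_future, A, D, DD, E, NegE, AC):
    """Pooled OLS of E_{i,t+lambda} on a constant and the six regressors
    of Eq. (3); returns the estimates of (alpha_0, ..., alpha_6)."""
    X = np.column_stack([np.ones_like(E), A, D, DD, E, NegE, AC])
    coef, *_ = np.linalg.lstsq(X, E_future, rcond=None)
    return coef
```

Forecasts for year t + λ are then the fitted values obtained by applying the estimated coefficients to the year-t accounting numbers, exactly as Equation (3) prescribes.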
After obtaining the five ICCs for both the deep learning model and the linear regression model, I compute the "composite" ICC, the equal-weighted average of these five implied cost of capital values, for each firm and each year under each model. Since in some cases the valuation equation cannot reach a converged solution given the earnings forecasts and other accounting variables, five valid ICCs are not always available for each firm and each year. I keep an observation if at least one implied cost of capital value is available and calculate the "composite" ICC using only the available ICCs for the firm in year t.

4. RESULTS

4.1. Descriptive Statistics of the Deep-Learning-Based and Linear-Regression-Based Earnings Forecasts

Table 2 reports the averages across years of the mean and median one-year, two-year, and three-year ahead earnings forecasts estimated by the deep learning model (Panel A) and by the Hou et al. (2012) linear regression (Panel B) in each period listed, as well as the correlations between the earnings forecasts from deep learning and linear regression (Panel C). When constructing the table, the market equity on the last day of June of each year is used to scale the earnings forecasts from both models. Although extreme earnings forecasts for certain firms affect the means of the estimated earnings, the median columns in Panel A show a decreasing trend in earnings forecasts from the late 1970s that reverses after 2008, for both the deep-learning-based and the linear-regression-based forecasts. In general, the medians of the earnings forecasts estimated by linear regression are larger than those estimated by the deep learning model for all three horizons in each period after the late 1970s, indicating that the linear model provides more optimistic earnings forecasts than the deep learning model.
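The composite-ICC averaging rule described above (keep a firm-year whenever at least one of the five individual ICCs converges, and average only the available values) can be sketched as follows; None marks a non-convergent method, and the function name is mine.

```python
def composite_icc(individual_iccs):
    """Equal-weighted average of the available individual ICCs.

    individual_iccs: up to five values, with None where the valuation
    equation did not converge. Returns None only when no method
    produced a value.
    """
    available = [v for v in individual_iccs if v is not None]
    if not available:
        return None
    return sum(available) / len(available)
```

For example, a firm-year for which only two of the five estimates converge, say 0.08 and 0.10, receives a composite ICC of 0.09; a firm-year with no convergent estimate is dropped.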
However, to determine which model provides less biased or otherwise better estimates, further tests using actual earnings are needed; those results appear in the following subsection. Table 2 also reports the number of firms with earnings estimated by the deep learning model (Panel A) and by the linear model (Panel B) in each defined period. For the deep learning model, the number of firms increased from around 2,500 in the 1968-1973 period to almost 7,000 at the end of the 20th century, and decreased to around 4,000 in recent years. The linear model shows a similar pattern but covers more distinct firms in every sub-period. The main reason is that the deep learning model requires all 38 variables to be non-missing to provide an earnings estimate, while the linear model needs information on only six variables. Obtaining earnings forecasts with the deep learning model therefore requires more available data than doing so with the linear model. Panel C shows the correlations between the deep-learning-based and the linear-regression-based forecasted earnings. Since both models rely on the quantitative information in financial statements, the correlations between the earnings forecasts are generally higher than 0.90, except for those between the two-year-horizon deep learning forecasts and the other forecasts (from 0.74 to 0.94). To further examine which method generates better estimates of earnings and ICCs, I discuss the analysis of earnings forecasts and ICCs in the following sections.
Table 2: Descriptive Statistics of Estimated Earnings

This table includes descriptive statistics of the one-year, two-year, and three-year ahead estimates of earnings generated by deep learning (Panel A) and the linear forecast model (Panel B), and the correlations between the deep-learning-based forecasts and the linear-regression-based forecasts (Panel C). Et+1, Et+2, and Et+3 represent the one-year, two-year, and three-year ahead earnings forecasts; they are divided by market equity at the end of June of each year for scaling purposes. N is the number of distinct companies for which the forecasts are obtainable during each period. The sample period is 1968-2015.

4.2. Comparison of the Deep-Learning-Based and Linear-Model-Based Earnings Forecasts

Table 3 compares the performance of the deep-learning-based earnings forecasts to that of the linear-model-based forecasts. I compute averages across all years from 1968 to 2018 of the mean and median forecast bias, forecast accuracy, and the annual earnings response coefficients, as well as the differences between the two sets of earnings forecasts. I use this sample because generating the earnings forecasts requires the past ten years of observations for each year, and the analysis includes the earnings forecasts as well as three years of actual earnings. The comparison is based on a common set of firm-years: every observation has an available actual value (income before extraordinary items) from Compustat and available earnings forecasts from both the deep learning model and the cross-sectional linear regression. Each comparison covers the one-year, two-year, and three-year horizons.

Table 3: Comparison of Deep Learning's Forecasts and Model's Forecasts on Earnings

This table compares the bias, accuracy, and annual earnings response coefficient (ERC) of the deep learning model's earnings forecasts and the linear regression model's earnings forecasts.
Bias equals actual earnings minus forecasted earnings, divided by market equity at the end of June of each year for scaling. Accuracy is the absolute value of forecast bias. The annual ERC is the coefficient from regressing buy-and-hold returns on forecast bias. To make the annual ERCs of the deep learning model and the linear model comparable, I standardize the bias to unit variance before running the regression. All numbers are time-series averages. The sample is from 1968 to 2018. Panel A reports averages of the mean and median bias across years. Panel B reports averages of the mean and median accuracy across years. Panel C reports averages of the annual ERC. The numbers under each average are t-statistics.

Following the definitions in Hou et al. (2012), forecast bias equals actual earnings minus forecasted earnings, divided by the market equity value at the end of June for scaling purposes. In Table 3, the biases calculated from the earnings forecasts of the deep learning model and the linear model are all negative for the one-year, two-year, and three-year horizons, showing that both models provide optimistic forecasts on average relative to realized earnings. On the same sample base, the deep learning models provide less biased earnings forecasts (-0.0498, -0.0274, and -0.0184 for the one-, two-, and three-year-ahead forecasts) than the linear regression (-0.0740, -0.0852, and -0.1153), based on the average of the yearly means. The differences between the forecast biases of the two approaches are 0.0242 for the one-year-ahead forecast, 0.0578 for the two-year-ahead forecast, and 0.0968 for the three-year-ahead forecast. The results indicate that the deep learning estimates of earnings are less biased than those of the linear model at all three horizons. Using the average of medians across years instead, I reach a consistent conclusion.
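The bias, accuracy, and annual ERC measures just defined can be sketched as follows. The function names are my own; the annual ERC is the slope from regressing buy-and-hold returns on the bias after standardizing it to unit variance, as described above.

```python
import numpy as np

def forecast_bias(actual, forecast, market_equity):
    """Bias = (actual - forecast) / June market equity.
    Accuracy is simply the absolute value of this quantity."""
    return (actual - forecast) / market_equity

def annual_erc(buy_hold_returns, bias):
    """Slope from regressing returns on bias standardized to unit variance."""
    z = bias / bias.std()
    X = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(X, buy_hold_returns, rcond=None)
    return coef[1]
```

The standardization step matters because the two models' biases have different dispersions; without it, the ERC slopes would not be on a comparable scale.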
Forecast accuracy, discussed in Panel B, is the absolute value of forecast bias. For the deep learning model, the time-series averages of the mean forecast accuracy are 0.1601 for the one-year horizon, 0.1995 for the two-year horizon, and 0.2138 for the three-year horizon, while for the linear regression the averages are 0.1837, 0.1984, and 0.2224. The differences between the averages of mean forecast accuracy are -0.0237 for the one-year-ahead estimates, 0.0011 for the two-year-ahead estimates, and -0.0086 for the three-year-ahead estimates, showing that the deep learning model performs significantly better at the one-year horizon (the t-statistic of the difference is -4.07). At the two-year and three-year horizons, the average forecast accuracies of the two models are very close. The earnings response coefficient (ERC), which captures the market's response to the difference between the actual and forecasted values, is a better way of measuring the quality of earnings forecasts. Following the prior literature, I apply the "annual ERC" approach defined in Hou et al. (2012); it tests whether the earnings forecasts match the expectations of the market. I regress buy-and-hold returns on forecast bias, and the coefficients are the annual ERCs. To make the annual ERCs from the two models comparable, I standardize the forecast bias to unit variance before running the regression. I then average the annual ERCs across years. For the deep learning model, the averages are 0.0624 for the one-year-horizon forecast, 0.2034 for the two-year-horizon forecast, and 0.3506 for the three-year-horizon forecast, which are larger than those from the linear model (0.0501, 0.1522, and 0.2531, respectively).
The differences between the averages are 0.0123, 0.0512, and 0.0975 for the one-, two-, and three-year-ahead forecasts, all positive, showing that the market reaction to unexpected earnings is larger for the deep learning model. Because the gap between actual and forecasted earnings triggers a larger market reaction, the estimates from the deep learning model better match market expectations, and the deep learning earnings forecasts are a better proxy for expected earnings.

4.3. Performance of the Deep Learning Model and the Linear Regression in Estimating the Composite ICC

Following Guay, Kothari, and Shu (2011) and Hou et al. (2012), I measure the performance of the deep learning model and the linear regression in estimating the composite ICC by evaluating the ability of the composite ICC to predict future realized returns. Since the implied cost of capital is a proxy for expected returns, a good ICC estimate should be able to predict future realized returns. After estimating the earnings forecasts, I generate the individual ICCs and the composite ICC using the earnings forecasts from the deep learning and linear regression models. In the following analysis, I sort firms into ten deciles according to the rank of their composite ICCs at the end of June of each year, where group 1 contains the firms with the lowest ICCs and group 10 the firms with the highest. Each group is treated as a portfolio. Next, I calculate the equal-weighted buy-and-hold return of each group for each year, from July of the current year to June of the next year. Then, for each portfolio separately, I average the equal-weighted buy-and-hold returns across all years in the sample. Since these are equal-weighted average returns, the results can easily be affected by extreme return values of specific firms, such as small firms.
To avoid this effect, I trim the annualized returns for each year at the 1st and 99th percentiles (excluding the extreme values beyond those percentiles). As a proxy for expected returns, the ICC should have forecasting power for future returns. Thus, we expect a higher average return for a decile with a higher ICC rank, and the monotonically increasing pattern should be more pronounced for the model with the better ICC estimates. In Table 4, comparing the results from deep learning and linear regression, the difference between group 10 and group 1 for the deep learning model is 0.0685 (0.1513 - 0.0828), while the difference for the linear regression model is 0.0438 (0.1336 - 0.0899), for the annualized buy-and-hold return between year t and year t+1. The difference for the deep learning model (0.0685) is larger than that for the linear regression model (0.0438). For the annualized buy-and-hold returns from year t to year t+2, the difference for the deep learning model (0.0873 - 0.0159 = 0.0714) is still larger than that for the linear regression model (0.0894 - 0.0234 = 0.0659), though the gap is much smaller than at the one-year horizon (0.0714 - 0.0659 = 0.0055 at the two-year horizon). At the three-year horizon, although the linear regression performs slightly better than deep learning, the magnitude is very small (0.0719 - 0.0711 = 0.0008). Thus, judging by the spread between group 10 and group 1, the deep learning model generates earnings forecasts that lead to better estimated ICCs on average at a relatively short horizon (one year), while at the longer horizons (two and three years) the earnings forecasts from the two models generate ICCs with similar predictive power for future realized returns.
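The 10-1 spread test just described can be sketched as a simple decile sort (an illustration with a hypothetical function name; in the study the returns are also trimmed at the 1st and 99th percentiles before averaging, which is omitted here):

```python
import numpy as np

def decile_spread(composite_icc, buy_hold_returns):
    """Sort firms into ten ICC-ranked groups and return the equal-weighted
    average return of decile 10 (highest ICC) minus decile 1 (lowest)."""
    order = np.argsort(composite_icc)
    deciles = np.array_split(order, 10)  # group 1 = lowest ICC, group 10 = highest
    low = buy_hold_returns[deciles[0]].mean()
    high = buy_hold_returns[deciles[-1]].mean()
    return high - low
```

Averaging this spread across years gives the 10-1 numbers reported in Table 4; a larger and more monotonic spread indicates a better ICC proxy for expected returns.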
In addition, compared to the average portfolio returns in the linear regression case, the average returns for the deep learning model follow a relatively monotonic increasing pattern, especially at the one-year horizon, showing that the deep-learning-based ICC has better predictive ability on average than the linear-regression-based ICC.

Table 4: The Predictive Power of the Composite ICC for Future Returns, Based on Deep Learning and the Linear Model

The table compares the averages across years of the annualized equal-weighted buy-and-hold returns for each decile. Firms in each decile are ranked by the composite ICC from the deep learning and linear models at the end of June each year, and are reclassified each June. The horizons are one, two, and three years ahead, and the sample period is 1968 to 2018.

4.4. Comparison of the Individual Deep-Learning-Model-Based ICCs and the Individual Linear-Model-Based ICCs

I follow the same principle behind Table 4 to construct Table 5 using the five types of individual ICCs from both the deep learning model and the linear regression. In Panel A, I calculate the correlations between the individual ICCs from the deep learning model and the linear regression. The correlations among the individual ICCs from the linear model are higher than the corresponding correlations from deep learning: the linear-model correlations range from 0.6656 (OG and CT) to 0.9048 (Gordon and OJ), while the deep-learning correlations range from 0.2367 (OG and GLS) to 0.6083 (MPEG and CT). Also, considering the correlations between the deep-learning-based and linear-model-based ICCs, I find that, except for OJ, for each of the other four methods the correlations between different methods under the same deep learning model are lower than the correlation between the same method under the two different models.
More specifically, the correlation between GLS under deep learning and GLS under the linear model is 0.5298, which is higher than the correlations between GLS under deep learning and the other methods under deep learning (from 0.2367 to 0.4261). The other three methods follow the same pattern (0.7035 vs. 0.4124-0.6038 for CT, 0.6435 vs. 0.3160-0.6083 for MPEG, and 0.6427 vs. 0.3160-0.5588 for Gordon). The results indicate that, at least under the deep learning model, the choice of ICC estimation method matters, and averaging the ICCs is necessary for the analysis in this study. In addition, the correlations between the composite ICC and the individual ICCs are higher than the correlations among the individual ICCs. For the deep learning model, these correlations range from 0.5857 (composite and GLS) to 0.8715 (composite and OJ), while for the linear model they range from 0.8558 (composite and CT) to 0.9259 (composite and Gordon). A potential reason for the higher correlations is the smaller number of available individual ICC observations relative to the composite ICC. The results show that the composite ICC represents the implied cost of capital in this research better than any of the five individual ICCs alone. Panel B reports the results of the predictive power tests of Table 4 applied to the individual deep-learning-model-based ICCs, while Panel C reports the results for the individual linear-model-based ICCs. Only the 10-1 spreads are included in the two panels. Comparing the results in Panels B and C, I find that for the one-year period, except for the OJ method (0.0439 vs.
0.0476), the 10-1 spreads of the average returns for the other four individual deep-learning-model-based ICCs are larger than the corresponding spreads for the individual linear-model-based ICCs, showing that not only the composite ICC but also the individual ICCs obtained with the deep learning model have equivalent or stronger predictive power for future realized returns. This result further indicates that the deep-learning-model-based ICCs are better proxies for expected returns than the linear-model-based ICCs. For the two-year and three-year horizons, the results are relatively mixed across the individual ICCs: when evaluating the predictive power of the ICC for expected returns at longer horizons, the specific method matters for comparing the performance of the deep learning model and the linear model. For example, when GLS is used to generate the ICCs, the deep learning model produces ICCs with stronger predictive power for future realized returns at all three horizons, while the ICCs from the linear model outperform those from the deep learning model when MPEG is the calculation method. Potential reasons include the scarcity of available estimates as well as the different structures of the methods. Thus, the composite ICC, which averages the ICCs from the five methods and brings the largest number of observations into the analysis, is more reliable for the analysis of predictive power than the individual ICCs.

Table 5: Individual ICCs and Realized Returns

The table reports the correlations between the individual implied costs of capital from the different models and the composite implied cost of capital (Panel A), and the means across years of the 10-1 return spreads for the individual deep-learning-based ICCs (Panel B) and the individual linear-regression-based ICCs (Panel C).
The correlation between any pair of ICCs is calculated using the same set of observations with available values for both ICCs. The sample period is 1968 to 2018.

4.5. Realized Returns, ICCs, and Firm Characteristics

In the prior asset pricing literature, researchers mainly treat realized returns as the proxy for expected stock returns when testing whether certain firm characteristics can explain the variation in expected returns. Since the previous sections conclude that the deep-learning-based implied cost of capital is a better proxy for expected returns, the firm characteristics that explain the variation in realized returns should also explain the variation in the deep-learning-model-based composite ICCs, provided that the variation in realized returns explained by those characteristics indeed represents variation in expected returns. In this section, following Hou et al. (2012), I include 13 firm-level characteristics in the tests. For each year, Beta is the market beta computed from the past sixty monthly returns (with a minimum of 24 months available) for each stock in June. Size is the natural logarithm of market equity on the last day of June. Leverage is the ratio of book debt to book equity. NOA is the ratio of net operating assets to lagged total assets. BE/ME is the natural logarithm of the ratio of book equity to market equity at the end of the last fiscal year. CAPEX is capital expenditure divided by lagged total assets. Idiosyncratic volatility is the standard deviation of the residuals from the market model estimated over the last sixty monthly returns (with a minimum of 24 months available) in each year's June. Asset growth is the growth rate of total assets. Accruals is the ratio of accruals to lagged total assets. Analyst coverage is the number of analysts following a given company in June.
Analyst dispersion is the standard deviation of analysts' estimates divided by the company's stock price on the last day of June. Earnings smoothness is the ratio of earnings volatility to operating cash flow volatility computed from the last ten years of data (with a minimum of five years available). Accruals quality is calculated according to the modified Jones (1991) method introduced in Francis, LaFond, Olsson, and Schipper (2005a), using a cross-sectional regression of accruals on a constant, the change in sales revenue, and gross property, plant, and equipment (PPE), with all dependent and independent variables divided by lagged total assets for scaling purposes. I run the regression for each Fama and French (1997) industry with twenty or more observations in a given year. I then multiply the parameter estimates from the regression by a constant, the difference between the change in sales revenues and the change in accounts receivable, and PPE, respectively, scale them by lagged total assets, and obtain firm-level normal accruals. Accruals quality is the absolute value of abnormal accruals, the difference between actual accruals divided by lagged total assets and normal accruals. Following Hou et al. (2012), I place an additional negative sign in front of the original earnings smoothness and accruals quality values, so that these variables better represent the smoothness of earnings and the accruals quality of firms. I run Fama-MacBeth regressions of realized returns as well as the deep-learning-model-based ICCs on the 13 firm-level characteristics to check whether these characteristics can explain the variation in realized returns and ICCs. The sample period is 1968 to 2018 for both realized returns and ICCs. The composite ICCs are estimated in June of each year, and realized returns are the returns from July of the current year to June of the following year.
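The Fama-MacBeth procedure used here can be sketched as follows. This is a generic illustration with hypothetical names; the actual study additionally computes Newey-West t-statistics on the time series of yearly coefficients, which is omitted.

```python
import numpy as np

def fama_macbeth_slope(y_by_year, x_by_year):
    """For each year, run a cross-sectional OLS of y (realized returns
    or composite ICC) on a constant and one characteristic x; report
    the time-series mean of the yearly slopes."""
    slopes = []
    for year in sorted(y_by_year):
        y = np.asarray(y_by_year[year], dtype=float)
        x = np.asarray(x_by_year[year], dtype=float)
        X = np.column_stack([np.ones_like(x), x])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        slopes.append(coef[1])
    return float(np.mean(slopes))
```

The numbers reported in Table 6 are these time-series means of the yearly cross-sectional coefficients, with significance judged from Newey-West t-statistics on the slope series.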
In Table 6, I list the averages of the coefficients across years, along with time-series Newey-West t-statistics, for the different regressions. Panel A presents the results from regressing realized returns on the 13 characteristics separately, while Panel B presents those from regressing the composite ICCs on the same characteristics.

Table 6: Realized Returns, Composite ICCs of Deep Learning, and Firm Characteristics

This table shows the means of the coefficients across years (with time-series Newey-West t-statistics) from annual Fama-MacBeth regressions of firm-level realized returns or the deep-learning-model-based composite implied cost of capital on different firm characteristics. Panel A is for realized returns, while Panels B and C are for the ICC. Firm characteristics are defined following Hou et al. (2012). The sample period is 1968 to 2018. *, **, and *** indicate significance at the 10%, 5%, and 1% levels, respectively.

The results in Panels A and B can be categorized into three groups. First, both the realized returns and the deep-learning-based ICCs are positively related to five characteristics: BE/ME (0.044 for realized returns vs. 0.048 for the ICCs), leverage (0.001 for both), idiosyncratic volatility (0.160 vs. 0.741), analyst dispersion (0.290 vs. 0.803), and earnings smoothness (0.016 vs. 0.001). Among these characteristics, BE/ME has significant coefficients for both realized returns (t-statistic of 5.24) and the ICCs (t-statistic of 13.32), while leverage (t-statistic of 4.18), idiosyncratic volatility (t-statistic of 6.03), and analyst dispersion (t-statistic of 6.92) have significant coefficients only in the ICC regressions.
Second, both realized returns and the composite ICCs are negatively related to six characteristics: size (-0.003 for realized returns vs. -0.043 for the ICCs), CAPEX (-0.078 vs. -0.083), asset growth (-0.036 vs. -0.029), accruals (-0.079 vs. -0.058), NOA (-0.016 vs. -0.130), and analyst coverage (-0.002 vs. -0.003). In particular, size has a significant coefficient only in the ICC regression (t-statistic of -9.79); CAPEX has significantly negative coefficients in the regressions of both realized returns (t-statistic of -3.14) and the ICCs (t-statistic of -4.13); asset growth has significant coefficients in both regressions (t-statistics of -2.65 and -2.60); accruals is significant in both regressions (t-statistic of -2.51 for realized returns vs. -3.73 for the ICCs); NOA is significant only in the ICC regression (t-statistic of -3.33); and analyst coverage (t-statistic of -8.21) has a significant effect in explaining the ICCs. Third, two characteristics have coefficients with different signs in the regressions of realized returns and composite ICCs. For market beta, the coefficients are -0.003 in the regression of realized returns and 0.005 in the regression of the ICCs, but neither relation is significant (t-statistic of -0.28 vs. 0.67). The other is accruals quality, with a positive coefficient (0.009) in the regression of realized returns but a negative coefficient (-0.108) in the regression of the composite ICCs; only the coefficient in the ICC regression is significant (t-statistic of -3.88). From the results in Panels A and B, at least three conclusions can be drawn.
First, the firm characteristics shown in prior research to have explanatory power for ex post realized returns can also explain the cross-sectional variation in the ICC, and the coefficients of 11 characteristics share the same sign, with accruals quality being one exception out of the 13 characteristics. Market beta is another exception, but there is little evidence of a market beta effect on either realized returns or the ICCs. This result provides indirect evidence that the deep-learning-based composite ICC can be a good proxy for expected returns, since firms' ICCs and future realized returns are explained by the same set of characteristics with same-signed coefficients in most cases. Second, the 11 characteristics other than market beta and earnings smoothness have significant effects in explaining the variation in the ICCs, while for realized returns only BE/ME, CAPEX, asset growth, and accruals have significant effects. Third, market participants require higher expected returns for holding the stocks of firms with small size, high book-to-market ratio, high leverage, high idiosyncratic volatility, high analyst dispersion, low accruals, low capital expenditure, low asset growth, low analyst coverage, low NOA, or low accruals quality. Among firms with these features, those with high book-to-market ratios, low CAPEX or asset growth, or low accruals have large realized returns, but firms with the other features do not. Panel C reports regressions of the deep learning ICCs on multiple independent variables simultaneously. The first regression has three independent variables: beta, size, and BE/ME. This regression's results strengthen the findings from the single-variable regressions in Panel B: market beta has a positive but insignificant coefficient, the coefficient on size is significantly negative, and BE/ME has a positive and significant relation with the deep-learning-based implied cost of capital.
For the subsequent regressions, I regress ICCs on each of the remaining characteristics after controlling for beta, size, and BE/ME. In these regressions, compared to Panel B, the signs of the coefficients on the independent variables stay the same except for analyst coverage, and all independent variables that were significant in Panel B remain significantly related to ICCs even after adding the three controls, further confirming the relations shown in Panel B. This indicates that the firm characteristics discussed in this section explain the cross-sectional variation of deep-learning-based ICCs in a relatively consistent way. In the new regression, analyst coverage has a positive and significant coefficient, rather than the negative and significant coefficient found without controls. Finally, I regress ICCs on all 13 firm characteristics jointly. After taking all characteristics into consideration, I find that the coefficients on idiosyncratic volatility and analyst coverage flip sign relative to Panel B, the coefficients on idiosyncratic volatility (t-statistic of -0.41), CAPEX (t-statistic of -0.20), and asset growth (t-statistic of -1.60) become insignificant, and market beta becomes significant with a t-statistic of 3.41. In summary, the results of Table 6 indicate that the firm characteristics discussed describe the cross-sectional variation of realized returns and ICCs in a similar way, though not at the same significance levels for every characteristic. 5. ADDITIONAL EVIDENCE 5.1. Extra Predictive Power of Deep Learning on Earnings In the previous section, I discussed how deep learning can outperform linear models in forecasting earnings and estimating ICCs. In this section, instead of comparing models, I link deep learning to analysts' forecasts as a first step. Hou et al.
(2012) show that the cross-sectional linear regression approach can produce better earnings forecasts than financial analysts. However, since financial analysts work with a different information set than the model, and some of their information may come from sources other than financial statements, it is unclear whether a better model based only on publicly available information can still provide extra predictive power for earnings once analysts' forecasts are controlled for. Although the comparison between models and financial analysts is "unfair" due to the different information sets, it is natural to ask what would happen if machines had the same information set. More specifically, do machines analyze information better than human beings? Given the current state of technology, it is very hard to feed machines all the information available to financial analysts and run that comparison directly, but we can check whether machines provide extra predictive power even when relying only on publicly available information, a subset of analysts' information set. If machines can do so, they analyze the available information better than financial analysts do, which motivates combining machine and human work in the future. In this section, based on publicly available information, I first apply deep learning to forecast Street earnings per share, the target value for analysts' forecasts in I/B/E/S, to see whether machines provide extra predictive power for earnings when controlling for analysts' forecasts. To simplify the exercise and reduce the information gap between machines and analysts, unlike the sections above, I include the 25 most commonly reported variables from quarterly financial statements to construct the deep learning model forecasting earnings per share.
In Aubry, Kräussl, Manso, and Spaenjers (2019), the authors estimate artwork prices with machine learning to check whether machines provide extra explanatory power for actual prices. Following their principles, I apply the following regression model to analyze earnings: EPS_{i,t} = α + β₁ EPSAnalyst_{i,t} + β₂ EPSMachine_{i,t} + ε_{i,t} (4) where EPS_{i,t} represents the actual earnings per share (EPS) for firm i in year t, EPSAnalyst_{i,t} represents the EPS forecast for firm i in year t from financial analysts, and EPSMachine_{i,t} represents the EPS forecast for firm i in year t from the deep learning model. When constructing the sample for this analysis, I only consider firm-year observations with both analysts' forecasts and deep learning model forecasts. Actual Street earnings per share EPS_{i,t} and analysts' forecasts are available from I/B/E/S. For the analyst estimates, I rely only on each analyst's latest forecast before the end of the fiscal year, so that the estimates in this analysis are the most accurate estimates analysts believed in before the end of the fiscal period. Since multiple analysts may cover the same firm, I use the mean of all analyst forecasts on firm i in year t as EPSAnalyst_{i,t}. When constructing the deep learning model, I use the most recent quarterly financial data items before the fiscal period ending date of actual EPS (at least one quarter before the fiscal period ending date). Table 7: Efficiency of Deep-Learning-Based Earnings Per Share (EPS) The table includes the regression results of the equation: EPS_{i,t} = α + β₁ EPSAnalyst_{i,t} + β₂ EPSMachine_{i,t} + ε_{i,t} The dependent variable is the actual earnings per share of firm i in year t; the independent variables are analysts' EPS forecasts and the deep learning model's EPS forecasts.
The deep learning model includes the 25 most commonly reported variables from quarterly financial statements. Analysts' EPS forecasts are averages of analysts' latest forecasts on firm i before the end of each fiscal year. I regress firms' actual earnings per share on analysts' forecasts and the deep learning model's predicted EPS. The results in Table 7 show that the coefficients on analysts' forecasts and deep learning's forecasts are both positive and significant, with t-statistics of 190.53 for analysts and 11.70 for machines, indicating that the deep learning estimates provide extra explanatory power for actual EPS after controlling for the forecasts provided by analysts. The coefficient on EPSAnalyst_{i,t} is 0.99979 while the coefficient on EPSMachine_{i,t} is 0.06444, showing that although financial analysts provide relatively close EPS (or earnings) estimates on average, machines can still extract information useful for predicting actual EPS (or earnings). 5.2. Comparison of Deep Learning and Analysts on Earnings Forecasts In Hou et al. (2012), the authors compare the linear regression's earnings forecasts to analysts' forecasts, showing that the cross-sectional linear regression can outperform analysts in forecasting earnings. Since the coverage of analysts' forecasts, linear model forecasts, and deep learning model forecasts varies, to avoid losing observations in the comparison, I compare the deep learning model to the linear model separately in Section 4.2. In this section, I compare the deep learning model to analysts on earnings forecasts.
I focus on the common firm-year observations that appear in both analysts' earnings forecasts and deep learning earnings forecasts, meaning that every observation in the test has earnings forecasts as well as actual earnings in both the deep learning format and the analyst format. Since fewer firms are covered by financial analysts than by the deep learning model, the sample for this comparison is smaller than, and different from, the sample used in Section 4.2. Since the number of three-year-ahead earnings forecasts in I/B/E/S is relatively small, in this section I discuss only one-year- and two-year-ahead earnings forecasts.2 I obtain actual earnings as well as analysts' forecasts from I/B/E/S. Since several analysts may cover a given firm, and analysts may not announce or update forecasts every month, I assign each analyst's most recently announced forecast on the firm as that month's forecast. The analyst forecast for a firm equals the mean of forecasts from all analysts covering the firm at the end of June, scaled by the firm's stock price. For the deep learning model, the methodology is the same as in Section 4.2, relying on data items from annual financial statements. To make values from analyst forecasts and those from deep learning comparable, earnings forecasts and actual earnings for deep learning are scaled by market equity at the end of June each year. 2 In prior research, when analysts' three-year-ahead earnings forecasts are too scarce, the I/B/E/S long-term growth rate estimates and the two-year-ahead forecasted earnings are sometimes combined to compute three-year-ahead forecasted earnings and increase the available observations.
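The consensus construction just described — keep each analyst's most recent forecast, average across analysts, scale by price — can be sketched with a toy example (all column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical analyst-level forecast records.
fc = pd.DataFrame({
    "firm":    ["A", "A", "A", "B"],
    "analyst": [1, 1, 2, 3],
    "date":    pd.to_datetime(["2018-03-01", "2018-05-01",
                               "2018-04-15", "2018-06-10"]),
    "eps_fc":  [1.00, 1.20, 1.40, 0.50],
})
price = {"A": 20.0, "B": 10.0}   # stock prices at the end of June

asof = pd.Timestamp("2018-06-30")
# Keep each analyst's most recent forecast made on or before the as-of date.
latest = (fc[fc["date"] <= asof]
          .sort_values("date")
          .groupby(["firm", "analyst"], as_index=False).last())
# Consensus = mean across analysts, then scale by price.
consensus = latest.groupby("firm")["eps_fc"].mean()
scaled = consensus / pd.Series(price)
# Firm A: mean(1.20, 1.40) / 20 = 0.065 ; Firm B: 0.50 / 10 = 0.05
```

Analyst 1's stale March forecast for firm A is replaced by the May update before averaging, exactly the "most recent announced forecast" rule described above.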
Table 8: Comparison of Analysts' Forecasts and Deep Learning's Forecasts on Earnings This table compares the bias, accuracy, and annual earnings response coefficient (ERC) of deep learning's earnings forecasts and analysts' earnings forecasts. Bias equals actual earnings minus earnings forecasts, divided (for scaling purposes) by market equity at the end of June each year for deep learning and by the stock price at the end of June each year for analysts. Accuracy is the absolute value of forecast bias. Annual ERC is the coefficient from regressing buy-and-hold returns on forecast bias. To make annual ERCs of the deep learning model and analysts comparable, I also standardize the forecast bias to unit variance before the regression. All numbers are time-series averages. The sample is from 1983 to 2018. Panel A includes averages of the mean as well as the median of bias across years. Panel B includes averages of the mean as well as the median of accuracy across years. Panel C includes averages of the annual ERC. The numbers under each average are t-statistics. The results in Table 8 show that deep learning can provide less biased earnings forecasts than financial analysts. Although by the mean of the bias deep learning seems to provide more biased one-year-ahead estimates, the likely reason is the influence of extreme values in the estimation process. By the median, deep learning provides less biased one-year-ahead forecasts (0.0035) than financial analysts (-0.0066). For the two-year horizon, deep learning's forecasts are still less biased than financial analysts' forecasts, by both the median (0.0033 vs. -0.0267) and the mean (-0.0163 vs. -0.0404).
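The three measures defined in the Table 8 caption — bias, accuracy, and the annual ERC from regressing buy-and-hold returns on standardized bias — can be sketched as follows on simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
actual   = rng.normal(0.05, 0.05, n)               # scaled actual earnings
forecast = actual + rng.normal(0.01, 0.03, n)      # scaled earnings forecast
scaler   = np.full(n, 1.0)   # market equity (or price), set to 1 here
returns  = 0.5 * (actual - forecast) + rng.normal(0, 0.1, n)

# Bias = (actual - forecast) / scaler; accuracy = |bias|.
bias = (actual - forecast) / scaler
accuracy = np.abs(bias)

# Standardize bias to unit variance so ERCs are comparable across models,
# then take the ERC as the OLS slope of returns on standardized bias.
z = (bias - bias.mean()) / bias.std(ddof=1)
erc = np.cov(returns, z)[0, 1] / np.var(z, ddof=1)
```

With returns constructed to load positively on the forecast surprise, `erc` comes out positive; comparing `erc` across forecast sources is the Panel C exercise.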
More importantly, when comparing the annual earnings response coefficients, I find that the mean annual ERCs of the deep learning model (0.0688 for the one-year horizon and 0.2123 for the two-year horizon) are larger than those of analysts (0.0488 and 0.1332, respectively), indicating that the deep learning model's earnings forecasts better align with market expectations. This result is consistent with the conclusions in Section 4.2 and Hou et al. (2012): the deep learning model generates earnings estimates that are less biased and better fit market expectations than both the linear model and financial analysts. 5.3. Deep Learning Model with the Common Variables in the Linear Model In Section 4.2, I apply a deep learning model with 38 variables and find that it outperforms the linear model in Hou et al. (2012) in forecasting earnings. But this result may also stem from the different amounts of information used in each model, since the deep learning model in Section 4.2 has more variables than the linear model. Thus, I construct a deep learning model with the same 6 variables used in the linear model, repeat the test in Section 4.2, and check whether the deep learning model still outperforms the linear model given the same amount of information. In Table 9, the deep learning model, with only six variables, has more common observations with the linear model in the analysis, leading to a larger set of firm-year observations. For one-year-horizon earnings, deep learning has an average bias of -0.0631, while the linear model's average bias is -0.0705. The deep learning model is also less biased than the linear model in forecasting earnings two years ahead (-0.0072 vs. -0.0792) and three years ahead (-0.0075 vs. -0.1043) on average. Averages of medians further confirm that deep learning's earnings forecasts are less biased than the linear model's (-0.0033 vs.
-0.0042 for one year ahead, 0.0070 vs. -0.0147 for two years ahead, and -0.0029 vs. -0.0261 for three years ahead). More importantly, the deep learning approach has a larger annual ERC than the linear regression approach at all three horizons (the differences are 0.0210 for one-year-ahead earnings forecasts, 0.0592 for two-year, and 0.1120 for three-year), showing that the deep learning model's forecasts better reflect the market's expectation of future earnings. With the same amount of information and a larger set of firm-year pairs in the sample, the deep learning model still outperforms the linear model, showing that deep learning is a better approach to predicting earnings. Table 9: Comparison of Deep Learning and Linear Model, with the Same 6 Independent Variables This table compares the bias, accuracy, and annual earnings response coefficient (ERC) of deep learning's earnings forecasts and the linear model's earnings forecasts. The deep learning model here uses the same variables as the linear regression approach of Hou et al. (2012). Bias equals actual earnings minus earnings forecasts, divided by market equity at the end of June each year for scaling. Accuracy is the absolute value of forecast bias. Annual ERC is the coefficient from regressing buy-and-hold returns on forecast bias. To make annual ERCs of the deep learning model and the linear model comparable, I also standardize the bias to unit variance before the regression. All numbers are time-series averages. The sample is from 1968 to 2018. Panel A includes averages of the mean as well as the median of bias across years. Panel B includes averages of the mean as well as the median of accuracy across years. Panel C includes averages of the annual ERC. The numbers under each average are t-statistics. 6.
CONCLUSION In this article, I demonstrate that a specific deep learning model, a deep neural network, can be used productively in the contexts of forecasting earnings and uncovering a firm's implied cost of capital. I show that deep learning techniques can lead to earnings forecasts that offer marginal information content above and beyond human analyst forecasts. Thus, machines taught to think in some ways like humans can, at least on some dimensions, extract information from observable data that human analysts either miss or process incorrectly. Of particular note, I show that my deep learning model exhibits much less bias than analysts do. However, the machine learning approach is imperfect, as there is clearly some valuable predictive information detected by analysts, perhaps from outside the system, that is not fully captured by my model. Turning to the implied cost of capital (ICC), I show that ICCs derived from deep-learning earnings forecasts are better predictors of future realized returns than corresponding ICCs derived from linear-regression earnings forecast models of the type advanced by Hou et al. (2012). My evidence indicates that combining analyst predictions and deep-learning techniques may lead to substantially superior forecasts relative to either approach used in isolation. In some sense, my study shows that observable data can be used much more effectively than in the linear regression approach by unleashing the creativity of artificial intelligence and allowing it to search in an unstructured way for nonlinear predictive patterns. On the human side, research such as Loudis (2019) shows that analyst forecasts can be decomposed in a novel way that extracts useful information from the brain of an analyst while eliminating certain biases in their incentives or thought patterns.
Hopefully, a combination of improved machine models along the lines of what I present, coupled with improved adjustments to human analyst predictions, will lead to far more accurate predictive models of earnings and a firm's implied cost of capital. Artificial intelligence techniques are improving at a rapid rate, so the prospect of substantial research progress along these lines in the near future appears high. BIBLIOGRAPHY Abdou, H. A., S. T. Alam, and J. Mulkeen. 2014. Would Credit Scoring Work for Islamic Finance? A Neural Network Approach. International Journal of Islamic and Middle Eastern Finance and Management 7:112–125. Altman, E. I., G. Marco, and F. Varetto. 1994. Corporate Distress Diagnosis: Comparisons Using Linear Discriminant Analysis and Neural Networks (the Italian Experience). Journal of Banking & Finance 18:505–529. Amel-Zadeh, A., J.-P. Calliess, D. Kaiser, and S. Roberts. 2020. Machine Learning-Based Financial Statement Analysis. Working Paper. Ashbaugh-Skaife, H., D. W. Collins, W. R. Kinney, and R. Lafond. 2009. The Effect of SOX Internal Control Deficiencies on Firm Risk and Cost of Equity. Journal of Accounting Research 47:1–43. Aubry, M., R. Kräussl, G. Manso, and C. Spaenjers. 2019. Machine Learning, Human Experts, and the Valuation of Real Assets. HEC Paris Research Paper No. FIN-2019-1332. Ball, R., and R. Watts. 1972. Some Time Series Properties of Accounting Income. The Journal of Finance 27:663–681. Bao, Y., B. Ke, B. Li, Y. J. Yu, and J. Zhang. 2019. Detecting Accounting Fraud in Publicly Traded U.S. Firms Using a Machine Learning Approach. Working Paper. Bertomeu, J., E. Cheynel, E. Floyd, and W. Pan. 2019. Using Machine Learning to Detect Misstatements. Working Paper. Bianchi, D., M. Büchner, and A. Tamoni. 2019. Bond Risk Premia with Machine Learning. WBS Finance Group Research Paper No. 252. Bloch, D. A. 2019. Option Pricing with Machine Learning. Working Paper. Bonelli, F., S. Figini, and E.
Giovannini. 2017. Solvency Prediction for Small and Medium Enterprises in Banking. Decision Support Systems 102. Botosan, C. A. 1997. Disclosure Level and the Cost of Equity Capital. The Accounting Review 72:323–349. Botosan, C. A., and M. A. Plumlee. 2002. A Re-Examination of Disclosure Level and the Expected Cost of Equity Capital. Journal of Accounting Research 40:21–40. Boubakri, N., S. El Ghoul, and W. Saffar. 2014. Political Rights and Equity Pricing. Journal of Corporate Finance 27:326–344. Boubakri, N., O. Guedhami, D. Mishra, and W. Saffar. 2012. Political Connections and the Cost of Equity Capital. Journal of Corporate Finance 18:541–559. Brogaard, J., and A. Zareei. 2018. Machine Learning and the Stock Market. Working Paper. Brooks, L. D., and D. A. Buckmaster. 1976. Further Evidence of the Time Series Properties of Accounting Income. The Journal of Finance 31:1359–1373. Brown, L. D., and M. S. Rozeff. 1978. The Superiority of Analyst Forecasts as Measures of Expectations: Evidence from Earnings. The Journal of Finance 33:1–16. Campbell, J. L., D. S. Dhaliwal, and W. C. Schwartz. 2012. Financing Constraints and the Cost of Capital: Evidence from the Funding of Corporate Pension Plans. The Review of Financial Studies 25:868–912. Chava, S., and A. Purnanandam. 2010. Is Default Risk Negatively Related to Stock Returns? The Review of Financial Studies 23:2523–2559. Chen, H., J. Z. Chen, G. J. Lobo, and Y. Wang. 2011. Effects of Audit Quality on Earnings Management and Cost of Equity Capital: Evidence from China. Contemporary Accounting Research 28:892–925. Chen, L., M. Pelger, and J. Zhu. 2019. Deep Learning in Asset Pricing. Working Paper. Claus, J., and J. Thomas. 2001. Equity Premia as Low as Three Percent? Evidence from Analysts' Earnings Forecasts for Domestic and International Stock Markets. The Journal of Finance 56:1629–1666. Dhaliwal, D., S. Heitzman, and O. Z. Li. 2006. Taxes, Leverage, and the Cost of Equity Capital.
Journal of Accounting Research 44:691–723. Dhaliwal, D., J. S. Judd, M. Serfling, and S. Shaikh. 2016. Customer Concentration Risk and the Cost of Equity Capital. Journal of Accounting and Economics 61:23–48. Dhaliwal, D., L. Krull, and O. Z. Li. 2007. Did the 2003 Tax Act Reduce the Cost of Equity Capital? Journal of Accounting and Economics 43:121–150. Dhaliwal, D., L. Krull, O. Z. Li, and W. Moser. 2005. Dividend Taxes and Implied Cost of Equity Capital. Journal of Accounting Research 43:675–708. Ding, K., B. Lev, X. Peng, T. Sun, and M. A. Vasarhelyi. 2019. Machine Learning Improves Accounting Estimates. Working Paper. Donangelo, A. 2014. Labor Mobility: Implications for Asset Pricing. The Journal of Finance 69:1321–1346. Easton, P. D. 2004. PE Ratios, PEG Ratios, and Estimating the Implied Expected Rate of Return on Equity Capital. The Accounting Review 79:73–95. El Ghoul, S., O. Guedhami, Y. Ni, J. Pittman, and S. Saadi. 2013. Does Information Asymmetry Matter to Equity Pricing? Evidence from Firms' Geographic Location. Contemporary Accounting Research 30:140–181. Elton, E. J. 1999. Expected Return, Realized Return, and Asset Pricing Tests. The Journal of Finance 54:1199–1220. Fama, E. F., and K. R. French. 1997. Industry Costs of Equity. Journal of Financial Economics 43:153–193. Francis, J., R. LaFond, P. Olsson, and K. Schipper. 2005a. The Market Pricing of Accruals Quality. Journal of Accounting and Economics 39:295–327. Francis, J., D. Nanda, and P. Olsson. 2008. Voluntary Disclosure, Earnings Quality, and Cost of Capital. Journal of Accounting Research 46:53–99. Francis, J. R., I. K. Khurana, and R. Pereira. 2005b. Disclosure Incentives and Effects on Cost of Capital around the World. The Accounting Review 80:1125–1162. Froot, K. A., and J. A. Frankel. 1989. Forward Discount Bias: Is it an Exchange Risk Premium? The Quarterly Journal of Economics 104:139–161. García Lara, J. M., B. García Osma, and F. Penalva. 2011.
Conditional Conservatism and Cost of Capital. Review of Accounting Studies 16:247–271. Gebhardt, W. R., C. M. C. Lee, and B. Swaminathan. 2001. Toward an Implied Cost of Capital. Journal of Accounting Research 39:135–176. Goh, B. W., J. Lee, C. Y. Lim, and T. Shevlin. 2016. The Effect of Corporate Tax Avoidance on the Cost of Equity. The Accounting Review 91:1647–1670. Gordon, J. R., and M. J. Gordon. 1997. The Finite Horizon Expected Return Model. Financial Analysts Journal 53:52–61. Guay, W., S. Kothari, and S. Shu. 2011. Properties of Implied Cost of Capital Using Analysts' Forecasts. Australian Journal of Management 36:125–149. Heaton, J. B., N. G. Polson, and J. H. Witte. 2017. Deep Learning for Finance: Deep Portfolios. Applied Stochastic Models in Business and Industry 33:3–12. Hou, K., M. A. van Dijk, and Y. Zhang. 2012. The Implied Cost of Capital: A New Approach. Journal of Accounting and Economics 53:504–526. Hwang, L.-S., W.-J. Lee, S.-Y. Lim, and K.-H. Park. 2013. Does Information Risk Affect the Implied Cost of Equity Capital? An Analysis of PIN and Adjusted PIN. Journal of Accounting and Economics 55:148–167. Jones, J. J. 1991. Earnings Management During Import Relief Investigations. Journal of Accounting Research 29:193–228. Kalev, P. S., K. Saxena, and L. Zolotoy. 2019. Coskewness Risk Decomposition, Covariation Risk, and Intertemporal Asset Pricing. Journal of Financial and Quantitative Analysis 54:335–368. Khandani, A. E., A. J. Kim, and A. W. Lo. 2010. Consumer Credit-Risk Models via Machine-Learning Algorithms. Journal of Banking & Finance 34:2767–2787. Khurana, I. K., and K. K. Raman. 2006. Do Investors Care about the Auditor's Economic Dependence on the Client? Contemporary Accounting Research 23:977–1016. Kim, Y. S., and S. Y. Sohn. 2004. Managing Loan Customers Using Misclassification Patterns of Credit Scoring Model. Expert Systems with Applications 26:567–573. Krishnan, J., C. Li, and Q. Wang. 2013.
Auditor Industry Expertise and Cost of Equity. Accounting Horizons 27:667–691. Lee, C., D. Ng, and B. Swaminathan. 2009. Testing International Asset Pricing Models Using Implied Costs of Capital. The Journal of Financial and Quantitative Analysis 44:307–335. Lee, C. M. C., E. C. So, and C. C. Y. Wang. 2020. Evaluating Firm-Level Expected-Return Proxies: Implications for Estimating Treatment Effects. The Review of Financial Studies. Li, Y., D. T. Ng, and B. Swaminathan. 2013. Predicting Market Returns Using Aggregate Implied Cost of Capital. Journal of Financial Economics 110:419–436. Loudis, J. 2019. Expectations in the Cross Section: Stock Price Reactions to the Information and Bias in Analyst-Expected Returns. Working Paper. Messmer, M. 2017. Deep Learning and the Cross-Section of Expected Returns. Working Paper. Myers, L. A., M. S. Drake, M. T. Bradshaw, and J. N. Myers. 2012. A Re-examination of Analysts' Superiority over Time-series Forecasts of Annual Earnings. Review of Accounting Studies 17:944–968. Naiker, V., F. Navissi, and C. Truong. 2013. Options Trading and the Cost of Equity Capital. The Accounting Review 88:261–295. Ogneva, M., K. R. Subramanyam, and K. Raghunandan. 2007. Internal Control Weakness and Cost of Equity: Evidence from SOX Section 404 Disclosures. The Accounting Review 82:1255–1297. Ohlson, J. A., and B. E. Juettner-Nauroth. 2005. Expected EPS and EPS Growth as Determinants of Value. Review of Accounting Studies 10:349–365. Peng, Y., and P. H. M. Albuquerque. 2019. Non-Linear Interactions and Exchange Rate Prediction: Empirical Evidence Using Support Vector Regression. Applied Mathematical Finance 26:69–100. Pástor, Ľ., M. Sinha, and B. Swaminathan. 2008. Estimating the Intertemporal Risk–Return Tradeoff Using the Implied Cost of Capital. The Journal of Finance 63:2859–2897. Sirignano, J., A. Sadhwani, and K. Giesecke. 2018. Deep Learning for Mortgage Risk. Working Paper. Wang, C. C. Y. 2017.
Commentary on: Implied Cost of Equity Capital Estimates as Predictors of Accounting Returns and Stock Returns. Journal of Financial Reporting 2:95–106. Ye, T., and L. Zhang. 2019. Derivatives Pricing via Machine Learning. Boston University Questrom School of Business Research Paper No. 3352688. Zheng, C., X. Deng, Q. Fu, Q. Zhou, J. Feng, H. Ma, W. Liu, and X. Wang. 2020. Deep Learning-based Detection for COVID-19 from Chest CT using Weak Label. medRxiv. Zhou, B. 2019. Deep Learning and the Cross-Section of Stock Returns: Neural Networks Combining Price and Fundamental Information. Working Paper. Zhou, Z.-H., and J. Feng. 2017. Deep Forest: Towards an Alternative to Deep Neural Networks. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 3553–3559. CHAPTER 2. Reputational Spillovers and Information Production by Analysts 1. INTRODUCTION Psychological biases exist in individuals' daily activities, and people sometimes make judgments based on characteristics of others, including religion, age, and race. Among these characteristics, gender is a social topic with a long history and broad influence. Differential treatment based on gender exists not only in daily social life but also in the business world. In the prior literature, gender bias is widely discussed, especially in the labor market. There is ample evidence that female employees face unequal treatment on many dimensions, including compensation (Altonji and Blank, 1999) and job opportunities (Goldin and Rouse, 2000). In financial markets, specific areas with evidence of gender bias include mutual funds (Atkinson, Baird, and Frye, 2003; Niessen-Ruenzi and Ruenzi, 2019), hedge funds (Aggarwal and Boyson, 2016), and venture capital (Gompers, Mukharlyamov, Weisburst, and Xuan, 2021).
In this paper, I investigate whether investors in financial markets treat financial analysts differently based on gender, especially when financial misconduct is revealed. In the US, investors rely heavily on financial advisory services. In an effort to increase information transparency, FINRA provides a search engine and databases of the historical misconduct records of financial advisers, or buy-side analysts. In this research, buy-side analysts consist of investment advisers (fund managers) and brokers, while sell-side analysts refer to financial analysts providing recommendations to investors or clients of a brokerage firm. Egan, Matvos, and Seru (2019), exploiting data from BrokerCheck, provide a full picture of financial misconduct by buy-side analysts, and Egan, Matvos, and Seru (2017) further discuss the "gender punishment gap" in the financial advisory industry, especially for buy-side analysts. They conclude that there is gender discrimination in punishment (taste-based or due to miscalibrated beliefs), which they characterize as "The Gender Punishment Gap": female financial advisers are more likely to receive harsher punishment from brokerages than their male colleagues. Following the discussion of gender discrimination in Egan et al. (2017), I consider possible explanations for "The Gender Punishment Gap." Clearly, financial market participants, investors, and clients of financial advisory firms matter greatly to both buy-side analysts and brokerage firms. When making employment-related decisions, brokerages may consider the "taste" of existing clients.
Thus, a potential reason why brokerage firms treat male and female analysts differently when misconduct happens is that investors or clients in the financial market view misconduct by buy-side analysts differently depending on the analyst's gender, and brokerage firms follow investors' preferences when making punishment decisions. The question, then, is: do investors and clients of brokerages exhibit gender discrimination in the face of misconduct? If so, does this discrimination affect the business of brokerage firms, such as sell-side recommendations? To answer these questions, this paper tests whether there is a reputational spillover effect from the buy side to the sell side within a brokerage and, more importantly, whether investors exhibit gender bias when facing financial misconduct by buy-side analysts. I primarily focus on three main categories of misconduct: regulatory, criminal, and civil. I use the cumulative abnormal return (CAR) around a sell-side analyst recommendation as a proxy for investors' trust in the sell-side analyst, and compare the average CAR before the financial misconduct with that after it to see whether investors' reactions to recommendations change significantly. I find that after a buy-side analyst is associated with a financial misconduct event, there is a significant decrease in the absolute value of the average CAR, indicating that the market reacts less to recommendations from sell-side analysts at the same brokerage, and that financial misconduct by buy-side analysts negatively affects the reputation of their sell-side colleagues. After identifying the reputational spillover effect, I use a popular gender-name database to classify the gender of all buy-side analysts with misconduct records. I then repeat the reputational spillover tests for the two gender groups.
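The before/after CAR comparison just described can be sketched with a toy example; the recommendation dates, CAR values, and misconduct date below are all hypothetical:

```python
import pandas as pd

# Hypothetical CARs around sell-side recommendations at one brokerage.
recs = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-10", "2015-06-01",
                            "2016-02-01", "2016-08-15"]),
    "car":  [0.04, -0.05, 0.02, -0.01],
})
# Date a buy-side analyst at the same brokerage is tied to misconduct.
misconduct_date = pd.Timestamp("2015-12-31")

# Average absolute CAR before vs. after the misconduct event; a positive
# drop means the market reacts less to the brokerage's recommendations.
before = recs.loc[recs["date"] < misconduct_date, "car"].abs().mean()
after  = recs.loc[recs["date"] >= misconduct_date, "car"].abs().mean()
drop = before - after
# before = 0.045, after = 0.015, drop = 0.03
```

In the actual test this difference would be computed per misconduct event and averaged, with significance assessed across events; the sketch only illustrates the construction of the before/after comparison.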
Interestingly, when female buy-side analysts are associated with financial misconduct, investors appear to react more negatively, as reflected in their attitudes toward recommendations from sell-side analysts in the same brokerage. The negative reputational impact on sell-side analysts is larger when a female buy-side analyst is involved in the misconduct case. Compared with male analyst misconduct events, female analyst misconduct appears to erode investors' trust in sell-side analysts from the same brokerage firm to a much greater extent. This paper contributes to the literature in at least three ways. First, it is the first study to identify a reputational connection between buy-side and sell-side analysts. Prior studies primarily focus on active connections between buy-side and sell-side analysts motivated by mutual benefit, including information flows in both directions (Cici, Shane, and Yang, 2019; Irvine, Lipson, and Puckett, 2007). This paper demonstrates the existence of a reputational spillover effect between the buy side and the sell side, showing that misconduct events or illegal activities affect a firm's overall reputation, including that of employees with different job functions. Second, this study provides empirical evidence that investors and clients of brokerages exhibit gender bias in the face of analyst misconduct. Misconduct by female analysts has a larger impact on the business of brokerage firms, especially on recommendations. Differential treatment exists not only within the brokerage firm but also widely in the financial market, and in some situations it may be client-oriented or driven by market preferences. A third contribution of this study is to provide evidence on alternative explanations for discriminatory behavior by gender in labor markets.
Gender bias in the labor market may not originate entirely within the labor market itself; the underlying reasons may come from the influence of related areas. Beyond the gender bias of managers, the gender bias of investors and clients can be a reason why brokerages treat analysts differently by gender. According to Gurun, Stoffman, and Yonker (2021), even after misconduct, buy-side analysts with solid client connections receive less severe punishment from their brokerage firms. To retain client assets, brokerage firms may hesitate to fire buy-side analysts involved in misconduct. Thus, a connection may exist between gender discrimination in labor markets and gender discrimination in financial markets. The rest of the article is organized as follows. Section 2 reviews the literature, and Section 3 describes the data and empirical methodology. In Section 4, I present the main results regarding the reputational spillover effect and the gender difference. Section 5 concludes and discusses potential directions for future research.
2. LITERATURE REVIEW
2.1. Financial Misconduct
Misconduct and fraud are major topics in finance and accounting research. Since misconduct is a broad research issue that may be connected to various areas of business, researchers discussing misconduct or fraud are often interested in the specific identities of the individuals involved. For example, studying firm executives, Davidson, Dey, and Smith (2015) show that CEOs and CFOs with past legal troubles are more likely to commit subsequent fraud. McNichols and Stubben (2008) investigate the investment behavior of firms that manipulate earnings and show that, compared to other firms, firms engaged in misconduct overinvest during the period of financial reporting manipulation. Since those studies examine firms' real activities, they consider firm-level misconduct events.
Researchers rely on a variety of fraud and misconduct data (Karpoff, Koester, Lee, and Martin, 2017, summarize the classic data sources and proxies of financial misconduct in the prior literature) to examine the relation between financial misconduct and investor behavior (Gurun, Stoffman, and Yonker, 2017; Law and Mills, 2019), investor welfare (Piskorski, Seru, and Witkin, 2015), disclosures (Dimmock and Gerken, 2012), competitors (Bertsch, Hull, Qi, and Zhang, 2020), and other outcomes. This paper targets the financial misconduct of buy-side analysts, or financial advisers, contributing to the literature on misconduct, fraud, and crime among financial industry employees. Other papers also discuss the misconduct of buy-side analysts. Egan et al. (2019) draw a full picture of the misconduct behavior of financial advisers, or buy-side analysts. They report that a considerable portion of buy-side analysts have a misconduct record, and that some of them have broken the law more than once and have had to compensate clients for their illicit behavior. Dimmock, Gerken, and Graham (2018) find that buy-side analysts working in the same location influence each other's misconduct. Using brokerage mergers, they show that financial advisers are more likely to commit financial misconduct if their new coworkers have a prior record of misconduct. Honigsberg and Jacob (2021) find that more than 10% of misconduct records have been deleted by buy-side analysts, and also report that analysts who attempted to eliminate misconduct records are significantly more likely to be associated with new misconduct.
2.2. Gender Difference
Diversity is an important topic in much current academic research.
The role of gender in labor markets and business decisions has been a subject of much recent discussion, given historical differences in how the genders have been treated in the workplace. This paper contributes to this nascent but important literature. Unfair treatment arising from gender is formally discussed in the theory of discrimination. Researchers classify discrimination into two major categories: statistical (Phelps, 1972; Arrow, 1973) and taste-based or implicit (Becker, 1957; Bertrand, Chugh, and Mullainathan, 2005). In the previous literature, researchers find that differential treatment by gender exists in financial markets, especially with regard to investment activities. Adams, Kräussl, Navone, and Verwijmeren (2021) provide strong evidence of a significant gender discount in the art market: especially in countries with higher levels of gender inequality, works by female artists sell at significantly lower prices. Gafni, Marom, Robb, and Sade (2020) show that although female entrepreneurs have higher success rates on crowdfunding platforms, people prefer to invest in projects led by entrepreneurs of their own gender, strongly suggesting the presence of taste-based discrimination. Using data from AngelList, Ewens and Townsend (2020) find that male investors are less interested in startups led by women, even though the male-led startups chosen by male investors are ultimately less successful. Studies of within-firm gender bias and the treatment of analysts are more directly relevant to this paper. Fang and Huang (2017) show that, although connections with corporate board members benefit financial analysts, female analysts gain less from such connections than male analysts. The gender bias persists even when people know each other well.
According to Duchin, Simutin, and Sosyura (2020), even within a firm, male CEOs distribute more resources and budget to divisions run by male managers, and this gap in treatment is also related to the upbringing and family structure of the male CEOs. Distinct from the previous literature, this paper discusses gender differences in the spillover effect between different job functions within a firm.
3. DATA AND EMPIRICAL METHODOLOGY
3.1. Data
The data on buy-side analysts comes from the IAPD (Investment Adviser Public Disclosure) database and FINRA BrokerCheck. I use variables similar to those selected by Egan et al. (2017). (I am very grateful to Dr. Mark Egan and his co-authors for kindly sharing the investment adviser data used in their research with me.) The data includes misconduct years, locations, names, etc. The observations are at the year-analyst level, from 2007 to 2015. In this paper, I focus on three major types of misconduct: regulatory, criminal, and civil. I therefore select all misconduct observations within these three categories. Next, I manually search for each buy-side analyst on IAPD and BrokerCheck, one by one, and extract the exact misconduct date from the report for each misconduct observation. Figure 3 exhibits an example of a financial misconduct report. For gender information, I exploit the Genderchecker database, which contains 102,240 authenticated names worldwide, each categorized as male, female, or unisex. (According to the introduction on the official Genderchecker website, 7% of names in the database are classified as "unisex," indicating that these names are used by both genders; I exclude these cases from the analysis in this paper.)
Figure 3: An Example of the Financial Misconduct Report
The sell-side analyst data is from I/B/E/S.
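The name-based gender classification described above can be sketched as follows. The tiny lookup table is a hypothetical stand-in for the Genderchecker database, and the function name and field handling are illustrative rather than the exact procedure used in the dissertation.

```python
# Hypothetical sketch of the name-based gender classification step.
# GENDER_TABLE stands in for the Genderchecker database; the real data
# maps 102,240 names to "male", "female", or "unisex".
GENDER_TABLE = {
    "john": "male",
    "mary": "female",
    "taylor": "unisex",  # unisex names are excluded from the analysis
}

def classify_gender(full_name):
    """Return 'male' or 'female' from the first name, or None otherwise."""
    first = full_name.strip().split()[0].lower()
    label = GENDER_TABLE.get(first)
    return label if label in ("male", "female") else None

analysts = ["John Smith", "Mary Jones", "Taylor Reed"]
classified = {name: classify_gender(name) for name in analysts}
```

Unisex names (about 7% of the database, per the description above) return `None` and would be dropped before the gender-split tests.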
Since there is no common identifier (such as a CIK or ticker) between the buy-side analyst database, built from IAPD and BrokerCheck, and the sell-side analyst database from I/B/E/S, the only way to combine the two is to manually match the names of the brokerage firms. I/B/E/S reports only a masked code for each brokerage firm. Following the previous literature, I rely on two data sources to recover the true brokerage names in I/B/E/S. First, I use the translation table from I/B/E/S, which links the masked codes to true names before 2010. (I am very grateful to Dr. Alexander Ljungqvist, Dr. Huihao Yan, and their co-authors for kindly sharing the linkage data with me.) Second, following the method introduced in Gibbons, Hiev, and Kalodimos (2021), I manually collect the actual names of brokerage firms and sell-side analysts from a Bloomberg terminal. By matching the recommendations on Bloomberg and I/B/E/S, using the initials of sell-side analysts on I/B/E/S, I can link the I/B/E/S data to the Bloomberg terminal data. Once I obtain the true names, I manually match the brokerage firm names by searching for common words in the brokerage names of both the buy-side and sell-side databases. After creating a link table between buy-side and sell-side names, I also search for each name from both sides on Google to confirm that two names with common words refer to the same firm or the same company group. After creating the brokerage name link table, I merge the buy-side analyst data into it and expand the data to the misconduct level. I then match this to I/B/E/S recommendations from sell-side analysts at the same brokerage firm as each buy-side analyst. In the end, after cleaning the data, I obtain a set of 882 misconduct cases: 756 for male buy-side analysts and 126 for female buy-side analysts. In addition to the analyst data, I use daily stock price data from CRSP to calculate CAR values.
3.2. Empirical Methodology
Sell-side analysts provide various recommendation levels. In this research, I classify recommendations with IRECCD = 1 or 2 as "buy recommendations" and recommendations with IRECCD = 4 or 5 as "sell recommendations." The subsequent discussion is separated into the buy recommendation group and the sell recommendation group. If a reputational spillover effect exists, disappointed investors may question the information quality of sell-side analysts, causing a smaller market reaction to these analysts' recommendations. To measure the market reaction to sell-side analysts' recommendations, I calculate Cumulative Abnormal Return (CAR) values on the recommendation dates, using a 150-trading-day estimation period for the market model, a 15-day gap between the estimation period and the beginning of the event window, and an event window of [-1 day, +1 day] around each recommendation announcement date. After obtaining CAR values on the recommendation dates, I set a three-month window before and after the misconduct date and take the average of the CAR values in the before and after windows over all recommendations provided by sell-side analysts from the same brokerage firm as the buy-side analyst accused of misconduct. I then compare the average CAR value before the misconduct with the average CAR value after it, and test whether there is a significant decrease in the absolute value of the average CAR. Since I use CAR values as a measure of investors' reaction to sell-side analysts' recommendations, a decrease in the CAR value represents a lower level of attention to, or agreement with, the recommendation.
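A minimal sketch of the market-model CAR computation just described (150-day estimation period, 15-day gap, [-1, +1] event window). The function name and the synthetic return series are illustrative assumptions, not the dissertation's actual code.

```python
import numpy as np

def event_car(stock_ret, market_ret, event_idx, est_len=150, gap=15, window=(-1, 1)):
    """Market-model CAR around position `event_idx` in a daily return series.

    The market model (alpha, beta) is fit by OLS on an `est_len`-day
    estimation period ending `gap` days before the event window opens;
    abnormal return = actual return - (alpha + beta * market return).
    """
    est_end = event_idx + window[0] - gap        # last day before the gap
    est_start = est_end - est_len                # 150 trading days earlier
    beta, alpha = np.polyfit(market_ret[est_start:est_end],
                             stock_ret[est_start:est_end], 1)
    lo, hi = event_idx + window[0], event_idx + window[1] + 1
    abnormal = stock_ret[lo:hi] - (alpha + beta * market_ret[lo:hi])
    return abnormal.sum()

# Synthetic illustration: a stock with beta = 2 relative to the market,
# plus a 5% abnormal return injected on the recommendation day.
n = 220
market = 0.01 * np.sin(np.arange(n))
stock = 2.0 * market
stock[200] += 0.05
car = event_car(stock, market, event_idx=200)  # recovers roughly 0.05
```

Because the synthetic stock follows the market model exactly over the estimation window, the three-day CAR recovers the injected abnormal return.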
I first use the full sample with both male and female analysts, and then divide the full sample into three categories based on the type of financial misconduct. For each category, I repeat the calculation of CAR values, average CAR values, and the before-after differences. I also conduct additional analysis to further explore the main evidence. When comparing CAR averages, I do not pair brokerage firms or sell-side analysts across the two sides of the window, and I do not require that they appear on both sides; the only commonality is that all are related to a particular buy-side analyst who engaged in misconduct. Also, when calculating the average CAR value on each side, I only remove duplicate CAR values on the same recommendation within a given misconduct case, not across all cases, since different misconduct cases may share related sell-side recommendations. If I deleted the repeated sell-side recommendations across all misconduct cases, the weight of the CAR values on certain recommendations would be understated.
4. RESULTS
4.1. Comparison of Average CAR Value Before and After Buy-Side Analysts' Misconduct
After obtaining CAR values within the [-3 month, +3 month] window around the buy-side misconduct date, I calculate the average CAR value on sell-side recommendations in the three months before and after the misconduct date separately. All the sell-side analysts are in the same firm, or under the same group, as the buy-side analyst who engaged in misconduct. I then subtract the average of all CAR values in the three months after the misconduct date from the average of all CAR values in the three months before it. This yields the reduction in the magnitude of the investor reaction, and I test whether this reduction is significant.
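The before-minus-after comparison can be sketched as below. A Welch-style t statistic is one standard way to test such a mean difference; the exact test behind the tables is not spelled out here, so treat this as an illustrative assumption with hypothetical inputs.

```python
import numpy as np

def before_after_diff(cars_before, cars_after):
    """Mean CAR before minus mean CAR after, with a Welch t statistic.

    For sell recommendations (mean CAR negative), a shrinking |CAR| after
    the misconduct yields a negative before-minus-after difference, matching
    the sign convention of the tables (e.g., -0.0413 to -0.0363 gives about
    -0.0050).
    """
    b = np.asarray(cars_before, dtype=float)
    a = np.asarray(cars_after, dtype=float)
    diff = b.mean() - a.mean()
    se = np.sqrt(b.var(ddof=1) / len(b) + a.var(ddof=1) / len(a))
    return diff, diff / se

# Tiny hypothetical example: post-period CARs are less negative,
# so the before-minus-after difference is negative.
diff, t_stat = before_after_diff([-0.05, -0.03], [-0.04, -0.02])
```

The two samples need not be paired or of equal size, consistent with the unpaired comparison described above.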
Table 10: Comparison of Average CAR Value Before and After Buy-Side Misconducts
In Table 10, I calculate the average difference between CAR values before and after the misconduct date to see whether there is any significant change in the magnitude of the market reaction to sell-side analysts' recommendations. First, I divide all sample observations into three categories by type of misconduct and, within each category, discuss buy and sell recommendations separately, because buy and sell recommendations generally move the market in opposite directions. Among the three types of misconduct I focus on, the regulatory misconduct category has the largest number of observations. In the three-month period before the regulatory misconduct dates, there are 41,565 observations for buy recommendations and 13,512 for sell recommendations, while in the three months after, there are 43,360 observations in the buy recommendation group and 15,153 in the sell recommendation group. I find that for the buy recommendation group there is no significant change in the average CAR value, but for the sell recommendation group the average CAR value moves from -0.0413 to -0.0363, a difference of -0.00497. The change is significant, showing a significant decrease in the market reaction to analysts' sell recommendations after the misconduct date. In the criminal and civil misconduct categories, market reactions to sell recommendations also decline, but not significantly. For the full sample, the average CAR value in the sell recommendation group moves from -0.0396 to -0.0360, a significant change of -0.00359.
These results indicate that investors react less to the recommendations of related sell-side analysts after the misconduct dates of buy-side analysts from the same brokerage, especially for sell recommendations. The results in Table 10 provide evidence of reputational spillover effects between buy-side and sell-side analysts within the same brokerage firm. With these initial results in hand, I turn to the analysis of differences by gender. Building on the test structure of Table 10, I further divide the sample by the buy-side analyst's gender and repeat the analysis for males and females separately. Gender differences in the reduction of the market reaction would indicate that investors or clients of brokerage firms exercise gender-dependent judgment in the face of financial misconduct.
Table 11: Comparison of Average CAR Value Before and After Buy-Side Misconducts, by Gender
Table 11 analyzes the misconduct events by the gender of the buy-side analysts involved. I follow the same principles as in Table 10 but divide the sample by gender. Table 11 shows that the number of observations related to female buy-side analysts (8,780 across all categories) is around 17% of that related to male buy-side analysts (50,903 across all categories). There are several possible reasons for this difference. First, in the financial advisory industry there are fewer female analysts than male analysts. Also, according to Egan et al. (2017), female analysts are in general less likely to engage in financial misconduct than male buy-side analysts. In addition, differences across brokerage firms and the availability of data may affect the share of female-related observations in the sample.
When examining the results by gender and type of financial misconduct, I find that for both the male and female sub-categories, when regulatory misconduct occurs, there is a significant decrease in the market reaction to sell recommendations made by related sell-side analysts. However, for the male sub-category the average CAR value moves from -0.0416 before the misconduct dates to -0.0374 after, a difference of -0.00413, while for the female sub-category the average CAR value moves from -0.0384 to -0.0313, a difference of -0.00702. This indicates that, when regulatory misconduct occurs, the negative reputational spillover from the buy side to the sell side is larger if the buy-side analysts involved are women. For criminal misconduct, there is no significant decrease for the male sub-category, but a significant decrease is evident for the female sub-category (from -0.0481 to -0.0302, with a p-value of 0.0002). There are no female analyst observations related to civil misconduct. For the full sample, although market reactions also decline when male buy-side analysts commit financial misconduct, the magnitude for sell recommendations is much smaller than for female analysts (-0.00212 vs. -0.00999). These results establish that, although decreases in market reaction exist for both male-related and female-related cases, the magnitude of the decrease is larger for female-related cases, indicating that the reputational spillover effect discussed earlier is stronger when the financial misconduct involves female buy-side analysts. One potential explanation for this gender difference is that investors hold female analysts to higher moral expectations, since female employees are in general less likely to be involved in fraud or illegal cases.
Some investors may choose female analysts expecting reliable service and a low level of fraud risk. Once a fraud involving a female analyst is detected, investors and clients sharply revise their expectations downward in the face of such unexpected bad news. Because of this disappointment, investors treat the misconduct more seriously, and more negative feedback falls on the brokerage and its services, producing a stronger negative reputational spillover effect. Because female analysts are less likely to be involved in financial misconduct, once a female buy-side analyst is associated with misconduct, investors and clients of the brokerage firm may lose trust in the firm's general environment and suspect that additional undiscovered illegal activity may be occurring there. In other words, investors may treat financial misconduct by female buy-side analysts as a signal of a dysfunctional moral environment, reducing overall trust in the brokerage firm. Moreover, financial misconduct by female buy-side analysts may also signal poor performance or unstable operations at the brokerage, because bad performance may induce financial analysts to take on the risk of acting illegally. If female analysts are willing to engage in illegal activities, it may indicate that the brokerage's situation is quite precarious, as even employees expected to hold high moral standards are willing to engage in such activities to survive.
4.2. Comparison in One Year Before and After the Misconduct Dates
In the previous section, when comparing genders, I found that a reputational spillover effect exists and that the gender of buy-side analysts affects its magnitude. To buttress this evidence, I turn next to ruling out some alternative explanations for these results.
One alternative is that the phenomenon I detect is periodic: it occurs not only on misconduct dates but also on the same days in other years. To test this, I move all misconduct dates one year earlier and one year later than the true dates, and redo the analysis of the previous section using the shifted dates. Since the misconduct dates have moved, the observations, i.e., sell-side analysts' recommendations within the [-3 month, +3 month] window around the new dates, form a new sample. The principles behind the new analysis are the same as in Section 4.1.
Table 12: Moving All Misconduct Dates 1 Year Earlier
First, I test whether the decrease in the market reaction, the reputational spillover effect reported in Table 10, still appears. Table 12 repeats the analysis of Table 10 using the new sample built on the shifted dates, generated by subtracting one year from all misconduct dates. For example, if the original misconduct date is 01/01/2013, the shifted date is 01/01/2012, and the three-month window is drawn around the shifted date. There is no significant decrease in the absolute value of average CAR after the shifted dates in any category. The only significant result is for the sell recommendation group in the criminal misconduct category, but, unlike the decreased reaction we are looking for, it is a significantly positive change in the market reaction to sell recommendations. Thus, when using these alternative (incorrect) dates, I do not find results similar to those obtained with the original (true) misconduct dates in Table 10.
Table 13: Moving All Misconduct Dates 1 Year Later
Next, I shift all misconduct dates one year later, construct another new sample, and redo the analysis. According to the results reported in Table 13, there is no significant decrease in the market reaction; on the contrary, the absolute values of average CAR significantly increase after the shifted dates.
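The placebo construction just described, shifting each misconduct date by one calendar year, can be sketched as follows. The clamping of 29 February to 28 February is my own illustrative assumption; the text does not state how leap-day dates are handled.

```python
from datetime import date

def shift_years(d, years):
    """Move a misconduct date by whole calendar years.

    29 February is clamped to 28 February when the target year is not a
    leap year (an assumed convention, not stated in the original text).
    """
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 February in a non-leap target year
        return d.replace(year=d.year + years, day=28)

# Pseudo-event windows are then rebuilt around the shifted dates:
earlier = shift_years(date(2013, 1, 1), -1)  # 01/01/2013 -> 01/01/2012
later = shift_years(date(2013, 1, 1), +1)    # 01/01/2013 -> 01/01/2014
```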
Considering criminal misconduct only, the average CAR value around sell recommendations moves from -0.0343 to -0.0420, while for the full sample it moves from -0.0378 to -0.0402; both changes are significant. This indicates that, one year after the original financial misconduct dates, there is no decreasing trend in the absolute value of average CAR around sell-side analysts' sell recommendations. It also shows that the reputational spillover effect does not persist over the longer term: after one year, the decline in market reaction is no longer evident. Based on the results in Tables 12 and 13, we can conclude that the findings of Table 10 do not appear on pseudo-event dates and that the decrease in market reaction occurs only immediately after the actual financial misconduct date. Having confirmed through this placebo-date analysis that the general spillover effect is tied to the actual misconduct, I next repeat the gender analysis of Table 11 using the alternative dates, shifting the original misconduct dates one year earlier and one year later. Table 14 tabulates the evidence for the one-year-earlier dates. Dividing the sample into gender groups, I find that for male analysts there is only a weak (10% significance level) result for sell recommendations in the criminal misconduct category, following a pattern similar to the pooled-gender analysis in Table 12. However, for misconduct by female analysts, I find significant results in multiple groups. For regulatory misconduct, there is no significant result for buy recommendations, but there is a decrease, significant at the 5% level, in the absolute value of average CAR after the shifted dates. Also, for criminal misconduct by female analysts, a significant decrease in average CAR value appears for buy recommendations. Both sets of results indicate decreases in the market reaction.
Moreover, for sell recommendations in the criminal misconduct category for female analysts, a significant increase in market reaction is evident. This indicates that, one year before the actual financial misconduct dates, investors start paying more attention to sell recommendations by sell-side analysts who are in the same brokerage as female buy-side analysts with future criminal misconduct.
Table 14: Moving All Misconduct Dates 1 Year Earlier, by Gender
Table 15: Moving All Misconduct Dates 1 Year Later, by Gender
In Table 15, I repeat the analysis by gender using new dates generated by adding one year to the original misconduct dates. I find no significant (5% level or better) difference, except that for the male analysts' criminal misconduct category there is an increase in the market reaction to sell recommendations, the same directional finding as in Table 13. Combining the results of Tables 14 and 15, we can conclude that the stronger reputational spillover effect for misconduct by female analysts is not a seasonal phenomenon with a regular timeline. Although in the one-year-earlier period there is a significant reduction in market reaction to sell recommendations in the female analysts' regulatory misconduct category and to buy recommendations in the female analysts' criminal misconduct category, we still reach the same overall conclusion. For the regulatory misconduct category, the magnitude of the decrease is larger and more significant for the real misconduct dates (-0.00702 with a p-value of 0.0057, vs. -0.00606 with a p-value of 0.0392), and the absolute value of average CAR after the real misconduct dates reaches a lower level (0.0313) than in the one-year-earlier test (0.0345), showing that, although the trends run in the same direction, the original analysis in Table 11 generates more powerful results on the spillover phenomenon.
For the criminal misconduct category, because the analysis using the real misconduct dates shows no significant result for this specific group, the results for this group cannot challenge the evidence for a spillover effect in the other categories and groups.
4.3. Comparison of Non-Misconduct Related Recommendations
In Sections 4.1 and 4.2, for each misconduct date and each buy-side analyst with misconduct, I focus within the three-month windows on recommendations provided by sell-side analysts from the same brokerage. In this section, for each buy-side misconduct case, I select all recommendations that target the same firms within the same windows around the misconduct dates as in Section 4.1, but that come from sell-side analysts who are not at the same brokerage as the misconduct buy-side analyst. I repeat the analysis on these unrelated recommendations and compare the results with those of Tables 10 and 11 to see whether the reputational spillover effect and the gender difference exist only for misconduct-related analysts and recommendations rather than more generally. I start with the sample of financial misconduct and collect all sell-side analysts' recommendations within the [-3 month, +3 month] window around the buy-side misconduct dates. Then, for each misconduct case, I identify all recommendations within the window that are not related to the misconduct buy-side analyst and his or her brokerage but target the same set of firms as the misconduct-related recommendations. After constructing the sample, I calculate the average CAR values and their difference, repeating the analysis of the previous sections. When calculating the average CAR value for the recommendations of non-related sell-side analysts, I focus on each specific misconduct case.
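The per-case control-sample selection described above can be sketched as follows. The dictionary field names (`broker`, `firm`, `date`) are hypothetical placeholders for the merged dataset's actual columns, and the integer dates are illustrative.

```python
def control_recommendations(case, recommendations):
    """For one misconduct case, keep recommendations inside the event window
    that target the same firms as the related (same-brokerage) sample but
    come from OTHER brokerages."""
    in_window = [r for r in recommendations
                 if case["start"] <= r["date"] <= case["end"]]
    treated_firms = {r["firm"] for r in in_window
                     if r["broker"] == case["broker"]}
    return [r for r in in_window
            if r["broker"] != case["broker"] and r["firm"] in treated_firms]

# Hypothetical case: misconduct at brokerage "A", window days 1-10.
case = {"broker": "A", "start": 1, "end": 10}
recs = [
    {"broker": "A", "firm": "X", "date": 5},   # related recommendation
    {"broker": "B", "firm": "X", "date": 6},   # same firm, other brokerage
    {"broker": "B", "firm": "Y", "date": 6},   # different target firm
    {"broker": "B", "firm": "X", "date": 20},  # outside the window
]
ctrl = control_recommendations(case, recs)
```

Because the selection is per case, an analyst excluded here can still appear as a related analyst in another misconduct case, consistent with the treatment described in the text.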
Thus, a sell-side analyst who is non-related in one misconduct case may be a related one in another, but I do not delete his or her recommendations across all misconduct cases, because in certain cases those recommendations legitimately come from non-related analysts.
Table 16: Comparison of Average CAR Value of Brokerages Without Misconducts
Table 16 contains results for the non-misconduct-related recommendation sample, without considering gender. There are significant decreases in market reaction in certain categories. For regulatory misconduct, the average CAR value around sell recommendations rises from -0.0308 to -0.0297, a significant reduction in absolute value. Compared to the results for the misconduct-related sample in Table 10, the magnitude of the reduction is only one-quarter to one-fifth as large (-0.00115 vs. -0.00497). For civil misconduct, the decrease in the absolute value of average CAR around sell recommendations is 0.00648, smaller than in the original misconduct-related sample (0.00855). Across all categories, the reduction in the absolute value of average CAR is only 0.00052 for non-misconduct-related sell recommendations, while for misconduct-related sell recommendations the reduction is 0.00359, at a higher level of significance. Thus, although certain categories of the non-misconduct-related sample share a similar reduction in market reaction with the original misconduct-related sample, the magnitude is much smaller and cannot fully explain the reduction in the absolute value of average CAR for misconduct-related observations. In addition to the results showing the same trend directions, Table 16 reveals evidence of opposite directions that may strengthen the case for the existence of reputational spillover effects.
For example, while the original analysis in Table 10 shows no significant change for the criminal misconduct category (for both buy recommendations and sell recommendations), Table 16 shows a significant increase in the absolute value of the average CAR. When market reactions to non-misconduct related recommendations rise, investors appear to be maintaining only the original level of attention on misconduct related recommendations. This indicates that the market reaction to misconduct related recommendations decreases in a relative sense, providing indirect evidence that buy-side misconducts negatively impact the reputation of sell-side analysts from the same brokerage.

Table 17: Comparison of Average CAR Value of Brokerage Without Misconducts, by Gender

Table 17 compares average CAR values for non-misconduct related recommendations by gender. Adding gender to the analysis produces results that directly or indirectly support the existence of a spillover effect and a gender difference when misconducts occur. First, for the subsample related to female buy-side analysts' misconducts, criminal misconduct is the only category with significant results, showing that for most groups there is no evidence of changes in market reaction to recommendations not related to female misconduct analysts; this reinforces the evidence of a reputational spillover effect on the female side in Table 11. Second, only one result for females in Table 17 is significant at the 5% level or better, namely a 0.00478 increase in the absolute value of the average CAR around sell recommendations. Compared with the same category in Table 11, which shows a significant 0.0179 decrease in the absolute value of the average CAR after misconduct dates, the result for the non-misconduct related sample goes in the opposite direction, further strengthening the reputational spillover interpretation for female analysts reported in Table 11.
When non-misconduct related recommendations exhibit an increased market reaction, the decrease in the market reaction to misconduct related recommendations strengthens the conclusions from Table 11. While the significance levels differ, the differences in average CAR for male and female analysts in Table 17 are small and similar in value, especially in the full-sample comparison. For the male case, the differences in average CAR are -0.00022 for buy recommendations and -0.00048 for sell recommendations; for the female case, they are -0.00024 for buy recommendations and -0.00064 for sell recommendations. The male and female cases in Table 17 thus yield very similar results, and, compared to the results for misconduct related recommendations in Table 11, the differences for all other recommendations are very small. This strongly suggests that no gender difference exists in the market reaction to non-misconduct related recommendations.

5. CONCLUSION AND FUTURE POTENTIAL DIRECTIONS

In this article, I investigate the reputational spillover effect between buy-side analysts and sell-side analysts, and examine whether gender differences affect the magnitude of the spillover effect within the same brokerage firm. After the revelation of misconduct by buy-side analysts, investors react less to recommendations of sell-side analysts in the same brokerage, especially to "sell recommendations." These results indicate the existence of a substantive reputational spillover effect between the two types of analysts. More surprisingly, misconduct by female buy-side analysts has a larger negative impact on sell-side analysts from the same brokerage, indicating that investors/clients react differently to the two genders in the face of misconduct.
By comparing these findings with a non-misconduct related sample and with samples based on placebo dates, I rule out the possibility that my results reflect something other than actual misconduct and the gender of the underlying analysts. The evidence I detect on gender appears to reflect differing expectations of the genders when it comes to following various rules and regulations, and it may have implications for understanding various puzzles regarding the organization of the market for analysts. My study also builds on Gibbons et al. (2021) by collecting rich data from Bloomberg to illustrate that researchers can obtain detailed demographic characteristics of sell-side analysts, including name, position, education, location of firm, employment history, etc. By adding these details into the analysis, researchers may be able to examine the role of salient characteristics that have not been previously studied and/or controlled for. For example, researchers can further group the sample by the gender of sell-side analysts to see whether the reputational spillover effect between buy-side analysts and sell-side analysts has an "in group" component. When buy-side analysts and sell-side analysts share the same gender, will the spillover effect be of a different magnitude? After considering education level and employment history, will investors show a lower level of gender bias toward these financial advisory employees, especially when they get involved in illegal activities?

BIBLIOGRAPHY

Adams, R. B., R. Kräussl, M. Navone, and P. Verwijmeren. 2021. Gendered Prices. The Review of Financial Studies. URL https://doi.org/10.1093/rfs/hhab046. hhab046.

Aggarwal, R., and N. M. Boyson. 2016. The Performance of Female Hedge Fund Managers. Review of Financial Economics 29:23–36.

Altonji, J., and R. Blank. 1999. Race and Gender in the Labor Market. In O. Ashenfelter and D. Card (eds.), Handbook of Labor Economics, vol. 3, Part C, 1st ed., chap. 48, pp. 3143–3259.
Elsevier. URL https://EconPapers.repec.org/RePEc:eee:labchp:3-48.

Arrow, K. 1973. The Theory of Discrimination. In O. Ashenfelter and A. Rees (eds.), Discrimination in Labor Markets, pp. 3–33. Princeton University Press.

Atkinson, S. M., S. B. Baird, and M. B. Frye. 2003. Do Female Mutual Fund Managers Manage Differently? Journal of Financial Research 26:1–18.

Becker, G. 1957. The Economics of Discrimination. Chicago & London: University of Chicago Press.

Bertrand, M., D. Chugh, and S. Mullainathan. 2005. Implicit Discrimination. The American Economic Review 95:94–98.

Bertsch, C., I. Hull, Y. Qi, and X. Zhang. 2020. Bank Misconduct and Online Lending. Journal of Banking and Finance 116:105822.

Cici, G., P. B. Shane, and Y. S. S. Yang. 2019. Do Connections with Buy-Side Analysts Inform Sell-Side Analyst Research? Working Paper.

Davidson, R., A. Dey, and A. Smith. 2015. Executives' "off-the-job" Behavior, Corporate Culture, and Financial Reporting Risk. Journal of Financial Economics 117:5–28. NBER Conference on the Causes and Consequences of Corporate Culture.

Dimmock, S. G., and W. C. Gerken. 2012. Predicting Fraud by Investment Managers. Journal of Financial Economics 105:153–173.

Dimmock, S. G., W. C. Gerken, and N. P. Graham. 2018. Is Fraud Contagious? Coworker Influence on Misconduct by Financial Advisors. The Journal of Finance 73:1417–1450.

Duchin, R., M. Simutin, and D. Sosyura. 2020. The Origins and Real Effects of the Gender Gap: Evidence from CEOs' Formative Years. The Review of Financial Studies 34:700–762.

Egan, M., G. Matvos, and A. Seru. 2019. The Market for Financial Adviser Misconduct. Journal of Political Economy 127:233–295.

Egan, M. L., G. Matvos, and A. Seru. 2017. When Harry Fired Sally: The Double Standard in Punishing Misconduct. Working Paper 23242, National Bureau of Economic Research.

Ewens, M., and R. R. Townsend. 2020. Are Early Stage Investors Biased Against Women? Journal of Financial Economics 135:653–677.

Fang, L. H., and S. Huang. 2017. Gender and Connections among Wall Street Analysts. The Review of Financial Studies 30:3305–3335.

Gafni, H., D. Marom, A. Robb, and O. Sade. 2020. Gender Dynamics in Crowdfunding (Kickstarter): Evidence on Entrepreneurs, Backers, and Taste-Based Discrimination. Review of Finance 25:235–274.

Gibbons, B., P. Iliev, and J. Kalodimos. 2021. Analyst Information Acquisition via EDGAR. Management Science 67.

Goldin, C., and C. Rouse. 2000. Orchestrating Impartiality: The Impact of "Blind" Auditions on Female Musicians. American Economic Review 90:715–741.

Gompers, P. A., V. Mukharlyamov, E. Weisburst, and Y. Xuan. 2021. Gender Gaps in Venture Capital Performance. Journal of Financial and Quantitative Analysis pp. 1–29.

Gurun, U. G., N. Stoffman, and S. E. Yonker. 2017. Trust Busting: The Effect of Fraud on Investor Behavior. The Review of Financial Studies 31:1341–1376.

Gurun, U. G., N. Stoffman, and S. E. Yonker. 2021. Unlocking Clients: The Importance of Relationships in the Financial Advisory Industry. Journal of Financial Economics.

Honigsberg, C., and M. Jacob. 2021. Deleting Misconduct: The Expungement of BrokerCheck Records. Journal of Financial Economics 139:800–831.

Irvine, P., M. Lipson, and A. Puckett. 2007. Tipping. The Review of Financial Studies 20:741–768.

Karpoff, J. M., A. Koester, D. S. Lee, and G. S. Martin. 2017. Proxies and Databases in Financial Misconduct Research. Accounting Review 92:129–163.

Law, K. K. F., and L. F. Mills. 2019. Financial Gatekeepers and Investor Protection: Evidence from Criminal Background Checks. Journal of Accounting Research 57:491–543.

McNichols, M., and S. R. Stubben. 2008. Does Earnings Management Affect Firms' Investment Decisions? The Accounting Review 83:1571–1603.

Niessen-Ruenzi, A., and S. Ruenzi. 2019. Sex Matters: Gender Bias in the Mutual Fund Industry. Management Science 65:3001–3025.

Phelps, E. S. 1972. The Statistical Theory of Racism and Sexism. The American Economic Review 62:659–661.
Piskorski, T., A. Seru, and J. Witkin. 2015. Asset Quality Misrepresentation by Financial Intermediaries: Evidence from the RMBS Market. The Journal of Finance 70:2635–2678.