
Is there going to be any attempt made to support dead tickers?

Famously, in 2008, Lehman Bros. was allowed to go out of business, even though some thought it provided near-essential services to the financial industry, at a time when many REITs were being falsely valued as worthless. Its ticker was LEH. When I search the data finder or SimFin Fuse for either "LEH" or "Lehman", I get no hits.

The point of this project is to provide free data, including fundamentals, for backtesting. But backtesting without these "dead" or delisted tickers suffers from "survivorship bias". This makes many strategies look too good, because every stock the backtest covers has survived to the present day. Companies that went out of business have dropped out of the indices and out of virtually all free data sources. That's why CSI can give away cheap data through Yahoo: it makes a fine stock screener if you already have a strategy, but how are you going to backtest it?
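
To make the survivorship-bias point concrete, here is a minimal Python sketch with made-up numbers (the tickers and returns are hypothetical, not real data). It compares an equal-weighted buy-and-hold portfolio over the full starting universe with the same universe after the delisted names have silently disappeared, which is what a data source without dead tickers effectively gives you.

```python
from statistics import mean

# Hypothetical total returns over one backtest window (0.40 = +40%).
# A return of -1.0 means the stock went to zero and was delisted.
# Tickers and figures are invented purely for illustration.
returns = {
    "AAA": 0.40,
    "BBB": 0.15,
    "CCC": -0.10,
    "DDD": -1.00,  # delisted after bankruptcy
    "EEE": -1.00,  # delisted after bankruptcy
}
delisted = {"DDD", "EEE"}


def equal_weight_return(values):
    """Return of an equal-weighted buy-and-hold portfolio: the mean of member returns."""
    return mean(values)


# Point-in-time universe: every stock that existed at the start of the window.
full_universe = equal_weight_return(returns.values())

# Survivor-only universe: what remains once dead tickers are dropped from the data.
survivors_only = equal_weight_return(
    r for ticker, r in returns.items() if ticker not in delisted
)

print(f"full universe:  {full_universe:+.1%}")
print(f"survivors only: {survivors_only:+.1%}")
```

With these invented numbers the full universe returns -31.0% while the survivor-only view shows +15.0%: the same "strategy" flips from a loss to a gain purely because the losers are missing from the data.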

So I would like to know whether SimFin will support dead tickers, or whether it intends to be just another free source of bad data, figuratively blowing smoke up our collective asses.


  • Hi codelurker,

    Good point.

    Let's put it this way: the idea of SimFin is to make all kinds of fundamental data freely available, which means not only blue chips that are currently doing well but also delisted companies. That's our aim. At the moment, though, our resources are still very limited, so we have to prioritize which data we try to deliver first. That is determined by how important we think the data is (across all possible use cases, which go beyond backtesting) but also, and maybe more importantly, by how easily we can crawl it. We already have some delisted companies on the site (and they will of course stay there), but for those companies at least some filings are available in XBRL format on the SEC website. Unfortunately, this is not the case for Lehman (XBRL was adopted in 2009/10, I believe), and XBRL is currently the only way for us to get raw financial statement data in an efficient, automated way.

    We are currently working on getting data from sources other than the SEC. That is a bit more challenging, but once that process works properly there is literally no limit to what kind of data can be on SimFin, as long as the data is accessible in some format somewhere on the internet.
  • edited February 2019
    Thanks for the reply. I look forward to the time when you have this data available. Without seeing how a strategy would have performed in 2008 (and ideally in 2001 too), you can only guess how it would fare in a crisis. If you invest by "seeing how you do", without doing backtesting studies or at least relying on high-quality ones by others, you are guessing in the dark. I suppose it's less important if you are day-trading. Yes, many inexperienced investors don't care a fig about delisted tickers. There is a reason finance.yahoo.com gives similar info out for free, while investors who want data that includes delisted tickers pay a high premium. FWIW, SEC filings probably don't do much to tell you the final value of a stock.

    Another critical historical datum, IMHO, is when a stock enters or leaves an index, or changes market-cap category. Many big companies start out small, and their market cap grows over time. Stock performance changes with market cap: small caps have a different long-term average return from large caps. Large caps usually have higher returns. Should you test an investment strategy that takes any stock, say, immediately after its IPO? Then you will have a lot of small-cap stocks in your strategy. If you restrict your tests to the S&P 500, which by some estimates accounts for about 80% of all US market capitalization, you will leave out a lot of smaller and insolvent companies. So, for US stocks, S&P 500 membership at each point in time is especially important for backtesting when historical market-cap categories are not available. Norgate Data has a nice package of index data, index components, stock prices and stock fundamentals, but it costs $630/year and comes only in MetaStock format. Serious investors should look for such features, IMHO. I recommend you put this on your roadmap for implementation at some point, or perhaps recommend sources for such data, say, in a FAQ.

    I know it's a lot to ask for free; in fact, I don't really expect it. But I'd be enthusiastic if you can do it. I'm just trying to point out how important such data is.
  • Very interesting info, thanks a lot. I will definitely put these points on the roadmap; I think most of them are actually very doable (also for free). Getting historical data for companies that have already "disappeared" may be hard, but this is a long-term project, so at least going forward we'll be able to build a good dataset of companies that went out of business (and I'm sure new crashes are coming too ;) ).
  • As always, Thomas, thanks for your work on this site! I'm sure it's not easy, and it's a lot of work! Hopefully our comments on the data are taking a little of the workload off you.

    I'd second the interest in more data and therefore less survivorship bias. Backtesting is of big interest to me as well.

    The front page says SimFin now covers about 2,200 companies. As a general indicator of completeness, and of the potential for this type of bias, is there any way to know how many companies would constitute "complete" coverage of the period represented here? A quick search suggested that NYSE currently has about 2,800 active tickers and NASDAQ about 3,300, roughly 6,100 together. Over the entire period since XBRL's inception, do you have any sense of how many companies/tickers would make up a "complete" dataset, which I suppose should also include all the other US exchanges?
  • The major indices aim to cover 60-80% of market cap. Because market cap falls off very quickly as you move down the size ranking, a relatively small percentage of names (tickers) gives very good coverage by cap (and hence by volume).

    Look-ahead and survivorship bias are indeed big problems in backtests. Unrestated data is already in the SimFin databases, but it is extremely painful to extract via Fuse, one stock and one periodicity at a time. I am new to SimFin, so perhaps I have missed an easier way to get the as-reported data? Please, someone enlighten me!
  • edited December 2020
    We will make an "as-reported" dataset available in the future; right now, the web API and the bulk download always return the most recent (possibly restated) values. I am aware that as-reported values are better for backtesting, so this is on my list for the near future. You can already get the as-reported values through the company profiles by changing the settings, but not through the APIs yet.
  • Thanks for your reply, and for this website/service overall, and all the hours and care that have gone into it. I am glad to hear that as-reported is on your to-do list, and thanks for confirming that there is no way to get that data via the API yet. I will continue to go through Fuse.
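
The difference between restated ("latest") and as-reported values discussed in this thread is a look-ahead problem: a backtest must only use figures that had been published by the rebalance date. Here is a minimal Python sketch of a point-in-time lookup, assuming a hypothetical record layout (the `FiledValue` class, its fields and all numbers are invented for illustration and are not SimFin's schema or API).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class FiledValue:
    """One published version of a line item for a fiscal period (hypothetical layout)."""
    fiscal_period: str
    published: date      # date the filing became public
    net_income: float    # hypothetical line item, in $ millions


# Invented history for one company: FY2018 was first reported, then restated a year later.
history = [
    FiledValue("FY2018", date(2019, 2, 15), 120.0),  # original, as-reported
    FiledValue("FY2018", date(2020, 2, 14), 95.0),   # restated in the following annual report
    FiledValue("FY2019", date(2020, 2, 14), 130.0),
]


def as_of(values, fiscal_period: str, when: date) -> Optional[float]:
    """Newest version of `fiscal_period` that was public on or before `when` (point-in-time)."""
    known = [v for v in values if v.fiscal_period == fiscal_period and v.published <= when]
    if not known:
        return None  # not yet published, so a backtest must not use it
    return max(known, key=lambda v: v.published).net_income


def latest(values, fiscal_period: str) -> float:
    """Most recent version regardless of date, like a 'latest values' dataset."""
    versions = [v for v in values if v.fiscal_period == fiscal_period]
    return max(versions, key=lambda v: v.published).net_income


rebalance_day = date(2019, 6, 30)
print(as_of(history, "FY2018", rebalance_day))   # 120.0: what was actually knowable
print(latest(history, "FY2018"))                 # 95.0: restated value (look-ahead)
print(as_of(history, "FY2019", rebalance_day))   # None: FY2019 not filed yet
```

The same as-of idea applies to the index-membership and market-cap history mentioned earlier: record each change with its effective date and query the state on the rebalance date rather than today's state.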