Common Shares Outstanding errors, affecting Market Capitalization

Hi Thomas,

I've found some errors in the 'Common Shares Outstanding' indicator values for a few different tickers. These are sometimes associated with correct, and sometimes with incorrect, 'Avg. Basic Shares Outstanding' and 'Avg. Diluted Shares Outstanding' values. This in turn seems to affect the calculation of Market Capitalization, which bounces around in step with the changes in the Common Shares Outstanding values.

Here are 2 example Tickers:


Ticker: 'A'
-- 2017-02-28: Common Shares Outstanding 322,300.905 (consistent with EDGAR)
-- 2017-01-31: Common Shares Outstanding 322,000 (seems consistent, but I don't know where this value was sourced from)
-- 2016-10-31: Common Shares Outstanding 614,000 (weird value) **This is actually Issued Shares (Outstanding + Treasury shares), not Outstanding shares only, which is why the number is so high.
-- 2016-08-31: Common Shares Outstanding 324,384.755 (consistent with EDGAR)
-- 2016-07-31: Common Shares Outstanding 613,000 (weird value)

You can see that Common Shares Outstanding bounces back and forth from the 300,000s to the 600,000s and back. In the EDGAR filings, they are all consistently in the 300,000s.

Interestingly, the Avg. Basic Shares Outstanding and Avg. Diluted Shares Outstanding values for 2016-10-31 are closer to the correct value, at 326,000 and 329,000, respectively. But sometimes the inverse is true: the Common Shares Outstanding is the correct value, and the Avg. Basic Shares Outstanding and Avg. Diluted Shares Outstanding values are erroneous:

-- 2009-07-31: Common Shares Outstanding 345,112.058
-- 2009-07-31: Avg. Basic Shares Outstanding 175,500.17375
-- 2009-07-31: Avg. Diluted Shares Outstanding 176,000.174


Meanwhile, another example, Ticker: 'AAPL'

-- 2013-10-18: Common Shares Outstanding 899,738 (consistent with EDGAR)
-- 2013-07-12: Common Shares Outstanding 908,497 (consistent with EDGAR)

but there are some weird values in between these dates:

-- 2013-09-30: Avg. Diluted Shares Outstanding 17,723,430 (weird)
-- 2013-09-30: Avg. Basic Shares Outstanding 17,600,412 (weird)
-- 2013-09-28: Common Shares Outstanding 6,294,494 (weird)


Here are my download settings:

Stock prices + Fundamentals
Publishing date / Period end-date (same results in both)
Narrow format


Maybe helpful: I've figured out some of the sources of the discrepancy, but not all. When I use the SimFin Data Finder tool and trace the source of some of these bad values, it appears that the PDF crawler is picking up some of the abnormally high Common Shares Outstanding numbers from the text of the Balance Sheet. But the Balance Sheet text refers to "Issued Shares", which is often, but not always, the same as "Outstanding Shares" (as you know, Issued Shares = Outstanding Shares + Treasury Shares). So some of the high Common Shares Outstanding numbers appear to be taken from this "Issued Shares" text of the Balance Sheet, and these records should probably be corrected.

Separately, where do the numbers for Avg. Basic Shares Outstanding and Avg. Diluted Shares Outstanding come from? There are some weird numbers in the bulk download; I'm not sure whether they're in the SimFin Data Finder too (I haven't checked).



  • Hi Charles,

    yes, there might still be some errors in the shares outstanding figures. I haven't yet built in a mechanism that monitors the quality of these figures and flags potential errors (for the fundamentals there is something in place). One error source comes from XBRL: some companies sometimes don't specify the right unit for their shares outstanding values, so the most common error I've seen so far is values that are off by a factor of 1000 (either too big or too small). I know this could be fixed quite easily, but it just hasn't been my top priority yet (instead, these values have been corrected manually when spotted).
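    A minimal sketch of how such factor-of-1000 unit errors could be caught automatically, assuming each value is compared with the median of its neighbors in the same time series. The function name `fix_unit_errors`, the window size and the tolerance are hypothetical choices for illustration, not SimFin's actual mechanism.

```python
from statistics import median

def fix_unit_errors(values, window=2, tolerance=0.2):
    """Rescale share counts that look off by a factor of 1000 (hypothetical heuristic).

    Each value is compared with the median of up to `window` neighbors on each side;
    if the ratio is within `tolerance` of 1000 (or 1/1000), the value is rescaled.
    """
    fixed = list(values)
    for i, v in enumerate(values):
        lo, hi = max(0, i - window), min(len(values), i + window + 1)
        neighbors = [values[j] for j in range(lo, hi) if j != i and values[j] > 0]
        if v <= 0 or not neighbors:
            continue  # negative/zero values and isolated points need other checks
        ratio = v / median(neighbors)
        if abs(ratio / 1000 - 1) <= tolerance:    # e.g. reported in units instead of thousands
            fixed[i] = v / 1000
        elif abs(ratio * 1000 - 1) <= tolerance:  # e.g. reported in millions instead of thousands
            fixed[i] = v * 1000
    return fixed

print(fix_unit_errors([340_000, 341_000, 340_500_000, 342_000, 343_000]))
# [340000, 341000, 340500.0, 342000, 343000]
```

    Using the median rather than the mean keeps a single bad neighbor from dragging the reference value along with it.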

    We don't take the shares outstanding figures from the balance sheet, though. It's really a separate figure that the companies report in XBRL, but it's quite messy, as I said (especially when more than one share class is involved).

    Additionally, the figures in the data finder and bulk download might deviate from Edgar filings because of share splits. When a share split occurs (for Apple this happened in June 2014, for example), it is not reflected in the price, because we take the adjusted closing prices (which account for stock splits, so the historical prices are "adjusted" to the post-split price). Since the split is not reflected in the price, all historical shares outstanding figures have to be adjusted by the split factor. That's why, for companies that had a share split in the past, the shares outstanding figures deviate from Edgar: they are adjusted for the split.
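    As a rough illustration of the adjustment described above, here is a minimal sketch that multiplies a historical share count by every split that took effect after the figure's period end. The split table format and the function name `split_adjusted_shares` are hypothetical; the 7-for-1 Apple split comes from this thread, and its effective date (2014-06-09) is an assumption.

```python
from datetime import date

# Hypothetical split table: (effective date, ratio); a 7-for-1 split has ratio 7.
SPLITS = {"AAPL": [(date(2014, 6, 9), 7)]}

def split_adjusted_shares(ticker, period_end, shares):
    """Express a historical share count in post-split terms, like adjusted prices."""
    factor = 1
    for effective, ratio in SPLITS.get(ticker, []):
        if period_end < effective:  # split happened after this figure's period
            factor *= ratio
    return shares * factor

print(split_adjusted_shares("AAPL", date(2013, 10, 18), 899_738))  # 6298166
```

    Applied to the AAPL figure from 2013-10-18 (899,738), this gives 6,298,166, which is in the same range as the "weird" 6,294,494 value reported for 2013-09-28 earlier in the thread.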
  • Hi Thomas,

    You're right on all counts - it looks like the weird problems I'm encountering are mostly due to the stock splits. I think I figured it out for AAPL, but it looks like there's currently an error in the historical adjustments, after stock splits:

    You're right that the historical Share Price values are "adjusted" to the post-split price. If you look at the Share Price values over time in the SimFin data, even across the date of a stock split, they change smoothly, without the sudden drop you would otherwise expect at a stock split. But that means that, for market capitalization to be consistent, the historical Common Shares Outstanding values need to be "adjusted" as well.

    Consistent with this, it looks like companies adjust both historical Share Price and Common Shares Outstanding values in their SEC filings, when their filings reference either of these values from pre-stock split dates.

    The problem right now in the SimFin data is that the historical Share Price values are adjusted, but the Common Shares Outstanding values are not. This leads to strange Market Cap results when these two numbers are multiplied.
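    To make the mismatch concrete, here is a small sketch with a hypothetical pre-split price of 700 and a 7-for-1 split (the numbers are illustrative only): multiplying the split-adjusted price by the unadjusted share count understates market cap by exactly the split factor, while either consistent pairing gives the same value.

```python
split_ratio = 7                 # 7-for-1 split
unadjusted_price = 700.0        # hypothetical price actually quoted before the split
unadjusted_shares = 900_000     # hypothetical pre-split shares outstanding (thousands)

adjusted_price = unadjusted_price / split_ratio   # what an adjusted price series shows
adjusted_shares = unadjusted_shares * split_ratio

consistent_a = unadjusted_price * unadjusted_shares
consistent_b = adjusted_price * adjusted_shares
mixed = adjusted_price * unadjusted_shares        # adjusted price x unadjusted shares

print(consistent_a == consistent_b)  # True
print(consistent_a / mixed)          # 7.0
```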

    Interestingly, for AAPL at least, this actually answers the question of why the Common Shares Outstanding values sometimes "bounce" back and forth between low and high values, instead of just increasing once at the time of the stock split. The older, pre-split filings report the lower, pre-split Common Shares Outstanding, and the newer, post-split filings report the higher, post-split number. SimFin data correctly reflects this sudden jump in Common Shares Outstanding at the time of the stock split (June 2014 for Apple, a 7-for-1 split as you mentioned). But the kicker is: some of the newer post-split SEC filings include and reference data from a year earlier (for YoY comparisons, for example), and it looks like they must report these older reference numbers in the XBRL too. The Common Shares Outstanding value from a year earlier has been adjusted for the split, because the filing was written and filed after the stock split. So the XBRL data includes this newer, adjusted Common Shares Outstanding value, even though it describes a period before the split. SimFin then inserts it into the table, dated a year earlier. But this adjusted value lands in a period where the surrounding values were originally reported unadjusted (pre-split), which produces several sequential values that bounce back and forth between adjusted and unadjusted Common Shares Outstanding.

    It's an interesting problem. Thanks for pointing me in the right direction -- it makes more sense to me now.

    On a forward-looking note, is there any way you think this can be corrected, or updated? Ultimately, I think that the Common Shares Outstanding values need to be adjusted by the stock split, same as the Share Price is, in the same way that the company SEC filings are done. Market Cap isn't consistent otherwise.

    (I also did see the off-by-1000-factor problem due to unit inconsistency too, in a few values. I also have not yet analyzed Ticker 'A', but just looked at 'AAPL' so far since you gave me the stock split hint!)


  • Hi Thomas,

    I've uncovered a second, different source of error for the odd Common Shares Outstanding values, and actually it is seen in the first ticker above (Agilent, Ticker='A'):

    It seems that some companies (like Agilent) are reporting the Common Shares Issued as the Common Shares Outstanding, even though they may be different values if the company retains any Treasury Stock. When I look up Common Shares Outstanding for Agilent, in the SimFin Data Finder webpage:

    Ticker = 'A'
    Common Shares Outstanding as of: 2011-10-31:
    Value is listed as: 591,000,000 (this is a weird value and the correct value is closer to 347 million shares).

    There haven't been any relevant stock splits around this time period or soon afterward.

    From the SimFin Data Finder webpage, if I click on the value to determine the source, it ultimately leads me to the following link:


    In the Balance Sheet, it states:
    "Common Stock, 591 million shares at October 31, 2011, issued"
    and the next line states:
    "Treasury Stock, 244 million shares at October 31, 2011"

    Using the formula Outstanding Stock = Issued Stock - Treasury Stock, this gives Outstanding Stock = 591 - 244 = 347 million shares, which is basically the correct value.
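    The same check can be written as a tiny function, using the Agilent balance-sheet figures quoted above (in millions of shares). The function name is hypothetical; it simply applies Outstanding = Issued - Treasury and flags a reported value that matches the issued count instead.

```python
def outstanding_from_balance_sheet(issued, treasury):
    """Outstanding shares = issued shares - treasury shares."""
    if not 0 <= treasury <= issued:
        raise ValueError("treasury shares must be between 0 and the issued shares")
    return issued - treasury

# Agilent balance sheet, October 31, 2011 (millions of shares), as quoted above
issued, treasury = 591, 244
reported_cso = 591  # value shown as Common Shares Outstanding

expected = outstanding_from_balance_sheet(issued, treasury)
looks_like_issued = reported_cso == issued and issued != expected
print(expected, looks_like_issued)  # 347 True
```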

    However, SimFin data currently shows 591 million shares Outstanding, when it actually should be 591 million shares Issued, and only 347 million shares Outstanding.

    Since SimFin gets its data from XBRL and not from reading the Balance Sheets, it must be that some companies, like Agilent above, are incorrectly reporting the Issued Shares number as the Outstanding Shares value.

    I'm not sure how this can be fixed if the companies themselves are reporting the wrong values ....



  • Hi Thomas,

    Do you think there's any possibility of working out the inconsistent Common Shares Outstanding issue, relatively soon?

    The thing is that it's such a fundamental value: it affects the calculation of the P/E ratio, P/B ratio, P/S ratio, etc.

    I realize you're very busy making tons of additions to the site! But just wanted to impress upon you the importance of accuracy of this particular value.


  • edited March 2019
    Hi Charles,

    yes, I wanted to get back to you anyway. Interesting point; I never thought about extracting the shares outstanding data from the balance sheet items (currently they come from separate indicators in the XBRL data).

    Shares outstanding is at the top of my list, but I just don't believe the SEC database is a good enough source for that kind of data. Taking the data from the balance sheet is interesting, but it only solves part of the problem, since we care not only about shares outstanding but also about the avg. basic/diluted shares outstanding over the reported period (which are hidden somewhere else).

    I'm working on getting all the shares outstanding info from the PDF reports too, so that there are fewer errors. This is pretty hard, but I think it's doable. The PDF extraction is currently taking almost all my time, so sorry for not responding earlier.
  • it's cool, thanks Thomas for your response! I realize you have a ton on your plate, so I appreciate your responsiveness, and as I and multiple other people have mentioned before, this site is incredible.

    some observations I've made / ideas I've thought of, that might be helpful to you with the PDF extraction of this info (and maybe not; you've probably already thought of them):

    - I've noticed that for Common Shares Outstanding, the easiest place for me to look is the cover page of the Quarterly/Annual reports. Near the signatures, there's always a line that says, "As of December 31, 2014, there were 6,984,245 shares outstanding" or something like this. You're right that this does not include info on Basic / Diluted shares.

    - Balance Sheet extraction was very helpful for me in understanding specifically how many shares were Outstanding vs. Issued. The XBRL data looks like it is sometimes reported incorrectly, but the Balance Sheet was always helpful in this regard. I have a feeling the Balance Sheet could also be used to find Basic / Diluted shares, but I have not looked into this specifically.
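    A minimal sketch of pulling the cover-page figure from the first point out of extracted text with a regular expression, assuming the sentence follows the wording quoted there. Real filings phrase this in many ways (and may list several share classes), so the pattern and the function name `cover_page_shares` are illustrative only.

```python
import re
from datetime import datetime

# Matches e.g. "As of December 31, 2014, there were 6,984,245 shares outstanding"
COVER_PATTERN = re.compile(
    r"as of (?P<date>[A-Z][a-z]+ \d{1,2}, \d{4}),? there were "
    r"(?P<shares>\d{1,3}(?:,\d{3})*) shares",
    re.IGNORECASE,
)

def cover_page_shares(text):
    """Return (as-of date, share count) from a cover-page sentence, or None."""
    m = COVER_PATTERN.search(text)
    if m is None:
        return None
    as_of = datetime.strptime(m.group("date"), "%B %d, %Y").date()
    return as_of, int(m.group("shares").replace(",", ""))

print(cover_page_shares("As of December 31, 2014, there were 6,984,245 shares outstanding"))
# (datetime.date(2014, 12, 31), 6984245)
```

    A None result would need to fall back to another source (XBRL or the statement tables) rather than be treated as zero.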

    Do you have any idea of when this issue might be addressed, though? Obviously I know there's no firm date, but do you think it would be on the order of weeks or months? Although Basic & Diluted Shares would be great too, I think that, for initial usefulness, accurate Common Shares Outstanding values alone would already go a long way toward making the Market Cap measure more useful.

  • I am trying my best to improve this as quickly as possible :) Currently I am focusing mostly on tables inside the PDFs, so it would be interesting to see whether the shares outstanding info can be found inside tables or whether it's usually somewhere in the text (not in tables).
  • edited March 2019
    it's really hard to give you a realistic estimate of how long this is going to take, though. The PDF extraction is tremendously difficult, but all the info inside the PDFs is of high quality and almost everything you want to know is in there, so I think it's worth it. I don't want to work on improving the XBRL extraction because I think it's a "sinking ship" in a way, and there is no point in trying to fix a few small holes in it.
    So it could take a few months before this gets considerably better, but I'm sure I can figure it out as the current progress with the PDF extraction is actually quite remarkable and I'm surprised myself how well it's working already. I'll write a new blog post about this soon.
  • ok awesome Thomas! i see the logic in your approach and that makes complete sense - why bother trying to improve XBRL if the data in it is low quality. agree with the sinking ship analogy, haha.

    awesome, i am very much looking forward to the new PDF extraction data!! maybe it will simultaneously fix the other issue with the unavailable data on companies with non-calendar fiscal years. =D who knows!

  • Hey Thomas,

    I just wanted to remind you - even if you're able to get the Common/Diluted Shares Outstanding value accurate from your new PDF extraction algorithm, please remember that, similar to Share Price, the historical values for this metric will need to be adjusted, retroactively, in the event of any stock splits. Otherwise, Market Cap = Stock Price (retroactively adjusted) x Common Shares Outstanding (NOT retroactively adjusted) will be wrong.

    Alternatively, some other solutions, if this is too confusing:
    - You could have separate fields for Common Shares Outstanding Unadjusted and Common Shares Outstanding Adjusted.
    - Or, having a separate Market Capitalization value for each historical date would solve this problem, since a company's Market Capitalization is unaffected by stock splits.

  • Hi Charles, yes, I am aware of that and this adjustment is currently being made already. So I think it's really just about getting a better source for the original data.
  • Hi Thomas,

    First, congrats on the post and email about the PDF extractor. It's exciting! Maybe it can improve on this continuing issue with Common Shares Outstanding etc.

    I wanted to report: there are still some weird values for Common Shares Outstanding (CSO), Average Basic Shares Outstanding (ABSO) and Average Diluted Shares Outstanding (ADSO). I took a quick look at the most recent bulk download data, at a few tickers and just wanted to report what I found.

    My method was just to browse the data, manually, to see if there were weird big jumps back and forth in numbers, and weird looking numbers. Basically there are still some spurious values, and also even some negative values (which are obvious errors). One Ticker didn't have any ABSO or ADSO info, just CSO info. (There were weird values in Market Cap as well, and they seemed to mirror any weird values in CSO. I'm guessing you just derive Market Cap numbers from the CSO numbers.)
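    The manual check described above could be automated roughly as follows. This is a hypothetical sketch, not SimFin's consistency check: it flags non-positive values, and values that differ from both neighbors by more than a chosen factor while the neighbors agree with each other (the "jump there and straight back" pattern).

```python
def flag_suspicious(series, jump=1.5):
    """Return indices of non-positive values and isolated jumps in a share-count series.

    A value counts as an isolated jump when it differs by more than a factor of `jump`
    (a hypothetical threshold) from both neighbors, while the neighbors agree with each other.
    """
    def far(a, b):
        return max(a, b) > jump * min(a, b)

    flagged = []
    for i, v in enumerate(series):
        if v <= 0:
            flagged.append(i)  # negative share counts are obvious errors
            continue
        if 0 < i < len(series) - 1:
            prev, nxt = series[i - 1], series[i + 1]
            if prev > 0 and nxt > 0 and far(v, prev) and far(v, nxt) and not far(prev, nxt):
                flagged.append(i)  # jumps away and straight back again
    return flagged

print(flag_suspicious([340_000, 341_000, 591_000, 342_000, -5, 343_000]))  # [2, 4]
```

    It only flags values for review; deciding which side of a jump is correct (as with the 340,000 vs. 591,000 values for 'A') still needs a second source.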

    Data follows below. As you mentioned before, you wanted to get a better source for the original data for all of the # Shares metrics. Do you think the PDF extractor will solve these issues and be a better data source for this particular problem? As it is, there are still enough errors to prevent SimFin data from being used for valuation purposes, since all the valuation metrics depend on # Shares in some form or another.


    -- Some observations from my manual browsing of data:

    By Ticker & SimFin ID:

    A 45846
    CSO jumps: generally 340,000, then on 2011-10-31 goes to 591,000, but with occasional jumps back to the 340,000 range on non-quarter-ending dates (2011-12-01, 2012-12-01, 2013-12-01; often but not always on 12/1).
    Sometimes on quarter-ending dates too: after 2015, it jumps back and forth oddly.
    ABSO ok.
    ADSO ok.
    Market Cap consistent from 2008 until 2011-10-31, when it jumps from 9541.45 to 14721.81. Then consistent until 2011-12-01, when it takes a sudden dive from 14893.2 to 8849.3419. Then on 2012-01-31 it jumps back to the 16,000s.
    Market Cap seems to jump with updates to CSO, which makes sense based on the equation.
    Therefore Market Cap is no more consistent than CSO, since it depends on CSO. So it's enough to evaluate which of the first 3 (CSO, ABSO, ADSO) are consistent.

    AA 367153
    CSO ok.
    ABSO ok.
    ADSO ok.

    AAL 68568
    CSO ok, but some jumps at end of 2013 - not sure if real or error
    ABSO inconsistent and weird. starts 300,000s, then has negative value on 2011-12-31, then jumps from 120,000s to 700,000s
    ADSO also inconsistent and weird. has same pattern as ABSO.

    AAMC 137841
    CSO starts with weird/low values from 2012 until 2013-09-30, then gradually jumps to 57,000s in 2014-03-31 (not sure if legit or error). After this consistent.
    ABSO also stabilizes 2014, then ok afterward.
    ADSO same pattern, also stabilizes 2014, then ok afterward.

    AAMC 847094
    CSO in the 2000s until some odd-date entries, where it jumps down to the 1700s, 1900s, 1600s. All end-of-month entries look consistent, but entries on the 22nd, 28th and 29th of the month are lower than expected.
    ABSO ok.
    ADSO ok, but with 1 weird jump at 2015-12-31, when it goes from 2208 to 1098, then back up to 1990.

    AAME 450021
    CSO ok.
    ABSO ok.
    ADSO ok.

    AAN 441241
    CSO ok.
    ABSO ok.
    ADSO ok.

    AAOI 671827
    CSO weirdly jumps around until 2013-11-07, when starts to stabilize at 12,000s, then gradually up to 14,000s and ok from there.
    ABSO also weird values until 2014-03-31, stabilizes 14,000s.
    ADSO same pattern as ABSO.

    AAP 184955
    CSO weird values until 2011-01-01, when stabilizes for a while at 105,000s, then jumps down to 73,000s in 2012 (might be real) and stabilizes from there, but then with a few weirdly low values in 2013 (1300s)
    ABSO weird values until 2011 (appears to be combo of random weird values, and also some unit thousands vs. millions issue), then stabilizes in 70,000s
    ADSO same pattern as ABSO.

    AAPL 111052
    CSO ok.
    ABSO ok.
    ADSO ok.

    AAWW 762204
    CSO ok.
    ABSO 2010 with some unit errors, then 1 weird value on 2010-12-31. Then consistent, but with a negative value on 2014-12-31 and another spurious value on 2015-12-31.
    ADSO same pattern as ABSO.

    AAXN 493957
    CSO ok
    ABSO ok
    ADSO ok

    AB 823898
    CSO ok
    ABSO no records
    ADSO no records

    ABAX 36105
    CSO ok
    ABSO ok
    ADSO ok

    ABBV 61199
    CSO ok
    ABSO ok
    ADSO ok

    ABC 187024
    CSO ok
    ABSO ok but with 1 weird negative value 2013-09-30
    ADSO ok, but same pattern as ABSO

    ABCD 358762
    CSO ok
    ABSO ok but some with wrong unit until 2010-09-30
    ADSO same pattern as ABSO

  • "I'm guessing you just derive Market Cap numbers from the CSO numbers."


    I know shares outstanding are still an issue for some companies. It is my top priority though to get a better source and fix this once and for all.

    I am doing two things now: I am reworking the current XBRL crawler and will be combining it with the PDF extraction, and then I'll set up some better algorithm to check for inconsistencies in timeseries data.

    If the PDF extraction is working perfectly, the XBRL becomes obsolete because in theory the PDFs should contain all the relevant share figures, but it won't be perfect at the beginning, so I'm trying to combine the data sources for now to get a better overall result.
  • Hi Thomas!

    Hope things are coming along. Just checking in - any progress on improving Shares Outstanding data, or estimates on timing for improved accuracy of this metric?

  • hi Charles, yes, working on it now; the update will be online in the coming weeks. Not sure if it will already include the extracted share data from the PDFs, but it will definitely fix the wrong-unit errors, amongst others.
  • Great! looking forward to checking it out.
  • Hi Thomas!

    Just checking in. Do you think new extracted share data from PDFs will be in the next update? (Also is there a date estimate on it?) I'm glad to hear that the wrong unit errors should be fixed though. =)

  • Hi Charles,

    I made a draft now for the extraction of share data from the PDFs, and it's working quite well, but it has to be tested a bit more. The next update will be Fuse 2.0, which is almost done, so I want to finish that first. Shares are my top priority afterwards; I hope the share PDF extraction will be running by the end of the month.
  • Hi Thomas,

    That's awesome. I'll wait for the bulk download because I'm more used to looking at that!

    Hey, I thought of a tip that might make it easier for you to eliminate the source of quite a few of these errors. Quite a few of the errors currently come about when, after a stock split, a new quarterly report is issued that includes # shares for periods prior to the split. The quarterly report "adjusts" the previous periods' shares for the stock split, so that the numbers can be compared easily across quarters. However, SimFin then includes these "back-reported" numbers and dates them to the previous periods. That's responsible for a lot of the weird-looking "jumping back and forth" numbers around stock splits.

    An easy way to eliminate this would be to simply not include Shares data that pertains to periods prior to the current quarterly report. In other words, if the current quarterly report is 2017-Q2, then don't take any Shares info from that report that pertains to 2016-Q2, 2015-Q2, etc. (the quarterly reports usually include these for comparison purposes). Theoretically, these numbers should already have been reported in the original 2016-Q2 and 2015-Q2 reports, so you won't be missing this data.

    This might be one way to eliminate these errors, whether the source is the XBRL or the PDF extractor. Under the current system, the PDF extractor may still extract these "previous-quarter" or "previous-year" values, which might be a source of discrepancy if they span a stock split.

    But then again, you did mention that you had a stock-split correcting algorithm, and this might take care of these issues too. Just a thought, in case it was useful!
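    The suggested filter is simple to sketch. Assuming each extracted figure carries both the period-end date of the report it came from and the period it describes (the record layout, names and share values below are hypothetical), only figures describing the report's own period are kept, and comparison figures for earlier periods are dropped.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ShareFigure:
    report_period_end: date   # period covered by the filing the figure was taken from
    figure_period_end: date   # period the figure itself describes
    shares: int

def own_period_only(figures):
    """Drop prior-period comparison figures, which may already be split-adjusted."""
    return [f for f in figures if f.figure_period_end == f.report_period_end]

figures = [
    ShareFigure(date(2017, 6, 30), date(2017, 6, 30), 7_000),  # current quarter (hypothetical)
    ShareFigure(date(2017, 6, 30), date(2016, 6, 30), 6_900),  # prior-year comparison (hypothetical)
]
print([f.shares for f in own_period_only(figures)])  # [7000]
```

    The trade-off is that a period whose original report is missing loses its only data point, so a fallback to the comparison figure (with split adjustment) may still be needed in that case.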

  • Hi Thomas, were these common shares outstanding errors fixed? Charles
  • Hi Charles,
    sorry for the late reply - I'm really busy finalising some things currently.

    Your idea about taking only the values from the actual period for the shares outstanding is good. Currently the algorithm looks at the publishing date of the figures, and if the publishing date is later than the date of the stock split, the figures are not adjusted and vice versa. If you have an example for a company where there is currently a problem with the stock split date, feel free to post it here and I'll check again why the algorithm failed there.
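    The publishing-date rule described above could look roughly like this. It is a sketch under assumptions (hypothetical function name, one split per call, illustrative dates and a hypothetical later comparison figure), not the actual SimFin code: a figure published later than the split date is assumed to be in post-split terms already, even when it describes an earlier period, and anything else is scaled by the split ratio.

```python
from datetime import date

def adjust_for_split(shares, published, split_date, ratio):
    """Adjust a share figure for one split based on its publishing date.

    Figures published after the split are assumed to be restated already
    (companies restate comparison periods), so only earlier publications are scaled.
    """
    if published > split_date:
        return shares
    return shares * ratio

split = date(2014, 6, 9)  # assumed effective date of the 7-for-1 AAPL split
# Same historical period from two publications: the original filing and a later comparison.
print(adjust_for_split(899_738, date(2013, 10, 30), split, 7))    # 6298166
print(adjust_for_split(6_298_166, date(2014, 10, 27), split, 7))  # 6298166
```

    Both publications end up on the same post-split basis, which is what prevents the back-and-forth jumps, provided the publishing dates are recorded correctly.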

    Some bugs for shares outstanding are now fixed with SimFin 2.0 (the 1000s missing for example), but the PDF extraction for share data is not yet ready (dealing with some last issues for the PDF extraction of statements currently).

    We are also completely reworking the bulk download, which might be interesting for you, as the new format will be super easy to use. I'll tackle the shares again in a more comprehensive manner as soon as I can. Sorry that it's taking so long; it takes time to do things properly, and my time has been really limited recently (and also focused on getting the PDF extraction running). I hope that will get better in the coming months.
This discussion has been closed.