So I actually sorted on the revenue numbers, and it turned out that there are several rows with either negative numbers or missing values. Is the data checked for quality at some point?
As I said in the other thread, we have many checks in place for the data. We had a revenue check at one point, but then decided to remove it for now.
Understandable. But for my ML project I am better off with the three-months-ending time series, so I need to find a solution by myself. Thanks very much for the quick answer. Appreciate it.
Comments
Many companies have no revenues in the dataset (mostly biotechs), so this is fine. Some companies report negative revenues, but this is very rare. Negative revenues can also occur for CALCULATED periods like Q4: companies never report these periods explicitly (only full-year figures), so we derive them ourselves, and the result can go negative when there is, e.g., a demerger.
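For anyone who wants to screen these rows themselves, here is a minimal sketch that separates missing revenues from negative ones, since the two have different causes. The tickers, periods, and values are made up for illustration, and `classify_revenue` is a hypothetical helper, not part of any dataset tooling.

```python
import math

# Hypothetical rows: (ticker, fiscal_period, revenue). All values are invented.
rows = [
    ("AAA", "Q1", 120.0),
    ("BBB", "Q4", -15.0),          # e.g. a calculated Q4 that went negative
    ("CCC", "Q2", None),           # e.g. a company with no reported revenue
    ("DDD", "Q3", float("nan")),   # missing value loaded as NaN
]

def classify_revenue(value):
    """Return 'missing', 'negative', or 'ok' for a single revenue value."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return "missing"
    if value < 0:
        return "negative"
    return "ok"

# Map each (ticker, period) to its flag so suspicious rows can be reviewed.
flags = {(ticker, period): classify_revenue(rev) for ticker, period, rev in rows}

for key, flag in flags.items():
    if flag != "ok":
        print(key, flag)
```

Whether to drop, impute, or keep the flagged rows depends on the model; missing revenue is often legitimate (as with the biotechs mentioned above), while a negative calculated Q4 usually points to an inconsistency between filings.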
For DNKN, the reported revenue in FY 2017 (https://www.sec.gov/Archives/edgar/data/1357204/000135720418000018/dnkn-20171230x10k.htm#sD5FC8E48980087E8780AC2FCA80AEC10) is lower than the revenues of Q1 (https://www.sec.gov/Archives/edgar/data/1357204/000135720418000024/dnkn-20180331x10q.htm#sE38B43E01CDE84D8AEBF68BCA59097FF) + Q2 (https://www.sec.gov/Archives/edgar/data/1357204/000135720418000040/dnkn-20180630x10q.htm#s22A0DF18FC5C1AF32C97DBCA1C4DB337) + Q3 (https://www.sec.gov/Archives/edgar/data/1357204/000135720418000049/dnkn-20180929x10q.htm#sA9151B5B90065CE323B2AB318BA4D404) combined, so when we calculate Q4 automatically, the result is a negative revenue.
In this case, though, there is a restatement available for FY 2017 (which our old crawler didn't fetch, but our new one does), so this can be fixed (and should be fixed now). Most of the time, however, these Q4 problems can't be solved. So if you want to avoid this problem, you should use the annual dataset, as the values in there are reported by the company and not calculated by us.
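To make the Q4 issue concrete, here is a small sketch of how a derived Q4 can turn negative. It assumes Q4 is simply the full-year figure minus the sum of Q1 to Q3, which is my reading of the explanation above; the actual calculation may differ. The numbers are invented and are not DNKN's figures, and `derive_q4` is a hypothetical name.

```python
def derive_q4(full_year, q1, q2, q3):
    """Derive Q4 as the full-year figure minus the three reported quarters."""
    return full_year - (q1 + q2 + q3)

# Invented figures where the annual and quarterly filings are consistent.
consistent_q4 = derive_q4(full_year=400.0, q1=90.0, q2=100.0, q3=105.0)

# Invented figures where the annual value is on a different basis than the
# quarterly values (e.g. restated), so it is lower than Q1 + Q2 + Q3.
inconsistent_q4 = derive_q4(full_year=250.0, q1=90.0, q2=100.0, q3=105.0)

print(consistent_q4)    # 105.0
print(inconsistent_q4)  # -45.0
```

A negative derived Q4 is therefore a sign that the annual and quarterly inputs don't match, not that the company actually booked negative revenue in that quarter.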