Problem case reporting - Crawler

Problem cases are best reported on Github using the tag "Problem Case": https://github.com/SimFin/pdf-crawler/issues

Comments

  • Hiya!

    I don't know how to use the pdf-crawler. Could someone give me an example for Deutsche Post DHL (P&L data)?

    I've already written a small script to extract the data from the xls files at https://www.dpdhl.com/en/investors/ir-download-center.html.
    My own script returns:
    
                                                       FY 2015 FY 2016 FY 2017 FY 2018 FY 2019
    id
    # Employees (Full Time Equivalent)                     NaN     NaN     NaN  489571  499461
    Basic EPS                                             1.27    2.19    2.24    1.69    2.13
    Changes in inventories and work performed and c...     NaN     NaN     168      87     239
    Consolidated net profit for the period                1719    2781    2853    2224    2776
    Depreciation, amortization\nand impairment loss       1665    1377    1471     NaN     NaN
    Depreciation, amortization and impairment loss         NaN     NaN     NaN   -3292   -3684
    EBIT margin (in%)                                      NaN     NaN     NaN   0.051   0.065
    EBITDA                                                 NaN     NaN     NaN    6454    7812
    Income taxes                                          -338    -351    -477    -362    -698
    Net financial income                                  -354    -359    -411    -576    -654
    Net income from investments (equity method)              2       4       2      -2      -8
    Net profit attributable to\nDPAG shareholders         1540    2639    2713     NaN     NaN
    Net profit attributable to DPAG shareholders           NaN     NaN     NaN    2075    2623
    Non controlling interest                               179     142    -140    -149    -153
    Number of shares (in m)                                NaN     NaN    1210    1230    1234
    Other operating expenses                              4740    4414    4526   -4597   -4431
    Other operating income                                2394    2156    1971    1914    2351
    Profit before income taxes                            2057    3132    3330     NaN     NaN
    Profit before income taxes (EBT)                       NaN     NaN     NaN    2586    3474
    Profit from operating activities (EBIT)               2411    3491    3741    3162    4128
    Purchased goods and services                         33170   30620   32775  -31673  -32070
    Revenue                                              59230   57334   60444   61550   63341
    Staff cost                                           19640   19592   20072     NaN     NaN
    Staff costs                                            NaN     NaN     NaN  -20825  -21610
    Tax rate (in %)                                        NaN     NaN    14.3    0.14   0.201
    Total operating expenses                             59215   56003     NaN     NaN     NaN
    Total operating income                               61624   59490     NaN     NaN     NaN
    
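    Something is wrong with some of the (-) signs: for example, "Other operating expenses" and "Purchased goods and services" are positive for FY 2015 to FY 2017 but negative for FY 2018 and FY 2019.
    The worksheets contain different data and different formatting, so it's not really easy to put this all together. An inner join (only the rows that have data for every year):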
                                                FY 2015 FY 2016 FY 2017 FY 2018 FY 2019
    id
    Basic EPS                                      1.27    2.19    2.24    1.69    2.13
    Consolidated net profit for the period         1719    2781    2853    2224    2776
    Income taxes                                   -338    -351    -477    -362    -698
    Net financial income                           -354    -359    -411    -576    -654
    Net income from investments (equity method)       2       4       2      -2      -8
    Non controlling interest                        179     142    -140    -149    -153
    Other operating expenses                       4740    4414    4526   -4597   -4431
    Other operating income                         2394    2156    1971    1914    2351
    Profit from operating activities (EBIT)        2411    3491    3741    3162    4128
    Purchased goods and services                  33170   30620   32775  -31673  -32070
    Revenue                                       59230   57334   60444   61550   63341
    
    I guess there should be some further standardization?
    The index names also differ from the ones used for MSFT.
  • Hi Incoggnito,

    PDF extraction is actually quite a complex process. We wrote some articles about it, if you are interested: https://medium.com/@SimFin_official/pdf-extractor-alpha-a357376e12a2
    and
    https://medium.com/@SimFin_official/simfin-fuse-2-0-54d6446fc6d4

    PDF extraction still requires a bit of manual correction at the moment, and right now we are focusing our resources on keeping the US dataset up to date. So we're not adding companies with the PDF extractor right now, but we will start doing so again soon.
  • Hi tlflassbeck,

    Thanks for your response. It seems kind of strange to pay for a membership in order to help populate the database. I thought I could simply support the project with some work and would then benefit from the points system. Apparently I haven't understood the system properly yet ;) Could you link to or tell me how I could support this project?
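
The first comment raises three issues: labels that differ only in whitespace or wording ("Staff cost" vs. "Staff costs"), expense signs that change between worksheets, and the need for an inner join. Below is a minimal sketch of one way to handle them in plain Python. It is not how SimFin or the pdf-crawler does it. The alias map, the list of expense items, the choice to store expenses as negative numbers, and all function names are assumptions made for illustration; the sample values are taken from the FY 2017 and FY 2018 columns above.

```python
import re

# Hypothetical alias map: normalized labels that should count as the same line item.
ALIASES = {
    "staff cost": "staff costs",
    "profit before income taxes (ebt)": "profit before income taxes",
}

# Assumption: these items are always expenses and are stored as negative numbers.
EXPENSE_ITEMS = {
    "staff costs",
    "other operating expenses",
    "purchased goods and services",
    "depreciation, amortization and impairment loss",
}


def normalize_label(label):
    """Collapse whitespace (including embedded newlines), lowercase, apply aliases."""
    cleaned = re.sub(r"\s+", " ", label).strip().lower()
    return ALIASES.get(cleaned, cleaned)


def normalize_sheet(rows):
    """Map {raw label: value} to {normalized label: value} with expenses negative."""
    out = {}
    for label, value in rows.items():
        key = normalize_label(label)
        if key in EXPENSE_ITEMS and value is not None:
            value = -abs(value)
        out[key] = value
    return out


def inner_join(sheets):
    """Keep only the labels that appear in every sheet of {year: rows}."""
    common = set.intersection(*(set(rows) for rows in sheets.values()))
    return {
        label: {year: rows[label] for year, rows in sheets.items()}
        for label in sorted(common)
    }


# A few rows from the output above; the newline in the FY 2017 label is real.
fy2017 = {
    "Staff cost": 20072,
    "Depreciation, amortization\nand impairment loss": 1471,
    "Profit before income taxes": 3330,
    "Revenue": 60444,
}
fy2018 = {
    "Staff costs": -20825,
    "Depreciation, amortization and impairment loss": -3292,
    "Profit before income taxes (EBT)": 2586,
    "Revenue": 61550,
    "EBITDA": 6454,
}

sheets = {"FY 2017": normalize_sheet(fy2017), "FY 2018": normalize_sheet(fy2018)}
joined = inner_join(sheets)
for label, values in joined.items():
    print(label, values)
```

With the labels normalized first, "Staff costs" and "Profit before income taxes" survive the inner join instead of being split into two half-empty rows. Forcing a sign with `-abs(value)` is only safe for items that can never be income, so the expense list would need to be reviewed per company.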