%%time
df_val_signals_d = hub.val_signals(variant='daily')
Dataset "us-shareprices-daily" on disk (0 days old).
- Loading from disk ... Done!
Cache-file 'val_signals-77ca1f39.pickle' on disk (0 days old).
- Running function val_signals() ...
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
in
%%time
df_val_signals_d = hub.val_signals(variant='daily')
Dataset "us-shareprices-daily" on disk (0 days old).
- Loading from disk ... Done!
Cache-file 'val_signals-77ca1f39.pickle' on disk (0 days old).
- Loading from disk ... Done!
CPU times: user 8.26 s, sys: 2.07 s, total: 10.3 s
Wall time: 10.5 s
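For context, the output above would come from a setup roughly like the following; the API key and data directory are placeholders, and the exact hub arguments are an assumption since they are not shown in the log:

```python
import simfin as sf

# Placeholder credentials and data directory -- adjust for your own machine.
sf.set_api_key('free')
sf.set_data_dir('~/simfin_data/')

# The hub downloads and caches the bulk datasets (e.g. "us-shareprices-daily")
# and pickles intermediate results such as 'val_signals-....pickle'.
hub = sf.StockHub(market='us')

# The call from the log; this is where the MemoryError occurred on the
# machine with less RAM.
df_val_signals_d = hub.val_signals(variant='daily')
```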
Comments
I need to check some more; maybe play with the timestamps of the files, maybe something regarding the OS-related functions (Ubuntu vs. OS X).
I am not sure how I can “debug” the code in the simfin package while it is being used from the notebook (one possible approach is sketched after this comment).
Any suggestions are welcome.
PS: By the way, you may want to make sure the code works on machines with less RAM, or provide a way to compute the data on one machine and then move it to other machines. I believe you use a powerful machine for dev/test, from what I understand of your day-to-day hobbies/activities.
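On the debugging question above, a sketch (not from the original thread) of how one might inspect the simfin code from a notebook, using IPython's post-mortem debugger plus an editable install of a local checkout; the repository URL is assumed to be the public SimFin repo:

```python
# IPython / notebook magics (run in separate cells):

# Automatically drop into the debugger when the next exception is raised ...
%pdb on

# ... or, right after the MemoryError, open a post-mortem session and step
# through the frames inside the simfin package:
%debug

# To change the package itself (extra prints, breakpoints), install a local
# clone in editable mode so the notebook imports your working copy:
#   git clone https://github.com/SimFin/simfin.git
#   pip install -e ./simfin
```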
Yeah, this is something we didn't fully consider when building the Python API more than a year ago. Basically, the share-price files, which were already quite big back then, grow by around 3k data points every day, so they are considerably larger now than they used to be. For that reason we also introduced the pre-calculated ratios some time ago.
The more general problem is that Pandas requires all the data to be loaded into memory, AFAIK, and we use Pandas DataFrames for everything. I guess we could do something like splitting the data files into several pieces, or do some filtering (starting year etc.) before loading everything into the DataFrames; a rough sketch of the filtering idea follows below.
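As an illustration of that filtering idea (not the package's actual implementation; the file name, separator, and 'Date' column are assumptions based on the dataset name in the log), the daily share-price CSV could be read in chunks and trimmed to a start year before the full DataFrame is built:

```python
import pandas as pd

# Hypothetical path to the bulk-download file inside the simfin data directory.
path = '~/simfin_data/us-shareprices-daily.csv'

# Read the large file in chunks and keep only rows from 2015 onwards,
# so the complete history never has to sit in memory at once.
chunks = pd.read_csv(path, sep=';', parse_dates=['Date'], chunksize=500_000)
df_prices = pd.concat(chunk[chunk['Date'] >= '2015-01-01'] for chunk in chunks)
```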