Generating Realistic Random Datasets with Python

Wed 14 February 2018

As a data engineer, once you have written your awesome new data processing application, it is time to test it end-to-end, and for that you need some input data.

As a data scientist, you can benefit from data generation too: it lets you experiment with ways of exploring datasets, algorithms and data visualization techniques, or validate assumptions about the behaviour of a method against many different datasets of your choosing.

In both cases, a tempting option is simply to use real data. One problem, though, is that production data is typically hard to obtain, even partially, and it is not getting any easier with new European laws about privacy and security.
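To make the idea of generated input data concrete, here is a minimal sketch using only the Python standard library. It does not use Trumania: the `generate_users` function, the field names and the chosen distributions are all hypothetical, picked purely for illustration.

```python
import random


def generate_users(n, seed=0):
    """Return n synthetic user records as dicts (hypothetical schema)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    countries = ["BE", "FR", "NL", "DE"]  # illustrative values only
    users = []
    for user_id in range(n):
        users.append({
            "user_id": user_id,
            "country": rng.choice(countries),
            # a log-normal draw gives a right-skewed, positive amount
            "monthly_spend": round(rng.lognormvariate(3.0, 0.5), 2),
            # roughly 80% of users are flagged as active
            "active": rng.random() < 0.8,
        })
    return users


users = generate_users(1000)
```

A hand-rolled generator like this is enough to smoke-test a pipeline, but every field is drawn independently, so it says nothing about relationships between records or how they change over time.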

Trumania is a Python library that we created at Real Impact Analytics to address exactly those issues and that is now released as open source.

A detailed tutorial has been published on DataCamp here: https://www.datacamp.com/community/tutorials/generate-data-trumania

Svend Vanderveken
