Towards a data-driven future: the Deep Data Delivery Standards

If one insight from Brexit is certain, it is that new #BigData-driven methods of opinion polling need to be developed, since the hedge funds' private exit polls were clearly wrong.

To exploit data sets for relevant insights, one needs not only sheer size (#BigData) but also sufficient depth of the data (#DeepData). Depth ensures that a data set [i] can be efficiently processed (e.g. is appropriately formatted), [ii] has been effectively compiled to include the deepest information value (e.g. compilation specifics are recorded, including errata) and [iii] is financially independent from the subject it describes (e.g. data sets assessing corporations have not been paid for by corporate marketing budgets).

So how can anyone identify Deep Data? This is a very relevant question to which no simple answer existed. To address this need, a group of 36 volunteers teamed up in a non-commercial, scientific initiative and developed the Deep Data Delivery Standards during the 2015/16 season. The Deep Data Delivery Standards are a public good that any asset owner or asset manager is explicitly permitted to use. They were launched June 22nd at RI Europe. Detailed information can be found here:

A shortened version of the Deep Data Delivery Standards (without annotation) is provided below:

Deep data sets are expected to be delivered …

  1. … with a minimum of 5 years of historical data on at least 30 independent indicators per data set (e.g. credit rating, ESG data), whereby any data point that is not delivered as reported at the respective point in time should be flagged as backfilled;
  2. … with 98% value-weighted coverage of any market (e.g. equity index) that is claimed to be covered;
  3. … with an assurance that ratings will be reconsidered for at least 8.25% of the companies covered in the average month of the following year;
  4. … including consistent, accurate identifiers (e.g. ISINs) for 99% of the firms covered in every month of current and historical data coverage;
  5. … in machine-readable format (e.g. CSV, XML) and with proper documentation of the data structure;
  6. … with an assurance of individual rating independence meaning that none of the rated entities in the respective market (e.g. equity index) financially contributed to their rating or paid for access solely to their own rating;
  7. … with an assurance of organizational rating independence, meaning that whenever rating agencies win entities as new clients which they also rate, an independent analysis is conducted to determine whether these new clients receive statistically significantly higher ratings than in the year before, and any biases found in this analysis will be addressed within 12 months;
  8. … with an assurance that all research or rating reports in the following year will indicate names and office locations of all analysts substantially involved in the analysis as well as the extent to which their data sources exceed those self-reported by the rated entity;
  9. … with an assurance that all research or rating reports in the following year will include a logbook detailing any errata, where applicable, as well as the dates and roles of participants in communication with the rated companies;
  10. … accompanied by the ratio of the rating agency’s research costs to total cost or the ratio of research head count to total head count in the most recent financial year.
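Several of the Standards above set quantitative thresholds that a delivery can be checked against automatically. The following is a minimal sketch of such a check in Python; the record layout (`DataPoint` with an ISIN, an indicator name, a year and a backfill flag) is an illustrative assumption, not part of the Standards themselves, and only Standards 1 and 4 are covered here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record layout -- field names are assumptions for illustration,
# not prescribed by the Deep Data Delivery Standards.
@dataclass
class DataPoint:
    entity_isin: Optional[str]  # identifier (Standard 4); None if missing
    indicator: str              # indicator name (Standard 1)
    year: int                   # reporting year of the data point
    backfilled: bool            # True if not delivered as reported (Standard 1)

def check_deep_data(points):
    """Check a delivery against a subset of the quantitative thresholds."""
    years = {p.year for p in points}            # distinct years of history
    indicators = {p.indicator for p in points}  # distinct indicators
    with_isin = sum(1 for p in points if p.entity_isin)
    return {
        # Standard 1: at least 5 years of history on at least 30 indicators
        "history_ge_5_years": len(years) >= 5,
        "indicators_ge_30": len(indicators) >= 30,
        # Standard 4: accurate identifiers for 99% of firms covered
        "isin_coverage_ge_99pct": bool(points)
            and with_isin / len(points) >= 0.99,
    }
```

A full validator would also need the market constituents and their weights to verify the 98% value-weighted coverage of Standard 2, which this sketch deliberately omits.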

More on this:

3rd Financial Data Science Conference: Why Big Data & Artificial Intelligence are ready to disrupt Security Markets
CFDS – Chartered Financial Data Scientist

Kind regards,

Andreas Hoepner on behalf of the volunteers.

Disclaimer: The opinions expressed in the Positionen series of the DVFA blog do not necessarily reflect the position of the DVFA.

About Dr. Andreas Hoepner

Dr. Andreas G. F. Hoepner is an Associate Professor of Finance at the ICMA Centre of Henley Business School.
