4.1 Auditing and Enforcing these Bona Fide Constraints
If the reputation of EMIF, and the trust of data custodians, the authorities, and the public, hinge substantially on EMIF services being accessible only to bona fide research organisations and for bona fide research purposes, then EMIF will be expected to assure itself and others that these conditions are met. In principle, data sharing requests should be supported by the approval of an ethical oversight body (at the requesting institution, or obtained by the data custodian prior to undertaking the research investigation). Beyond that, however, the question remains as to the further responsibility of EMIF. Some tension arises here, partly over whether EMIF has an obligation to monitor and investigate the internal activities of EMIF users (such as pharmaceutical companies), and partly over the feasibility of such an obligation, since large-scale monitoring will be expensive.
Self-declaration alone is insufficient to enforce bona fide constraints: something beyond pure self-declaration should be implemented, to avoid a loss of credibility. Our view is that EMIF should have dedicated staff for this screening. A ‘light’ and complementary solution would be to set up a notice-and-action system, whereby EMIF users themselves can signal “inappropriate” organisations if they become aware of them (cf. the flagging systems commonly used on social media). An intermediate solution, also compatible with the work of dedicated staff, would be to allow EMIF users to ‘rate’ other users on the basis of the interactions they have had with each other (cf. the rating systems of auction websites such as eBay). However, given that research groups may be in competition, peer ratings might be motivated by interests that are not visible to EMIF. In both cases, signals and ratings would need to be independently assessed. In addition to peer mechanisms, data protection authorities, with their strengthened powers under the GDPR, will play a major role in this context. Finally, there should also be obligations on the other end, that is, an appeals process against such decisions, to avoid the danger of inadvertently creating an unfair market advantage or monopoly to the detriment of an organisation that is “substantially similar” to one that has been allowed to participate.
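To make the notice-and-action and peer-rating mechanisms concrete, the following sketch shows one way their records could be modelled. All class and field names here are illustrative assumptions, not part of any EMIF specification; the key design point taken from the text is that flags and ratings are collected but never acted upon automatically, since they must first be independently assessed by EMIF's dedicated staff.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class Flag:
    """A notice-and-action signal raised by one EMIF user against another (names assumed)."""
    reporter_org: str
    flagged_org: str
    reason: str
    raised_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    assessed: bool = False  # a flag only takes effect after independent assessment

@dataclass
class Rating:
    """A peer rating based on a concrete interaction between two users."""
    rater_org: str
    rated_org: str
    score: int       # e.g. 1 (poor) to 5 (excellent)
    comment: str = ""

class PeerSignalRegistry:
    """Collects flags and ratings; neither triggers sanctions automatically."""

    def __init__(self) -> None:
        self.flags: List[Flag] = []
        self.ratings: List[Rating] = []

    def raise_flag(self, flag: Flag) -> None:
        self.flags.append(flag)

    def add_rating(self, rating: Rating) -> None:
        self.ratings.append(rating)

    def pending_assessment(self) -> List[Flag]:
        """Flags awaiting review by EMIF's dedicated screening staff."""
        return [f for f in self.flags if not f.assessed]

    def average_rating(self, org: str) -> float:
        """Mean peer score for an organisation; 0.0 if it has not been rated."""
        scores = [r.score for r in self.ratings if r.rated_org == org]
        return sum(scores) / len(scores) if scores else 0.0
```

The deliberate indirection (a `pending_assessment` queue rather than automatic consequences) mirrors the concern above that competing research groups could misuse flags and ratings.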
It is not realistic to try to prevent a large organisation from sharing direct access to a research dataset, or indirectly sharing the research results, with departments and staff that are not conducting bona fide research, such as marketing departments. One should instead require, by contract when signing up as an EMIF user (or perhaps by means of a regular self-declaration), that this is not done. One may then periodically require some form of evidence that the research results have only been used for permitted purposes. As a general strategy, one should regard the organisation as a whole as doing bona fide or non-bona fide research with EMIF data; otherwise the implementation would become unrealistic. It does not seem feasible to screen every contract, but it may be important to add a clause to this effect to the Terms and Conditions and the contract for use of the EMIF Platform, as just indicated. Note that the focus remains on the nature of the specific activity, not on the nature of the organisation (although the two are easily connected, it is the priority given to the former that matters, see above). The sanction would be the (temporary or permanent) exclusion of the whole organisation from access to EMIF. Adding a clause to the effect that the data cannot be used for anything other than the intended purpose would mean that, if something goes wrong, the wrongdoers will be found to have violated the contract. Finally, in order to enforce bona fide constraints, the terms and conditions of the EMIF Platform should make reference to GDPR provisions, such as Art. 40 on codes of conduct and Art. 42 on certification. In addition, Art. 58 on the powers of supervisory authorities should be taken into account. Whether or not EMIF can fully police these enforcement mechanisms itself, this activity will be complemented by the enforcement mechanisms of the GDPR. On this basis, one can imagine further safeguards via external auditors, ethics certificates, and so forth.
Even if there is no routine provision of evidence, EMIF should retain powers to investigate any concerns about the use made of EMIF services by one of its user organisations, including the authority to appoint an external auditor and a requirement that the organisation support (and pay for) the investigations of such an auditor. Such a power is crucial. To cite an example: within the ethics management system of the Bavarian construction industry, a company has to undergo, and pay for, an audit every three years in order to renew its ethics certificate.
A range of sanctions can be applied by EMIF to a research organisation found to be in breach of its obligation only to conduct bona fide research. One may consider suspending, temporarily or permanently, the licence to use EMIF services. One may also envisage publishing a list of usage breaches, partly for transparency to the public, partly to act as a deterrent, and partly to ‘name and shame’. The problem is both legal and ethical. From an ethical point of view, EMIF should adopt ‘closed’ sanctions, such as a warning, a temporary suspension or, in case of serious and repeated breaches, a permanent revocation of the licence to use EMIF services. However, from a legal point of view, ‘naming and shaming’ can be problematic, because issues of legal liability for reputational damage may arise. One should only ‘name and shame’ if one proceeds in accordance with fair trial requirements (the party should be given a reasonable period to react and remedy the breach, the decision should be open to appeal, etc.). We advocate that the Terms and Conditions of the EMIF Platform provide for some form of mediation in serious cases, whereby EMIF users accept the authority of an existing or ad hoc panel of mediators and are willing to bear the costs. To give an analogy, these ethical ‘sanctions’ would be comparable to the peer review system, which is good at many things but less good at spotting fraud, and which ultimately relies on the honesty of the parties. It is preferable to acknowledge this reliance clearly than to hide it behind grand-sounding but unenforceable provisions. Of course, legal sanctions remain a possibility, but they are a different issue (e.g. the enforcement of current legal rules on data protection). A question to be addressed is how transparent the ethical sanctions should be. At a minimum, EMIF could make public (e.g. via its website) the action it is taking when a breach is identified, without naming the wrongdoer.
This would have the further benefit of signalling that unethical uses will be identified and sanctioned.
4.2 Transparency of Access Requests
A final question that arises is whether data custodians should be informed about every request for access to data they have made available through EMIF and, if so, at what level of detail. In this context, a fair balance needs to be struck between the commercial-sensitivity arguments invoked by data users, who may wish to conduct research confidentially (even if the results may later be published), and the needs of data custodians, who may want to confirm adherence to any constraints they have placed on the permitted uses of their data.
One scenario, favoured by some of the EMIF data custodians we have consulted, is that an audit report extracted from the query log is always accessible to authorised individuals within the data custodian organisation. This would allow inspection of which organisations have executed queries each day, on which categories of their data, with what parameters, and whether any disclosure controls were applied. Since this would reveal the research areas being investigated by each research user, this access would need to be governed by a confidentiality agreement, with penalties for breach along the same lines as those described for research users above.
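A minimal sketch of such an audit extract follows, assuming a query log whose entries carry the fields named above (requesting organisation, date, data category, query parameters, and whether disclosure controls were applied). The field names and log structure are illustrative assumptions, not the EMIF Platform's actual schema.

```python
from collections import defaultdict
from typing import Dict, List

def daily_audit_report(query_log: List[dict], custodian: str) -> Dict[str, List[dict]]:
    """Group one custodian's query-log entries by day, keeping only the
    fields mentioned in the text: requesting organisation, data category,
    query parameters, and whether disclosure controls were applied.
    (Log schema is assumed, not taken from the EMIF Platform.)"""
    report: Dict[str, List[dict]] = defaultdict(list)
    for entry in query_log:
        if entry["custodian"] != custodian:
            continue  # each custodian sees only activity on its own data
        report[entry["date"]].append({
            "organisation": entry["organisation"],
            "data_category": entry["data_category"],
            "parameters": entry["parameters"],
            "disclosure_controls": entry["disclosure_controls"],
        })
    return dict(report)
```

Because the report names the querying organisations, access to its output is exactly what the confidentiality agreement described above would govern.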
A second scenario, which might be acceptable to some data custodians, is for a filtered overview of analysis activity to be provided regularly to data custodians, with a more detailed audit log extract produced only if a concern is raised. Such a filtered overview report can be defended as the more usable approach. Because of this need for filtering, the system should make it technically possible to indicate which constraints are attached to uses of certain datasets (otherwise the data custodian will run the legal risk of non-compliance). Online repositories such as SSRN, for example, frequently publish usage statistics such as “this paper has been viewed/downloaded/cited X times”. Something similar could be envisaged: it would be sufficient to indicate to data custodians that there is interest, and perhaps patterns in that interest, as opposed to providing detailed and explicit information about who is interested in what. In short, a balance should be struck between the commercial sensitivity of some data users and the protection of data custodians. With a caveat: transparency is not a one-size-fits-all solution. For example, in addition to a trusted third-party audit (EMIF positioning itself as a trusted third party that audits query activity on behalf of the data custodians, instead of allowing the data custodians direct insight into the queries being run on their data), some technical solutions, such as zero-knowledge proofs, may help in this context.
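The filtered overview can be illustrated by contrast with the detailed audit extract: the same assumed query log is reduced to SSRN-style aggregate counts, deliberately dropping requester identities and query parameters so that the custodian sees the level of interest, and its patterns, without learning who is interested in what.

```python
from collections import Counter
from typing import Dict, List

def filtered_overview(query_log: List[dict], custodian: str) -> Dict[str, int]:
    """Count queries per data category for one custodian's datasets,
    discarding requester identities and parameters, analogous to
    SSRN-style 'viewed X times' statistics. (Log schema is assumed.)"""
    counts = Counter(
        entry["data_category"]
        for entry in query_log
        if entry["custodian"] == custodian
    )
    return dict(counts)
```

The design choice is the whole point: everything commercially sensitive (who queried, with what parameters) is removed before the report leaves the trusted third party, so the overview itself needs no confidentiality agreement of comparable strength.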
This is an area where there is a need for wider consultation with data custodians and research users on the appropriate level of activity transparency and the mechanisms for protecting commercial sensitivity, to balance the legitimate and understandable interests on both sides.