3.1.1 Mapping texts
Lawmakers should begin by creating three datasets composed of disclosure rules (Sect. 1), firm-level policies (Sect. 2), and the case-law pertaining to both (Sect. 3). Again, the domains of online privacy and online contract terms are taken as examples.
1. Disclosure rules as dataset: the De Iure disclosures
For the sake of simplicity, we term De Iure disclosures all rules that set disclosure duties. In the privacy context, requirements for platforms to disclose to individuals information about their rights and about how their data are collected and processed would fit this category. By way of example, Sec. 1798.100(a) CCPA stipulates the duty of ‘a business that collects a consumer’s personal information [to] disclose to that consumer the categories and specific pieces of personal information the business has collected’, and GDPR Art. 12 requires data controllers to provide similar information to data subjects.
Technically speaking, these rules can be understood as datasets (Livermore and Rockmore) that can be retrieved and analyzed through NLP techniques (Boella et al. 2013, 2015), easily searched (e.g. via the Eur-Lex repository), modelled (e.g. using LegalRuleML) (Governatori et al. 2016; Palmirani and Governatori 2018), and classified and annotated (e.g. through the ELI annotation tool). For instance, the PrOnto ontology has been developed specifically to retrieve normative content from the GDPR (Palmirani et al. 2018).
While rules may be clear in stating the goals of required disclosure, convoluted sentences or implied meanings may make the stated goal far from clear. Also, the same rule may sometimes prescribe a conduct with a fine level of detail (if X, then Y), yet include provisions requiring, for instance, that information about privacy be given by platforms in ‘conspicuous, accessible, and plain language’. Even if governmental regulation is adopted specifying what these terms mean, they would not escape interpretation (Waddington 2020), and thus possibly conflicting views by the courts.
To help attenuate these problems, proposals have been made to use NLP tools to extract legal concepts and link them to one another, e.g. through the combination of a legislation database and a legal ontology (or knowledge graph). Boella et al. (2015) suggest using the unsupervised TULE parser and a supervised SVM to automate the collection and classification of rules and the extraction of legal concepts (in accordance with the Eurovoc Thesaurus). This way, the meaning of legal texts becomes easier to understand, making complex regulations and the relationships between rules simpler to grasp, even if they change over time. Similarly, LegalRuleML may be used to specify in different ways how legal documents evolve, and to keep track of these evolutions and connect them to each other.
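As a rough illustration of the supervised step, the sketch below classifies disclosure provisions into broad subject categories with a TF-IDF representation and a linear SVM. The labels, example sentences and category names are invented for illustration; Boella et al.'s actual pipeline (TULE parsing, Eurovoc labels) is considerably richer.

```python
# Hypothetical sketch: supervised classification of disclosure provisions
# into broad, Eurovoc-style subject categories (TF-IDF + linear SVM).
# Training texts, labels and the test provision are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "The controller shall provide the data subject with information on the purposes of processing.",
    "The trader shall inform the consumer about the total price of the goods inclusive of taxes.",
    "Personal data shall be processed lawfully, fairly and in a transparent manner.",
    "The consumer may withdraw from the contract within fourteen days without giving any reason.",
]
train_labels = ["data protection", "consumer information", "data protection", "consumer information"]

# Word uni/bigram TF-IDF features feed a linear SVM classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_labels)

new_provision = "The business shall disclose the categories of personal information it collects."
print(clf.predict([new_provision])[0])
```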
2. Firm-level disclosure policies as dataset: the De Facto disclosures
The second dataset is that of firm-level disclosure policies, which we term De Facto disclosures. The latter include, but are not identical to, the notices elaborated by the industry to implement laws or regulations. We refer to the all-too-familiar online Terms of Service (commonly found online and seldom read). With regard to privacy policies, pioneering work in assembling and annotating them was undertaken by Wilson et al. (2016), resulting in the frequently used ‘OPP-115’ corpus. Indeed, ML is now a standard method to annotate and analyze industry privacy policies (Sarne et al. 2019; Harkous et al. 2018).
3. The linking role of case-law
The case-law would play an important role, serving as the missing link between legal provisions and their implementation. Indeed, courts’ decisions help detect controversial text and clarify the exact meaning to be given to both De Iure and De Facto disclosures. It follows that case outcomes and rule interpretation should be used to update the libraries with terms that turn out to be disputed, and others that become settled and undisputed.
A good way to link the case law with rules is that proposed by Boella et al. (2019), who present a ‘database of prescriptions (duties and prohibitions), annotated with explanations in natural language, indexed according to the roles involved in the norm, and connected with relevant parts of legislation and case law’.
In the EU legal system, a question arises whether only interpretative decisions by the European Courts, or also those of national jurisdictions, should be included in the text analysis, given that the former provide uniform elucidation binding all national courts (having force of precedent), while most case-law on disclosures originates from national controversies and never reaches the EU courts. We know, for instance, that EU jurisprudence spares global platforms only a minor part of the costs they incur in controversies with consumers; the largest share is borne in litigation before national jurisdictions, where there is no binding precedent and the same clause can be qualified differently.
Moreover, unlike in the US, in Europe only the decisions of the EU Courts are fully machine-readable and coded (Panagis et al. 2017), while the process of doing the same for national courts’ decisions (through the European Case Law Identifier: ECLI) is still in the making, although at a very advanced stage. Nonetheless, analytical tools are already available that allow linking EU and national courts’ cases. For instance, Agnoloni et al. (2017) introduced the BO-ECLI Parser Engine, a Java-based system for extracting and linking case law from different European countries. By offering pluggable national extensions, the system produces standard-identifier (ECLI or CELEX) annotations to link case law from different countries. Furthermore, the EU itself is increasingly conscious of the need to link European and national case law, resulting, for instance, in the EUCases project, which developed a unique pan-European law and case law Linking Platform.
As shown by Panagis et al. (cit.), among algorithmic tools, citation network analysis in particular can be extremely useful in addressing not only the question of which is the valid law, but also which preceding cases are relevant and how to deal with conflicting interpretations by different courts. The latter is especially relevant in systems without binding precedent (i.e. most national EU legal systems), where differing interpretations of certain ambiguous terms might consequently arise. By combining network analysis and NLP to distinguish between different kinds of references, it might be possible to assess which opinions are endorsed by the majority of courts and could thus be considered the ‘majority opinion’. While other methods of analyzing citations in case law might establish the overall relevance of certain cases in general, only the more granular methodology suggested by Panagis et al. seems well suited to assess which interpretations of certain ambiguous terms are “the truly important reference points in a court’s repository”. In this way, case law can be used to link the general de iure disclosures and the specific de facto disclosures while duly taking into account different interpretations of the former by different courts.
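A toy sketch of this idea, under assumed data: decisions citing each other are weighted with PageRank, and each decision's (hypothetical) endorsed interpretation of an ambiguous term is tallied by that weight to suggest a 'majority opinion'. This is only an illustration of the general technique, not Panagis et al.'s actual pipeline.

```python
# Illustrative citation-network sketch: case identifiers, citation edges and
# interpretation labels are all invented. PageRank weights each decision and
# the weighted tally indicates which interpretation dominates.
from collections import defaultdict
import networkx as nx

citations = [  # (citing_case, cited_case)
    ("C-2021-01", "C-2019-07"), ("C-2021-02", "C-2019-07"),
    ("C-2021-02", "C-2020-03"), ("C-2022-05", "C-2021-01"),
]
interpretation = {  # interpretation of the ambiguous term endorsed by each decision
    "C-2019-07": "broad", "C-2020-03": "narrow",
    "C-2021-01": "broad", "C-2021-02": "broad", "C-2022-05": "narrow",
}

G = nx.DiGraph(citations)
weight = nx.pagerank(G)  # influence of each decision within the network

tally = defaultdict(float)
for case, view in interpretation.items():
    tally[view] += weight.get(case, 0.0)

print(max(tally, key=tally.get))  # interpretation backed by the most influential decisions
```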
3.1.2 Mapping the causes of failure
Measuring the causes of failure of both de iure and de facto disclosures is not an easy task (Costante et al. 2012). Nevertheless, quantitative indices are indispensable for the following analysis: they make the information disclosures store easily accessible and readable for machines and algorithms, and they guarantee the repeatability and objectivity required for scientific validity.
In line with our ‘comprehensive approach’, for each stage failures must be identified, mapped, and linked with the failures at other stages, since they are intimately related.
Therefore, we propose defining a standard made of three top-level categories of failure that can be used for both de iure and de facto disclosures:
(i) Readability. The length of the text can be excessive, leading to information overload.
(ii) Informativeness. Lack of clarity and simplicity can lead to information overload, while lack of information can result in asymmetry.
(iii) Consistency. Lack of a shared lexicon and of cross-references within the same document or across documents may lead to incoherence.
Based on these three framework categories, we establish golden standard thresholds and rank clauses as optimal (O) or sub-optimal (S–O) (Contissa et al. 2018a). This way, we would, for instance, rank a privacy policy clause as S–O under the ‘length of text’ index if it fails to meet the established threshold under the goal of ‘clarity’ as stated in GDPR Article 12. At the same time, however, Article 12 or some of its provisions may, as seen, score S–O under other failure indexes, such as lack of clarity (vagueness). The case-law might help clarify whether this is the case.
In the following, we elaborate the methodology for designing a detailed system of indexes to capture the main causes of failure. Furthermore, we provide ideas on how to translate each indicator into quantitative, machine-readable indices. Table 1 summarizes our findings.
Table 1
Indexes of failure of de iure and de facto disclosures—methodology and ranking
Category | Failure | Indicator | Methodology and ranking
Readability | Information overload | Length of text | No. of polysyllables relative to the length of the text. Rank: e.g. if longer than X words (golden standard), then rank S–O. ALGO: SMOG; Dale–Chall readability formula; Gunning Fog Index. Major ref.: Bartlett et al. (2019)
Informativeness | Information overload | Complexity of text | Syntactic: no. of certain grammatical structures (nodes) containing complex text (e.g. conjunctive adverbs such as ‘however’, ‘thus’, ‘nevertheless’; passives; modal verbs such as ‘could’, ‘should’, ‘might’). Rank: e.g. if the number of nodes containing complex tokens in a clause is higher than X per sentence of length Y (golden standard), then rank S–O. Major ref.: Botel and Granowsky (1972) or Szmrecsanyi (2004). Semantic: use of complex, difficult, technical or unusual terms called ‘outliers’ (e.g. ‘as necessary’, ‘generally’) or of two or more semantically different CI parameters in information flows. Rank: e.g. if a clause contains more outliers than the number set in the golden standard, then rank S–O. ALGO: LOF; CI in information flows. Major ref.: Bartlett et al. (2019); Shvartzshnaider et al. (2019)
Informativeness | Information asymmetry | Lack of information | Presence of all information required by the law (e.g. identity of the data controller, types of personal data collected, goals of treatment, etc.). Rank: e.g. if a clause omits elements deemed necessary according to the golden standard, then rank S–O. Major ref.: Liepina et al. (2019) and Costante et al. (2012), or Contissa et al. (2018a, b)
Consistency | Internal and external | Interaction amongst clauses within the same text and across texts | Recurrence of the same lexicon and cross-references between different clauses in the same document and across documents. Rank: e.g. if a clause scores lower than the citation-network gold standard for cross-reference links or textual similarity evaluation, then rank S–O. Major ref.: Panagis et al. (2017) [citation networks]; Nanda et al. (2019) [similarity models]
1. Readability. Information overload: length of text
The first quantitative index is readability. It is mainly understood as non-readership due to information overload, and measured in terms of ‘length of text’.
There is a large variety of readability scores based on the length of text (Shedlosky-Shoemaker et al. 2009); these are frequently highly correlated, thus ‘easing future choice making processes and comparisons’ between different readability measures (Fabian et al. 2017).
Among the many, we take Bartlett et al. (2019), who propose an updated version of the old (1969) SMOG. Accordingly, annotators establish a threshold for the number of polysyllables (words of three or more syllables) a sentence may contain before being tagged as unreadable by the machine, and hence S–O. The authors suggest ‘a domain specific validation to verify the validity of the SMOG Grade’.
This is especially relevant to make our proposal workable. Not all domains are the same, and an assessment of firm-level privacy policies would clearly need to be carried out sector by sector. For instance, the type of personal data collected by a provider of health-related services would be treated differently from that collected by a manufacturer or retailer dealing with non-sensitive data.
Under the Readability-Length of text index, sub-optimal disclosure clauses use more polysyllables than established in the golden standard, set and measured using the revised version of SMOG proposed by Bartlett et al. (2019).
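A minimal sketch of such a SMOG-style check is given below. The syllable counter is a crude vowel-group heuristic and the grade threshold is a placeholder 'golden standard'; Bartlett et al.'s revised measure and any domain-specific validation would of course differ.

```python
# Sketch of a SMOG-style readability check for a single disclosure clause.
# The syllable count and the threshold are simplified, illustrative choices.
import math
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_grade(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    # Classic SMOG formula, normalised to a 30-sentence sample.
    return 1.0430 * math.sqrt(polysyllables * (30 / max(len(sentences), 1))) + 3.1291

GOLDEN_STANDARD = 13.0  # hypothetical threshold: above this, the clause ranks S-O
clause = ("We may disclose aggregated demographic information to prospective "
          "business partners and advertisers for legitimate operational purposes.")
rank = "S-O" if smog_grade(clause) > GOLDEN_STANDARD else "O"
print(round(smog_grade(clause), 1), rank)
```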
2. Informativeness. Information overload: complexity of text
Lack of readability of disclosures may also depend on the complexity of the text. The scholarship has suggested measuring it from both a syntactic and a semantic point of view.
While most analyses of readability focus on the number of words in a specified unit (e.g. a sentence, paragraph, etc.) as a proxy for complexity, only a few authors analyze the syntactic complexity of a text separately (Botel and Granowsky 1972). Although some scholars search for certain conditional or relational operators, they usually do so with the aim of detecting sentences that are semantically vague or difficult to understand (e.g. Liepina et al. 2019; see the next paragraph).
Going back to Botel and Granowsky, they propose a count system that assigns ‘points’ to certain grammatical structures based on their complexity (the more complex, the more points). For instance, conjunctive adverbs (‘however’, ‘thus’, ‘nevertheless’, etc.), dependent clauses, noun modifiers, modal verbs (‘should’, ‘could’, etc.) and passives are assigned one or two points, whereas simple subject-verb structures (e.g. ‘she speaks’) receive no points. The final complexity score of a text is then calculated as the arithmetic average of the complexity counts of all sentences.
An alternative approach is that of Szmrecsanyi (2004), who proposes an ‘Index of Syntactic Complexity’ relying on the notion that ‘syntactic complexity in language is related to the number, type, and depth of embedding in a text’: the more nodes a sentence contains (e.g. subjects, objects, pronouns), the higher the complexity of the text. The proposed index thus combines counts of linguistic tokens like subordinating conjunctions (e.g. ‘because’, ‘since’, ‘when’, etc.), WH-pronouns (e.g. ‘who’, ‘whose’, ‘which’, etc.), verb forms (finite and non-finite) and noun phrases.
Although this might be ‘conceptually certainly the most direct and intuitively the most appropriate way to assess syntactic complexity’, it is pointed out that this method usually requires manual coding.
Since the last two measures, at least, seem to be highly correlated, choosing among them might in the end be a question of the computational effort associated with calculating such scores.
Under the Informativeness-Syntactic complexity index, S–O disclosure clauses (of a given length) use a number of complexity nodes higher than the standard, defined and measured using Botel and Granowsky (1972) or Szmrecsanyi (2004).
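A rough, dictionary-based sketch of such a node count is shown below. The word lists, the crude passive pattern and the per-sentence threshold are illustrative assumptions; a faithful implementation of either index would rely on a syntactic parser.

```python
# Illustrative per-sentence count of "complexity nodes" (subordinators,
# WH-pronouns, modals, conjunctive adverbs, crude passives). Word lists and
# the golden-standard threshold are invented for the example.
import re

SUBORDINATORS = {"because", "since", "when", "although", "whereas", "unless", "if"}
WH_PRONOUNS = {"who", "whom", "whose", "which"}
MODALS = {"could", "should", "might", "may", "must", "shall"}
CONJ_ADVERBS = {"however", "thus", "nevertheless", "therefore", "moreover"}

def complexity_nodes(sentence: str) -> int:
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = sum(t in SUBORDINATORS for t in tokens)
    score += sum(t in WH_PRONOUNS for t in tokens)
    score += sum(t in MODALS for t in tokens)
    score += sum(t in CONJ_ADVERBS for t in tokens)
    # Very crude passive detection: form of "be" followed by an -ed participle.
    score += len(re.findall(r"\b(?:is|are|was|were|be|been)\s+\w+ed\b", sentence.lower()))
    return score

GOLDEN_STANDARD = 3  # hypothetical maximum nodes per sentence before ranking S-O
clause = ("However, personal data may be shared with affiliates, which could be "
          "located abroad, when this is required because of legal obligations.")
nodes = complexity_nodes(clause)
print(nodes, "S-O" if nodes > GOLDEN_STANDARD else "O")
```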
Semantic complexity (or the use of complex, difficult, technical or unusual terms called ‘outliers’) is analyzed by Bartlett et al. (2019), who use the Local Outlier Factor (LOF) algorithm (based on the density of a term’s nearest neighbors) to detect such terms.
Approaching the issue of semantic complexity from a slightly different angle, Liepina et al. (2019) evaluate the complexity of a text based on four criteria: (1) indeterminate conditioners (e.g. ‘as necessary’, ‘from time to time’, etc.), (2) expression generalizations (e.g. ‘generally’, ‘normally’, ‘largely’, etc.), (3) modality (‘adverbs and non-specific adjectives, which create uncertainty with respect to the possibility of certain actions and events’) and (4) non-specific numeric qualifiers (e.g. ‘numerous’, ‘some’, etc.). These indicators are then used to tag problematic sentences as ‘vague’.
In a similar vein, Shvartzshnaider et al. (2019) also base their assessment of complexity/clarity on tags, although in a different manner. They analyze the phenomenon of ‘parameter bloating’, which can be explained as follows: building on the idea of ‘Contextual Integrity’ (CI) and information flows, the description of an information flow is deemed (too) complex, or bloated, when it ‘contains two or more semantically different CI parameters (senders, recipients, subjects of information, information types, condition of transference or collection) of the same type (e.g., two senders or four attributes) without a clear indication of how these parameter instances are related to each other’. This results in a situation where the reader must infer the exact relationship between different actors and types of information, which significantly increases the complexity of the respective disclosure (at 164). Therefore, the number of possible information flows might be used as a quantitative index to measure the semantic complexity of a clause.
Under the Informativeness-Semantic complexity index, an S–O disclosure clause contains more outliers or semantically different CI parameters in information flows than the number set in the golden standard, defined and measured using Bartlett et al. (2019) or Shvartzshnaider et al. (2019) respectively.
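Below is a minimal sketch of the vagueness-tagging idea, flagging a clause once it contains more vague markers than a placeholder threshold. The marker lists are abbreviated examples inspired by the four categories above, not Liepina et al.'s full annotation scheme, and the threshold is an assumption.

```python
# Illustrative vague-marker counter: marker lists and threshold are examples.
import re

VAGUE_MARKERS = {
    "indeterminate_conditioner": ["as necessary", "from time to time", "as appropriate"],
    "generalization": ["generally", "normally", "largely", "typically"],
    "modality": ["may", "might", "possibly"],
    "numeric_qualifier": ["numerous", "some", "certain"],
}

def count_vague_markers(clause: str) -> int:
    text = clause.lower()
    tokens = re.findall(r"[a-z']+", text)
    hits = 0
    for markers in VAGUE_MARKERS.values():
        for marker in markers:
            if " " in marker:
                hits += text.count(marker)    # multi-word expressions
            else:
                hits += tokens.count(marker)  # single tokens, exact match
    return hits

GOLDEN_STANDARD = 2  # hypothetical maximum vague markers per clause
clause = ("We may, from time to time and as necessary, share some of your data "
          "with certain third parties.")
outliers = count_vague_markers(clause)
print(outliers, "S-O" if outliers > GOLDEN_STANDARD else "O")
```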
3. Informativeness. Information asymmetry: lack of information
Another failure index of information asymmetry is the completeness of the information provided in a disclosure. Completeness has been investigated mainly at the level of firm disclosure policies rather than at the rulemaking level (Costante et al. 2012). It must be noted that the requirement of completeness does not automatically conflict with readability. While an evaluation of the completeness of a disclosure clause is merely concerned with whether all essential information requested by the law is provided, readability problems mostly arise from the way this information is presented to the consumer by the industry. Therefore, a complete disclosure is not per se unreadable (just as an unreadable disclosure is not automatically complete), and the two concepts need to be kept separate.
Several authors suggest tools to measure completeness, especially in the context of privacy disclosures. However, nothing prevents transferring the approaches presented in this section to other disclosures, such as the terms and conditions of online contracts.
For instance, based on the above-outlined theory of CI, Shvartzshnaider et al. define completeness of privacy policies as the specification of all five CI parameters (senders, recipients, subjects of information, information types, condition of transference or collection). Similarly, Liepina et al. (2019) consider a clause complete if it contains information on 23 pre-defined categories (e.g. ‘<id> identity of the data controller, <cat> categories of personal data concerned, and <ret> the period for which the personal data will be stored’). If information that is considered ‘crucial’ is missing, the respective clause is tagged as incomplete. Manually setting the threshold would then help define whether a clause scores as optimal or not.
A similar but slightly refined approach is presented by Costante et al. (2012, at 3): while they also define a number of ‘privacy categories’ (e.g. advertising, cookies, location, retention, etc.), their proposed completeness score is calculated as the weighted and normalized sum of the categories covered in a paragraph.
For our purposes, a privacy or online contract disclosure clause could be ranked using the methodology suggested by Costante et al. (2012) and Liepina et al., or alternatively by Contissa et al. In both cases, however, corpus tagging would be necessary.
Under the Informativeness-Lack of information index, in sub-optimal disclosure clauses (of a given length) the number of omitted elements is higher than the pre-defined minimum necessary standard, defined and measured using Liepina et al. (2019) and Costante et al. (2012) or Contissa et al. (2018a, b).
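A minimal sketch of a weighted, normalised completeness score in the spirit of Costante et al. is given below. The categories, weights, keyword cues and threshold are illustrative placeholders, not the authors' actual scheme, which relies on trained classifiers rather than keyword matching.

```python
# Hypothetical completeness score: weighted, normalised share of required
# information categories covered by a disclosure. All values are examples.
REQUIRED = {  # category -> (weight, keyword cues that signal coverage)
    "controller_identity": (1.0, ["data controller", "who we are", "contact us"]),
    "data_categories":     (1.0, ["categories of personal data", "we collect"]),
    "purposes":            (0.8, ["purposes", "in order to"]),
    "retention_period":    (0.6, ["retain", "stored for", "retention period"]),
}

def completeness(policy_text: str) -> float:
    text = policy_text.lower()
    covered = sum(w for w, cues in REQUIRED.values() if any(c in text for c in cues))
    total = sum(w for w, _ in REQUIRED.values())
    return covered / total  # normalised to [0, 1]

GOLDEN_STANDARD = 0.75  # hypothetical minimum completeness before ranking S-O
policy = "We collect your email address in order to provide the service."
score = completeness(policy)
print(round(score, 2), "S-O" if score < GOLDEN_STANDARD else "O")
```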
One of the two root causes of failure identified above concerns the misalignment between the regulatory goals behind the duty to disclose certain information (as stated in the de iure disclosures) and their actual implementation in the de facto disclosures. The general criterion that can be derived from this is that of consistency, which can be translated into two sub-criteria: internal and external. However, since they are measured and computed identically, they will be treated together.
Internal consistency denotes the recurrence of the same lexicon in different clauses of the same document, as well as the verification of cross-references between different clauses within the same document. External coherence refers to cross-references to clauses contained in different legal documents; it too can be understood both as the recurrence of the same lexicon across referenced documents and as the verification of the respective cross-references. For instance, one rule in the GDPR might refer to others either explicitly (e.g. Article 12 recalling Article 5) or implicitly (as with the Guidelines on Transparency provided for by the European Data Protection Board); or a privacy policy might refer to a rule without expressly quoting its article or the alinea within the article.
Unfortunately, there is no common, explicit operationalization of internal and external coherence in the literature.
A first attempt to analyze cross-references in legal documents is made by Sannier et al. (2017), who develop an NLP-based algorithmic tool to automatically detect and resolve complex cross-references within legal texts. Testing their tool on Luxembourgian legislation as well as on regional Canadian legislation, they conclude that NLP can be used to accurately detect and verify cross-references (at 236). Their tool would, however, only allow constructing a simple count measure of unresolved cross-references, which might serve as a basis for the operationalization of internal and external coherence, both in terms of the lexicon used (see above, 2nd cause of failure: complexity of text) and the correct referencing of different clauses (see above, 3rd cause of failure: lack of information). Nevertheless, this is far from the straightforward, comprehensive solution one might wish for.
A solution could be to rely on more complex NLP tools such as ‘citation networks’, as proposed by Panagis et al. (2017), or ‘text similarity models’, as suggested by Nanda et al. (2019).
The citation network analysis tool by Panagis et al. (2017) seems particularly straightforward, since it uses the Tversky index to measure text similarity. Therefore, using a tool such as theirs would automatically cover both the verification of cross-reference links and an evaluation of the textual similarity of the cited text.
Another promising option to capture text similarity is the model proposed by Nanda et al. (2019), who use a word and paragraph vector model to measure semantic similarity across combined corpora. After manually mapping the documents (rule provisions and the respective policy disclosures), the corpora are automatically annotated, helping to establish the gold standard for coherence. Provisions and terms in the disclosure documents would then be represented as vectors in a common vector space model (VSM) and later processed to measure the magnitude of similarity among texts.
These last two models in particular have the advantage of capturing the distance between rule-based disclosures and their implementation in industry policies. They therefore seem very promising for measuring both the distance in lexicon and the presence of cross-references within the same disclosure rule or policy (and for defining the gold standard).
Under the Consistency index, a sub-optimal disclosure clause scores lower than the gold standard for cross-reference links or lexicon similarity, measured using either the citation network tool by Panagis et al. (2017) or the similarity model by Nanda et al. (2019).
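As a small illustration of the similarity side of this index, the sketch below computes a token-set Tversky index between a de iure provision and a de facto clause; the alpha/beta weights, the example texts and the threshold are assumptions, and a bag-of-tokens comparison is far cruder than the citation-network or vector-space models cited above.

```python
# Token-set Tversky similarity between a de iure provision and a de facto
# clause. Weights, texts and the golden-standard threshold are illustrative.
import re

def tokens(text: str) -> set:
    return set(re.findall(r"[a-z']+", text.lower()))

def tversky(a: set, b: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0

de_iure = "The controller shall provide information on the purposes of the processing."
de_facto = "We provide information about the purposes for which we process your data."

score = tversky(tokens(de_iure), tokens(de_facto))
GOLDEN_STANDARD = 0.4  # hypothetical minimum similarity before ranking S-O
print(round(score, 2), "S-O" if score < GOLDEN_STANDARD else "O")
```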
3.1.3 Getting to hypothetically optimal disclosures (HOD) through ontology
Preparing the texts in the de iure and de facto datasets means processing the disclosures in each domain to rank them, thus collecting those that score optimal for each failure index. More specifically, for each clause or text partition of the disclosures in each (de iure and de facto) dataset, processing for the five analyzed indexes will provide a score, allowing us to identify a set of optimal disclosure texts (see Table 2). So, for instance, we should be able to select the optimal disclosure provision in the GDPR as far as its ‘readability’ index is concerned. The same should hold for the clause of a privacy policy implementing that provision in a given sector (e.g. short-term online home rental): imagine it is ‘Clause X’ of AirBnB’s disclosure policy. The two would form the ‘optimal pair’, under readability, of de iure and de facto privacy disclosures in the short-term online home rental sector. The same should be done for all clauses and each failure index.
Table 2
Example of ranking of disclosure pairs leading to HOD, based on the failure indexes

De iure rule | Text partition | De facto clause | Index 1 | Index 2 | Index 3 | Index 4 | Index 5 | HOD
CRD Art. 6 | Partition X, Art. 6a(1)(ea) (info on personalized prices) | AirBnB ToS policy, Portion Y | Optimal (score 1) | Optimal (score 1) | Optimal (score 1) | Optimal (score 1) | Optimal (score 1) | HOD
CRD Art. 6 | Partition X, Art. 6a(1)(ea) (info on personalized prices) | Expedia, Portion W | Optimal (score 1) | Optimal (score 1) | Optimal (score 1) | S–O (score 0) | Optimal (score 1) | Not incl. in HOD
CRD Art. 6 | Partition X, Art. 6a(1)(ea) (info on personalized prices) | Booking, Portion Z | S–O (score 0) | Optimal (score 1) | S–O (score 0) | S–O (score 0) | S–O (score 0) | Not incl.
CRD Art. 6 | Partition X, Art. 6a(1)(ea) (info on personalized prices) | VRBO, Portion XY | S–O (score 0) | S–O (score 0) | S–O (score 0) | S–O (score 0) | S–O (score 0) | Not incl.

(The five score columns correspond to the five failure indexes of Table 1.)
The kind of coding (whether manual or automatic) and training to employ clearly depends on the methodology chosen for each of the failure indexes sketched above. In any event, labelling the disclosures might require some manual work by legal experts in the specific sector considered.
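To make the selection step of Table 2 concrete, the toy sketch below scores candidate de facto clauses against the five failure indexes (1 = optimal, 0 = sub-optimal) and retains only the pair that is optimal on every index; names and scores mirror the invented example of Table 2.

```python
# Toy selection of HOD pairs: only a de facto clause that scores optimal (1)
# on all five failure indexes is paired with the de iure provision.
INDEXES = ["readability", "syntactic", "semantic", "completeness", "consistency"]

candidates = {  # de facto clauses evaluated against CRD Art. 6a(1)(ea)
    "AirBnB ToS, Clause X": {"readability": 1, "syntactic": 1, "semantic": 1, "completeness": 1, "consistency": 1},
    "Expedia, Portion W":   {"readability": 1, "syntactic": 1, "semantic": 1, "completeness": 0, "consistency": 1},
    "Booking, Portion Z":   {"readability": 0, "syntactic": 1, "semantic": 0, "completeness": 0, "consistency": 0},
}

hod_pairs = [
    ("CRD Art. 6a(1)(ea)", clause)
    for clause, scores in candidates.items()
    if all(scores[i] == 1 for i in INDEXES)
]
print(hod_pairs)  # only the fully optimal pair enters the HOD dataset
```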
The next step is to link the two elements of each selected ‘optimal pair’ of de iure and de facto disclosures, so as to reach a single dataset of what we term Hypothetically Optimal Disclosures, or HOD.
While, theoretically, a simple, manually organized, static database could be used to do so, the Law & Tech literature suggests a significantly more effective and flexible solution: the use of an ontology/knowledge graph (Shrader 2020; Sartor et al. 2011; Benjamins 2005) (Table 3).
As discussed above (I.A.1), legal ontologies are especially apt for this purpose, because they make it possible to automate the extraction and linking of legal concepts and to keep them up to date even as they change over time (Boella et al. 2015). Another reason is that some ontologies allow linking legal norms with their implementation practices, a feature that is relevant to us.
A good model for linking texts through an ontology is provided by the Lynx project (Montiel-Ponsoda and Rodríguez-Doncel 2018). Lynx has developed a ‘Legal Knowledge Graph Ontology’, that is, an algorithmic technology that links and integrates heterogeneous legal data sources such as legislation, case law, standards, industry norms and best practices. Lynx is especially interesting as it accommodates several ontologies able to provide the flexibility required to include additional nodes whenever rules or policies change.
To adapt the Lynx ontology to our needs, manual annotation would nonetheless be needed to establish structural and semantic links between the de iure and de facto disclosure datasets. That should be done taking into consideration the results of the ranking process, through which optimal disclosure pairs are selected (Table 2, above). Hence, manual annotation in the ontology would consist in functionally linking only those texts, based on semantic relations between their contents.
In our model, nodes will be represented by the failure criteria sketched above. These nodes are already weighted as Optimal/Sub-Optimal and thus given a specific relevance, which allows for an analytically targeted and granular structuring of the ontology.
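Purely by way of illustration (and not as a rendering of the Lynx ontology itself), a small RDF graph could link an optimal de iure provision to its optimal de facto implementation and attach the index rankings as properties; the namespace, class and property names below are invented.

```python
# Hypothetical mini knowledge graph linking an optimal de iure / de facto pair.
# Namespace, classes, properties and identifiers are invented for illustration.
from rdflib import Graph, Literal, Namespace, RDF

HOD = Namespace("http://example.org/hod#")
g = Graph()
g.bind("hod", HOD)

de_iure = HOD["CRD_Art6a_1_ea"]
de_facto = HOD["AirBnB_ToS_ClauseX"]

g.add((de_iure, RDF.type, HOD.DeIureDisclosure))
g.add((de_facto, RDF.type, HOD.DeFactoDisclosure))
g.add((de_iure, HOD.implementedBy, de_facto))         # manually annotated link
g.add((de_facto, HOD.readabilityRank, Literal("O")))  # Optimal under readability
g.add((de_facto, HOD.consistencyRank, Literal("O")))

print(g.serialize(format="turtle"))
```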
A further step consists in assessing the overall ‘coherence’ of the HOD ontology. Coherence in this context is understood as a further failure index, consisting of the lack of cross-reference between the optimal principles-level rule and the corresponding optimal implementing-level policy (Table 3).
Table 3
Using ontology to get to the hypothetically optimal disclosures (HOD)
Category | Indicator | Methodology and ranking
Coherence (overall) | Cross-validation amongst clauses across datasets | Verification of cross-referencing between the principles level (de iure) and the application level (de facto), whose absence leads to incoherence. Rank: if a ‘clause pair’ scores lower than the gold standard for cross-reference links, then rank S–O. ALGO: Lynx Legal Knowledge Graph Ontology + manual annotation. Major ref.: Alschner and Skougarevskiy (2015)
After manual annotation, cross-validation amongst clauses across datasets would help to further verify whether cross-references exist between the optimal pairs, that is, between the principles level of the de iure disclosure and the application level of the de facto disclosures, given that the latter might come from policies drafted by different firms.
A solution could be to rely on ‘citation networks’, as proposed by Alschner and Skougarevskiy (2015). Focusing on the lexical component of coherence, citation networks would help to calculate the linguistic ‘closeness’ between different, cross-referenced documents and to assess their coherence.
This way, we will be able to identify the overall optimal linked disclosures (i.e. those showing the highest scores for each and every pair within a single sectoral domain) and hence to validate the overall coherence of the HOD for a given domain.
In conclusion, out of the HOD linked-data ontology, we should be able to select the texts that fail the least under a comprehensive approach. These are linked texts, made of the optimal rules (disclosure duties) linked to their optimal implementations (policies), whose terms are clarified through the case law and which score optimal on each and every failure index.
HOD are self-executing algorithmic disclosures, whose specifications can be used by the industry to directly implement their content. This, however, opens a plethora of legal and economic questions regarding their efficacy, legitimacy and proportionality.
3.1.4 Limitations of HOD: legitimacy and efficacy
HOD are selected as the optimal available algorithmic disclosures, but they remain prone to failure. We do not know how effective they might be in leading to behavioral change, or how well they could inform real consumers and help them make a sensible choice (for a skeptical take: Zamir and Teichman 2018), given their diverse preferences (Fung et al. 2007). We have no evidence of whether the optimal disclosure text regarding a given clause will perform well or not. For instance, imagine we are ranking disclosures in the short-term online renting sector, and that the HOD regarding information provision on the service ranking indicates that the optimal pair is “CRD Art. 5”–“AirBnB Terms, Clause X”: what do we know about its efficacy? The HOD itself cannot tell.
Moreover, since the comprehensiveness of the proposed approach implies that HOD might complement or even partially substitute for tasks that would normally be executed, or at least supervised, by democratically elected representatives, concerns of legitimacy arise. In the example above, once the optimal pairs are identified through the HOD, the idea is that “CRD Art. 5”–“AirBnB Terms, Clause X” would be automatically implementable. However, that would be problematic in terms of legitimacy.
Lastly, HOD may lack proportionality, since they are addressed to undiversified, homogeneous consumers (the average ones), based on the assumption of homogeneous reading, understanding, evaluation, and acting capabilities (Di Porto and Maggiolino 2019; Casey and Niblett 2019). However, the same disclosure may well be excessively burdensome for less cultivated consumers, while being effective for well-informed, highly literate ones.
In the following, we explore these three issues separately.
1. Untested efficacy of HOD
Although HOD are hypothetically optimal inter-linked texts, constantly updated with new rules, industry policies and case-law, easily accessible, simplified, and not so costly to read and understand, their overall efficacy remains untested.
On one side stand the enthusiasts, like Bartlett et al., who argue that the use of text analysis algorithmic tools, which summarize contract terms and display them in graphic charts, ‘greatly economize[s] on [consumers’] ability to parse contracts’ (Bartlett et al. 2019). However, they do not provide proof that this is really so (if one excludes the empirical evidence supporting their paper). Paradoxically, the same holds for those who dispute the validity of simplification strategies and behavioral information nudging, like Ben-Shahar (2016). They consider that ‘simplification techniques…have little or no effect on respondents’ comprehension of the disclosure.’ But again, this conclusion refers to the ‘best-practice they surveyed’.
2. Legitimacy deficit of HOD
HOD suffer from a deficit of legitimacy. Because an algorithm is not democratically elected, nor is it a representative of the people, it cannot sic et simpliciter be delegated rulemaking power (Citron 2008, at 1297).
While in the not-so-distant future disclosure rules may well become fully algorithmic (produced through our HOD machine), a completely different question is whether the disclosures we have selected as hypothetically optimal might also become ‘self-applicable’, or, in other words, whether their adoption can become a single step, without any need for implementation. This is surely one of the objectives of HOD. By selecting the optimal rules together with the optimal implementation and linking them in an ontology, we aim at having self-implementing disclosure duties.
Hence, it is necessary to re-think implementation as a technical process, strictly linked (not merged) with the disclosure enactment phase. Above all, we need to ensure some degree of transparency of the HOD’s algorithmic functioning, and the participation of the parties involved in the production of algorithmic disclosures.
Self-implementation of algorithmic rules is one of the least studied but probably most relevant issues for the future. A lot has been written on the need to ensure accountability of AI-led decisions and due process in algorithmic rule-making and adjudication (Crawford and Schultz 2014; Citron 2008; Casey and Niblett 2019; Coglianese and Lehr 2016). However, while some literature exists on the transparency and explicability of automated decision-making and profiling for the sake of compliance with privacy rules (Koene et al. 2019), the question of due process in algorithmic disclosure rule-making has been substantially neglected.
However, a problem might arise if the potential addressees of self-applicable algorithmic disclosure rules do not receive sufficient notice of the intended action. That might reduce their ability to become aware of the reasons for the action (Crawford and Schultz at 23), to respond, and hence to defend their own rights. Comments and hearings, too, are generally hardly compatible with an algorithmic production of disclosures, even though they would be especially relevant, because they would allow all conflicting interests at issue to emerge and would leave a record for judicial review. The same goes for expert opinions, which are often essential parts of the hearings: technicians may discuss the code, how it works, which algorithm is best to design, how to avoid errors, and suggest improvements.
In the US system, it is believed that hearings would hardly be granted in the wake of automated decisions because they would involve direct access to ‘a program’s access code’ or ‘the logic of a computer program’s decision’, something that would be found far too expensive under the so-called Mathews balancing test (Crawford and Schultz at 123; Citron at 1284).
In Europe too, firms would most probably refuse to collaborate in notice-and-comment rulemaking if they were the sole owners of the algorithm used to produce disclosures, since that might require disclosing their source code, and code qualifies as a trade secret (thus exempt from disclosure).
Moreover, as (pessimistically) noted by Devins et al., the chances for an algorithm to produce rules are nullified, because ‘Without human intervention, Big Data cannot update its “frame” to account for novelty, and thus cannot account for the creatively evolving nature of law.’ (at 388).
Clearly, all the described obstacles and the few proposals advanced thus far are signs that finding a way to make due process compatible with an algorithmic production of disclosure rules is urgent and strongly advisable.
3. Lack of proportionality of HOD
Although it is undeniable that general, undiversified disclosures may accommodate heterogeneous preferences of consumers (Sibony and Helleringer 2015), in practice they may put too heavy a burden on the most vulnerable or less cultivated ones, while not generating outweighing benefits for other recipients or for society. In this sense, they may become disproportionate (Di Porto and Maggiolino 2019).
On the other side, targeting disclosure rules at the individual level (or personalizing them) (Casey and Niblett), as suggested by Busch (2019), may be equally disproportionate (Devins et al. 2017), as it can generate costs for individuals and society. For instance, if messages are personalized, individuals would not be able to compare information and therefore make meaningful choices in the market (Di Porto and Maggiolino 2019). That, in turn, would endanger policies aimed at fostering competition among products, which rest on consumers’ ability to compare information about product qualities. Also, targeting at the individual level necessarily requires obtaining individual consent to process personal data (for the sake of producing personalized messages) and showing each person his or her ‘own’ fittest disclosure.