1. Introduction
The inaugural Generative AI and Law (GenLaw) workshop took place on July 29th and 30th, 2023, in Honolulu, Hawai’i, where it was co-located with the 40th International Conference on Machine Learning (ICML). The workshop was organized in response to the intense interest in (and scrutiny of) recent public advancements in generative-AI technology. The primary goal was to bring together experts in machine learning (ML) and law to discuss the legal challenges that generative-AI technology raises. To promote concrete and focused discussions, we chose to make intellectual property (IP) and privacy the principal legal topics of the first workshop. Other significant topics discussed included free speech, products liability, and transparency. For this first workshop, most discussion was limited to considerations of U.S. law.
The workshop was convened over two days. The first day (July 29th) was a public session held as part of ICML, consisting of keynote lectures, panel discussions, lightning talks, and a poster session, all dealing with research issues at the intersection of Generative AI and law. The second day (July 30th) was held off-site; there, approximately forty participants conducted a series of roundtable discussions to dig deeper into significant issues identified on the first day.
This report reflects the takeaways from the roundtable discussions. (This work is licensed under CC BY 4.0.) The takeaways are organized into five broad headings, reflecting the participants’ consensus about the most urgently needed contributions to the research area of Generative AI and law:
- A high-level statement about why Generative AI is both immensely significant and immensely challenging for law (Section 2);
- The beginnings of a shared knowledge base that provides a common conceptual language for experts across disciplines (Section 3);
- Clarification of the unique capabilities and issues of generative-AI systems, setting them in relation to the broader landscape of artificial-intelligence and machine-learning technologies (Section 4);
- An initial taxonomy of the legal issues at play (Section 5); and,
- A concrete research agenda to promote collaboration and progress on emerging issues at the intersection of Generative AI and law (Section 6).
To best serve these ends, this report does not delve into the technical details of specific generative-AI systems, the legal details of complaints and lawsuits involving those systems, or policy proposals for regulators. Our intended audience is scholars and practitioners who are already interested in engaging with issues at the intersection of Generative AI and law, for example, ML researchers who have familiarity with some of the ongoing lawsuits regarding Generative AI, and lawyers who have familiarity with terms like “large language model.” We focus our attention on synthesizing reflections from the workshop to highlight key issues that need to be addressed for successful research progress in this emerging and fundamentally interdisciplinary area.
2. The Impact of Generative AI on Law
Generative AI is “generative” because it generates text, images, audio, or other types of output. But it is also “generative” in the sense of Jonathan Zittrain’s theory of generative technologies: it has the “capacity to produce unanticipated change through unfiltered contributions from broad and varied audiences” (Zittrain 2008). As a result, generative-AI systems will be both immensely societally significant — too significant for governments to ignore or to delay dealing with — and present an immensely broad range of legal issues. To see why, it is useful to consider Zittrain (2008)’s five dimensions of generativity:
- Leverage
- A technology provides leverage when it makes difficult tasks easier. Generative AI is widely recognized for its use in creativity; programming; retrieving and synthesizing complex bodies of knowledge; and automating repetitive tasks.
- Adaptability
- A technology is adaptable when it can be applied to a wide range of uses. Generative AI is celebrated for its adaptability. It has been applied to programming, painting, language translation, drug discovery, fiction, educational testing, graphic design, and much more.
- Ease of mastery
- A technology is easy to master when users without specialized training can readily adopt and adapt it. While some generative-AI methodologies, such as model pre-training, still require technical skills, the ability to use chat-style, interactive, natural-language prompting to control generative-AI systems greatly reduces the difficulty of adoption. Users without programming or ML backgrounds have been able to use Generative AI for numerous tasks.
- Accessibility
- A technology is accessible when there are few barriers to its use. Cost is the most obvious barrier, but other barriers can include regulation, secrecy, and linguistic limits. The creation of cutting-edge generative-AI models from scratch requires enormous inputs of data, compute, and human expertise — currently limiting model creation to a handful of institutions — but services allowing inference with these models are widely available to the public. These services are inexpensive for users making small numbers of queries, and they tend to operate close to real time, making generative outputs appear low-cost to produce, at least in terms of time.
- Transferability
- A technology is transferable when changes in it can easily be conveyed to others. Once pre-trained or fine-tuned, generative-AI models can be easily shared, prompts and prompting techniques are trivially easy to describe, and systems built around generative-AI models can be made broadly available at increasingly low effort and cost.
In short, Generative AI hits the generativity jackpot. It provides enormous leverage across a wide range of tasks, is readily built on by a huge range of users, and facilitates rapid iterative improvement as those users share their innovations with each other.
Zittrain (2008)’s two examples of supremely generative technologies from 2008 are computers and the Internet. Generative AI seems likely to be a third. No other technology of the last two decades even comes close. Regulators and legal scholars should expect that Generative AI will raise legal and policy challenges that are comparable in scope, scale, and complexity to those raised by computers and the Internet.
The Internet-law analogy also provides guidance on how technologists and lawyers can approach this shared challenge. They must have a common vocabulary so that their contributions are mutually intelligible (Section 3). Lawyers must have a sufficient foundation of technical understanding to be able to apply their expertise in law accurately (Section 4). Technologists, for their part, must have a sufficient foundation of legal knowledge to identify legally significant technical interventions (Section 5). And both groups need a common research agenda to collaborate and iterate rapidly on effective projects that advance a shared understanding of how Generative AI and the legal system interact (Section 6). The aim of this report is to lay down a starting framework for these tasks.
3. Developing a Shared Knowledge Base
It became apparent over the course of the GenLaw roundtable discussions that some commonly used terms have different meanings in machine learning and in law. Sometimes, both groups have been working to develop deep understandings of important but hard-to-capture concepts. The term privacy is a prominent such example. Technologists’ formal definitions (such as differential privacy) do not always encompass the wide range of interests protected by privacy law; similarly, it can be hard to put the holistic definitions used by legal scholars into computationally tractable forms that can be deployed in actual systems. (The U.S. Census has sparked debate over its use of differential privacy, a technique that provides strong theoretical guarantees of privacy preservation. Critics question whether the definition of privacy reflected in differential privacy accords with the census’s broader goals of privacy preservation. Differential privacy is also sometimes used in Generative AI, though it is context-dependent whether its definition of “privacy” is meaningful for generative-AI applications (Brown et al. 2022).)
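To illustrate just how compressed the technical notion is, the standard statement of differential privacy fits in a single inequality (the textbook formulation, shown here purely for illustration; it is not specific to any system discussed at the workshop):

```latex
% A randomized algorithm M is \varepsilon-differentially private if, for every
% pair of datasets D and D' that differ in a single individual's record, and
% for every set S of possible outputs,
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,].
```

The compactness that makes this definition computationally tractable is also what keeps it from capturing the broader, context-dependent interests that privacy law protects.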
These communication barriers are real, but, as was clear during GenLaw, the two communities have generally understood that they mean something different by “privacy” and have read each others’ work with a working understanding of these differences in mind.
We also observed, however, that there are various types of misunderstandings in terminology across disciplines. At GenLaw, some terms were used in different ways because members of one community did not even realize a loosely-defined term in their community was a term of art in the other, or because the meaning they assumed a term had was subtly different from how it was actually used in writing. These conflicting definitions and translation gaps hampered our ability to collaborate on assessing emerging issues. For example, technologists use the term pre-training to refer to an early, general-purpose phase of the model training process, but legal scholars assumed that the term referred to a data preparation stage prior to and independent of training. Similarly, many technologists were not aware of the importance of harms as a specific and consequential concept in law, rather than a general, non-specific notion of unfavorable outcomes. We found our way to common understandings only over the course of our conversations, and often only after many false starts.
Thus, first and foremost, we need to have a shared understanding of baseline concepts in both Generative AI and law. Even when it is not possible to pin down terms with complete precision, it is important to have clarity about which terms are ambiguous or overloaded. We believe that there are three significant ways that computer scientists and legal scholars can contribute to creating this shared understanding:
- They can build glossaries of definitions of important terms in machine learning and law, which can serve both as textbooks and as references (Section 3.1). Throughout this piece, glossary terms are hyperlinked to the corresponding glossary entry.
- They can develop well-crafted metaphors to clarify complex concepts across disciplinary boundaries. Even imperfect metaphors are useful, as they can serve to highlight where concepts in Generative AI deviate from intuitions that draw on more traditional examples (Section 3.2).
- They can keep current with the state of the art, and help others to do so. This does not just mean the fundamentals of machine learning and Generative AI (although these are certainly important). It also means being alert to the plethora of ways that generative-AI systems are being deployed in practice, and the commonalities and differences between these systems (Section 3.3).
3.1 Identifying and Defining Terms
The GenLaw organizers and participants have collaborated to create an initial glossary of important terms at the intersection of law and Generative AI. The glossary has two primary goals.
First, it identifies terms of art with technical or multiple meanings. Both law and machine learning commonly give specific, technical definitions to words that also have general, colloquial meanings, such as “attention” or “harm.” (See the entry for attention in the glossary for the machine-learning definition.)
Sometimes, the redefinition runs the other way, when a technical term has taken on a broader meaning in society at large. To technologists, an algorithm is simply a precise rule for carrying out a procedure, but the term has come to be popularly associated with specific technologies for ranking social media posts based on expected interest. The words “goal” and “objective” are sometimes used interchangeably in English, but objective is a term of art in machine learning, describing a mathematical function used during training. On the other hand, “goal” does not have an agreed-upon technical definition. In machine learning, it is typically used colloquially to describe our overarching desire for model behaviors, which cannot be written directly in math.
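The distinction can be made concrete with a small example (a minimal sketch, not drawn from any particular system): the objective is a specific quantity that the training procedure minimizes, while the goal is the informal aim the developers have in mind.

```python
import math

def cross_entropy(predicted_probs, true_index):
    """Objective (term of art): the mathematical function minimized during
    training -- here, the negative log-probability the model assigns to the
    correct next token."""
    return -math.log(predicted_probs[true_index])

# Goal (colloquial): "the model should give helpful, factual answers."
# There is no formula for this; practitioners can only choose objectives,
# data, and other design elements that they hope push the model toward it.

loss = cross_entropy([0.6, 0.3, 0.1], true_index=2)  # model put 0.1 on the right token
print(round(loss, 3))  # ~2.303
```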
Second, the glossary provides succinct definitions of the most critical concepts for a non-expert in either law or ML (or both). These definitions are not intended to cover the full complexity of a concept or term from the expert perspective. For example, one could write volumes on privacy. (And many have, arguably for thousands of years, which is why we do not attempt a definition of privacy in the glossary.) Our purpose here is simply to show technologists that there is more to privacy than removing personally identifiable information (PII).
The glossary is offered as a starting point, not a finish line. The field is in flux; its terminology will evolve as new technologies and controversies emerge. We will host and update this glossary on the GenLaw website (see https://blog.genlaw.org/glossary.html).
We hope that these definitions will serve as a baseline for more effective communication across disciplines about emerging issues.
3.2 Crafting Useful Metaphors
Well-chosen metaphors can provide a useful mental model for thinking through complex concepts. Metaphors are also widely used in both machine learning and law. For machine-learning practitioners, these metaphors can also be sources of inspiration; the idea of an “artificial neural network” was inspired by the biology of neurons in the human brain (Boers et al. 1993). Analogy and metaphor are central to legal rhetoric (Solove 2001); they provide a rational framework for thinking through the relevant similarities and differences between cases. Metaphors, however, can also simplify and distort; nevertheless, understanding the ways that a metaphor fails to correctly describe a concept can still be instructive in helping to clarify one’s thinking.
At the GenLaw workshop, we discussed instructive metaphors for Generative AI extensively. We give two examples from this discussion here (anthropomorphism and memorization); additional metaphors are collected in the GenLaw materials online.
Metaphorical anthropomorphism is the personification of a non-human entity; it applies metaphors that compare the entity’s traits to human characteristics, emotions, and behaviors. Machine-learning practitioners commonly use terms that anthropomorphize machine-learning models, for example, saying models “learn,” “respond,” “memorize,” or “hallucinate.” Such metaphors can lead people to conclude that a machine-learning system is completing such actions using the same mechanisms and thought processes that a human would. However, while practitioners may sometimes be inspired in their designs by biological phenomena (e.g., neural networks contain “neurons” that “fire” analogously to those in the human brain), they by and large do not mean that machine-learning models “learn” or “memorize” in exactly the same way that humans complete these actions. Instead, these should be considered terms of art — perhaps inspired by human actions, but grounded in technical definitions that bear little resemblance to human mechanisms.
Some within the GenLaw community have advocated for using different terms to describe these processes — ones that do not elicit such strong comparisons to human behavior (Cooper et al. 2022 and citations therein). However, until we have better terms, understanding when a term is indeed a term of art, and the ways that it is inspired by (but not equivalent to) colloquial understandings, will remain a critical part of any interdisciplinary endeavour.
The terms memorization and regurgitation are very common in the machine-learning literature. Roughly speaking, memorization and regurgitation can be treated interchangeably. They both signify when a machine-learning model encodes details from its training data, such that it is capable of generating outputs that closely resemble its training data. (Some differentiate “memorization” and “regurgitation,” with regurgitation referring to a model’s ability to output its training data (via generation) and memorization referring to a model containing a perfect copy of its training data (regardless of whether or not it is regurgitated). In practice, the two terms are commonly used interchangeably.)
The machine-learning experts who coined these terms created precise definitions that can be translated into quantified metrics; these definitions refer to specific ways to measure the amount of memorization (as it is technically defined) present in a model or its outputs (Carlini, Ippolito, et al. 2023; Ippolito et al. 2023; Anil et al. 2023; Kudugunta et al. 2023).
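A rough sketch of the kind of quantitative test these definitions enable follows; the helper names are hypothetical, and the cited studies use more careful procedures over much larger samples. The idea is to prompt the model with a prefix drawn from a training example and check whether it reproduces the example’s continuation.

```python
def is_extractable(generate, example, prefix_len=50, suffix_len=50):
    """One common operationalization of memorization: an example counts as
    (extractably) memorized if, prompted with a prefix of the example, the
    model's greedy continuation reproduces the following characters verbatim.

    `generate` is a stand-in for a model's text-completion function
    (hypothetical interface); `example` is a string from the training set."""
    prefix = example[:prefix_len]
    true_suffix = example[prefix_len:prefix_len + suffix_len]
    completion = generate(prefix, max_new_chars=suffix_len)
    return completion.startswith(true_suffix)

def memorization_rate(generate, training_examples):
    """Aggregate metric: the fraction of sampled training examples that are
    extractable under the definition above."""
    hits = sum(is_extractable(generate, ex) for ex in training_examples)
    return hits / len(training_examples)
```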
Unfortunately, the connection to the colloquial meanings of these words can cause confusion. Some outside of the machine-learning community misinterpret “Generative AI memorization” to include functionality that goes beyond what machine-learning practitioners are actually measuring with their precise definitions. One such example is discussions around text-to-image generative models “memorizing” an artist’s style (Chayka 2023). While measuring stylistic similarity is an active area of machine-learning research (Casper et al. 2023), it is not equivalent to memorization in the technical sense of the word.
Other misunderstandings can arise due to the fact that “memorization” and “regurgitation” are imperfect analogies for the underlying processes that machine-learning scientists are measuring. People deliberately memorize; for example, an actor will actively commit a script to memory. In contrast, models do not deliberately memorize; the training examples that end up memorized by a model were not treated any differently during training than the ones that were not memorized. (Models are trained using an objective function that rewards the model for producing data that looks similar to the training data during the training phase. However, the goal of model training is not to reproduce the training data, so other techniques, such as regularization or alignment, are used to help a model generalize. See Section 3.1 for a discussion on “goal” versus objective in machine learning.)
Importantly, humans distinguish between memorizing (which has intentionality) and “remembering” (when a detail is recalled without the intent to memorize it). Generative-AI models have no such distinction. For humans, it is how we personally feel about a thought, action, or vocalization that leads us to call it “memorized.” But for models, which lack intent or feeling, memorization is merely a property assigned to their outputs and weights through technical definitions.
3.3 Understanding Evolving Business Models
Developing glossaries and metaphors can help with staying current about generative-AI technology, but they do not necessarily capture all of the different ways that generative-AI functionality can be put to use in practice. To understand real-world uses, it is also important to have a working understanding of different generative-AI production processes.
There are a variety of evolving business models that have yet to solidify into definitive patterns. In this uncertain environment, myths and folk wisdom can proliferate. Real, current information about business models can be very useful for understanding who is involved in the production, maintenance, and use of different parts of generative-AI systems. Business models prove useful for appreciating the quantity and diversity of actors (not just technologies) that enable generative-AI functionality.
We highlight four patterns from discussion at GenLaw:
Business-to-consumer (B2C) hosted services: Several companies have released direct-to-consumer applications and application programming interfaces (APIs) for producing generations. For example, on the large end of the spectrum, there are chatbots powered by the main players in language modeling (e.g., OpenAI’s ChatGPT, Anthropic’s Claude, Google’s Bard, etc.). There are also smaller companies that have released similar tools (e.g., Midjourney’s (Midjourney 2023) or Ideogram’s (Ideogram.AI 2023) text-to-image generation applications). These companies provide a mix of entry points to their systems and models, including user interfaces and APIs, often offered via subscription-based services. Typically, the systems and models developed by these companies are hosted in proprietary services; users can access these services to produce generations but, with some notable exceptions (e.g., fine-tuning APIs), cannot directly alter or interact with the models embedded within them.
Business-to-business (B2B) integration with hosted services: Other business models allow businesses to integrate generative-AI functionality into their products, either via direct partnership/integration or through the use of APIs. For example, ChatGPT functionality is integrated into Microsoft Bing search (via a close partnership between Microsoft and OpenAI). Poe is developed based on a partnership between Anthropic and Quora (Anthropic 2023). (We cannot tell from press releases whether Poe also uses the API or if their partnership is of a different nature. Often, we will not be able to tell the nature of the business relationship between corporate partners unless it is disclosed publicly.)
In other cases, companies develop generative-AI products by becoming corporate customers of generative-AI-company APIs, as opposed to bespoke business partners. These types of business relationships can lead either to new features for existing products or to new products altogether.
Products derived from open models and datasets: The business models discussed above depend on proprietary systems, models, and/or datasets. Other options include (or directly rely on) open-source software, models, and datasets, which can be downloaded and put to use (e.g., training new models or fine-tuning existing model checkpoints). Some companies operate distinctly (or partially) with open-source product offerings, such as some versions of Stable Diffusion (Rombach et al. 2022) offered by Stability AI (Stability AI 2023). Others operate in a mixed fashion; for example, individuals can openly download Meta’s Llama-model family (the different models’ weights), but the details of the training data are closed (Touvron et al. 2023).
Companies that operate at specific points in the generative-AI supply chain: Any link (or subset of links) in the supply chain (K. Lee, Cooper, and Grimmelmann 2023) could potentially become a site of specific business engagement with generative-AI technology. For those interested in issues at the intersection of Generative AI and law, it will be important not only to be familiar with the roles and scope of companies in these areas, but also how they interact and inter-operate. We provide three examples below of emerging sites of business engagement.
- Datasets: There may be companies that engage only with the dataset collection and curation aspects of Generative AI (similar to how data brokers function in other industries). Scale AI (Scale AI 2023) is one such company that works on data example annotation for generative-AI training datasets.
- Training diagnostics: There are some companies that handle aspects of data analysis and diagnostics for generative-AI-model training dynamics, like Weights & Biases (Weights & Biases 2023).
- Training and deployment: While it is generally tremendously costly to train and deploy large generative-AI models, advancements in open-source technology and at smaller companies (in both software and hardware) have helped make training custom models more efficient and affordable. There are now several companies that develop solutions for bespoke model training and serving, such as MosaicML (acquired by DataBricks) (MosaicML 2023) and Together AI (Together AI 2023).
We defer additional discussion of open- vs. closed-source software to the glossary. The important point that we want to highlight here is that there are many ways that generative-AI models may be integrated into software systems, and that there are many different types of business models associated with the training and use of these models. This landscape is likely to continue to evolve as new business players enter the field.
4. Pinpointing Unique Aspects of Generative AI
During the roundtable discussions, the legal scholars and practitioners had a recurring question for the machine-learning experts in the room: What’s so special about Generative AI? Clearly, the outputs created by Generative AI today are better than anything we have seen before, but what is the “magic” that makes this the case? It became clear that answering this question, even just in broad strokes, could be useful for providing more precise analysis of the legal issues at play (Section 5). In this section, we summarize three aspects of Generative AI for which it can be productive to consider recent developments in AI as meaningfully novel or different in comparison to past technology. These include (1) the transition from training models to perform narrowly defined tasks to training them for open-ended ones (Section 4.1), (2) the role of modern training data (Section 4.2) and generative-AI pipelines (Section 4.3), and (3) how the scaling up of pre-existing techniques has enabled the quality and variety we see in generations today (Section 4.4).
4.1 The Transition to Very Flexible Generative Models
In the past, machine-learning models tended to be trained to perform narrowly defined discriminative tasks, such as labeling an image according to the class of object it depicts (L. Deng 2012; J. Deng et al. 2009) or classifying the sentiment of a sentence as positive or negative (Kiritchenko, Zhu, and Mohammad 2014). Modern generative-AI models change this paradigm in two ways. (For a longer summary of this transition from task-specific, discriminative models to generative models in Generative AI, see Parts I.A and I.B of K. Lee, Cooper, and Grimmelmann (2023).)
First, there has been a shift from discriminative models, which have simple outputs like a class label (e.g., “dog” or “cat” for an image classifier), to generative models, which output complex content, such as entire images or paragraphs of text (e.g., given the input “cat,” outputting a novel image of a cat, sampling from the near-infinite space of reasonable cat images it could create) (K. Lee, Cooper, and Grimmelmann 2023, pt. I.A). (As we discuss at the top of Section 5, today’s generative-AI models are used to solve both discriminative tasks (e.g., sentiment classification) and generative tasks that result in expressive content (e.g., producing paragraphs of text).)
Second, there has been a shift toward using single, general-purpose models to solve many different tasks, rather than employing a model customized to each task we would like to perform. Even a few years ago, it was common to take a base model and fine-tune it once for each task domain. This would, for example, result in one model that specializes in sentiment classification, another that specializes in automatic summarization, another in part-of-speech tagging, and so on. Many state-of-the-art systems today handle a wide variety of tasks using a single model. (It is rumored that the models underlying ChatGPT are actually an ensemble of on the order of 10 expert models, in which different types of requests get routed to specific experts. Nevertheless, if true, these experts are still more flexible than task-specific models from the past.)
These models are able to do all sorts of things (Ganguli et al. 2022). In Section 2, we discuss how Generative AI is a generative technology, in the sense of Jonathan Zittrain’s theory of generativity (Zittrain 2008). The scaling up of Generative AI (Section 4.4) has facilitated generativity across a wide range of applications and modalities, not just in the text-to-text and text-to-image applications that are most commonly reported on in the news. As a non-exhaustive list, this scaling has enabled huge breakthroughs in image captioning (Li et al. 2023), music generation (Agostinelli et al. 2023), speech generation (Le et al. 2023) and transcription (Radford et al. 2022), tools for lowering the barrier to learning to program (Yilmaz and Karaoglan Yilmaz 2023), and research questions in the physical sciences (including on protein folding, drug design, and materials science) (Corso et al. 2023).
4.2 Developments in the Training Pipeline: Pre-Training and Fine-Tuning
Machine-learning models are trained on a training dataset of examples of the task that the model is supposed to be able to accomplish. The nature of these training datasets has changed drastically over the years, leading to the capabilities seen in generative-AI systems today (K. Lee et al. 2023). In particular, we have seen a shift toward multi-stage training pipelines, in which models are first trained on large (but possibly lower-quality) datasets to create a base model and then progressively trained on smaller, more-curated datasets that better align with the model creators’ goals.
Typically, a base model is constructed by training on an enormous, often web-scraped dataset, which instills a “base” of knowledge about the world within the model. This step is called pre-training because it is the training that occurs before the final training of the model. As described by Callison-Burch (2023) in his testimony to the U.S. House of Representatives Judiciary Committee, during pre-training, models learn underlying patterns from their input data. When pre-training on large-scale data that has a wide variety of information content, base models capture abundant, “general-knowledge” information. For example, large language models (LLMs) learn syntax and semantics, facts (and fictions) about the world, and opinions, which can be used to produce summaries and perform limited reasoning tasks; image-generation models learn to produce different shapes and objects, which can be composed together in coherent scenes. (Though, notably, not a collage! See the discussion of metaphors (Section 3.2) for why the metaphor of a collage for generative-AI outputs can be misleading.)
Pre-training gives these models the unprecedented flexibility to generate all sorts of outputs through synthesizing information in the input training data.
This flexibility allows base models to be re-used in a variety of ways. For example, to adapt a base model to more specific tasks and domains, one can further train (i.e., fine-tune) it on domain-specific data (e.g., legal texts and case documents) to specialize the model’s behavior (e.g., performing better at legal document summarization). Alternatively, one might fine-tune the base model to understand a dialog-like format (as was done for ChatGPT (OpenAI 2022)). Pre-training is very expensive, which means it only happens once (or a small handful of times); it can cost millions of dollars to train a large language model (BigScience 2022; Bekman 2022; K. Lee, Cooper, and Grimmelmann 2023). Subsequent fine-tuning tends to be much faster (due to the smaller size of the datasets involved), so it can more tractably occur many times.
Despite our discussion above of what is unique about pre-training and fine-tuning, it is worth emphasizing that this division is not well-defined. It is predominantly an artifact of choices made regarding training, rather than an essential aspect of the training process. Both pre-training and fine-tuning are just training (though perhaps configured differently). The reasons we differentiate between these two stages have to do with how large-scale model training is done in practice; the distinction is only meaningful because researchers frequently choose to divide stages of training along these lines (in turn, ascribing meaning to this division). For example, one actor in the supply chain may release a pre-trained model, a different actor may fine-tune that model and release it as well, and a third actor may fine-tune the already fine-tuned model (K. Lee, Cooper, and Grimmelmann 2023). (Is the third model fine-tuned from a pre-trained model, or from a fine-tuned model? This is all just semantics.) Additionally, researchers frame concrete research questions specifically for pre-training or fine-tuning (Longpre et al. 2023 e.g.).
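To make concrete why the division is a matter of configuration rather than of kind, here is a minimal sketch (all names are hypothetical placeholders, not any real framework’s API): pre-training and fine-tuning invoke the same routine with different data, starting points, and settings.

```python
def train(model, dataset, learning_rate, num_steps):
    """One generic training loop (schematic): repeatedly sample a batch,
    measure the objective on it, and nudge the weights to reduce it."""
    for _ in range(num_steps):
        batch = dataset.sample_batch()
        loss = model.compute_loss(batch)            # objective on this batch
        model.update_weights(loss, learning_rate)   # gradient step
    return model

# "Pre-training" and "fine-tuning" are the same routine with different
# configurations (illustrative, made-up settings only):
#
#   base_model  = train(new_model(), web_scale_corpus,
#                       learning_rate=3e-4, num_steps=1_000_000)  # expensive, done once
#   legal_model = train(base_model, curated_legal_corpus,
#                       learning_rate=1e-5, num_steps=10_000)     # cheap, repeated many times
#
# Nothing in the routine itself marks one call as "pre-training" and the
# other as "fine-tuning"; the labels describe where each call sits in the
# larger supply chain.
```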
During the GenLaw roundtable discussions, it became apparent that legal experts shared some misconceptions about the roles of pre-training and fine-tuning, and that it is therefore important for machine-learning researchers to emphasize the influence of pre-training on generative-AI model capabilities. Additionally, we re-emphasize our note from the top of Section 3 that pre-training is training, and not a data preparation stage.
4.3 Generative-AI Systems and the Supply Chain
There are numerous decisions and intervention points throughout a generative-AI system, which extend to elements beyond choices in pre-training and fine-tuning. Since many actors can be involved in the generative-AI supply chain, and decisions made in one part of the supply chain can impact other parts of the supply chain, it can be useful to identify each intervention and decision point and think about them in concert. We defer to K. Lee, Cooper, and Grimmelmann (2023, pt. I.C) for detailed discussion of the supply chain and the numerous stages, actors, and design choices that it involves.
These choices affect the quality of the model, both in terms of its characteristics/capabilities and the model’s consequent effectiveness. For example, consider the intervention point at which the training data is chosen. Creating a training dataset requires answering questions like: (1) which data examples should be included in the training dataset; (2) where will the data be stored (e.g., on whose servers); (3) for how long will the data be retained; and (4) where will the resulting trained model be deployed? (For more on choices in training data, see Chapter 1, “The Devil is in the Training Data,” of K. Lee et al. (2023).)
Choices made about where the model will be deployed can affect what training data can be used. A model trained on private user data has a very different privacy-risk profile if it never leaves the user’s personal device than if it is shared across many users’ devices.
Not all design choices are about models and how they are trained. Models are embedded within overarching systems, which consist of many component pieces that both individually and together reflect the outcomes of relevant sociotechnical design decisions (Cooper, Levy, and De Sa 2021; Cooper and Vidan 2022; OpenAI 2023b; Brundage et al. 2022). There are numerous other intervention points throughout the supply chain, which involve systems-level choices (K. Lee, Cooper, and Grimmelmann 2023). Such intervention points include prompt input filters, generation output filters, rate limiting (e.g., how many prompts a user can supply in a given time window to a system), access controls, terms of use, use-case policies for APIs (Google, Anthropic, OpenAI, and Cohere all have such policies), user interface (UI) and experience (UX) design (e.g., to guard against over-reliance on generative-AI systems), and so on (OpenAI 2023b; GitHub 2023; Brundage et al. 2022). Each of these involves design decisions that can have their own legal implications.
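To make the distinction between model-level and system-level choices concrete, the sketch below wraps a model in a simplified service with three of the intervention points listed above; the names are hypothetical, and production systems implement these controls in far more sophisticated ways.

```python
import time

class GenerativeService:
    """Schematic wrapper illustrating system-level intervention points that
    sit outside the model itself: a prompt input filter, a generation output
    filter, and rate limiting. `model_generate` is a placeholder for any
    text-generation function."""

    def __init__(self, model_generate, blocked_terms, max_requests_per_minute=10):
        self.model_generate = model_generate
        self.blocked_terms = [t.lower() for t in blocked_terms]
        self.max_requests_per_minute = max_requests_per_minute
        self.request_times = []

    def _rate_limited(self):
        now = time.time()
        self.request_times = [t for t in self.request_times if now - t < 60]
        return len(self.request_times) >= self.max_requests_per_minute

    def respond(self, prompt):
        if self._rate_limited():                                   # rate limiting
            return "Rate limit exceeded; please try again later."
        self.request_times.append(time.time())
        if any(term in prompt.lower() for term in self.blocked_terms):
            return "This request violates the use policy."         # input filter
        output = self.model_generate(prompt)
        if any(term in output.lower() for term in self.blocked_terms):
            return "The response was withheld by a content filter."  # output filter
        return output
```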
4.4 The Massive Scale of Generative-AI Models
Ultimately, the capacity to facilitate “magical,” flexible, open-ended functionality with Generative AI (Section 4.1) comes from the massive scale at which generative-AI models are trained (Smith et al. 2023). State-of-the-art models today are an order of magnitude larger and trained on significantly more data than the biggest models from five years ago.
Techniques for scaling up models have played a uniquely important role in unlocking generative-AI capabilities. This includes research into more efficient neural architectures and better machine-learning systems for handling model training and inference at scale (Ratner et al. 2019). For example, many experts have studied methods for collecting and curating massive, web-scraped datasets (K. Lee et al. 2023 e.g.), as well as the “emergent behaviors” of models trained at such large scales (Wei et al. 2022 e.g.).
In spite of these changes, it is worth noting that many techniques used in Generative AI today are not new. Language models, for example, have existed since at least the 1980s (Rosenfeld 2000). The difference is that, in recent years, we have figured out how to scale these techniques tremendously (e.g., modern language models use context windows of thousands of input tokens, compared to the 5 to 10 input tokens used by language models in the early 2000s). We defer to machine-learning experts to provide more specific details on the methodologies and outcomes of scaling. (Part II.B of K. Lee, Cooper, and Grimmelmann (2023) has useful discussion and citations on this topic.)
Finally, one of the implications of scale is that machine-learning practitioners are training fewer state-of-the-art models today than were being trained in the past. When models were small relative to available computing resources, it was common to re-train a machine-learning system several times, changing hyperparameters or other configuration details to find the best-quality model. Today’s model scale means the cost of training just one state-of-the-art model can be hundreds of thousands or even millions of dollars (BigScience 2022; Bekman 2022; K. Lee, Cooper, and Grimmelmann 2023). This further incentivizes the push toward general-purpose models described in Section 4.1.
5. A Preliminary Taxonomy of Legal Issues
One significant outcome of the GenLaw discussions was progress toward a taxonomy of the legal issues that Generative AI raises. We say “progress toward” because the initial analysis presented here is very much an interim contribution as part of an ongoing project. The GenLaw workshop was explicitly scoped to privacy and intellectual property (IP) issues, so this analysis should be considered non-exhaustive, and the omission of other topics is not a judgment that they are unimportant. Further, we note that not all capabilities, consequences, risks, and harms of Generative AI are legal in nature, so this taxonomy is not a complete guide to generative-AI policy. Other reports have made significant attempts to catalog such concerns (Fergusson et al. 2023 e.g.). We instead focus on highlighting the ways in which specifically legal issues may arise.
We begin with an important high-level point: Generative AI inherits essentially all of the issues of AI/ML technology more generally. This is so because Generative AI can be used to perform a large and increasing number of tasks for which other types of ML systems have been used. For example, instead of using a purpose-built sentiment-analysis model, one might simply prompt an LLM with labeled examples of text and ask it to classify text of interest; one could use a trained LLM to answer questions with “yes” or “no” answers (i.e., to perform classification). The resulting classifications may or may not be as reliable as ones from a purpose-built model, but insofar as one is using a machine-learning model in both cases, any legal issues raised by the purpose-built model are also present with the LLM.
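As a concrete (hypothetical) illustration of this kind of repurposing, a handful of labeled examples embedded in a prompt can turn a hosted general-purpose model into a makeshift sentiment classifier:

```python
# A few-shot prompt that repurposes a general-purpose language model as a
# sentiment classifier (illustrative text only; `generate` is a placeholder
# for any hosted model's completion function).
prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery died after two days."
Sentiment: Negative

Review: "Setup took five minutes and it works perfectly."
Sentiment: Positive

Review: "The seller never responded to my refund request."
Sentiment:"""

# label = generate(prompt)  # expected to complete with something like "Negative"
```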
Further, any crime or tort that involves communication could potentially be conducted using a generative-AI system. One could use an LLM to write the text used for fraud, blackmail, defamation, or spam, or use an image-generation system to produce deepfakes, obscene content, or false advertisements. Almost any speech-related legal issue is likely to arise in some fashion in connection with Generative AI.
With these broader observations in mind, in the remainder of this section we discuss four legal areas that will need to deal with Generative AI: intention in torts and criminal law (Section 5.1), privacy (Section 5.2), misinformation and disinformation (Section 5.3), and intellectual property (Section 5.4).
5.1 Intent
Numerous aspects of law turn on an actor’s intention. For example, in criminal law, the defendant’s “criminal intent” (mens rea), not just the act and its resulting harms, is often an element of a crime (Institute 2023). Intent is not a universal requirement. Some crimes and torts are “strict liability” (e.g., a manufacturer is liable for physical harm caused by a defective product regardless of whether they intended that harm, which they almost always did not). But where intent is required, a defendant’s lack of wrongful intent means they cannot be convicted or held liable. For example, fraud is an intentional tort and crime. A defendant who speaks falsely but honestly to the best of their knowledge does not commit fraud.
Generative AI will force us to rethink the role of intent in the law. In contrast to most prior types of ML systems (highly autonomous vehicles are another such example, but currently remain in relatively limited use (Cooper and Levy 2022)), Generative AI can cause harms that are similar to those brought about by human actors but without human intention. For example, an LLM might emit false and derogatory claims about a third party – claims that would constitute defamation if they had been made by a human (Volokh 2023).
There is unlikely to be a simple across-the-board answer as to how the “intent” of a generative-AI system should be measured, in part because the legal system uses intention in so many ways and so many places. Consider an example from a GenLaw discussion. One participant noted that it may be useful to move to a respondeat superior model — a legal doctrine (often used in tort law) that ascribes the legal responsibility of an employee to their employer (if the tort or other wrongful conduct was conducted within the scope of employment). For this kind of liability model, one could treat the generative-AI system as the “employee,” and then ascribe responsibility for harm to the “employer” – i.e., the user. Such an approach appears to sidestep the need to deal with intent; respondeat superior is strict liability as to the employer. However, another participant noted that in the usual application of respondeat superior, there is still an embedded notion of intention. That is because the employee’s intentions are still relevant in determining whether a tort has been committed at all; only if there has can liability then also be placed on the employer. This is not to say that respondeat superior has no role to play, only that it does not avoid difficult questions of intent.
Another line of discussion at GenLaw concerned whether these difficulties might lead to a greater focus on the human recipients of generative-AI outputs. Some authors, for example, have argued that the rise of AI systems creates a world of “intentionless free speech” in which communications should be assessed purely based on their utility to the listener (Collins and Skover 2018). Such a framework helps to clearly establish a First Amendment basis for a right to use Generative AI. But it also raises difficult questions about how to protect users from Generative AI in cases of false or harmful outputs. These issues will cut across many legal areas.
5.2 Privacy
As discussed above (Section 3), “privacy” is a notoriously difficult term to define. Different disciplines rely on different definitions that simplify the concept in different ways, which can make it very difficult to communicate about privacy across fields. In particular, computer science and law are known to operate using very different notions of privacy. (Even subfields vary greatly. Cryptographers use different conceptions of “privacy” than ML researchers; decisional privacy in constitutional law is very different than data privacy in technology law.)
For example, in many subfields of computer science, it is common to employ definitions of privacy based on mathematical formalisms that are computationally tractable (often differential privacy). In contrast, privacy in the law is often defined contextually, based on social norms and reasonable expectations. It is typically necessary to first identify which norms are at play in a given context, after which it is then possible to determine if those norms have been violated (and what to do about it). Such definitions of privacy are fundamentally nuanced; they resist quantification. The tensions between legal and computer-science approaches to privacy are a source of communication challenges. At GenLaw, one of the legal experts provided a useful intuition for this tension: Computer scientists often want to be able to quantify policy, including policy for handling privacy concerns; in the law, the mere desire to quantify complex concepts like privacy can itself be the source of significant problems.
Despite these difficulties, it is still important to be able to reason about privacy (and when it is violated) in both computing and law. This is not a new problem: it has been a source of significant practical and research challenges for essentially as long as computers have been in use. As long as personally identifiable information (PII) like addresses and phone numbers has been stored on computers, there have been risks that such information could be seen or leveraged by others who otherwise would not have had access. Since the introduction of machine-learning methods to software systems, it has become possible to predict user behavior and personal preferences (sometimes with high fidelity); in turn, having access to such arguably private information has opened up the possibility to develop software that relies on this information to guide or manipulate user behavior. We understand how difficult these privacy challenges are only because of decades of research in law and computer science. Legal scholars have articulated the real-world harms that people can suffer through the misuse of “private” information; computer scientists have demonstrated real-world attack vectors through which “private” information can be leaked.
Generative AI is poised to make these privacy challenges even harder. As noted above (Section 4.4), in contrast to prior machine-learning models, generative-AI models are typically trained on large-scale web-scraped datasets. These datasets can contain all sorts of private information (e.g., PII) (Brown et al. 2022), which in turn can be memorized and then leaked in generations (Carlini, Hayes, et al. 2023; Somepalli et al. 2022). A traditional search engine only locates individual data points, but a generative-AI model could link together information in novel ways that reveal sensitive information about individuals. Adversarially designed prompts can extract other sensitive information, such as internal instructions used within chatbots (Edwards 2023). (This is potentially also a trade-secret issue (Section 5.4), depending on the nature of the information leaked.)
5.3 Misinformation and Disinformation
Generative AI can be used to produce plausible-seeming but false content at scale. As such, it may be a significant source of misinformation, and amplify the speech of actors engaged in disinformation campaigns. (By misinformation, we mean material that is false or misleading, regardless of the intent behind it. In contrast, disinformation consists of deliberately false or misleading material, often with the purpose of manipulating human behavior.)
These capabilities will present issues in any area of law that prohibits false speech — from lies about people to lies about products to lies about elections. They will also challenge the assumptions of areas of law that tolerate false speech out of the belief that such speech will be comparatively rare and easy to counter. Taken together, these include a wide variety of legal topics, including defamation, national security, impersonation, bad (or, in regulated contexts, illegal) advice, over-reliance (Brundage et al. 2022), amplification, spear-phishing, spam, elections, consumer-protection law (e.g., addiction, deception, false advertising, products liability), deepfakes, and much else.
From a misinformation perspective, generative-AI models are very sensitive to their training data, which may itself include misinformation or disinformation. For example, in August 2023, it was discovered that a book about mushroom foraging, which was produced with the assistance of an LLM, contained misinformation about which mushrooms are poisonous (likely due to inaccurate information learned from the data on which the LLM was trained) (Cole 2023). Similarly, it appears that LLMs are subject to sycophancy, where a model answers subjective questions in a way that flatters its user’s stated beliefs, and sandbagging, where models are more likely to endorse common misconceptions when their user appears to be less educated (Bowman 2023; Perez et al. 2022).
From a disinformation perspective, models can be used to deliberately generate persuasive but false content based on material scraped from the Internet. But they can also be deliberately manipulated through adversarially selected fine-tuning data or through alignment. These processes could be used to skew models to deliberately produce misleading content.
One point raised by legal scholars at GenLaw is that generated disinformation about individuals (e.g., deepfakes) will potentially contribute to new types of defamation-related harms. Other experts, particularly those with a privacy background in computing, questioned whether such harms could also be classified as intimate privacy violations (Citron and Solove 2022). In many respects, the harms caused by sufficiently convincing forgeries are very similar to those caused by truthful revelations (Zipursky and Goldberg 2023). Indeed, this is something that Generative AI seems well-positioned to enable: large-scale, inexpensive production of believable deepfakes that use a person’s likeness, and depict fake intimate acts or convey fake intimate information (Maiberg 2023; Rubin 2023).
In response, lawyers with expertise in defamation said they believed that this would likely not constitute a cognizable privacy harm under the law, although it would still be actionable as defamation or false light. In turn, this response raised questions about whether Generative AI could create new types of harms that blur current conceptions of disinformation and privacy harms.
5.4 Intellectual Property
Naturally, given the recent spate of lawsuits about copyright and Generative AI (as well as the stated thematic focus for the first GenLaw Workshop), IP was a frequent topic for emerging legal issues. It has also been one of the first generative-AI subjects explored in detail by scholars (K. Lee, Cooper, and Grimmelmann 2023; Sag 2023; Samuelson 2023; Callison-Burch 2023; Vyas, Kakade, and Barak 2023 e.g.), and we leave discussion of the doctrinal details to their work. Instead, we focus here on a few high-level observations about current and impending IP issues — many of which also have applications beyond IP.
Volition: Human volition plays an important and subtle role in defining IP infringement. For example, copyright infringement normally requires that a human intentionally made a copy of a protected work, but not that the human was consciously aware that they were infringing. Generative-AI systems may occasionally produce outputs that look like duplicates of the training data. Some participants at GenLaw were concerned that it may be easy to deflect the role of human-made design choices (Section 4.2) by making such choices seem “internal” to the system (when, in fact, such choices are typically not foregone conclusions or strict technical requirements (Cooper et al. 2022)). Since “purely internal” copies tend to be fair use, such deflection could serve as a copyright liability shield. Legal experts will need to contend with this possibility in their analysis of generative-AI systems. (For a more general treatment of “scapegoating the system,” see Cooper et al. (2022). Its opposite, in which a human is held wholly responsible for a harm caused by a technical system, is the “moral crumple zone,” described in Elish (2019).)
Additionally, since generative-AI systems typically take in input from human users, it is also possible for a user to intentionally cause a model to output potentially infringing content.
Market externalities: There are many concerns that Generative AI will lead to mass labor displacement, significant market changes, and the concentration of market power. These issues extend beyond IP, but necessarily invoke related questions of ownership. These are matters for labor law, international trade law, and other areas of law. But they are also IP issues, because doctrines such as fair use invite courts to consider such societal effects in weighing the propriety of particular copying.
Trade secrecy: Fine-tuning on proprietary data is poised to become a potentially useful pattern in the adoption of generative-AI technology. However, existing generative-AI models are known to memorize their training data (Carlini, Hayes, et al. 2023; Carlini, Ippolito, et al. 2023). In turn, this raises the possibility that an adversarial user could extract proprietary information in training data, thereby presenting issues related to trade secrecy (Edwards 2023).
Scraping: Similarly, the legality of scraping training data is inextricable from the IP treatment of Generative AI. Generative-AI companies both rely on scraped data as an input and take measures (both technical and legal) to prevent outputs from their systems from being used as inputs to other systems without permission.
Authorship: As alluded to above, IP law may need to reconsider authorship eligibility in light of Generative AI. Computer authorship is not a new topic of analysis in the law (Grimmelmann 2016; Samuelson 1985 e.g.), but Generative AI is likely to present new variations on old themes. For example, purely computer-generated works are not currently covered by copyright. However, some argue that this situation is not sustainable (T. B. Lee 2023 e.g.). Where AI-generated works have significant value, there will be strong economic pressures on courts to give users copyright in those works.
Patent: Given that generative-AI modeling techniques also have applications in the physical sciences (e.g., in drug design, see Section 4), it seems likely that there will be implications for patent law. For example, U.S. patent law requires a human inventor as a condition of patent eligibility. Just as copyright’s human authorship requirement has been challenged (but so far upheld (Thaler v. Perlmutter 2023 e.g.)), similar challenges may arise with respect to patents.
Idea-expression dichotomy: Generative AI seems to further blur the already often-murky line between idea and expression in copyright law. For example, one could attempt to analogize the prompt to an idea and the associated generation to its expression, but this presents several problems. For one thing, there seems to be a bit of an inversion from the typical pattern: it suggests that the AI, rather than the human, is responsible for the creative expression (which is not currently protectable by copyright law). For another, there may be sufficient creativity for copyrightability of the prompt itself, even if it is ultimately (by the prior analogy) responsible for the idea in the resulting generation. Lastly, there is a tenable argument that the human prompter and generative-AI system are acting in concert to produce the resulting generation (K. Lee, Cooper, and Grimmelmann 2023), and that the way an idea is expressed in a prompt makes it inextricable from the resulting expressive generation. In short, as others have noted (Lemley 2023), Generative AI seems to turn the idea-expression dichotomy “upside down.”
6. Toward a Long-Term Research Agenda
Participants in the GenLaw workshop and roundtable identified several important and promising future research directions. Notably, these topics have several elements in common. First, each showcases how technical design choices play a crucial role in legal research questions. Many of the architectures and applications of generative-AI systems are genuinely novel, compared to previous technologies that the legal system has had to contend with. Understanding the legal issues that they raise will require close engagement with the technical details.
Second, just as design choices can inspire questions for the legal scholars, it is also important to consider how legal scholarship can influence the choices that generative-AI researchers make when designing systems (Sections 4.2, 4.3). Understanding not just the current legal framework, but also how that framework may evolve, provides important guidance for system designers about which technical changes are and are not legally significant. In addition, a clear sense of the legal possibility space can help direct generative-AI research toward novel designs, algorithms, attacks, and characterizations that have beneficial characteristics.
The list that follows is just a sample of emerging research areas at the intersection of Generative AI and law. It gives a flavor of how these two disciplines can concretely inform each other. We believe it is the starting point of a rich, long-term research agenda with the potential to influence and inform public policy, education, and industrial best practices.
6.1 Centralization and Decentralization
One crucial question about the future of Generative AI concerns the relative degree of centralization versus decentralization. Consider, as an example, the controversies over the use of closed-licensed data (within web-scraped datasets) as training data for generative-AI models (e.g., LAION datasets (Beaumont 2022; Schuhmann et al. 2022), The Pile (Gao et al. 2020), Books3 (Knibbs 2023)), especially if training involves removing copyright management information. While such datasets are often released with open licenses (e.g., the LAION organization has released its datasets under the MIT license, which allows for use and copying), this does not guarantee that the constituent data examples in those datasets are licensed for such use (K. Lee, Cooper, and Grimmelmann 2023). Many examples within these datasets have closed licenses. As K. Lee, Cooper, and Grimmelmann (2023) note, this is particularly complex for datasets used to train multimodal models, like text-to-image models: the examples used to train text-to-image models are image-caption pairs, where, for each pair, the image and the text caption could each be subject to its own copyright (and the pair could even hypothetically be subject to a copyright as a compilation).
Legal scholars have made arguments that run the gamut of possible fair-use outcomes for the use of these datasets in Generative AI (Samuelson 2023; Lemley 2023; K. Lee, Cooper, and Grimmelmann 2023; Sobel 2021; Sag 2023; Henderson et al. 2023 e.g.). Nevertheless, it remains to be seen whether courts will rule that the use of such datasets constitutes fair use.
In the interim, an alternative path is to invest in producing open, permissively licensed datasets that avoid the alleged legal issues of using web-scraped data. This means not only releasing datasets with such licenses, but also ensuring that the underlying data examples in the dataset have clear provenance (K. Lee et al. 2023) and are themselves openly licensed. This is a rich problem domain. It requires significant technical innovation, both in techniques for collecting such datasets at scale while respecting licensing conditions and in training models that make the best use of the limited materials available in them. (Current attempts to train models on such openly licensed datasets have yielded mixed results in terms of generation quality (Gokaslan et al. 2023).) It also requires substantial legal innovation, including the development of appropriate licenses that function as intended across jurisdictions, and organizational innovation in creating authorities to steward such datasets.
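To make the provenance point concrete, the following is a minimal sketch of how a curation pipeline might retain only permissively licensed examples and keep a provenance record alongside them. The metadata fields and the license allow-list are hypothetical assumptions for illustration, not a description of any existing dataset or tool.

```python
# Minimal, hypothetical sketch of license-aware dataset curation.
# The metadata fields and the allow-list below are illustrative assumptions,
# not a reference to any particular dataset or pipeline.
from dataclasses import dataclass

PERMISSIVE_LICENSES = {"CC0-1.0", "CC-BY-4.0", "MIT"}  # assumed allow-list


@dataclass
class Candidate:
    url: str      # where the item was obtained (provenance)
    license: str  # SPDX-style identifier, if known
    creator: str  # attribution string, if known
    content: str  # the underlying text (or a path to an image, etc.)


def curate(candidates: list[Candidate]):
    """Keep only examples with a known permissive license, and keep a
    provenance record so attribution obligations (e.g., CC-BY) can be met."""
    kept, provenance = [], []
    for ex in candidates:
        if ex.license in PERMISSIVE_LICENSES:
            kept.append(ex)
            provenance.append(
                {"url": ex.url, "license": ex.license, "creator": ex.creator}
            )
    return kept, provenance
```

Even this simple filter makes the difficulty visible: license metadata on the open web is frequently missing, incorrect, or attached to a compilation rather than to individual items, so such a pipeline is only as trustworthy as the provenance records it consumes.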
These same issues and tensions recur at every stage in the development of generative-AI systems. Centralization versus decentralization is partly a technical question: current methods require centralized pre-training at scale on datasets typically gathered from highly decentralized creators. Whether either or both of these constraints will change in the future is an important and open question. Improvements in training algorithms may reduce the investment required to pre-train a powerful base model, opening pre-training up to greater decentralization. At the same time, improvements in synthetic data may enable well-resourced actors to generate their own training data, partially centralizing the data-collection step.
Centralization versus decentralization is also partly a business question (Section 3.3). There is currently substantial investment both in large centralized companies that are developing large base models and in a large ecosystem of smaller entities developing fine-tuned models or smaller special-purpose models. The relationships among, and relative balance between, these different entities are likely to evolve rapidly in the coming years.
And, most significantly, centralization versus decentralization is a fundamentally legal question. As noted above, licensing law may inform who can use a dataset. Competition and antitrust law are likely to play a major role going forward. Every important potential bottleneck in Generative AI – from copyright ownership to datasets to compute to models and beyond – will be the focus of close scrutiny. These novel markets will require technical, economic, and legal analysis to determine the most appropriate competition policy. In addition to antitrust enforcement, possible policies include government subsidies, open-access requirements, “public option” generative-AI infrastructure, export restrictions, and structural separation. These questions cannot be discussed intelligently without contributions from both technical and legal scholars.
6.2 Rules, Standards, Reasonableness, and Best Practices
Since the technological capabilities of today’s generative-AI systems are so new, it is unclear what duties the creators and users of these systems should bear. This overarching problem is not unprecedented for either law or computing. In some cases, such duties take bright-line, rule-like forms; for example, HIPAA strictly regulates which kinds of data are treated as personally identifying and subject to stringent security standards. In other cases, duties are expressed as more flexible standards that require greater exercises of discretion. In some cases, the legal system defaults to a general standard of reasonableness: did a person behave reasonably when developing or using a system? And sometimes, even when there is no law on point, practitioners have developed best practices that they follow to do their jobs effectively. We anticipate that the legal system will need to articulate these duties for generative-AI creators and users, and to determine which modalities of rules and standards to employ.
These expectations have always been technology-specific and necessarily change over time as technology evolves. For example, under the Uniform Trade Secrets Act (UTSA), information must be “the subject of efforts that are reasonable under the circumstances to maintain its secrecy.” The threshold for what efforts count as “reasonable” has changed over time in response to developments in information security. Similarly, in cybersecurity, the FTC monitors the state of the art. As state-of-the-art practices improve, the FTC has been willing to argue that companies engage in unfair and deceptive trade practices by failing to implement widely used, cost-effective measures. Further, the definition of reasonableness is contextual; what is considered reasonable for a large company (e.g., in terms of system development practices) is typically different from what is considered reasonable for smaller actors.
In short, legal scholars urgently need to study – and technical scholars urgently need to explain – which generative-AI safety and security measures are recognized as efficient and effective. Nor will a one-time exchange suffice. The legal system must be attuned to the dynamism of generative-AI development. What is currently an effective countermeasure against extracting memorized examples from models may fail completely in the face of a newly developed technique. Conversely, new techniques of training and alignment may be developed that are so clearly effective that it is appropriate to expect future generative-AI creators to employ them. Indeed, the legal system must be attuned to this dynamism itself, to the fact that our current understanding of the frontier between the possible and the impossible in Generative AI is provisional and constantly being refined. There is work here for many researchers from both communities.
Once technology begins to stabilize, it becomes easier to define concrete standards (e.g., safety standards). Accordingly, compliance with such standards can then serve as the benchmark for meeting the bar of reasonableness. Until there is some stability, when harms occur, there will necessarily be some flexibility; there will be some deference to system builders’ self-assessments of whether their design choices reflected reasonable best efforts to construct safe systems. In turn, today’s best efforts will guide future standards-setting and determination of best practices.
Both the legal and machine-learning research communities should face this reality head-on; they should take hold of the opportunity to actively engage in research and public policy regarding today’s generative-AI systems, such that they can help shape the development of future standards. This work will require understanding the complexity and particulars of different generative-AI technologies; effective standards will differ by model modality and other system capabilities (e.g., generative-AI systems that interact with APIs to bring in additional content, such as plugins (OpenAI 2023a)).
To meet this challenge, one clear need is useful metrics to effectively evaluate the behaviors of generative-AI systems. As we discuss below (Section 6.4), effective ways to evaluate generative-AI systems currently remain elusive. System capabilities and harms are not readily quantifiable; designing useful metrics will be an important, related area of research for Generative AI and law.
6.3 Notice and Takedown ≠ Machine Unlearning
Notice and takedown is well-known in both the software and legal communities because of search engines and Section 512 of the U.S. Copyright Act. Enabling notice-and-takedown functionality presents a variety of sociotechnical challenges, which are notably even more complicated for Generative AI.
For generative-AI models, there is no straightforward analogue for simply removing a piece of data from a database (as might be the case for removing a file from a video-hosting platform). (Notice and takedown can be technically challenging, but nevertheless feasible, for large-scale software systems that involve distributed databases working in concert.) Once a model has been trained, the impact of each data example in the training data is dispersed throughout the model and cannot be easily traced. In order to remove an example from a trained model, one must either track down all the places where the example has an impact and identify a way to negate its influence, or re-train the entire model. This is challenging because “impact” is not well defined, and neither is “removal”: we must first define what it means to “take down” a training example from a generative-AI model, which is itself an ill-defined problem.
There are entire subfields of machine learning devoted to problems like these. For example, the subfield of “machine unlearning” (Bourtoule et al. 2021; Cao and Yang 2015) attempts to define the desired goals for removing an example and to design algorithms that satisfy those goals. This subfield is also heavily motivated by the “Right to be Forgotten” provision of the GDPR.
Another line of work attempts to quantify data-example attribution and influence; it seeks to define “attribution” and then attribute a model’s generations to specific examples in the training data. Both machine unlearning and attribution are very young fields, and their strategies are (for the most part) not yet computationally feasible for deployed generative-AI systems. Both are nevertheless of significant interest to ML researchers and practitioners, and there has been intense (and growing) investment in this area. How these fields will develop remains to be seen.
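To give a flavor of these approaches, the sketch below follows the sharded-training idea associated with Bourtoule et al. (2021): partition the training data into disjoint shards, train one constituent model per shard, and aggregate their predictions, so that deleting an example only requires retraining the shard that contained it. The `train_model` and `predict` interfaces are hypothetical stand-ins, and nothing here is claimed to scale to today’s large pre-trained generative models.

```python
# Hedged sketch of shard-based unlearning in the spirit of Bourtoule et al.
# (2021). `train_model` and each model's `predict` are hypothetical stand-ins
# for an actual training procedure and predictor.
import statistics


class ShardedEnsemble:
    def __init__(self, shards, train_model):
        self.train_model = train_model
        self.shards = [list(s) for s in shards]        # disjoint data shards
        self.models = [train_model(s) for s in self.shards]

    def unlearn(self, example) -> bool:
        """Remove `example` and retrain only the shard that held it,
        rather than retraining on the full dataset."""
        for i, shard in enumerate(self.shards):
            if example in shard:
                shard.remove(example)
                self.models[i] = self.train_model(shard)
                return True
        return False  # the example was never in the training data

    def predict(self, x):
        # Aggregate constituent predictions (here: a simple majority vote).
        return statistics.mode(m.predict(x) for m in self.models)
```

The sketch also makes the text’s caveat visible: it satisfies only one narrow definition of removal (the retrained shard behaves as if it had never seen the example), and applying anything like it to a single, monolithically pre-trained generative model remains an open research problem.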
6.4 Evaluation Metrics
Evaluation is far from a new topic in machine learning (or computing more generally). Nevertheless, there is a clear need for useful metric definitions for Generative AI. We discuss some issues of interest below.
It is well-known that the force of legal rules depends on how they are implemented and interpreted. Many decisions are made on a case-by-case basis, taking into account specific facts and context. In contrast, machine-learning practitioners evaluate systems at scale. It is common practice to define metrics that can be applied directly to every situation (or at least a large majority of them). These metrics necessarily rely on a pre-specified set of features, which may leave out considerations that are important for forming a decision that appropriately accounts for broader context.
This is hardly a new observation; it has had significant influence on machine-learning subfields, such as algorithmic fairness. More generally, the challenges of operationalizing or concretizing societal concepts into mathematics have been discussed at length in prior work (Jacobs and Wallach 2021; Friedman and Nissenbaum 1996; Cooper and Abrams 2021; Cooper, Levy, and De Sa 2021 e.g.), and developing reasonable definitions for legal concepts is an active and evolving area of research (Cooper, Frankle, and De Sa 2022; Cooper et al. 2023; Scheffler, Tromer, and Varia 2022 e.g.).
Nevertheless, it is worth emphasizing that these observations hold true for Generative AI. There are also specific complexities for Generative AI that have not been so readily apparent in prior machine-learning work (e.g., other areas of machine learning have accepted, though imperfect, notions of “ground truth” labels, which are absent in Generative AI).
For example, as we discussed in Section 3.2, researchers create different, precise definitions of memorization for different purposes. The definition of memorization for an image-generation model will differ greatly from that for a code-generation or text-generation model. Similarly, since “removing” the impact of a training example from a trained model is an ill-defined problem, researchers may develop different metrics for quantifying whether or not training data points have been successfully removed.
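As a deliberately simplistic illustration of how modality-specific such definitions are, the sketch below operationalizes one possible notion of verbatim memorization for text: flag a generation if it shares a contiguous span of at least k tokens with some training document. The tokenization, threshold, and function names are our own assumptions; actual definitions in the literature (e.g., Carlini, Ippolito, et al. 2023) are more nuanced.

```python
# A deliberately simple, hypothetical check for *verbatim* text memorization:
# a generation is flagged if it shares a contiguous span of at least `k`
# whitespace-delimited tokens with any training document. The threshold and
# tokenization are illustrative choices, not a definition from the literature.

def spans(tokens: list[str], k: int) -> set[tuple[str, ...]]:
    """All contiguous k-token spans in a token sequence."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def is_verbatim_memorized(generation: str,
                          training_docs: list[str],
                          k: int = 50) -> bool:
    gen_spans = spans(generation.split(), k)
    return any(gen_spans & spans(doc.split(), k) for doc in training_docs)
```

An analogous check for an image-generation model would have to define similarity over pixels or embeddings rather than token spans, which is precisely why memorization metrics differ across modalities.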
Many metrics are defined in terms of technical capabilities. For example, evaluating the amount of memorized training data in a model depends on the ability to extract and discover that memorized training data (Carlini, Ippolito, et al. 2023; Ippolito et al. 2023). As techniques for data extraction improve, the evaluation of the model will change. Additionally, the way models are used alters the way they should be evaluated. For example, a machine-unlearning method may be applied to a model to remove the effect of a specific individual’s data. However, that model may later be fine-tuned on additional data that is very similar to the removed individual’s data, which may cause the individual’s data to effectively “resurface.” This is presently speculation, though not altogether baseless: recent research shows that the effects of alignment methods may be negated through the course of fine-tuning (Qi et al. 2023).
The way the supply chain is constructed for a particular generative-AI model may alter the way that the systems in which that model is embedded can and should be evaluated. For example, the analysis may change depending on whether a model is aligned or not. (Further, alignment is not binary; there are different possible degrees of alignment.) As another example, some actors may not have the relevant information to perform the necessary evaluations; some may not even know whether a particular model is aligned at all.
7. Conclusion and the Future of GenLaw
In this report, we discussed the main topics broached at the first GenLaw workshop: the importance of developing a shared knowledge base for improved communication (Section 3), the unique aspects of Generative AI that present novel challenges and opportunities (Section 4), a taxonomy of emerging legal issues (Section 5), and associated open research questions at the intersection of Generative AI and law (Section 6).
As is clear from the diversity of issues discussed within these topics, it is difficult to pithily sum up the main takeaways of GenLaw. Nevertheless, we will attempt to do so, and will sketch out our hopes for the future of GenLaw as an organization.
Expanding beyond copyright concerns: Perhaps the fairest overarching assessment from this report is that GenLaw’s participants believe that copyright concerns just scratch the surface of potential issues. Put differently, a common belief was that the legal questions currently under consideration in U.S. courts touch on only a small portion of the potential legal issues that Generative AI will raise. In part, this is because the underlying technology continues to evolve and be adopted at such a rapid pace. As a result, the research agenda that we suggest here (Section 6) will necessarily evolve over time.
Shaking off disciplinary boundaries: There remain major open questions about how best to evaluate the behavior of generative-AI systems. Answering these questions will necessarily involve technical machine-learning knowledge, but it will also involve much more. This report illustrates just how central legal considerations are to effective evaluation. But we do not intend to suggest that accounting for these considerations will on its own be sufficient. As we continue to understand how Generative AI will transform our interactions, expectations, economy, education, and more, we will need to continue to shake off disciplinary boundaries in order to design useful and comprehensive evaluation methodologies.
Evolving resources and engagement: Given the generative and evolving nature of generative-AI systems and products, GenLaw’s work to help educate and facilitate engagement between technologists, legal experts, policymakers, and the general public will necessarily require ongoing effort. The resources that we develop (such as those in this report) will need to be frequently updated to keep pace with technological changes.
In response to these takeaways, we are growing GenLaw into a nonprofit home for research, education, and interdisciplinary discussion. (GenLaw is in the process of obtaining 501(c)(3) status.) Thus far, we have written pieces that make complex and specialized knowledge about law and Generative AI accessible both to a general audience (K. Lee et al. 2023) and to subject-matter experts (K. Lee, Cooper, and Grimmelmann 2023). We have worked to provide additional resources, such as recordings of our events (GenLaw 2023a; James Grimmelmann 2023), collections of external resources (GenLaw 2023b), and the initial glossary on the GenLaw website. For our first in-person workshop, we engaged with participants who have varied expertise in Generative AI, law, policy, and other computer-science disciplines, across 25 different institutions. We are excited to continue engaging with experts across industry, academia, and government. While our first event and materials have had a U.S.-based orientation, we are actively working to expand our engagement globally. We will maintain the GenLaw website (https://genlaw.org) with the most up-to-date information about future events and resources.