Setting the stage for AI transparency
If 2023 and 2024 were the teaser trailers for U.S. AI regulation, 2025 is the blockbuster release. And California (never shy about a starring role in tech policy) has premiered two headline acts: the California AI Transparency Act (SB 942) and Assembly Bill 2013 on Generative AI Training Data Transparency.
Both laws take effect January 1, 2026, and together they create a one-two punch of accountability. SB 942 focuses on outputs: how AI-generated content is labeled, detected, and disclosed. AB 2013 focuses on inputs: how the data used to train generative AI systems is documented and made public.
For privacy and compliance professionals, these laws are more than legislative updates. They are operational mandates with real penalties for noncompliance. And they’re arriving at a moment when public trust in AI is fragile, regulators are sharpening their teeth, and stakeholders are asking, “How do we prove our AI is playing fair?”
Understanding California’s AI Transparency Act (SB 942)
The California AI Transparency Act is a consumer protection law with a simple premise: if you make AI that generates or alters content, you must tell people clearly, consistently, and in a way that can’t be easily stripped out.
However, the law’s scope is narrower than “all AI.” It applies only to covered providers: developers of a GenAI system that has more than 1,000,000 monthly visitors or users and is publicly accessible within California. It does not apply to products or services that exclusively provide non-user-generated experiences, such as video games, television, streaming, movies, or interactive content not created or modified by users. These exemptions mean some large AI content producers are outside the Act’s reach.
Core requirements include:
AI detection tools, free to the public
Covered providers must offer a publicly accessible detection tool to identify whether their generative AI system created or altered an image, video, or audio file. The tool must work via a web interface and an API, support content uploads or URLs, and output system provenance data (such as the system version and creation date) without exposing personal provenance data. The detection tool must be free to use, though providers may impose reasonable limitations to address security or integrity risks to their GenAI system.
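To make the API requirement concrete, here is a minimal client sketch in Python. The endpoint URL, upload field name, and response keys are all assumptions for illustration; SB 942 mandates a public web interface and an API but does not prescribe a wire format.

```python
import requests

# Hypothetical endpoint; SB 942 requires an API but defines no wire format.
DETECTION_API = "https://api.example-genai.com/v1/detect"

def check_provenance(file_path: str) -> dict:
    """Upload a media file and return system provenance data.

    The response should carry only system-level fields (e.g.,
    system_name, system_version, created_at), never personal
    provenance data.
    """
    with open(file_path, "rb") as f:
        resp = requests.post(DETECTION_API, files={"content": f}, timeout=30)
    resp.raise_for_status()
    return resp.json()

# Illustrative usage: was this image created or altered by our system?
print(check_provenance("product_photo.png"))
```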
Manifest disclosures (visible labels)
Users must be able to add a visible label, known as a “manifest disclosure,” that identifies content as AI-generated. Labels must be clear, conspicuous, permanent (or nearly so), and appropriate for the medium.
Latent disclosures (embedded metadata)
All AI-generated content must include embedded information: provider name, GenAI system name and version, creation timestamp, and a unique identifier. This must be detectable by the provider’s AI detection tool and aligned with industry standards.
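For a concrete sense of what a latent disclosure carries, below is a minimal sketch that writes the four required data points into PNG text metadata with Pillow. The key names are assumptions, and plain text chunks are easy to strip, so this is an illustration only; a production system would more likely embed a C2PA manifest or a durable watermark to satisfy the permanence expectation.

```python
from uuid import uuid4
from datetime import datetime, timezone
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def embed_latent_disclosure(src: str, dst: str) -> str:
    """Embed SB 942-style latent disclosure fields as PNG text metadata."""
    info = PngInfo()
    content_id = str(uuid4())
    # The four data points a latent disclosure must convey:
    info.add_text("ai_provider", "Example AI Co.")      # provider name
    info.add_text("ai_system", "ExampleGen/2.1")        # system name + version
    info.add_text("ai_created_at", datetime.now(timezone.utc).isoformat())
    info.add_text("ai_content_id", content_id)          # unique identifier
    Image.open(src).save(dst, pnginfo=info)             # dst should be a .png path
    return content_id
```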
License enforcement
If a licensed third party disables disclosure capabilities, the provider must revoke their license within 96 hours of discovery. Licensees must cease using the system once a license is revoked.
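Because the 96-hour clock is short, teams may want to compute and track the deadline automatically. A minimal sketch (the discovery timestamp is illustrative):

```python
from datetime import datetime, timedelta, timezone

REVOCATION_WINDOW = timedelta(hours=96)  # SB 942's revocation deadline

def revocation_deadline(discovered_at: datetime) -> datetime:
    """Latest moment by which the provider must revoke the license."""
    return discovered_at + REVOCATION_WINDOW

discovered = datetime(2026, 3, 2, 9, 0, tzinfo=timezone.utc)
print(revocation_deadline(discovered))  # 2026-03-06 09:00:00+00:00
```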
Penalties
Civil penalties of $5,000 per violation, per day, plus possible injunctive relief, make this a law with real teeth.
California’s AI Transparency Act moves labeling and provenance from a “nice to have” to a “non-negotiable,” but only for covered providers and only for content within its defined scope. If your AI touches California consumers and isn’t in an exempt category, transparency must be woven into your design and delivery pipelines.
Breaking down California AB 2013: Generative AI Training Data Transparency
If SB 942 answers “How do we show people what’s AI-made?”, AB 2013 asks “What’s in the AI’s brain?”
By January 1, 2026, any developer releasing a new or substantially modified GenAI system in California must publish training data documentation on their website (a minimal schema sketch follows this list). This must include:
- High-level dataset summaries: sources or owners, purpose alignment, volume (ranges allowed), and types of data points.
- IP and privacy flags: whether datasets contain copyrighted, trademarked, or patented material; whether they include personal or aggregate consumer information under California Consumer Privacy Act (CCPA) definitions.
- Acquisition details: whether datasets were purchased or licensed.
- Processing history: cleaning, modification, or enhancement steps, and their purpose.
- Timeframes: when data was collected (and whether collection is ongoing), and when it was first used in training.
- Synthetic data disclosure: if synthetic data generation was used, with an optional explanation of its functional purpose.
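To make these topics concrete, here is a minimal sketch of how a developer might structure an AB 2013 dataset summary internally. The field names and the dataclass layout are illustrative assumptions; the law lists the required topics but prescribes no schema or file format.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetDisclosure:
    """One training dataset's public summary (illustrative fields only)."""
    sources_or_owners: list[str]
    purpose_alignment: str             # how the dataset furthers the intended purpose
    num_data_points: str               # ranges are allowed, e.g. "10M-100M"
    data_point_types: list[str]        # e.g. ["text", "images"]
    contains_copyrighted: bool
    contains_personal_info: bool       # per CCPA definitions
    contains_aggregate_consumer_info: bool
    purchased_or_licensed: bool
    processing_steps: list[str] = field(default_factory=list)  # cleaning, modification
    collection_period: str = ""        # e.g. "2021-01 to present (ongoing)"
    first_used_in_training: str = ""   # e.g. "2023-06"
    synthetic_data_used: bool = False
    synthetic_data_purpose: str | None = None  # explanation is optional under AB 2013
```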
Exemptions exist for:
- Generative AI systems or services whose sole purpose is to ensure security and integrity.
- Systems used solely for the operation of aircraft in the national airspace.
- Systems developed for national security, military, or defense purposes that are made available exclusively to a federal entity.
This is the first U.S. law to mandate public documentation of training data for commercial AI systems at this level of specificity. For compliance leaders, it means standing up data lineage management as a core governance function.
Practical implications for privacy and compliance teams
Think of SB 942 and AB 2013 as California handing you a two-page “AI transparency checklist,” except it’s written in legal code and costs $5,000/day to ignore.
Operational changes you’ll likely need:
- New governance workflows to track data sources, IP rights, and privacy risk from dataset ingestion through model deployment.
- Cross-functional playbooks between engineering, legal, privacy, and communications to handle disclosure labeling, detection tool updates, and public documentation.
- Vendor and partner audits to ensure licensees and third parties keep required disclosure features intact.
Risk factors and violation scenarios:
- Missing dataset documentation: A developer updates their GenAI model but fails to update the public training data summary as required under AB 2013. This could trigger enforcement if discovered during an investigation.
- Noncompliant metadata: A provider releases AI-generated marketing images without embedding the latent disclosures SB 942 requires. If these assets are publicly distributed, each piece of noncompliant content could count as a separate violation.
- License enforcement gaps: A licensee removes mandatory disclosure features from a licensed GenAI system. If the provider does not revoke the license within 96 hours of discovery, both the provider and the licensee could be exposed to penalties.
Broader compliance considerations for multi-jurisdiction alignment:
Multi-jurisdiction alignment isn’t itself a requirement of SB 942 or AB 2013, but California’s rules are among the most detailed in the U.S., so organizations operating across multiple regions should build processes that meet the most stringent overlapping requirements. This may include:
- Mapping disclosure obligations in each jurisdiction where your AI operates (e.g., SB 942 in California, Colorado AI Act transparency rules, EU AI Act content labeling).
- Designing universal disclosure templates that meet or exceed the strictest format, permanence, and metadata requirements you face globally.
- Coordinating dataset documentation standards so that your AB 2013-compliant training data summaries also satisfy disclosure or risk assessment obligations under other AI or privacy laws.
Meeting these standards can help differentiate your organization as a trusted AI provider, especially in markets where public skepticism of AI remains high. It also reduces operational friction when scaling AI deployments across states and countries.
Compliance roadmap for California’s AI transparency laws
Step 1: Conduct a gap analysis
Compare existing AI governance against both laws. Pay special attention to provenance tracking, dataset documentation, and labeling workflows.
Step 2: Build a living training data inventory
Document source, ownership, type, processing history, and legal status for every dataset. Update this inventory with each model update or retraining.
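As a gatekeeping aid, a check like the following could flag incomplete inventory entries before a release triggers AB 2013’s publication duty. The dict-based summary and field names are assumptions for illustration, not anything the law prescribes.

```python
REQUIRED_FIELDS = [
    "sources_or_owners", "purpose_alignment", "num_data_points",
    "data_point_types", "contains_copyrighted", "contains_personal_info",
    "purchased_or_licensed", "processing_steps", "collection_period",
    "first_used_in_training", "synthetic_data_used",
]

def publication_gaps(summary: dict) -> list[str]:
    """Return required topics that are missing before the summary goes public."""
    return [f for f in REQUIRED_FIELDS if summary.get(f) in (None, "", [])]

# Illustrative usage: block release until the gaps list comes back empty.
gaps = publication_gaps({"sources_or_owners": ["Example Corpus v3"]})
print(gaps)
```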
Step 3: Implement disclosure templates
Develop standardized manifest and latent disclosures that meet SB 942’s permanence and clarity requirements. Test for resilience against stripping or alteration.
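One way to test resilience is a regression check that re-reads the embedded fields after each step of your creative pipeline. The sketch below assumes the PNG text-chunk approach and hypothetical key names from the earlier example; real-world testing should target whatever durable standard you adopt (e.g., C2PA manifests) under crops, transcodes, and re-encodes.

```python
from PIL import Image

DISCLOSURE_KEYS = ("ai_provider", "ai_system", "ai_created_at", "ai_content_id")

def disclosure_intact(path: str) -> bool:
    """Check that all latent-disclosure fields are still readable (PNG only)."""
    meta = Image.open(path).text  # PNG text chunks exposed by Pillow
    return all(k in meta for k in DISCLOSURE_KEYS)

# A resave through an editing tool often strips text chunks entirely,
# which is exactly the failure this regression check is meant to catch.
```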
Step 4: Update vendor contracts
Mandate disclosure compliance in all GenAI licensing agreements. Include revocation rights and enforcement timelines.
Suggested practices and tools for achieving AI transparency
From a privacy-by-design perspective, California’s laws effectively require:
- Integrated dataset documentation tools (e.g., metadata catalogs, lineage tracking platforms).
- Content authenticity solutions: watermarking, C2PA-compliant metadata embedding, and detection APIs.
- DPIA integration: add AI transparency checks to your data protection impact assessments and NIST AI Risk Management Framework processes.
Sector-specific watchpoints:
Healthcare: HIPAA considerations when disclosing dataset characteristics
Under AB 2013, developers must disclose whether training datasets include personal information or aggregate consumer information as defined in the CCPA. For healthcare organizations subject to HIPAA, this requirement demands extra caution. If training data includes protected health information (PHI), even in de-identified or aggregated form, disclosure summaries must avoid re-identification risks and maintain HIPAA-compliant safeguards.
Moreover, if synthetic data generation was used to augment sensitive datasets, AB 2013 allows developers to note its purpose, which could be leveraged to demonstrate HIPAA-aligned privacy preservation. The key challenge for healthcare entities will be balancing AB 2013’s transparency mandates with HIPAA’s strict confidentiality requirements and ensuring that no publicly posted dataset summaries inadvertently reveal sensitive medical details.
Finance: SEC and FINRA record retention rules for AI-generated disclosures
SB 942’s manifest and latent disclosure requirements mean that any AI-generated financial communications, from investor presentations to client statements, must be labeled and embedded with provenance metadata. For financial institutions under SEC or FINRA oversight, this creates a dual compliance obligation: maintaining SB 942-compliant disclosures while ensuring that all labeled AI-generated materials are retained in accordance with recordkeeping rules.
For example, FINRA Rule 2210 and SEC Rule 17a-4 require preserving certain communications for specified periods. If AI tools are used to create client-facing reports or marketing materials, firms must not only apply SB 942’s disclosure protocols but also store the original AI-labeled versions and their metadata in case of regulatory audits or disputes.
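A minimal retention sketch, assuming local file storage: it preserves the labeled original byte-for-byte and writes a hash-bearing record alongside it. The retention_years parameter is a placeholder; the actual period, and the non-rewriteable storage formats SEC Rule 17a-4 can require, depend on the record type and are beyond this sketch.

```python
import hashlib
import json
import shutil
from datetime import date
from pathlib import Path

ARCHIVE = Path("retention_archive")

def archive_labeled_asset(asset: str, retention_years: int) -> Path:
    """Preserve the original AI-labeled file plus a tamper-evidence record."""
    ARCHIVE.mkdir(exist_ok=True)
    src = Path(asset)
    digest = hashlib.sha256(src.read_bytes()).hexdigest()
    stored = ARCHIVE / src.name
    shutil.copy2(src, stored)  # keep the labeled original byte-for-byte
    record = {
        "file": src.name,
        "sha256": digest,  # proves the archived copy matches what was published
        "archived_on": date.today().isoformat(),
        "destroy_after_year": date.today().year + retention_years,
    }
    (ARCHIVE / f"{src.name}.record.json").write_text(json.dumps(record, indent=2))
    return stored
```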
E-commerce: Brand protection when AI-generated marketing or product content is labeled
In the e-commerce sector, SB 942’s visible and embedded labeling of AI-generated content has direct brand implications. Marketing images, product descriptions, and promotional videos created by generative AI must carry manifest disclosures that are clear, conspicuous, and appropriate for the medium. This means customers may see explicit indicators that a product image or ad was AI-generated—a potential trust-building measure for some brands, but a reputational risk if not managed carefully.
The latent metadata requirements also mean that, even if visible labels are cropped or removed in unauthorized use, the embedded provenance can still identify the source. E-commerce companies will need to integrate these labeling practices into their creative workflows and brand guidelines, ensuring the disclosures are consistent, aesthetically aligned, and do not detract from customer engagement.
How California’s AI laws compare to other jurisdictions
California’s approach is more prescriptive than most U.S. states and aligns closely with the EU AI Act, which also requires training data and output transparency for specific systems.
- EU AI Act: Applies tiered obligations based on risk category, with explicit transparency requirements for high-risk systems and foundation models.
- Canada’s AIDA: Establishes requirements for “high-impact systems,” including risk mitigation and recordkeeping, but provides less detail on training data disclosure formats.
- Colorado: The Colorado AI Act imposes obligations for developers and deployers of “high-risk AI systems,” including transparency measures, documented risk management programs, and consumer rights regarding AI-driven decisions.
- Utah: The Utah AI Policy Act requires disclosure when AI is used in consumer interactions, including informing individuals when they engage with generative AI tools or chatbots.
Preparing for California AI Transparency Act (SB 942) and AB 2013 compliance: Why early action builds trust and reduces risk
Technical standards for provenance embedding, watermarking, and dataset documentation formats will continue to evolve—driven by both industry bodies and potential federal AI legislation. Privacy leaders should watch for updates from the NIST AI Risk Management Framework, the Coalition for Content Provenance and Authenticity (C2PA), and guidance from organizations like IAPP to ensure their programs stay current.
By acting early, organizations can do more than just meet California’s January 1, 2026 deadlines. They can shape industry norms, influence best practices, and position themselves as trusted leaders in the responsible use of AI.
Opacity was a feature of AI in its early days. In California, it’s now becoming a liability. By operationalizing transparency in both outputs (SB 942) and inputs (AB 2013), privacy and compliance leaders can:
- Minimize fines, legal risk, and reputational damage
- Build lasting trust with customers, partners, and regulators
- Future-proof their AI governance frameworks against a fast-moving regulatory landscape
Compliance will no longer be the finish line; it will be the entry ticket to market credibility. The organizations that lead now won’t just meet California’s bar; they’ll set the benchmark for responsible AI worldwide. The question isn’t whether you’ll comply; it’s whether you’ll lead.
Frequently Asked Questions: California AI Transparency Act (SB 942) & AB 2013 Generative AI Training Data Transparency
1. What is the California AI Transparency Act (SB 942)?
The California AI Transparency Act (SB 942) is a state law that takes effect on January 1, 2026, and requires large generative AI providers to make their AI-generated content identifiable through both visible labels (manifest disclosures) and embedded metadata (latent disclosures). It also mandates that these providers offer a free, publicly accessible AI detection tool to identify content created or altered by their systems.
2. Who is considered a “covered provider” under SB 942?
A “covered provider” is defined in the bill as any entity that creates, codes, or otherwise produces a generative AI system that has over 1,000,000 monthly visitors or users and is publicly accessible within California.
3. Are there exemptions under SB 942?
Yes. SB 942 does not apply to products, services, websites, or applications that exclusively provide non-user-generated video games, television, streaming, movie, or interactive experiences.
4. What are “manifest” and “latent” disclosures in SB 942?
- Manifest disclosures are visible labels applied to AI-generated content, such as “This image was generated by AI.” They must be clear, conspicuous, permanent (or nearly so), and appropriate for the medium.
- Latent disclosures are embedded metadata that include details such as the provider’s name, the AI system name and version, the date/time of creation, and a unique identifier. These must be detectable by the provider’s AI detection tool and meet industry standards.
5. What is AB 2013: Generative AI Training Data Transparency?
AB 2013 is a California law effective January 1, 2026, that requires developers of generative AI systems to publish detailed documentation about the datasets used to train their systems. This includes information such as dataset sources, types of data points, intellectual property status, licensing details, data processing history, and whether synthetic data was used.
6. Who must comply with AB 2013?
Any developer releasing a new or substantially modified generative AI system in California (including significant updates to existing systems) must comply with AB 2013’s public documentation requirements.
7. What are the exemptions under AB 2013?
AB 2013 does not require documentation for:
- Generative AI systems whose sole purpose is to ensure security and integrity.
- Systems used solely for the operation of aircraft in the national airspace.
- Systems developed for national security, military, or defense purposes that are made available exclusively to a federal entity.
8. What specific information must be disclosed under AB 2013?
The law requires documentation that includes:
- Dataset sources or owners.
- How datasets align with the system’s intended purpose.
- Data point types and estimated volumes.
- Intellectual property and privacy status (e.g., copyrighted, personal data).
- Whether datasets were purchased, licensed, or in the public domain.
- Processing or cleaning steps taken.
- Data collection timeframes and first-use dates.
- Whether synthetic data generation was used, with an optional explanation of why.
9. What are the penalties for violating SB 942 or AB 2013?
- SB 942: Civil penalties of $5,000 per violation, per day, plus possible injunctive relief and legal costs. Each day a violation continues counts as a separate offense.
- AB 2013: The bill itself does not specify a monetary penalty. However, it grants enforcement authority to the California Attorney General, meaning noncompliance could still result in enforcement actions, including investigations and other remedies allowed under California law.
10. How can privacy professionals prepare for compliance?
- For SB 942: Develop workflows for labeling AI-generated content with both visible and embedded disclosures, ensure metadata persistence, and deploy a compliant detection tool.
- For AB 2013: Maintain a living inventory of training datasets with full source, licensing, processing, and IP details, and ensure this can be published in the required public format before release.
- In both cases: Integrate these obligations into vendor contracts, data governance frameworks, and multi-jurisdiction compliance plans.