Courts and Legislators Nudge for a New Data Market

“Clean Data” Debate in Generative AI

In the last 30 days, two landmark copyright decisions came out of the Northern District of California and, one week later, the European Parliament released a 175-page study on generative-AI training. Both say the same thing in different ways: right now there is no recognised market for licensing books, images or music as AI-training fodder. U.S. judges treat that vacuum as proof of “no market harm,” while EU policy-makers call it a market failure that must be fixed.

Judge Chhabria (Kadrey v. Meta): “Llama is not capable of generating enough text from the plaintiffs’ books to matter, and the plaintiffs are not entitled to the market for licensing their works as AI training data.”[1]

Judge Alsup (Bartz v. Anthropic): “A market could develop … Even so, such a market … is not one the Copyright Act entitles Authors to exploit.” [2]

EU Parliament study: Current EU law “leaves creators without any enforceable mechanism to authorise, deny, or license the use of their works for AI training under negotiated terms.” [3]

June 2025 Northern District of California Decisions

Bartz v. Anthropic (Alsup J., 23 June 2025)

Three authors alleged that Anthropic bought and scanned millions of print books, downloaded millions more from pirate sites and used them to train Claude. Alsup found that the training copies were fair use because the model transforms whole books into “statistical abstractions” that never substitute for the originals. On market harm (Factor 4), he accepted, for argument’s sake, that a licensing market could emerge, but held that it is “not one the Copyright Act entitles Authors to exploit.” The only infringement claim he left standing concerns Anthropic’s internal “pirated library,” which will go to trial. In short: training is safe (for now), but hoarding pirated PDFs is not. [4]

Kadrey v. Meta (Chhabria J., 25 June 2025)

Thirteen authors sued Meta for downloading books from “shadow libraries” to build Llama. The court held that training Llama on the books was “highly transformative”: the model’s weights capture statistical patterns rather than storing expressive passages. Meta’s expert ran “adversarial prompting” experiments and could not coax any Llama model into emitting more than 50 consecutive tokens (tokens are words or word fragments, so no more than a few sentences) from any plaintiff’s book. The plaintiffs’ own expert agreed that Llama could not reproduce “any significant percentage” of the texts. [5]
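The 50-token figure comes from an extraction (“regurgitation”) test: prompt the model, then measure the longest stretch of its output that matches the source verbatim. The sketch below illustrates that measurement only; it is not the method Meta’s expert used. It assumes naive whitespace tokenization, and the helpers `longest_verbatim_run` and `exceeds_limit` are hypothetical. The core is a standard longest-common-substring dynamic program over token sequences.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumption: naive whitespace tokenization. Llama and similar models use
    # subword tokenizers, so their token counts differ from this one.
    return text.split()


def longest_verbatim_run(generated: str, source: str) -> Tuple[int, List[str]]:
    """Return the length and tokens of the longest run of consecutive
    tokens that appears verbatim in both the generated and source text."""
    gen, src = tokenize(generated), tokenize(source)
    best_len, best_end = 0, 0
    # prev holds the previous row: run lengths ending at the previous generated token
    prev = [0] * (len(src) + 1)
    for i in range(1, len(gen) + 1):
        curr = [0] * (len(src) + 1)
        for j in range(1, len(src) + 1):
            if gen[i - 1] == src[j - 1]:
                # Extend the run that ended one token earlier in both texts.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return best_len, gen[best_end - best_len:best_end]


def exceeds_limit(generated: str, source: str, limit: int = 50) -> bool:
    # True if the output reproduces more than `limit` consecutive tokens.
    length, _ = longest_verbatim_run(generated, source)
    return length > limit


book = "the keeper counted ships every night until the fog rolled in from the sea"
output = "a story where the keeper counted ships every night and slept"
print(longest_verbatim_run(output, book))
# (6, ['the', 'keeper', 'counted', 'ships', 'every', 'night'])
```

In a real test the book would be tokenized with the model’s own tokenizer and the check repeated across many prompts; the quadratic dynamic program is adequate for a sketch but slow at full-book scale.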

Meta also submitted testimony that none of the 13 authors had ever licensed, or even been asked to license, a book for AI training. The logic: if no market exists, Meta’s unlicensed use cannot depress it, and the copyright holders have nothing to lose.

On Factor 4, Chhabria called the licensing-market theory a “clear loser,” writing that the authors “are not entitled to the market for licensing their works as AI training data.” He granted Meta summary judgment on the training copies but flagged that better evidence of market dilution could swing future cases.

Both judges placed an empirical burden on plaintiffs: show a functioning or nascent licensing market. Until such a market exists, with prices, contracts and measurable revenue, AI developers enjoy a sizeable fair-use advantage.

EU’s Generative AI and Copyright: Training, Creation, Regulation

The study, commissioned at the request of the Parliament’s Committee on Legal Affairs (JURI) and published on 30 June 2025, suggests that authors whose works are used for AI training should receive some form of remuneration. Like the U.S. courts, however, it recognises that such economic rights cannot currently be enforced, because authors have no practical way to license their works for AI training.

In both the U.S. and the EU, the question comes down to whether a market exists for licensing authors’ works as AI training data. The U.S. courts will wait for better evidence from future plaintiffs. In the EU, the study proposes proactive steps to create such a market and build licensing channels through legislation.

If Europe builds a paid licensing channel, U.S. plaintiffs could soon point to that very market to show cognizable harm, erasing the defense advantage that Meta and Anthropic just enjoyed.

What Happens When the “Licensing Void” Fills

Models that can prove “clean data” will cost more

Adobe’s Firefly image tool promotes itself as “commercially safe” because it was trained only on Adobe Stock, public-domain and openly licensed pictures.[6] On the supply side, Photobucket is asking 5 cents to $1 per photo to license its 13-billion-image archive, content that once sat online for free.[7]
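To put the Photobucket price range in perspective, here is a back-of-envelope calculation. It assumes, purely for illustration, that the entire archive is licensed at one flat per-photo rate with no volume discounts; real deals are negotiated and usually cover subsets.

```python
# Back-of-envelope cost of licensing an entire archive at a flat per-photo price.
# Assumption: every image licensed at the same rate, no volume discounts.
ARCHIVE_SIZE = 13_000_000_000  # images in the archive, as reported
PRICE_LOW_CENTS = 5            # $0.05 per photo
PRICE_HIGH_CENTS = 100         # $1.00 per photo


def archive_cost_usd(num_images: int, price_cents: int) -> int:
    # Work in integer cents so large totals stay exact.
    return num_images * price_cents // 100


low = archive_cost_usd(ARCHIVE_SIZE, PRICE_LOW_CENTS)
high = archive_cost_usd(ARCHIVE_SIZE, PRICE_HIGH_CENTS)
print(f"${low:,} to ${high:,}")  # $650,000,000 to $13,000,000,000
```

The figure illustrates scale rather than an expected deal size: even at the bottom of the range, a full archive is a nine-figure asset.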

Public-domain material and works under a Creative Commons Attribution licence (CC BY) remain free to use (CC BY requires attribution), but any dataset that comes with a clear paper trail will carry a premium, because it protects both users and developers from potential lawsuits.

A two‑tier data economy will settle in

Tier I: premium, traceable datasets such as newspapers, professional photo libraries and industry research, all licensed through collective deals or pay-per-asset APIs.

Reddit has already put a price on its user posts: its data deal with Google is reportedly worth about $60 million a year. [8]

Tier II: public-domain text, openly licensed sources such as Wikipedia, and government data, all of which remain free to use. The U.S. will likely keep its fair-use baseline for training on such material.

Over time, prices in the premium lane will settle into “reference rates” (think “$x per photo” or “$y per song”) for high‑value sectors such as images, music and specialist text.
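One way a data buyer might operationalise the two tiers is to tag every asset with its license at acquisition and route it accordingly. The sketch below assumes hypothetical license tags and a hypothetical `assign_tiers` helper; real license taxonomies are more granular, and whether a given use is permitted is a legal question the code cannot settle.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Tier(Enum):
    PREMIUM = "Tier I: licensed, paid"
    OPEN = "Tier II: free to use"
    EXCLUDED = "excluded: rights unclear"


# Hypothetical license tags. CC BY and CC BY-SA material is free to use
# but still carries attribution (and, for BY-SA, share-alike) obligations.
LICENSE_TIERS: Dict[str, Tier] = {
    "commercial-license": Tier.PREMIUM,
    "public-domain": Tier.OPEN,
    "us-government-work": Tier.OPEN,
    "CC0": Tier.OPEN,
    "CC-BY": Tier.OPEN,
    "CC-BY-SA": Tier.OPEN,
}


@dataclass
class Record:
    asset_id: str
    license_tag: str  # hypothetical tag recorded when the asset was acquired


def assign_tiers(records: List[Record]) -> Dict[Tier, List[str]]:
    tiers: Dict[Tier, List[str]] = {tier: [] for tier in Tier}
    for rec in records:
        # Unknown or missing tags fall into EXCLUDED rather than being guessed.
        tiers[LICENSE_TIERS.get(rec.license_tag, Tier.EXCLUDED)].append(rec.asset_id)
    return tiers


records = [
    Record("photo-001", "commercial-license"),
    Record("wiki-042", "CC-BY-SA"),
    Record("scrape-777", "unknown"),
]
for tier, ids in assign_tiers(records).items():
    print(tier.value, ids)
```

Defaulting unknown tags to an excluded bucket, rather than to the free tier, mirrors the premium the article describes: an asset without a paper trail is treated as a risk, not as free material.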

Data marketplaces will feel like app stores

Cloud vendors already host one-click catalogues of “ready-to-license” datasets. Amazon Web Services Data Exchange lists 3,000-plus commercial and 1,000-plus free datasets covering everything from news wires to medical scans.[9]

As deals scale, AI model builders will likely add leakage-testing dashboards and provenance logs, both to reassure investors and to comply with the EU AI Act, which requires providers of general-purpose AI models to publish a sufficiently detailed summary of the content used to train them.[10]
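A provenance log can be as simple as a content hash plus source and license for each asset, rolled up into aggregate counts. The sketch below uses a hypothetical schema of our own (fields such as `source` and `license_tag` are assumptions); it is not the template the EU AI Office provides for the public training-data summary.

```python
import hashlib
import json
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ProvenanceEntry:
    sha256: str       # fingerprint that re-identifies the exact asset
    source: str       # hypothetical field: where the asset was obtained
    license_tag: str  # hypothetical field: license or deal covering it


def make_entry(content: bytes, source: str, license_tag: str) -> ProvenanceEntry:
    return ProvenanceEntry(hashlib.sha256(content).hexdigest(), source, license_tag)


def summarize(entries: List[ProvenanceEntry]) -> Dict[str, Dict[str, int]]:
    # Roll individual entries up into counts per source and per license,
    # the kind of aggregate view a public training-data summary might give.
    return {
        "by_source": dict(Counter(e.source for e in entries)),
        "by_license": dict(Counter(e.license_tag for e in entries)),
    }


log = [
    make_entry(b"licensed photo A", "stock-archive", "commercial-license"),
    make_entry(b"licensed photo B", "stock-archive", "commercial-license"),
    make_entry(b"encyclopedia article", "wikipedia", "CC-BY-SA"),
]
print(json.dumps(asdict(log[0])))            # one auditable log line
print(json.dumps(summarize(log), indent=2))  # aggregate summary
```

Keeping per-asset hashes internally while publishing only the aggregates lets a developer answer a specific rights-holder query without disclosing the full dataset.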

Practical take-aways for stakeholders

Data is becoming a tradeable commodity. Until recently, tech companies scraped whatever text or images they could find on the open web for free. Now, they are being pushed to pay for clean, permission-based datasets.

The EU is signaling that silence will no longer equal consent. The successful AI products of the next decade will likely be those built on traceable, fairly acquired data streams.

The era of free-for-all scraping is giving way to a regulated data-supply chain where proof of origin and proof of non-harm will decide who can train, and at what price.

Specialized data brokers are springing up. Think of them as a “Spotify” for training data: platforms where you can subscribe to texts, images or medical scans. They will make buying data easier and give creators a single place to license their work. The flip side is higher prices for high-quality data collections, a cost that will ultimately reach end users.

The AI training-dataset market, valued at USD 2.6 billion in 2024, is projected to reach USD 8.6 billion by 2030, expanding at a CAGR of 21.9%.[11]
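The forecast figures can be cross-checked with the standard compound-annual-growth formula, CAGR = (end / start)^(1 / years) - 1. Assuming six compounding years (2024 to 2030), the sketch below shows the quoted endpoints imply roughly 22% a year. The small gap to 21.9% plausibly reflects rounding or a different base period in the underlying report; that is an assumption, since the report’s method is not reproduced here.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    # Compound annual growth rate: (end / start) ** (1 / years) - 1
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    # Value after compounding `start` at `rate` per year for `years` years.
    return start * (1 + rate) ** years


# Cited forecast, USD billions; assumes 2024 -> 2030 is six compounding years.
print(f"implied CAGR: {implied_cagr(2.6, 8.6, 6):.1%}")      # implied CAGR: 22.1%
print(f"2030 value at 21.9%: {project(2.6, 0.219, 6):.2f}")  # 2030 value at 21.9%: 8.53
```

Either way the numbers agree closely, so the projection is internally consistent.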

Everyday AI tools such as search assistants, email co-pilots and simple image generators will likely remain low-cost or free. But “niche” models, those offering medical advice, legal research or high-end graphics, will come with a higher price tag, and their vendors will be upfront about why: “built on licensed data, with $50K indemnification.” Expect to see provenance badges as well, signalling which outputs are safe to reuse in your own projects.

The June 2025 U.S. decisions did not green-light endless scraping; they simply found that the authors could not yet show market harm, because no market exists. A week later, the European Parliament study proposed creating that market and putting a price on data.

Even without full court or legislative backing, the scale of the AI boom and its dependence on quality data create so much upside for authors that we may see siloed platforms emerge that hold the keys to quality data. These platforms would act as gatekeepers, keeping AI agents and their APIs away from the data unless a subscription is paid. If this happens, high-quality data may gradually disappear from the open web and move into pockets of pre-licensed platforms.
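The most basic gatekeeping signal already exists: a site’s robots.txt file can single out AI crawlers by user agent. GPTBot is OpenAI’s published crawler name; the policy below is a hypothetical example, and the helper `crawler_allowed` is our own. Note that robots.txt is a voluntary convention rather than access control, so a subscription gate of the kind described above would also need authentication. The sketch only shows how a compliant crawler reads such a policy with Python’s standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a content platform that blocks one AI crawler
# site-wide while leaving the site open to every other agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    # Parse the policy text and ask whether this agent may fetch the URL.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/library/book-42"
print(crawler_allowed(ROBOTS_TXT, "GPTBot", url))       # False
print(crawler_allowed(ROBOTS_TXT, "Mozilla/5.0", url))  # True
```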
