IEEE TSE 2026

The New Frontier of AI Licensing:
Detecting License Incompatibilities for AI Models

LISCAN.AI — the first LLM-powered framework for automated detection of license incompatibilities in open-source AI artifacts, validated on 2.2M+ Hugging Face models.

AI Models Analyzed: 2.2M+
Precision: 92.9%
Recall: 89.8%
Derivative Incompatibility: 89.7%
AI Licenses Studied: 82
Developer Confirmation Rate: 51.8%

Overview

A concise summary of our research motivation, methodology, and key findings.

Open-source AI-specific licenses represent a new form of social contract in the digital age, and they introduce multidimensional challenges for license compatibility detection. Unlike software released under traditional SPDX licenses, open-source AI models adopt a hierarchical licensing framework comprising a main model license and component licenses (Weights, Dataset), along with novel clauses covering technical constraints, commercial restrictions, and ethical responsibilities.

This paper introduces LISCAN.AI, the first LLM-powered framework for detecting license incompatibilities in AI artifacts. It combines three elements: (1) License-aware Model Derivation (LMD) Graphs that model cross-model dependencies, (2) License Compatibility Assessment (LCA) rules for AI models, validated by senior attorneys, and (3) LLM-powered semantic analysis of unstructured license text.

Our evaluation of 2,204,432 models reveals widespread issues: 37.2% of main-component license pairs and 89.7% of derivative models show incompatibilities. In real-world validation, we reported 56 issues to Hugging Face developers, and 29 of the 31 responses confirmed the identified incompatibilities.

Framework

Overall Architecture of LISCAN.AI

LISCAN.AI consists of three tightly integrated modules that work together to detect license incompatibilities across the AI model ecosystem.

🔗

① LMD Graph Construction

Constructs a License-aware Model Derivation (LMD) Graph that captures both hierarchical license declarations (Main, Weights, Dataset) and extrinsic derivative relationships (fine-tune, quantize, adapter, merge, distill) between models. Metadata is extracted via the Hugging Face REST API and LLM-based README parsing.

Graph Modeling · LLM Parsing
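
As a rough illustration of the data structure, the sketch below builds a small LMD graph from Hub-style metadata tags. The `ModelNode` and `LMDGraph` classes, the `add_model` helper, the repository names, and the tag layout (`license:`, `dataset:`, `base_model:<relation>:<repo>`) are assumptions made for this example, not LISCAN.AI's actual schema; README parsing by an LLM is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Relation labels assumed to appear in "base_model:<relation>:<repo>" tags.
KNOWN_RELATIONS = {"finetune", "quantized", "adapter", "merge"}


@dataclass
class ModelNode:
    repo_id: str
    main_license: Optional[str] = None     # declared model-level license
    weights_license: Optional[str] = None  # component license, if declared
    dataset_license: Optional[str] = None  # component license, if declared
    datasets: List[str] = field(default_factory=list)


@dataclass
class LMDGraph:
    nodes: Dict[str, ModelNode] = field(default_factory=dict)
    # Each edge is (upstream repo, downstream repo, relation).
    edges: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_model(self, repo_id: str, tags: List[str]) -> ModelNode:
        """Register a model and the derivation edges implied by its tags."""
        node = self.nodes.setdefault(repo_id, ModelNode(repo_id))
        for tag in tags:
            if tag.startswith("license:"):
                node.main_license = tag[len("license:"):]
            elif tag.startswith("dataset:"):
                node.datasets.append(tag[len("dataset:"):])
            elif tag.startswith("base_model:"):
                rest = tag[len("base_model:"):]
                relation, sep, upstream = rest.partition(":")
                if not (sep and relation in KNOWN_RELATIONS):
                    # Plain "base_model:<repo>" tag: relation is not stated.
                    relation, upstream = "unknown", rest
                self.nodes.setdefault(upstream, ModelNode(upstream))
                self.edges.append((upstream, repo_id, relation))
        return node


# Hypothetical repositories, for illustration only.
graph = LMDGraph()
graph.add_model("org/base-model", ["license:llama3"])
graph.add_model(
    "user/tuned-model",
    ["license:apache-2.0", "dataset:org/some-data",
     "base_model:finetune:org/base-model"],
)
print(graph.edges)
```

Storing edges as (upstream, downstream, relation) triples keeps the derivation direction explicit, which matters because the compatibility rules are asymmetric between upstream and downstream licenses.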
⚖️

② License Compatibility Assessment Framework

Collaborates with senior attorneys to construct LCA rules based on analysis of 82 mainstream AI licenses. Identifies 7 obligations (O1–O7) and 5 AI-specific restrictions (R1–R5), classifying licenses into 4 types: Permissive SPDX, Copyleft SPDX, SR-AI, and WR-AI.

Attorney Validated · 82 Licenses
🔍

③ License Compatibility Detection

Identifies all license pairs (intra-model and inter-model) and assesses them via three detection methods: Type 1 (SPDX-SPDX via OSADL matrix), Type 2 (SPDX-AI via LCA rules), and Type 3 (AI-AI via LLM semantic analysis).

3-Type Detection · LLM Semantic
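
The pair-identification step can be pictured with a minimal sketch like the one below. It assumes a simplified input (a dictionary of per-model license declarations plus a list of upstream/downstream edges) and compares only main licenses across models; the `license_pairs` function, the dictionary keys, and the repository names are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical, simplified input: license declarations per model, keyed by
# "main", "weights", and "dataset".
Licenses = Dict[str, Optional[str]]


def license_pairs(
    models: Dict[str, Licenses],
    edges: List[Tuple[str, str]],
) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (scope, model, license_a, license_b) for every declared pair.

    For "intra" pairs, license_a is the main license and license_b a
    component license. For "inter" pairs, license_a is the downstream main
    license and license_b the upstream main license.
    """
    # Intra-model: main license against each declared component license.
    for repo_id, lic in models.items():
        main = lic.get("main")
        for component in ("weights", "dataset"):
            comp = lic.get(component)
            if main and comp:
                yield ("intra", repo_id, main, comp)
    # Inter-model: one pair per derivation edge (upstream, downstream).
    for upstream, downstream in edges:
        down = models.get(downstream, {}).get("main")
        up = models.get(upstream, {}).get("main")
        if down and up:
            yield ("inter", downstream, down, up)


models = {
    "org/base": {"main": "llama3"},
    "user/tuned": {"main": "apache-2.0", "dataset": "cc-by-nc-4.0"},
}
pairs = list(license_pairs(models, [("org/base", "user/tuned")]))
for pair in pairs:
    print(pair)
```

Models with no declared license simply produce no pairs here; a fuller implementation would probably report them separately, since undeclared licenses are themselves a finding.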

LCA Rules

License Compatibility Assessment Rules

The rules were formulated through collaborative analysis of 82 AI licenses with senior attorneys. They govern compatibility between four license types across derivation relationships.

SR-AI: Strongly Restrictive (e.g. OpenRAIL, Gemma)
WR-AI: Weakly Restrictive (e.g. LLaMA 2/3)
Permissive SPDX (e.g. MIT, Apache 2.0)
Copyleft SPDX (e.g. GPL, LGPL)

| Downstream License | Upstream License | Compatibility | Primary Reason |
|---|---|---|---|
| SR-AI License | Permissive SPDX | ✓ Compatible | SR-AI imposes stricter conditions, satisfying all original constraints. |
| SR-AI License | Copyleft SPDX | ✗ Incompatible | Extra restrictions violate copyleft's "same license" mandate. |
| SR-AI License | WR-AI License | ✗ Incompatible | SR-AI fails to fully satisfy WR-AI's commercial/ethical constraints. |
| SR-AI License | SR-AI License | ⚠ Clause-Dependent | Requires granular clause-by-clause analysis. |
| WR-AI License | Permissive SPDX | ✓ Compatible | WR-AI imposes stricter conditions, satisfying all original constraints. |
| WR-AI License | Copyleft SPDX | ✗ Incompatible | Extra restrictions violate copyleft's "same license" mandate. |
| WR-AI License | WR-AI License | ⚠ Clause-Dependent | Requires granular clause-by-clause analysis. |
| WR-AI License | SR-AI License | ✗ Incompatible | WR-AI fails to preserve pass-through restrictions of SR-AI. |
| Permissive SPDX | SR-AI License | ✗ Incompatible | Cannot enforce use restrictions in upstream SR-AI licenses. |
| Permissive SPDX | WR-AI License | ✗ Incompatible | Cannot enforce use restrictions in upstream WR-AI licenses. |
| Copyleft SPDX | SR-AI License | ✗ Incompatible | "No additional restrictions" conflicts with SR-AI pass-through mandate. |
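
Because the rules form a fixed mapping from a (downstream, upstream) pair of license types to a verdict, they can be encoded as a plain lookup table. The sketch below transcribes the rows listed above; the `Kind` and `Verdict` enums and the `lca_verdict` function are names invented for this example, and combinations the table does not list return `None` rather than a guessed verdict.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Kind(Enum):
    PERMISSIVE = "Permissive SPDX"
    COPYLEFT = "Copyleft SPDX"
    SR_AI = "SR-AI"
    WR_AI = "WR-AI"


class Verdict(Enum):
    COMPATIBLE = "compatible"
    INCOMPATIBLE = "incompatible"
    CLAUSE_DEPENDENT = "clause-dependent"


# Keys are (downstream, upstream), transcribed from the rules table.
LCA_RULES: Dict[Tuple[Kind, Kind], Verdict] = {
    (Kind.SR_AI, Kind.PERMISSIVE): Verdict.COMPATIBLE,
    (Kind.SR_AI, Kind.COPYLEFT): Verdict.INCOMPATIBLE,
    (Kind.SR_AI, Kind.WR_AI): Verdict.INCOMPATIBLE,
    (Kind.SR_AI, Kind.SR_AI): Verdict.CLAUSE_DEPENDENT,
    (Kind.WR_AI, Kind.PERMISSIVE): Verdict.COMPATIBLE,
    (Kind.WR_AI, Kind.COPYLEFT): Verdict.INCOMPATIBLE,
    (Kind.WR_AI, Kind.WR_AI): Verdict.CLAUSE_DEPENDENT,
    (Kind.WR_AI, Kind.SR_AI): Verdict.INCOMPATIBLE,
    (Kind.PERMISSIVE, Kind.SR_AI): Verdict.INCOMPATIBLE,
    (Kind.PERMISSIVE, Kind.WR_AI): Verdict.INCOMPATIBLE,
    (Kind.COPYLEFT, Kind.SR_AI): Verdict.INCOMPATIBLE,
}


def lca_verdict(downstream: Kind, upstream: Kind) -> Optional[Verdict]:
    """Return the tabulated verdict, or None when the table has no row."""
    return LCA_RULES.get((downstream, upstream))


# A permissively licensed model (e.g. Apache 2.0) derived from a WR-AI
# licensed base (e.g. LLaMA 3).
print(lca_verdict(Kind.PERMISSIVE, Kind.WR_AI))
```

Clause-dependent results are not final verdicts; under the framework described here they would be handed to the clause-level analysis.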

License Terms

Identified AI License Obligations & Restrictions

Through collaborative analysis with senior attorneys (Cohen's κ = 0.90), we identified 7 obligations and 5 AI-specific restrictions that impact compatibility assessments.

  • O1 Include Copyright (General): Retain copyright notices in all copies or substantial uses of the software/models.
  • O2 Include License (General): Include the full text of the license in modified software/models.
  • O3 Disclose Source/Weights (General): Distribution requires mandatory disclosure of source code, model weights, and training datasets.
  • O4 State Changes (General): Distribution requires a prominent notice stating modifications made to the original version.
  • O5 Reciprocal Obligation (General): Modified versions must be distributed under the original license (copyleft).
  • O6 Trademark Notice (AI-Specific): Original attributions or trademarks must be prominently displayed on Model Card or UI.
  • O7 Pass-through Restriction (AI-Specific): Distributors must incorporate behavioral restriction clauses into downstream licenses.
  • R1 Ethical Constraints (AI-Specific): Prohibits use in high-risk or unethical domains (military, medical misinformation, surveillance).
  • R2 Commercial Scale Limit (AI-Specific): Usage-based thresholds, e.g., LLaMA's 700 million monthly active user ceiling.
  • R3 Commercial Service Restriction (AI-Specific): Direct commercial use as a hosted service (e.g., API) is prohibited without authorization.
  • R4 Anti-Distillation (AI-Specific): Prohibits using model outputs to train competing AI models (e.g., LLaMA 2/3).
  • R5 Output Ownership (AI-Specific): Restrictions on the ownership and usage rights of model-generated content.
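
One way to make these terms machine-checkable is to describe each license as a set of term IDs. The sketch below illustrates this with a deliberately simplified reading of O7: if the upstream license carries a pass-through obligation, every R-term it contains is expected to reappear downstream. That reading, the `missing_pass_through` function, and the two term profiles are assumptions for illustration; real clause analysis is more nuanced than set membership.

```python
from typing import Set

OBLIGATIONS = {f"O{i}" for i in range(1, 8)}    # O1-O7
RESTRICTIONS = {f"R{i}" for i in range(1, 6)}   # R1-R5
ALL_TERMS = OBLIGATIONS | RESTRICTIONS


def missing_pass_through(upstream: Set[str], downstream: Set[str]) -> Set[str]:
    """Upstream restrictions that the downstream license fails to carry over.

    Simplifying assumption: when the upstream license contains O7
    (pass-through), each of its R-terms must also appear downstream.
    """
    unknown = (upstream | downstream) - ALL_TERMS
    if unknown:
        raise ValueError(f"unknown term IDs: {sorted(unknown)}")
    if "O7" not in upstream:
        return set()
    return (upstream & RESTRICTIONS) - downstream


# Hypothetical term profiles, for illustration only.
upstream_terms = {"O1", "O2", "O7", "R1", "R4"}
downstream_terms = {"O1", "O2", "R1"}
print(sorted(missing_pass_through(upstream_terms, downstream_terms)))  # ['R4']
```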

Detection Methods

Three-Type Detection Strategy

LISCAN.AI employs three complementary detection methods tailored to different license pair combinations.


Type 1: SPDX–SPDX

Leverages the OSADL License Compatibility Matrix covering 113 mainstream SPDX licenses. Provides direct, deterministic compatibility determination for standard open-source license pairs.

📊 OSADL Matrix

Type 2: SPDX–AI / AI–SPDX

Applies our attorney-validated LCA rules to determine compatibility between traditional SPDX licenses and AI-specific licenses (SR-AI or WR-AI), covering all cross-paradigm combinations.

⚖️ LCA Rules

Type 3: AI–AI & Additional Clauses

Employs LLM-powered semantic analysis with structured prompting to detect conflicting interpretations across 12 identified terms (O1–O7, R1–R5). Also handles licenses with additional unstructured clauses.

🤖 LLM Semantic Analysis
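
The routing between the three methods can be sketched as a small dispatcher. The version below assumes that a license counts as SPDX when its identifier appears in a known-ID set, and that any pair flagged as carrying additional unstructured clauses goes to the LLM path; the `detection_type` function, its return labels, and the tiny `SPDX_IDS` set are hypothetical.

```python
from typing import Set

# Tiny stand-in for the set of SPDX identifiers covered by the matrix.
SPDX_IDS: Set[str] = {"mit", "apache-2.0", "gpl-3.0", "lgpl-3.0"}


def detection_type(
    license_a: str,
    license_b: str,
    extra_clauses: bool = False,
    spdx_ids: Set[str] = SPDX_IDS,
) -> str:
    """Pick the detection method for one license pair."""
    if extra_clauses:
        # Additional unstructured clauses need semantic analysis.
        return "type3-llm-semantic"
    a_spdx = license_a.lower() in spdx_ids
    b_spdx = license_b.lower() in spdx_ids
    if a_spdx and b_spdx:
        return "type1-osadl-matrix"
    if a_spdx or b_spdx:
        return "type2-lca-rules"
    return "type3-llm-semantic"


print(detection_type("MIT", "Apache-2.0"))     # type1-osadl-matrix
print(detection_type("apache-2.0", "llama3"))  # type2-lca-rules
print(detection_type("gemma", "llama3"))       # type3-llm-semantic
```

Trying the deterministic paths first keeps LLM calls limited to the pairs that cannot be settled by a lookup.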

Evaluation

Experimental Results (RQ1–RQ3)

LISCAN.AI was evaluated on a ground-truth dataset of 188 license combinations containing 88 incompatibility issues and compared against the state-of-the-art baseline, LIDETECTOR.

🏆 Compatibility Detection Performance

LISCAN.AI (DeepSeek-R1) — Precision 85.4%
LISCAN.AI (DeepSeek-R1) — Recall 86.4%
LISCAN.AI (DeepSeek-R1) — F1 85.9%
Baseline LIDETECTOR — Precision 65.7%
Baseline LIDETECTOR — Recall 70.3%
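
For reference, F1 is the harmonic mean of precision and recall. The short check below recomputes the DeepSeek-R1 F1 from the precision and recall listed above; the function name `f1_score` is ours.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# LISCAN.AI (DeepSeek-R1): precision 85.4%, recall 86.4%.
print(f"{f1_score(0.854, 0.864):.1%}")  # 85.9%
```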

📊 Landscape of AI Licensing (RQ2)

Models with SPDX licenses 23.8%
Models with AI licenses 6.2%
Models without license 70.0%
Top AI license: OpenRAIL 27.7%
Derivatives with different licenses 43.8%
Datasets with commercial restrictions 59.9%

⚠️ State of Incompatibility (RQ3)

Main-Component incompatibility rate 88.5%
Direct derivative incompatibility 43.6%
Indirect dependency risks 56.4%
Top pattern: Apache 2.0 → LLaMA3 14,873 cases
Models analyzed in total 2,204,432

🔬 LLM Comparison (Phase 2: Compat. Detection)

DeepSeek-v3.2 — F1 88.8%
GPT-5.1 — F1 89.0%
Gemini-3-pro — F1 91.3%
LIDETECTOR (baseline) — F1 67.3%

Real-World Validation

Practical Utility on Hugging Face (RQ4)

We reported 56 detected issues to Hugging Face developers and tracked their responses to validate LISCAN.AI's practical effectiveness.

Issue Reports Submitted: 56
Developer Responses Received: 31
Incompatibilities Confirmed: 29
Confirmation Rate: 51.8%

Developer Actions Taken

  • 14 cases — Downstream license compliance updates (e.g., adding required Gemma License documentation)
  • 11 cases — License type alignment with base models
  • 4 cases — Missing license remediations added
  • 2 cases — Developers justified keeping licenses (plugins/standalone datasets, not derivatives)
  • 25 cases — Pending (developer inactivity on Hugging Face)
💡 Key Finding: Reports revealed critical knowledge gaps — e.g., developers were unaware of LLaMA 3.3's naming requirements, demonstrating LISCAN.AI's value in raising awareness of licensing obligations.
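
The figures above are internally consistent, and the 51.8% confirmation rate is taken over all 56 reports rather than over the 31 responses. The arithmetic can be checked with a few lines; the variable names and the shortened action labels are ours.

```python
reports = 56
responses = 31
confirmed = 29

# Breakdown of developer actions, as listed above.
actions = {
    "compliance updates": 14,
    "license type alignment": 11,
    "missing license added": 4,
}
justified = 2   # developers kept their license, with a justification
pending = 25    # no response yet

# The three confirmed categories add up to the confirmed total, and every
# report is either answered or pending.
assert sum(actions.values()) == confirmed
assert confirmed + justified == responses
assert responses + pending == reports

rate_over_reports = confirmed / reports
rate_over_responses = confirmed / responses
print(f"{rate_over_reports:.1%}")    # 51.8%
print(f"{rate_over_responses:.1%}")  # 93.5%
```

Measured against responses only, 29 of 31 developers (about 93.5%) confirmed the reported incompatibility.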

Contributions

Key Contributions

This paper makes three primary contributions to the field of AI license compliance.

🔬 Originality & Technique

LISCAN.AI is the first LLM-powered framework for automated detection of license incompatibilities for AI artifacts. It combines LMD Graphs for derivation tracking, legally validated LCA rules, and semantic analysis to detect incompatibilities (92.9% Precision, 89.8% Recall).

📦 Reproduction Package

Resources are publicly released at liscanai.github.io: (1) the LISCAN.AI tool, available at liscan-ai.com, and (2) the ground-truth dataset of 188 license combinations with 88 incompatibility issues.

📊 Large-Scale Study

Large-scale validation of 2,204,432 Hugging Face models reveals pervasive incompatibilities: 37.2% of main-component license pairs and 89.7% of derivative models show incompatibilities, exposing systemic risks in the AI ecosystem.