LISCAN.AI – Detecting License Incompatibilities for AI Models

Abstract

Overview

A concise summary of our research motivation, methodology, and key findings.

Open-source AI-specific licenses represent a new form of social contract in the digital age, and they make license compatibility detection a complex, multidimensional problem. Unlike software released under traditional SPDX licenses, open-source AI models adopt a hierarchical licensing framework that comprises a main model license and component licenses (Weights, Dataset), along with novel clauses covering technical constraints, commercial restrictions, and ethical responsibilities.

This paper introduces LISCAN.AI, the first LLM-powered framework for detecting license incompatibilities in AI artifacts. It combines: (1) License-aware Model Derivation (LMD) Graphs that model cross-model dependencies, (2) License Compatibility Assessment (LCA) rules for AI models, validated by senior attorneys, and (3) LLM-powered semantic analysis of unstructured license text.

Our evaluation of 2,204,432 models reveals widespread issues: 37.2% of main-component license pairs and 89.7% of derivative models show incompatibilities. In real-world validation, we reported 56 issues to Hugging Face developers; 29 of the 31 responses received confirmed the identified incompatibilities.

Framework

Overall Architecture of LISCAN.AI

LISCAN.AI consists of three tightly integrated modules that work together to detect license incompatibilities across the AI model ecosystem.

🔗 ① LMD Graph Construction

Constructs a License-aware Model Derivation (LMD) Graph to simultaneously capture hierarchical license declarations (Main, Weights, Dataset) and extrinsic derivative relationships (fine-tune, quantize, adapter, merge, distill) between models. Metadata is extracted via Hugging Face REST API and LLM-based README parsing.

Graph Modeling, LLM Parsing
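As a rough illustration of the data structure described above, an LMD Graph can be pictured as model nodes that carry Main, Weights, and Dataset license declarations, linked by typed derivation edges. The sketch below is a minimal in-memory version, not the LISCAN.AI implementation: the class names, field names, and model IDs are hypothetical, and metadata extraction (Hugging Face REST API, README parsing) is left out.

```python
from dataclasses import dataclass, field
from typing import Optional

# Derivation relationships named on this page.
RELATIONS = {"fine-tune", "quantize", "adapter", "merge", "distill"}


@dataclass
class ModelNode:
    """One model with its hierarchical license declarations (hypothetical schema)."""
    model_id: str
    main: Optional[str] = None     # main (model-level) license
    weights: Optional[str] = None  # component license: weights
    dataset: Optional[str] = None  # component license: dataset


@dataclass
class LMDGraph:
    """Directed graph: edges point from an upstream model to a derived model."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (upstream_id, downstream_id, relation)

    def add_model(self, node: ModelNode) -> None:
        self.nodes[node.model_id] = node

    def add_derivation(self, upstream: str, downstream: str, relation: str) -> None:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if upstream not in self.nodes or downstream not in self.nodes:
            raise KeyError("both models must be added before linking them")
        self.edges.append((upstream, downstream, relation))

    def upstream_of(self, model_id: str) -> list:
        """Direct upstream models of model_id, with the relation used."""
        return [(u, r) for (u, d, r) in self.edges if d == model_id]


# Hypothetical example models, for illustration only.
graph = LMDGraph()
graph.add_model(ModelNode("org/base-model", main="llama3", weights="llama3"))
graph.add_model(ModelNode("user/base-model-ft", main="apache-2.0", dataset="cc-by-nc-4.0"))
graph.add_derivation("org/base-model", "user/base-model-ft", "fine-tune")
print(graph.upstream_of("user/base-model-ft"))
```

Keeping the relation on the edge rather than on the node lets one model appear as the result of several derivations, as a merge would require.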

⚖️ ② License Compatibility Assessment Framework

Collaborates with senior attorneys to construct LCA rules based on analysis of 82 mainstream AI licenses. Identifies 7 obligations (O1–O7) and 5 AI-specific restrictions (R1–R5), classifying licenses into 4 types: Permissive SPDX, Copyleft SPDX, SR-AI, and WR-AI.

Attorney Validated, 82 Licenses

🔍 ③ License Compatibility Detection

Identifies all license pairs (intra-model and inter-model) and assesses them via three detection methods: Type 1 (SPDX-SPDX via OSADL matrix), Type 2 (SPDX-AI via LCA rules), and Type 3 (AI-AI via LLM semantic analysis).

3-Type Detection, LLM Semantic
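The routing described above can be sketched as two small steps: enumerate the license pairs for a model, then pick a detection type from the kinds of licenses in each pair. This is a minimal sketch under assumptions: the function names and the dictionary layout are hypothetical, intra-model pairs are taken to be the main license against each component license, and inter-model pairs are taken to be the downstream main license against each upstream main license.

```python
SPDX_TYPES = {"Permissive SPDX", "Copyleft SPDX"}
AI_TYPES = {"SR-AI", "WR-AI"}


def detection_type(type_a: str, type_b: str) -> str:
    """Pick the detection method from how many of the two licenses are AI licenses."""
    ai_count = 0
    for license_type in (type_a, type_b):
        if license_type in AI_TYPES:
            ai_count += 1
        elif license_type not in SPDX_TYPES:
            raise ValueError(f"unknown license type: {license_type}")
    return {
        0: "Type 1: OSADL matrix",           # SPDX-SPDX
        1: "Type 2: LCA rules",              # SPDX-AI
        2: "Type 3: LLM semantic analysis",  # AI-AI
    }[ai_count]


def license_pairs(model: dict, upstream_models: list) -> list:
    """Enumerate (scope, license_a, license_b) pairs for one model (assumed pairing scheme)."""
    pairs = []
    main = model.get("main")
    if main is None:
        return pairs
    for component in ("weights", "dataset"):
        if model.get(component):
            pairs.append(("intra-model", main, model[component]))
    for upstream in upstream_models:
        if upstream.get("main"):
            pairs.append(("inter-model", main, upstream["main"]))
    return pairs


# Hypothetical model, described by license types rather than license names.
downstream = {"main": "Permissive SPDX", "dataset": "Copyleft SPDX"}
upstream = [{"main": "WR-AI"}]
routed = [
    (scope, a, b, detection_type(a, b))
    for scope, a, b in license_pairs(downstream, upstream)
]
for row in routed:
    print(row)
```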

LCA Rules

License Compatibility Assessment Rules

Formulated through collaborative analysis of 82 AI licenses with senior attorneys. These rules govern compatibility between four license types across derivation relationships.

SR-AI: Strongly Restrictive (e.g. OpenRAIL, Gemma)
WR-AI: Weakly Restrictive (e.g. LLaMA 2/3)
Permissive SPDX (e.g. MIT, Apache 2.0)
Copyleft SPDX (e.g. GPL, LGPL)

Downstream License	Upstream License	Compatibility	Primary Reason
SR-AI License	Permissive SPDX	✓ Compatible	SR-AI imposes stricter conditions, satisfying all original constraints.
SR-AI License	Copyleft SPDX	✗ Incompatible	Extra restrictions violate copyleft's "same license" mandate.
SR-AI License	WR-AI License	✗ Incompatible	SR-AI fails to fully satisfy WR-AI's commercial/ethical constraints.
SR-AI License	SR-AI License	⚠ Clause-Dependent	Requires granular clause-by-clause analysis.
WR-AI License	Permissive SPDX	✓ Compatible	WR-AI imposes stricter conditions, satisfying all original constraints.
WR-AI License	Copyleft SPDX	✗ Incompatible	Extra restrictions violate copyleft's "same license" mandate.
WR-AI License	WR-AI License	⚠ Clause-Dependent	Requires granular clause-by-clause analysis.
WR-AI License	SR-AI License	✗ Incompatible	WR-AI fails to preserve pass-through restrictions of SR-AI.
Permissive SPDX	SR-AI License	✗ Incompatible	Cannot enforce use restrictions in upstream SR-AI licenses.
Permissive SPDX	WR-AI License	✗ Incompatible	Cannot enforce use restrictions in upstream WR-AI licenses.
Copyleft SPDX	SR-AI License	✗ Incompatible	"No additional restrictions" conflicts with SR-AI pass-through mandate.
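For readers who want to use the table programmatically, its rows can be transcribed into a lookup keyed by (downstream, upstream) license type. The sketch below encodes only the eleven rows listed above; the names `LCA_RULES` and `lca_verdict` are hypothetical, and any pair the table does not list returns `None` rather than a guessed verdict.

```python
from typing import Optional

COMPATIBLE = "compatible"
INCOMPATIBLE = "incompatible"
CLAUSE_DEPENDENT = "clause-dependent"

# (downstream type, upstream type) -> verdict, transcribed from the table above.
LCA_RULES = {
    ("SR-AI", "Permissive SPDX"): COMPATIBLE,
    ("SR-AI", "Copyleft SPDX"): INCOMPATIBLE,
    ("SR-AI", "WR-AI"): INCOMPATIBLE,
    ("SR-AI", "SR-AI"): CLAUSE_DEPENDENT,
    ("WR-AI", "Permissive SPDX"): COMPATIBLE,
    ("WR-AI", "Copyleft SPDX"): INCOMPATIBLE,
    ("WR-AI", "WR-AI"): CLAUSE_DEPENDENT,
    ("WR-AI", "SR-AI"): INCOMPATIBLE,
    ("Permissive SPDX", "SR-AI"): INCOMPATIBLE,
    ("Permissive SPDX", "WR-AI"): INCOMPATIBLE,
    ("Copyleft SPDX", "SR-AI"): INCOMPATIBLE,
}


def lca_verdict(downstream: str, upstream: str) -> Optional[str]:
    """Return the table's verdict, or None when the pair is not listed."""
    return LCA_RULES.get((downstream, upstream))


print(lca_verdict("WR-AI", "Permissive SPDX"))   # listed: compatible
print(lca_verdict("Permissive SPDX", "WR-AI"))   # listed: incompatible
print(lca_verdict("Copyleft SPDX", "WR-AI"))     # not listed in the table: None
```

Note that the lookup is directional: swapping downstream and upstream can change the verdict, as the first two calls show.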

License Terms

Employs LLM-powered semantic analysis with structured prompting to detect conflicting interpretations across 12 identified terms (O1–O7, R1–R5). Also handles licenses with additional unstructured clauses.

🤖 LLM Semantic Analysis
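One way to picture structured prompting over the 12 terms is a prompt that asks for a fixed JSON shape, paired with a parser that rejects incomplete answers. The sketch below is an assumption-laden illustration, not the prompts used by LISCAN.AI: the prompt wording, the `"conflict"` / `"no_conflict"` labels, and both function names are hypothetical, the terms are referred to by ID only because this page does not spell out their definitions, and the LLM call itself is replaced by a hand-written response.

```python
import json

# The 12 term IDs named on this page: O1-O7 and R1-R5.
TERM_IDS = [f"O{i}" for i in range(1, 8)] + [f"R{i}" for i in range(1, 6)]


def build_prompt(downstream_text: str, upstream_text: str) -> str:
    """Assemble a prompt that requests one verdict per term as JSON (hypothetical wording)."""
    keys = ", ".join(TERM_IDS)
    return (
        "Compare the two licenses below term by term.\n"
        f"Return a JSON object with exactly these keys: {keys}.\n"
        'Each value must be "conflict" or "no_conflict".\n\n'
        f"DOWNSTREAM LICENSE:\n{downstream_text}\n\n"
        f"UPSTREAM LICENSE:\n{upstream_text}\n"
    )


def parse_verdicts(raw_response: str) -> list:
    """Return the term IDs flagged as conflicting; fail loudly on incomplete output."""
    data = json.loads(raw_response)
    missing = [term for term in TERM_IDS if term not in data]
    if missing:
        raise ValueError(f"response is missing terms: {missing}")
    return [term for term in TERM_IDS if data[term] == "conflict"]


prompt = build_prompt("<downstream license text>", "<upstream license text>")

# Stand-in for a model response; a real run would send `prompt` to an LLM.
fake_response = json.dumps(
    {term: ("conflict" if term == "R2" else "no_conflict") for term in TERM_IDS}
)
conflicts = parse_verdicts(fake_response)
print(conflicts)
```

Validating that every term is present before reading verdicts keeps a truncated or malformed model answer from being read as "no conflicts found".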

Evaluation

Experimental Results (RQ1–RQ3)

LISCAN.AI was evaluated on a ground truth dataset of 188 license combinations containing 88 incompatibility issues, and compared against the state-of-the-art baseline LIDETECTOR.

🏆 Compatibility Detection Performance

LISCAN.AI (DeepSeek-R1) — Precision 85.4%

LISCAN.AI (DeepSeek-R1) — Recall 86.4%

LISCAN.AI (DeepSeek-R1) — F1 85.9%

Baseline LIDETECTOR — Precision 65.7%

Baseline LIDETECTOR — Recall 70.3%
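The F1 score is the harmonic mean of precision and recall, so the F1 figure reported above for LISCAN.AI (DeepSeek-R1) can be reproduced from the other two. The short check below uses only the numbers on this page; the function name `f1_score` is ours.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# LISCAN.AI (DeepSeek-R1) precision and recall as reported above.
liscan_f1 = f1_score(0.854, 0.864)
print(f"{liscan_f1:.1%}")  # 85.9%
```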

📊 Landscape of AI Licensing (RQ2)

Models with SPDX licenses 23.8%

Models with AI licenses 6.2%

Models without license 70.0%

Top AI license: OpenRAIL 27.7%

Derivatives with different licenses 43.8%

Datasets with commercial restrictions 59.9%

⚠️ State of Incompatibility (RQ3)

Main-Component incompatibility rate 88.5%

Direct derivative incompatibility 43.6%

Indirect dependency risks 56.4%

Top pattern: Apache 2.0 → LLaMA3 14,873 cases

Models analyzed in total 2,204,432

🔬 LLM Comparison (Phase 2: Compat. Detection)

DeepSeek-v3.2 — F1 88.8%

GPT-5.1 — F1 89.0%

Gemini-3-pro — F1 91.3%

LIDETECTOR (baseline) — F1 67.3%

Real-World Validation

Practical Utility on Hugging Face (RQ4)

We reported 56 detected issues to Hugging Face developers and tracked their responses to validate LISCAN.AI's practical effectiveness.

Issue Reports Submitted 56

Developer Responses Received 31

Incompatibilities Confirmed 29

Confirmation Rate 51.8%

Developer Actions Taken

14 cases — Downstream license compliance updates (e.g., adding required Gemma License documentation)
11 cases — License type alignment with base models
4 cases — Missing license remediations added
2 cases — Developers justified keeping licenses (plugins/standalone datasets, not derivatives)
25 cases — Pending (developer inactivity on Hugging Face)
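The case counts above can be cross-checked against the headline figures with a few lines of arithmetic. The sketch assumes that the three remediation categories are the confirmed cases and that the two justified cases count as responses without confirmation; the dictionary keys are shortened labels of ours.

```python
# Case counts from the list above (labels shortened).
actions = {
    "compliance updates": 14,
    "license alignment": 11,
    "missing license added": 4,
    "license kept (justified)": 2,
    "pending": 25,
}

reported = sum(actions.values())                              # all submitted reports
responses = reported - actions["pending"]                     # reports that got a reply
confirmed = responses - actions["license kept (justified)"]   # replies that led to a fix

print(reported, responses, confirmed)                         # 56 31 29
print(f"confirmed / reported:  {confirmed / reported:.1%}")   # 51.8%
print(f"confirmed / responses: {confirmed / responses:.1%}")  # 93.5%
```

The 51.8% confirmation rate is therefore measured against all 56 reports, including the 25 that are still pending.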

💡 Key Finding: Reports revealed critical knowledge gaps — e.g., developers were unaware of LLaMA 3.3's naming requirements, demonstrating LISCAN.AI's value in raising awareness of licensing obligations.

Contributions

Key Contributions

This paper makes three primary contributions to the field of AI license compliance.

🔬 Originality & Technique

LISCAN.AI is the first LLM-powered framework for automated detection of license incompatibilities for AI artifacts. It combines LMD Graphs for derivation tracking, legally validated LCA rules, and semantic analysis to detect incompatibilities (92.9% Precision, 89.8% Recall).

📦 Reproduction Package

Resources publicly released at liscanai.github.io: (1) the LISCAN.AI tool, available at liscan-ai.com; (2) the ground truth dataset of 188 license combinations with 88 incompatibility issues.

📊 Large-Scale Study

Large-scale validation of 2,204,432 Hugging Face models reveals pervasive incompatibilities: 37.2% of main-component license pairs and 89.7% of derivative models show incompatibilities, exposing systemic risks in the AI ecosystem.