LISCAN.AI — the first LLM-powered framework for automated detection of license incompatibilities in open-source AI artifacts, validated on 2.2M+ Hugging Face models.
A concise summary of our research motivation, methodology, and key findings.
LISCAN.AI consists of three tightly integrated modules that work together to detect license incompatibilities across the AI model ecosystem.
The first module constructs a License-aware Model Derivation (LMD) Graph that captures both the hierarchical license declarations of each model (Main, Weights, Dataset) and the extrinsic derivative relationships between models (fine-tune, quantize, adapter, merge, distill). Metadata is extracted via the Hugging Face REST API and LLM-based README parsing.
The second module encodes LCA rules constructed with senior attorneys from an analysis of 82 mainstream AI licenses. It identifies 7 obligations (O1–O7) and 5 AI-specific restrictions (R1–R5), and classifies licenses into 4 types: Permissive SPDX, Copyleft SPDX, SR-AI, and WR-AI.
The third module identifies all license pairs (intra-model and inter-model) and assesses each with one of three detection methods: Type 1 (SPDX-SPDX via the OSADL matrix), Type 2 (SPDX-AI via LCA rules), and Type 3 (AI-AI via LLM semantic analysis).
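As a rough illustration of the first and third modules, the sketch below models an LMD Graph as a plain in-memory structure and enumerates intra-model and inter-model license pairs. The class name `LMDGraph`, its methods, the model ids, and the choice to compare only Main licenses across a derivation edge are all assumptions made for this example; the actual LISCAN.AI data model is not described here.

```python
from dataclasses import dataclass, field
from itertools import combinations

# Component levels and derivation relations named in the text.
COMPONENTS = ("main", "weights", "dataset")
DERIVATIONS = ("fine-tune", "quantize", "adapter", "merge", "distill")


@dataclass
class LMDGraph:
    """Hypothetical minimal License-aware Model Derivation graph."""
    # model id -> {component: declared license name}
    licenses: dict = field(default_factory=dict)
    # (upstream id, downstream id, relation)
    edges: list = field(default_factory=list)

    def add_model(self, model_id, **declared):
        unknown = set(declared) - set(COMPONENTS)
        if unknown:
            raise ValueError(f"unknown components: {sorted(unknown)}")
        self.licenses[model_id] = dict(declared)

    def add_derivation(self, upstream, downstream, relation):
        if relation not in DERIVATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.edges.append((upstream, downstream, relation))

    def intra_model_pairs(self, model_id):
        """Pairs of differing licenses declared within one model."""
        declared = sorted(self.licenses[model_id].items())
        return [(a, b) for a, b in combinations(declared, 2) if a[1] != b[1]]

    def inter_model_pairs(self):
        """(downstream license, upstream license, relation) per edge.

        Assumption: only the Main licenses are compared across an edge.
        """
        pairs = []
        for upstream, downstream, relation in self.edges:
            up = self.licenses.get(upstream, {}).get("main")
            down = self.licenses.get(downstream, {}).get("main")
            if up and down:
                pairs.append((down, up, relation))
        return pairs


# Made-up models and license names, for illustration only.
graph = LMDGraph()
graph.add_model("example/base", main="llama2", dataset="cc-by-nc-4.0")
graph.add_model("example/base-ft", main="apache-2.0")
graph.add_derivation("example/base", "example/base-ft", "fine-tune")

print(graph.intra_model_pairs("example/base"))
print(graph.inter_model_pairs())
```

Each pair produced this way would then be handed to one of the three detection methods, depending on the license types involved.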
The LCA rules were formulated through collaborative analysis of 82 AI licenses with senior attorneys. They govern compatibility between the four license types across derivation relationships.
| Downstream License | Upstream License | Compatibility | Primary Reason |
|---|---|---|---|
| SR-AI License | Permissive SPDX | ✓ Compatible | SR-AI imposes stricter conditions, satisfying all original constraints. |
| SR-AI License | Copyleft SPDX | ✗ Incompatible | Extra restrictions violate copyleft's "same license" mandate. |
| SR-AI License | WR-AI License | ✗ Incompatible | SR-AI fails to fully satisfy WR-AI's commercial/ethical constraints. |
| SR-AI License | SR-AI License | ⚠ Clause-Dependent | Requires granular clause-by-clause analysis. |
| WR-AI License | Permissive SPDX | ✓ Compatible | WR-AI imposes stricter conditions, satisfying all original constraints. |
| WR-AI License | Copyleft SPDX | ✗ Incompatible | Extra restrictions violate copyleft's "same license" mandate. |
| WR-AI License | WR-AI License | ⚠ Clause-Dependent | Requires granular clause-by-clause analysis. |
| WR-AI License | SR-AI License | ✗ Incompatible | WR-AI fails to preserve pass-through restrictions of SR-AI. |
| Permissive SPDX | SR-AI License | ✗ Incompatible | Cannot enforce use restrictions in upstream SR-AI licenses. |
| Permissive SPDX | WR-AI License | ✗ Incompatible | Cannot enforce use restrictions in upstream WR-AI licenses. |
| Copyleft SPDX | SR-AI License | ✗ Incompatible | "No additional restrictions" conflicts with SR-AI pass-through mandate. |
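The rule table lends itself to a simple lookup keyed on (downstream type, upstream type). The sketch below is one possible encoding, not the tool's implementation: the names `Verdict`, `LCA_RULES`, and `lca_verdict` are hypothetical, and combinations the table does not list return `None` rather than a guessed verdict.

```python
from enum import Enum


class Verdict(Enum):
    COMPATIBLE = "compatible"
    INCOMPATIBLE = "incompatible"
    CLAUSE_DEPENDENT = "clause-dependent"


# (downstream type, upstream type) -> verdict, transcribed from the table.
LCA_RULES = {
    ("SR-AI", "Permissive SPDX"): Verdict.COMPATIBLE,
    ("SR-AI", "Copyleft SPDX"): Verdict.INCOMPATIBLE,
    ("SR-AI", "WR-AI"): Verdict.INCOMPATIBLE,
    ("SR-AI", "SR-AI"): Verdict.CLAUSE_DEPENDENT,
    ("WR-AI", "Permissive SPDX"): Verdict.COMPATIBLE,
    ("WR-AI", "Copyleft SPDX"): Verdict.INCOMPATIBLE,
    ("WR-AI", "WR-AI"): Verdict.CLAUSE_DEPENDENT,
    ("WR-AI", "SR-AI"): Verdict.INCOMPATIBLE,
    ("Permissive SPDX", "SR-AI"): Verdict.INCOMPATIBLE,
    ("Permissive SPDX", "WR-AI"): Verdict.INCOMPATIBLE,
    ("Copyleft SPDX", "SR-AI"): Verdict.INCOMPATIBLE,
}


def lca_verdict(downstream_type, upstream_type):
    """Return the table's verdict, or None if the pair is not listed."""
    return LCA_RULES.get((downstream_type, upstream_type))


print(lca_verdict("SR-AI", "Permissive SPDX"))
print(lca_verdict("Permissive SPDX", "SR-AI"))
print(lca_verdict("Copyleft SPDX", "WR-AI"))  # not listed in the table
```

Note that the lookup is direction-sensitive: an SR-AI model derived from a permissively licensed one is compatible, while the reverse is not.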
Through collaborative analysis with senior attorneys (Cohen's κ = 0.90), we identified 7 obligations and 5 AI-specific restrictions that impact compatibility assessments.
[General] Retain copyright notices in all copies or substantial uses of the software/models.
[General] Include the full text of the license in modified software/models.
[General] Distribution requires mandatory disclosure of source code, model weights, and training datasets.
[General] Distribution requires a prominent notice stating modifications made to the original version.
[General] Modified versions must be distributed under the original license (copyleft).
[AI-Specific] Original attributions or trademarks must be prominently displayed on Model Card or UI.
[AI-Specific] Distributors must incorporate behavioral restriction clauses into downstream licenses.
[AI-Specific] Prohibits use in high-risk or unethical domains (military, medical misinformation, surveillance).
[AI-Specific] Usage-based thresholds, e.g., LLaMA's 700 million monthly active user ceiling.
[AI-Specific] Direct commercial use as a hosted service (e.g., API) is prohibited without authorization.
[AI-Specific] Prohibits using model outputs to train competing AI models (e.g., LLaMA 2/3).
[AI-Specific] Restrictions on the ownership and usage rights of model-generated content.

LISCAN.AI employs three complementary detection methods tailored to different license pair combinations.
Type 1 (SPDX-SPDX): Leverages the OSADL License Compatibility Matrix, which covers 113 mainstream SPDX licenses, to give a direct, deterministic compatibility determination for standard open-source license pairs.
Type 2 (SPDX-AI): Applies our attorney-validated LCA rules to determine compatibility between traditional SPDX licenses and AI-specific licenses (SR-AI or WR-AI), covering all cross-paradigm combinations.
Type 3 (AI-AI): Employs LLM-powered semantic analysis with structured prompting to detect conflicting interpretations across the 12 identified terms (O1–O7, R1–R5). It also handles licenses with additional unstructured clauses.
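Choosing among the three methods comes down to how many licenses in a pair are AI-specific. The following is a minimal routing sketch under that reading; `route_pair`, `METHODS`, and the method descriptions are illustrative names, and the actual OSADL lookup, rule evaluation, and LLM call are deliberately left out.

```python
SPDX_TYPES = {"Permissive SPDX", "Copyleft SPDX"}
AI_TYPES = {"SR-AI", "WR-AI"}

# Number of AI-specific licenses in the pair -> (detection type, method).
METHODS = {
    0: (1, "OSADL matrix lookup"),
    1: (2, "LCA rule lookup"),
    2: (3, "LLM semantic analysis"),
}


def route_pair(type_a, type_b):
    """Pick the detection type for a pair of license types."""
    for license_type in (type_a, type_b):
        if license_type not in SPDX_TYPES | AI_TYPES:
            raise ValueError(f"unknown license type: {license_type}")
    ai_count = sum(t in AI_TYPES for t in (type_a, type_b))
    return METHODS[ai_count]


print(route_pair("Permissive SPDX", "Copyleft SPDX"))
print(route_pair("Copyleft SPDX", "SR-AI"))
print(route_pair("SR-AI", "WR-AI"))
```

Keeping the routing separate from the detectors means the deterministic paths (Types 1 and 2) never need to invoke the LLM.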
LISCAN.AI was evaluated on a ground truth dataset of 188 license combinations containing 88 incompatibility issues and compared against the state-of-the-art baseline LIDETECTOR.
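To make the evaluation setup concrete, precision and recall over the 88 ground-truth issues can be computed as below. The confusion counts are hypothetical: they are not reported in this summary and were chosen only because they are arithmetically consistent with the 92.9% precision and 89.8% recall stated for LISCAN.AI.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical counts, not taken from the paper.
TP, FP, FN = 79, 6, 9
assert TP + FN == 88  # every ground-truth issue is either found or missed

precision, recall = precision_recall(TP, FP, FN)
print(f"precision={precision:.1%} recall={recall:.1%}")
# prints: precision=92.9% recall=89.8%
```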
We reported 56 detected issues to Hugging Face developers and tracked their responses to validate LISCAN.AI's practical effectiveness.
This paper makes three primary contributions to the field of AI license compliance.
LISCAN.AI is the first LLM-powered framework for automated detection of license incompatibilities for AI artifacts. It combines LMD Graphs for derivation tracking, legally validated LCA rules, and semantic analysis to detect incompatibilities (92.9% Precision, 89.8% Recall).
Resources are publicly released at liscanai.github.io: (1) the LISCAN.AI tool, available at liscan-ai.com, and (2) the ground truth dataset of 188 license combinations with 88 incompatibility issues.
Large-scale validation of 2,204,432 Hugging Face models reveals pervasive incompatibilities: 37.2% of main-component license pairs and 89.7% of derivative models show incompatibilities, exposing systemic risks in the AI ecosystem.