Market Background
Data collection and preparation is a massive, expensive undertaking. Companies must invest significant manpower in manually annotating and cleansing data to ensure its quality, and costs climb further for datasets that require particularly fine-grained annotation. Direct labour costs, time costs, and potential quality issues together make data acquisition a major pain point in AI projects. As models grow more complex and demand for high-quality data increases, the problem becomes more pronounced and calls for a solution that reduces overall training costs.
Web3 solutions centred on decentralized networks are emerging as a new narrative in AI data and training. By combining decentralization with a crypto-based incentive system, Web3 has the potential to significantly reduce data acquisition, training, and trust costs while improving efficiency and inclusiveness.
However, Web3 data infrastructure currently lacks a practical and efficient data governance framework.
Such a framework is crucial for defining the rights and interests of every participant in the data ecosystem:
- Data providers must have the rights to informed consent and the freedom to access, copy, transfer, and dispose of data.
- Data processors require the power to autonomously control, use, and derive profits from the data.
- Data derivatives need to have operational rights.
The rise of ChatGPT and GPT-4 has highlighted the potential of artificial intelligence (AI). However, the massive data and training behind AI algorithms bring about significant challenges, particularly in terms of cost and compliance.
High Costs of Data Collection and Preparation
Manual Data Annotation and Cleaning:
- Companies invest significant manpower in manually annotating and cleaning data to ensure quality.
- Fine-grained annotations are particularly costly, involving direct labor, time, and potential quality issues.
Pain Points in Data Acquisition:
- Data acquisition is a major challenge in AI projects.
- As models become more complex and demand higher-quality data, the costs and difficulties increase.
Privacy Breaches:
- Last year, Microsoft's AI research team accidentally exposed a large amount of data, including sensitive user information.
- Although the incident didn't escalate, it prompted many tech companies to reevaluate AI data security.
Compliance Challenges:
- The Italian Data Protection Authority accused OpenAI of unlawfully collecting user data through ChatGPT, in violation of the GDPR.
- OpenAI acknowledged that data belonging to approximately 1.2% of ChatGPT Plus users may have been exposed during a temporary service interruption.
Web3 solutions, based on decentralized networks and crypto-based incentives, offer promising approaches to reduce data acquisition and training costs while enhancing efficiency and trust.
Lack of Efficient Data Governance Framework:
A practical and efficient data governance infrastructure is crucial for configuring the rights and interests of all participants in the data ecosystem.
Rights and Interests of Participants:
- Data Providers: Rights to informed consent, and the freedom to access, copy, transfer, and dispose of data.
- Data Processors: Rights to autonomously control, use, and derive profits from the data.
- Data Derivatives: Operational rights.
Cassava is building the first Web3 ecosystem for AI data governance, aiming to fundamentally address the pain points faced in Web2 data training and reshape the field of Web3 data governance.