- AI BEST SEARCH
- AI Glossary & Keyword Index [AI BEST SEARCH]
- Self-Supervised Learning
Self-Supervised Learning
Self-supervised learning is a machine learning approach in which the model automatically generates pseudo-labels from unlabeled data and uses them to train itself. Compared to traditional supervised learning, it can build high-performance models without requiring large amounts of manually labeled data—a key advantage. Generally, machine learning requires human-created labels (ground-truth data). In self-supervised learning, the model automatically generates training signals from the structure and features of the data itself. This enables "reducing the cost of label creation" and "making use of large amounts of data" simultaneously—a next-generation learning paradigm that has attracted significant attention. Examples of how it works: • Hiding part of the input data and asking the model to predict the missing portion (e.g., filling in a missing region of an image) • Shuffling the order of a time series and asking the model to restore the original order • Masking parts of text and inferring them from context (e.g., masked language modeling used in BERT) Representative application areas of self-supervised learning: • Natural language processing (NLP): Large language models such as BERT and GPT are pre-trained using self-supervision • Image recognition: Techniques such as SimCLR, MoCo, and BYOL enable high-accuracy feature extraction • Audio processing: Systems such as wav2vec 2.0 learn features automatically from speech data Advantages: • Can leverage unlabeled data, dramatically reducing data collection costs • Effective as representation learning, making it easy to transfer to downstream tasks • Enables high-performance pre-training that deeply captures data structure and context Self-supervised learning sits between supervised and unsupervised learning. It has become an indispensable training technology in recent AI models—especially foundation models and LLMs. Going forward, expanded applications to multimodal learning across vision, language, and audio are expected.