Introduction to ML Models

Learn about the basics of Machine Learning models and why TileDB is the right fit for ML workflows.

Machine Learning (ML) models are algorithms or mathematical representations that enable computers to learn from and make predictions or decisions based on data. These models identify patterns or structures in the data and generalize from them to perform tasks such as classification, regression, clustering, and more.

Types of ML models

You can broadly categorize ML models based on the type of learning they use:

Supervised learning models
Unsupervised learning models
Semi-supervised learning models
Reinforcement learning models

Supervised learning models

Supervised learning models train on labeled data, meaning that each training example is paired with an output label.

Examples

Linear regression: Predicts continuous values (for example, predicting house prices).
Logistic regression: Predicts probabilities and classifies data into binary categories (for example, spam detection).
Decision trees: Split data into subsets based on feature values (for example, customer segmentation).
Support vector machines (SVM): Find the hyperplane that best separates different classes (for example, image classification).
Neural networks: Multi-layered models capable of capturing complex patterns (for example, image and speech recognition).
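As a minimal sketch of supervised learning, the following fits a one-feature linear regression with ordinary least squares using only the Python standard library. The function name `fit_line` and the house-price numbers are made up for illustration and are not part of any TileDB API.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Made-up labeled data: floor area (square meters) paired with price (thousands).
areas = [50.0, 80.0, 100.0, 120.0]
prices = [150.0, 240.0, 300.0, 360.0]

slope, intercept = fit_line(areas, prices)

# Predict the label for an unseen input.
predicted_price = slope * 90.0 + intercept
print(f"slope={slope:.2f} intercept={intercept:.2f} predicted={predicted_price:.2f}")
```

Each training example here pairs an input (area) with an output label (price), which is what makes the task supervised. Real datasets are noisy and usually have many features, so a library implementation is the more likely choice in practice.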

Unsupervised learning models

Unsupervised learning models train on unlabeled data. The goal of unsupervised learning models is to find hidden patterns or intrinsic structures in the data.

Examples

\(k\)-means clustering: Groups data points into a predefined number of clusters based on similarity (for example, customer segmentation).
Principal component analysis (PCA): Reduces the dimensionality of data while preserving most of the variance (for example, data visualization).
Hierarchical clustering: Builds a hierarchy of clusters (for example, gene expression analysis).
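To illustrate how an unsupervised model finds structure without labels, here is a small sketch of \(k\)-means (Lloyd's algorithm) in plain Python. It assumes a deterministic start in which the first \(k\) points seed the centroids. That choice keeps the example reproducible but is not how production implementations typically initialize. The function name `k_means` and the sample points are hypothetical.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Lloyd's algorithm; the first k points are used as initial centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments


# Made-up unlabeled 2D points forming two loose groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, assignments = k_means(points, k=2)
print(assignments)
print(centroids)
```

No labels are supplied at any point: the grouping emerges purely from distances between points.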

Semi-supervised learning models

Semi-supervised learning models use both labeled and unlabeled data for training. Typically, the training set combines a small amount of labeled data with a large amount of unlabeled data.

Examples

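Self-training: Trains a model on the labeled data, uses its most confident predictions to label the unlabeled data, and repeats iteratively.
Co-training: Trains multiple classifiers on different views (feature subsets) of the data, with each classifier supplying labels for the others.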
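The self-training loop can be sketched as follows. This is an illustrative toy, assuming a one-dimensional nearest-centroid base classifier and a confidence score defined as the margin between the two closest class centroids. The names `fit_centroids`, `predict`, `self_train`, and the `min_margin` threshold are all hypothetical.

```python
def fit_centroids(labeled):
    """Base learner: mean feature value per class from (value, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return (label, margin); a larger margin means a more confident prediction."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, runner_up = ranked[0], ranked[1]  # assumes at least two classes
    return best, abs(x - centroids[runner_up]) - abs(x - centroids[best])


def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label confident points and refit on the enlarged set."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labeled)
        confident, uncertain = [], []
        for x in pool:
            label, margin = predict(centroids, x)
            (confident if margin >= min_margin else uncertain).append((x, label))
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = [x for x, _ in uncertain]
    return fit_centroids(labeled), labeled


# Made-up data: two labeled examples and five unlabeled ones.
seed_examples = [(1.0, "low"), (9.0, "high")]
unlabeled_values = [1.5, 2.0, 8.0, 8.5, 4.8]
model, training_set = self_train(seed_examples, unlabeled_values)
print(model)
print(len(training_set))
```

The ambiguous value 4.8 never clears the margin threshold, so it stays out of the training set. Guarding against low-confidence pseudo-labels in this way limits the risk of the model reinforcing its own mistakes.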

Reinforcement learning models

Reinforcement learning models learn by interacting with an environment, receiving rewards or penalties based on actions taken.

Examples

\(Q\)-learning: Learns the value of actions in states to maximize cumulative reward (for example, game playing).
Deep \(Q\)-networks (DQN): Combines \(Q\)-learning with deep neural networks for complex environments (for example, robotic control).
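As a rough sketch of tabular \(Q\)-learning, the example below trains an agent on a hypothetical five-cell corridor in which reaching the rightmost cell yields a reward of 1. The environment, the `step` and `train` functions, and the hyperparameter values are all assumptions chosen for illustration. The update rule is the standard one: \(Q(s, a) \leftarrow Q(s, a) + \alpha \, (r + \gamma \max_{a'} Q(s', a') - Q(s, a))\).

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # action index 0 moves left, index 1 moves right


def step(state, action):
    """Hypothetical environment: move, stay inside the corridor, reward 1.0 at the goal."""
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _step in range(100):  # cap episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                best = max(q[state])       # exploit, breaking ties at random
                action = rng.choice([a for a in range(2) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Bootstrapped target: no future value once the episode has ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for the non-terminal cells (1 means "move right").
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

The agent is never told that moving right is correct. It infers this from rewards alone, with the discount factor making actions closer to the goal more valuable. A DQN replaces the table `q` with a neural network, which this sketch does not attempt.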

Summary

ML models form the core of ML applications, ranging from straightforward linear models to intricate neural networks. Each model type suits particular problems and data characteristics. Choosing an appropriate model requires assessing factors such as the nature of the data, the task type, the target accuracy, and the computational cost. By learning from data, these models enable automation and intelligent decision-making in many fields, including healthcare, finance, imaging, geosciences, marketing, and autonomous systems.