ML-Notes
Hardware Agnostic Code
This will run the model on whatever hardware your system has available to it, defaulting to the CPU if neither NVIDIA CUDA (cuda) nor Apple Silicon (mps) is found. Note that AMD ROCm (Radeon Open Compute) builds of PyTorch expose AMD GPUs through the same cuda device string, so the cuda branch covers them; there is no separate hip backend check.
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"  # ROCm GPUs also report as "cuda"
Creating Valid Data
When creating sample data, you need at least a 2D tensor (matrix), because machine learning models expect a feature dimension, i.e. shape (n, 1), where n is the number of samples and 1 is the corresponding feature.
As an example:
For a house dataset: (samples: n, features: 3)
- Sample:
- A specific house.
- Features:
- Size: 1500 square feet.
- Bedrooms: 3.
- Location Index: 2 (e.g., urban area).
This is usually done by calling unsqueeze(dim=1) on a 1-D range, e.g.:
X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
torch.arange(0, 1, 0.02) creates a 1-D tensor of 50 samples with no feature dimension; unsqueeze(dim=1) inserts a new dimension at index 1, turning the shape from (50,) into (50, 1).
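A quick sketch to confirm the shapes (using the same arange call as above):
import torch

X_raw = torch.arange(0, 1, 0.02)   # 1-D tensor, shape (50,) - samples but no feature dimension
X = X_raw.unsqueeze(dim=1)         # insert a dimension at index 1, giving shape (50, 1)
print(X_raw.shape, X.shape)        # torch.Size([50]) torch.Size([50, 1])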
Setting the Algorithm
This particular problem uses the Linear Regression algorithm, which is expressed in code as:
y = weight * X + bias
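As a sketch, labels can be generated from the X created above using known weight and bias values (0.7 and 0.3 are just assumed example values, the ones the model should later learn):
weight = 0.7   # assumed example value
bias = 0.3     # assumed example value
X = torch.arange(0, 1, 0.02).unsqueeze(dim=1)
y = weight * X + bias              # labels, shape (50, 1)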
Creating Training/Testing Split
Normally, when training, split the data 80/20: 80% for training and 20% for testing.
Something like:
train_split = int(0.8 * len(X)) # 80% of the dataset length; indexing requires an int
X_train, y_train = X[:train_split], y[:train_split] # [:train_split] takes from the start up to the 80% mark
X_test, y_test = X[train_split:], y[train_split:] # [train_split:] takes from the 80% mark to the end, i.e. the remaining 20%
Creating/Inheriting Model class
When creating a model, you will need to import nn from torch, and in particular nn.Module.
Usually something like:
import torch
from torch import nn
You will have to subclass it: create a custom class that uses nn.Module as its superclass.
class LinearRegressionModel(nn.Module):  # nn.Module is the base class for all neural network modules in PyTorch; subclassing it is how the custom class inherits from it
    def __init__(self):  # the constructor, used to initialize the class's attributes
        super().__init__()  # calls nn.Module's constructor so the parent class initializes properly
        self.weights = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))  # creates a model parameter, randomly initialized
        self.bias = nn.Parameter(torch.randn(1, requires_grad=True, dtype=torch.float))  # creates a model parameter, randomly initialized
    def forward(self, x: torch.Tensor) -> torch.Tensor:  # REQUIRED: every nn.Module subclass must override the forward method
        return self.weights * x + self.bias
Inside the class you will need to initialize the weights and biases (usually to random values or zeros) and define the forward method; the forward method is required.
After that is created, you will need to initialize the loss function and the optimizer (and tell the optimizer which parameters it is optimizing).
Then, in the training loop, you will need to set the model to train mode, perform a forward pass, calculate the loss, zero the accumulated gradients, perform backpropagation, and then take an optimizer step.
Once this is done you can test: set the model to eval mode (model.eval()), do a forward pass on the test data, calculate the loss, and see the results on previously unseen data. A minimal sketch of this whole setup is shown below.
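A minimal sketch of the loss function, optimizer, and training/testing loop, assuming the LinearRegressionModel class and the X_train/X_test split above; the seed, learning rate, and epoch count are just assumed example values:
torch.manual_seed(42)                        # assumed seed for reproducibility
model = LinearRegressionModel()

loss_fn = nn.L1Loss()                        # MAE, a common choice for regression
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01)  # tell the optimizer which parameters to update

epochs = 100                                 # assumed example value
for epoch in range(epochs):
    model.train()                            # 1. set the model to train mode
    y_pred = model(X_train)                  # 2. forward pass
    loss = loss_fn(y_pred, y_train)          # 3. calculate the loss
    optimizer.zero_grad()                    # 4. zero the accumulated gradients
    loss.backward()                          # 5. backpropagation
    optimizer.step()                         # 6. optimizer step

    model.eval()                             # testing: eval mode
    with torch.inference_mode():             # disable gradient tracking for inference
        test_pred = model(X_test)            # forward pass on previously unseen data
        test_loss = loss_fn(test_pred, y_test)
    if epoch % 10 == 0:
        print(f"epoch {epoch} | train loss {loss:.4f} | test loss {test_loss:.4f}")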
Loss Functions
For regression, you will want to use MAE (nn.L1Loss()) or MSE (nn.MSELoss()).
For classification, you might want to use binary cross-entropy (nn.BCELoss() or nn.BCEWithLogitsLoss(), the latter recommended), or categorical cross-entropy (nn.CrossEntropyLoss()), which is used for multi-class classification.
nn.BCELoss() does not include the sigmoid activation (it expects probabilities as input), while nn.BCEWithLogitsLoss() combines a sigmoid layer with binary cross-entropy in a single operation. BCEWithLogitsLoss is more numerically stable because it does everything in one step, instead of applying a sigmoid first and then BCELoss in a separate step.
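A small sketch of the difference (the logits and targets here are just assumed example values):
import torch
from torch import nn

logits = torch.tensor([1.5, -0.3, 0.8])      # assumed example raw model outputs
targets = torch.tensor([1.0, 0.0, 1.0])

# BCEWithLogitsLoss takes raw logits and applies the sigmoid internally
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)

# BCELoss expects probabilities, so the sigmoid must be applied manually first
loss_plain = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_with_logits, loss_plain)          # the two values match (up to numerical precision)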
Optimizer
Use the Stochastic Gradient Descent (SGD) optimizer for classification, regression, and other tasks: torch.optim.SGD().
Use the Adam optimizer for classification, regression, and other tasks: torch.optim.Adam(); torch.optim.AdamW() is recommended.
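An instantiation sketch, assuming the model defined earlier; the learning rates are just assumed example values:
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.01)     # assumed lr
# or the recommended AdamW variant
optimizer = torch.optim.AdamW(params=model.parameters(), lr=0.001)  # assumed lr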
Logits
Logits represent the unprocessed outputs directly from the model after a forward pass.
When using BCEWithLogitsLoss, the model will output raw logits (unnormalized values); we must convert them into prediction probabilities (sigmoid) and then into prediction labels (round).
To get this, run the forward pass, then torch.sigmoid(), then torch.round(), or torch.round(torch.sigmoid(...)) in full. This is used for binary classification, as the resulting labels will be 0 or 1.
Our model outputs are going to be raw logits.
We can convert these logits into prediction probabilities by passing them to some kind of activation function (e.g. sigmoid for binary classification, softmax for multi-class classification).
Then we can convert our model's prediction probabilities to prediction labels, either by rounding them or by taking torch.argmax(), which is used for multi-class classification and picks the index of the maximum probability.
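A sketch of both conversions (the logits tensors are assumed example values):
# binary classification: logits -> probabilities (sigmoid) -> labels (round)
binary_logits = torch.tensor([2.0, -1.0, 0.3])             # assumed example outputs
binary_labels = torch.round(torch.sigmoid(binary_logits))  # tensor([1., 0., 1.])

# multi-class classification: logits -> probabilities (softmax) -> labels (argmax)
multi_logits = torch.tensor([[1.0, 2.5, 0.1],
                             [0.2, 0.1, 3.0]])              # assumed example outputs, one row per sample
multi_labels = torch.softmax(multi_logits, dim=1).argmax(dim=1)  # tensor([1, 2])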
Types of Learning and their Optimal Algorithms
Supervised Learning
1. Linear Regression
- Optimizer: torch.optim.SGD (Stochastic Gradient Descent)
- Loss Function: torch.nn.MSELoss (Mean Squared Error)
2. Logistic Regression
- Optimizer: torch.optim.SGD or torch.optim.Adam
- Loss Function:
  - torch.nn.BCEWithLogitsLoss (recommended for binary classification) - internally applies a sigmoid before computing binary cross-entropy, offering better numerical stability.
  - torch.nn.CrossEntropyLoss (for multi-class classification)
3. K-Nearest Neighbors (KNN)
- Note: KNN is a lazy learning algorithm and does not involve training in the conventional sense. There’s no standard optimizer or loss function because the model “trains” by storing data points and performing distance comparisons at inference time.
4. Support Vector Machine (SVM)
- Optimizer: torch.optim.SGD
- Loss Function: torch.nn.HingeEmbeddingLoss (implements the hinge loss typical for SVMs)
5. Naive Bayes
- Note: Naive Bayes is probabilistic, based on Bayes' theorem. It does not require a traditional optimizer or loss function; instead, it calculates class probabilities from training data distributions.
6. Decision Trees
- Note: Decision trees use heuristic-based splitting criteria (e.g., Gini impurity, entropy) rather than explicit optimizers or loss functions.
7. Neural Networks
- Optimizer: Common choices include torch.optim.Adam, torch.optim.AdamW, or torch.optim.SGD.
- Loss Function (task-dependent):
  - Regression: torch.nn.MSELoss
  - Binary Classification: torch.nn.BCEWithLogitsLoss
  - Multi-class Classification: torch.nn.CrossEntropyLoss
Training Loop
“Timid Frogs Leap Gracefully Backwards Swiftly”
1. Timid - Set the model to train mode (model.train()).
2. Frogs - Perform the forward pass.
3. Leap - Compute the loss.
4. Gracefully - Zero the gradients (optimizer.zero_grad()).
5. Backwards - Execute backward propagation (loss.backward()).
6. Swiftly - Take an optimization step (optimizer.step()).