Introduction to Machine Learning in Python
- Machine learning borrows heavily from fields such as statistics and
computer science.
- In machine learning, models learn rules from data.
- In supervised learning, each example in our training data is labelled with a known target.
- In popular usage, A.I. has become a near-synonym for machine learning.
- A.G.I. is the loftier goal of achieving human-like
intelligence.
- Data pre-processing is arguably the most important task in machine
learning.
- SQL is a common tool for extracting data from database systems.
- Data is typically partitioned into training and test sets (see the first sketch after this list).
- Setting random states helps to promote reproducibility.
- Loss functions allow us to define a good model.
- $y$ is a known target. $\hat{y}$ (y hat) is a prediction.
- Mean squared error is an example of a loss function (sketched below).
- After defining a loss function, we search for the optimal solution
in a process known as ‘training’.
- Optimisation is at the heart of machine learning.
- Linear regression is a popular model for regression tasks (sketched below).
- Logistic regression is a popular model for classification tasks (sketched below).
- Decision trees are a useful and easily interpretable alternative for classification tasks (sketched below).
- Classification models output probabilities that can be mapped to a predicted class.
- Validation sets are used during model development, allowing models
to be tested prior to testing on a held-out set.
- Cross-validation is a resampling technique that creates multiple validation sets (sketched after this list).
- Cross-validation can help to avoid overfitting.
- Confusion matrices are the basis for many popular performance metrics (sketched below).
- AUROC is the area under the receiver operating characteristic curve. An AUROC of 0.5 is no better than random guessing.
- TP is True Positive, meaning that our prediction hit its
target.
- Bootstrapping is a resampling technique (sketched below), sometimes confused with cross-validation.
- Bootstrapping allows us to generate a distribution of estimates,
rather than a single point estimate.
- Bootstrapping allows us to estimate uncertainty, from which confidence intervals can be computed.
- Leakage occurs when training data is contaminated with information that is not available at prediction time (see the final sketch below).
- Leakage leads to over-optimistic expectations of performance.
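
The short Python sketches below illustrate several of the points above. They assume NumPy and scikit-learn are installed; every dataset is either synthetic or bundled with scikit-learn, and none of the variable names or values come from the lesson itself. First, a minimal sketch of partitioning data into training and test sets with a fixed random state:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 examples, 2 features (toy data)
y = np.arange(10)                 # one target per example

# random_state fixes the shuffle, so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```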
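
A loss function scores how far predictions fall from known targets. A minimal sketch of mean squared error, using made-up values for $y$ and $\hat{y}$:

```python
import numpy as np

y = np.array([3.0, -0.5, 2.0, 7.0])      # known targets (toy values)
y_hat = np.array([2.5, 0.0, 2.0, 8.0])   # predictions (toy values)

# MSE: the mean of the squared differences between target and prediction
mse = np.mean((y - y_hat) ** 2)
print(mse)  # 0.375
```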
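
Linear regression, sketched on synthetic data generated as y = 2x + 1 plus noise, so the fitted coefficient and intercept should land near 2 and 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(50, 1))              # one feature
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, 50)    # targets with noise

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # roughly [2.] and 1.
```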
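
Logistic regression, sketched on a synthetic classification problem; the sketch also shows how predicted probabilities map to a predicted class:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)

clf = LogisticRegression().fit(X, y)

proba = clf.predict_proba(X[:3])  # per-class probabilities
pred = clf.predict(X[:3])         # probabilities mapped to a class
print(proba)                      # each row sums to 1
print(pred)                       # here, class 1 wherever P(1) >= 0.5
```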
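
A decision tree classifier, sketched on the iris dataset that ships with scikit-learn; printing the learned rules with `export_text` is what makes trees easy to interpret:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)

# The fitted tree can be printed as human-readable if/else rules
print(export_text(tree, feature_names=list(data.feature_names)))
```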
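
Cross-validation, sketched with `cross_val_score`, which scores a model on several validation folds rather than on a single split:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold serves once as the validation set
scores = cross_val_score(clf, X, y, cv=5)
print(scores, scores.mean())
```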
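
A confusion matrix and AUROC, sketched with toy labels and predicted probabilities chosen purely for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0])               # toy labels
y_prob = np.array([0.1, 0.4, 0.8, 0.35, 0.9, 0.2])  # toy P(class 1)
y_pred = (y_prob >= 0.5).astype(int)                 # map to classes

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
print(roc_auc_score(y_true, y_prob))  # 0.5 would be random guessing
```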
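
Bootstrapping, sketched as a plain NumPy loop: resample with replacement, recompute the statistic, and read a confidence interval from the resulting distribution of estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=100)  # synthetic sample

# Each replicate: resample with replacement, recompute the mean
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(1000)
])

# A 95% confidence interval from the percentiles of the distribution
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {sample.mean():.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```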
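
Finally, leakage. One common source is fitting a scaler on the full dataset before splitting, which lets test-set statistics reach the model. A minimal sketch of avoiding this with a `Pipeline`, so the preprocessing is re-fitted on each training fold only:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The scaler is fitted inside each training fold, so no information
# from the validation fold leaks into training
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print(cross_val_score(pipe, X, y, cv=5).mean())
```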