If correlation doesn’t imply causation, then what does? As a data scientist, it is often quite frustrating to work with correlation and not be able to draw conclusive causal claims. The most reliable way to establish causality is usually through randomized experiments, such as the ones we saw in Chapter 8, Advanced Statistics. One would have […]
Introducing ML
In Chapter 1, Data Science Terminology, we defined ML as giving computers the ability to learn from data without being given explicit rules by a programmer. This definition still holds true. ML is concerned with the ability to ascertain certain patterns (signals) out of data, even if the data has inherent errors in […]
Example – heart attack prediction
Suppose we wish to predict whether someone will have a heart attack within a year. To predict this, we are given that person’s cholesterol level, blood pressure, height, smoking habits, and perhaps more. From this data, we must ascertain the likelihood of a heart attack. Suppose, to make this prediction, […]
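Although the excerpt is cut off here, a minimal sketch of how this task could be framed as supervised classification may help. Everything below is hypothetical: the file heart.csv, the column names, and the choice of LogisticRegression are illustrative assumptions, not the book's actual code or data.

# Hypothetical sketch: heart attack prediction as supervised classification.
# The file heart.csv and every column name here are illustrative assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

heart = pd.read_csv('heart.csv')  # hypothetical dataset
X = heart[['cholesterol', 'blood_pressure', 'height', 'is_smoker']]
y = heart['had_heart_attack']     # 1 if a heart attack occurred within a year

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression()
model.fit(X_train, y_train)                 # learn from labeled past patients
probs = model.predict_proba(X_test)[:, 1]   # estimated likelihood of a heart attack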
ML paradigms – pros and cons 2
Our data can be seen in Figure 10.8:
Figure 10.8 – The first five rows (the head) of our bike-share data
We can see that every row represents a single hour of bike usage. In this case, we are interested in predicting the count value, which represents the total number of bikes rented in the […]
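As a hedged illustration of the setup described above, the sketch below loads the bike-share data and separates the count column as the response. The file name bikeshare.csv and the use of pandas here are assumptions; only the count column comes from the text.

# Sketch: loading the bike-share data and identifying the response variable.
# The file name 'bikeshare.csv' is an assumption; 'count' is the column
# described in the text as the total number of bikes rented per hour.
import pandas as pd

bikes = pd.read_csv('bikeshare.csv')
print(bikes.head())                 # the first five rows, as in Figure 10.8
y = bikes['count']                  # the value we want to predict
X = bikes.drop('count', axis=1)     # the remaining columns are candidate predictors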
ML paradigms – pros and cons
As we now know, ML can be broadly classified into three categories, each with its own set of advantages and disadvantages.
SML
This method leverages the relationships between input predictors and the output response variable to predict future data observations. Its advantages are as follows: Let’s see […]
Correlation versus causation
In the context of linear regression, coefficients represent the strength and direction of the relationship between the predictor variables and the response variable. However, this statistical relationship should not be confused with causation. The coefficient B1, with a value of 9.17 in our previous code snippet, indicates the average change in the […]
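For context, the sketch below shows where such a coefficient comes from: fitting a simple linear regression and reading coef_. The predictor name temp and the reuse of the hypothetical bikes DataFrame from the earlier sketch are assumptions; the point is that the coefficient describes association, not causation.

# Sketch: where a coefficient such as B1 comes from in scikit-learn.
# 'temp' is a hypothetical predictor; coef_ reports the average change in
# the response per one-unit change in that predictor (association only).
from sklearn.linear_model import LinearRegression

linreg = LinearRegression()
linreg.fit(bikes[['temp']], bikes['count'])
print(linreg.coef_[0])   # the slope, i.e. the B1 discussed in the text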
Regression metrics
There are three main metrics commonly used when evaluating regression ML models. They are as follows:
• Mean Absolute Error (MAE): This is the average of the absolute errors between the predicted values and the actual values. It’s calculated by taking the sum of the absolute values of the errors (the differences between the […]
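Although the list is cut off here, a minimal sketch of computing MAE is shown below, together with MSE and RMSE, which are commonly paired with it. The names y_test and preds stand for the true and predicted values and are assumptions carried over from earlier examples.

# Sketch: common regression metrics, assuming y_test (true values) and
# preds (model predictions) already exist.
import numpy as np
from sklearn import metrics

mae = metrics.mean_absolute_error(y_test, preds)    # average absolute error
mse = metrics.mean_squared_error(y_test, preds)     # average squared error
rmse = np.sqrt(mse)                                 # same units as the response
print(mae, mse, rmse)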
Performing naïve Bayes classification
Let’s get right into it! Let’s begin with naïve Bayes classification. This ML model relies heavily on results from previous chapters, specifically Bayes’ theorem:
P(H|D) = P(D|H) * P(H) / P(D)
Let’s look a little closer at the specific features of this formula:
Naïve Bayes classification is a classification model, and therefore a supervised model. Given this, […]
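As a hedged sketch of what fitting such a model can look like in scikit-learn, the snippet below uses MultinomialNB on count-vectorized text. The choice of variant, the variable names, and the text data are all assumptions, since the excerpt does not show the book's actual setup.

# Hypothetical sketch: a naive Bayes classifier in scikit-learn.
# MultinomialNB and the text features are assumptions about the setup.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

vect = CountVectorizer()
X_train_dtm = vect.fit_transform(train_texts)   # train_texts: list of documents
X_test_dtm = vect.transform(test_texts)

nb = MultinomialNB()
nb.fit(X_train_dtm, y_train)       # estimate P(word | class) from counts
preds = nb.predict(X_test_dtm)     # Bayes' theorem picks the most probable class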
Classification metrics 4
We will use sklearn’s built-in accuracy and confusion matrix to look at how well our naïve Bayes models are performing:
# compare predictions to true labels
from sklearn import metrics
print(metrics.accuracy_score(y_test, preds))
print(metrics.confusion_matrix(y_test, preds))
The output is as follows:
accuracy == 0.988513998564
confusion matrix ==
[[1203    5]
 [  11  174]]
First off, our accuracy is great! Compared to our null […]
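The comparison the text is about to make is against the null accuracy, i.e. the accuracy of a model that always predicts the most frequent class. A one-line sketch, assuming y_test is a pandas Series:

# Null accuracy: the score of always predicting the majority class.
null_accuracy = y_test.value_counts(normalize=True).max()
print(null_accuracy)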
Classification metrics
When evaluating classification models, different metrics are used compared to regression models. These metrics help us understand how well the model is performing, especially in terms of correctly predicting different classes. Let’s look at what they are:
2. Precision (best for binary classification – with only two classes): Also known as positive predictive […]
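A minimal sketch of computing precision with scikit-learn, reusing the y_test and preds names from the earlier snippet (it is an assumption here that the labels are binary):

# Precision = TP / (TP + FP): of everything predicted positive, how much
# really was positive. Assumes binary labels in y_test and preds.
from sklearn import metrics

precision = metrics.precision_score(y_test, preds)
print(precision)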