Today I want to show you how you can use the Amazon Machine Learning service to train (supervised learning) a model that can categorize data (multiclass classification).
Introduction to Machine Learning
Imagine you have a spreadsheet with data: one column is the outcome of your model (also called class or label), while all the other columns (also called features or attributes) are used by the model as input for prediction.
class    weight    height
human    75        180
cat      2.8       23
dog      5         50
The model is a function that takes a weight and a height and outputs a class: (weight, height) => class. Supervised Machine Learning is about learning this function by training with a data set that you provide.
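To make the idea of a model as a function concrete, here is a minimal sketch in Python. It assumes a deliberately naive rule: return the class of the closest record in the table (a 1-nearest-neighbour lookup). This is only an illustration and not the algorithm Amazon Machine Learning uses; the names TRAINING_DATA and predict are made up for this example.

```python
from math import dist

# Training data from the table above: (weight, height) -> class
TRAINING_DATA = [
    ((75.0, 180.0), "human"),
    ((2.8, 23.0), "cat"),
    ((5.0, 50.0), "dog"),
]


def predict(weight: float, height: float) -> str:
    """Return the class of the closest training record (1-nearest-neighbour)."""
    _, label = min(
        TRAINING_DATA,
        key=lambda record: dist(record[0], (weight, height)),
    )
    return label


print(predict(70, 175))
```

A real model generalizes from many records instead of memorizing three of them, but the shape of the function is the same: features go in, a class comes out.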
Iris flower data set example
In our case we want to predict the species of a flower called Iris by looking at four features. We will use the Iris flower data set which you can download to train our model.
The data set contains 150 records, 50 for each of 3 species of Iris:
Iris setosa
Iris versicolor
Iris virginica



Each record contains 4 features:
Sepal length
Sepal width
Petal length
Petal width
and each record has a species (class) assigned.
The data set is provided in CSV format and looks like this:
5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
The first 4 columns are the features while the 5th column is the class. In the end we want a model that predicts a class from the 4 features. So if you discover an Iris in nature, you can predict the species by putting the 4 features into the model.
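If you want to look at the data yourself before uploading it, the column layout described above can be read with a few lines of Python. This is a minimal sketch that uses three sample records instead of the downloaded file; the names SAMPLE_CSV and parse_records are made up for this example.

```python
import csv
import io

# Three sample records in the same layout as iris.data:
# four numeric features followed by the class.
SAMPLE_CSV = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def parse_records(text):
    """Split each CSV row into (features, species)."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        *features, species = row
        records.append(([float(value) for value in features], species))
    return records


records = parse_records(SAMPLE_CSV)
for features, species in records:
    print(features, "=>", species)
```

To read the real file, you could pass the content of iris.data to parse_records instead of SAMPLE_CSV.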
So let’s get started.
WARNING This example is not covered by the free tier. See the pricing page for more details. I spent 0.64 USD on this experiment.
Upload data set to S3
I assume that you have aws-cli installed and configured. You need to create an S3 bucket and upload the CSV file. Make sure to replace $YourName with your name or something that makes your bucket name unique.
$ aws --region us-east-1 s3 mb s3://$YourName-iris-data
$ wget https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
$ aws --region us-east-1 s3 cp iris.data s3://$YourName-iris-data/iris.data
Machine Learning
We are going to create three things with the Machine Learning service:
Datasource: This links to the S3 bucket and defines the schema of our data
Model: That’s the actual model that is generated
Evaluation: We also test how accurate our model is
Datasource
Open the Machine Learning Console. Make sure that you are in the us-east-1 (N. Virginia) region.
Click the Get started button.
Click the Launch button.
Define the location of your data set and Verify the data set.
Click the Continue button.
The service is smart enough to create the schema of our data automatically. Just click Continue.
Now you need to define the column that is the prediction target (class). Select the last column and click Continue.
Click the Review button.
Click the Continue button.
Model
The service recognizes that it is dealing with a multiclass prediction problem. We only need to make one adjustment, so please select Custom Training and evaluation settings and click the Continue button.
Click the Continue button.
Click the Continue button.
Evaluation
Now we tell the service that it should randomly split our data set into a training data set (70% of the data) and a validation data set (30% of the data). The idea is that the training data set is used to train the model while the validation data set is used to determine the accuracy of the model. So the accuracy is calculated with data that the model has never seen before.
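The idea of a random 70/30 split can be sketched in a few lines of Python. This is only an illustration under my own assumptions (shuffle once, then cut the list); it is not how the service implements the split internally, and split_records is a made-up name.

```python
import random


def split_records(records, training_share=0.7, seed=42):
    """Shuffle the records and split them into (training, validation)."""
    shuffled = list(records)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * training_share)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 150 Iris records; only the record count matters here.
records = list(range(150))
training, validation = split_records(records)
print(len(training), len(validation))
```

Every record ends up in exactly one of the two sets, so the validation set contains only data the model was not trained on.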
Click the Finish button to start the model training process.
Now you need to wait a few minutes until your model is ready.
Now it’s time to check the accuracy of the model. Click Evaluation: ML model: Iris flow data set and then Explorer performance on the left and you will get a matrix that shows you how well your model works.
My model has an overall accuracy of 86%. If you like, you can explore in depth what mistakes your model made by looking at the result matrix (a confusion matrix).
WARNING Depending on the randomization of the data it is possible that you get different results than me!
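To see what the overall accuracy and the result matrix express, here is a minimal sketch in Python. The validation results are invented for this illustration and are not taken from my experiment; the function names are made up as well.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Share of records whose predicted class equals the actual class."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair of classes occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical validation results, invented for this illustration.
actual = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica"]
predicted = ["Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica"]

print(accuracy(actual, predicted))
matrix = confusion_matrix(actual, predicted)
# How many Iris-virginica records were mistaken for Iris-versicolor?
print(matrix[("Iris-virginica", "Iris-versicolor")])
```

The entries where the actual and the predicted class differ are the mistakes; the matrix tells you which species get confused with each other.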
Why is the model not 100% accurate? Simplified explanation: We are either training with too little data (the model has not seen enough real-world data) or not all relevant features are in our data set to really distinguish the species of Iris.
Now we need to predict something. Open the Try real-time predictions link on the left, enter the four values and click the Create prediction button. After that you should see the prediction result on the right. In my case the model is over 99% confident that the right class is Iris-virginica.
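The confidence can be read as one score per class, with the highest score deciding the predicted class. A minimal sketch in Python, assuming the prediction is available as a mapping from class name to score (the function name and the numbers are made up for this illustration and are not real output of the service):

```python
def most_likely_class(scores):
    """Return (class, score) for the class with the highest score."""
    label = max(scores, key=scores.get)
    return label, scores[label]


# Hypothetical scores, invented for this illustration.
scores = {
    "Iris-setosa": 0.001,
    "Iris-versicolor": 0.004,
    "Iris-virginica": 0.995,
}

label, confidence = most_likely_class(scores)
print(label, confidence)
```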
Cleanup
Make sure to delete your Evaluation, Model and the three Datasources. Don’t forget to delete your S3 bucket.