Machine learning with AWS
A CTO's notes from implementing machine learning using AWS ML services.
If you've ever attended an ML conference, or seen an ML session, you've likely heard the statement:
"If you're not doing Machine Learning now, you'll be left behind".
This is especially true if you are in the business of delivering unique insights and data to your customers that allow them to take informed and decisive actions based on multitudes of data at scale.
At K-Safe, we use machine learning to understand why incidents happen across a wide variety of activities such as cycling, running or even riding an electric scooter, with the goal of offering the most accurate incident detection in the industry, and possibly prevention strategies too.
A typical workflow for machine learning includes the following steps:
1. Data preparation
2. Build your ML model
3. Train and tune your model
4. Deploy and manage your model
One of the biggest pieces of advice I can share is to ensure you spend most of your time on step 1 - possibly up to 80% of the project time, as ensuring you have clean, accurate data is the single most important priority for any ML project.
"Shit data IN = Shit data OUT"
There are numerous tools for ensuring you are formatting, labelling and validating your data correctly; perhaps I'll even dedicate an entire future post to them. I find AWS Glue particularly good for processing and formatting the data as required into a staging S3 bucket, ready for building and training the ML model.
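To make the "clean data first" point concrete, here's a minimal sketch of the kind of row-level validation you might run before anything lands in the staging bucket. It's plain Python rather than a Glue job, and the column names (`activity`, `speed_kmh`, `incident`) and allowed values are assumptions I've made up for illustration, not a real schema.

```python
# Hypothetical schema: these field names and allowed values are assumptions.
REQUIRED_FIELDS = ("activity", "speed_kmh", "incident")
ALLOWED_ACTIVITIES = {"cycling", "running", "scooter"}


def validate_row(row):
    """Return a list of problems found in one record (empty list means clean)."""
    problems = []
    # Every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if row.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # The activity must be one we know how to model.
    activity = row.get("activity")
    if activity and activity not in ALLOWED_ACTIVITIES:
        problems.append(f"unknown activity: {activity}")
    # Speed must parse as a number and cannot be negative.
    try:
        if float(row.get("speed_kmh") or 0) < 0:
            problems.append("negative speed")
    except (TypeError, ValueError):
        problems.append("speed_kmh is not a number")
    # The label is expected as the string "0" or "1" (as read from a CSV).
    if row.get("incident") not in (None, "", "0", "1"):
        problems.append("incident label must be 0 or 1")
    return problems


def split_clean_and_rejected(rows):
    """Separate rows that pass validation from rows that need attention."""
    clean, rejected = [], []
    for row in rows:
        problems = validate_row(row)
        if problems:
            rejected.append((row, problems))
        else:
            clean.append(row)
    return clean, rejected
```

Keeping the rejected rows alongside the reasons they failed, rather than silently dropping them, makes it much easier to spot a broken upstream source.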
Once you have your data, which in the case of AWS you'll most likely host in an S3 bucket, you'll want to fire up Amazon SageMaker.
SageMaker takes steps 2 & 3 and automates them with a feature called Autopilot. It's great for teams that are resource constrained or looking to get to market as fast as possible. SageMaker will try multiple different algorithms and report back on the success of each one and the confidence it has in each one's accuracy. It refers to these runs as experiments.
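If you'd rather kick off Autopilot from code than from the console, the sketch below shows roughly what that could look like with boto3's `create_auto_ml_job` call. The job name, S3 URIs, target column and role ARN are all placeholders I've invented; you'd swap in your own. The problem type and objective metric are left out on purpose so Autopilot can infer them from the data.

```python
def build_autopilot_request(job_name, train_s3_uri, output_s3_uri,
                            target_column, role_arn):
    """Build the keyword arguments for SageMaker's create_auto_ml_job call."""
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [
            {
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                    }
                },
                # The column Autopilot should learn to predict.
                "TargetAttributeName": target_column,
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "RoleArn": role_arn,
    }


def start_autopilot_job(request):
    """Submit the job; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so the request builder works without boto3

    client = boto3.client("sagemaker")
    return client.create_auto_ml_job(**request)


# All values below are placeholders, not real resources.
request = build_autopilot_request(
    job_name="incident-autopilot-demo",
    train_s3_uri="s3://example-staging-bucket/train/",
    output_s3_uri="s3://example-staging-bucket/autopilot-output/",
    target_column="incident",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
# start_autopilot_job(request)  # uncomment once the placeholders are real
```

Splitting the request builder from the call that talks to AWS means you can review or unit test the configuration without spending a penny on training.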
Once you are happy that Amazon SageMaker has determined an accurate model, it's time to put it to the test.
Finally, step 4 - deploying and managing your model - is continuous. You'll want to ensure that your model keeps providing the correct information and doesn't start to drift or gain bias away from the original intention. Just look at what happened to Microsoft's "Tay" chatbot (https://www.huffingtonpost.co.uk/entry/microsoft-tay-racist-tweets_n_56f3e678e4b04c4c37615502).
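One simple way to keep an eye on drift is to compare the distribution of a feature at training time against what the model is seeing in production. The sketch below uses the Population Stability Index (PSI), which is my choice for illustration and not something specific to SageMaker. The bin count and the example numbers are assumptions.

```python
import math


def population_stability_index(baseline, current, bins=10):
    """Compare two samples of one numeric feature; 0.0 means identical shares."""
    lo, hi = min(baseline), max(baseline)
    # Fall back to a width of 1.0 if every baseline value is the same.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range go into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps empty bins from causing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    expected = bucket_shares(baseline)
    actual = bucket_shares(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


# Made-up example: speeds seen in training vs. speeds seen this week.
training_speeds = list(range(100))
live_speeds = [v + 50 for v in training_speeds]
psi = population_stability_index(training_speeds, live_speeds)
print(f"PSI: {psi:.2f}")
```

A commonly quoted rule of thumb treats a PSI above roughly 0.2 as a sign that the distribution has shifted enough to investigate, but treat that threshold as a starting point to tune for your own data rather than a hard rule.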
So, that's it for this introductory post about ML in AWS - a high-level whistle-stop tour of the steps and service mappings to get you started.