In my last post, I covered the basics of machine learning and the development life cycle of a machine learning project.
In this post, I will explain some frequent issues in machine learning development and how Azure Machine Learning helps you overcome them, and then walk through some basic data cleansing tasks in Azure Machine Learning.
One of the biggest problems with Machine learning development:
In the machine learning workflow, there is sometimes friction in the handover between data scientists and operations.
The models that data scientists develop are often recoded by operations, which causes translation errors:
As a result, the data scientist loses visibility into the model's performance.
How can Azure Machine Learning solve this problem?
With Azure Machine Learning, the workflow improves considerably because the operations engineer can encapsulate the model instead of recoding it, which reduces the noise in the system:
Additionally, Azure Machine Learning makes experiments more efficient by reducing the time needed to prepare the data and by simplifying the evaluation of experiments.
What are the components of Azure Machine Learning?
It is an Azure service that consists of libraries such as the Microsoft ML Spark libraries and tools such as Azure Machine Learning Workbench. These work together with IDEs and notebooks such as Visual Studio Code, PyCharm, and Jupyter, and with third-party libraries such as TensorFlow, TLC, and CNTK.
You can train and deploy models using Docker on Azure compute such as HDInsight, VMs, GPUs, and Azure Container Service, as well as on IoT devices.
Below is the complete picture of the components described above:
Apart from this, Azure Machine Learning helps data scientists in the following ways:
- You can reuse existing Python and R scripts
- Easy configuration for modeling and deployment
- Easy-to-use graphical interface
- No need to set anything up; it is ready to use, and you are not limited by local computing resources
- Azure Marketplace, where you can use existing models or publish and monetize your own
- Built-in Algorithms:
Can it help developers as well?
Yes, it does:
- Existing ML APIs that you can call directly
- ML models can easily be used in day-to-day applications
- It brings prediction capabilities to the masses and makes them available to non-experts
- Predictive models can be used to interpret the huge volume of data produced by the Internet of Things (IoT)
How do you get started with Azure Machine Learning?
To get started with Machine Learning Studio, go to https://studio.azureml.net. If you’ve signed into Machine Learning Studio before, click Sign In. Otherwise, click Sign up here and choose between the free and paid options.
Now let us take a quick example of cleansing a large data set in Azure Machine Learning.
Our task is to determine whether an image contains a snow leopard. We have a large set of images, and we need to find the ones that have a snow leopard in them.
One example of those images is shown below:
To load the data:
Create a new experiment by clicking +NEW at the bottom of the Machine Learning Studio window, select EXPERIMENT, and then select Blank Experiment:
Now, we have the image metadata, which contains a long image path along with a couple of timestamps for each image. We will load it into Azure Machine Learning:
As you can see in the image above, each image has a unique image number. So our first task is to extract those unique numbers from the long paths.
With Azure Machine Learning, this takes just a few clicks.
We will use the Derive Column by Example feature for this.
Now give the image number for only a couple of images, and the system will learn the pattern and apply it to the rest of the paths on its own. This is a form of learning from labeled examples, much like supervised learning:
Once the process is done:
As you can see, it learned to extract the image number for the rest of the images, even though some of the image names contain parentheses.
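For comparison, here is a minimal Python sketch of what a hand-written version of this extraction might look like. It is not what Derive Column by Example does internally. The sample paths, the file naming pattern, and the function name `image_number` are all assumptions made up for illustration, since the real metadata is only visible in the screenshots.

```python
import re
from pathlib import PureWindowsPath


def image_number(path):
    """Return the image number embedded in a file name, or None if there is none.

    Assumes (hypothetically) names such as 'IMG_0012.jpg' or 'IMG_0012 (2).jpg',
    where a trailing '(n)' is a copy marker and not part of the image number.
    """
    # PureWindowsPath accepts both '\\' and '/' as separators.
    stem = PureWindowsPath(path).stem
    # Drop a trailing parenthesized copy marker such as ' (2)'.
    stem = re.sub(r"\s*\(\d+\)$", "", stem)
    # Take the last run of digits that remains in the file name.
    digits = re.findall(r"\d+", stem)
    return digits[-1] if digits else None


# Hypothetical sample paths, not taken from the real data set.
sample_paths = [
    r"C:\data\snowLeopardImages\IMG_0012 (2).jpg",
    "data/otherImages/IMG_0345.jpg",
]
for p in sample_paths:
    print(p, "->", image_number(p))
```

The advantage of the by-example approach is that you never have to write or maintain a pattern like this yourself; the sketch only shows the kind of rule the tool has to infer from your examples.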
Now let us look at another data cleansing task.
In the metadata above, the image paths fall into two main folders, otherImages and snowLeopardImages:
So we will assign 0 to otherImages and 1 to snowLeopardImages, again using Derive Column by Example:
You may need to give the 0 and 1 examples more than once (around 5 times at most), because with more examples the system learns that it should put 0 against all otherImages and 1 against all snowLeopardImages.
Once the process is over, we can see the count of 0s and 1s by clicking Value count, as shown below:
The window below shows that we have 2864 images without a snow leopard and around 800 images with a snow leopard:
With very few clicks, we can complete these data cleansing tasks.
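The same labeling and counting can be sketched in plain Python. This is only an illustration under assumptions: the folder names otherImages and snowLeopardImages come from the metadata described in this post, while the sample paths and the helper name `label_from_path` are hypothetical.

```python
from collections import Counter
from pathlib import PureWindowsPath


def label_from_path(path):
    """Return 1 for snowLeopardImages, 0 for otherImages, None otherwise."""
    # Split the path into its folder and file components.
    parts = PureWindowsPath(path).parts
    if "snowLeopardImages" in parts:
        return 1
    if "otherImages" in parts:
        return 0
    return None


# Hypothetical sample paths, not taken from the real data set.
sample_paths = [
    r"C:\data\otherImages\IMG_0001.jpg",
    r"C:\data\snowLeopardImages\IMG_0002.jpg",
    r"C:\data\otherImages\IMG_0003.jpg",
]

labels = [label_from_path(p) for p in sample_paths]
# Counter plays the role of the Value count view: one tally per label.
counts = Counter(labels)
print(counts[0], counts[1])  # prints: 2 1
```

On the real metadata, a tally like this would be expected to match the numbers shown in the Value count window.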
Microsoft Research provides many ready-made libraries, but we can also use other open-source and third-party libraries.
In the next post, we will see how to integrate Python code into Azure Machine Learning to improve the accuracy and the deployment of the same snow leopard model.
Hope it helps.