Overfitting in Machine Learning: Machine Learning Questions & Answers Part – II


I have started a series of Machine Learning Questions and Answers; you can find the first post here.

Let us see some more questions and answers.

1) What is Overfitting in Machine Learning? OR Is Overfitting good or bad in Machine Learning?

Answer – 

Overfitting is not good for Machine learning projects.

As the name suggests, it means fitting a model to the data more closely than required.

One liner for Overfitting:

When you have memorized something but have not learned it, you are not prepared for the future


  • Overfitting occurs when we capture the low-level details of a particular data set but fail to capture its higher-level, more abstract patterns. This creates problems with future examples
  • In other words, Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data
  • As per Wikipedia: Overfitting is the production of an analysis that corresponds too closely or exactly to a particular set of data, and may, therefore, fail to fit additional data or predict future observations reliably.

Real life example of Overfitting:

  1. You started reading a chemistry book
  2. You read every single low-level detail like every single letter and digit
  3. You read the whole book from start to end
  4. But you cannot see the bigger picture, such as how the different topics relate to one another
  5. Thus you have memorized the book but have not learned it
  6. So if someone asks you questions from that book, you may not be able to answer all of them

Let us see some examples:

You have some data and you plot it on the X and Y axes as below:


Suppose you try to separate the X points from the 0 points with a boundary like the one below:


Though it is technically valid, it is a poor solution and may create problems in the future. This can happen when you take the data too literally. In the above example, the boundary works perfectly for the current data, but it is unlikely to work as well once new data is added.

Instead, you could use something simple, as below:


With this simpler boundary, newly added points are more likely to fall on the correct side than with the boundary we drew above.

In short, prepare for the future instead of giving too much attention to the present.

Let us take some visual examples:

For example, you want to create a program which can identify a lion.

In this case, we are overfitting when we collect overly specific data, such as the exact height and width of the lion and other very minor, deep details as below:


The program will then perfectly identify a lion whose details closely match those mentioned above, but it becomes useless in case of new examples like a White Lion.


The program would fail to identify the White Lion because it was trained on unnecessary deep details instead of more abstract and useful ones.
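The difference between remembering and learning can be shown with a tiny sketch. All the data below is invented for illustration, and the two models (a nearest-neighbour "memorizer" and a single threshold rule) are stand-ins chosen only to keep the example short.

```python
# Toy illustration of overfitting. All data here is invented for the example.
# True pattern: label is 1 when x >= 5, else 0. Two training labels are noisy.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
test = [(1.5, 0), (2.9, 0), (3.2, 0), (6.8, 1), (7.3, 1), (8.5, 1)]


def memorizer(x):
    """Overfit model: copy the label of the closest training point."""
    nearest_x, nearest_label = min(train, key=lambda point: abs(point[0] - x))
    return nearest_label


def simple_rule(x):
    """Simple model: one threshold that captures the overall pattern."""
    return 1 if x >= 5 else 0


def accuracy(model, data):
    """Fraction of examples in `data` that `model` labels correctly."""
    correct = sum(1 for x, label in data if model(x) == label)
    return correct / len(data)


memorizer_train = accuracy(memorizer, train)
memorizer_test = accuracy(memorizer, test)
simple_train = accuracy(simple_rule, train)
simple_test = accuracy(simple_rule, test)

print("memorizer  : train", memorizer_train, "test", round(memorizer_test, 2))
print("simple rule: train", simple_train, "test", simple_test)
```

On this toy data the memorizer is perfect on the training set, noise included, but does worse than the simple rule on the new points.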

2) How can we avoid Overfitting?

Answer – 

We can detect and reduce Overfitting by splitting the data three ways:


  • Training set
  • Cross-validation set
  • Test set

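This way we make sure that the model does not depend on any one particular set: it is trained on the first, tuned on the second and finally checked on the third, which it has never seen.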
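A three-way split like this can be sketched in a few lines. The 60/20/20 proportions, the function name and the fixed seed are assumptions made for the example, not a recommendation.

```python
import random


def three_way_split(examples, train_frac=0.6, validation_frac=0.2, seed=42):
    """Shuffle a copy of `examples` and cut it into train/validation/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever is left over
    return train, validation, test


train_set, validation_set, test_set = three_way_split(range(10))
print(len(train_set), len(validation_set), len(test_set))
```

Shuffling before cutting matters when the data is stored in some order (for example by date or by label), otherwise the three sets would not look alike.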

Apart from this, you can:

  • Keep it Simple
  • Feature selection: consider using fewer feature combinations and decreasing the number of numeric attribute bins
  • Increase the amount of regularization used
  • Increase the amount of training data examples
  • Limit the number of passes over the existing training data (too many passes let the model memorize it)

Hope it helps



Classification and Clustering : Machine Learning Questions & Answers Part – I


I have recently started posting about Machine Learning and I have received some very positive feedback from people who like the way I explain Machine Learning topics in simple words.

As per the demand, I am starting a series of Machine Learning Questions and Answers.

I will keep on posting questions along with their answers here as I come across them.

So let us start:

1) What are the types of Machine Learning and what is the difference between Supervised Machine Learning and UnSupervised Machine learning?

Answer – For this, I have already written a post which you can find here.

2) What is the difference between Classification and Clustering?

Answer –

One liner for Classification:

Classifying data into pre-defined categories

One liner for Clustering:

Grouping data into a set of categories

Key difference:

Classification takes data and puts it into pre-defined categories, whereas in Clustering the set of categories into which you want to group the data is not known beforehand.

Let us go a bit deeper into Classification first:


  • In classification, you start with one instance (one object) to be classified
  • You classify it into pre-defined categories, which are nothing but the labels
  • You do this based on training data which has already been classified
  • For example, in sentiment analysis you would classify one comment as positive or negative, and you would do this based on a set of training data which has already been classified into positive and negative comments
  • So if you understand Supervised Machine Learning, you will realize that Classification is a form of Supervised Machine Learning

Simple understanding with an example:

You give your algorithm (your friend) some data (a set of people), called the Training data, and teach it which data corresponds to which label (Male or Female). Then you point your algorithm to certain other data, called the Test data, and ask it to determine whether each person is Male or Female. The better your teaching, the better its predictions.

Some real-life examples:

  • Whether an e-mail is spam or not
  • Whether a comment on a Facebook post or a Tweet on Twitter is positive or negative
  • Whether a trading day is an up-day or a down-day
  • Handwritten Digit Recognition
  • Speech Recognition
  • Image Recognition

Examples of Classification Algorithms:

  • K-Nearest Neighbors
  • Decision Trees
  • Bayesian Classifier

Steps of Classification Setup:


  1. The problem has to be defined first
  2. Then you would represent your data in the form of numerical attributes called Features. This is done both for the training data, which has already been classified, and for the test data, which has to be classified in the future
  3. You would take your training data and feed it into a classification algorithm to train a model
  4. Take a new instance that needs to be classified, or take the test data, and pass it to the classifier to classify
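The steps above can be sketched with a small K-Nearest Neighbors classifier, one of the algorithms listed earlier. The two features and all the data points are hypothetical, chosen only to make the example readable.

```python
from collections import Counter

# Hypothetical training data: (number of links, number of exclamation marks)
# for a handful of e-mails that have already been labeled.
training_data = [
    ((0, 1), "not spam"),
    ((1, 0), "not spam"),
    ((1, 2), "not spam"),
    ((7, 9), "spam"),
    ((8, 6), "spam"),
    ((9, 8), "spam"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(features, k=3):
    """Label `features` by majority vote among its k nearest training examples."""
    neighbours = sorted(
        training_data, key=lambda example: squared_distance(example[0], features)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


print(classify((2, 1)))
print(classify((8, 8)))
```

K-Nearest Neighbors has no separate fitting step: "training" amounts to storing the labeled examples, and the work happens when a new instance is classified.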

Now let us see something more about Clustering:


  • Instead of taking a single instance (as in Classification above), we take a large number of instances
  • We divide these instances into groups
  • Whereas Classification has pre-defined categories, in Clustering the groups are unknown beforehand
  • Basically, we do not know what to call the clusters before they are formed, because until then we do not know what the members of each cluster will have in common
  • Yes, it is Unsupervised Machine Learning

Simple understanding with an example:

In Clustering, you provide the data (a set of people) to the algorithm (your friend) and ask it to group the data.

Now, it is up to the algorithm to decide what the best way to group the data is (gender, color or age group).

Again, you can definitely influence the decision made by the algorithm by providing extra inputs.

Some Real-life examples:

  • Dividing a set of articles into groups that share the same theme (we do not know the themes of the articles ahead of time)
  • Identifying groups of houses according to their house type, value and geographical location
  • Grouping earthquake epicenters to identify dangerous zones
  • Placing telephone towers in a new city using clustering such that all users receive optimum signal strength

Examples of Clustering Algorithms:

  • K-Means Algorithm
  • Expectation maximization

Steps of Clustering Setup:


  1. You would start with the problem statement, which is the dataset that needs to be clustered
  2. Then you would represent the points in that dataset using features
  3. There is no training step here
  4. You would feed the data directly into the Clustering algorithm to find the final clusters
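As a rough illustration of these steps, here is a bare-bones K-Means sketch. The points are made up, and starting from the first k points is a simplification assumed for the example; real implementations choose the starting centroids more carefully.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, iterations=10):
    """Group `points` into k clusters; returns (centroids, clusters)."""
    centroids = list(points[:k])  # naive start: the first k points (an assumption)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up, unlabeled 2-D points: no categories are given in advance.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

Note that the algorithm only returns groups. Deciding what each group means, and what to call it, is still up to you.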

3) Can Classification and Clustering go hand in hand OR Can Classification and Clustering work together?

Answer – Yes they can.

For example, you have a set of articles -> you divide these articles into clusters based on their tags -> the articles are now grouped by tag

Now you have a new article -> the article is sent to the Classifier, which assigns it one of the tags discovered during the Clustering step above -> the tag is identified

So basically, the articles that were grouped by tag into different clusters become the training data for the Classifier.
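A minimal sketch of this hand-off, assuming the clustering step has already run: each cluster is summarized by its centre, and a new article gets the tag of the nearest centre (a nearest-centroid classifier, picked here only for brevity). The tags, features and numbers are all hypothetical.

```python
# Hypothetical output of a clustering step: articles grouped under the tags
# that were discovered, each article reduced to two made-up features
# (mentions of "python", mentions of "football").
clustered_articles = {
    "programming": [(9, 0), (7, 1), (8, 0)],
    "sports": [(0, 8), (1, 9), (0, 7)],
}


def centroid(points):
    """Coordinate-wise average of a list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# "Training": summarize each cluster by its centre.
tag_centroids = {tag: centroid(points) for tag, points in clustered_articles.items()}


def assign_tag(article_features):
    """Classify a new article with the tag whose cluster centre is closest."""
    return min(
        tag_centroids,
        key=lambda tag: squared_distance(article_features, tag_centroids[tag]),
    )


print(assign_tag((6, 2)))
print(assign_tag((1, 6)))
```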


  • Classification assigns a category to one new item based on already labeled items, while Clustering takes a bunch of unlabeled items and divides them into categories
  • In Classification, the categories/groups are known beforehand, while in Clustering they are unknown beforehand
  • In Classification, there are 2 phases (a training phase and then a test phase), while in Clustering there is only 1 phase (dividing the data into clusters)
  • Classification is Supervised Learning while Clustering is Unsupervised Learning.

Hope it helps.




Linear Regression in simple words: Machine Learning Algorithm part I


I have written a post in which I explained Machine Learning in simple words. You can find the post here.

The heart of Machine Learning is its set of algorithms. Algorithms play a very important role in creating models. Nowadays, languages like Python and R and tools like Azure and AWS have made our lives much easier when we want to create Machine Learning projects.

But one should understand the algorithm instead of just using the pre-existing libraries which these languages and tools already provide.

In this article, I will try to explain Linear Regression.

Let us go back to our school days. You might remember this equation:

Y = MX + B


  • B is the intercept,
  • M is the slope, which can be positive or negative
  • X is Independent variable
  • Y is dependent variable

So if you have X, you can figure out what Y is.

In simple words, “Linear Regression” is a method to predict a dependent variable (Y) based on the values of one or more independent variables (X).

Situation 1:

If X increases and Y also increases, it is called a positive relationship:


Situation 2:

If X increases but Y decreases, it is called a negative relationship:


Let us see how to create a regression line:

To perform regression, we need a set of observations. We can plot those observations on the X and Y axes:


Once all the observations are plotted, we can draw the line which best fits those observation dots. This is called the Regression line:


As we know, the observations will never all lie on a straight line, so there is always a difference between the estimated value and the actual value. In the end, we need to minimize this difference, which we will call the error:


Our target is to minimize these errors. The line above has large errors when we compare the actual values with the estimated values:
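One common way to minimize these errors is the least-squares method, which picks the slope and intercept that make the sum of squared errors as small as possible. The sketch below uses made-up observations, and the function names are my own.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Total of (actual - estimated)^2 over all observations."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Made-up observations, only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = fit_line(xs, ys)
print("slope:", round(m, 2), "intercept:", round(b, 2))
print("error of fitted line:", round(sum_squared_errors(xs, ys, m, b), 2))
print("error of a flat line:", round(sum_squared_errors(xs, ys, 0, 4), 2))
```

The last two lines compare the fitted line with a flat line at Y = 4: the fitted line has the smaller total error, which is what "best fit" means here.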


Let us take some examples to understand the Positive relationship and Negative relationship.

For example, if we study more, our grades will increase:


It is a positive relationship where:

  • y is the estimated grade
  • x is the study time
  • b0 is the y-intercept, which we can derive mathematically
  • b1 is the slope, which can also be derived mathematically

In simple words, if your study time is 0, the estimated grade is 10% (the intercept b0). If you increase your study time to 10 hours, the estimated grade rises above 10%, and it can be calculated using the equation above.

Though there might be a lot of other features which affect the grades, for simplicity we have considered study time as the only feature. This is also called Univariate Linear Regression.

Multiple Linear Regression

We can enlarge our feature set by selecting more parameters, like the IQ of the person, their interest in the subject etc. For example, we can plot the grades against both the person's interest in a particular subject and the study time on a single graph, where the vertical axis shows the grades and the two horizontal axes show the interest in the subject and the study time:

In this case, we can again fit a predictor to the data. But instead of drawing a line through the data, we have to draw a plane through the data, because the function that best predicts the grades is a function of two variables.
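With two features, the predictor has one coefficient per feature plus an intercept. The sketch below only evaluates such a plane; the coefficient values are hypothetical and were not fitted to any data.

```python
# Hypothetical coefficients, chosen only to show the shape of the formula.
B0 = 10.0  # intercept: estimated grade when both features are 0
B1 = 2.0   # change in grade per extra hour of study
B2 = 3.0   # change in grade per extra point of interest in the subject


def estimate_grade(study_hours, interest):
    """Evaluate the plane: grade = B0 + B1 * study_hours + B2 * interest."""
    return B0 + B1 * study_hours + B2 * interest


print(estimate_grade(study_hours=10, interest=5))
```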

Now, let us see the Negative relationship:

If you spend more time on Facebook, grades would decrease:


As you can see above, x is multiplied by a negative slope (-b1), so as x increases, y decreases.

X is an independent variable which we can manipulate, control and change, and Y is a dependent variable which is nothing but the outcome of X’s activity.

Let us see some calculations for the study example:


Hope it helps.


First look of Azure Machine Learning : Azure Machine Learning part II


In my last post, I explained some very basic information about Machine Learning as well as the development life cycle of a Machine Learning project.

In this post, I will explain some frequent issues in Machine Learning development and how you can overcome them using Azure Machine Learning, along with some basic data cleansing tasks in Azure Machine Learning.

One of the biggest problems with Machine learning development:

In the Machine Learning workflow, there is sometimes friction in the handover between the Data scientist and Operations.

The models which are developed are often recoded, which causes translation errors:


As a result, the Data scientist loses visibility into the model's performance.

How can Azure Machine Learning solve this big problem?

With Azure Machine Learning, the workflow is dramatically enhanced because it enables the operations engineer to encapsulate the model instead of recoding it, which reduces the noise in the system:


Additionally, Azure Machine Learning makes experiments more efficient by reducing the time needed to prepare the data as well as by simplifying the evaluation of experiments.

What are the components of Azure Machine learning?

It is an Azure service that consists of libraries, such as the Microsoft ML Spark libraries, and tools, such as Azure Workbench. These work together with IDEs like Visual Studio Code, PyCharm and Jupyter, and with third-party libraries like TensorFlow, TLC and CNTK.

You can train as well as deploy using Docker on Azure compute targets such as HDInsight, VMs, GPUs and Azure container services, as well as on IoT devices.

Below is the complete picture of the things I explained above:


Apart from this, Azure Machine Learning can help Data scientists as below:

  • You can reuse some existing Python, R scripts
  • Easy configuration for modeling and deployment
  • Easy to use graphical interface
  • No need to set up anything: it is ready to use, and computing resources are no longer a limitation
  • Azure marketplace to utilize existing models or publish/monetize your new models
  • Built-in Algorithms:


Can it help developers as well?

Yes, it does:

  • Very helpful existing ML APIs which you can use
  • ML models can easily be used in day-to-day applications
  • It brings prediction capabilities to the masses and makes them available to non-experts
  • Predictive models can be used to interpret the huge amounts of data that result from the Internet of Things (IoT)

How to get into Azure Machine Learning?

To get started with Studio, go to https://studio.azureml.net. If you’ve signed into Machine Learning Studio before, click Sign In. Otherwise, click Sign up here and choose between free and paid options.

Now let us take a quick example of Data cleansing of some large data in Azure Machine learning.

For example, our task is to find whether an image contains a snow leopard or not. We have a bunch of images, and we need to find which of them have a snow leopard in them.

One of the examples of those images is as below:


To load the data:

Create a new experiment by clicking +NEW at the bottom of the Machine Learning Studio window, select EXPERIMENT, and then select Blank Experiment:


Now, we have a set of image metadata which contains the long image path along with a couple of timestamps. We will load it into Azure Machine Learning:


As you can see in the above image, every image has a unique image number. So our first task is to extract those unique numbers from the long paths.

With Azure Machine Learning, this can be done with just a few clicks.

We will use the Derive Column by Example feature for this.


Now just give the image number for only a couple of images, and the system will learn from them and perform the task for the rest of the paths on its own. It is one type of Supervised learning:


Once the process is done:


As you can see, it learned to extract the image number from the rest of the images, even though some of the file names contain parentheses.

Now let us see another example of cleansing the data task.

In the above metadata, the image paths fall mainly into 2 folders, one named otherImages and another named snowLeopardImages:


So we will assign 0 to otherImages and 1 to snowLeopardImages, using Derive Column by Example again:



You may need to give the 0 and 1 examples more than once (around 5 times at most), because with more examples the machine learns that it should put 0 against all otherImages and 1 against all snowLeopardImages.
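For comparison, the two cleansing steps above (extracting the image number and deriving a 0/1 label from the folder name) could also be scripted by hand. The paths below are hypothetical, since the real metadata is not reproduced here, and the regular expression assumes the image number sits just before the file extension.

```python
import re
from collections import Counter

# Hypothetical paths; the real metadata file is not reproduced here.
paths = [
    "C:/data/snowLeopardImages/IMG_0042.jpg",
    "C:/data/otherImages/IMG_0107 (2).jpg",
    "C:/data/otherImages/IMG_0311.jpg",
]

# Digits just before the extension, ignoring an optional " (n)" copy suffix.
IMAGE_NUMBER = re.compile(r"(\d+)(?:\s*\(\d+\))?\.\w+$")


def image_number(path):
    """Return the image number embedded in the file name, or None if absent."""
    match = IMAGE_NUMBER.search(path)
    return int(match.group(1)) if match else None


def has_snow_leopard(path):
    """1 for images under snowLeopardImages, 0 for everything else."""
    return 1 if "snowLeopardImages" in path else 0


numbers = [image_number(p) for p in paths]
labels = [has_snow_leopard(p) for p in paths]
print(numbers)
print(Counter(labels))
```

The hand-written version needs an explicit rule for every variation in the file names, which is the work that learning from a few examples saves you.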

Once the process is over, we can see the count of 0 and 1 by clicking on Value count as shown below:


As the window below shows, we have 2864 images without a snow leopard and around 800 images with a snow leopard:


With very few clicks, we can perform data cleansing tasks.

Microsoft Research has many pre-existing libraries, but we can also use other open source and third-party libraries.

In the next post, we will see how we can integrate Python code into Azure Machine Learning to improve the accuracy of the same Snow Leopard model and to deploy it.

Hope it helps.




Machine Learning in simple words: Azure Machine Learning part I


Nowadays Machine Learning is a very hot topic. Everyone is talking about it and discussing how it can be useful in their business or in their career.

Machine Learning in simple words

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that machines should be able to learn and adapt through experience.

Types of Machine Learning

Supervised Learning
  • When there is a pre-defined, labeled dataset to train your program
  • Based on its training data, the program can make accurate decisions when given new data
  • So it is like learning with a teacher
  • Examples are classification and regression
  • For example, you receive a bunch of flowers with labels, and your program learns to identify flowers on the basis of that labeling
Unsupervised Learning
  • When there is no teacher to train the program
  • When your program is smart enough to automatically find patterns and relationships in a dataset that has no labels
  • In this kind of learning, the program uses no past/prior knowledge about the data and groups it “on-the-go”
  • Examples are clustering and association
  • For example, you receive flowers without labels, so the program has to group similar flowers together on its own
Reinforcement learning
  • It is a trial-and-error kind of learning
  • The program learns from its own experience
  • A software program performs a defined task and learns to do it optimally by trial and error

Where it is being used

Many big industries have already started implementing Machine Learning in their business.

For example, I recently participated in a hackathon at a well-known bank where the themes were mainly Machine Learning and AI.

One example is Mobile Check Deposits: take a picture of your filled-in cheque and upload it to your account. There is no need to physically visit the bank and wait for the cheque to be deposited in your account. It saves time and is easier to use. Machine Learning can also be used for fraud detection.

This is just one example, but there are many other examples:

  • Self-driving cars
  • Fraud prevention techniques
  • Air traffic control
  • Uber uses Machine Learning to improve its services
  • Social networks like Facebook use Machine Learning; for example, when you upload an image, it automatically suggests whom you should tag in the picture
  • Pinterest can recommend similar pins based on the image you uploaded
  • Snapchat introduced facial filters, called Lenses. These filters track facial movements, allowing users to add animated effects or digital masks that adjust when their faces move
  • Online shopping, where suggestions are based on the user’s previous interests
  • Smart personal assistants like Alexa, Cortana, Siri and a lot more

At this moment, many sensors and other devices are collecting data that will be used in Machine Learning projects.

What’s required to create good machine learning systems?

  • Data preparation capabilities
  • Algorithms – basic and advanced
  • Automation and iterative processes
  • Scalability
  • Ensemble modeling
  • Easy and frequent deployments

Machine learning project Lifecycle

It basically involves 3 teams working together:


First, the Data scientist acquires and transforms the data, building a deep understanding of it that allows them to build a model:


Once the model is chosen, the Operations Engineer deploys it and sets up monitoring and management in the production environment:


Finally, the Developers embed programmatic access to this deployed model in their code, exposing it as an API:


These APIs can be accessed from the outside world.

For example, Microsoft Cognitive Services has an open Vision API. Have a look here if you require more information on this.

In my next post, I will explain some frequent issues in Machine Learning development and how you can overcome them using Azure Machine Learning. (* Update – The post is here)

Hope it helps.

ErrorCode = ‘0x80004005 : 80008083: .Net Core 2.0 + IIS exception

Now that .Net Core has been released, many people have started building applications with it.

One exception which people frequently get while deploying a .Net Core application (created with Visual Studio 2017) on IIS is as below:

Application ‘<IIS path>’ with physical root ‘<Application path>’ failed to start process with commandline ‘”dotnet” .\MyApp.dll’, ErrorCode = ‘0x80004005 : 80008083.

Reason for the exception?

This exception occurs when the required runtime is not installed on the server, typically after the web application was recently moved to Visual Studio 2017. VS2017 RC ships with a new version of the .NET Core SDK, while your server has a different .NET version installed.

Meaning of error code:

  • 0x80008083 – code for version conflict.
  • 0x80004005 – file is missing or cannot be accessed

So the error means a different version of dotnet needs to be installed on the server.


.Net Core 1.0 needs to be installed on the server.

Steps are as below:

  • Stop IIS
  • Install .Net core 1.0 on the server which you can find here.
  • Start IIS again

The above error should not occur after this.


I have explained this for .Net Core 1.0, but the fix depends on the version. For example, if you are deploying a .Net Core 2.0 application, you need to install the corresponding 2.0 SDK instead.

Hope it helps.