This short article is a Q&A on Data Mining in SSAS (SQL Server Analysis Services), covering the fundamentals and the Time Series algorithm. The questions and proposed answers below should help readers strengthen their fundamentals of Data Mining in SSAS.
What is Data Mining?
- Data Mining means running some logic or algorithm on genuine historical data to produce predictions.
- In other words, we mine (hunt through) genuine historical data with algorithms to do prediction, forecasting, etc. in an intelligent way.
Explain the term “training an algorithm”?
- On its own, an algorithm is "dead": it is just logic with no knowledge of our data. To make it useful, it has to be trained.
- Training means running the algorithm on genuine historical data. As it runs, the algorithm learns the patterns in that data and uses them to work out likely future values.
- In short, to train an algorithm we run it on genuine data.
- A trained algorithm therefore carries some artificial intelligence derived from that data, which it applies when doing forecasting, predictions, etc.
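The idea of a dead algorithm that comes alive once it is trained can be sketched in a few lines of Python. This is only an illustration, not how SSAS works internally: the `TrendModel` class, the straight-line trend, and the sales figures are all assumptions made up for the example.

```python
from statistics import mean


class TrendModel:
    """A straight-line trend: value = intercept + slope * period."""

    def __init__(self):
        # Before training, the algorithm knows nothing about our data.
        self.slope = None
        self.intercept = None

    def train(self, periods, values):
        """Fit the line to historical data with ordinary least squares."""
        p_mean, v_mean = mean(periods), mean(values)
        covariance = sum(
            (p - p_mean) * (v - v_mean) for p, v in zip(periods, values)
        )
        variance = sum((p - p_mean) ** 2 for p in periods)
        self.slope = covariance / variance
        self.intercept = v_mean - self.slope * p_mean
        return self

    def predict(self, period):
        """Predict the value for a period; only works after training."""
        if self.slope is None:
            raise RuntimeError("The model has not been trained yet")
        return self.intercept + self.slope * period


# Hypothetical historical data: yearly sales for 2019 to 2022.
years = [2019, 2020, 2021, 2022]
sales = [100.0, 110.0, 120.0, 130.0]

model = TrendModel().train(years, sales)
forecast_2023 = model.predict(2023)
print(forecast_2023)
```

Calling `predict` on an untrained `TrendModel` raises an error, which mirrors the point above: the logic alone is of no use until it has run on historical data.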
In what scenarios would we use the Time Series Algorithm?
- The Time Series Algorithm is used when we want to forecast values over time, such as annual sales, annual sales profit and similar figures.
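As a rough illustration of this kind of forecasting, the sketch below fits a very simple autoregressive rule (each year's value is predicted from the previous year's) and rolls it forward. The Microsoft Time Series algorithm is considerably more sophisticated, so treat this only as a simplified stand-in; the function names and the sales figures are assumptions for the example.

```python
def fit_ar1(series):
    """Least-squares fit of series[t] = a + b * series[t - 1]."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    prev_mean = sum(prev) / n
    curr_mean = sum(curr) / n
    variance = sum((x - prev_mean) ** 2 for x in prev)
    covariance = sum(
        (x - prev_mean) * (y - curr_mean) for x, y in zip(prev, curr)
    )
    b = covariance / variance
    a = curr_mean - b * prev_mean
    return a, b


def forecast(series, steps):
    """Predict the next `steps` values, feeding each prediction back in."""
    a, b = fit_ar1(series)
    history = list(series)
    predictions = []
    for _ in range(steps):
        next_value = a + b * history[-1]
        history.append(next_value)
        predictions.append(next_value)
    return predictions


# Hypothetical annual sales that grow by 10% every year.
annual_sales = [100.0, 110.0, 121.0, 133.1, 146.41]

next_two_years = forecast(annual_sales, steps=2)
print(next_two_years)
```

Because the sample data grows by a steady 10% per year, the fitted rule continues that growth for the forecast years.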
What type of project will we select for Data Mining?
- For Data Mining we select the regular Multidimensional project, the same project type we use for our other SSAS work.
Is a Cube compulsory for doing Data Mining?
- Strictly speaking, a Cube is not compulsory: Data Mining can also be done without one, directly on relational data.
- In professional practice, however, a Cube is normally used because reading data from the Cube is faster. So a Cube is an industry best practice for Data Mining rather than a hard requirement.
Explain the Sequence Clustering Algorithm?
- The Sequence Clustering Algorithm is a combination of sequence analysis and clustering.
- It identifies clusters of similarly ordered events in a sequence, and these clusters can be used to predict events based on their characteristics.
- For example, consider a shop. Some people call first, then inquire, and then go to the shop to purchase. Some people see an advertisement and then go to the shop to purchase. Some people go directly to the shop to purchase. The algorithm finds which sequence of events leading to a purchase is the most common and analyzes it.
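The shop example can be sketched in Python by grouping customers who followed the same ordered path and by counting how often one event follows another. This is a deliberately simplified picture: the actual algorithm combines Markov chain analysis with clustering, whereas the code below only counts exact matches. The event names and journeys are invented for the example.

```python
from collections import Counter

# Hypothetical customer journeys; each tuple is an ordered sequence of events.
journeys = [
    ("call", "inquire", "visit", "purchase"),
    ("advertisement", "visit", "purchase"),
    ("visit", "purchase"),
    ("advertisement", "visit", "purchase"),
    ("call", "inquire", "visit", "purchase"),
    ("advertisement", "visit", "purchase"),
]


def group_by_sequence(journeys):
    """Count how many customers followed each exact ordered path."""
    return Counter(journeys)


def transition_counts(journeys):
    """Count how often one event is immediately followed by another."""
    counts = Counter()
    for journey in journeys:
        counts.update(zip(journey, journey[1:]))
    return counts


path_counts = group_by_sequence(journeys)
most_common_path, frequency = path_counts.most_common(1)[0]
transitions = transition_counts(journeys)

print(most_common_path, frequency)
print(transitions[("visit", "purchase")])
```

Here the advertisement path is the most frequent route to a purchase, and the transition counts show which event most often precedes another.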
Explain Continuous & Key Time Data Types?
- When we configure a mining structure, each column is given a content type. Two of them are Continuous and Key Time.
- Continuous means the column holds numeric values on a continuous scale. For example, Sales Amount can be 100, 100.10, 100.20, 101, 101.5, etc.
- Key Time marks the column that orders the data in time. For example, Sales Year would hold values such as 2019, 2020, 2021, while a daily column would hold consecutive days such as Saturday, Sunday, Monday, Tuesday, etc. Each value is a distinct point on the time scale.
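To make the two roles concrete, here is a small Python sketch in which one field plays the Key Time role and another plays the Continuous role. The `SalesCase` class and the check function are hypothetical names for illustration, and the uniqueness check assumes a single series with one value per time point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesCase:
    sales_year: int      # plays the Key Time role: orders the cases in time
    sales_amount: float  # plays the Continuous role: any numeric value


def check_time_series_cases(cases):
    """Return the cases ordered by time, rejecting duplicate time keys."""
    years = [case.sales_year for case in cases]
    if len(set(years)) != len(years):
        raise ValueError("Key Time values must be unique within one series")
    return sorted(cases, key=lambda case: case.sales_year)


# Hypothetical cases, deliberately out of order.
cases = [
    SalesCase(2021, 101.5),
    SalesCase(2019, 100.0),
    SalesCase(2020, 100.10),
]

ordered = check_time_series_cases(cases)
print([case.sales_year for case in ordered])
```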
What is Model in Data Mining?
- A model is a representation that helps us understand how something will look or behave in the real world. For example, before building a car we first build a model of the car.
- In SSAS the algorithm is dead by itself. When it runs through the historical data it comes alive, that is, it gets trained and acquires a thought process, which it then uses to predict.
- So a Model means our algorithm plus our historical data. It captures how the algorithm has been trained and lets it predict in an intelligent way.
- In short: Model = Algorithm trained on historical data.
What is the query language for Data Mining?
- The query language for Data Mining is DMX (Data Mining Extensions).
Explain Deviation in Time Series?
- Deviation in Time Series means the point at which the algorithm starts changing the trend, that is, where the values start to move away from it.
Our algorithm is not showing the proper trend. What can be the reason?
- The usual reason is bad data, in other words historical data that is not genuine.
Where does the data mining model get deployed?
- The data mining model gets deployed to our Analysis Services instance, where we can see it by connecting to Analysis Services from SQL Server Management Studio.
Can we deploy data mining in Tabular?
- No, data mining cannot be deployed to a Tabular instance. It requires Analysis Services running in Multidimensional mode.
Explain the Predict Function in DMX?
- The Predict function in DMX performs prediction on a particular column (scalar or table) in a DMX query.
- We pass it the column reference and, optionally, the number of items we want to predict. For time series models, the related PredictTimeSeries function also accepts a start and an end step so that only the predictions between them are returned.
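For time series models, DMX provides PredictTimeSeries, which takes a column reference and either a number of steps or a start and end step. The Python helper below only assembles such a query as text so that the two forms can be compared side by side. The helper itself, the model name `SalesForecast` and the column name `Sales Amount` are hypothetical, and the sketch does not connect to Analysis Services.

```python
def predict_time_series_query(model, column, steps=None, start=None, end=None):
    """Build a DMX PredictTimeSeries query string.

    Pass either `steps` (predict the next N steps) or both `start` and
    `end` (return only the predictions from step `start` to step `end`).
    """
    if steps is not None and (start is not None or end is not None):
        raise ValueError("Use either steps, or start and end, not both")
    if steps is not None:
        arguments = str(steps)
    elif start is not None and end is not None:
        if start > end:
            raise ValueError("start must not be greater than end")
        arguments = f"{start}, {end}"
    else:
        raise ValueError("Provide steps, or both start and end")
    return (
        f"SELECT PredictTimeSeries([{model}].[{column}], {arguments}) "
        f"FROM [{model}]"
    )


# Hypothetical model and column names.
next_five = predict_time_series_query("SalesForecast", "Sales Amount", steps=5)
steps_three_to_five = predict_time_series_query(
    "SalesForecast", "Sales Amount", start=3, end=5
)

print(next_five)
print(steps_three_to_five)
```

The first query asks for the next five predicted values; the second asks only for the third to fifth predicted steps.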