Descriptive | Predictive | Prescriptive |
---|---|---|
It provides insights into the past to answer “what has happened”. | It anticipates the future to answer “what could happen”. | It suggests various courses of action to answer “what should you do”. |
Uses data aggregation and data mining techniques. | Uses statistical models and forecasting techniques. | Uses simulation algorithms and optimization techniques to advise on possible outcomes. |
Example: An ice cream company can analyze how much ice cream was sold, which flavors were sold, and whether more or less ice cream was sold than the day before. | Example: An ice cream company can forecast how much ice cream, and which flavors, are likely to be sold the next day based on past sales. | Example: Lower prices to increase ice cream sales, or produce more or less of a specific flavor. |

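
To make the three categories concrete, here is a minimal Python sketch built on the ice cream example. The sales figures, the stock levels, the three-day averaging window, and the function names `describe`, `predict_next_day`, and `prescribe` are all assumptions made for illustration; real predictive and prescriptive work would use proper statistical models and optimization.

```python
from statistics import mean

# Hypothetical daily unit sales per flavor for one week (not from the source).
daily_sales = {
    "vanilla": [40, 42, 38, 45, 50, 61, 58],
    "chocolate": [55, 53, 57, 60, 62, 70, 68],
}


def describe(sales):
    """Descriptive: summarize what has already happened."""
    return {
        flavor: {
            "total": sum(units),
            "change_vs_previous_day": units[-1] - units[-2],
        }
        for flavor, units in sales.items()
    }


def predict_next_day(sales, window=3):
    """Predictive: naive forecast, the mean of the last `window` days."""
    return {flavor: mean(units[-window:]) for flavor, units in sales.items()}


def prescribe(forecast, stock):
    """Prescriptive: suggest how many extra units to produce per flavor."""
    return {
        flavor: max(0, round(expected) - stock.get(flavor, 0))
        for flavor, expected in forecast.items()
    }


summary = describe(daily_sales)
forecast = predict_next_day(daily_sales)
plan = prescribe(forecast, stock={"vanilla": 30, "chocolate": 80})
```

Each step consumes the previous one: the summary describes the past, the forecast estimates tomorrow, and the plan turns that estimate into a suggested action.
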
Data Mining | Data Profiling |
---|---|
It involves analyzing a pre-built database to identify patterns. | It involves analyzing raw data from existing datasets. |
It analyzes existing databases and large datasets to convert raw data into useful information. | It collects statistical or informative summaries of the data. |
It usually involves finding hidden patterns and seeking out new, useful, and non-trivial data to generate useful information. | It usually involves evaluating datasets to ensure consistency, uniqueness, and logic. |
Data mining is not designed to identify inaccurate or incorrect data values. | In data profiling, erroneous data is identified during the initial stage of analysis. |
Classification, regression, clustering, summarization, estimation, and description are some of the primary data mining tasks. | This process uses discovery and analytical methods to gather statistics or summaries about the data. |
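
The profiling column above (summaries, uniqueness, logic checks) can be sketched in a few lines of Python. This is only an illustration: the records, the field names `order_id`, `flavor`, and `units`, and the helper `profile` are hypothetical, and the rule that units must not be negative is an assumed business rule.

```python
from collections import Counter

# Hypothetical raw records; field names and values are illustrative only.
rows = [
    {"order_id": 1, "flavor": "vanilla", "units": 12},
    {"order_id": 2, "flavor": "chocolate", "units": None},
    {"order_id": 2, "flavor": "vanilla", "units": -5},
    {"order_id": 4, "flavor": None, "units": 7},
]


def profile(rows, column):
    """Return simple summary statistics for one column."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "duplicated_values": sorted(v for v, n in counts.items() if n > 1),
    }


# One summary per column, keyed by column name.
report = {column: profile(rows, column) for column in rows[0]}

# Assumed logic check: sold units should never be negative.
negative_units = [
    r["order_id"] for r in rows if r["units"] is not None and r["units"] < 0
]
```

Here the duplicated `order_id` breaks uniqueness, the missing values break completeness, and the negative `units` value fails the logic check, which is the kind of erroneous data profiling is meant to surface early.
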
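
Box plot method: a value is considered an outlier if it lies more than 1.5*IQR (interquartile range) above the top quartile (Q3) or below the bottom quartile (Q1).
Standard deviation method: a value is considered an outlier if it falls outside mean ± (3*standard deviation).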
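
Both outlier rules can be sketched with Python's standard `statistics` module. The sample data and the function names `iqr_outliers` and `sigma_outliers` are assumptions for illustration, as are the choices of the "inclusive" quartile method and the population standard deviation; other conventions give slightly different cutoffs.

```python
from statistics import mean, pstdev, quantiles


def iqr_outliers(values, k=1.5):
    """Flag values more than k*IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def sigma_outliers(values, k=3.0):
    """Flag values more than k standard deviations away from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [v for v in values if abs(v - mu) > k * sigma]


# Hypothetical measurements with one obviously extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]

by_iqr = iqr_outliers(data)
by_sigma = sigma_outliers(data)
```

The IQR rule is usually the more robust of the two, because a single extreme value inflates the mean and the standard deviation but barely moves the quartiles.
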
An n-gram is a contiguous sequence of n items, such as syllables, words, or phonemes, taken from text or speech. An n-gram model is a probabilistic model that takes the preceding items in a sequence as input and uses them to predict the next item.
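
As a rough illustration, the simplest case is a bigram (n = 2) model over words: count which word follows which, then predict the most frequent follower. The corpus and the function names `train_bigram_model` and `predict_next` below are made up for this sketch; practical models use larger n, smoothing, and far more data.

```python
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Count how often each word follows each preceding word."""
    followers = defaultdict(Counter)
    for previous, current in zip(tokens, tokens[1:]):
        followers[previous][current] += 1
    return followers


def predict_next(model, word):
    """Return (most likely next word, estimated probability), or None if unseen."""
    counts = model.get(word)
    if not counts:
        return None
    best, freq = counts.most_common(1)[0]
    return best, freq / sum(counts.values())


# Tiny hypothetical corpus, split into word tokens.
corpus = (
    "i like vanilla ice cream and i like chocolate ice cream "
    "and i like vanilla cake"
).split()

model = train_bigram_model(corpus)
prediction = predict_next(model, "like")
```

The probability is just the relative frequency of the follower among all words seen after the input word, which is the maximum-likelihood estimate for a bigram model.
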

WHERE | HAVING |
---|---|
The WHERE clause operates on row data. | The HAVING clause operates on aggregated data. |
In the WHERE clause, the filter occurs before any groupings are made. | HAVING is used to filter values from a group. |
Aggregate functions cannot be used. | Aggregate functions can be used. |

SELECT column1, column2, ...
FROM table_name
WHERE condition;
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
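
To see the two clauses side by side, here is a small sketch that runs one query against an in-memory SQLite database through Python's `sqlite3` module. The `sales` table, its columns, and its rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (flavor TEXT, city TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("vanilla", "Austin", 10),
        ("vanilla", "Austin", 30),
        ("chocolate", "Austin", 5),
        ("chocolate", "Dallas", 50),
        ("mint", "Austin", 8),
    ],
)

query = """
    SELECT flavor, SUM(units) AS total_units
    FROM sales
    WHERE city = 'Austin'      -- row filter, applied before grouping
    GROUP BY flavor
    HAVING SUM(units) > 7      -- group filter, applied after aggregation
    ORDER BY total_units DESC;
"""
result = conn.execute(query).fetchall()
```

WHERE first discards the Dallas row, so chocolate's Austin total is only 5 and the HAVING filter then drops the whole chocolate group. Swapping the aggregate condition into WHERE would be an error, since aggregates are not available at the row-filtering stage.
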
A subquery in SQL is a query within another query. It is also known as a nested query or an inner query. Subqueries are used to restrict or enhance the data queried by the main query.
SELECT name, email, phone
FROM employee
WHERE emp_id IN (
    SELECT emp_id
    FROM employee
    WHERE city = 'Texas');
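
The subquery above can be tried out against an in-memory SQLite database using Python's `sqlite3` module. This is a sketch under assumptions: the `employee` table layout and the three sample rows are invented for the example, and an `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

# In-memory database with a hypothetical employee table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ana", "ana@example.com", "555-0101", "Texas"),
        (2, "Ben", "ben@example.com", "555-0102", "Ohio"),
        (3, "Cy", "cy@example.com", "555-0103", "Texas"),
    ],
)

# The inner query runs first and yields the matching emp_id values;
# the outer query then returns contact details for those ids.
rows = conn.execute(
    """
    SELECT name, email, phone
    FROM employee
    WHERE emp_id IN (
        SELECT emp_id
        FROM employee
        WHERE city = 'Texas')
    ORDER BY emp_id;
    """
).fetchall()
```

Because both queries read the same table here, a plain `WHERE city = 'Texas'` would return the same rows; subqueries become more useful when the inner query reads a different table or computes an aggregate.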