# SharePoint Orange

Data Science Hadoop HDFS HBase Hive Pig Chukwa MapReduce EC2 SharePoint Spark Storm Kafka Docker


## Saturday, January 2, 2016

### Algorithm : Types, Classification and Definition

- Simple recursive algorithms
- Backtracking algorithms
- Divide and conquer algorithms
- Dynamic programming algorithms
- Greedy algorithms
- Branch and bound algorithms
- Brute force algorithms
- Randomized algorithms
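
To make two of these categories concrete, here is a minimal sketch that solves the same problem both as a simple recursive algorithm and as a dynamic programming algorithm. The Fibonacci sequence and the function names `fib_recursive` and `fib_dynamic` are illustrative choices, not something taken from the list above.

```python
def fib_recursive(n: int) -> int:
    """Simple recursive algorithm: recomputes the same subproblems many times."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_dynamic(n: int) -> int:
    """Dynamic programming: builds the answer bottom-up, reusing earlier results."""
    previous, current = 0, 1
    for _ in range(n):
        previous, current = current, previous + current
    return previous


if __name__ == "__main__":
    # Both approaches agree on the result; only the amount of work differs.
    for n in range(10):
        print(n, fib_recursive(n), fib_dynamic(n))
```

The recursive version makes an exponentially growing number of calls, while the bottom-up version does one loop iteration per step.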

**Complexity of Algorithms :**

- Constant - O(1)
- Logarithmic - O(log(N))
- Linear - O(N)
- Quadratic - O(N*N)
- Cubic - O(N*N*N)
- ...
- Polynomial - O(N^K) for a constant K
- Exponential - O(2^N), or O(K^N) for a constant K > 1
- Factorial - O(N!), which grows even faster than exponential
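
As a rough illustration of how these classes differ in practice, the sketch below counts the steps taken by a linear scan, O(N), and a binary search, O(log(N)), over the same sorted list. The helper names and the list size of 1024 are assumptions made for the example.

```python
def linear_search_steps(items, target):
    """Count the comparisons a linear scan makes before finding target: O(N)."""
    steps = 0
    for value in items:
        steps += 1
        if value == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count the probes a binary search makes on a sorted list: O(log N)."""
    low, high = 0, len(sorted_items) - 1
    steps = 0
    while low <= high:
        steps += 1
        middle = (low + high) // 2
        if sorted_items[middle] == target:
            break
        if sorted_items[middle] < target:
            low = middle + 1
        else:
            high = middle - 1
    return steps


data = list(range(1024))  # already sorted
linear = linear_search_steps(data, 1023)
binary = binary_search_steps(data, 1023)

if __name__ == "__main__":
    print("linear steps:", linear)
    print("binary steps:", binary)
```

Doubling the list size doubles the worst-case work for the linear scan but adds only one more probe for the binary search.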

## Tuesday, December 29, 2015

### Road to Machine Learning

- You start with a dataset to analyse. - Purchase / Social / Medical / Travel
- Many variables are typically collected. - Categorical / Continuous / Geo
- The majority of them can be irrelevant and cause noise.
- Data Mining is Statistics at Scale and Speed.
- Applications in Intelligence / Genetics / Natural Sciences / Business.
- Data Mining has its origin in Categorical data, whereas Statistics traditionally deals with Continuous data.
- A large model can overfit the training dataset, which may lead to higher prediction error in new situations.
- Consider whether each predictor variable will be available, and whether its relationship with the response will still hold, in future data.
- Cluster analysis is an example of **Unsupervised learning**.
  - Dimension Reduction
  - Association Rules
- Classification is an example of **Supervised learning**.
  - Regression, Regression Trees, Nearest Neighbour - Continuous response.
  - Logistic Regression, Classification Trees, Nearest Neighbour, Discriminant analysis and Naive Bayes methods are well suited for Categorical response.

**Data Mining** should be viewed as a process:

- Data Storage & PreProcessing
- Identify variables for investigation
- Screen the outliers and missing values from data
- Data needs to be partitioned into *training*, *test* and *evaluation* sets.
- Use *Sampling* for Large datasets.
- Visualize your data - Line, Bar, Scatter, Box, Histogram, Map, Geo
- Summary of data - Mean, Median, Mode, Standard Deviation, Correlation, Principal Components
- Apply appropriate model - Linear, Logistic, Trees, K-means ...
- Verify findings against the *evaluation* set.
- Get the insights, Apply the findings! Plan - do - check - act !!
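
The partitioning step in the process above can be sketched with the standard library alone. The 60/20/20 split, the fixed seed and the function name `partition` are assumptions chosen for illustration; the notes do not prescribe any particular proportions.

```python
import random


def partition(rows, train_fraction=0.6, test_fraction=0.2, seed=42):
    """Shuffle rows and split them into training, test and evaluation sets.

    The fractions and seed are illustrative defaults, not recommendations.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    train_end = round(len(shuffled) * train_fraction)
    test_end = train_end + round(len(shuffled) * test_fraction)
    return shuffled[:train_end], shuffled[train_end:test_end], shuffled[test_end:]


rows = list(range(100))  # stand-in for real records
train, test, evaluation = partition(rows)

if __name__ == "__main__":
    print(len(train), len(test), len(evaluation))
```

Shuffling before splitting avoids a partition that simply mirrors the order in which the data was stored, and keeping the evaluation set untouched until the end gives an honest check against overfitting.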

https://www.linkedin.com/in/alokawi (*Data Engineer, Analytics Engineer, Data Science*)

## Friday, October 16, 2015

### Correlation, Regression and Causation ?

**What is a Correlation in statistics?**

Correlation is a statistical technique that can show whether and how strongly pairs of variables are related. For example, height and weight are related; taller people tend to be heavier than shorter people. The relationship isn't perfect.
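
One common way to measure this strength is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand; the height and weight values are made-up sample data, and the function name `pearson_correlation` is chosen for the example.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical measurements, invented for this illustration.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 74, 88]
r = pearson_correlation(heights_cm, weights_kg)

if __name__ == "__main__":
    print(f"correlation between height and weight: {r:.3f}")
```

A value close to 1 means the two variables rise together almost linearly, a value near 0 means little linear relationship, and a value close to -1 means one falls as the other rises.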

**What is a Regression in statistics?**

In statistics, linear regression is an approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression.
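
For the simple case with one explanatory variable, the line y = slope * x + intercept can be fitted with ordinary least squares. The following is a minimal sketch under that assumption; the data points are invented so that they lie exactly on y = 2x + 1, and the function name is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented points lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_simple_linear_regression(xs, ys)

if __name__ == "__main__":
    print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

With real, noisy data the points will not sit exactly on the line, and the fit minimises the sum of squared vertical distances between the points and the line.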

**What is a Causation in statistics?**
