What are precision and recall?

After the predictive model has been built, the most important question is: how good is it? Does it predict well?

Evaluating the model is one of the most important tasks in a data science project: it indicates how good the predictions are. For classification problems we very often look at metrics called precision and recall. To define them in detail, let’s quickly introduce the confusion matrix first.

The confusion matrix for binary classification is made of four simple counts:

  • True Negative (TN): the case was actually negative and was predicted negative
  • True Positive (TP): the case was actually positive and was predicted positive
  • False Negative (FN): the case was actually positive but was predicted negative
  • False Positive (FP): the case was actually negative but was predicted positive

 


 

Once you understand the confusion matrix, calculating precision and recall is easy.

 

Precision is the ratio of correctly predicted positive observations to the total predicted positive observations; in other words, what percent of the positive predictions were correct?

Precision = TP / (TP + FP)

 

Recall, also called sensitivity, is the ratio of correctly predicted positive observations to all observations in the actual positive class; in other words, what percent of the positive cases did you catch?

Recall = TP / (TP + FN)

 

There are also two more useful metrics derived from the confusion matrix: Accuracy, the ratio of correctly predicted observations to the total observations, and the F1 score, the harmonic mean of Precision and Recall: F1 = 2 * (Precision * Recall) / (Precision + Recall). Although intuitively it is not as easy to understand as accuracy, the F1 score is usually more useful than accuracy, especially if you have an uneven class distribution.

Example Python Code to get Precision and Recall:

 

from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support as score

data = datasets.load_iris()
X = data['data']
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression()
model.fit(X_train,y_train)
preds = model.predict(X_test)

precision, recall, fscore, support = score(y_test, preds)

print('precision:',precision)
print('recall:',recall)

Was the above useful? Please share with others on social media.

If you want to look for more information, check some free online courses available at coursera.org, edx.org or udemy.com.

Recommended reading list:

 

Data Science from Scratch: First Principles with Python

Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch.

If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.

Get a crash course in Python
Learn the basics of linear algebra, statistics, and probability—and understand how and when they're used in data science
Collect, explore, clean, munge, and manipulate data
Dive into the fundamentals of machine learning
Implement models such as k-nearest Neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering
Explore recommender systems, natural language processing, network analysis, MapReduce, and databases
Practical Statistics for Data Scientists: 50 Essential Concepts

Statistical methods are a key part of data science, yet very few data scientists have any formal statistics training. Courses and books on basic statistics rarely cover the topic from a data science perspective. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not.

Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R programming language, and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

With this book, you’ll learn:

Why exploratory data analysis is a key preliminary step in data science
How random sampling can reduce bias and yield a higher quality dataset, even with big data
How the principles of experimental design yield definitive answers to questions
How to use regression to estimate outcomes and detect anomalies
Key classification techniques for predicting which categories a record belongs to
Statistical machine learning methods that “learn” from data
Unsupervised learning methods for extracting meaning from unlabeled data
Doing Data Science: Straight Talk from the Frontline

Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know.

In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.

Topics include:

Statistical inference, exploratory data analysis, and the data science process
Algorithms
Spam filters, Naive Bayes, and data wrangling
Logistic regression
Financial modeling
Recommendation engines and causality
Data visualization
Social networks and data journalism
Data engineering, MapReduce, Pregel, and Hadoop
The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists

The Data Science Handbook contains interviews with 25 of the world’s best data scientists. We sat down with them, had in-depth conversations about their careers, personal stories, perspectives on data science, and life advice. In The Data Science Handbook, you will find war stories from DJ Patil, US Chief Data Officer and one of the founders of the field. You’ll learn from industry veterans such as Kevin Novak and Riley Newman, who head the data science teams at Uber and Airbnb respectively. You’ll also read about rising data scientists such as Clare Corthell, who crafted her own open source data science masters program. This book is perfect for aspiring or current data scientists to learn from the best. It’s a reference book packed full of strategies, suggestions and recipes to launch and grow your own data science career.
Introduction to Machine Learning with Python: A Guide for Data Scientists

Machine learning has become an integral part of many commercial applications and research projects, but this field is not exclusive to large companies with extensive research teams. If you use Python, even as a beginner, this book will teach you practical ways to build your own machine learning solutions. With all the data available today, machine learning applications are limited only by your imagination.

You’ll learn the steps necessary to create a successful machine-learning application with Python and the scikit-learn library. Authors Andreas Müller and Sarah Guido focus on the practical aspects of using machine learning algorithms, rather than the math behind them. Familiarity with the NumPy and matplotlib libraries will help you get even more from this book.

With this book, you’ll learn:

Fundamental concepts and applications of machine learning
Advantages and shortcomings of widely used machine learning algorithms
How to represent data processed by machine learning, including which data aspects to focus on
Advanced methods for model evaluation and parameter tuning
The concept of pipelines for chaining models and encapsulating your workflow
Methods for working with text data, including text-specific processing techniques
Suggestions for improving your machine learning and data science skills

Data Science News – handpicked articles, news, and stories from the Data Science world.

NEWS

 

Experts Predict When Artificial Intelligence Will Exceed Human Performance – Artificial intelligence is changing the world and doing it at breakneck speed. The promise is that intelligent machines will be able to do every task better and more cheaply than humans. Rightly or wrongly, one industry after another is falling under its spell, even though few have benefited significantly so far.

 

This app uses artificial intelligence to turn design mockups into source code – While traditionally it has been the task of front-end developers to transform the work of designers from raw graphical user interface mockups to the actual source code, this trend might soon be a thing of the past – courtesy of artificial intelligence.

 

Applying deep learning to real-world problems – The rise of artificial intelligence in recent years is grounded in the success of deep learning. Three major drivers caused the breakthrough of (deep) neural networks: the availability of huge amounts of training data, powerful computational infrastructure, and advances in academia.

 

How AI Can Keep Accelerating After Moore’s Law – Google CEO Sundar Pichai was obviously excited when he spoke to developers about a blockbuster result from his machine-learning lab earlier this month. Researchers had figured out how to automate some of the work of crafting machine-learning software, something that could make it much easier to deploy the technology in new situations and industries.

 

Bayesian GAN – Generative adversarial networks (GANs) can implicitly learn rich distributions over images, audio, and data which are hard to model with an explicit likelihood. We present a practical Bayesian formulation for unsupervised and semi-supervised learning with GANs. Within this framework, we use stochastic gradient Hamiltonian Monte Carlo to marginalize the weights of the generator and discriminator networks.

 

The $1700 great Deep Learning box. – After years of using a thin client in the form of increasingly thinner MacBooks, I had gotten used to it. So when I got into Deep Learning (DL), I went straight for the brand new at the time Amazon P2 cloud servers. No upfront cost, the ability to train many models simultaneously and the general coolness of having a machine learning model out there slowly teaching itself.

 

Exploring LSTMs – The first time I learned about LSTMs, my eyes glazed over. Not in a good, jelly donut kind of way. It turns out LSTMs are a fairly simple extension to neural networks, and they’re behind a lot of the amazing achievements deep learning has made in the past few years. So I’ll try to present them as intuitively as possible – in such a way that you could have discovered them yourself.

 

An Algorithm Summarizes Lengthy Text Surprisingly Well – Who has time to read every article they see shared on Twitter or Facebook, or every document that’s relevant to their job? As information overload grows ever worse, computers may become our only hope for handling a growing deluge of documents. And it may become routine to rely on a machine to analyze and paraphrase articles, research papers, and other text for you.

 

Divide and Conquer: How Microsoft researchers used AI to master Ms. Pac-Man – Microsoft researchers have created an artificial intelligence-based system that learned how to get the maximum score on the addictive 1980s video game Ms. Pac-Man, using a divide-and-conquer method that could have broad implications for teaching AI agents to do complex tasks that augment human capabilities.

 

Open Source Datasets – A large-scale, high-quality dataset of URL links to approximately 300,000 video clips that cover 400 human action classes, including human-object interactions such as playing instruments, as well as human-human interactions such as shaking hands and hugging. Each action class has at least 400 video clips. Each clip is human annotated with a single action class and lasts around 10s.

 

One Model To Learn Them All – Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. 

 

THE POWER OF GTC – GTC is the largest and most important event of the year for GPU developers. GTC and the global GTC event series offer valuable training and a showcase of the most vital work in the computing industry today – including artificial intelligence and deep learning, healthcare, virtual reality, accelerated analytics, and self-driving cars.

 

 Artificial intelligence can now predict suicide with remarkable accuracy – When someone commits suicide, their family and friends can be left with the heartbreaking and answerless question of what they could have done differently. Colin Walsh, data scientist at Vanderbilt University Medical Center, hopes his work in predicting suicide risk will give people the opportunity to ask “what can I do?” while there’s still a chance to intervene.

BOOKS

Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor

The Sentient Machine: The Coming Age of Artificial Intelligence

Out of Remote Control (The DATA Set Book 7)

COURSES

 

Learning Path: TensorFlow: The Road to TensorFlow – Discover deep learning and machine learning with Python and TensorFlow

Python for Data Structures, Algorithms, and Interviews! – Get a kick start on your career and ace your coding interviews!

Python for Data Science by UC San DiegoX – Learn to use powerful, open-source, Python tools, including Pandas, Git and Matplotlib, to manipulate, analyze, and visualize complex datasets.

High-Dimensional Data Analysis by HarvardX  – A focus on several techniques that are widely used in the analysis of high-dimensional data.

A developer’s guide to the Internet of Things (IoT) – The Internet of Things (IoT) is an area of rapid growth and opportunity. Technical innovations in networks, sensors and applications, coupled with the advent of ‘smart machines’ have resulted in a huge diversity of devices generating all kinds of structured and unstructured data that needs to be processed somewhere.

Neural Networks for Machine Learning –  Learn about artificial neural networks and how they’re being used for machine learning, as applied to speech and object recognition, image segmentation, modeling language and human motion, etc. We’ll emphasize both the basic algorithms and the practical tricks needed to get them to work well.

If you have found above useful, please don’t forget to share with others on social media.

Validate-test a predictive model.

Why evaluate/test a model at all?

Evaluating the performance of a model is one of the most important stages in predictive modeling: it indicates how successful the model has been on the dataset. It enables you to tune parameters and, in the end, to test the tuned model against a fresh cut of data.

Below we will look at a few of the most common validation metrics used for predictive modeling. The choice of metric influences how you weigh the importance of different characteristics in the results and, ultimately, which machine learning algorithm you choose. Before we move on to the metrics themselves, let’s get the basics right.

Golden rules for validating-testing a model.

Rule #1

Never use the same data for training and testing!!!

Rule #2

Look at Rule #1

What this means is: always set aside a cut of the data that is not included in training, and test your model against it after fitting/tuning is finished.

There is a very simple way to set data aside, using Scikit-Learn:

 

from sklearn import datasets
from sklearn.model_selection import train_test_split

data = datasets.load_iris()
X = data['data']
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

print('Full dataset, features:',len(X))
print('Full dataset, labels:',len(y))
print('Train dataset, features:',len(X_train))
print('Test dataset, features:',len(X_test))
print('Train dataset, labels:',len(y_train))
print('Test dataset, labels:',len(y_test))

In some cases this may feel like ‘losing’ part of the training set, especially when the data sample is not large. There are ways around that as well; one of the most popular is ‘Cross Validation’.

Cross Validation

It is a simple ‘trick’ that splits the data into n equal parts, then successively holds out each part and fits the model on the rest. This gives n estimates of model performance that can be combined into an overall measure. Although computationally ‘heavy’, it is a very effective and widely used method to avoid overfitting and improve the ‘out of sample’ performance of a model.

Here is an example of Cross Validation using Scikit-Learn:

 

from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.model_selection import train_test_split, KFold, cross_val_score

data = datasets.load_iris()
X = data['data']
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

kfold = KFold(n_splits=10, shuffle=True, random_state=1)
model = LogisticRegression()
scores = cross_val_score(model, X_train, y_train, cv=kfold)   # 10-fold CV on the training set
print(scores.mean(), scores.std())

Two main types of predictive models.

Talking about predictive modeling, there are two main types of problems to be solved:

  • Regression problems are those where you are trying to predict or explain one thing (the dependent variable) using other things (independent variables), with a continuous output, e.g. the exact price of a stock the next day.
  • Classification problems try to determine group membership, often by deriving probabilities, e.g. will the stock price go up, go down, or stay unchanged the next day. Algorithms like SVM and KNN create a class output. Algorithms like Logistic Regression, Random Forest, Gradient Boosting, AdaBoost etc. give probability outputs. Converting probability outputs to a class output is just a matter of choosing a threshold probability (a minimal sketch is shown below).

In regression problems, we do not have such inconsistencies in output. The output is always continuous in nature and requires no further treatment.
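For example, here is a minimal, hypothetical sketch of turning probability outputs into class labels with a 0.5 threshold (the probabilities below are made up for illustration):

import numpy as np

probs = np.array([0.10, 0.65, 0.48, 0.92])   # predicted probabilities of the positive class
threshold = 0.5                              # the chosen threshold probability
labels = (probs >= threshold).astype(int)    # 1 = positive class, 0 = negative class
print(labels)                                # [0 1 0 1]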

Techniques/metrics for model validation.

Having divided the dataset and fitted the model, the question becomes: what kind of quantifiable validation metric should we use? There are a few very basic, quick-and-dirty ways to check performance. One of them is the value range: if model outputs fall far outside the range of the response variable, that immediately indicates poor estimation or model inaccuracy.
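As a quick, hypothetical illustration of such a range check (all values below are made up):

import numpy as np

y_train = np.array([3.0, 5.5, 4.2, 6.1, 5.0])   # observed response values
preds = np.array([4.1, 25.0, 5.3])              # model outputs to sanity-check
low, high = y_train.min(), y_train.max()
suspicious = np.where((preds < low) | (preds > high))[0]
print('Predictions outside the observed range:', suspicious)   # flags index 1 (25.0)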

Most often there is a need to use something more sophisticated and ‘scientific’.

Accuracy

Accuracy is a classification metric: the number of correct predictions as a ratio of all predictions made. It is probably the most common evaluation metric for classification problems.

Below is an example of calculating classification accuracy using Scikit-Learn.

 

from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

data = datasets.load_iris()
X = data['data']
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression()
model.fit(X_train,y_train)
preds = model.predict(X_test)
print(accuracy_score(y_test, preds))

Precision

Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. The question this metric answers is: of all the cases predicted as positive (for example, all passengers labelled as survived), how many actually were positive? High precision relates to a low false positive rate.

 

from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score

data = datasets.load_iris()
X = data['data']
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression()
model.fit(X_train,y_train)
preds = model.predict(X_test)
print(precision_score(y_test, preds, average=None))

Sensitivity or Recall

Recall (sensitivity) is the ratio of correctly predicted positive observations to all observations in the actual positive class.

 

from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

data = datasets.load_iris()
X = data['data']
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression()
model.fit(X_train,y_train)
preds = model.predict(X_test)
print(recall_score(y_test, preds, average=None))

F1 score

The F1 score is the harmonic mean of Precision and Recall. Therefore, this score takes both false positives and false negatives into account. Intuitively it is not as easy to understand as accuracy, but F1 is usually more useful than accuracy, especially if you have an uneven class distribution.

 

from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

data = datasets.load_iris()
X = data['data']
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression()
model.fit(X_train,y_train)
preds = model.predict(X_test)
print(f1_score(y_test, preds, average=None))

Confusion Matrix

It is a matrix of dimension N x N, where N is the number of classes being predicted. The confusion matrix presents the performance of a model with two or more classes as counts of predicted classes against actual classes, so you can see not only how many errors the model makes but also which classes are confused with which.

There are four possible options:

  • True positives (TP), which are the instances that are positives and are classified as positives.
  • False positives (FP), which are the instances that are negatives and are classified as positives.
  • False negatives (FN), which are the instances that are positives and are classified as negatives.
  • True negatives (TN), which are the instances that are negatives and are classified as negatives.

Below is an example code to compute confusion matrix, using ScikitLearn.

 

from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

data = datasets.load_iris()
X = data['data']
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression()
model.fit(X_train,y_train)
preds = model.predict(X_test)
print(confusion_matrix(y_test, preds))   # rows: actual classes, columns: predicted classes

Gain and Lift Chart.

Many times the measure of the overall effectiveness of the model is not enough. It may be important to know if the model does increasingly better with more data. Is there any marginal improvement in the model’s predictive ability if for example, we consider 70% of the data versus only 50%?

The lift charts represent the actual lift for each percentage of the population, which is defined as the ratio between the percentage of positive instances found by using the model and without using it.

These types of charts are common in business analytics and direct marketing, where the problem is to identify whether a particular prospect is worth calling.
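Below is a minimal sketch of how such a cumulative gains and lift calculation could look; the labels and scores are purely illustrative:

import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])                        # actual outcomes
y_prob = np.array([0.9, 0.8, 0.75, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1])  # model scores

order = np.argsort(-y_prob)                # rank cases from highest to lowest score
cum_positives = np.cumsum(y_true[order])   # positives captured within the top-k cases
gain = cum_positives / y_true.sum()        # fraction of all positives found at each depth
depth = np.arange(1, len(y_true) + 1) / len(y_true)
lift = gain / depth                        # lift relative to random targeting
print(np.round(gain, 2))
print(np.round(lift, 2))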

Kolmogorov-Smirnov Chart.

This non-parametric statistical test is used to compare two distributions, to assess how close they are to each other. In this context, one of the distributions is the theoretical distribution that the observations are supposed to follow (usually a continuous distribution with one or two parameters, such as Gaussian), while the other distribution is the actual, empirical, parameter-free, discrete distribution computed on the observations.

KS is the maximum difference between the cumulative percentage of Goods (events) and the cumulative percentage of Bads (non-events) across score/probability bands. A gains table typically shows % cumulative Goods (events) and % cumulative Bads (non-events) across 10 or 20 score bands; using the gains table, we can find the KS for the model that was used to create it.

KS is a point estimate: it is a single value, and it indicates the score/probability band where the separation between Goods (events) and Bads (non-events) is at its maximum.
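Below is a minimal sketch of the same idea, comparing the score distributions of the two classes with scipy; the labels and scores are purely illustrative:

import numpy as np
from scipy.stats import ks_2samp

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.3, 0.8, 0.7, 0.4, 0.2, 0.6, 0.5])

# KS statistic = maximum distance between the cumulative score distributions
# of the Goods (events) and the Bads (non-events)
ks_stat, p_value = ks_2samp(y_prob[y_true == 1], y_prob[y_true == 0])
print('KS statistic:', ks_stat)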

Area Under the ROC curve (AUC – ROC)

The Receiver Operating Characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. It is one of the most popular metrics used in industry.

The biggest advantage of the ROC curve is that it is almost independent of the response rate, i.e. of changes in the proportion of responders.

The area under the ROC curve (AUC) is a performance metric for binary classification problems. The AUC represents a model’s ability to discriminate between positive and negative classes. An area of 1.0 represents a model that made all predictions perfectly. An area of 0.5 represents a model that is as good as random.

The example below provides a demonstration of calculating AUC.

 

import numpy as np
from sklearn.metrics import roc_auc_score
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_auc_score(y_true, y_scores))

Gini Coefficient

The Gini coefficient is used in classification problems. It can be derived directly from the ROC AUC: Gini = 2 * AUC - 1.

 

from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.model_selection import train_test_split

def gini(list_of_values):
    # Gini coefficient computed from the Lorenz curve of the sorted values
    sorted_list = sorted(list(list_of_values))
    height, area = 0, 0
    for value in sorted_list:
        height += value
        area += height - value / 2.
    fair_area = height * len(list_of_values) / 2
    return (fair_area - area) / fair_area

def normalized_gini(y_pred, y):
    # Gini of the predictions normalised by the Gini of the actual values
    normalized_gini = gini(y_pred) / gini(y)
    return normalized_gini

data = datasets.load_iris()
X = data['data']
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression()
model.fit(X_train,y_train)
preds = model.predict(X_test)
print(normalized_gini(preds,y_test))
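As a cross-check, for a binary problem the Gini coefficient can also be obtained directly from the AUC via Gini = 2 * AUC - 1; a minimal sketch with illustrative values:

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
auc = roc_auc_score(y_true, y_scores)
print('AUC:', auc, 'Gini:', 2 * auc - 1)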

Logarithmic Loss

Log Loss quantifies the accuracy of a classifier by penalizing false classifications. Minimizing the Log Loss is basically equivalent to maximizing the accuracy of the classifier.

In order to calculate Log Loss, the classifier must assign a probability to each class rather than simply give the most likely class.  A perfect classifier would have a Log Loss of precisely zero. Less ideal classifiers have progressively larger values of Log Loss.
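Below is a basic example of calculating Log Loss using Scikit-Learn; the class probabilities are illustrative only:

from sklearn.metrics import log_loss

y_true = [0, 0, 1, 1]
# predicted probabilities for each class: [P(class 0), P(class 1)]
y_prob = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.01, 0.99]]
print(log_loss(y_true, y_prob))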

Mean Absolute Error

The mean absolute error (MAE) is a quantity used to measure how close forecasts or predictions are to the eventual outcomes.

The Mean Absolute Error (or MAE) is the sum of the absolute differences between predictions and actual values. The measure gives an idea of the magnitude of the error, but no idea of the direction.

The example below demonstrates a simple calculation:

 

from sklearn.metrics import mean_absolute_error
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(mean_absolute_error(y_true, y_pred))

Mean Squared Error

One of the most common measures used to quantify the performance of a model. It is the average of the squares of the differences between the actual observations and the predicted values. Squaring the errors tends to weight statistical outliers heavily, which can affect the accuracy of the results.

Below a basic example of calculation using ScikitLearn.

 

from sklearn.metrics import mean_squared_error
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(mean_squared_error(y_true, y_pred))

Root Mean Squared Error (RMSE)

RMSE is one of the most popular evaluation metrics used in regression problems. It assumes that the errors are unbiased and follow a normal distribution.

The RMSE metric is given by:

RMSE = sqrt( (1/N) * Σ (predicted_i - actual_i)^2 )

where N is the total number of observations.
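Below is a basic example of the calculation, taking the square root of Scikit-Learn's mean squared error (using the same illustrative values as above):

import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(np.sqrt(mean_squared_error(y_true, y_pred)))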

R^2 Metric

The R^2 (or R Squared) metric provides an indication of the goodness of fit of a set of predictions to the actual values. In statistical literature, this measure is called the coefficient of determination. This is a value between 0 and 1 for no-fit and perfect fit respectively.

Below a basic example of calculation using ScikitLearn.

 

from sklearn.metrics import r2_score 
y_true = [3, -0.5, 2, 7] 
y_pred = [2.5, 0.0, 2, 8] 
print(r2_score(y_true, y_pred))

If you think the above is useful please share it with others via social media.

The topic of model validation and testing is much more complicated than the examples above. If you are interested in it, please check the reading list below or look at the courses available on udemy.com or coursera.org.

 

           

Machine Learning with TensorFlow Intro.

What is TensorFlow?

The shortest definition would be: TensorFlow is a general-purpose library for graph-based computation.

But there are a variety of other ways to define TensorFlow; for example, Rodolfo Bonnin in his book – Building Machine Learning Projects with TensorFlow – brings up a definition like this:

“TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) passed between them.”

To quote the TensorFlow website, TensorFlow is an “open source software library for numerical computation using data flow graphs”. The name TensorFlow derives from the operations which neural networks perform on multidimensional data arrays, often referred to as ‘tensors’. It uses data flow graphs and is capable of building and training a variety of machine learning algorithms, including deep neural networks; at the same time, it is general enough to be applicable in a wide variety of other domains as well. Its flexible architecture allows deploying computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.

TensorFlow is Google Brain’s second-generation machine learning system, released as open source software in 2015. TensorFlow is available on 64-bit Linux, macOS, and mobile computing platforms including Android and iOS. TensorFlow provides a Python API, as well as C++, Haskell, Java and Go APIs. Google’s machine learning framework has lately become one of the ‘hottest’ tools in the data science world; it is particularly useful for building deep learning systems for predictive models involving natural language processing, audio, and images.

 

What is ‘Graph’ or ‘Data Flow Graph’? What is TensorFlow Session?

 

When trying to define what TensorFlow is, it is hard to avoid the words ‘graph’ or ‘data flow graph’, so what are they? The shortest definition would be: a TensorFlow graph is a description of computations. Deep learning (neural networks with many layers) uses mostly very simple mathematical operations – just many of them, on high-dimensional data structures (tensors). Neural networks can have thousands or even millions of weights, and computing them by interpreting every step in Python would take forever.

That’s why we create a graph made up of defined tensors and mathematical operations, and even initial values for variables. Only after we’ve created this ‘recipe’ can we pass it to what TensorFlow calls a session. To compute anything, a graph must be launched in a Session. The session runs the graph using very efficient and optimized code. Not only that, but many of the operations, such as matrix multiplication, can be parallelised on a supported GPU (Graphics Processing Unit), and the session will do that for you. Also, TensorFlow is built to be able to distribute the processing across multiple machines and/or GPUs.

TensorFlow programs are usually divided into a construction phase, which assembles a graph, and an execution phase, which uses a session to execute the operations in the graph. To do machine learning in TensorFlow, you create tensors, add operations (that output other tensors), and then execute the computation (run the computational graph). In particular, it’s important to realize that when you add an operation on tensors, it doesn’t execute immediately. TensorFlow waits for you to define all the operations you want to perform and then optimizes the computation graph, ‘deciding’ how to execute the computation, before generating the data. Because of this, tensors in TensorFlow are not so much holding the data as they are placeholders for the data, waiting for it to arrive when a computation is executed.

 

Prerequisites:

 

NEURAL NETWORKS – basics.

Before we move on to creating our first model in TensorFlow, we’ll need to get the basics right and talk a bit about the structure of a simple neural network.

A simple neural network has some input units where the input goes. It also has hidden units, so-called because from a user’s perspective they’re hidden. And there are output units, from which we get the results. Off to the side are also bias units, which are there to help control the values emitted from the hidden and output units. Connecting all of these units are a bunch of weights, which are just numbers, each of which is associated with two units. Training a neural network means finding suitable values for all of those weights. One step in ‘running’ the neural network is to multiply the value of each weight by the value of its input unit, and then to store the result in the associated unit.
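As a minimal, hypothetical illustration of that step in plain NumPy (the sizes and numbers are made up), each hidden unit receives the weighted sum of the inputs plus its bias, passed through an activation function:

import numpy as np

inputs = np.array([1.0, 0.5])              # input units
weights = np.array([[0.2, 0.8],            # weights from each input to two hidden units
                    [0.4, 0.6]])
bias = np.array([0.1, -0.1])               # bias units for the hidden layer
hidden = np.tanh(inputs @ weights + bias)  # weighted sum plus bias, then an activation
print(hidden)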

There are plenty of resources available online to get more background on neural network architectures; a few examples below:

 

MATHEMATICS

Deep learning uses very simple mathematical operations, but it is recommended to get or refresh at least the basics of them. I recommend starting with one of the following:

 

PYTHON

It is advisable to have basic Python programming skills before moving forward; a few available resources:

 

Let’s do it… TensorFlow first example code.

 

To keep things simple, let’s start with a ‘Hello World’ example.

Importing TensorFlow:

 

import tensorflow as tf

Declaring constants/variables: TensorFlow constants can be declared using the tf.constant function, and variables with the tf.Variable function. The first element in both is the value to be assigned to the constant/variable when it is initialised. TensorFlow will infer the type of the constant/variable from the initialised value, but it can also be set explicitly using the optional dtype argument. It’s important to note that, as the Python code runs through these commands, the variables haven’t actually been assigned their values yet, as they would have been with a standard Python declaration.

x = tf.constant(2.0) 
y = tf.Variable(3.0)

Let’s make our code compute something: a simple multiplication.

z = y * x

Now comes the time when we would like to see the outcome, except nothing has been computed yet… welcome to TensorFlow. To make use of TensorFlow variables and perform calculations, a Session must be created and all variables must be initialized. We can do that with the following statements.

sess = tf.Session()

init = tf.global_variables_initializer()

sess.run(init)

 

We have Session and even all constants/variables in place. Let’s see the outcome.

print("z = y * x = ", sess.run(z))

 

If you see something like this:
‘z = y * x = 6.0’
Congratulations, you have just coded your first TensorFlow ‘model’.

Below whole code in one piece:

 

import tensorflow as tf
x = tf.constant(2.0)
y = tf.Variable(3.0)
z = y * x
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
print("z = y * x = ", sess.run(z))

This tutorial, of course, does not end here and will be continued soon… in the next part, we will code our first neural network in TensorFlow.

 

If you liked this post please share it on your social media; if you have any questions or comments please make use of the contact form.

 

Recommended reading list below:

Data Science Update – News

Data Science News Digest – handpicked articles, news, and stories from the Data Science world.

 

NEWS

 

  • CUDA 9 Features Revealed  – At the GPU Technology Conference, NVIDIA announced CUDA 9, the latest version of CUDA’s powerful parallel computing platform and programming model.

 

 

 

  • AlphaGo’s next move – Chinese Go Grandmaster and world number one Ke Jie departed from his typical style of play and opened with a “3:3 point” strategy – a highly unusual approach aimed at quickly claiming corner territory at the start of the game.

 

  • Integrate Your Amazon Lex Bot with Any Messaging Service – Is your Amazon Lex chatbot ready to talk to the world? When it is, chances are that you’ll want it to be able to interact with as many users as possible. Amazon Lex offers built-in integration with Facebook, Slack and Twilio. But what if you want to connect to a messaging service that isn’t supported? Well, there’s an API for that–the Amazon Lex API.

 

  • How Our Company Learned to Make Better Predictions About Everything – In Silicon Valley, everyone makes bets. Founders bet years of their lives on finding product-market fit, investors bet billions on the future value of ambitious startups, and executives bet that their strategies will increase a company’s prospects. Here, predicting the future is not a theoretical superpower, it’s part of the job.

 

  • Are Pop Lyrics Getting More Repetitive? – In 1977, the great computer scientist Donald Knuth published a paper called The Complexity of Songs, which is basically one long joke about the repetitive lyrics of newfangled music (example quote: “the advent of modern drugs has led to demands for still less memory, and the ultimate improvement of Theorem 1 has consequently just been announced”).

 

  • Home advantages and wanderlust – When Burnley got beat 3-1 by Everton at Goodison Park on the 15th April, 33 games into their Premier League season, they’d gained only 4 points out of a possible 51 in their away fixtures. But during this time they’d also managed to accrue 32 points out of a possible 48 at Turf Moor; if the league table were based upon only home fixtures, they’d be in a highly impressive 6th place.

 

 

 

  • The Simple, Economic Value of Artificial Intelligence – How does this framing now apply to our emerging AI revolution?  After decades of promise and hype, AI seems to have finally arrived, – driven by the explosive growth of big data,  inexpensive computing power and storage, and advanced algorithms like machine learning that enable us to analyze and extract insights from all that data. 

BOOKS

Immortal Life: A Soon To Be True Story Kindle Edition by Stanley Bing

Neural Network Programming with Python Kindle Edition by Fabio. M. Soares, Rodrigo Nunes

 

If you have found above useful, please don’t forget to share with others on social media.

Learn TensorFlow for free.

Below a list of free resources to learn TensorFlow:

  1. TensorFlow website: www.tensorflow.org
  2. Udacity free course: www.udacity.com
  3. Google Cloud Platform: cloud.google.com
  4. Coursera free course: www.coursera.org
  5. Machine Learning with TensorFlow by Nishant Shukla : www.tensorflowbook.com
  6. ‘First Contact With TensorFlow’ by Prof. JORDI TORRES: jorditorres.org  or you can order from Amazon: First Contact With Tensorflow
  7. Kadenze Academy: www.kadenze.com
  8. OpenShift: blog.openshift.com
  9. Tutorial by pkmital : github.com
  10. Tutorial by HyunsuLee : github.com
  11. Tutorial by orcaman : github.com
  12. Stanford CS224d: Lecture 7

Hope the above list is useful.

Here are some resources that are not free but definitely worth trying, available on Amazon:

 

What is Supervised Learning?

Supervised Learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and the desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a “reasonable” way (see inductive bias). In order to solve the supervised learning problem, one has to perform the following steps: determine the type of training examples, gather a training set, determine the input feature representation of the learned function, determine the structure of the learned function and corresponding learning algorithm, complete the design, and evaluate the accuracy of the learned function. A wide range of supervised learning algorithms is available, each with its strengths and weaknesses. There is no single learning algorithm that works best on all supervised learning problems.
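A minimal sketch of that workflow with Scikit-Learn (the Iris data and the k-nearest-neighbours classifier are chosen only for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# labeled training data: input vectors X and desired output values y
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# infer a function from the training examples, then map unseen examples to labels
model = KNeighborsClassifier().fit(X_train, y_train)
print(model.score(X_test, y_test))   # accuracy on previously unseen instances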

What is Statistical Significance?

Statistical Significance in statistical hypothesis testing is attained whenever the observed p-value of a test statistic is less than the significance level defined for the study. The p-value is the probability of obtaining results at least as extreme as those observed, given that the null hypothesis is true. The significance level, α, is the probability of rejecting the null hypothesis, given that it is true. In any experiment or observation that involves drawing a sample from a population, there is always the possibility that an observed effect would have occurred due to sampling error alone. But if the p-value of an observed effect is less than the significance level, an investigator may conclude that that effect reflects the characteristics of the whole population, thereby rejecting the null hypothesis. A significance level is chosen before data collection and typically set to 5% or much lower, depending on the field of study. This technique for testing the significance of results was developed in the early 20th century. The term significance does not imply importance here, and the term statistical significance is not the same as research, theoretical, or practical significance. For example, the term clinical significance refers to the practical importance of a treatment effect.
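A minimal sketch of the idea with a two-sample t-test in scipy (the simulated data are illustrative only):

import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
a = rng.normal(loc=0.0, scale=1.0, size=100)   # sample from a population with mean 0
b = rng.normal(loc=0.5, scale=1.0, size=100)   # sample from a population with mean 0.5

t_stat, p_value = stats.ttest_ind(a, b)
print('p-value:', p_value, 'significant at the 5% level:', p_value < 0.05)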

What is Statistical Power?

Statistical Power of any test of statistical significance is defined as the probability that it will reject a false null hypothesis. Statistical power is inversely related to beta or the probability of making a Type II error. The power is a function of the possible distributions, often determined by a parameter, under the alternative hypothesis. As the power increases, there are decreasing chances of a Type II error, which are also referred to as the false negative rate (β) since the power is equal to 1−β, again, under the alternative hypothesis. A similar concept is Type I error or the level of a test under the null hypothesis. Power analysis can be used to calculate the minimum sample size required so that one can be reasonably likely to detect an effect of a given size. For example: “how many times do I need to toss a coin to conclude it is rigged?” Power analysis can also be used to calculate the minimum effect size that is likely to be detected in a study using a given sample size. In addition, the concept of power is used to make comparisons between different statistical testing procedures: for example, between a parametric and a nonparametric test of the same hypothesis.
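A minimal sketch of a power analysis, assuming the statsmodels package is available (the effect size and targets are illustrative only):

from statsmodels.stats.power import TTestIndPower

# minimum sample size per group to detect a medium effect (d = 0.5)
# with 80% power at the 5% significance level, for an independent two-sample t-test
analysis = TTestIndPower()
n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(n)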

What is Sentiment Analysis?

Sentiment Analysis refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. Generally speaking, sentiment analysis aims to determine the attitude of a speaker, writer, or other subjects with respect to some topic or the overall contextual polarity or emotional reaction to a document, interaction, or event. The attitude may be a judgment or evaluation, affective state (the emotional state of the author or speaker), or the intended emotional communication (the emotional effect intended by the author or interlocutor). A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level—whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, “beyond polarity” sentiment classification looks, for instance, at emotional states such as “angry”, “sad”, and “happy”.