Machine Learning with TensorFlow Intro


What is TensorFlow?

The shortest definition would be: TensorFlow is a general-purpose library for graph-based computation.

But there are a variety of other ways to define TensorFlow. For example, Rodolfo Bonnin, in his book Building Machine Learning Projects with TensorFlow, offers a definition like this:

“TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) passed between them.”

To quote the TensorFlow website, TensorFlow is an “open source software library for numerical computation using data flow graphs”. The name TensorFlow derives from the operations which neural networks perform on multidimensional data arrays, often referred to as ‘tensors’. It uses data flow graphs and is capable of building and training a variety of different machine learning algorithms, including deep neural networks; at the same time, it is general enough to be applicable in a wide variety of other domains as well. Its flexible architecture allows deploying computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.

TensorFlow is Google Brain’s second-generation machine learning system, released as open source software in 2015. TensorFlow is available on 64-bit Linux, macOS, and mobile computing platforms including Android and iOS. TensorFlow provides a Python API, as well as C++, Haskell, Java and Go APIs. Google’s machine learning framework has lately become one of the ‘hottest’ tools in the data science world; it is particularly useful for building deep learning systems for predictive models involving natural language processing, audio, and images.

 

What is a ‘Graph’ or ‘Data Flow Graph’? What is a TensorFlow Session?

 

Trying to define what TensorFlow is, it is hard to avoid using the word ‘graph’, or ‘data flow graph’, so what is that? The shortest definition would be: a TensorFlow graph is a description of computations. Deep learning (neural networks with many layers) mostly uses very simple mathematical operations, just many of them, on high-dimensional data structures (tensors). Neural networks can have thousands or even millions of weights. Computing them by interpreting every step in Python would take forever.

That’s why we create a graph made up of defined tensors and mathematical operations, and even initial values for variables. Only after we’ve created this ‘recipe’ can we pass it to what TensorFlow calls a session. To compute anything, a graph must be launched in a session. The session runs the graph using very efficient and optimized code. Not only that, but many of the operations, such as matrix multiplication, can be parallelised by a supported GPU (Graphics Processing Unit), and the session will do that for you. TensorFlow is also built to be able to distribute the processing across multiple machines and/or GPUs.

TensorFlow programs are usually divided into a construction phase, which assembles a graph, and an execution phase, which uses a session to execute operations in the graph. To do machine learning in TensorFlow, you create tensors, add operations (that output other tensors), and then execute the computation (run the computational graph). In particular, it’s important to realize that when you add an operation on tensors, it doesn’t execute immediately. TensorFlow waits for you to define all the operations you want to perform, then optimizes the computation graph, ‘deciding’ how to execute the computation, before generating the data. Because of this, a tensor in TensorFlow is not so much a container holding data as a placeholder for data, waiting for the data to arrive when a computation is executed.
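
As a minimal sketch of these two phases (assuming the TensorFlow 1.x API used throughout this post), nothing is computed while the graph is being built; the result only materializes when the session runs it:

import tensorflow as tf

# construction phase: build the graph, nothing is computed yet
a = tf.constant(5.0)
b = tf.constant(6.0)
c = a * b            # an operation node that outputs another tensor

# execution phase: launch the graph in a session and run it
sess = tf.Session()
print(sess.run(c))   # 30.0
sess.close()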

 

Prerequisites:

 

NEURAL NETWORKS – basics.

Before we move on to create our first model in TensorFlow, we’ll need to get the basics right and talk a bit about the structure of a simple neural network.

A simple neural network has some input units where the input goes. It also has hidden units, so called because, from a user’s perspective, they’re hidden. And there are output units, from which we get the results. Off to the side are also bias units, which are there to help control the values emitted from the hidden and output units. Connecting all of these units are a bunch of weights, which are just numbers, each of which is associated with two units. Training a neural network means finding suitable values for all those weights. One step in “running” the neural network is to multiply the value of each weight by the value of its input unit, and then to store the result in the associated unit.
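
As a rough illustration of that multiply-and-sum step (a toy sketch using NumPy, with made-up sizes and values, not part of the original text):

import numpy as np

# toy layer: 3 input units feeding 2 hidden units
inputs  = np.array([0.5, -1.0, 2.0])        # values sitting in the input units
weights = np.array([[0.1, 0.4],
                    [0.2, -0.3],
                    [0.7, 0.5]])            # one weight per input/hidden pair
bias    = np.array([0.05, -0.1])            # bias units for the hidden layer

# multiply each weight by its input value, sum into the hidden units, add bias
hidden = np.maximum(0, inputs @ weights + bias)   # ReLU activation
print(hidden)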

There are plenty of resources available online to get more background on neural network architectures; a few examples below:

 

MATHEMATICS

Deep learning uses very simple mathematical operations, so it is recommended to learn, or refresh, at least the basics of them. I recommend starting with one of the following:

 

PYTHON

It is advisable to have basic Python programming skills before moving forward; a few available resources:

 

Let’s do it… a first TensorFlow code example.

 

To keep things simple, let’s start with a ‘Hello World’-style example.

Importing TensorFlow:

 

import tensorflow as tf

Declaring constants and variables: TensorFlow constants can be declared using the tf.constant function, and variables with the tf.Variable function. The first argument in both is the value to be assigned to the constant/variable when it is initialised. TensorFlow will infer the type from the initialised value, but it can also be set explicitly using the optional dtype argument. It’s important to note that, as the Python code runs through these commands, the variables haven’t actually been declared as they would have been with a standard Python declaration.

x = tf.constant(2.0) 
y = tf.Variable(3.0)
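
The type can also be set explicitly with the optional dtype argument mentioned above, for example:

x = tf.constant(2.0, dtype=tf.float32)   # explicit 32-bit float constant
y = tf.Variable(3.0, dtype=tf.float32)   # explicit 32-bit float variable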

Let’s make our code compute something: a simple multiplication.

z = y * x

Now comes the time when we would like to see the outcome, except nothing has been computed yet… welcome to TensorFlow. To make use of TensorFlow variables and perform calculations, a Session must be created and all variables must be initialized. We can do that using the following statements.

sess = tf.Session()

init = tf.global_variables_initializer()

sess.run(init)

 

We have Session and even all constants/variables in place. Let’s see the outcome.

print("z = y * x = ", sess.run(z))

 

If you see something like this:
‘z = y * x = 6.0’
Congratulations, you have just coded your first TensorFlow ‘model’.

Below is the whole code in one piece:

 

import tensorflow as tf
x = tf.constant(2.0)
y = tf.Variable(3.0)
z = y * x
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
print("z = y * x = ", sess.run(z))

This tutorial, of course, does not end here and will be continued soon… in the next part, we will code our first neural network in TensorFlow.

 

If you liked this post, please share it on your social media; if you have any questions or comments, please use the contact form.

 

Recommended reading list below:

Data Science Update – News

Data Science News Digest – handpicked articles, news, and stories from Data Science world.

 

NEWS

 

  • CUDA 9 Features Revealed  – At the GPU Technology Conference, NVIDIA announced CUDA 9, the latest version of CUDA’s powerful parallel computing platform and programming model.

 

 

 

  • AlphaGo’s next move – Chinese Go Grandmaster and world number one Ke Jie departed from his typical style of play and opened with a “3:3 point” strategy – a highly unusual approach aimed at quickly claiming corner territory at the start of the game.

 

  • Integrate Your Amazon Lex Bot with Any Messaging Service – Is your Amazon Lex chatbot ready to talk to the world? When it is, chances are that you’ll want it to be able to interact with as many users as possible. Amazon Lex offers built-in integration with Facebook, Slack and Twilio. But what if you want to connect to a messaging service that isn’t supported? Well, there’s an API for that: the Amazon Lex API.

 

  • How Our Company Learned to Make Better Predictions About Everything – In Silicon Valley, everyone makes bets. Founders bet years of their lives on finding product-market fit, investors bet billions on the future value of ambitious startups, and executives bet that their strategies will increase a company’s prospects. Here, predicting the future is not a theoretical superpower, it’s part of the job.

 

  • Are Pop Lyrics Getting More Repetitive? – In 1977, the great computer scientist Donald Knuth published a paper called The Complexity of Songs, which is basically one long joke about the repetitive lyrics of newfangled music (example quote: “the advent of modern drugs has led to demands for still less memory, and the ultimate improvement of Theorem 1 has consequently just been announced”).

 

  • Home advantages and wanderlust – When Burnley got beat 3-1 by Everton at Goodison Park on the 15th April, 33 games into their Premier League season, they’d gained only 4 points out of a possible 51 in their away fixtures. But during this time they’d also managed to accrue 32 points out of a possible 48 at Turf Moor; if the league table were based upon only home fixtures, they’d be in a highly impressive 6th place.

 

 

 

  • The Simple, Economic Value of Artificial Intelligence – How does this framing now apply to our emerging AI revolution? After decades of promise and hype, AI seems to have finally arrived, driven by the explosive growth of big data, inexpensive computing power and storage, and advanced algorithms like machine learning that enable us to analyze and extract insights from all that data.

BOOKS

Immortal Life: A Soon To Be True Story Kindle Edition by Stanley Bing

Neural Network Programming with Python Kindle Edition by Fabio M. Soares, Rodrigo Nunes

 

If you have found the above useful, please don’t forget to share it with others on social media.

Learn TensorFlow for free.

Below is a list of free resources to learn TensorFlow:

  1. TensorFlow website: www.tensorflow.org
  2. Udacity free course: www.udacity.com
  3. Google Cloud Platform: cloud.google.com
  4. Coursera free course: www.coursera.org
  5. Machine Learning with TensorFlow by Nishant Shukla : www.tensorflowbook.com
  6. ‘First Contact With TensorFlow’ by Prof. Jordi Torres: jorditorres.org, or you can order from Amazon: First Contact With TensorFlow
  7. Kadenze Academy: www.kadenze.com
  8. OpenShift: blog.openshift.com
  9. Tutorial by pkmital : github.com
  10. Tutorial by HyunsuLee : github.com
  11. Tutorial by orcaman : github.com
  12. Stanford CS224d: Lecture 7

I hope the above list is useful.

Here are some resources available on Amazon that are not free, but definitely worth trying:

 

What is Supervised Learning?

Supervised Learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and the desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a “reasonable” way (see inductive bias). In order to solve a supervised learning problem, one has to perform the following steps: determine the type of training examples, gather a training set, determine the input feature representation of the learned function, determine the structure of the learned function and the corresponding learning algorithm, complete the design, and evaluate the accuracy of the learned function. A wide range of supervised learning algorithms is available, each with its strengths and weaknesses. There is no single learning algorithm that works best on all supervised learning problems.
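
As a minimal illustration (a toy sketch with NumPy, added here for concreteness, not part of the original definition), supervised learning amounts to fitting a function to labeled (input, output) pairs, here a straight line fitted by least squares:

import numpy as np

# labeled training examples: inputs x with desired outputs y (roughly y = 2x + 1)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# infer a linear function y = w*x + b from the training data
w, b = np.polyfit(x, y, 1)

# use the inferred function to map a new, unseen example
print(w * 5.0 + b)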

What is Statistical Significance?

Statistical Significance in statistical hypothesis testing is attained whenever the observed p-value of a test statistic is less than the significance level defined for the study. The p-value is the probability of obtaining results at least as extreme as those observed, given that the null hypothesis is true. The significance level, α, is the probability of rejecting the null hypothesis, given that it is true. In any experiment or observation that involves drawing a sample from a population, there is always the possibility that an observed effect would have occurred due to sampling error alone. But if the p-value of an observed effect is less than the significance level, an investigator may conclude that the effect reflects the characteristics of the whole population, thereby rejecting the null hypothesis. A significance level is chosen before data collection and is typically set to 5% or much lower, depending on the field of study. This technique for testing the significance of results was developed in the early 20th century. The term significance does not imply importance here, and the term statistical significance is not the same as research, theoretical, or practical significance. For example, the term clinical significance refers to the practical importance of a treatment effect.
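
As a toy numeric example (an illustration added here, not from the original text): testing whether a coin is fair after observing 60 heads in 100 tosses, the two-sided p-value can be computed directly from the binomial distribution:

from math import comb

n, k, p = 100, 60, 0.5
# probability of a result at least as extreme as 60 heads (doubled for a two-sided test)
p_value = 2 * sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
print(p_value)   # roughly 0.057, so not below a 5% significance level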

What is Statistical Power?

Statistical Power of any test of statistical significance is defined as the probability that it will reject a false null hypothesis. Statistical power is inversely related to beta, the probability of making a Type II error. The power is a function of the possible distributions, often determined by a parameter, under the alternative hypothesis. As the power increases, the chance of a Type II error, also referred to as the false negative rate (β), decreases, since the power is equal to 1−β under the alternative hypothesis. A similar concept is the Type I error, or the level of a test under the null hypothesis. Power analysis can be used to calculate the minimum sample size required so that one can be reasonably likely to detect an effect of a given size. For example: “how many times do I need to toss a coin to conclude it is rigged?” Power analysis can also be used to calculate the minimum effect size that is likely to be detected in a study using a given sample size. In addition, the concept of power is used to make comparisons between different statistical testing procedures: for example, between a parametric and a nonparametric test of the same hypothesis.
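
Continuing the coin example (a rough simulation sketch, with an assumed true heads probability of 0.6 and a roughly 5% two-sided test; the numbers are illustrative only), power can be estimated as the fraction of simulated experiments in which the test rejects the null hypothesis:

import numpy as np

rng = np.random.default_rng(0)
n, true_p, reps = 100, 0.6, 10_000

# reject the 'fair coin' null hypothesis when the heads count falls outside [40, 60]
heads = rng.binomial(n, true_p, size=reps)
power = np.mean((heads < 40) | (heads > 60))
print(power)   # estimated probability of rejecting the false null hypothesis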

What is Sentiment Analysis?

Sentiment Analysis refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine. Generally speaking, sentiment analysis aims to determine the attitude of a speaker, writer, or other subjects with respect to some topic or the overall contextual polarity or emotional reaction to a document, interaction, or event. The attitude may be a judgment or evaluation, affective state (the emotional state of the author or speaker), or the intended emotional communication (the emotional effect intended by the author or interlocutor). A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level—whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, “beyond polarity” sentiment classification looks, for instance, at emotional states such as “angry”, “sad”, and “happy”.
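
As a toy illustration of the basic polarity-classification task (a deliberately naive lexicon lookup sketched here for illustration, not a real sentiment model):

# tiny hand-made lexicon; real systems use much larger lexicons or trained models
positive = {"good", "great", "happy", "excellent"}
negative = {"bad", "terrible", "sad", "poor"}

def polarity(text):
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("The service was great and the food was excellent"))   # positive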

What is Semi-Supervised Learning?

Semi-Supervised Learning is a class of supervised learning tasks that also make use of unlabeled data for training – typically a small amount of labeled data with a large amount of unlabeled data. Semi-supervised learning falls between unsupervised learning (without any labeled training data) and supervised learning (with completely labeled training data). Many machine-learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce a considerable improvement in learning accuracy. Semi-supervised learning is also of theoretical interest in machine learning and as a model for human learning. Methods of semi-supervised learning include generative methods, low-density separation, graph-based methods, and heuristic approaches.
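
A minimal sketch of one common heuristic approach, self-training (a toy example with NumPy and a nearest-centroid classifier; the 1-D data and two classes are made up for illustration):

import numpy as np

# a small labeled set and a larger unlabeled set (1-D points, two classes)
labeled_x   = np.array([0.0, 0.2, 3.8, 4.0])
labeled_y   = np.array([0, 0, 1, 1])
unlabeled_x = np.array([0.5, 1.0, 3.0, 3.5])

# one round of self-training: fit on labeled data, pseudo-label the rest, retrain on both
centroids = np.array([labeled_x[labeled_y == c].mean() for c in (0, 1)])
pseudo    = np.argmin(np.abs(unlabeled_x[:, None] - centroids), axis=1)
labeled_x = np.concatenate([labeled_x, unlabeled_x])
labeled_y = np.concatenate([labeled_y, pseudo])
print(labeled_y)   # labels for the original and pseudo-labeled points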

What is Semantic Indexing or Latent Semantic Indexing (LSI)?

Semantic Indexing or Latent Semantic Indexing (LSI) is a mathematical method used to determine the relationship between terms and concepts in content. The contents of a web page are crawled by a search engine, and the most common words and phrases are collated and identified as the keywords for the page. LSI looks for synonyms related to the title of your page. Latent Semantic Indexing came as a direct reaction to people trying to cheat search engines by cramming Meta keyword tags full of hundreds of keywords, Meta descriptions full of more keywords, and page content full of nothing more than random keywords and no subject-related material or worthwhile content. LSI will not affect a squeeze page that has no intention of achieving a search engine rank anyway, due to its minimalistic content. But site owners or bloggers hoping to get on the search engines’ good side should pay attention to LSI.
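
Mathematically, LSI is usually implemented as a truncated singular value decomposition (SVD) of a term-document matrix; here is a minimal sketch with NumPy (the tiny matrix and term names below are made up for illustration):

import numpy as np

# rows = terms, columns = documents; entries are term counts (made-up data)
term_doc = np.array([
    [2, 0, 1, 0],   # "tensor"
    [1, 0, 2, 0],   # "graph"
    [0, 3, 0, 1],   # "recipe"
    [0, 1, 0, 2],   # "cooking"
])

# truncated SVD: keep only the top-2 latent 'concepts'
U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
doc_concepts = (np.diag(s[:2]) @ Vt[:2]).T   # each document represented in 2-D concept space
print(doc_concepts)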

What is Self-Organizing Map (SOM)?

Self-Organizing Map (SOM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map, and is, therefore, a method to do dimensionality reduction. Self-organizing maps differ from other artificial neural networks as they apply competitive learning as opposed to error-correction learning (such as backpropagation with gradient descent), and in the sense that they use a neighborhood function to preserve the topological properties of the input space. This makes SOMs useful for visualizing low-dimensional views of high-dimensional data. Like most artificial neural networks, SOMs operate in two modes: training and mapping. “Training” builds the map using input, while “mapping” automatically classifies a new input vector. A self-organizing map consists of components called nodes or neurons. Associated with each node are a weight vector of the same dimension as the input data vectors and a position in the map space. The usual arrangement of nodes is a two-dimensional regular spacing in a hexagonal or rectangular grid. The procedure for placing a vector from data space onto the map is to find the node with the closest weight vector to the data space vector.
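
As a rough sketch of the placement and update step described above (finding the best-matching node and pulling its weight vector, and those of its neighbours, towards the input; the grid size, learning rate and neighbourhood radius below are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
grid_h, grid_w, dim = 5, 5, 3                 # a 5x5 map of nodes, 3-dimensional inputs
weights = rng.random((grid_h, grid_w, dim))   # one weight vector per node

def som_step(x, weights, lr=0.1, radius=1.0):
    # best-matching unit: the node whose weight vector is closest to the input x
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # neighbourhood function on the map grid, centred on the BMU
    rows, cols = np.indices((grid_h, grid_w))
    grid_dist2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    influence = np.exp(-grid_dist2 / (2 * radius ** 2))
    # move the BMU and its neighbours towards the input
    weights += lr * influence[..., None] * (x - weights)
    return weights

weights = som_step(rng.random(3), weights)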