Reading Notes on the TensorFlow Paper

Paper link: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

1. Abstract

1.1 What is tensorflow?

  • TensorFlow is an interface for expressing machine learning algorithms
  • TensorFlow is an implementation for executing such algorithms
  • A computation expressed using TensorFlow can be executed with little or no change on a variety of heterogeneous systems.
  • The system expresses a wide variety of algorithms, including training and inference algorithms.
  • It has been deployed across more than a dozen areas of computer science and other fields, such as speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery.

2. Introduction

The Google Brain project started in 2011 to explore the use of very-large-scale deep neural networks.
TensorFlow is built on the work of DistBelief, Google's first-generation scalable distributed training and inference system.
TensorFlow takes computations described using stateful dataflow graphs and maps them onto a wide variety of hardware platforms.


3. Programming Model and Basic Concepts

  • A TensorFlow computation is described by a directed graph that represents a dataflow computation.
  • In a TensorFlow graph, each node has zero or more inputs and zero or more outputs, and represents the instantiation of an operation.
  • In a TensorFlow graph, values that flow along normal edges (from outputs to inputs) are tensors.
  • Special edges called control dependencies can also exist in the graph; no data flows along them, but they constrain execution order (see the sketch below).
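A minimal sketch of these concepts, assuming the TensorFlow 1.x Python API (available as tf.compat.v1 in TensorFlow 2); the graph and values here are illustrative only:

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    a = tf.constant([[1.0, 2.0]])        # node with zero inputs and one output tensor
    b = tf.constant([[3.0], [4.0]])
    c = tf.matmul(a, b)                  # tensors flow along the normal edges a->c, b->c
    with tf.control_dependencies([c]):   # control edge: no data flows from c to d,
        d = tf.constant(5.0)             # but d cannot execute until c has finished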

3.1 Operations and Kernels

  • An operation can have attributes, and all attributes must be provided or inferred at graph-construction time in order to instantiate a node that performs the operation.
Category                               Examples
Element-wise mathematical operations   add, sub, mul, div, exp, log, greater, less, equal, …
Array operations                       concat, slice, split, constant, rank, shape, shuffle, …
Matrix operations                      matmul, matrixInverse, matrixDeterminant, …
Stateful operations                    variable, assign, assignAdd, …
Neural-net building blocks             softmax, sigmoid, ReLU, convolution2D, maxPool, …
Checkpointing operations               save, restore
Queue and synchronization operations   enqueue, dequeue, mutexAcquire, mutexRelease, …
Control flow operations                merge, switch, enter, leave, nextIteration, …
  • A kernel is a particular implementation of an operation that can be run on a particular type of device (e.g., CPU or GPU). See the sketch below.
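To make the table concrete, here is a small sketch (again assuming the TensorFlow 1.x API) that uses one operation from several of the categories above; the runtime picks a CPU or GPU kernel for each node depending on where it is placed:

import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.add(x, x)                 # element-wise mathematical operation
z = tf.concat([x, y], axis=0)    # array operation
w = tf.matmul(x, y)              # matrix operation
r = tf.nn.relu(w)                # neural-net building block

with tf.Session() as s:
    print(s.run(r))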

3.2 Sessions

Client programs interact with the TensorFlow system by creating a Session.

  • Extend
    To create a computation graph, the Session interface supports an Extend method to augment the current graph managed by the Session with additional nodes and edges (the initial graph when a Session is created is empty).

  • Run
    Most uses of TensorFlow set up a Session with a graph once, and then execute the full graph or a few distinct subgraphs thousands or millions of times via Run calls, as in the example below.

import tensorflow as tf

b = tf.Variable(tf.zeros([100]))                        # 100-d vector, init to zeroes
W = tf.Variable(tf.random_uniform([784, 100], -1, 1))   # 784x100 matrix w/ rnd vals
x = tf.placeholder(name="x")                            # Placeholder for input
relu = tf.nn.relu(tf.matmul(W, x) + b)                  # Relu(Wx+b)
C = [...]                                               # Cost computed as a function of Relu

s = tf.Session()
for step in xrange(0, 10):
    input = ...construct 100-D input array...           # Create 100-d vector for input
    result = s.run(C, feed_dict={x: input})             # Fetch cost, feeding x=input
    print step, result
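The Extend call is implicit in the Python client. A minimal sketch, assuming the TensorFlow 1.x API: a node added after the Session has been created is shipped to the runtime before the next Run call that touches it:

import tensorflow as tf

a = tf.constant(2.0)
s = tf.Session()
print(s.run(a))    # 2.0

b = a * 3.0        # added after the Session exists; the Session's graph is
print(s.run(b))    # extended with the new node, so this prints 6.0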

3.3 Variables

A Variable is a special kind of operation that returns a handle to a persistent mutable tensor that survives across executions of a graph. See the sketch below.
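A minimal sketch of a Variable surviving across Run calls, assuming the TensorFlow 1.x API; the counter name is illustrative:

import tensorflow as tf

counter = tf.Variable(0, name="counter")   # handle to a persistent, mutable tensor
increment = tf.assign_add(counter, 1)      # stateful op that mutates the variable

with tf.Session() as s:
    s.run(tf.global_variables_initializer())
    for _ in range(3):
        print(s.run(increment))            # prints 1, 2, 3: state survives each Run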

3.4 Implementation

The main components in a TensorFlow system are the client, which uses the Session interface to communicate with the master, and one or more worker processes, with each worker responsible for arbitrating access to one or more computational devices (such as CPU cores or GPU cards) and for executing graph nodes on those devices as instructed by the master.

  • Local implementation
    The local implementation is used when the client, the master, and the worker all run in a single process on a single machine.
  • Distributed implementation
    The distributed implementation shares most of the code with the local implementation, but extends it with support for an environment where the client, the master, and the workers can all be in different processes on different machines. A minimal setup sketch follows.
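A sketch of the distributed setup, assuming TensorFlow 1.x's tf.train.Server; the two-task cluster on localhost is illustrative, and the task-1 process must also be running for the Run call to complete:

import tensorflow as tf

cluster = tf.train.ClusterSpec({"worker": ["localhost:2222", "localhost:2223"]})
server = tf.train.Server(cluster, job_name="worker", task_index=0)  # this process

with tf.device("/job:worker/task:1"):
    c = tf.constant(42.0)                 # node placed on the other worker process

with tf.Session(server.target) as s:      # client talks to the master over gRPC
    print(s.run(c))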

3.5 Devices

Each worker is responsible for one or more devices.
TensorFlow has implementations of the Device interface for CPUs and GPUs, and new device implementations for other device types can be provided via a registration mechanism. Explicit placement onto these devices looks like the sketch below.
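A minimal sketch of explicit device placement, assuming the TensorFlow 1.x API; allow_soft_placement lets the runtime fall back to the CPU when no GPU is present:

import tensorflow as tf

with tf.device("/cpu:0"):
    a = tf.random_uniform([1000, 1000])   # pinned to the CPU
with tf.device("/gpu:0"):
    b = tf.matmul(a, a)                   # requested on the first GPU

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as s:
    s.run(b)                              # placement decisions are logged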

3.6 Tensors

A tensor is a typed, multi-dimensional array.
Tensor backing-store buffers are reference counted and are deallocated when no references remain.
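A tiny illustration of the typed, multi-dimensional nature of a tensor (works under both TensorFlow 1.x and 2.x):

import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]], dtype=tf.int32)  # typed, 2-D tensor
print(t.dtype)   # <dtype: 'int32'>
print(t.shape)   # (2, 2)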

3.7 Single-Device Execution

The executor keeps a count per node of the number of that node's dependencies that have not yet executed. Once this count drops to zero, the node is eligible for execution and is added to a ready queue. The ready queue is processed in some unspecified order. When a node finishes executing, the counts of all nodes that depend on it are decremented, as in the sketch below.
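A minimal plain-Python sketch of this scheduling loop, assuming the graph is given as a mapping from each node to the nodes it depends on; all node names and the print stand-in for kernel execution are illustrative:

from collections import deque

def run_graph(deps):
    pending = {n: len(d) for n, d in deps.items()}    # unexecuted-dependency counts
    dependents = {n: [] for n in deps}
    for n, ds in deps.items():
        for d in ds:
            dependents[d].append(n)
    ready = deque(n for n, c in pending.items() if c == 0)
    while ready:
        node = ready.popleft()                        # order is unspecified in TensorFlow
        print("executing", node)                      # stand-in for running the kernel
        for succ in dependents[node]:
            pending[succ] -= 1
            if pending[succ] == 0:                    # all of succ's inputs are ready
                ready.append(succ)

run_graph({"x": [], "W": [], "matmul": ["W", "x"], "relu": ["matmul"]})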

3.8 Multi-Device Execution

There are two complications when using multiple devices:

  • Deciding which device should execute the computation for each node in the graph.
  • Managing the required communication of data across device boundaries.

3.8.1 Node Placement