AI (Theory)

Introduction
  1. AI development constitutes the following:
    1. Learning Python
    2. Learning AI
    3. Learning basic MNIST code and understanding some of the models & domain theory.
  2. The first two are code-based learning and the third one is theoretical.
  3. A sandbox is set up on our laptops.
    1. Epochs may run very slowly in a sandbox environment for advanced models, so we cannot fine-tune quickly.
  4. Next, set up our own cloud instance or a dedicated environment for real-time work.
  5. Put the use case on the table for crowdsourcing.
  6. We need to see the problem from an AI perspective.
Branches of AI


Overfitting and Underfitting



  • Underfitting is a scenario in which the model has not learned the pattern in the training data, so any new (untrained) data varies too much from the model's predictions.
  • Overfitting is a scenario in which the model is trained too closely on the training data, hence any new data is not recognized well.
  • A good-fit model is one where new data is recognized with very little variation.
  • A good model works for all generic (unseen) data.
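As an illustration of these three fits, here is a minimal, hypothetical scikit-learn sketch: a degree-1 polynomial underfits a noisy sine curve, a moderate degree gives a good fit, and a very high degree overfits (low training error, much higher test error). The dataset and polynomial degrees are illustrative assumptions, not from the original notes.

```python
# Minimal sketch (assumed scikit-learn API; data is synthetic): contrasting
# underfitting, a good fit, and overfitting by varying polynomial degree.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 60))[:, None]
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 60)   # noisy sine wave

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):                     # underfit, good fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    # Underfit: both errors high. Good fit: both low and close.
    # Overfit: training error low but test error much higher.
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```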
AI for business executives
  • AI is typically defined as the ability of a machine to perform cognitive functions we associate with human minds, such as perceiving,  reasoning, learning and problem solving.
  • Examples of technologies that enable AI to solve business problems are robotics and autonomous vehicles, computer vision, language, virtual agents, and machine learning.
Machine Learning
  • Most recent advances in AI have been achieved by applying machine learning to very large data sets.
  • Machine learning algorithms detect patterns and learn how to make predictions and recommendations by processing data and experiences, rather than by receiving explicit programming instruction.
  • The algorithms also adapt in response to new data and experiences to improve efficiency over time.


Major types of Machine Learning
  1. Supervised Learning
    1. Supervised learning, where we have both the data and the data labels available.
  2. Unsupervised Learning
    1. Unsupervised learning, where the system must learn from the data on its own, without labels.
  3. Reinforcement Learning
    1. Reinforcement learning, where the system receives feedback (rewards or penalties) so that it takes corrective actions.
  4. Each type of learning comprises the following aspects
    1. What it is?
    2. When to use it?
    3. How it works?
    4. Algorithms/Business Use cases
Algorithms
  • An algorithm uses training data and feedback from humans to learn the relationship of given inputs to a given output.
  • Deep learning is now becoming a platform across domains
    • We can use it for NLP.
    • We can use it for computer vision.
    • We can use it for speech.
  • There are multiple algorithms available for each of these categories such as Linear Regression and Deep Learning.
  • We need to understand how the space is broadly structured.
  • Machine learning is used for predictive and prescriptive analysis and not descriptive analysis.
  • Following are the types of analysis
    • Descriptive
      • Describe what happened?
      • Employed heavily across all industries.
    • Predictive
      • Anticipate what will happen?
      • Employed in data-driven organizations as a key source of insight.
    • Prescriptive
      • Provide recommendations on what to do to achieve goals.
      • Employed heavily by leading data and internet companies.
  • There are algorithms in machine learning as well as deep learning.
    • As developers, what we should know is when to apply which algorithm.
    • We should know which algorithms to apply for our use cases and data sets, and then start importing those libraries.
    • We are not interested in the mathematics or statistics behind the algorithms.
    • We will look more into AI development than AI research.
Deep Learning
  • Deep learning is a type of machine learning that can process a wide range of data resources, requires less data preprocessing by humans, and can often produce more accurate results than traditional machine learning approaches.
  • In deep learning, interconnected layers of software-based calculators known as "neurons" form a neural network.
  • The network can ingest vast amounts of input data and process them through multiple layers that learn increasingly complex features of data at each layer.
  • The network can make a determination about the data, learn if its determination is correct, and use what it has learned to make determinations about new data.
  • For example, once it learns what an object looks like, it can recognize the object in a new image.
  • Deep learning can often outperform traditional methods.
  • Deep learning achieves a significant percentage reduction in error rate compared with traditional methods.
  • Deep learning has the following areas
    • Convolutional neural networks
    • Recurrent neural networks
    • Capsule networks
    • Reinforcement learning
    • Sequence networks
Major Models of Deep Learning
  • Convolutional neural network.
    • We take the image and then convolve it, i.e., filter the image rather than using each and every pixel.
      • Because not every pixel matters.
    • A CNN is composed of two basic parts: feature extraction and classification.
    • For example, in the image below only the pixels making up the digit "3" are relevant.
    • Each layer has a convolution and max-pooling mechanism which extracts a feature based on a sample.
    • The filter is also called a kernel; its size is based on the sampling size.
    • We focus only on relevant pixels based on the kernel size.
    • We extract the relevant features from the image using the kernel, and then from these features we extract further core features.
    • So in the given image the pixels which form the digit "3" matter most.
    • This is based on the human brain's handling of visual context.
      • Biological neurons determine key features such as the edges of the object, the shape of the object, and the color of the object.
      • All of this is consolidated and stored as an object in our brain.
    • Similarly, in a CNN we keep extracting features until we get the key features (a small Keras sketch follows this list).
      • We consolidate all these features in a dense layer, which sits just before the output layer.
      • We then push these features to the output layer, which uses a softmax activation.
      • The output layer may then classify the image as 0, 1, 2, etc. based on a probability distribution.
  • Recurrent neural network.
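A minimal Keras sketch of the convolutional model described above (the layer sizes are illustrative assumptions): convolution and max-pooling layers extract features, a dense layer consolidates them, and a softmax output layer classifies the digit.

```python
# Minimal sketch (assumed Keras API; layer sizes are illustrative): a small CNN
# for MNIST -- convolution + max pooling extract features, a dense layer
# consolidates them, and a softmax output layer classifies digits 0-9.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu",
                  input_shape=(28, 28, 1)),       # convolve the image with 3x3 kernels
    layers.MaxPooling2D(pool_size=(2, 2)),        # keep only the strongest responses
    layers.Conv2D(64, (3, 3), activation="relu"), # deeper layer extracts core features
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                             # flatten feature maps for the dense layer
    layers.Dense(128, activation="relu"),         # dense layer consolidates features
    layers.Dense(10, activation="softmax"),       # softmax output: probabilities for 0-9
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```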
Machine Learning Use cases
  • Here we will discuss a few machine learning algorithms and their usage (a short scikit-learn sketch follows this list).
  • Decision Tree
    • A highly interpretable classification or regression model that splits data-feature values into branches at decision nodes (e.g., if a feature is a color, each possible color becomes a new branch) until a final decision output is made.
    • Business Use-cases
      • Understand product attributes that make a product most likely to be purchased.
      • Provide a decision framework for hiring new employees.
  • Naive Bayes
    • A classification technique that applies Bayes' theorem, which allows the probability of an event to be calculated based on knowledge of factors that might affect that event (e.g., if an email contains the word "money", the probability of it being spam is high).
    • Business Use cases
      • Analyze sentiment to assess product perception in the market.
      • Create classifiers to filter spam emails.
  • Gradient-boosting trees
    • A classification or regression technique that generates decision trees sequentially, where each tree focuses on correcting the errors coming from the previous tree. The final output is a combination of the results from all trees.
    • Business Use cases
      • Forecast product demand and inventory levels.
      • Predict the price of cars based on their characteristics (e.g., age and mileage).
  • Support Vector Machine
    • A technique that's typically used for classification but can be transformed to perform regression. It draws a division between classes that's as wide as possible. It can also be generalized to solve non-linear problems.
    • Business Use cases
      • Predict how many patients a hospital will need to serve in a time period.
      • Predict how likely someone is to click on an online advertisement.
  • Ada Boost
    • A classification or regression technique that uses a multitude of models to come up with a decision, but weighs them based on their accuracy in predicting the outcome.
    • Business Use cases
      • Detect fraudulent activity in credit-card transactions. Achieves lower accuracy than deep learning.
      • A simple, low-cost way to classify images (e.g., recognize land usage from satellite images for climate-change models). Achieves lower accuracy than deep learning.
  • Simple Neural Network
    • A model in which artificial neurons (software-based calculators) make up an input layer, one or more hidden layers where calculations take place, and an output layer. It can be used to classify data or find the relationship between variables in regression problems.
    • Business Use cases
      • Predict the probability that a patient joins a healthcare program.
      • Predict whether registered users will be willing or not to pay a particular price for a product.
  • Linear Regression
    • A highly interpretable, standard method for modeling the past relationship between independent input variables and dependent output variables (which can have an infinite number of values) to help predict future values of the output variables.
    • Business Use cases
      • Understand product-sales drivers such as competitor prices, distribution, advertisement, etc.
      • Optimize price points and estimate product price elasticities.
  • Logistic Regression
    • A model with some similarities to linear regression that's used for classification tasks, meaning the output variable is binary (e.g., only black or white) rather than continuous (e.g., an infinite list of potential colors).
    • Business Use cases
      • Classify customers based on how likely they are to repay a loan.
      • Predict whether a skin lesion is benign or malignant based on its characteristics (size, shape, color, etc.).
  • Linear/quadratic discriminant analysis
    • Upgrades a logistic regression to deal with non-linear problems, those in which changes to the value of input variables do not result in proportional changes to the output variables.
    • Business Use cases
      • Predict client churn.
      • Predict a sales lead's likelihood of closing.
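To show how little code these classical algorithms need in practice, here is a minimal sketch (assumed scikit-learn API, using a built-in labelled dataset purely for illustration) that trains a decision tree and a logistic regression and compares their test accuracy; switching algorithms is largely a one-line change.

```python
# Minimal sketch (assumed scikit-learn API; dataset chosen only for illustration):
# a decision tree and a logistic regression on a labelled binary-classification task.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)        # labelled data: benign vs malignant
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for clf in (DecisionTreeClassifier(max_depth=4, random_state=0),
            LogisticRegression(max_iter=5000)):
    clf.fit(X_train, y_train)                      # learn from labelled training data
    print(type(clf).__name__, "test accuracy:", round(clf.score(X_test, y_test), 3))
```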
Deep Learning Resources
  • Neurons
    • sigmoid
    • tanh
    • ReLU
  • Cost Functions
    • cross-entropy
  • Stochastic Gradient Descent
    • mini batch size
    • learning rate
    • second-order
  • Initialization
    • glorot normal
    • glorot uniform
  • Reduce Overfitting
    • L1/L2 regularization
    • dropout
    • data expansion
  • Layers
    • dense
    • softmax
    • max pooling
    • flatten
    • convolutional
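Most of the building blocks listed above appear together in a typical Keras model definition. The sketch below (assumed Keras API; sizes and rates are illustrative) shows ReLU neurons with Glorot initialization, L2 regularization and dropout to reduce overfitting, a softmax output layer, a cross-entropy cost function, and mini-batch stochastic gradient descent with an explicit learning rate.

```python
# Minimal sketch (assumed Keras API; values are illustrative): several of the
# building blocks listed above in one place.
from tensorflow.keras import layers, models, regularizers, optimizers

model = models.Sequential([
    layers.Dense(128, activation="relu",
                 kernel_initializer="glorot_uniform",            # Glorot/Xavier initialization
                 kernel_regularizer=regularizers.l2(1e-4),       # L2 weight penalty
                 input_shape=(784,)),
    layers.Dropout(0.3),                                         # dropout reduces overfitting
    layers.Dense(10, activation="softmax"),                      # softmax output layer
])
model.compile(optimizer=optimizers.SGD(learning_rate=0.01),      # stochastic gradient descent
              loss="categorical_crossentropy",                   # cross-entropy cost function
              metrics=["accuracy"])
# batch_size below would set the mini-batch size used by SGD:
# model.fit(x_train, y_train, epochs=10, batch_size=128)
```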
Simple Neural Network

  1. Simple Neural network has one hidden layer.


Deep Learning Neural Network

  1. Deep learning neural networks have three or more hidden layers.


Key Concepts in AI
  1. Artificial Neuron
    1. Neural Networks are based on Neurons
    2. All the small dots in above image of Simple and Deep Learning Neural Network are Neurons.
    3. Researchers, inspired by biological neurons, are trying to build and improve artificial neurons.
    4. There are inputs which are processed inside neurons and then there is an output from neuron.
  2. Artificial Neural Network
    1. This is a network based on neurons.
    2. This can be of two types
      1. Shallow Learning Network
      2. Deep Learning Network
  3. Hidden Layer
    1. These are the layers that are hidden from the developer, from a programmer's perspective.
    2. We cannot determine how to program the neurons in this layer.
    3. We can provide overall optimization functions like dropout, or provide an overall algorithm, but we cannot program a particular neuron in this layer.
    4. We don't have access to these neurons. The programming of these neurons is done by the system itself.
    5. They are hidden from the developer from an implementation point of view.
    6. Based on the complexity of the problem
      1. How many hidden layers are needed in the network?
      2. How many neurons per layer are needed?
      3. How should these neurons be connected?
      4. What is the learning rate of these neurons?
    7. Evolutionary computing is performed using these hidden layers.
      1. Study of Consciousness is performed using evolutionary computing.
      2. Theorists are trying to formalize consciousness in terms of function.
    8. AI is inspired by biological brain.
      1. An example is computer vision
        1. There is an approach called as "feature extraction".
          1. We have multiple layers and we keep extracting features from layer to layer.
          2. This is very similar to how our brain functions.
          3. An overview of how the biological brain functions is also an interesting topic from the point of view of AI.
      2. AI can be compared to the analogy of taking an exam.
        1. We study for an exam. We take the exam. The exam gets evaluated. We pass or fail. If we fail, we study again and retake the exam.
        2. In AI we have data.
        3. We create a model.
          1. For example, we may need to categorize MNIST digits and return 0-9.
        4. We run the model; if the accuracy is 50% or 60%, that means our model works only 60% of the time.
        5. We redo our model by changing the following inputs
          1. Adding another layer or more layers.
          2. Changing the learning rate.
          3. Picking a different algorithm.
        6. We run the model again on the MNIST digits, applying computer vision. If the accuracy now goes up to 98% or 99%, we have achieved our goal.
    9. This is called the PDCA cycle (Plan, Do, Check, Act).
      1. In AI we can map it to Develop, Test, and Tweak.
    10. We have a training data-set and we have a testing data-set.
      1. Training data-set is a set on which model learns.
      2. Testing data-set is where the model gets validated
        1. This is called as supervised Learning.
      3. Once the model is validated that is our production output 
        1. We can take that model and start applying production data. 
  4. Machine Learning
  5. Deep Learning
    1. Deep learning neural networks have three or more hidden layers.
  6. Supervised Learning
    1. This is when we already have labeled data.
    2. For example, in the case of MNIST digits we have 60,000 digits as part of the data set, and every digit has a label.
      1. Each item of the data set is a handwritten pixel image, a 28 x 28 grey-scale image. This is our "X data", or source data.
      2. The label, or "Y data", classifies what the image is, i.e., whether it is a 1 or a 2, etc.
        1. This is called the data classifier.
  7. Unsupervised Learning
    1. There is no label, or Y data.
    2. This is used in anomaly detection, behavioral analytics, etc.
    3. The raw data is there; we need to benchmark it and come up with an average so that any outliers can be detected.
  8. Weights
    1. An artificial neuron has inputs, each assigned a weight.
      1. For example, visiting the gym has the following inputs.
        1. Take black coffee - weight 2
        2. A friend is joining - weight 7
        3. Keep an alarm - weight 3
      2. Every action has a weight assigned.
      3. When we bring all these actions to a threshold, we find that whenever the total weight is 5 or above we will go to the gym.
        1. So if we only take a black coffee we may not go to the gym, since its weight of 2 is less than the threshold.
        2. When we have black coffee and also keep an alarm, the sum of weights (2 + 3 = 5) equals the threshold, so we will go.
        3. When our friend joins, we will definitely go, since that weight of 7 alone crosses the threshold.
      4. Every neuron has a threshold value; based on it and the inputs provided, the neuron gives an output.
  9. Bias
    1. This is the threshold value. 
  10. Activation Function
    1. The neurons need to be properly activated for them to produce right results.
    2. It determines how and to what extent we leverage a neuron, based on the use case.
    3. Examples are (a NumPy sketch of these functions and of gradient descent appears at the end of this section)
      1. Sigmoid
        1. Formula
          1. f(x) = 1 / (1 + e^(-x))
      2. Tanh
        1. Formula
          1. tanh(x) = 2 / (1 + e^(-2x)) - 1
      3. Relu
        1. It is the most commonly used activation function; it stands for "Rectified Linear Unit".
        2. It solves problems like the "vanishing gradient" problem.
        3. Formula
          1. f(x) = 0 for x < 0
          2. f(x) = x for x >= 0
      4. Softmax
        1. We use this function to activate the neurons in the output layer.
        2. It gives us the right probability distribution when there are more than 2 outputs.
        3. For MNIST digits we will use softmax.
  11. Cost Function
    1. The quality of an AI model's output depends upon its accuracy; for example, 80%-90% accuracy is great, but 40%-50% accuracy is not.
    2. The cost is the difference between the predicted value and the target value, and it has to be as low as possible.
    3. The cost function is minimal at the local minimum (w).
    4. So the lower the cost, the more efficient the model.
    5. In the first iteration we find that our value is far from the minimum (w), so we run again, this time more efficiently, thus reducing the cost.
    6. After each run/iteration the model learns again, epoch by epoch, until it reaches the local minimum (w), where the cost function is minimum.
    7. This process is called gradient descent.
    8. After each run the learning rate determines how big a jump the model takes to its next position on the curve.
      1. If the learning rate is too high, the model may take too big a jump and miss the local minimum.
      2. If the learning rate is too small, it may take a long time to reach the local minimum.
    9. For this there are optimizers like Adam, Adagrad, Stochastic Gradient Descent, etc.
      1. If we use Adam as the optimizer, it takes bigger jumps in the beginning and gradually smaller jumps as it approaches the local minimum.
    10. In our code we only specify the learning rate and the optimizer name; the system takes care of the rest.
  12. Gradient Descent
    1. It is the mathematical algorithm we use to arrive at the lowest value of the cost function.
  13. Learning Rate
  14. Back propagation
    1. The ability of a network to propagate information backwards.
    2. The network goes from left to right in a feed-forward fashion; once the cycle is over, it goes back and passes error information to the respective neurons, which adjust themselves accordingly for a better output.
  15. Jupyter Notebook
    1. GUI for writing AI code.
  16. Tensorflow
    1. It is a high-level AI library.
  17. Keras
    1. It is a framework of AI libraries.
    2. We have frameworks like MXNet, PyTorch, and Caffe besides Keras.
  18. Types of Layers
    1. Dense layer, in which there is full connectivity: each neuron is connected to every neuron in the next layer.
      1. These help to optimize.
    2. Flatten Layer
    3. Dropout Layer
      1. SpatialDropOut1D
        1. SpatialDropout is capable of dropping an entire feature map.
      2. SpatialDropOut2D
    4. Batch normalization layer
      1. To avoid overfitting we need to drop some neurons.
  19. Convolutional Neural Network
    1. Primarily used for computer vision.
    2. Came to prominence in 2012.
  20. Recurrent Neural Network
    1. Used for natural language processing and natural language understanding.
      1. Basic recurrent neural networks
      2. Long Short-Term Memory networks (LSTM)
      3. Bidirectional LSTMs, etc.
  21. CPU vs GPU
    1. For running a whole model a CPU is slow; we may need a GPU.
    2. For advanced models we need a stronger machine, i.e., one with a GPU, good memory, and so on.
    3. We can use a Kaggle instance if we need a stronger GPU. This is a public platform, so we must be careful about confidentiality.
      1. We need to create a Kaggle account for this.
      2. Many data scientists use Kaggle for developing their own code.
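The sketch below (plain NumPy, illustrative only) implements the activation-function formulas given under "Activation Function" and a toy gradient-descent loop of the kind described under "Cost Function", "Gradient Descent", and "Learning Rate": a single weight w walks down a quadratic cost toward its minimum, and the learning rate controls the size of each jump.

```python
# Minimal sketch (NumPy only; illustrative, not a library's internals): the
# activation functions from above, plus a toy gradient-descent loop.
import numpy as np

def sigmoid(x):             # f(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                # tanh(x) = 2 / (1 + e^(-2x)) - 1
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def relu(x):                # f(x) = 0 for x < 0, x for x >= 0
    return np.maximum(0.0, x)

def softmax(x):             # turns output-layer scores into a probability distribution
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Toy cost function C(w) = (w - 3)^2 with its minimum at w = 3.
cost = lambda w: (w - 3.0) ** 2
grad = lambda w: 2.0 * (w - 3.0)

w, learning_rate = 10.0, 0.1            # too large a rate overshoots, too small is slow
for epoch in range(25):
    w -= learning_rate * grad(w)        # step against the gradient to reduce the cost
print("w after training:", round(w, 3), "cost:", round(cost(w), 5))
```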
Practical Implementation of AI
  1. The mathematical formulas provided above are only for reference purposes. In practical coding they are not required.
    1. This is because when AI and Python first came up, we used to write low-level code.
    2. Now we have AI algorithms in Python that wrap the low-level code, which in turn is wrapped by high-level APIs like TensorFlow, which again is wrapped by super-high-level APIs like Keras.
  2. Practical work in AI is now largely limited to the following operations (a Keras MNIST sketch of these steps appears at the end of this section).
    1. Import Library
    2. Import keras libraries
    3. Load the data
    4. Preprocess the data
    5. Building the model
    6. Compile the model
    7. Training the model
    8. Testing the model
  3. If we go into more advanced topics, we learn how to fine-tune the model.
    1. What parameters should we use?
      1. These are called hyperparameters.
  4. Knowledge of theory is needed only to the extent of knowing which model to import.
  5. So for example if we have a particular problem of Anomaly Detection
    1. We may use Decision Tree Algorithm which is a predictive algorithm.
    2. We may use reinforcement learning.
  6. Next we should know how to clean the data, which comes under data preprocessing.
    1. Data Science and Data Engineering helps here.
  7. Building the Model is trial and error
    1. There is no fixed approach
    2. How many neurons in a given layer?
    3. How many layers we need?
    4. What kind of connection type we need?
      1. Fully connected
      2. Dense layer
    5. What kind of cost functions we need to consider?
    6. What kind of dropout ratio we need to consider?
      1. How are we going to optimize networks using multiple network optimization routines?
  8. Deploying our models brings separate challenges on platforms like AWS Lambda, GCP, and other cloud environments.
    1. This phase is called the implementation phase.
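A minimal Keras sketch of the operations listed above, using the MNIST digits (assumed Keras API; the model size and number of epochs are illustrative assumptions).

```python
# Minimal sketch (assumed Keras API): import libraries, load data, preprocess,
# build, compile, train, and test a small MNIST model.
from tensorflow import keras                                               # 1-2. import libraries
from tensorflow.keras import layers

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()   # 3. load the data

x_train = x_train.reshape(-1, 784).astype("float32") / 255.0              # 4. preprocess the data
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

model = keras.Sequential([                                                 # 5. build the model
    layers.Dense(128, activation="relu", input_shape=(784,)),
    layers.Dropout(0.2),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",                                            # 6. compile the model
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=128,                      # 7. train the model
          validation_split=0.1)
loss, acc = model.evaluate(x_test, y_test)                                 # 8. test the model
print("test accuracy:", round(acc, 4))
```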
Data Preparation
  1. Another very important point when considering real-time scenarios is the suitable data needed for modelling.
    1. This involves data-curation activities, which must be handled systematically.
    2. This data varies with the actual problem statement we are solving using AI.
    3. We should not overestimate the data.
      1. We can have dummy data ready that represents a realistic situation.
  2. For learning purposes, though, we can use a standard use case with data like the MNIST digits or the Iris flower dataset.
    1. All of this is available in the library itself, but in real life it does not happen like this.
    2. We can use datasets like MNIST, or import the IMDB movie review sentiment data, to get a grip on NLP and machine learning.
Natural Language Processing



  • NLP - Preprocessing - few examples
    • Padding
    • Cleaning - getting rid of less useful parts
    • Capitalization - "US" (as in USA) is different from "us"
    • Stopwords - removing common words like "the", etc.
    • Tokenization - splitting a paragraph into sentences and a sentence into words
    • Stemming - removing affixes such as "ing"; for example, "running" becomes "run"
      • Over-stemming should be avoided; for example, "university" and "universe" should not both be stemmed to "univers", as we may lose the meaning of the original words.
    • Lemmatization
      • Another, more accurate alternative to stemming.
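A minimal sketch of these preprocessing steps using NLTK (assumed API; the sample sentence is hypothetical, and the punkt, stopwords, and wordnet data packages must be downloaded): lower-casing, tokenization, cleaning, stop-word removal, stemming, and lemmatization.

```python
# Minimal sketch (assumed NLTK API): common NLP preprocessing steps.
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

for pkg in ("punkt", "stopwords", "wordnet"):
    nltk.download(pkg, quiet=True)

text = "The runners were running quickly through the universities."
tokens = word_tokenize(text.lower())                                  # tokenize + normalize case
tokens = [t for t in tokens if t.isalpha()]                           # cleaning: keep only words
tokens = [t for t in tokens if t not in stopwords.words("english")]   # drop stop words

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
print([stemmer.stem(t) for t in tokens])         # stemming: "running" -> "run"
print([lemmatizer.lemmatize(t) for t in tokens]) # lemmatization: more accurate base forms
```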
Recurrent Neural Network Overview




  • In a simple network with normal layers, every decision is about one particular item only, for example recognizing male vs. female.
    • We have a neural network which reads MNIST digits and classifies them.
      • Someone writes the digit 1 by hand and our network predicts 1.
      • Someone writes the digit 5 by hand and our network predicts 5.
  • Recurrent neural networks are used in NLP, where there are phenomena like word vectors, etc.
    • Each sentence has a context which has to be kept in mind as we progress.
    • A connection between earlier words and the current word needs to be maintained.
  • There are two types of RNN
    • Condensed RNN
      • RNN which is not unrolled/expanded.
    • Unrolled RNN
      • This has various network components within a network. A block has multiple networks.
      • Each time step makes a prediction and remembers it.
      • For example, take the sentence "I went to France. That's why I am fluent in French."
        • "France" will be remembered when predicting fluency in French.
  • RNNs suffer from a problem called the "vanishing gradient" problem.
    • A simple RNN is not able to remember words that came much earlier; this is resolved using LSTMs (Long Short-Term Memory networks).
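A minimal Keras sketch of an RNN for the IMDB sentiment task mentioned earlier (assumed Keras API; vocabulary size, sequence length, and layer sizes are illustrative). The Embedding layer turns word indices into word vectors, and the LSTM carries context from earlier words to later ones, which a plain feed-forward network cannot do.

```python
# Minimal sketch (assumed Keras API): an LSTM for IMDB sentiment classification.
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, max_len = 10000, 200
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=vocab_size)
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_len)  # padding
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_len)

model = keras.Sequential([
    layers.Embedding(vocab_size, 64),        # word vectors learned during training
    layers.LSTM(64),                         # remembers earlier words in the review
    layers.Dense(1, activation="sigmoid"),   # positive vs negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=128, validation_split=0.1)
print("test accuracy:", model.evaluate(x_test, y_test)[1])
```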

There are different use cases for which AI routines can be built; some are as follows.
  • Strengthening email security using AI
  • Determining the gender of a person from an image
  • Threat hunting using artificial intelligence
There are different use cases in different domains that can be handled by AI.

Different Steps in deriving an AI solution are as follows
  • Analysis
  • Develop AI routine
    • View for developing code solution
  • Implementation
    • Porting
  • Scaling 
Natural Language Processing
  • There is a central word in any given sentence, and then there are context words around it.
  • Following are the components of NLP :
    • Word vectors
    • Recurrent neural networks
    • Long short term memory networks
    • Bidirectional LSTM's
    • Stacked LSTM's
    • Parallel network architecture
Basic concepts of NLP
  • Natural Language Processing (NLP) concerns itself with the interaction between natural human languages and computing devices. NLP is a major aspect of computational linguistics, and also falls within the realms of computer science and artificial intelligence.
  • Tokenization is generally an early step in the NLP process, a step which splits longer strings of text into smaller pieces, or tokens. Larger chunks of text can be tokenized into sentences, sentences can be tokenized into words, etc. Further processing is generally performed after a piece of text has been appropriately tokenized.
    • Example of Tokenization
      • Breaking text into sentences
      • Breaking sentences into words
  • Normalization: before further processing, text needs to be normalized. Normalization generally refers to a series of related tasks meant to put all text on a level playing field: converting all text to the same case (upper or lower), removing punctuation, expanding contractions, converting numbers to their word equivalents, and so on. Normalization puts all words on an equal footing and allows processing to proceed uniformly.
  • Stemming: the process of eliminating affixes (suffixes, prefixes, circumfixes) from a word in order to obtain its stem.
  • Lemmatization is related to stemming, differing in that lemmatization is able to capture canonical forms based on a word's lemma. For example, stemming the word "better" would fail to return its citation form (another word for lemma). However, lemmatization would result in the following.
    • better → good
    • It should be easy to see why the implementation of a stemmer would be the less difficult feat of the two.
  • Corpus: in linguistics and NLP, corpus refers to a collection of texts. Such collections may be formed of texts in a single language, or can span multiple languages.
    • There are numerous reasons for which multilingual corpora may be useful
    • Corpora may also consist of themed texts.
    • Corpora are generally solely used for statistical linguistic analysis and hypothesis testing.
  • Stop Words are those words which are filtered out before processing of text, since these words contribute little to the overall meaning given that they are generally the most common words in a language.
    • For instance "the","and","a", while all required words in a particular passage don't generally contribute to one's understanding of content. As a simple example, the following panagram is just as legible if the stop words are removed.
      • The quick brown fox jumped over the lazy fox.
  • Parts of Speech(POS) Tagging POS tagging consists of assigning a category tag to the tokenism parts of a sentence.The most popular POS tagging would be identifying words as nouns,verbs, adjectives etc.
  • Statistical Language Modelling is a process of building a statistical language model which is meant to provide an estimate of a natural language.For a sequence of input words, the model would assign a probability to the entire sequence, which contributes to the estimated likelihood of various possible sequences.This can be especially useful for NLP applications which generate texts.
  • Bag of Words is a particular representation model used to simplify the contents of a selection of text. The bag-of-words model omits grammar and word order, but is interested in the number of occurrences of words within the text. The ultimate representation of a text selection is that of a bag of words ("bag" referring to the set-theory concept of multisets, which differ from simple sets).
    • Actual storage mechanisms for the bag-of-words representation can vary, but the following is a simple example using a dictionary for intuitiveness. Sample text:
      • "Well, well, well," said John.
      • "There, there," said James. "There, there."
    • The resulting bag of words will be
      • {"well": 3, "said": 2, "john": 1, "there": 4, "james": 1}
  • n-grams
    • n-grams is another representation model for simplifying text selection contents. As opposed to the order-less representation of bag of words, n-gram modelling is interested in preserving contiguous sequences of N items from the text selection.
    • An example trigram (3-gram) model of the second sentence of the above example ("There, there," said James. "There, there.") appears as a list representation below.
      • ["there there said", "there said james", "said james there", "james there there"]
NLP Algorithms
  • Parts of NLP
    • Word vectors
    • Dense networks for IMDB classification
    • Simple recurrent neural networks
    • LSTMs, bidirectional LSTMs, stacked LSTMs, etc.
  • There are 2 common algorithms
    • CBOW
      • The Continuous Bag of Words (CBOW) architecture predicts a target word given its context words.
      • It primarily looks at the context words for a given target word.
      • [drove][my][high][speed] Vehicle [down][the][road]
        • The target word appears outside the brackets, while the context words are in brackets.
      • In the above example, the target word is predicted using the context words.
      • The target word is "vehicle".
    • Skip-gram
      • The skip-gram architecture predicts the context words given a target word.
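A minimal, plain-Python sketch of how the two architectures frame the same sentence as training examples (the window size is an illustrative assumption): CBOW pairs the surrounding context words with the target word, while skip-gram pairs the target word with each context word.

```python
# Minimal sketch (plain Python, illustrative only): CBOW vs skip-gram training pairs.
sentence = "drove my high speed vehicle down the road".split()
window = 2

for i, target in enumerate(sentence):
    context = sentence[max(0, i - window):i] + sentence[i + 1:i + 1 + window]
    cbow_example = (context, target)                      # CBOW: context words -> target word
    skipgram_examples = [(target, c) for c in context]    # skip-gram: target -> each context word
    if target == "vehicle":
        print("CBOW:", cbow_example)
        print("skip-gram:", skipgram_examples)
```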
Word Vectors
  1. NLP positions words in respective vector places.
  2. Computer understands data in terms of vectors.
  3. Related Words are placed together in a logical place.
  4. A vector space can be multidimensional, so there is a huge space in which the model organizes words.
  5. Whenever there is a corpus (bag of words), the system starts positioning its words as vectors.
    1. Keras already handles word vectors using
      1. a hidden layer called the Embedding layer (see the sketch below).
  6. In the examples below, words related to the male-female relationship are stored separately from those related to verb tense.
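A minimal sketch of the Embedding layer mentioned above (assumed Keras API; vocabulary size and vector dimension are illustrative): each word index is mapped to a dense, multi-dimensional vector which the model adjusts during training so that related words end up close together.

```python
# Minimal sketch (assumed Keras API; sizes are illustrative): the Embedding layer
# maps each word index to a trainable dense word vector.
import numpy as np
from tensorflow.keras import layers, models

vocab_size, vector_dim = 5000, 50
model = models.Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=vector_dim),  # one trainable vector per word
])

word_indices = np.array([[12, 7, 243, 0, 0, 0, 0, 0, 0, 0]])  # one padded sentence of 10 word ids
vectors = model.predict(word_indices)
print(vectors.shape)  # (1, 10, 50): a 50-dimensional word vector per position
```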


Production Deployment of AI Models
  • Unit Testing using test parameters by developers at the time of development of model.
    • These parameters are from Data itself which is split into Test data and Training data.
  • Flask
  • Setting up a web service which sends parameters to the server and gets results back (a minimal Flask sketch follows this list).
  • TensorFlow Serving, a serving environment used for large-scale deployment (when millions of people are using the model).
    • There are many other frameworks that can be used for deployment, for example MXNet, PyTorch, and Caffe (other frameworks like TensorFlow).
  • Google Cloud
    • It is a deployment-ready platform.
    • gcloud app deploy and gcloud app browse are the commands used to push the package to Google Cloud.
    • The entire package, with its app name, is containerized and pushed to Google Cloud.
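A minimal Flask sketch of the web-service idea above (assumed Flask and Keras APIs; the model file name, route, and input shape are hypothetical): the service receives parameters, runs the trained model, and returns the prediction.

```python
# Minimal sketch (assumed Flask/Keras APIs; names are hypothetical): serving a
# trained model behind a simple prediction endpoint.
import numpy as np
from flask import Flask, jsonify, request
from tensorflow import keras

app = Flask(__name__)
model = keras.models.load_model("my_model.h5")      # hypothetical saved model

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"pixels": [...784 values...]}
    pixels = np.array(request.get_json()["pixels"], dtype="float32").reshape(1, 784)
    probabilities = model.predict(pixels)[0]
    return jsonify({"digit": int(np.argmax(probabilities))})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```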
Generative Adversarial Networks
  • Used in AI to create things, such as composing music or creating art.
  • Applications
    • Create paintings.
    • Create interior designs.
    • Create cartoons.
    • Create clothes.
    • Create gaming content.
    • Create animation characters.
    • Create combinations of clothes based on source images.
    • Suggest clothing designs.
  • Cycle GAN
    • Convert one image to another
      • Convert Horse to Zebra.
  • Stack GAN
    • From a series of text instructions create an image.
  • Risks
    • Fake images can be spread as news using GANs.
    • Security issues
      • Data-poisoning attacks can be carried out.
      • Some samples in training videos may be replaced with wrong samples.
  • Cross-domain transfer GANs
    • From photos to painting.

Steps towards Production Deployments
  1. Writing a Hello World Deployment.
  2. Extend Hello World with a request and response so we can have to-and-fro communication.
  3. Load the model (e.g., a VGG or MNIST model).
  4. Start Designing a better UI according to domain
    1. Example Insurance Application
    2. Loan Processing Application
  5. Integrate code into our existing web application.
  6. Deploy on TensorFlow Serving.
  7. Compare deployments across multiple frameworks.
  8. Deploy on deployment-ready cloud providers like Google Cloud, etc.
  9. Learn how we can optimize a deployment.
Going Down the Lane
  • Solve the problem using Keras
  • Solve the problem using Tensor Flow
  • Solve the problem using Python
  • Know the Artificial Intelligence Algorithm
  • Use MNIST in Keras, TensorFlow, and raw Python.
Links for further brush-up
  • Coursera machine learning courses
  • Kaggle Cloud
  • Deep Learning with python
  • Libraries to go through
    • Pyplot
    • Pandas
    • Matplotlib
    • scikit-learn
  • Learn Vectors,Space,Embedding Layers(Theory and Code)
  • Pre-sequence and post-sequence padding
  • Production Deployment of Models
  • Generative adversarial networks
  • Multiple network architectures
  • Sequence Networks
  • Latest AI developments
References
  1. http://colah.github.io/
  2. deeplizard youtube channel
  3. deeplearning.ai
  4. fast.ai
  5. OpenAI Gym
