<h1>Character-Level Language Modeling with RNN</h1>
<p>Vaibhav Sharma · 2023-01-27</p>
<p>In this tutorial we are going to learn to generate text using a character-based RNN. We will use a toy dataset consisting of dinosaur names to generate new and unique dinosaur names. This toy example may not have many practical applications, but it is a good way to learn the fundamentals, which can then be applied to more challenging real-world problems. So, let us get started by reviewing the goals and outline of this tutorial:</p>
<p>After this tutorial you will understand:</p>
<ul>
<li>Preparing the dataset for the RNN</li>
<li>Building an RNN in TensorFlow</li>
<li>Generating text using the trained RNN</li>
<li>Customizing the training loop in TensorFlow</li>
</ul>
<p>This tutorial is written in Jupyter notebook style, so you can follow the code or even run it in a browser. You can find everything needed to follow along <a href="https://github.com/vbvsharma/text-generation-with-an-rnn">here</a>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Let us import some libraries
</span>
<span class="kn">import</span> <span class="nn">tensorflow</span> <span class="k">as</span> <span class="n">tf</span>
<span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">time</span>
</code></pre></div></div>
<p>Let us get started by reading the data file.</p>
<h2 id="exploring-the-dataset">Exploring the dataset</h2>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Read in the text
</span><span class="n">text</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s">'dinos.txt'</span><span class="p">,</span> <span class="s">'rb'</span><span class="p">).</span><span class="n">read</span><span class="p">().</span><span class="n">decode</span><span class="p">(</span><span class="n">encoding</span><span class="o">=</span><span class="s">'utf-8'</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span> <span class="p">(</span><span class="s">'Length of text: {} characters'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">text</span><span class="p">)))</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Length of text: 19913 characters
</code></pre></div></div>
<p>Let us see what the text looks like by printing the first 100 characters.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="n">text</span><span class="p">[:</span><span class="mi">100</span><span class="p">])</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Aachenosaurus
Aardonyx
Abdallahsaurus
Abelisaurus
Abrictosaurus
Abrosaurus
Abydosaurus
Acanthopholis
</code></pre></div></div>
<p>Let us now see the unique characters in the file.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">vocab</span> <span class="o">=</span> <span class="nb">sorted</span><span class="p">(</span><span class="nb">set</span><span class="p">(</span><span class="n">text</span><span class="p">))</span>
<span class="k">print</span> <span class="p">(</span><span class="s">'{} unique characters'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">vocab</span><span class="p">)))</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 53 unique characters
</code></pre></div></div>
<p>Let us plot the word length distribution in the text.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">words</span> <span class="o">=</span> <span class="n">text</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span>
<span class="n">lengths</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">words</span><span class="p">:</span>
<span class="n">lengths</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">word</span><span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">hist</span><span class="p">(</span><span class="n">lengths</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'Word length distribution'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'Word Length'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'Frequency'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<p><img src="/assets/images/2020-08-01-Character-Level-Language-Modeling-with-RNN/output.png" alt="Word length distribution" class="img-responsive" /></p>
<p>We see that most of the names are 12 or 13 characters long. Let us find the length of the longest name.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">max_length</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">lengths</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">'max length: </span><span class="si">{</span><span class="n">max_length</span><span class="si">}</span><span class="s">'</span><span class="p">)</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> max length: 23
</code></pre></div></div>
<h2 id="process-the-text">Process the text</h2>
<p>As the first preprocessing step, we will pad the names so that they are all the same length. We use the newline character as the padding character.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">padding_char</span> <span class="o">=</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span> <span class="c1"># declaring the padding character
</span><span class="n">text</span> <span class="o">=</span> <span class="s">''</span> <span class="c1"># stores the processed text
</span>
<span class="k">for</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">words</span><span class="p">:</span>
<span class="c1"># For each word, add it to the text, and add appropriate padding
</span> <span class="n">text</span> <span class="o">=</span> <span class="n">text</span> <span class="o">+</span> <span class="n">word</span> <span class="o">+</span> <span class="n">padding_char</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">max_length</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">word</span><span class="p">))</span>
<span class="n">text</span><span class="p">[:</span><span class="mi">100</span><span class="p">]</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 'Aachenosaurus\n\n\n\n\n\n\n\n\n\n\nAardonyx\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nAbdallahsaurus\n\n\n\n\n\n\n\n\n\nAbelisaurus\n\n\n\n\n\n\n\n\n\n\n\n\nAbri'
</code></pre></div></div>
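<p>The padding step can be sketched in isolation. This is a minimal, self-contained example on two sample names (the <code class="language-plaintext highlighter-rouge">toy_words</code> list is a stand-in for the <code class="language-plaintext highlighter-rouge">words</code> list read from <code class="language-plaintext highlighter-rouge">dinos.txt</code>):</p>

```python
# A minimal, self-contained sketch of the padding step, using two sample
# names (toy_words stands in for the full words list from dinos.txt).
padding_char = '\n'
max_length = 23

toy_words = ['Aachenosaurus', 'Aardonyx']
padded = ''
for word in toy_words:
    # Every entry becomes exactly max_length + 1 = 24 characters long,
    # ending in at least one newline.
    padded = padded + word + padding_char * (1 + max_length - len(word))

print(repr(padded))   # each name followed by its newline padding
print(len(padded))    # 48, i.e. 24 characters per name
```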
<p>Before training, we need to map strings to a numerical representation. Create two lookup tables: one mapping characters to numbers, and another for numbers to characters.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Creating a mapping from unique characters to indices
</span><span class="n">char2idx</span> <span class="o">=</span> <span class="p">{</span><span class="n">u</span><span class="p">:</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">u</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">vocab</span><span class="p">)}</span>
<span class="n">idx2char</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">vocab</span><span class="p">)</span>
<span class="n">text_as_int</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">char2idx</span><span class="p">[</span><span class="n">c</span><span class="p">]</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">text</span><span class="p">])</span>
</code></pre></div></div>
<p>Now we have an integer representation for each character. Notice that we mapped each character to an index from 0 to <code class="language-plaintext highlighter-rouge">len(vocab) - 1</code>.</p>
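<p>The two lookup tables can be exercised on a toy three-character vocabulary (a stand-in for the full 53-character one) to confirm that encoding and decoding round-trip cleanly:</p>

```python
import numpy as np

# Round-trip sketch of the two lookup tables on a toy vocabulary.
toy_text = 'abba\n'
vocab = sorted(set(toy_text))                 # ['\n', 'a', 'b']
char2idx = {u: i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

encoded = np.array([char2idx[c] for c in toy_text])
decoded = ''.join(idx2char[encoded])          # map the indices back

print(encoded)                 # [1 2 2 1 0]
print(decoded == toy_text)     # True
```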
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="s">'{'</span><span class="p">)</span>
<span class="k">for</span> <span class="n">char</span><span class="p">,</span><span class="n">_</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">char2idx</span><span class="p">,</span> <span class="nb">range</span><span class="p">(</span><span class="mi">20</span><span class="p">)):</span>
<span class="k">print</span><span class="p">(</span><span class="s">' {:4s}: {:3d},'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">char</span><span class="p">),</span> <span class="n">char2idx</span><span class="p">[</span><span class="n">char</span><span class="p">]))</span>
<span class="k">print</span><span class="p">(</span><span class="s">' ...</span><span class="se">\n</span><span class="s">}'</span><span class="p">)</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> {
'\n': 0,
'A' : 1,
'B' : 2,
'C' : 3,
'D' : 4,
'E' : 5,
'F' : 6,
'G' : 7,
'H' : 8,
'I' : 9,
'J' : 10,
'K' : 11,
'L' : 12,
'M' : 13,
'N' : 14,
'O' : 15,
'P' : 16,
'Q' : 17,
'R' : 18,
'S' : 19,
...
}
</code></pre></div></div>
<p>Let us see the first 100 characters of the text as a sequence of numbers.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">text_as_int</span><span class="p">[:</span><span class="mi">100</span><span class="p">]</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> array([ 1, 27, 29, 34, 31, 40, 41, 45, 27, 47, 44, 47, 45, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 27, 44, 30, 41, 40, 51, 50, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 28, 30,
27, 38, 38, 27, 34, 45, 27, 47, 44, 47, 45, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 28, 31, 38, 35, 45, 27, 47, 44, 47, 45, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 28, 44, 35])
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Show how the first 13 characters from the text are mapped to integers
</span><span class="k">print</span> <span class="p">(</span><span class="s">'{} ---- characters mapped to int ---- > {}'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="n">text</span><span class="p">[:</span><span class="mi">13</span><span class="p">]),</span> <span class="n">text_as_int</span><span class="p">[:</span><span class="mi">13</span><span class="p">]))</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 'Aachenosaurus' ---- characters mapped to int ---- > [ 1 27 29 34 31 40 41 45 27 47 44 47 45]
</code></pre></div></div>
<h2 id="the-prediction-task">The prediction task</h2>
<p>Given a character, or a sequence of characters, what is the most probable next character? This is the task we’re training the model to perform. The input to the model will be a sequence of characters, and we train the model to predict the output—the following character at each time step.</p>
<p>Since RNNs maintain an internal state that depends on the previously seen elements, the model can use all the characters seen so far to predict the next one.</p>
<h2 id="create-training-examples-and-targets">Create training examples and targets</h2>
<p>Next divide the text into example sequences. Each input sequence will contain seq_length characters from the text.</p>
<p>For each input sequence, the corresponding targets contain the same length of text, except shifted one character to the right.</p>
<p>So break the text into chunks of seq_length+1. For example, say seq_length is 4 and our text is “Hello”. The input sequence would be “Hell”, and the target sequence “ello”.</p>
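<p>The &quot;Hello&quot; example can be written out directly in Python:</p>

```python
# The "Hello" example written out directly: a chunk of seq_length + 1
# characters yields an input/target pair shifted by one character.
seq_length = 4
chunk = 'Hello'                 # seq_length + 1 characters

input_seq = chunk[:-1]          # everything except the last character
target_seq = chunk[1:]          # everything except the first character

print(input_seq)    # Hell
print(target_seq)   # ello
```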
<p>To do this, first use the <a href="https://www.tensorflow.org/api_docs/python/tf/data/Dataset#from_tensor_slices">tf.data.Dataset.from_tensor_slices</a> function to convert the text vector into a stream of character indices.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># The maximum length sentence we want for a single input in characters
</span><span class="n">seq_length</span> <span class="o">=</span> <span class="n">max_length</span> <span class="o">+</span> <span class="mi">1</span>
<span class="n">examples_per_epoch</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">text</span><span class="p">)</span><span class="o">//</span><span class="n">seq_length</span>
<span class="c1"># Create training examples / targets
</span><span class="n">char_dataset</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">Dataset</span><span class="p">.</span><span class="n">from_tensor_slices</span><span class="p">(</span><span class="n">text_as_int</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">char_dataset</span><span class="p">.</span><span class="n">take</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="n">idx2char</span><span class="p">[</span><span class="n">i</span><span class="p">.</span><span class="n">numpy</span><span class="p">()])</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> A
a
c
h
e
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">batch</code> method lets us easily convert these individual characters to sequences of the desired size.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sequences</span> <span class="o">=</span> <span class="n">char_dataset</span><span class="p">.</span><span class="n">batch</span><span class="p">(</span><span class="n">seq_length</span><span class="p">,</span> <span class="n">drop_remainder</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">sequences</span><span class="p">.</span><span class="n">take</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="nb">repr</span><span class="p">(</span><span class="s">''</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">idx2char</span><span class="p">[</span><span class="n">item</span><span class="p">.</span><span class="n">numpy</span><span class="p">()])))</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 'Aachenosaurus\n\n\n\n\n\n\n\n\n\n\n'
'Aardonyx\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n'
'Abdallahsaurus\n\n\n\n\n\n\n\n\n\n'
'Abelisaurus\n\n\n\n\n\n\n\n\n\n\n\n\n'
'Abrictosaurus\n\n\n\n\n\n\n\n\n\n\n'
</code></pre></div></div>
<p>For each sequence, duplicate and shift it to form the input and target text by using the <code class="language-plaintext highlighter-rouge">map</code> method to apply a simple function to each batch:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">split_input_target</span><span class="p">(</span><span class="n">chunk</span><span class="p">):</span>
<span class="n">input_text</span> <span class="o">=</span> <span class="n">chunk</span><span class="p">[:</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">target_text</span> <span class="o">=</span> <span class="n">chunk</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
<span class="k">return</span> <span class="n">input_text</span><span class="p">,</span> <span class="n">target_text</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">sequences</span><span class="p">.</span><span class="nb">map</span><span class="p">(</span><span class="n">split_input_target</span><span class="p">)</span>
</code></pre></div></div>
<p>Print the first example's input and target values:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">input_example</span><span class="p">,</span> <span class="n">target_example</span> <span class="ow">in</span> <span class="n">dataset</span><span class="p">.</span><span class="n">take</span><span class="p">(</span><span class="mi">1</span><span class="p">):</span>
<span class="k">print</span> <span class="p">(</span><span class="s">'Input data: '</span><span class="p">,</span> <span class="nb">repr</span><span class="p">(</span><span class="s">''</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">idx2char</span><span class="p">[</span><span class="n">input_example</span><span class="p">.</span><span class="n">numpy</span><span class="p">()])),</span> <span class="s">':'</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">input_example</span><span class="p">.</span><span class="n">numpy</span><span class="p">()))</span>
<span class="k">print</span> <span class="p">(</span><span class="s">'Target data:'</span><span class="p">,</span> <span class="nb">repr</span><span class="p">(</span><span class="s">''</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">idx2char</span><span class="p">[</span><span class="n">target_example</span><span class="p">.</span><span class="n">numpy</span><span class="p">()])),</span> <span class="s">':'</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">target_example</span><span class="p">.</span><span class="n">numpy</span><span class="p">()))</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Input data: 'Aachenosaurus\n\n\n\n\n\n\n\n\n\n' : 23
Target data: 'achenosaurus\n\n\n\n\n\n\n\n\n\n\n' : 23
</code></pre></div></div>
<p>Each index of these vectors is processed as one time step. For the input at time step 0, the model receives the index for &quot;A&quot; and tries to predict the index for &quot;a&quot; as the next character. At the next timestep, it does the same thing, but the RNN considers the previous step context in addition to the current input character.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">input_idx</span><span class="p">,</span> <span class="n">target_idx</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">input_example</span><span class="p">[:</span><span class="mi">5</span><span class="p">],</span> <span class="n">target_example</span><span class="p">[:</span><span class="mi">5</span><span class="p">])):</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Step {:4d}"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">i</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">" input: {} ({:s})"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">input_idx</span><span class="p">,</span> <span class="nb">repr</span><span class="p">(</span><span class="n">idx2char</span><span class="p">[</span><span class="n">input_idx</span><span class="p">])))</span>
<span class="k">print</span><span class="p">(</span><span class="s">" expected output: {} ({:s})"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">target_idx</span><span class="p">,</span> <span class="nb">repr</span><span class="p">(</span><span class="n">idx2char</span><span class="p">[</span><span class="n">target_idx</span><span class="p">])))</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Step 0
input: 1 ('A')
expected output: 27 ('a')
Step 1
input: 27 ('a')
expected output: 29 ('c')
Step 2
input: 29 ('c')
expected output: 34 ('h')
Step 3
input: 34 ('h')
expected output: 31 ('e')
Step 4
input: 31 ('e')
expected output: 40 ('n')
</code></pre></div></div>
<h2 id="create-training-batches">Create training batches</h2>
<p>We used <a href="https://www.tensorflow.org/api_docs/python/tf/data">tf.data</a> to split the text into manageable sequences. But before feeding this data into the model, we need to shuffle the data and pack it into batches.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Batch size
</span><span class="n">BATCH_SIZE</span> <span class="o">=</span> <span class="mi">64</span>
<span class="c1"># Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
</span><span class="n">BUFFER_SIZE</span> <span class="o">=</span> <span class="mi">10000</span>
<span class="n">dataset</span> <span class="o">=</span> <span class="n">dataset</span><span class="p">.</span><span class="n">shuffle</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">).</span><span class="n">batch</span><span class="p">(</span><span class="n">BATCH_SIZE</span><span class="p">,</span> <span class="n">drop_remainder</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">dataset</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <BatchDataset shapes: ((64, 23), (64, 23)), types: (tf.int64, tf.int64)>
</code></pre></div></div>
<h2 id="build-the-model">Build the model</h2>
<p>Use <a href="https://www.tensorflow.org/api_docs/python/tf/keras/Sequential">tf.keras.Sequential</a> to define the model. For this simple example three layers are used to define our model:</p>
<ul>
<li><a href="https://www.tensorflow.org/api_docs/python/tf/keras/layers/Embedding">tf.keras.layers.Embedding</a>: The input layer. A trainable lookup table that will map the numbers of each character to a vector with embedding_dim dimensions;</li>
<li><a href="https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRU">tf.keras.layers.GRU</a>: A type of RNN with size units=rnn_units (you can also use an LSTM layer here);</li>
<li><a href="https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense">tf.keras.layers.Dense</a>: The output layer, with vocab_size outputs.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Length of the vocabulary in chars
</span><span class="n">vocab_size</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">vocab</span><span class="p">)</span>
<span class="c1"># The embedding dimension
</span><span class="n">embedding_dim</span> <span class="o">=</span> <span class="mi">256</span>
<span class="c1"># Number of RNN units
</span><span class="n">rnn_units</span> <span class="o">=</span> <span class="mi">1024</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">build_model</span><span class="p">(</span><span class="n">vocab_size</span><span class="p">,</span> <span class="n">embedding_dim</span><span class="p">,</span> <span class="n">rnn_units</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">):</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">Sequential</span><span class="p">([</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="n">Embedding</span><span class="p">(</span><span class="n">vocab_size</span><span class="p">,</span> <span class="n">embedding_dim</span><span class="p">,</span>
<span class="n">batch_input_shape</span><span class="o">=</span><span class="p">[</span><span class="n">batch_size</span><span class="p">,</span> <span class="bp">None</span><span class="p">]),</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="n">GRU</span><span class="p">(</span><span class="n">rnn_units</span><span class="p">,</span>
<span class="n">return_sequences</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">stateful</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">recurrent_initializer</span><span class="o">=</span><span class="s">'glorot_uniform'</span><span class="p">),</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="n">Dense</span><span class="p">(</span><span class="n">vocab_size</span><span class="p">)</span>
<span class="p">])</span>
<span class="k">return</span> <span class="n">model</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span> <span class="o">=</span> <span class="n">build_model</span><span class="p">(</span>
<span class="n">vocab_size</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">vocab</span><span class="p">),</span>
<span class="n">embedding_dim</span><span class="o">=</span><span class="n">embedding_dim</span><span class="p">,</span>
<span class="n">rnn_units</span><span class="o">=</span><span class="n">rnn_units</span><span class="p">,</span>
<span class="n">batch_size</span><span class="o">=</span><span class="n">BATCH_SIZE</span><span class="p">)</span>
</code></pre></div></div>
<p>For each character the model looks up the embedding, runs the GRU one timestep with the embedding as input, and applies the dense layer to generate logits predicting the log-likelihood of the next character:</p>
<p><img src="/assets/images/2020-08-01-Character-Level-Language-Modeling-with-RNN/text_generation_training.png" alt="A drawing of the data passing through the model [Credits - Tensorflow]" class="img-responsive" /></p>
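<p>The per-character flow can be sketched with NumPy. This is an illustrative single-timestep computation, not a reproduction of the Keras layer internals: the weights are random stand-ins, the layer sizes are shrunk, and the gate equations follow one common GRU convention (the Keras implementation differs in its bias layout):</p>

```python
import numpy as np

# Illustrative single-timestep pass: embedding lookup -> one GRU step ->
# dense logits. Weights are random stand-ins; sizes are shrunk.
rng = np.random.default_rng(0)
vocab_size, embedding_dim, rnn_units = 53, 8, 16

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

E = rng.normal(size=(vocab_size, embedding_dim))    # embedding table
Wz, Wr, Wh = (rng.normal(size=(embedding_dim, rnn_units)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(size=(rnn_units, rnn_units)) for _ in range(3))
Wd = rng.normal(size=(rnn_units, vocab_size))       # dense (output) weights

def step(char_idx, h):
    x = E[char_idx]                          # look up the embedding
    z = sigmoid(x @ Wz + h @ Uz)             # update gate
    r = sigmoid(x @ Wr + h @ Ur)             # reset gate
    h_cand = np.tanh(x @ Wh + (r * h) @ Uh)  # candidate state
    h_new = z * h + (1 - z) * h_cand         # blend old state and candidate
    logits = h_new @ Wd                      # logits over the vocabulary
    return logits, h_new

h = np.zeros(rnn_units)          # initial GRU state
logits, h = step(1, h)           # feed one character index (1 -> 'A')
print(logits.shape)              # (53,): one logit per character
```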
<h2 id="try-the-model">Try the model</h2>
<p>Now run the model to see that it behaves as expected.</p>
<p>First check the shape of the output:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">input_example_batch</span><span class="p">,</span> <span class="n">target_example_batch</span> <span class="ow">in</span> <span class="n">dataset</span><span class="p">.</span><span class="n">take</span><span class="p">(</span><span class="mi">1</span><span class="p">):</span>
<span class="n">example_batch_predictions</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">input_example_batch</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">example_batch_predictions</span><span class="p">.</span><span class="n">shape</span><span class="p">,</span> <span class="s">"# (batch_size, sequence_length, vocab_size)"</span><span class="p">)</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> (64, 23, 53) # (batch_size, sequence_length, vocab_size)
</code></pre></div></div>
<p>In the above example the sequence length of the input is 23, but the model can be run on inputs of any length:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span><span class="p">.</span><span class="n">summary</span><span class="p">()</span>
</code></pre></div></div>
<p><strong>Output</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (64, None, 256) 13568
_________________________________________________________________
gru (GRU) (64, None, 1024) 3938304
_________________________________________________________________
dense (Dense) (64, None, 53) 54325
=================================================================
Total params: 4,006,197
Trainable params: 4,006,197
Non-trainable params: 0
_________________________________________________________________
</code></pre></div></div>
<p>To get actual predictions from the model we need to sample from the output distribution to obtain concrete character indices. This distribution is defined by the logits over the character vocabulary. Try it for the first example in the batch:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sampled_indices</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">categorical</span><span class="p">(</span><span class="n">example_batch_predictions</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">num_samples</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">sampled_indices</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">sampled_indices</span><span class="p">,</span><span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">).</span><span class="n">numpy</span><span class="p">()</span>
</code></pre></div></div>
<p>This gives us, at each timestep, a prediction of the next character index:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sampled_indices</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> array([13, 7, 12, 40, 33, 6, 49, 8, 26, 37, 46, 35, 25, 9, 51, 47, 25,
21, 27, 31, 0, 6, 35])
</code></pre></div></div>
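<p>Under the hood, <code class="language-plaintext highlighter-rouge">tf.random.categorical</code> treats each row of logits as an unnormalized log-probability distribution and draws one sample per row. A minimal NumPy sketch of the same idea (the logits below are made up for illustration):</p>

```python
import numpy as np

def sample_from_logits(logits, rng):
    """Draw one class index per row from softmax(logits)."""
    # Subtract the row max before exponentiating for numerical stability
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return np.array([rng.choice(len(p), p=p) for p in probs])

rng = np.random.default_rng(0)
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 3.0, 0.0]])  # hypothetical (2 timesteps, 3 classes)
indices = sample_from_logits(logits, rng)
print(indices.shape)  # (2,) -- one sampled index per timestep
```

<p>Sampling, rather than always taking the argmax, is important here: it keeps the model from getting stuck in a loop of the same characters.</p>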
<p>Decode these to see the text predicted by this untrained model:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span><span class="p">(</span><span class="s">"Input: </span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="nb">repr</span><span class="p">(</span><span class="s">""</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">idx2char</span><span class="p">[</span><span class="n">input_example_batch</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">numpy</span><span class="p">()])))</span>
<span class="k">print</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Next Char Predictions: </span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="nb">repr</span><span class="p">(</span><span class="s">""</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">idx2char</span><span class="p">[</span><span class="n">sampled_indices</span><span class="p">])))</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Input:
'Archaeopteryx\n\n\n\n\n\n\n\n\n\n'
Next Char Predictions:
'MGLngFwHZktiYIyuYUae\nFi'
</code></pre></div></div>
<h2 id="train-the-model">Train the model</h2>
<p>At this point the problem can be treated as a standard classification problem: given the previous RNN state and the input at this timestep, predict the class of the next character.</p>
<p>Attach an optimizer and a loss function.</p>
<p>The standard <a href="https://www.tensorflow.org/api_docs/python/tf/keras/losses/sparse_categorical_crossentropy">tf.keras.losses.sparse_categorical_crossentropy</a> loss function works in this case because it is applied across the last dimension of the predictions.</p>
<p>Because our model returns logits, we need to set the <code class="language-plaintext highlighter-rouge">from_logits</code> flag.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">loss</span><span class="p">(</span><span class="n">labels</span><span class="p">,</span> <span class="n">logits</span><span class="p">):</span>
<span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">losses</span><span class="p">.</span><span class="n">sparse_categorical_crossentropy</span><span class="p">(</span><span class="n">labels</span><span class="p">,</span> <span class="n">logits</span><span class="p">,</span> <span class="n">from_logits</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">example_batch_loss</span> <span class="o">=</span> <span class="n">loss</span><span class="p">(</span><span class="n">target_example_batch</span><span class="p">,</span> <span class="n">example_batch_predictions</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Prediction shape: "</span><span class="p">,</span> <span class="n">example_batch_predictions</span><span class="p">.</span><span class="n">shape</span><span class="p">,</span> <span class="s">" # (batch_size, sequence_length, vocab_size)"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"scalar_loss: "</span><span class="p">,</span> <span class="n">example_batch_loss</span><span class="p">.</span><span class="n">numpy</span><span class="p">().</span><span class="n">mean</span><span class="p">())</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Prediction shape: (64, 23, 53) # (batch_size, sequence_length, vocab_size)
scalar_loss: 3.9535964
</code></pre></div></div>
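<p>The initial loss is a useful sanity check: an untrained model should assign roughly uniform probability to every character, giving a loss near \(\ln(53) \approx 3.97\), close to the 3.95 above. A small NumPy sketch of the from-logits loss computation (the shapes here are made up for illustration):</p>

```python
import numpy as np

def sparse_xent_from_logits(labels, logits):
    """Cross-entropy computed directly from raw logits via a stable log-softmax."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_softmax[np.arange(len(labels)), labels]

# Hypothetical batch of 4 timesteps over a 5-character vocabulary
logits = np.zeros((4, 5))        # uniform logits, as in an untrained model
labels = np.array([0, 1, 2, 3])
loss = sparse_xent_from_logits(labels, logits)
print(loss.mean())  # log(5) = 1.6094... for a uniform distribution
```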
<p>Configure the training procedure using the <a href="https://www.tensorflow.org/api_docs/python/tf/keras/Model#compile">tf.keras.Model.compile</a> method. We’ll use <a href="https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam">tf.keras.optimizers.Adam</a> with default arguments and the loss function.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">optimizer</span><span class="o">=</span><span class="s">'adam'</span><span class="p">,</span> <span class="n">loss</span><span class="o">=</span><span class="n">loss</span><span class="p">)</span>
</code></pre></div></div>
<h2 id="configure-checkpoints">Configure checkpoints</h2>
<p>Use a <a href="https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint">tf.keras.callbacks.ModelCheckpoint</a> to ensure that checkpoints are saved during training:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Directory where the checkpoints will be saved
</span><span class="n">checkpoint_dir</span> <span class="o">=</span> <span class="s">'./training_checkpoints'</span>
<span class="c1"># Name of the checkpoint files
</span><span class="n">checkpoint_prefix</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">path</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">checkpoint_dir</span><span class="p">,</span> <span class="s">"ckpt_{epoch}"</span><span class="p">)</span>
<span class="n">checkpoint_callback</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">callbacks</span><span class="p">.</span><span class="n">ModelCheckpoint</span><span class="p">(</span>
<span class="n">filepath</span><span class="o">=</span><span class="n">checkpoint_prefix</span><span class="p">,</span>
<span class="n">save_weights_only</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>
<h2 id="execute-the-training">Execute the training</h2>
<p>To keep training time reasonable, use 20 epochs to train the model.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">EPOCHS</span><span class="o">=</span><span class="mi">20</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">history</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="n">EPOCHS</span><span class="p">,</span> <span class="n">callbacks</span><span class="o">=</span><span class="p">[</span><span class="n">checkpoint_callback</span><span class="p">])</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Epoch 1/20
24/24 [==============================] - 15s 625ms/step - loss: 3.0601
.
.
.
Epoch 20/20
24/24 [==============================] - 21s 868ms/step - loss: 0.6993
</code></pre></div></div>
<h2 id="generate-text">Generate text</h2>
<h3 id="restore-the-latest-checkpoint">Restore the latest checkpoint</h3>
<p>To keep this prediction step simple, use a batch size of 1.</p>
<p>Because of the way the RNN state is passed from timestep to timestep, the model only accepts a fixed batch size once built.</p>
<p>To run the model with a different <code class="language-plaintext highlighter-rouge">batch_size</code>, we need to rebuild the model and restore the weights from the checkpoint.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tf</span><span class="p">.</span><span class="n">train</span><span class="p">.</span><span class="n">latest_checkpoint</span><span class="p">(</span><span class="n">checkpoint_dir</span><span class="p">)</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> './training_checkpoints/ckpt_20'
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span> <span class="o">=</span> <span class="n">build_model</span><span class="p">(</span><span class="n">vocab_size</span><span class="p">,</span> <span class="n">embedding_dim</span><span class="p">,</span> <span class="n">rnn_units</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">model</span><span class="p">.</span><span class="n">load_weights</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">train</span><span class="p">.</span><span class="n">latest_checkpoint</span><span class="p">(</span><span class="n">checkpoint_dir</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">build</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">TensorShape</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="bp">None</span><span class="p">]))</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span><span class="p">.</span><span class="n">summary</span><span class="p">()</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (1, None, 256) 13568
_________________________________________________________________
gru_1 (GRU) (1, None, 1024) 3938304
_________________________________________________________________
dense_1 (Dense) (1, None, 53) 54325
=================================================================
Total params: 4,006,197
Trainable params: 4,006,197
Non-trainable params: 0
_________________________________________________________________
</code></pre></div></div>
<h2 id="the-prediction-loop">The prediction loop</h2>
<p>The following code block generates the text:</p>
<ul>
<li>It starts by choosing a start string, initializing the RNN state, and setting the number of characters to generate.</li>
<li>It gets the prediction distribution of the next character using the start string and the RNN state.</li>
<li>It then samples from a categorical distribution to calculate the index of the predicted character, and uses this predicted character as the next input to the model.</li>
<li>The RNN state returned by the model is fed back into the model so that it now has more context instead of only one character. After predicting the next character, the updated RNN state is again fed back into the model, which is how it accumulates context from the previously predicted characters.</li>
</ul>
<p><img src="/assets/images/2020-08-01-Character-Level-Language-Modeling-with-RNN/text_generation_sampling.png" alt="To generate text the model's output is fed back to the input [Credit - TensorFlow]" class="img-responsive" /></p>
<p>Looking at the generated text, you’ll see the model knows when to capitalize and when to end the generated text. Even with this small number of epochs, it generates believable dinosaur names.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">generate_text</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">start_string</span><span class="p">):</span>
<span class="c1"># Evaluation step (generating text using the learned model)
</span>
<span class="c1"># Number of characters to generate
</span> <span class="n">num_generate</span> <span class="o">=</span> <span class="n">max_length</span>
<span class="c1"># Converting our start string to numbers (vectorizing)
</span> <span class="n">input_eval</span> <span class="o">=</span> <span class="p">[</span><span class="n">char2idx</span><span class="p">[</span><span class="n">s</span><span class="p">]</span> <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">start_string</span><span class="p">]</span>
<span class="n">input_eval</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">expand_dims</span><span class="p">(</span><span class="n">input_eval</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1"># Empty string to store our results
</span> <span class="n">text_generated</span> <span class="o">=</span> <span class="p">[]</span>
<span class="c1"># Low temperatures results in more predictable text.
</span> <span class="c1"># Higher temperatures results in more surprising text.
</span> <span class="c1"># Experiment to find the best setting.
</span> <span class="n">temperature</span> <span class="o">=</span> <span class="mf">0.2</span>
<span class="c1"># Here batch size == 1
</span> <span class="n">model</span><span class="p">.</span><span class="n">reset_states</span><span class="p">()</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_generate</span><span class="p">):</span>
<span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">input_eval</span><span class="p">)</span>
<span class="c1"># remove the batch dimension
</span> <span class="n">predictions</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">predictions</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
<span class="c1"># using a categorical distribution to predict the character returned by the model
</span> <span class="n">predictions</span> <span class="o">=</span> <span class="n">predictions</span> <span class="o">/</span> <span class="n">temperature</span>
<span class="n">predicted_id</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">categorical</span><span class="p">(</span><span class="n">predictions</span><span class="p">,</span> <span class="n">num_samples</span><span class="o">=</span><span class="mi">1</span><span class="p">)[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">].</span><span class="n">numpy</span><span class="p">()</span>
<span class="c1"># We pass the predicted character as the next input to the model
</span> <span class="c1"># along with the previous hidden state
</span> <span class="n">input_eval</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">expand_dims</span><span class="p">([</span><span class="n">predicted_id</span><span class="p">],</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">text_generated</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">idx2char</span><span class="p">[</span><span class="n">predicted_id</span><span class="p">])</span>
<span class="k">return</span> <span class="p">(</span><span class="n">start_string</span> <span class="o">+</span> <span class="s">''</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="n">text_generated</span><span class="p">))</span>
</code></pre></div></div>
<p>Let us generate a new dinosaur name that starts with Ram.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">generated_text</span> <span class="o">=</span> <span class="n">generate_text</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">start_string</span><span class="o">=</span><span class="sa">u</span><span class="s">"Ram"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">generated_text</span><span class="p">.</span><span class="n">strip</span><span class="p">())</span> <span class="c1"># strip newline characters and print
</span></code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Ramberaptor
</code></pre></div></div>
<p>Try to generate dinosaur names yourself and, while doing so, play around with the <code class="language-plaintext highlighter-rouge">temperature</code> parameter above. The easiest thing you can do to improve the results is to train for longer (try <code class="language-plaintext highlighter-rouge">EPOCHS=30</code>).</p>
<p>You can also experiment with a different start string, or try adding another RNN layer to improve the model’s accuracy, or adjusting the temperature parameter to generate more or less random predictions.</p>
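<p>Dividing the logits by the <code class="language-plaintext highlighter-rouge">temperature</code> before sampling is exactly what makes low temperatures predictable and high temperatures surprising: the division sharpens or flattens the softmax distribution. A NumPy illustration with made-up logits:</p>

```python
import numpy as np

def softmax(x):
    z = x - x.max()
    return np.exp(z) / np.exp(z).sum()

logits = np.array([2.0, 1.0, 0.1])   # hypothetical next-character logits
base = softmax(logits)               # temperature == 1.0
sharp = softmax(logits / 0.2)        # low temperature: nearly greedy
flat = softmax(logits / 2.0)         # high temperature: closer to uniform
print(base.round(3), sharp.round(3), flat.round(3))
```

<p>With <code class="language-plaintext highlighter-rouge">temperature = 0.2</code>, as in the tutorial's prediction loop, most of the probability mass concentrates on the single most likely character.</p>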
<h2 id="customized-training">Customized Training</h2>
<p>The above training procedure is simple, but does not give you much control.</p>
<p>So now that you’ve seen how to run the model manually, let’s unpack the training loop and implement it ourselves. This gives you a starting point if, for example, you want to implement curriculum learning to help stabilize the model’s open-loop output.</p>
<p>We will use <code class="language-plaintext highlighter-rouge">tf.GradientTape</code> to track the gradients. You can learn more about this approach by reading the <a href="https://www.tensorflow.org/guide/eager">eager execution guide</a>.</p>
<p>The procedure works as follows:</p>
<ul>
<li>First, initialize the RNN state. We do this by calling the <a href="https://www.tensorflow.org/api_docs/python/tf/keras/Model#reset_states">tf.keras.Model.reset_states</a> method.</li>
<li>Next, iterate over the dataset (batch by batch) and calculate the predictions associated with each.</li>
<li>Open a <a href="https://www.tensorflow.org/api_docs/python/tf/GradientTape">tf.GradientTape</a>, and calculate the predictions and loss in that context.</li>
<li>Calculate the gradients of the loss with respect to the model variables using the <code class="language-plaintext highlighter-rouge">tf.GradientTape.gradient</code> method.</li>
<li>Finally, take a step downwards by using the optimizer’s <code class="language-plaintext highlighter-rouge">apply_gradients</code> method.</li>
</ul>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span> <span class="o">=</span> <span class="n">build_model</span><span class="p">(</span>
<span class="n">vocab_size</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">vocab</span><span class="p">),</span>
<span class="n">embedding_dim</span><span class="o">=</span><span class="n">embedding_dim</span><span class="p">,</span>
<span class="n">rnn_units</span><span class="o">=</span><span class="n">rnn_units</span><span class="p">,</span>
<span class="n">batch_size</span><span class="o">=</span><span class="n">BATCH_SIZE</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">optimizer</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">optimizers</span><span class="p">.</span><span class="n">Adam</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">@</span><span class="n">tf</span><span class="p">.</span><span class="n">function</span>
<span class="k">def</span> <span class="nf">train_step</span><span class="p">(</span><span class="n">inp</span><span class="p">,</span> <span class="n">target</span><span class="p">):</span>
<span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">GradientTape</span><span class="p">()</span> <span class="k">as</span> <span class="n">tape</span><span class="p">:</span>
<span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span><span class="n">inp</span><span class="p">)</span>
<span class="n">loss</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">reduce_mean</span><span class="p">(</span>
<span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">losses</span><span class="p">.</span><span class="n">sparse_categorical_crossentropy</span><span class="p">(</span>
<span class="n">target</span><span class="p">,</span> <span class="n">predictions</span><span class="p">,</span> <span class="n">from_logits</span><span class="o">=</span><span class="bp">True</span><span class="p">))</span>
<span class="n">grads</span> <span class="o">=</span> <span class="n">tape</span><span class="p">.</span><span class="n">gradient</span><span class="p">(</span><span class="n">loss</span><span class="p">,</span> <span class="n">model</span><span class="p">.</span><span class="n">trainable_variables</span><span class="p">)</span>
<span class="n">optimizer</span><span class="p">.</span><span class="n">apply_gradients</span><span class="p">(</span><span class="nb">zip</span><span class="p">(</span><span class="n">grads</span><span class="p">,</span> <span class="n">model</span><span class="p">.</span><span class="n">trainable_variables</span><span class="p">))</span>
<span class="k">return</span> <span class="n">loss</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Training step
</span><span class="n">EPOCHS</span> <span class="o">=</span> <span class="mi">10</span>
<span class="k">for</span> <span class="n">epoch</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">EPOCHS</span><span class="p">):</span>
<span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="p">.</span><span class="n">time</span><span class="p">()</span>
<span class="c1"># initializing the hidden state at the start of every epoch
</span> <span class="c1"># initially hidden is None
</span> <span class="n">hidden</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">reset_states</span><span class="p">()</span>
<span class="k">for</span> <span class="p">(</span><span class="n">batch_n</span><span class="p">,</span> <span class="p">(</span><span class="n">inp</span><span class="p">,</span> <span class="n">target</span><span class="p">))</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">dataset</span><span class="p">):</span>
<span class="n">loss</span> <span class="o">=</span> <span class="n">train_step</span><span class="p">(</span><span class="n">inp</span><span class="p">,</span> <span class="n">target</span><span class="p">)</span>
<span class="k">if</span> <span class="n">batch_n</span> <span class="o">%</span> <span class="mi">100</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">template</span> <span class="o">=</span> <span class="s">'Epoch {} Batch {} Loss {}'</span>
<span class="k">print</span><span class="p">(</span><span class="n">template</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">epoch</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">batch_n</span><span class="p">,</span> <span class="n">loss</span><span class="p">))</span>
<span class="c1"># saving (checkpoint) the model every 5 epochs
</span> <span class="k">if</span> <span class="p">(</span><span class="n">epoch</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="mi">5</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">model</span><span class="p">.</span><span class="n">save_weights</span><span class="p">(</span><span class="n">checkpoint_prefix</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">epoch</span><span class="o">=</span><span class="n">epoch</span><span class="p">))</span>
<span class="k">print</span> <span class="p">(</span><span class="s">'Epoch {} Loss {:.4f}'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">epoch</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">loss</span><span class="p">))</span>
<span class="k">print</span> <span class="p">(</span><span class="s">'Time taken for 1 epoch {} sec</span><span class="se">\n</span><span class="s">'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">time</span><span class="p">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">start</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">save_weights</span><span class="p">(</span><span class="n">checkpoint_prefix</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">epoch</span><span class="o">=</span><span class="n">epoch</span><span class="p">))</span>
</code></pre></div></div>
<p><strong>Output:</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Epoch 1 Batch 0 Loss 3.9669971466064453
Epoch 1 Loss 1.5980
Time taken for 1 epoch 21.90713667869568 sec
.
.
.
Epoch 10 Batch 0 Loss 0.9955548644065857
Epoch 10 Loss 0.8829
Time taken for 1 epoch 15.176138639450073 sec
</code></pre></div></div>
<h2 id="references">References</h2>
<ul>
<li><a href="https://www.tensorflow.org/tutorials/text/text_generation">Text Generation with RNN</a></li>
<li><a href="https://www.coursera.org/learn/intro-to-deep-learning">Week 5 of Introduction to Deep Learning</a></li>
</ul>Vaibhav SharmaIn this tutorial we are going to learn to generate text using a character-based RNN. We will use a toy dataset, which consists of dinosaur names, to generate new and unique dinosaur names. This toy example may not have many practical applications, but it can be used to learn fundamentals, which can then be applied to a more challenging real-world problem. So, let us get started by reviewing the goals and outline of this tutorial:Logistic Regression with a Neural Network Mindset2023-01-07T00:00:00+00:002023-01-07T00:00:00+00:00https://vbvsharma.com/neural%20networks/2023/01/07/Logistic-Regression-with-a-Neural-Network-Mindset<p>Today we are going to implement logistic regression as a neural network. This is definitely one of the simplest neural networks, and is great for getting your feet wet in neural networks. After completing this tutorial, you will know:</p>
<ul>
<li>How to implement logistic regression.</li>
<li>How to use logistic regression.</li>
<li>How to use gradient descent.</li>
<li>How a neural network works.</li>
<li>Case study - Breast Cancer Wisconsin Data Set (predict whether the tumor is benign or malignant)</li>
</ul>
<h2 id="introduction-to-logistic-regression">Introduction to Logistic Regression</h2>
<p><a href="https://en.wikipedia.org/wiki/Logistic_regression">Logistic regression</a> is a binary classification algorithm, which can be used to classify linearly separable classes, i.e. two classes that can be separated by a line (more generally, a hyperplane). It is one of the most basic algorithms. Most of the time, this is the first algorithm that I use for classification. It provides a kind of lower bound for accuracy. Let us see the architecture of logistic regression.</p>
<h1 id="the-general-architecture-of-the-learning-algorithm">The General Architecture of the learning algorithm</h1>
<p>Before you get started please check out the <strong><a href="/assets/docs/2019-06-29-Logistic-Regression-with-a-Neural-Network-Mindset/deep-learning-notation.pdf">notation</a></strong> that is being used below.
The following image explains the architecture of logistic regression as a neural network:</p>
<p><img src="/assets/images/2019-06-29-Logistic-Regression-with-a-Neural-Network-Mindset/Logistic-Regression-as-a-Neural-Network.png" alt="Logistic Regression as a Neural Network" class="img-responsive" /></p>
<p><strong>Mathematical expression of the algorithm</strong></p>
<p>For one example \(x^{(i)}\):</p>
\[z^{(i)} = w^T x^{(i)} + b\]
\[\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})\]
\[\mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)} ) \log(1-a^{(i)})\]
<p>The cost is then computed by summing over all training examples:
\(J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})\)</p>
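As a quick numerical check of these formulas (the numbers here are purely illustrative, not from any dataset), the forward pass and loss for a single example can be computed directly in NumPy:

```python
import numpy as np

# Illustrative numbers only: one example with two features
w = np.array([[0.5], [-0.25]])   # weights, shape (2, 1)
b = 0.1                          # bias
x = np.array([[1.0], [2.0]])     # a single example, shape (2, 1)
y = 1                            # its true label

z = (np.dot(w.T, x) + b).item()                     # z = w^T x + b
a = 1 / (1 + np.exp(-z))                            # a = sigmoid(z)
loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))   # loss L(a, y)
print(z, a, loss)
```

Here the activation sits just above 0.5, so the loss penalizes the prediction for being far from the true label y = 1; averaging such losses over all m examples gives the cost J.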
<p><strong>Key steps</strong>:
We will carry out the following steps:
- Initialize the parameters of the model
- Learn the parameters for the model by minimizing the cost<br />
- Use the learned parameters to make predictions (on the test set)
- Analyse the results and conclude</p>
<h2 id="building-the-parts-of-our-algorithm">Building the parts of our algorithm</h2>
<p>The main steps for building a Neural Network are:</p>
<ol>
<li>Define the model structure (such as number of input features)</li>
<li>Initialize the model’s parameters</li>
<li>Loop:
<ul>
<li>Calculate current loss (forward propagation)</li>
<li>Calculate current gradient (backward propagation)</li>
<li>Update parameters (gradient descent)</li>
</ul>
</li>
</ol>
<p>You often build steps 1-3 separately and then integrate them into one function we call <code class="language-plaintext highlighter-rouge">model()</code>.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># import the libraries needed
</span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
</code></pre></div></div>
<h3 id="helper-functions"><strong>Helper functions</strong></h3>
<p>Implement the <code class="language-plaintext highlighter-rouge">sigmoid()</code> function to compute $sigmoid( w^T x + b)$ to make predictions.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">sigmoid</span><span class="p">(</span><span class="n">z</span><span class="p">):</span>
<span class="s">"""
Compute the sigmoid of z
Arguments:
z -- A scalar or numpy array of any size.
Return:
s -- sigmoid(z)
"""</span>
<span class="c1"># Calculate sigmoid
</span> <span class="n">s</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">z</span><span class="p">))</span>
<span class="k">return</span> <span class="n">s</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># function to test sigmoid
</span><span class="k">def</span> <span class="nf">test_sigmoid</span><span class="p">():</span>
<span class="k">print</span> <span class="p">(</span><span class="s">"sigmoid(0) = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">sigmoid</span><span class="p">(</span><span class="mi">0</span><span class="p">)))</span>
<span class="k">print</span> <span class="p">(</span><span class="s">"sigmoid(9.2) = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">sigmoid</span><span class="p">(</span><span class="mf">9.2</span><span class="p">)))</span>
<span class="n">test_sigmoid</span><span class="p">()</span>
</code></pre></div></div>
<p><strong>Output</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> sigmoid(0) = 0.5
sigmoid(9.2) = 0.999898970806
</code></pre></div></div>
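An aside that goes beyond the tutorial: for very negative `z`, `np.exp(-z)` overflows (NumPy emits a runtime warning and returns `inf`, though the final result still rounds to 0). A commonly used numerically stable variant evaluates the exponential only of `-|z|`, which is always in (0, 1]:

```python
import numpy as np

def stable_sigmoid(z):
    """Numerically stable sigmoid: exp is only ever taken of a non-positive number."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in (0, 1], never overflows
    # For z >= 0: 1 / (1 + e^{-z});  for z < 0: e^{z} / (1 + e^{z})
    return np.where(z >= 0, 1 / (1 + e), e / (1 + e))

# Extreme inputs map cleanly to 0 and 1 without overflow warnings
print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))
```

The simple version above is fine for the moderate `z` values in this tutorial; the stable form only matters once activations can become very large in magnitude.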
<h3 id="initializing-parameters"><strong>Initializing parameters</strong></h3>
<p>Implement parameter initialization to initialize w as a vector of zeros and b to zero.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">initialize_with_zeros</span><span class="p">(</span><span class="n">dim</span><span class="p">):</span>
<span class="s">"""
This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
Argument:
dim -- size of the w vector we want (or number of parameters in this case)
Returns:
w -- initialized vector of shape (dim, 1)
b -- initialized scalar (corresponds to the bias)
"""</span>
<span class="c1"># Initialize `w` and `b`
</span> <span class="n">w</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">dim</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">b</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">assert</span><span class="p">(</span><span class="n">w</span><span class="p">.</span><span class="n">shape</span> <span class="o">==</span> <span class="p">(</span><span class="n">dim</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="k">assert</span><span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="nb">float</span><span class="p">)</span> <span class="ow">or</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="nb">int</span><span class="p">))</span>
<span class="k">return</span> <span class="n">w</span><span class="p">,</span> <span class="n">b</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">test_initialize_with_zeros</span><span class="p">(</span><span class="n">dim</span><span class="p">):</span>
<span class="n">w</span><span class="p">,</span> <span class="n">b</span> <span class="o">=</span> <span class="n">initialize_with_zeros</span><span class="p">(</span><span class="n">dim</span><span class="p">)</span>
<span class="k">print</span> <span class="p">(</span><span class="s">"w = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">w</span><span class="p">))</span>
<span class="k">print</span> <span class="p">(</span><span class="s">"b = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">b</span><span class="p">))</span>
<span class="n">test_initialize_with_zeros</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<p><strong>Output</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> w = [[ 0.]
[ 0.]]
b = 0
</code></pre></div></div>
<h3 id="forward-and-backward-propagation"><strong>Forward and Backward propagation</strong></h3>
<p>Now that your parameters are initialized, you can do the “forward” and “backward” propagation steps for learning the parameters.
Forward Propagation:</p>
<ul>
<li>You get X</li>
<li>You compute \(A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m)})\)</li>
<li>You calculate the cost function: \(J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)})\right]\)</li>
</ul>
<p>For backward propagation, here are the two gradient formulas you will be using:</p>
\[\frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T\]
\[\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\]
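These closed-form gradients are easy to get wrong by a sign or a transpose, so it is worth verifying them numerically. Here is a minimal, self-contained sketch (not part of the tutorial's required code; the random data is made up) that compares the analytic `dw` against a centered finite-difference estimate:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_fn(w, b, X, Y):
    """Logistic regression cost J for the given parameters."""
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    return float(-np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m)

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))                    # 3 features, 5 examples
Y = (rng.random((1, 5)) > 0.5).astype(float)   # arbitrary 0/1 labels
w, b = rng.normal(size=(3, 1)), 0.2
m = X.shape[1]

# Analytic gradients from the formulas above
A = sigmoid(np.dot(w.T, X) + b)
dw = np.dot(X, (A - Y).T) / m
db = np.sum(A - Y) / m

# Centered finite-difference estimate of dJ/dw, one weight at a time
eps = 1e-6
dw_num = np.zeros_like(w)
for i in range(w.shape[0]):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[i, 0] += eps
    w_minus[i, 0] -= eps
    dw_num[i, 0] = (cost_fn(w_plus, b, X, Y) - cost_fn(w_minus, b, X, Y)) / (2 * eps)

print(np.max(np.abs(dw - dw_num)))  # agreement to many decimal places
```

The same check applies to `db` by perturbing `b` instead of an entry of `w`.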
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">propagate</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">):</span>
<span class="s">"""
Implement the cost function and its gradient for the propagation explained above
Arguments:
w -- weights, a numpy array
b -- bias, a scalar
X -- data of size (number of features, number of examples)
Y -- true "label" vector of size (1, number of examples)
Return:
cost -- negative log-likelihood cost for logistic regression
dw -- gradient of the loss with respect to w, thus same shape as w
db -- gradient of the loss with respect to b, thus same shape as b
"""</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="c1">### Forward Propagation
</span> <span class="c1"># compute activation
</span> <span class="n">A</span> <span class="o">=</span> <span class="n">sigmoid</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">w</span><span class="p">.</span><span class="n">T</span><span class="p">,</span> <span class="n">X</span><span class="p">)</span> <span class="o">+</span> <span class="n">b</span><span class="p">)</span>
<span class="c1"># compute cost
</span> <span class="n">cost</span> <span class="o">=</span> <span class="p">(</span><span class="o">-</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">m</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">Y</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">A</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">Y</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">A</span><span class="p">)))</span>
<span class="c1">### Backward Propagation
</span> <span class="n">dw</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o">/</span> <span class="n">m</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="p">(</span><span class="n">A</span> <span class="o">-</span> <span class="n">Y</span><span class="p">).</span><span class="n">T</span><span class="p">)</span>
<span class="n">db</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o">/</span> <span class="n">m</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">A</span> <span class="o">-</span> <span class="n">Y</span><span class="p">)</span>
<span class="k">assert</span><span class="p">(</span><span class="n">dw</span><span class="p">.</span><span class="n">shape</span> <span class="o">==</span> <span class="n">w</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">assert</span><span class="p">(</span><span class="n">db</span><span class="p">.</span><span class="n">dtype</span> <span class="o">==</span> <span class="nb">float</span><span class="p">)</span>
<span class="n">cost</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">squeeze</span><span class="p">(</span><span class="n">cost</span><span class="p">)</span>
<span class="k">assert</span><span class="p">(</span><span class="n">cost</span><span class="p">.</span><span class="n">shape</span> <span class="o">==</span> <span class="p">())</span>
<span class="n">grads</span> <span class="o">=</span> <span class="p">{</span><span class="s">"dw"</span><span class="p">:</span> <span class="n">dw</span><span class="p">,</span>
<span class="s">"db"</span><span class="p">:</span> <span class="n">db</span><span class="p">}</span>
<span class="k">return</span> <span class="n">grads</span><span class="p">,</span> <span class="n">cost</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">test_propagate</span><span class="p">():</span>
<span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">2</span><span class="p">]]),</span> <span class="mi">2</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">]]),</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]])</span>
<span class="n">grads</span><span class="p">,</span> <span class="n">cost</span> <span class="o">=</span> <span class="n">propagate</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">)</span>
<span class="k">print</span> <span class="p">(</span><span class="s">"dw = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">grads</span><span class="p">[</span><span class="s">"dw"</span><span class="p">]))</span>
<span class="k">print</span> <span class="p">(</span><span class="s">"db = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">grads</span><span class="p">[</span><span class="s">"db"</span><span class="p">]))</span>
<span class="k">print</span> <span class="p">(</span><span class="s">"cost = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">cost</span><span class="p">))</span>
<span class="n">test_propagate</span><span class="p">()</span>
</code></pre></div></div>
<p><strong>Output</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> dw = [[ 0.99993216]
[ 1.99980262]]
db = 0.499935230625
cost = 6.00006477319
</code></pre></div></div>
<h3 id="optimization"><strong>Optimization</strong></h3>
<ul>
<li>You have initialized your parameters.</li>
<li>You are also able to compute a cost function and its gradient.</li>
<li>Now, you want to update the parameters using gradient descent.</li>
</ul>
<p>Implement the optimization function. The goal is to learn $w$ and $b$ by minimizing the cost function $J$. For a parameter $\theta$, the update rule is $ \theta = \theta - \alpha \text{ } d\theta$, where $\alpha$ is the learning rate.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">optimize</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">num_iterations</span><span class="p">,</span> <span class="n">learning_rate</span><span class="p">,</span> <span class="n">print_cost</span> <span class="o">=</span> <span class="bp">False</span><span class="p">):</span>
<span class="s">"""
This function optimizes w and b by running a gradient descent algorithm
Arguments:
w -- weights, a numpy array of size (number of features, 1)
b -- bias, a scalar
X -- data of shape (number of features, number of examples)
Y -- true "label" vector (containing 0 or 1), of shape (1, number of examples)
num_iterations -- number of iterations of the optimization loop
learning_rate -- learning rate of the gradient descent update rule
print_cost -- True to print the loss every 100 steps
Returns:
params -- dictionary containing the weights w and bias b
grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
Tips:
You basically need to write down two steps and iterate through them:
1) Calculate the cost and the gradient for the current parameters. Use propagate().
2) Update the parameters using gradient descent rule for w and b.
"""</span>
<span class="n">costs</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_iterations</span><span class="p">):</span>
<span class="c1"># Cost and gradient calculation
</span> <span class="n">grads</span><span class="p">,</span> <span class="n">cost</span> <span class="o">=</span> <span class="n">propagate</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">)</span>
<span class="c1"># Retrieve derivatives from grads
</span> <span class="n">dw</span> <span class="o">=</span> <span class="n">grads</span><span class="p">[</span><span class="s">"dw"</span><span class="p">]</span>
<span class="n">db</span> <span class="o">=</span> <span class="n">grads</span><span class="p">[</span><span class="s">"db"</span><span class="p">]</span>
<span class="c1"># update weights and bias
</span> <span class="n">w</span> <span class="o">=</span> <span class="n">w</span> <span class="o">-</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="n">dw</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">b</span> <span class="o">-</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="n">db</span>
<span class="c1"># Record the costs
</span> <span class="k">if</span> <span class="n">i</span> <span class="o">%</span> <span class="mi">100</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">costs</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">cost</span><span class="p">)</span>
<span class="c1"># Print the cost every 100 iterations
</span> <span class="k">if</span> <span class="n">print_cost</span> <span class="ow">and</span> <span class="n">i</span> <span class="o">%</span> <span class="mi">100</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">print</span> <span class="p">(</span><span class="s">"Cost after iteration %i: %f"</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">cost</span><span class="p">))</span>
<span class="n">params</span> <span class="o">=</span> <span class="p">{</span><span class="s">"w"</span><span class="p">:</span> <span class="n">w</span><span class="p">,</span>
<span class="s">"b"</span><span class="p">:</span> <span class="n">b</span><span class="p">}</span>
<span class="n">grads</span> <span class="o">=</span> <span class="p">{</span><span class="s">"dw"</span><span class="p">:</span> <span class="n">dw</span><span class="p">,</span>
<span class="s">"db"</span><span class="p">:</span> <span class="n">db</span><span class="p">}</span>
<span class="k">return</span> <span class="n">params</span><span class="p">,</span> <span class="n">grads</span><span class="p">,</span> <span class="n">costs</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">test_optimize</span><span class="p">():</span>
<span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">2</span><span class="p">]]),</span> <span class="mi">2</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">]]),</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]])</span>
<span class="n">params</span><span class="p">,</span> <span class="n">grads</span><span class="p">,</span> <span class="n">costs</span> <span class="o">=</span> <span class="n">optimize</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">num_iterations</span><span class="o">=</span> <span class="mi">100</span><span class="p">,</span> <span class="n">learning_rate</span> <span class="o">=</span> <span class="mf">0.009</span><span class="p">,</span> <span class="n">print_cost</span> <span class="o">=</span> <span class="bp">False</span><span class="p">)</span>
<span class="k">print</span> <span class="p">(</span><span class="s">"w = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">params</span><span class="p">[</span><span class="s">"w"</span><span class="p">]))</span>
<span class="k">print</span> <span class="p">(</span><span class="s">"b = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">params</span><span class="p">[</span><span class="s">"b"</span><span class="p">]))</span>
<span class="k">print</span> <span class="p">(</span><span class="s">"dw = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">grads</span><span class="p">[</span><span class="s">"dw"</span><span class="p">]))</span>
<span class="k">print</span> <span class="p">(</span><span class="s">"db = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">grads</span><span class="p">[</span><span class="s">"db"</span><span class="p">]))</span>
<span class="n">test_optimize</span><span class="p">()</span>
</code></pre></div></div>
<p><strong>Output</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> w = [[ 0.1124579 ]
[ 0.23106775]]
b = 1.55930492484
dw = [[ 0.90158428]
[ 1.76250842]]
db = 0.430462071679
</code></pre></div></div>
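The <code>costs</code> list that <code>optimize()</code> returns is intended for plotting the learning curve. As a sanity check, here is a self-contained sketch of the same gradient-descent loop on toy data (the data and learning rate are invented for illustration); with a small enough learning rate, each recorded cost should be lower than the one before:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy, linearly separable data: 2 features, 4 examples
X = np.array([[1.0, 2.0, -1.0, -2.0],
              [0.5, 1.0, -0.5, -1.0]])
Y = np.array([[1.0, 1.0, 0.0, 0.0]])
w, b = np.zeros((2, 1)), 0.0
m = X.shape[1]

costs = []
for i in range(500):
    A = sigmoid(np.dot(w.T, X) + b)   # forward propagation
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m     # backward propagation
    db = np.sum(A - Y) / m
    w -= 0.1 * dw                     # gradient descent update
    b -= 0.1 * db
    if i % 100 == 0:
        costs.append(float(cost))

print(costs)  # starts at log(2) since w = 0, b = 0 predicts 0.5 everywhere
```

A steadily decreasing curve like this is what you want to see; if the cost oscillates or grows, the learning rate is too large.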
<h3 id="predict"><strong>Predict</strong></h3>
<p>The previous function outputs the learned w and b. We can now use w and b to predict the labels for a dataset X. Implement the <code class="language-plaintext highlighter-rouge">predict()</code> function. There are two steps to computing predictions:</p>
<ol>
<li>
<p>Calculate \(\hat{Y} = A = \sigma(w^T X + b)\)</p>
</li>
<li>
<p>Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5), and store the predictions in a vector <code class="language-plaintext highlighter-rouge">Y_prediction</code>.</p>
</li>
</ol>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
<span class="s">"""
Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
Arguments:
w -- weights, a numpy array of size (number of features, 1)
b -- bias, a scalar
X -- data of size (number of features, number of examples)
Returns:
Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
"""</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">Y_prediction</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="n">m</span><span class="p">))</span>
<span class="n">w</span> <span class="o">=</span> <span class="n">w</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1"># Compute vector "A" predicting the probabilities of class being "1"
</span> <span class="n">A</span> <span class="o">=</span> <span class="n">sigmoid</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">w</span><span class="p">.</span><span class="n">T</span><span class="p">,</span> <span class="n">X</span><span class="p">)</span> <span class="o">+</span> <span class="n">b</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">A</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]):</span>
<span class="c1"># Convert probabilities a[0,i] to actual predictions p[0,i]
</span> <span class="n">Y_prediction</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span> <span class="k">if</span> <span class="n">A</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="n">i</span><span class="p">]</span> <span class="o">></span> <span class="mf">0.5</span> <span class="k">else</span> <span class="mi">0</span>
<span class="k">assert</span><span class="p">(</span><span class="n">Y_prediction</span><span class="p">.</span><span class="n">shape</span> <span class="o">==</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">m</span><span class="p">))</span>
<span class="k">return</span> <span class="n">Y_prediction</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">test_predict</span><span class="p">():</span>
<span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">2</span><span class="p">]]),</span> <span class="mi">2</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">]])</span>
<span class="k">print</span><span class="p">(</span><span class="s">"predictions = "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">predict</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">X</span><span class="p">)))</span>
<span class="n">test_predict</span><span class="p">()</span>
</code></pre></div></div>
<p><strong>Output</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>predictions = [[ 1. 1.]]
</code></pre></div></div>
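A small design note: the per-element loop in <code>predict()</code> is easy to read, but NumPy lets you threshold the whole activation vector at once. A vectorized equivalent (illustrative; <code>predict_vectorized</code> is a name introduced here, not the tutorial's code) is:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_vectorized(w, b, X):
    """Same behavior as the loop version: 1 where sigmoid(w^T X + b) > 0.5, else 0."""
    A = sigmoid(np.dot(w.T, X) + b)
    return (A > 0.5).astype(float)

w, b, X = np.array([[1.0], [2.0]]), 2.0, np.array([[1.0, 2.0], [3.0, 4.0]])
print(predict_vectorized(w, b, X))  # [[1. 1.]] -- same as the loop version above
```

Vectorized thresholding avoids the Python-level loop, which matters once the number of examples grows large.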
<p><strong>What to remember:</strong>
You’ve implemented several functions that:</p>
<ul>
<li>Initialize (w,b)</li>
<li>Optimize the loss iteratively to learn parameters (w,b):
<ul>
<li>computing the cost and its gradient</li>
<li>updating the parameters using gradient descent</li>
</ul>
</li>
<li>Use the learned (w,b) to predict the labels for a given set of examples</li>
</ul>
<h3 id="merge-all-functions-into-a-model"><strong>Merge all functions into a model</strong></h3>
<p>You will now see how the overall model is structured by putting all the building blocks (the functions implemented in the previous parts) together, in the right order.</p>
<p>Implement the model function. Use the following notation:
- Y_prediction_test for your predictions on the test set
- Y_prediction_train for your predictions on the train set
- w, costs, grads for the outputs of optimize()</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">model</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">Y_train</span><span class="p">,</span> <span class="n">X_test</span><span class="p">,</span> <span class="n">Y_test</span><span class="p">,</span> <span class="n">num_iterations</span><span class="o">=</span><span class="mi">2000</span><span class="p">,</span> <span class="n">learning_rate</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span> <span class="n">print_cost</span><span class="o">=</span><span class="bp">False</span><span class="p">):</span>
<span class="s">"""
Builds the logistic regression model by calling the function you've implemented previously
Arguments:
X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
print_cost -- Set to true to print the cost every 100 iterations
Returns:
d -- dictionary containing information about the model.
"""</span>
<span class="c1"># initialize parameters with zeros
</span> <span class="n">w</span><span class="p">,</span> <span class="n">b</span> <span class="o">=</span> <span class="n">initialize_with_zeros</span><span class="p">(</span><span class="n">X_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="c1"># Gradient descent
</span> <span class="n">parameters</span><span class="p">,</span> <span class="n">grads</span><span class="p">,</span> <span class="n">costs</span> <span class="o">=</span> <span class="n">optimize</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">X_train</span><span class="p">,</span> <span class="n">Y_train</span><span class="p">,</span> <span class="n">num_iterations</span><span class="p">,</span> <span class="n">learning_rate</span><span class="p">,</span> <span class="n">print_cost</span><span class="p">)</span>
<span class="c1"># Retrieve parameters w and b from dictionary "parameters"
</span> <span class="n">w</span> <span class="o">=</span> <span class="n">parameters</span><span class="p">[</span><span class="s">"w"</span><span class="p">]</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">parameters</span><span class="p">[</span><span class="s">"b"</span><span class="p">]</span>
<span class="c1"># Predict test/train set examples
</span> <span class="n">Y_prediction_test</span> <span class="o">=</span> <span class="n">predict</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">X_test</span><span class="p">)</span>
<span class="n">Y_prediction_train</span> <span class="o">=</span> <span class="n">predict</span><span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">X_train</span><span class="p">)</span>
<span class="c1"># Print train/test Errors
</span> <span class="k">print</span><span class="p">(</span><span class="s">"train accuracy: {} %"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="mi">100</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">Y_prediction_train</span> <span class="o">-</span> <span class="n">Y_train</span><span class="p">))</span> <span class="o">*</span> <span class="mi">100</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">"test accuracy: {} %"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="mi">100</span> <span class="o">-</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">Y_prediction_test</span> <span class="o">-</span> <span class="n">Y_test</span><span class="p">))</span> <span class="o">*</span> <span class="mi">100</span><span class="p">))</span>
<span class="n">d</span> <span class="o">=</span> <span class="p">{</span><span class="s">"costs"</span><span class="p">:</span> <span class="n">costs</span><span class="p">,</span>
<span class="s">"Y_prediction_test"</span><span class="p">:</span> <span class="n">Y_prediction_test</span><span class="p">,</span>
<span class="s">"Y_prediction_train"</span> <span class="p">:</span> <span class="n">Y_prediction_train</span><span class="p">,</span>
<span class="s">"w"</span> <span class="p">:</span> <span class="n">w</span><span class="p">,</span>
<span class="s">"b"</span> <span class="p">:</span> <span class="n">b</span><span class="p">,</span>
<span class="s">"learning_rate"</span> <span class="p">:</span> <span class="n">learning_rate</span><span class="p">,</span>
<span class="s">"num_iterations"</span><span class="p">:</span> <span class="n">num_iterations</span><span class="p">}</span>
<span class="k">return</span> <span class="n">d</span>
</code></pre></div></div>
<p>We have successfully implemented logistic regression as a neural network. Now, let us use it for prediction.</p>
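<p>The helper functions that <code class="language-plaintext highlighter-rouge">model</code> relies on were implemented in the previous parts. As a self-contained reference, here is a minimal sketch consistent with how <code class="language-plaintext highlighter-rouge">model</code> calls them; the original post's implementations may differ in detail:</p>

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, maps any real number into (0, 1)
    return 1 / (1 + np.exp(-z))

def initialize_with_zeros(dim):
    # w: (dim, 1) column vector of zeros, b: scalar zero
    return np.zeros((dim, 1)), 0.0

def propagate(w, b, X, Y):
    # One forward/backward pass: cross-entropy cost and its gradients
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    grads = {"dw": np.dot(X, (A - Y).T) / m, "db": np.sum(A - Y) / m}
    return grads, cost

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    costs = []
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print("Cost after iteration %i: %f" % (i, cost))
    return {"w": w, "b": b}, grads, costs

def predict(w, b, X):
    # Threshold the predicted probabilities at 0.5
    A = sigmoid(np.dot(w.T, X) + b)
    return (A > 0.5).astype(np.float64)
```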
<h2 id="case-study---breast-cancer-wisconsin-data-set">Case Study - Breast Cancer Wisconsin Data Set</h2>
<p><a href="http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29">Breast Cancer Wisconsin (Diagnostic) Data Set</a>’s features can be used to predict the type of tumor, malignant or benign. You can check the data set’s description <a href="http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.names">here</a> and download it from <a href="http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data">here</a>. Its attributes are as follows:</p>
<h4 id="attribute-information"><strong>Attribute Information</strong></h4>
<p>1) ID number</p>
<p>2) Diagnosis (M = malignant, B = benign)</p>
<p>3-32) Other attributes</p>
<p>Now, let us load this data set.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data"</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
</code></pre></div></div>
<p><strong>Output</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(569, 32)
</code></pre></div></div>
<p>So we have 569 rows and 32 columns.</p>
<p>Let us check what the data looks like.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Explore a bit
</span><span class="n">data</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>
<p><strong>Output</strong></p>
<div>
<style>
.dataframe thead tr:only-child th {
text-align: right;
}
.dataframe thead th {
text-align: left;
}
.dataframe tbody tr th {
vertical-align: top;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
<th>...</th>
<th>22</th>
<th>23</th>
<th>24</th>
<th>25</th>
<th>26</th>
<th>27</th>
<th>28</th>
<th>29</th>
<th>30</th>
<th>31</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>842302</td>
<td>M</td>
<td>17.99</td>
<td>10.38</td>
<td>122.80</td>
<td>1001.0</td>
<td>0.11840</td>
<td>0.27760</td>
<td>0.3001</td>
<td>0.14710</td>
<td>...</td>
<td>25.38</td>
<td>17.33</td>
<td>184.60</td>
<td>2019.0</td>
<td>0.1622</td>
<td>0.6656</td>
<td>0.7119</td>
<td>0.2654</td>
<td>0.4601</td>
<td>0.11890</td>
</tr>
<tr>
<th>1</th>
<td>842517</td>
<td>M</td>
<td>20.57</td>
<td>17.77</td>
<td>132.90</td>
<td>1326.0</td>
<td>0.08474</td>
<td>0.07864</td>
<td>0.0869</td>
<td>0.07017</td>
<td>...</td>
<td>24.99</td>
<td>23.41</td>
<td>158.80</td>
<td>1956.0</td>
<td>0.1238</td>
<td>0.1866</td>
<td>0.2416</td>
<td>0.1860</td>
<td>0.2750</td>
<td>0.08902</td>
</tr>
<tr>
<th>2</th>
<td>84300903</td>
<td>M</td>
<td>19.69</td>
<td>21.25</td>
<td>130.00</td>
<td>1203.0</td>
<td>0.10960</td>
<td>0.15990</td>
<td>0.1974</td>
<td>0.12790</td>
<td>...</td>
<td>23.57</td>
<td>25.53</td>
<td>152.50</td>
<td>1709.0</td>
<td>0.1444</td>
<td>0.4245</td>
<td>0.4504</td>
<td>0.2430</td>
<td>0.3613</td>
<td>0.08758</td>
</tr>
<tr>
<th>3</th>
<td>84348301</td>
<td>M</td>
<td>11.42</td>
<td>20.38</td>
<td>77.58</td>
<td>386.1</td>
<td>0.14250</td>
<td>0.28390</td>
<td>0.2414</td>
<td>0.10520</td>
<td>...</td>
<td>14.91</td>
<td>26.50</td>
<td>98.87</td>
<td>567.7</td>
<td>0.2098</td>
<td>0.8663</td>
<td>0.6869</td>
<td>0.2575</td>
<td>0.6638</td>
<td>0.17300</td>
</tr>
<tr>
<th>4</th>
<td>84358402</td>
<td>M</td>
<td>20.29</td>
<td>14.34</td>
<td>135.10</td>
<td>1297.0</td>
<td>0.10030</td>
<td>0.13280</td>
<td>0.1980</td>
<td>0.10430</td>
<td>...</td>
<td>22.54</td>
<td>16.67</td>
<td>152.20</td>
<td>1575.0</td>
<td>0.1374</td>
<td>0.2050</td>
<td>0.4000</td>
<td>0.1625</td>
<td>0.2364</td>
<td>0.07678</td>
</tr>
</tbody>
</table>
<p>5 rows × 32 columns</p>
</div>
<p>In this set, the 0<sup>th</sup> column is <code class="language-plaintext highlighter-rouge">id</code>, 1<sup>st</sup> is the class, <code class="language-plaintext highlighter-rouge">B</code> for benign and <code class="language-plaintext highlighter-rouge">M</code> for malignant, and further columns are real-valued input features. Let us gather the input features and corresponding output classes from data.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Extract input features from column 2 to 32
</span><span class="n">X</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">2</span><span class="p">:]</span>
<span class="c1"># Extract diagnosis class column 1
</span><span class="n">Y</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Changing M to 1 and B to 0, for malignant and benign respectively
</span><span class="n">Y</span> <span class="o">=</span> <span class="p">(</span><span class="n">Y</span> <span class="o">==</span> <span class="s">'M'</span><span class="p">).</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">float64</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Changing DataFrame to numpy arrays
</span><span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">values</span>
<span class="n">Y</span> <span class="o">=</span> <span class="n">Y</span><span class="p">.</span><span class="n">values</span>
<span class="c1"># Changing shape of Y
</span><span class="n">Y</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">resize</span><span class="p">(</span><span class="n">Y</span><span class="p">,</span> <span class="p">(</span><span class="n">Y</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">1</span><span class="p">))</span>
<span class="c1"># Normalize the data features
</span><span class="kn">from</span> <span class="nn">sklearn.preprocessing</span> <span class="kn">import</span> <span class="n">normalize</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
</code></pre></div></div>
<p>While working with neural networks and optimization algorithms, always normalize your data features. This is done for the following two reasons:</p>
<ul>
<li>It prevents the values involved in the network from becoming too large or too small, which reduces the chances of overflow or underflow</li>
<li>It helps the optimizer to converge (or approach convergence) faster.</li>
</ul>
<p>To see what happens when you don’t normalize your data, try commenting out those lines and running the code (the cost will be <code class="language-plaintext highlighter-rouge">nan</code> for nearly all iterations).</p>
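<p><code class="language-plaintext highlighter-rouge">normalize(X, axis=0)</code> from scikit-learn scales each column (feature) to unit L2 norm. As a sketch, the same operation in plain NumPy:</p>

```python
import numpy as np

def normalize_columns(X):
    # Scale each feature (column) to unit L2 norm,
    # mirroring sklearn.preprocessing.normalize(X, axis=0)
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero columns untouched
    return X / norms

X_demo = np.array([[3.0, 10.0],
                   [4.0,  0.0]])
X_scaled = normalize_columns(X_demo)  # first column becomes [0.6, 0.8]
```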
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Prepare training and test data
# Put 70% of the examples in the training set and the remaining 30% in the test set
</span><span class="n">X_train</span> <span class="o">=</span> <span class="n">X</span><span class="p">[:</span><span class="nb">int</span><span class="p">(</span><span class="mf">0.7</span><span class="o">*</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="p">:].</span><span class="n">T</span>
<span class="n">Y_train</span> <span class="o">=</span> <span class="n">Y</span><span class="p">[:</span><span class="nb">int</span><span class="p">(</span><span class="mf">0.7</span><span class="o">*</span><span class="n">Y</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="p">:].</span><span class="n">T</span>
<span class="n">X_test</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="mf">0.7</span><span class="o">*</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]):,</span> <span class="p">:].</span><span class="n">T</span>
<span class="n">Y_test</span> <span class="o">=</span> <span class="n">Y</span><span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="mf">0.7</span><span class="o">*</span><span class="n">Y</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]):,</span> <span class="p">:].</span><span class="n">T</span>
</code></pre></div></div>
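<p>Note that the split above simply takes the first 70% of the rows. If the file happens to be ordered by class, both sets can end up biased. A sketch of a shuffled split (the function name here is illustrative, not from the original post):</p>

```python
import numpy as np

def shuffled_split(X, Y, train_frac=0.7, seed=0):
    # Shuffle the rows before splitting so any ordering in the file
    # cannot bias the training or test set
    rng = np.random.RandomState(seed)
    idx = rng.permutation(X.shape[0])
    cut = int(train_frac * X.shape[0])
    train, test = idx[:cut], idx[cut:]
    # Transpose so each column is one example, matching the model's convention
    return X[train].T, Y[train].T, X[test].T, Y[test].T
```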
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Let's run the model now!!!
</span><span class="n">model</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">Y_train</span><span class="p">,</span> <span class="n">X_test</span><span class="p">,</span> <span class="n">Y_test</span><span class="p">,</span> <span class="n">print_cost</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>
<p><strong>Output</strong></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Cost after iteration 0: 0.693147
Cost after iteration 100: 0.647681
Cost after iteration 200: 0.614502
Cost after iteration 300: 0.585481
Cost after iteration 400: 0.559981
Cost after iteration 500: 0.537452
Cost after iteration 600: 0.517429
Cost after iteration 700: 0.499527
Cost after iteration 800: 0.483430
Cost after iteration 900: 0.468875
Cost after iteration 1000: 0.455647
Cost after iteration 1100: 0.443567
Cost after iteration 1200: 0.432484
Cost after iteration 1300: 0.422276
Cost after iteration 1400: 0.412835
Cost after iteration 1500: 0.404075
Cost after iteration 1600: 0.395917
Cost after iteration 1700: 0.388299
Cost after iteration 1800: 0.381163
Cost after iteration 1900: 0.374461
train accuracy: 90.20100502512562 %
test accuracy: 95.32163742690058 %
{'Y_prediction_test': array([[ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0.,
0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 0., 0.,
0., 0., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0., 0.,
0., 0., 0., 0., 1., 0., 0., 1., 0., 1., 0., 0., 1.,
0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0.,
0., 0., 1., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0.,
0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 1.,
0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 1., 0., 0.,
0., 1., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 1., 0., 1., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1.,
1., 0.]]),
'Y_prediction_train': array([[ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 1.,
0., 1., 1., 0., 1., 1., 0., 0., 0., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0.,
0., 0., 0., 1., 0., 0., 1., 0., 1., 0., 0., 0., 0.,
0., 1., 0., 0., 1., 1., 0., 0., 0., 0., 1., 0., 1.,
1., 0., 0., 1., 0., 1., 0., 1., 0., 0., 1., 0., 1.,
1., 0., 0., 0., 1., 1., 0., 1., 0., 1., 0., 0., 0.,
0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 1., 0., 0., 1., 0., 0., 0., 1., 0., 0., 0., 0.,
1., 1., 0., 0., 1., 1., 0., 0., 0., 0., 1., 0., 1.,
0., 1., 1., 0., 1., 0., 0., 0., 1., 0., 0., 1., 0.,
0., 0., 0., 1., 0., 0., 0., 0., 0., 1., 0., 0., 0.,
1., 0., 0., 0., 0., 1., 1., 0., 1., 0., 0., 1., 1.,
0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 1., 1.,
0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 1.,
0., 1., 1., 1., 1., 0., 1., 1., 1., 0., 0., 0., 0.,
0., 0., 1., 0., 1., 1., 1., 0., 0., 0., 1., 1., 0.,
0., 0., 1., 0., 0., 0., 0., 0., 1., 1., 0., 0., 1.,
0., 0., 1., 1., 0., 1., 0., 0., 0., 0., 1., 0., 0.,
1., 0., 0., 1., 0., 1., 1., 1., 0., 1., 1., 1., 1.,
1., 0., 1., 0., 1., 1., 0., 0., 0., 0., 0., 0., 1.,
0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 1., 0., 0.,
0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 1., 0., 0., 0., 1., 0., 1., 0.,
0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 1., 0., 1.,
0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 1., 1., 0., 1., 1., 1., 0., 1., 1., 0., 0., 1.,
0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
0., 0., 1., 1., 0., 0., 0., 0.]]),
'b': -2.82004015631605,
'costs': [0.69314718055994518,
0.64768121608659124,
0.61450221434667129,
0.58548108050436609,
0.55998108981562744,
0.53745153619366792,
0.51742861842701759,
0.49952734366191881,
0.4834301607574058,
0.46887545101664851,
0.45564724081802283,
0.44356654183651678,
0.43248428219793783,
0.42227562488511811,
0.41283543232151004,
0.40407465235038642,
0.395917434789408,
0.38829882374096664,
0.3811629029392809,
0.37446129799735056],
'learning_rate': 0.5,
'num_iterations': 2000,
'w': array([[ 2.2402501 ],
[ 1.46136967],
[ 2.39127058],
[ 4.24369791],
[ 0.58075704],
[ 3.17224642],
[ 5.31606193],
[ 6.06826111],
[ 0.47147587],
[-0.25072496],
[ 3.70336513],
[-0.19500842],
[ 3.66594003],
[ 4.71289032],
[-0.68066637],
[ 1.12038991],
[ 0.69818476],
[ 1.72171634],
[-0.62144059],
[-0.66633135],
[ 2.92714447],
[ 1.82405968],
[ 3.06332473],
[ 5.21202012],
[ 1.06115042],
[ 4.13504157],
[ 4.99853943],
[ 5.33078157],
[ 1.25268158],
[ 0.8297114 ]])}
</code></pre></div></div>
<p>We get,</p>
<p>train accuracy: 90.20 %</p>
<p>test accuracy: 95.32 %</p>
<p>Not bad for such a simple algorithm.</p>
<p>Well done for completing the tutorial. Hope you liked it.</p>
<p>The following is the complete code of the application of the above model.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">from</span> <span class="nn">sklearn.preprocessing</span> <span class="kn">import</span> <span class="n">normalize</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">"http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data"</span><span class="p">,</span> <span class="n">header</span><span class="o">=</span><span class="bp">None</span><span class="p">)</span>
<span class="c1"># Extract input features from column 2 to 32
</span><span class="n">X</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">2</span><span class="p">:]</span>
<span class="c1"># Extract diagnosis class column 1
</span><span class="n">Y</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">iloc</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span>
<span class="c1"># Changing M to 1 and B to 0, for malignant and benign respectively
</span><span class="n">Y</span> <span class="o">=</span> <span class="p">(</span><span class="n">Y</span> <span class="o">==</span> <span class="s">'M'</span><span class="p">).</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">float64</span><span class="p">)</span>
<span class="c1"># Changing DataFrame to numpy arrays
</span><span class="n">X</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">values</span>
<span class="n">Y</span> <span class="o">=</span> <span class="n">Y</span><span class="p">.</span><span class="n">values</span>
<span class="c1"># Changing shape of Y
</span><span class="n">Y</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">resize</span><span class="p">(</span><span class="n">Y</span><span class="p">,</span> <span class="p">(</span><span class="n">Y</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">1</span><span class="p">))</span>
<span class="c1"># Normalize the data features
</span><span class="n">X</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="c1"># More preprocessing, preparing data to be fed to our model
</span><span class="n">X_train</span> <span class="o">=</span> <span class="n">X</span><span class="p">[:</span><span class="nb">int</span><span class="p">(</span><span class="mf">0.7</span><span class="o">*</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="p">:].</span><span class="n">T</span>
<span class="n">Y_train</span> <span class="o">=</span> <span class="n">Y</span><span class="p">[:</span><span class="nb">int</span><span class="p">(</span><span class="mf">0.7</span><span class="o">*</span><span class="n">Y</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="p">:].</span><span class="n">T</span>
<span class="n">X_test</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="mf">0.7</span><span class="o">*</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]):,</span> <span class="p">:].</span><span class="n">T</span>
<span class="n">Y_test</span> <span class="o">=</span> <span class="n">Y</span><span class="p">[</span><span class="nb">int</span><span class="p">(</span><span class="mf">0.7</span><span class="o">*</span><span class="n">Y</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]):,</span> <span class="p">:].</span><span class="n">T</span>
<span class="c1"># Let's run the model now!!!
</span><span class="n">model</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">Y_train</span><span class="p">,</span> <span class="n">X_test</span><span class="p">,</span> <span class="n">Y_test</span><span class="p">,</span> <span class="n">print_cost</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>
<p>Things left for another day:</p>
<ul>
<li>Analysis of logistic regression.</li>
<li>How to choose appropriate threshold for the logistic regression.</li>
<li>Prevent overfitting and underfitting (Regularization, one of the methods).</li>
<li>Why logistic regression performs poorly on data whose classes cannot be linearly separated.</li>
<li>How to tweak logistic regression to support multiclass classification.</li>
<li>Why, even after getting 95% test accuracy, we need to experiment more before trusting the results (cross-validation).</li>
</ul>
<p>Vaibhav Sharma</p>
<h1 id="linear-regression-from-scratch">Linear Regression From Scratch</h1>
<p>2023-01-04 · <a href="https://vbvsharma.com/misc/2023/01/04/Linear-Regression-From-Scratch">https://vbvsharma.com/misc/2023/01/04/Linear-Regression-From-Scratch</a></p>
<p>Linear regression is one of the most basic algorithms in machine learning and statistics, and it is also one of the best understood algorithms out there. Here, we are going to study multivariate linear regression, which is just a fancy name for linear regression with multiple independent variables. We do this for two reasons:</p>
<ul>
<li>Multivariate Linear Regression is more general than Univariate Linear Regression.</li>
<li>Practically you will find yourself using Multivariate Linear Regression more often than Univariate Linear Regression, as it involves learning from multiple features.</li>
</ul>
<p>Here you will discover:</p>
<ul>
<li>What is linear regression?</li>
<li>How is it implemented?</li>
<li>How to estimate linear regression coefficients using gradient descent?</li>
<li>Case study - Inferring Price of House</li>
</ul>
<p>Let us get started.</p>
<h2 id="introduction-to-linear-regression">Introduction to Linear Regression</h2>
<p>Linear regression is defined as a linear relationship between a dependent variable (say \(y\)) and one or more independent variables (say \(x_{1},x_{2}, ..., x_{n}\)). This can be written mathematically as follows:</p>
\[y = \theta_{0} + \theta_1 x_1 + \theta_2 x_2 + ... + \theta_{n} x_{n}\]
<p>In the above equation, the value of \(y\) is being predicted, \(x_{1},x_{2}, ..., x_{n}\) are input features that are being used to predict \(y\), and \(\theta_{0}, \theta_{1}, ..., \theta_{n}\) are called linear regression coefficients or linear regression parameters. These coefficients can be found using gradient descent. (I call them parameters throughout this post.)</p>
<p>The above linear regression model can also be written in vectorized form:</p>
\[\boldsymbol{y} =
\left(\begin{array}{cc}
\theta_{0}\\
\theta_{1}\\
.\\
\theta_{n-1}\\
\theta_{n}\end{array}\right)
\boldsymbol{.}
\left(\begin{array}{cc}
1\\
x_1\\
.\\
x_{n-1}\\
x_{n}\end{array}\right)\]
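<p>As a quick sketch, the vectorized hypothesis is just a dot product once a leading 1 is prepended to the feature vector:</p>

```python
import numpy as np

theta = np.array([2.0, 3.0, 0.5])   # theta_0 (intercept), theta_1, theta_2
x = np.array([4.0, 10.0])           # raw features x_1, x_2
x_aug = np.concatenate(([1.0], x))  # prepend 1 so theta_0 multiplies it
y = x_aug.dot(theta)                # 2 + 3*4 + 0.5*10 = 19.0
```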
<h2 id="gradient-descent">Gradient Descent</h2>
<p>Gradient descent is an optimizing algorithm. It is used to choose the parameters which minimizes error on the dataset.</p>
<p>We start by initializing the parameters with random weights and perform gradient descent for some iterations. In each iteration we update the parameters so that the error decreases. The size of each step during descent is controlled by the learning rate, denoted by \(\alpha\). The final parameters are hopefully the best-fit estimate. If they are not, we may have to tune the algorithm, but that’s a different story.</p>
<p>Let us now formalize our gradient descent algorithm. Suppose we have <strong>m</strong> training examples, each training example has <strong>n</strong> features, and the \(i^{th}\) training example is denoted by \((x^{(i)}, y^{(i)})\). We have,</p>
<p><strong>Hypothesis:</strong> \(h_\theta(x) = \theta_{0} + \theta_1 x_1 + \theta_2 x_2 + ... + \theta_{n} x_{n}\)</p>
<p><strong>Parameters:</strong> \(\theta_{0}, \theta_{1}, ..., \theta_{n}\)</p>
<p><strong>Cost function:</strong> \(J(\theta_{0}, \theta_{1}, ..., \theta_{n}) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)})-y^{(i)})^2\)</p>
<p>Our goal is to reduce the cost of the model at each iteration using gradient descent. The gradient descent algorithm is stated below:</p>
<p><strong>Gradient Descent:</strong></p>
<p>Repeat {</p>
\[\theta_j := \theta_j - \alpha \frac{\partial}{\partial\theta_j}J(\theta_{0}, \theta_{1}, ..., \theta_{n})\]
<p>} (simultaneously update for every \(j = 0, ..., n\))</p>
<p>Once we understand the above equations, we can vectorize them.</p>
<p><strong>Vectorized Hypothesis:</strong> \(h_\theta = x^T\theta\), where \(x\) is an \((n+1)\)-dimensional vector, obtained by prepending “1” to the feature vector.</p>
<p><strong>Vectorized Parameters:</strong> \(\theta\), an \((n+1)\)-dimensional vector.</p>
<p><strong>Vectorized Cost function:</strong> \(J(\theta) = \frac{1}{2m} (X\theta - y)^T(X\theta - y)\), here \(X\) is a matrix with each row as an input feature. Therefore, its dimension is \(m \times (n+1)\).</p>
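<p>The vectorized cost translates directly into NumPy. A sketch, assuming \(X\) already includes the leading column of ones:</p>

```python
import numpy as np

def compute_cost(theta, X, y):
    # J(theta) = (1 / 2m) * (X theta - y)^T (X theta - y)
    m = X.shape[0]
    r = X.dot(theta) - y
    return float(r.T.dot(r)) / (2 * m)

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])          # ones column plus one feature
y = np.array([[2.0], [3.0], [4.0]])
theta = np.array([[1.0], [1.0]])    # y = 1 + x fits this data exactly
# compute_cost(theta, X, y) is 0 for the perfect fit
```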
<p><strong>Vectorized Gradient Descent:</strong></p>
<p>Repeat {</p>
\[\theta := \theta - \alpha \frac{\partial}{\partial\theta}J(\theta)\]
<p>}</p>
<p>Here,</p>
\[\frac{\partial}{\partial\theta}J(\theta) = \frac{1}{m}X^T(X\theta - y)\]
<p>Hence, finally we get</p>
<p>Repeat {</p>
\[\theta := \theta - \frac{\alpha}{m} X^T(X\theta - y)\]
<p>}</p>
<p>We can write this in Python code as follows:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">it</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_iters</span><span class="p">):</span>
<span class="n">hypothesis</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">theta</span><span class="p">)</span>
<span class="n">loss</span> <span class="o">=</span> <span class="n">hypothesis</span> <span class="o">-</span> <span class="n">y</span>
<span class="n">theta</span> <span class="o">=</span> <span class="n">theta</span> <span class="o">-</span> <span class="p">(</span><span class="n">alpha</span> <span class="o">/</span> <span class="n">m</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">T</span><span class="p">,</span> <span class="n">loss</span><span class="p">)</span>
</code></pre></div></div>
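<p>The loop above assumes <code class="language-plaintext highlighter-rouge">X</code> (with the ones column), <code class="language-plaintext highlighter-rouge">y</code>, <code class="language-plaintext highlighter-rouge">theta</code>, <code class="language-plaintext highlighter-rouge">alpha</code>, <code class="language-plaintext highlighter-rouge">m</code> and <code class="language-plaintext highlighter-rouge">num_iters</code> are already defined. A self-contained run on toy data generated by \(y = 1 + 2x\):</p>

```python
import numpy as np

# Toy data, with the ones column already included in X
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[1.0], [3.0], [5.0], [7.0]])
m = X.shape[0]
theta = np.zeros((2, 1))
alpha, num_iters = 0.1, 2000

for it in range(num_iters):
    hypothesis = np.dot(X, theta)
    loss = hypothesis - y
    theta = theta - (alpha / m) * np.dot(X.T, loss)
# theta converges close to [[1.], [2.]]
```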
<h2 id="normal-equations-for-linear-regression">Normal Equations for Linear Regression</h2>
<p>There is also a closed-form solution to linear regression. But this method should only be used for small datasets, as it gets very expensive for large datasets. We can find the parameters as:</p>
\[\theta = (X^TX)^{-1}X^Ty\]
<p>Using this formula does not require any feature scaling, and you will get an exact solution in one calculation: there is no “loop until convergence” as in gradient descent.</p>
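<p>A sketch of the closed-form solution in NumPy, using the pseudo-inverse for numerical safety when \(X^TX\) is singular or badly conditioned:</p>

```python
import numpy as np

def normal_equation(X, y):
    # theta = (X^T X)^{-1} X^T y
    return np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])           # ones column plus one feature
y = np.array([[2.0], [3.0], [4.0]])  # generated by y = 1 + x
theta = normal_equation(X, y)        # recovers intercept 1 and slope 1
```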
<h2 id="case-study---inferring-price-of-house">Case Study - Inferring Price of House</h2>
<p>Enough of the theory, let us get hands on now! We will be inferring price of houses, by using number of bedrooms and size of house as features.</p>
<p>You can download the code and data from <a href="https://github.com/vbvsharma/Linear-Regression-From-Scratch">here</a>.</p>
<h3 id="load-data">Load Data</h3>
<p>We can load data using <code class="language-plaintext highlighter-rouge">genfromtxt</code> from numpy. We also have to extract features (X) and actual prices (y) from the read data.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Loading data ...</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span>
<span class="c1"># Load data
</span><span class="n">data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">genfromtxt</span><span class="p">(</span><span class="s">'data.txt'</span><span class="p">,</span> <span class="n">delimiter</span><span class="o">=</span><span class="s">','</span><span class="p">)</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">-</span><span class="mi">1</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">data</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="n">reshape</span><span class="p">((</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">data</span><span class="p">[:,</span> <span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="n">reshape</span><span class="p">((</span><span class="n">m</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</code></pre></div></div>
<h3 id="pre-process-data">Pre-process Data</h3>
<p>We have to normalize the data so that the gradients don’t explode: house sizes are in the thousands while bedroom counts are single digits, and such differently scaled features make gradient descent converge slowly or diverge. We only normalize the features (X) and not the actual prices (y).</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">featureNormalize</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="n">mu</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">sigma</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">X_norm</span> <span class="o">=</span> <span class="p">(</span><span class="n">X</span> <span class="o">-</span> <span class="n">mu</span><span class="p">)</span> <span class="o">/</span> <span class="n">sigma</span>
<span class="k">return</span> <span class="n">X_norm</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span>
</code></pre></div></div>
<p>We also have to add a column of ones to X, so that the model can learn an intercept (bias) term.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Add intercept term to X
</span><span class="n">ones_col</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">ones</span><span class="p">((</span><span class="n">m</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">((</span><span class="n">ones_col</span><span class="p">,</span> <span class="n">X</span><span class="p">))</span>
</code></pre></div></div>
<h3 id="gradient-descent-1">Gradient Descent</h3>
<p>Now we will perform gradient descent for some iterations, as discussed in the introductory theory.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">gradientDescent</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">theta</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.01</span><span class="p">,</span> <span class="n">num_iters</span><span class="o">=</span><span class="mi">100</span><span class="p">):</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="k">if</span> <span class="n">theta</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">theta</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">n</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">J_history</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">num_iters</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="k">for</span> <span class="n">it</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_iters</span><span class="p">):</span>
<span class="n">hypothesis</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">theta</span><span class="p">)</span>
<span class="n">loss</span> <span class="o">=</span> <span class="n">hypothesis</span> <span class="o">-</span> <span class="n">y</span>
<span class="n">theta</span> <span class="o">=</span> <span class="n">theta</span> <span class="o">-</span> <span class="p">(</span><span class="n">alpha</span> <span class="o">/</span> <span class="n">m</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">T</span><span class="p">,</span> <span class="n">loss</span><span class="p">)</span>
<span class="n">J_history</span><span class="p">[</span><span class="n">it</span><span class="p">]</span> <span class="o">=</span> <span class="n">computeCost</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">theta</span><span class="p">)</span>
<span class="k">return</span> <span class="n">theta</span><span class="p">,</span> <span class="n">J_history</span>
</code></pre></div></div>
<h3 id="plotting-the-convergence-graph">Plotting the Convergence Graph</h3>
<p>Let us see how the cost of the model changes as we iterate.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">matplotlib</span> <span class="kn">import</span> <span class="n">pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="c1"># Choose some alpha value
</span><span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.01</span>
<span class="n">num_iters</span> <span class="o">=</span> <span class="mi">100</span>
<span class="c1"># Init theta and run gradient descent
</span><span class="n">theta</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">n</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">theta</span><span class="p">,</span> <span class="n">J_history</span> <span class="o">=</span> <span class="n">gradientDescent</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">theta</span><span class="p">,</span> <span class="n">alpha</span><span class="p">,</span> <span class="n">num_iters</span><span class="p">)</span>
<span class="c1"># Plot the convergence graph
</span><span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">J_history</span><span class="p">)),</span> <span class="n">J_history</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'Number of iterations'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'Cost J'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">savefig</span><span class="p">(</span><span class="s">'Cost at each iteration.png'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="c1"># Display gradient descent's result
</span><span class="k">print</span><span class="p">(</span><span class="s">'Theta computed from gradient descent:'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">theta</span><span class="p">)</span>
<span class="k">print</span><span class="p">()</span>
</code></pre></div></div>
<p>We get the graph below.</p>
<p><img src="/assets/images/2019-06-01-Linear-Regression-From-Scratch/Cost-at-each-iteration.png" alt="Convergence graph" class="img-responsive" /></p>
<h3 id="estimating-price-of-a-house">Estimating price of a house</h3>
<p>Let us now use our model to predict the price of a 1000 sq-ft, 3-bedroom house.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Estimate the price of a 1000 sq-ft, 3 br house
</span><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1000</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="n">x_norm</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">mu</span><span class="p">)</span> <span class="o">/</span> <span class="n">sigma</span>
<span class="n">x_norm</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="n">x_norm</span><span class="p">)).</span><span class="n">reshape</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="n">n</span><span class="o">+</span><span class="mi">1</span><span class="p">))</span>
<span class="n">price</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">x_norm</span><span class="p">,</span> <span class="n">theta</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Predicted price of a 1000 sq-ft, 3 br house (using gradient descent):'</span><span class="p">,</span> <span class="n">price</span><span class="p">,</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span>
</code></pre></div></div>
<h3 id="using-normal-equations-to-find-parameters">Using Normal Equations to Find Parameters</h3>
<p>We can find the parameters analytically for small datasets. The code follows:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Load data
</span><span class="n">data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">genfromtxt</span><span class="p">(</span><span class="s">'data.txt'</span><span class="p">,</span> <span class="n">delimiter</span><span class="o">=</span><span class="s">','</span><span class="p">)</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">data</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">data</span><span class="p">[:,</span> <span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="n">reshape</span><span class="p">((</span><span class="n">m</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="c1"># Add intercept term to X
</span><span class="n">ones_col</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">ones</span><span class="p">((</span><span class="n">m</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">((</span><span class="n">ones_col</span><span class="p">,</span> <span class="n">X</span><span class="p">))</span>
<span class="n">theta</span> <span class="o">=</span> <span class="n">normalEqn</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="c1"># Display normal equation's result
</span><span class="k">print</span><span class="p">(</span><span class="s">'Theta computed from normal equations:'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">theta</span><span class="p">)</span>
<span class="k">print</span><span class="p">()</span>
<span class="c1"># Estimate the price of a 1650 sq-ft, 3 br house
</span><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1650</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="n">price</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">theta</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Predicted price of a 1650 sq-ft, 3 br house (using normal equations):'</span><span class="p">,</span> <span class="n">price</span><span class="p">,</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span>
</code></pre></div></div>
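<p>One caveat worth noting: explicitly inverting <code class="language-plaintext highlighter-rouge">X.T @ X</code> can be numerically unstable when that matrix is ill-conditioned. An alternative (not used in this post’s code, shown here with made-up numbers) is <code class="language-plaintext highlighter-rouge">np.linalg.lstsq</code>, which solves the least-squares problem directly:</p>

```python
import numpy as np

# Made-up design matrix: intercept column, size in sq-ft, bedrooms
X = np.array([[1.0, 1650.0, 3.0],
              [1.0, 1000.0, 2.0],
              [1.0, 2000.0, 4.0]])
y = np.array([[330000.0], [200000.0], [400000.0]])

# Solve the least-squares problem without forming (X^T X)^{-1}
theta, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
```

With three examples and three parameters the fit is exact, so <code class="language-plaintext highlighter-rouge">X @ theta</code> reproduces <code class="language-plaintext highlighter-rouge">y</code>; on larger datasets it returns the least-squares solution.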
<h2 id="complete-code">Complete Code</h2>
<p>Here is the complete code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">matplotlib</span> <span class="kn">import</span> <span class="n">pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="k">def</span> <span class="nf">featureNormalize</span><span class="p">(</span><span class="n">X</span><span class="p">):</span>
<span class="s">"""
Calculates and returns a normalized version of X where
the mean value of each feature is 0 and the standard deviation
is 1. This is often a good preprocessing step to do when
working with learning algorithms.
Args:
X: An ndarray containing features. Each of its rows is a training example and
each column has an attribute of training examples.
Returns:
X: The normalized version of X where
the mean value of each feature is 0 and the standard deviation
is 1.
mu: Contains mean of every column in X.
sigma: Contains standard deviation of every column in X
"""</span>
<span class="n">mu</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">sigma</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">X_norm</span> <span class="o">=</span> <span class="p">(</span><span class="n">X</span> <span class="o">-</span> <span class="n">mu</span><span class="p">)</span> <span class="o">/</span> <span class="n">sigma</span>
<span class="k">return</span> <span class="n">X_norm</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span>
<span class="k">def</span> <span class="nf">computeCost</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">theta</span><span class="p">):</span>
<span class="s">"""
Computes the cost of using theta as the
parameter for linear regression to fit the data points in X and y
Args:
X: Input feature ndarray.
y: Output array
theta: Current parameters for linear regression.
Returns:
J: Computed cost of using theta as parameters for linear regression
to fit the data points in X and y.
"""</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">hypothesis</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">theta</span><span class="p">)</span>
<span class="n">loss</span> <span class="o">=</span> <span class="n">hypothesis</span> <span class="o">-</span> <span class="n">y</span>
<span class="n">J</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">loss</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">m</span><span class="p">)</span>
<span class="k">return</span> <span class="n">J</span>
<span class="k">def</span> <span class="nf">gradientDescent</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">theta</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.01</span><span class="p">,</span> <span class="n">num_iters</span><span class="o">=</span><span class="mi">100</span><span class="p">):</span>
<span class="s">"""
Performs gradient descent to learn theta.
Args:
X: Input feature ndarray.
y: Output array
theta: Initial parameters for linear regression.
alpha: The learning rate.
num_iters: Number of iterations of gradient descent to be performed.
Returns:
theta: Updated parameters for linear regression.
J_history: An array that contains costs for every iteration.
"""</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="k">if</span> <span class="n">theta</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
<span class="n">theta</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">n</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">J_history</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">num_iters</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="k">for</span> <span class="n">it</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_iters</span><span class="p">):</span>
<span class="n">hypothesis</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">theta</span><span class="p">)</span>
<span class="n">loss</span> <span class="o">=</span> <span class="n">hypothesis</span> <span class="o">-</span> <span class="n">y</span>
<span class="n">theta</span> <span class="o">=</span> <span class="n">theta</span> <span class="o">-</span> <span class="p">(</span><span class="n">alpha</span> <span class="o">/</span> <span class="n">m</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">T</span><span class="p">,</span> <span class="n">loss</span><span class="p">)</span>
<span class="n">J_history</span><span class="p">[</span><span class="n">it</span><span class="p">]</span> <span class="o">=</span> <span class="n">computeCost</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">theta</span><span class="p">)</span>
<span class="k">return</span> <span class="n">theta</span><span class="p">,</span> <span class="n">J_history</span>
<span class="k">def</span> <span class="nf">normalEqn</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
<span class="s">"""
Computes the closed-form solution to linear regression using
normal equations.
Args:
X: Input feature ndarray.
y: Output array
Returns:
theta: Parameters for linear regression calculated using normal
equations.
"""</span>
<span class="n">theta</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">inv</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">T</span><span class="p">,</span> <span class="n">X</span><span class="p">)),</span> <span class="n">X</span><span class="p">.</span><span class="n">T</span><span class="p">),</span> <span class="n">y</span><span class="p">)</span>
<span class="k">return</span> <span class="n">theta</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Loading data ...</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span>
<span class="c1"># Load data
</span><span class="n">data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">genfromtxt</span><span class="p">(</span><span class="s">'data.txt'</span><span class="p">,</span> <span class="n">delimiter</span><span class="o">=</span><span class="s">','</span><span class="p">)</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">-</span><span class="mi">1</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">data</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="n">reshape</span><span class="p">((</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">data</span><span class="p">[:,</span> <span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="n">reshape</span><span class="p">((</span><span class="n">m</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="c1"># Print out some data points
</span><span class="k">print</span><span class="p">(</span><span class="s">'First 10 examples from the dataset:'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'x = '</span><span class="p">,</span> <span class="n">X</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">10</span><span class="p">,</span> <span class="p">:],</span> <span class="s">"</span><span class="se">\n</span><span class="s">y = "</span><span class="p">,</span> <span class="n">y</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">10</span><span class="p">])</span>
<span class="k">print</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">Program paused. Press enter to continue.'</span><span class="p">)</span>
<span class="nb">input</span><span class="p">()</span>
<span class="c1"># Scale features
</span><span class="k">print</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">Normalizing Features ...'</span><span class="p">)</span>
<span class="n">X</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span> <span class="o">=</span> <span class="n">featureNormalize</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="c1"># Add intercept term to X
</span><span class="n">ones_col</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">ones</span><span class="p">((</span><span class="n">m</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">((</span><span class="n">ones_col</span><span class="p">,</span> <span class="n">X</span><span class="p">))</span>
<span class="c1"># Running gradient descent
</span><span class="k">print</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">Running gradient descent ...'</span><span class="p">)</span>
<span class="c1"># Choose some alpha value
</span><span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.01</span>
<span class="n">num_iters</span> <span class="o">=</span> <span class="mi">100</span>
<span class="c1"># Init theta and run gradient descent
</span><span class="n">theta</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">((</span><span class="n">n</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">theta</span><span class="p">,</span> <span class="n">J_history</span> <span class="o">=</span> <span class="n">gradientDescent</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">theta</span><span class="p">,</span> <span class="n">alpha</span><span class="p">,</span> <span class="n">num_iters</span><span class="p">)</span>
<span class="c1"># Plot the convergence graph
</span><span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">J_history</span><span class="p">)),</span> <span class="n">J_history</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'Number of iterations'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'Cost J'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">savefig</span><span class="p">(</span><span class="s">'Cost at each iteration.png'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="c1"># Display gradient descent's result
</span><span class="k">print</span><span class="p">(</span><span class="s">'Theta computed from gradient descent:'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">theta</span><span class="p">)</span>
<span class="k">print</span><span class="p">()</span>
<span class="c1"># Estimate the price of a 1650 sq-ft, 3 br house
</span><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1650</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="n">x_norm</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">mu</span><span class="p">)</span> <span class="o">/</span> <span class="n">sigma</span>
<span class="n">x_norm</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="n">x_norm</span><span class="p">)).</span><span class="n">reshape</span><span class="p">((</span><span class="mi">1</span><span class="p">,</span> <span class="n">n</span><span class="o">+</span><span class="mi">1</span><span class="p">))</span>
<span class="n">price</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">x_norm</span><span class="p">,</span> <span class="n">theta</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Predicted price of a 1650 sq-ft, 3 br house (using gradient descent):'</span><span class="p">,</span> <span class="n">price</span><span class="p">,</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">Program paused. Press enter to continue.'</span><span class="p">)</span>
<span class="nb">input</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="s">'</span><span class="se">\n</span><span class="s">Solving with normal equations ...</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span>
<span class="c1"># Load data
</span><span class="n">data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">genfromtxt</span><span class="p">(</span><span class="s">'data.txt'</span><span class="p">,</span> <span class="n">delimiter</span><span class="o">=</span><span class="s">','</span><span class="p">)</span>
<span class="n">m</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">data</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">data</span><span class="p">[:,</span> <span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="n">reshape</span><span class="p">((</span><span class="n">m</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="c1"># Add intercept term to X
</span><span class="n">ones_col</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">ones</span><span class="p">((</span><span class="n">m</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">hstack</span><span class="p">((</span><span class="n">ones_col</span><span class="p">,</span> <span class="n">X</span><span class="p">))</span>
<span class="n">theta</span> <span class="o">=</span> <span class="n">normalEqn</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="c1"># Display normal equation's result
</span><span class="k">print</span><span class="p">(</span><span class="s">'Theta computed from normal equations:'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">theta</span><span class="p">)</span>
<span class="k">print</span><span class="p">()</span>
<span class="c1"># Estimate the price of a 1650 sq-ft, 3 br house
</span><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1650</span><span class="p">,</span> <span class="mi">3</span><span class="p">])</span>
<span class="n">price</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">theta</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Predicted price of a 1650 sq-ft, 3 br house (using normal equations):'</span><span class="p">,</span> <span class="n">price</span><span class="p">,</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">)</span>
</code></pre></div></div>
<h2 id="references">References</h2>
<ul>
<li>Machine Learning course taught by Andrew Ng on Coursera.</li>
</ul>Vaibhav SharmaLinear regression is one of the most basic algorithms in machine learning and statistics, and it is also one of the best understood algorithms out there. Here, we are going to study multivariate linear regression, which is just a fancy name for linear regression when multiple independent variables are involved. We do this for two reasons: Multivariate Linear Regression is more general than Univariate Linear Regression. Practically you will find yourself using Multivariate Linear Regression more often than Univariate Linear Regression, as it involves learning from multiple features.Build a CNN on CIFAR-10 using TensorFlow2023-01-01T00:00:00+00:002023-01-01T00:00:00+00:00https://vbvsharma.com/misc/2023/01/01/CNN-on-CIFAR-10-using-TensorFlow<h2 id="introduction">Introduction</h2>
<p><strong>Note:</strong> You can find the code for this post <a href="https://github.com/vbvsharma/build-cnn-on-cifar10-using-tensorflow">here</a>.</p>
<p><a href="http://www.cs.toronto.edu/~kriz/cifar.html">The CIFAR-10 dataset</a> is a standard dataset used in the computer vision and deep learning community. It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The mapping of the 0-9 integer labels to class names is listed below:</p>
<ul>
<li>0 ~> Airplane</li>
<li>1 ~> Automobile</li>
<li>2 ~> Bird</li>
<li>3 ~> Cat</li>
<li>4 ~> Deer</li>
<li>5 ~> Dog</li>
<li>6 ~> Frog</li>
<li>7 ~> Horse</li>
<li>8 ~> Ship</li>
<li>9 ~> Truck</li>
</ul>
<p>It is a fairly simple dataset. Hence, it provides the flexibility to play with various techniques, such as hyperparameter tuning, regularization, train-test splits, and parameter search. I encourage the reader to experiment with this dataset after reading this tutorial.</p>
<p>In this tutorial, we will build a convolutional neural network model from scratch using TensorFlow, train that model and then evaluate its performance on unseen data.</p>
<h2 id="explore-cifar-10-dataset">Explore CIFAR-10 dataset</h2>
<p>Let us load the dataset. The dataset is split into training and testing sets. The training set consists of 50000 images, with 5000 images of each class, and the testing set consists of 10000 images, with 1000 images from each class.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Import the CIFAR-10 dataset from keras' datasets
</span><span class="kn">from</span> <span class="nn">tensorflow.keras.datasets</span> <span class="kn">import</span> <span class="n">cifar10</span>
<span class="c1"># Import this PyPlot to visualize images
</span><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="o">%</span><span class="n">matplotlib</span> <span class="n">inline</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">sklearn.utils</span> <span class="kn">import</span> <span class="n">shuffle</span>
<span class="c1"># Load dataset
</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">Y_train</span><span class="p">),</span> <span class="p">(</span><span class="n">X_test</span><span class="p">,</span> <span class="n">Y_test</span><span class="p">)</span> <span class="o">=</span> <span class="n">cifar10</span><span class="p">.</span><span class="n">load_data</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Print the shapes of training and testing set
</span><span class="k">print</span><span class="p">(</span><span class="s">"X_train.shape ="</span><span class="p">,</span> <span class="n">X_train</span><span class="p">.</span><span class="n">shape</span><span class="p">,</span> <span class="s">"Y_train.shape ="</span><span class="p">,</span> <span class="n">Y_train</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"X_test.shape ="</span><span class="p">,</span> <span class="n">X_test</span><span class="p">.</span><span class="n">shape</span><span class="p">,</span> <span class="s">"Y_test.shape ="</span><span class="p">,</span> <span class="n">Y_test</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Output:
X_train.shape = (50000, 32, 32, 3) Y_train.shape = (50000, 1)
X_test.shape = (10000, 32, 32, 3) Y_test.shape = (10000, 1)
</code></pre></div></div>
<p>We can tell from the shapes that:</p>
<ul>
<li><strong>X_train</strong> has 50000 training images, each 32 pixels wide, 32 pixels high, with 3 color channels</li>
<li><strong>X_test</strong> has 10000 testing images, each 32 pixels wide, 32 pixels high, with 3 color channels</li>
<li><strong>Y_train</strong> has 50000 labels</li>
<li><strong>Y_test</strong> has 10000 labels</li>
</ul>
<p>Let us define constants for the number of classes and their labels, to make the code more readable.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">NUM_CLASSES</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">CIFAR10_CLASSES</span> <span class="o">=</span> <span class="p">[</span><span class="s">"airplane"</span><span class="p">,</span> <span class="s">"automobile"</span><span class="p">,</span> <span class="s">"bird"</span><span class="p">,</span> <span class="s">"cat"</span><span class="p">,</span> <span class="s">"deer"</span><span class="p">,</span>
<span class="s">"dog"</span><span class="p">,</span> <span class="s">"frog"</span><span class="p">,</span> <span class="s">"horse"</span><span class="p">,</span> <span class="s">"ship"</span><span class="p">,</span> <span class="s">"truck"</span><span class="p">]</span>
</code></pre></div></div>
<p>Now, let's look at some random images from the training set. You can change the number of columns and rows to get more or fewer images.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># show random images from training set
</span><span class="n">cols</span> <span class="o">=</span> <span class="mi">8</span> <span class="c1"># Number of columns
</span><span class="n">rows</span> <span class="o">=</span> <span class="mi">4</span> <span class="c1"># Number of rows
</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">cols</span><span class="p">,</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">rows</span><span class="p">))</span>
<span class="c1"># Add subplot for each random image
</span><span class="k">for</span> <span class="n">col</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">cols</span><span class="p">):</span>
<span class="k">for</span> <span class="n">row</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">rows</span><span class="p">):</span>
<span class="n">random_index</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">Y_train</span><span class="p">))</span> <span class="c1"># Pick a random index for sampling the image
</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">fig</span><span class="p">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="n">rows</span><span class="p">,</span> <span class="n">cols</span><span class="p">,</span> <span class="n">col</span> <span class="o">*</span> <span class="n">rows</span> <span class="o">+</span> <span class="n">row</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="c1"># Add a sub-plot at (row, col)
</span> <span class="n">ax</span><span class="p">.</span><span class="n">grid</span><span class="p">(</span><span class="n">b</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="c1"># Get rid of the grids
</span> <span class="n">ax</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">"off"</span><span class="p">)</span> <span class="c1"># Get rid of the axis
</span> <span class="n">ax</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">X_train</span><span class="p">[</span><span class="n">random_index</span><span class="p">,</span> <span class="p">:])</span> <span class="c1"># Show random image
</span> <span class="n">ax</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="n">CIFAR10_CLASSES</span><span class="p">[</span><span class="n">Y_train</span><span class="p">[</span><span class="n">random_index</span><span class="p">][</span><span class="mi">0</span><span class="p">]])</span> <span class="c1"># Set title of the sub-plot
</span><span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span> <span class="c1"># Show the image
</span></code></pre></div></div>
<p><img src="/assets/images/2019-05-12-CNN-on-CIFAR-10-using-TensorFlow/output_9_0.png" alt="png" /></p>
<h2 id="prepare-training-and-testing-data">Prepare Training and Testing Data</h2>
<p>Before defining the model and training the model, let us prepare the training and testing data.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">tensorflow</span> <span class="k">as</span> <span class="n">tf</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="k">print</span><span class="p">(</span><span class="s">"TensorFlow's version is"</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">__version__</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Keras' version is"</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">__version__</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Output:
TensorFlow's version is 1.13.1
Keras' version is 2.2.4-tf
</code></pre></div></div>
<p>Normalize the inputs, to train the model faster and help prevent exploding gradients.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Normalize training and testing pixel values
</span><span class="n">X_train_normalized</span> <span class="o">=</span> <span class="n">X_train</span> <span class="o">/</span> <span class="mi">255</span> <span class="o">-</span> <span class="mf">0.5</span>
<span class="n">X_test_normalized</span> <span class="o">=</span> <span class="n">X_test</span> <span class="o">/</span> <span class="mi">255</span> <span class="o">-</span> <span class="mf">0.5</span>
</code></pre></div></div>
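<p>As a quick sanity check (a standalone sketch, separate from the notebook variables above), the transform <code>x / 255 - 0.5</code> maps pixel values from [0, 255] into the range [-0.5, 0.5]:</p>

```python
import numpy as np

# Hypothetical pixel values covering the full 8-bit range
pixels = np.array([0, 128, 255], dtype=np.float64)
normalized = pixels / 255 - 0.5

print(normalized.min(), normalized.max())  # -0.5 0.5
```

<p>Centering the inputs around zero like this generally makes optimization better conditioned than feeding raw [0, 255] values.</p>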
<p>Convert the labels to one-hot coded vectors.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Convert class vectors to binary class matrices.
</span><span class="n">Y_train_coded</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">utils</span><span class="p">.</span><span class="n">to_categorical</span><span class="p">(</span><span class="n">Y_train</span><span class="p">,</span> <span class="n">NUM_CLASSES</span><span class="p">)</span>
<span class="n">Y_test_coded</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">utils</span><span class="p">.</span><span class="n">to_categorical</span><span class="p">(</span><span class="n">Y_test</span><span class="p">,</span> <span class="n">NUM_CLASSES</span><span class="p">)</span>
</code></pre></div></div>
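<p>Under the hood, <code>to_categorical</code> simply turns each integer label into a one-hot row. A minimal NumPy equivalent (an illustrative sketch, not Keras' actual implementation) looks like this:</p>

```python
import numpy as np

NUM_CLASSES = 10

# A few hypothetical labels, shaped like Y_train (a column vector of ints)
labels = np.array([[3], [0], [9]])

# Indexing the rows of an identity matrix by label yields one-hot vectors
one_hot = np.eye(NUM_CLASSES)[labels.flatten()]

print(one_hot.shape)  # (3, 10)
```

<p>Each row has a single 1 at the index of the class, which is exactly the target format expected by the categorical cross-entropy loss used later.</p>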
<h2 id="define-convolutional-neural-network-model">Define Convolutional Neural Network Model</h2>
<p>Next, let us define a model that takes images as input, and outputs class probabilities.</p>
<p>You can learn more about the implementation details at <a href="https://keras.io">https://keras.io</a>.</p>
<p>We will define following layers in the model:</p>
<ul>
<li><strong>Convolutional layer</strong> which takes (32, 32, 3) shaped images as input, outputs 16 filters, and has a kernel size of (3, 3), with the same padding, and uses LeakyReLU as activation function</li>
<li><strong>Convolutional layer</strong> which takes (32, 32, 16) shaped tensor as input, outputs 32 filters, and has a kernel size of (3, 3), with the same padding, and uses LeakyReLU as activation function</li>
<li><strong>Max Pool layer</strong> with pool size of (2, 2), this outputs (16, 16, 32) tensor</li>
<li><strong>Dropout layer</strong> with the dropout rate of 0.25, to prevent overfitting</li>
<li><strong>Convolutional layer</strong> which takes (16, 16, 32) shaped tensor as input, outputs 32 filters, and has a kernel size of (3, 3), with the same padding, and uses LeakyReLU as activation function</li>
<li><strong>Convolutional layer</strong> which takes (16, 16, 32) shaped tensor as input, outputs 64 filters, and has a kernel size of (3, 3), with the same padding, and uses LeakyReLU as activation function</li>
<li><strong>Max Pool layer</strong> with pool size of (2, 2), this outputs (8, 8, 64) tensor</li>
<li><strong>Dropout layer</strong> with the dropout rate of 0.25, to prevent overfitting</li>
<li><strong>Dense layer</strong> which takes input from 8x8x64 neurons, and has 256 neurons</li>
<li><strong>Dropout layer</strong> with the dropout rate of 0.5, to prevent overfitting</li>
<li><strong>Dense layer</strong> with 10 neurons, and softmax activation, is the final layer</li>
</ul>
<p>As you can see, all the layers use LeakyReLU activations, except the last layer. This is a pretty good choice most of the time, but you can change these as well to play with other activations such as tanh, sigmoid, ReLU, etc.</p>
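<p>For intuition, LeakyReLU behaves like ReLU for positive inputs but lets a small, scaled-down signal through for negative inputs. Here is a standalone NumPy sketch (not the Keras layer itself) using the same slope of 0.1 as the model:</p>

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    """Element-wise LeakyReLU: x if x > 0, else alpha * x."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))  # [-0.2  -0.05  0.    1.5 ]
```

<p>Unlike plain ReLU, the non-zero slope for negative inputs keeps gradients flowing, which helps avoid "dead" units during training.</p>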
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># import necessary building blocks
</span><span class="kn">from</span> <span class="nn">tensorflow.keras.models</span> <span class="kn">import</span> <span class="n">Sequential</span>
<span class="kn">from</span> <span class="nn">tensorflow.keras.layers</span> <span class="kn">import</span> <span class="n">Conv2D</span><span class="p">,</span> <span class="n">MaxPooling2D</span><span class="p">,</span> <span class="n">Flatten</span><span class="p">,</span> <span class="n">Dense</span><span class="p">,</span> <span class="n">Activation</span><span class="p">,</span> <span class="n">Dropout</span>
<span class="kn">from</span> <span class="nn">tensorflow.keras.layers</span> <span class="kn">import</span> <span class="n">LeakyReLU</span>
<span class="k">def</span> <span class="nf">make_model</span><span class="p">():</span>
<span class="s">"""
Define your model architecture here.
Returns `Sequential` model.
"""</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">()</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="n">filters</span><span class="o">=</span><span class="mi">16</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">padding</span><span class="o">=</span><span class="s">'same'</span><span class="p">,</span> <span class="n">input_shape</span><span class="o">=</span><span class="p">(</span><span class="mi">32</span><span class="p">,</span> <span class="mi">32</span><span class="p">,</span> <span class="mi">3</span><span class="p">)))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">LeakyReLU</span><span class="p">(</span><span class="mf">0.1</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="n">filters</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">padding</span><span class="o">=</span><span class="s">'same'</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">LeakyReLU</span><span class="p">(</span><span class="mf">0.1</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">MaxPooling2D</span><span class="p">())</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="n">rate</span><span class="o">=</span><span class="mf">0.25</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="n">filters</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">padding</span><span class="o">=</span><span class="s">'same'</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">LeakyReLU</span><span class="p">(</span><span class="mf">0.1</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="n">filters</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">padding</span><span class="o">=</span><span class="s">'same'</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">LeakyReLU</span><span class="p">(</span><span class="mf">0.1</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">MaxPooling2D</span><span class="p">())</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="n">rate</span><span class="o">=</span><span class="mf">0.25</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Flatten</span><span class="p">())</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="n">units</span><span class="o">=</span><span class="mi">256</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">LeakyReLU</span><span class="p">(</span><span class="mf">0.1</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="n">rate</span><span class="o">=</span><span class="mf">0.5</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="n">units</span><span class="o">=</span><span class="mi">10</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Activation</span><span class="p">(</span><span class="s">"softmax"</span><span class="p">))</span>
<span class="k">return</span> <span class="n">model</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># describe model
</span><span class="n">s</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">backend</span><span class="p">.</span><span class="n">clear_session</span><span class="p">()</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">make_model</span><span class="p">()</span>
<span class="n">model</span><span class="p">.</span><span class="n">summary</span><span class="p">()</span>
</code></pre></div></div>
<h2 id="train-your-model">Train your model</h2>
<p>Next, we train the model that we defined above. We will use 0.005 as our initial learning rate, a training batch size of 64, and we will train our model for 10 epochs. Feel free to change these hyperparameters to dive deeper and learn their effects. We use categorical cross-entropy as our loss function and the Adamax optimizer for convergence.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">INIT_LR</span> <span class="o">=</span> <span class="mf">5e-3</span> <span class="c1"># initial learning rate
</span><span class="n">BATCH_SIZE</span> <span class="o">=</span> <span class="mi">64</span>
<span class="n">EPOCHS</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">s</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">backend</span><span class="p">.</span><span class="n">clear_session</span><span class="p">()</span> <span class="c1"># clear default graph
# don't call K.set_learning_phase() !!! (otherwise will enable dropout in train/test simultaneously)
</span><span class="n">model</span> <span class="o">=</span> <span class="n">make_model</span><span class="p">()</span> <span class="c1"># define our model
</span>
<span class="c1"># prepare model for fitting (loss, optimizer, etc)
</span><span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span>
<span class="n">loss</span><span class="o">=</span><span class="s">'categorical_crossentropy'</span><span class="p">,</span> <span class="c1"># we train 10-way classification
</span>    <span class="n">optimizer</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">optimizers</span><span class="p">.</span><span class="n">Adamax</span><span class="p">(</span><span class="n">lr</span><span class="o">=</span><span class="n">INIT_LR</span><span class="p">),</span> <span class="c1"># Adamax optimizer
</span> <span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">'accuracy'</span><span class="p">]</span> <span class="c1"># report accuracy during training
</span><span class="p">)</span>
</code></pre></div></div>
<p>We define a learning rate scheduler, which decays the learning rate after each epoch.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># scheduler of learning rate (decay with epochs)
</span><span class="k">def</span> <span class="nf">lr_scheduler</span><span class="p">(</span><span class="n">epoch</span><span class="p">):</span>
<span class="k">return</span> <span class="n">INIT_LR</span> <span class="o">*</span> <span class="mf">0.9</span> <span class="o">**</span> <span class="n">epoch</span>
</code></pre></div></div>
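<p>With <code>INIT_LR = 5e-3</code> and a decay factor of 0.9, this schedule gives 0.005 at epoch 0, 0.0045 at epoch 1, 0.00405 at epoch 2, and so on. A standalone sketch of the same rule:</p>

```python
INIT_LR = 5e-3  # initial learning rate, matching the training setup above

def lr_scheduler(epoch):
    """Exponential decay: shrink the initial rate by a factor of 0.9 per epoch."""
    return INIT_LR * 0.9 ** epoch

rates = [lr_scheduler(e) for e in range(3)]
print(rates)  # approximately [0.005, 0.0045, 0.00405]
```

<p>Large steps early on speed up convergence, while the gradually smaller steps let the optimizer settle near a minimum.</p>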
<p>We also define a Keras callback class, which prints out the learning rate used in each epoch.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># callback for printing of actual learning rate used by optimizer
</span><span class="k">class</span> <span class="nc">LrHistory</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">callbacks</span><span class="p">.</span><span class="n">Callback</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">on_epoch_begin</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">epoch</span><span class="p">,</span> <span class="n">logs</span><span class="o">=</span><span class="p">{}):</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Learning rate:"</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">backend</span><span class="p">.</span><span class="n">get_value</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">optimizer</span><span class="p">.</span><span class="n">lr</span><span class="p">))</span>
</code></pre></div></div>
<p>Now, let us train our model on the normalized X_train, <strong>X_train_normalized</strong>, and the one-hot coded matrix, <strong>Y_train_coded</strong>. During training we will also keep validating on <strong>X_test_normalized</strong> and <strong>Y_test_coded</strong>. In this way we can keep an eye on model performance.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># fit model
</span><span class="n">history</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span>
<span class="n">X_train_normalized</span><span class="p">,</span> <span class="n">Y_train_coded</span><span class="p">,</span> <span class="c1"># prepared data
</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">BATCH_SIZE</span><span class="p">,</span>
<span class="n">epochs</span><span class="o">=</span><span class="n">EPOCHS</span><span class="p">,</span>
<span class="n">callbacks</span><span class="o">=</span><span class="p">[</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">callbacks</span><span class="p">.</span><span class="n">LearningRateScheduler</span><span class="p">(</span><span class="n">lr_scheduler</span><span class="p">),</span>
<span class="n">LrHistory</span><span class="p">()],</span>
<span class="n">validation_data</span><span class="o">=</span><span class="p">(</span><span class="n">X_test_normalized</span><span class="p">,</span> <span class="n">Y_test_coded</span><span class="p">),</span>
<span class="n">shuffle</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span>
<span class="n">initial_epoch</span><span class="o">=</span><span class="mi">0</span>
<span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">save_model</span><span class="p">(</span><span class="n">model</span><span class="p">):</span><span class="c1"># serialize model to JSON
</span> <span class="n">model_json</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">to_json</span><span class="p">()</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s">"model.json"</span><span class="p">,</span> <span class="s">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">json_file</span><span class="p">:</span>
<span class="n">json_file</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">model_json</span><span class="p">)</span>
<span class="c1"># serialize weights to HDF5
</span> <span class="n">model</span><span class="p">.</span><span class="n">save_weights</span><span class="p">(</span><span class="s">"model.h5"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Saved model to disk"</span><span class="p">)</span>
<span class="n">save_model</span><span class="p">(</span><span class="n">model</span><span class="p">)</span>
</code></pre></div></div>
<h2 id="evaluate-the-model">Evaluate the model</h2>
<p>Now that we have trained our model, let us see how it performs.</p>
<p>Let us load the saved model from disk.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">load_model</span><span class="p">():</span>
<span class="kn">from</span> <span class="nn">tensorflow.keras.models</span> <span class="kn">import</span> <span class="n">model_from_json</span>
<span class="c1"># load json and create model
</span> <span class="n">json_file</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s">'model.json'</span><span class="p">,</span> <span class="s">'r'</span><span class="p">)</span>
<span class="n">loaded_model_json</span> <span class="o">=</span> <span class="n">json_file</span><span class="p">.</span><span class="n">read</span><span class="p">()</span>
<span class="n">json_file</span><span class="p">.</span><span class="n">close</span><span class="p">()</span>
<span class="n">loaded_model</span> <span class="o">=</span> <span class="n">model_from_json</span><span class="p">(</span><span class="n">loaded_model_json</span><span class="p">)</span>
<span class="c1"># load weights into new model
</span> <span class="n">loaded_model</span><span class="p">.</span><span class="n">load_weights</span><span class="p">(</span><span class="s">"model.h5"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Loaded model from disk"</span><span class="p">)</span>
<span class="k">return</span> <span class="n">loaded_model</span>
<span class="n">model</span> <span class="o">=</span> <span class="n">load_model</span><span class="p">()</span>
</code></pre></div></div>
<p>Let us look at the learning curve during the training of our model.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">history</span><span class="p">.</span><span class="n">history</span><span class="p">[</span><span class="s">'loss'</span><span class="p">])</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">history</span><span class="p">.</span><span class="n">history</span><span class="p">[</span><span class="s">'val_loss'</span><span class="p">])</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'model loss'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'loss'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'epoch'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">([</span><span class="s">'train'</span><span class="p">,</span> <span class="s">'test'</span><span class="p">],</span> <span class="n">loc</span><span class="o">=</span><span class="s">'upper left'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="/assets/images/2019-05-12-CNN-on-CIFAR-10-using-TensorFlow/output_32_0.png" alt="png" /></p>
<p>Let us predict the class of each image in the test set.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># make test predictions
</span><span class="n">Y_pred_test</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">X_test_normalized</span><span class="p">)</span> <span class="c1"># Predict probability of image belonging to a class, for each class
</span><span class="n">Y_pred_test_classes</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">Y_pred_test</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># Class with highest probability from predicted probabilities
</span><span class="n">Y_test_classes</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">argmax</span><span class="p">(</span><span class="n">Y_test_coded</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># Actual class
</span><span class="n">Y_pred_test_max_probas</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">max</span><span class="p">(</span><span class="n">Y_pred_test</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="c1"># Highest probability
</span></code></pre></div></div>
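<p>The argmax and max steps above can be illustrated on a toy probability matrix (the values here are made up):</p>

```python
import numpy as np

# Toy predicted probabilities for 2 images over 3 classes (illustrative values).
probs = np.array([[0.1, 0.7, 0.2],
                  [0.6, 0.3, 0.1]])

classes = np.argmax(probs, axis=1)   # index of the highest probability per row
max_probas = np.max(probs, axis=1)   # that highest probability itself

print(classes)      # -> [1 0]
print(max_probas)   # -> [0.7 0.6]
```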
<p>Let us look at the confusion matrix to understand the performance of our model.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># confusion matrix and accuracy
</span><span class="kn">from</span> <span class="nn">sklearn.metrics</span> <span class="kn">import</span> <span class="n">confusion_matrix</span><span class="p">,</span> <span class="n">accuracy_score</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">7</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'Confusion matrix'</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">16</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">confusion_matrix</span><span class="p">(</span><span class="n">Y_test_classes</span><span class="p">,</span> <span class="n">Y_pred_test_classes</span><span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span> <span class="n">CIFAR10_CLASSES</span><span class="p">,</span> <span class="n">rotation</span><span class="o">=</span><span class="mi">45</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">yticks</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span> <span class="n">CIFAR10_CLASSES</span><span class="p">,</span> <span class="n">fontsize</span><span class="o">=</span><span class="mi">12</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">colorbar</span><span class="p">()</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Test accuracy:"</span><span class="p">,</span> <span class="n">accuracy_score</span><span class="p">(</span><span class="n">Y_test_classes</span><span class="p">,</span> <span class="n">Y_pred_test_classes</span><span class="p">))</span>
</code></pre></div></div>
<p><img src="/assets/images/2019-05-12-CNN-on-CIFAR-10-using-TensorFlow/output_36_0.png" alt="png" /></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Test accuracy: 0.7913
</code></pre></div></div>
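<p>Beyond the single overall number, the rows of the confusion matrix also give a per-class accuracy: the diagonal entry divided by the row sum. A small self-contained sketch with a made-up 3-class matrix:</p>

```python
import numpy as np

def per_class_accuracy(cm):
    # Each row is a true class; the diagonal entry counts correct predictions.
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

# Made-up 3-class confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[8, 1, 1],
               [2, 7, 1],
               [0, 2, 8]])
print(per_class_accuracy(cm))  # -> [0.8 0.7 0.8]
```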
<p>Test accuracy of ~80% isn’t bad for such a simple model. Now, let us look at some random predictions from our model.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># inspect preditions
</span><span class="n">cols</span> <span class="o">=</span> <span class="mi">8</span>
<span class="n">rows</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">cols</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">3</span> <span class="o">*</span> <span class="n">rows</span> <span class="o">-</span> <span class="mi">1</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">cols</span><span class="p">):</span>
<span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">rows</span><span class="p">):</span>
<span class="n">random_index</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">Y_test</span><span class="p">))</span>
<span class="n">ax</span> <span class="o">=</span> <span class="n">fig</span><span class="p">.</span><span class="n">add_subplot</span><span class="p">(</span><span class="n">rows</span><span class="p">,</span> <span class="n">cols</span><span class="p">,</span> <span class="n">i</span> <span class="o">*</span> <span class="n">rows</span> <span class="o">+</span> <span class="n">j</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">grid</span><span class="p">(</span><span class="n">b</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">X_test</span><span class="p">[</span><span class="n">random_index</span><span class="p">,</span> <span class="p">:])</span>
<span class="n">pred_label</span> <span class="o">=</span> <span class="n">CIFAR10_CLASSES</span><span class="p">[</span><span class="n">Y_pred_test_classes</span><span class="p">[</span><span class="n">random_index</span><span class="p">]]</span>
<span class="n">pred_proba</span> <span class="o">=</span> <span class="n">Y_pred_test_max_probas</span><span class="p">[</span><span class="n">random_index</span><span class="p">]</span>
<span class="n">true_label</span> <span class="o">=</span> <span class="n">CIFAR10_CLASSES</span><span class="p">[</span><span class="n">Y_test</span><span class="p">[</span><span class="n">random_index</span><span class="p">][</span><span class="mi">0</span><span class="p">]]</span>
<span class="n">ax</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"pred: {}</span><span class="se">\n</span><span class="s">score: {:.3}</span><span class="se">\n</span><span class="s">true: {}"</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span>
<span class="n">pred_label</span><span class="p">,</span> <span class="n">pred_proba</span><span class="p">,</span> <span class="n">true_label</span>
<span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="/assets/images/2019-05-12-CNN-on-CIFAR-10-using-TensorFlow/output_38_0.png" alt="png" /></p>
<h2 id="summary">Summary</h2>
<p>In this tutorial, we discovered how to develop a convolutional neural network for CIFAR-10 classification from scratch using TensorFlow.</p>
<p>Specifically, we learned:</p>
<ul>
<li>How to load CIFAR-10 in your python program</li>
<li>How to look at random images in the dataset</li>
<li>How to define and train a model</li>
<li>How to save the learnt weights of the model to disk</li>
<li>How to predict classes using the model</li>
</ul>
<p>These topics will be covered later:</p>
<ul>
<li>How to improve your model</li>
<li>How to thoroughly validate your model</li>
</ul>
<p>This is a pretty good model (if it is among your first few), but people have achieved around 99% accuracy on this dataset. You can check out other people’s performance on this dataset <a href="https://benchmarks.ai/cifar-10">here</a>.</p>
<p>If you want to work on this model on your system, you can find the code <a href="https://github.com/vbvsharma/build-cnn-on-cifar10-using-tensorflow">here</a>.</p>Vaibhav SharmaIntroduction