Programming fundamental concepts: Modules and Numpy

This assignment is not obligatory

Introduction#

In the last tutorial we covered the basics of Python programming, including variables, data types, and control structures such as loops and conditionals. In this tutorial we take a closer look at functions and how to use them effectively in your code. We will use not only built-in functions but also functions imported from external libraries.

Throughout this tutorial, we will also focus on file handling in Python, including reading from and writing to files. This is an essential skill, since we will be working a lot with data stored in files.

Useful tools#

During coding, you will often need to look up how certain functions work. There are several useful tools to help you with this:

Official documentation: the Python documentation and the documentation of libraries such as NumPy and pandas.
AI tools such as ChatGPT and Copilot.
Online forums and communities such as Stack Overflow.

Python packages#

For this assignment we will be using the following packages:

numpy: for numerical operations
pandas: for data manipulation and analysis
os: for interacting with the operating system

numpy and pandas should be installed in your Python environment; os is part of the Python standard library and needs no installation. If you haven’t installed numpy and pandas yet, you can do so using conda:

conda install numpy pandas

\(\text{Task 4.1:}\)

Let’s start by importing the necessary libraries.

import numpy as np
import pandas as pd
import os

Numpy#

Numpy (numpy documentation: https://numpy.org/doc/stable/) is one of the most popular libraries for numerical computing in Python. It provides support for arrays, matrices, and a wide range of mathematical functions to operate on these data structures. Numpy is particularly useful for handling large datasets and performing complex mathematical operations efficiently. You will be using this library extensively throughout MUDE.

We will start by creating a simple array using Numpy, after which we will focus on indexing and slicing. Indexing and slicing work similarly to Python lists, but with some additional features.

A list of useful numpy functions:

np.array(): Create an array
np.arange(): Create an array with a range of values
np.zeros(): Create an array filled with zeros
np.ones(): Create an array filled with ones
np.eye(): Create an identity matrix
np.reshape(): Reshape an array
np.transpose(): Transpose an array
np.concatenate(): Concatenate arrays
np.split(): Split an array
np.mean(): Compute the mean of an array
np.sum(): Compute the sum of an array
np.max(): Find the maximum value in an array
np.min(): Find the minimum value in an array
np.std(): Compute the standard deviation of an array
np.dot(): Compute the dot product of two arrays
np.cross(): Compute the cross product of two arrays
np.linalg.inv(): Compute the inverse of a matrix
np.linalg.det(): Compute the determinant of a matrix
np.random.rand(): Generate random numbers from a uniform distribution over [0, 1)
np.random.randn(): Generate random numbers from a standard normal distribution
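A few of the functions listed above can be tried out as follows. This is only a small sketch; the variable names (a, stacked, col_max) are chosen for illustration and do not appear elsewhere in this tutorial.

```python
import numpy as np

# np.arange creates the integers 0..5; reshape arranges them as 2 rows x 3 columns
a = np.arange(6).reshape(2, 3)

# np.zeros and np.eye create arrays with a given shape / size
zeros = np.zeros((2, 3))
identity = np.eye(3)

# np.concatenate joins arrays along an existing axis (axis=0 stacks rows)
stacked = np.concatenate((a, np.ones((1, 3))), axis=0)
print(stacked.shape)  # (3, 3)

# Reductions over the whole array, or along one axis
print(np.sum(a))           # 15
print(np.mean(a))          # 2.5
col_max = np.max(a, axis=0)  # maximum of each column: 3, 4, 5
```

Note that most of these functions return a new array rather than modifying their input.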

# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
print(array_1d)
print(f"array_1d has dtype {array_1d.dtype}, shape {array_1d.shape}, size {array_1d.size}")

# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(array_2d)
print(f"array_2d has dtype {array_2d.dtype}, shape {array_2d.shape}, size {array_2d.size}")

# Append a new row; np.append returns a new array and leaves array_2d unchanged
array_2d2 = np.append(array_2d, [[7, 8, 9]], axis=0)
print(array_2d2)

[1 2 3 4 5]
array_1d has dtype int64, shape (5,), size 5
[[1 2 3]
 [4 5 6]]
array_2d has dtype int64, shape (2, 3), size 6
[[1 2 3]
 [4 5 6]
 [7 8 9]]
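The indexing and slicing mentioned above can be sketched on the same two arrays. The example below shows list-style indexing, the comma-separated [row, column] form that numpy adds, and boolean masks; the variable names are illustrative only.

```python
import numpy as np

array_1d = np.array([1, 2, 3, 4, 5])
array_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Works just like a Python list
first = array_1d[0]      # 1
last = array_1d[-1]      # 5
middle = array_1d[1:4]   # elements at index 1, 2, 3 -> 2, 3, 4

# Additional numpy feature: index rows and columns in one pair of brackets
element = array_2d[1, 2]     # row 1, column 2 -> 6
second_col = array_2d[:, 1]  # all rows, column 1 -> 2, 5
first_row = array_2d[0, :]   # row 0, all columns -> 1, 2, 3

# Additional numpy feature: select elements with a boolean mask
evens = array_1d[array_1d % 2 == 0]  # 2, 4

# Unlike list slices, numpy slices are views: use .copy() for an independent array
safe_copy = array_1d[1:4].copy()
safe_copy[0] = 20  # array_1d itself is not changed
```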

\(\text{Task 4.2:}\)

We will now create a np.array with 10 rows and 3 columns. The first column is filled from 1 to 10, the second column is filled with the squares of the first column, and the third column is filled with the cubes of the first column. After we have created the array we will save it to a file.

I will show you two ways to do this:

Starting with an array of zeros and filling it in a loop.
Creating three separate arrays, one per column, and then combining them.

# Method 1
my_array1 = np.zeros((10, 3))
first_col = np.arange(1, 11)
for i in range(3):
    my_array1[:, i] = first_col ** (i + 1)

print(my_array1)

# Method 2
first_col = np.arange(1, 11)
second_col = first_col ** 2
third_col = first_col ** 3
my_array2 = np.column_stack((first_col, second_col, third_col))
print(my_array2)

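# Let's check that both methods give the same values (element-wise comparison)
print(my_array1 == my_array2)

# Save the array as a CSV file; fmt='%d' writes the float values as integers
np.savetxt("my_array.csv", my_array1, delimiter=",", header="col1,col2,col3", comments='', fmt='%d')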

[[   1.    1.    1.]
 [   2.    4.    8.]
 [   3.    9.   27.]
 [   4.   16.   64.]
 [   5.   25.  125.]
 [   6.   36.  216.]
 [   7.   49.  343.]
 [   8.   64.  512.]
 [   9.   81.  729.]
 [  10.  100. 1000.]]
[[   1    1    1]
 [   2    4    8]
 [   3    9   27]
 [   4   16   64]
 [   5   25  125]
 [   6   36  216]
 [   7   49  343]
 [   8   64  512]
 [   9   81  729]
 [  10  100 1000]]
[[ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]
 [ True  True  True]]
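A third option, not part of the original task, is to let numpy broadcasting build all three columns at once. The sketch below also uses np.array_equal, which reduces the comparison to a single True or False instead of printing a full boolean array. The names powers and my_array3 are introduced here for illustration.

```python
import numpy as np

first_col = np.arange(1, 11)   # shape (10,)
powers = np.arange(1, 4)       # exponents 1, 2, 3 with shape (3,)

# first_col[:, np.newaxis] has shape (10, 1); raising it to an array of
# shape (3,) broadcasts to shape (10, 3): one column per exponent
my_array3 = first_col[:, np.newaxis] ** powers

# Reference result built the same way as Method 2
my_array2 = np.column_stack((first_col, first_col ** 2, first_col ** 3))

# A single boolean: True if shapes match and all elements are equal
print(np.array_equal(my_array3, my_array2))  # True
```

np.array_equal compares values, so it also returns True when one array holds floats (as in Method 1, because np.zeros creates a float array by default) and the other holds integers.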

\(\text{Task 4.3:}\)

Now we will use some of numpy’s powerful features to manipulate our data. Let’s try some matrix operations. First, we will create a 3x3 matrix and perform some basic operations on it.

my_array = np.array([[6, 3, 2], [9, 4, 2], [12, 5, 3]])

# Perform basic operations
print("Original array:")
print(my_array)

print("Transposed array:")
print(my_array.T)

print("Array multiplied by 2:")
print(my_array * 2)

# Matrix multiplication
my_array2 = np.array([[-1, -3], [2, -4], [6, -2]])
print("Matrix multiplication result:")
print(my_array @ my_array2)

# Compute the inverse and determinant
print("Inverse of the original array:")
print(np.linalg.inv(my_array))

print("Determinant of the original array:")
print(np.linalg.det(my_array))

Original array:
[[ 6  3  2]
 [ 9  4  2]
 [12  5  3]]
Transposed array:
[[ 6  9 12]
 [ 3  4  5]
 [ 2  2  3]]
Array multiplied by 2:
[[12  6  4]
 [18  8  4]
 [24 10  6]]
Matrix multiplication result:
[[ 12 -34]
 [ 11 -47]
 [ 16 -62]]
Inverse of the original array:
[[-0.66666667 -0.33333333  0.66666667]
 [ 1.          2.         -2.        ]
 [ 1.         -2.          1.        ]]
Determinant of the original array:
-2.9999999999999996
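The determinant printed above is -2.9999999999999996 rather than exactly -3 because the computation uses floating-point arithmetic. When checking such results it is therefore safer to compare with a tolerance than with ==. A minimal sketch, using the same matrix under the illustrative name A:

```python
import numpy as np

A = np.array([[6, 3, 2], [9, 4, 2], [12, 5, 3]])
A_inv = np.linalg.inv(A)

# A matrix times its inverse should give the identity matrix,
# up to small floating-point rounding errors
print(np.allclose(A @ A_inv, np.eye(3)))   # True

# The exact determinant is -3; compare with a tolerance instead of ==
print(np.isclose(np.linalg.det(A), -3.0))  # True
```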

os#

Now that we have our data saved in a CSV file, we can use the os module to interact with the file system. For example, we can check if the file exists, remove it, or rename it.

Some useful os functions:

os.path.exists(path): Check if a file or directory exists
os.remove(path): Remove a file
os.rename(old, new): Rename a file or directory
os.listdir(path): List all files and directories in a directory
os.getcwd(): Get the current working directory
os.chdir(path): Change the current working directory
os.mkdir(path): Create a new directory
os.makedirs(path): Create a new directory, including any missing intermediate directories
os.path.join(path, *paths): Join one or more path components intelligently

# List all files in the current directory
files = os.listdir()
print(files)

# Get the current working directory
current_dir = os.getcwd()
print(current_dir)

['README.md', 'requirements.txt', '4_python_tutorial.ipynb', '1_VS_share.ipynb', 'LICENSE', '.git', 'test_intellisense.py', '.github', 'my_array.csv', 'CITATION.cff', '2_IntelliSense.ipynb', '3_matrix_vector_manipulations.ipynb']
/home/runner/work/PA1.3/PA1.3
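Checking for, renaming, and removing a file, as mentioned at the start of this section, could look like the sketch below. To avoid touching the real my_array.csv, it works inside a temporary directory created with the standard-library tempfile module, which is an addition for this example and not part of the original tutorial. The file name my_array_renamed.csv is also hypothetical.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are affected
with tempfile.TemporaryDirectory() as tmp_dir:
    old_path = os.path.join(tmp_dir, "my_array.csv")
    new_path = os.path.join(tmp_dir, "my_array_renamed.csv")

    # Create a small file to experiment with
    with open(old_path, "w") as f:
        f.write("col1,col2,col3\n1,1,1\n")

    exists_before = os.path.exists(old_path)             # file was just created

    os.rename(old_path, new_path)
    old_exists_after_rename = os.path.exists(old_path)   # old name is gone
    new_exists_after_rename = os.path.exists(new_path)   # new name exists

    os.remove(new_path)
    exists_after_remove = os.path.exists(new_path)       # file is deleted

print(exists_before, old_exists_after_rename,
      new_exists_after_rename, exists_after_remove)
# True False True False
```

Building paths with os.path.join instead of hard-coding "/" or "\\" keeps the code working on different operating systems.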

Pandas#

Now we will use pandas to import our newly created CSV file and perform some data analysis. We could also do this with other libraries, such as numpy, but pandas is specifically designed for data manipulation and analysis.

Some useful pandas functions include:

pd.read_csv(): Read a CSV file into a DataFrame.
df.head(): Display the first few rows of a DataFrame.
df.describe(): Generate descriptive statistics.

\(\text{Task 4.4:}\)

Import our my_array.csv file using pandas and display the first few rows as well as the summary statistics.

# Import our my_array.csv file using pandas
my_df = pd.read_csv("my_array.csv")

# Display the first few rows
print(my_df.head()) 

# Display the summary statistics
print(my_df.describe())

# Transform the data: add a new column, the product of col1 and col3
my_df["col4"] = my_df["col1"] * my_df["col3"]
print(my_df.head())

# Create numpy array from DataFrame
my_array = my_df.to_numpy()

print(my_array)

   col1  col2  col3
0     1     1     1
1     2     4     8
2     3     9    27
3     4    16    64
4     5    25   125
           col1        col2         col3
count  10.00000   10.000000    10.000000
mean    5.50000   38.500000   302.500000
std     3.02765   34.173577   343.728333
min     1.00000    1.000000     1.000000
25%     3.25000   10.750000    36.250000
50%     5.50000   30.500000   170.500000
75%     7.75000   60.250000   469.750000
max    10.00000  100.000000  1000.000000
   col1  col2  col3  col4
0     1     1     1     1
1     2     4     8    16
2     3     9    27    81
3     4    16    64   256
4     5    25   125   625
[[    1     1     1     1]
 [    2     4     8    16]
 [    3     9    27    81]
 [    4    16    64   256]
 [    5    25   125   625]
 [    6    36   216  1296]
 [    7    49   343  2401]
 [    8    64   512  4096]
 [    9    81   729  6561]
 [   10   100  1000 10000]]
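As noted at the start of this section, the same CSV file could also be read with numpy alone. The sketch below uses np.loadtxt with skiprows=1 to skip the header line. So that it runs on its own, it first writes an equivalent file to a temporary directory (the tempfile module and the names data, csv_path, and loaded are assumptions made for this example).

```python
import os
import tempfile

import numpy as np

# Rebuild the same 10 x 3 array as in Task 4.2
first_col = np.arange(1, 11)
data = np.column_stack((first_col, first_col ** 2, first_col ** 3))

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "my_array.csv")

    # Write the file with a header line, as in Task 4.2
    np.savetxt(csv_path, data, delimiter=",", header="col1,col2,col3",
               comments="", fmt="%d")

    # Read it back; skiprows=1 skips the header, which np.loadtxt cannot parse as numbers
    loaded = np.loadtxt(csv_path, delimiter=",", skiprows=1)

print(loaded.dtype)                  # float64
print(np.array_equal(loaded, data))  # True
```

Compared with pd.read_csv, np.loadtxt discards the column names and returns floats by default, which is one reason pandas is more convenient for labelled tabular data.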

By Berend Bouvy, Delft University of Technology. CC BY 4.0, more info on the Credits page of Workbook.