Programming fundamental concepts: Modules and Numpy#
This assignment is not obligatory
Introduction#
In the last tutorial we covered the basics of Python programming, including variables, data types, and control structures such as loops and conditionals. In this tutorial we will take a closer look at functions and how to use them effectively in your code. We will use not only built-in functions but also functions imported from external libraries.
Throughout this tutorial, we will also focus on file handling in Python, including reading from and writing to files. This is an essential skill, since we will be working a lot with data stored in files.
Useful tools#
During coding, you will often need to look up how certain functions work. There are several useful tools to help you with this:
The official Python documentation and the documentation of libraries such as NumPy and pandas are great resources.
Online forums and communities such as Stack Overflow.
Python packages#
For this assignment we will be using the following packages:
numpy: for numerical operations
pandas: for data manipulation and analysis
os: for interacting with the operating system
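numpy and pandas should be installed in your Python environment; os is part of the Python standard library and does not need to be installed. If you haven’t installed numpy and pandas yet, you can do so using conda:
conda install numpy pandas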
\(\text{Task 4.1:}\)
Let’s start by importing the necessary libraries.
import numpy as np
import pandas as pd
import os
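If an import fails, it can help to check which packages Python can actually find. The snippet below is a minimal sketch of such a check; the `required` list and the printed messages are illustrative choices, not part of the assignment.

```python
import importlib.util

# Package names taken from the list above; "os" ships with Python itself.
required = ["numpy", "pandas", "os"]

# find_spec returns None when a top-level module cannot be found.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```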
Numpy#
Numpy (numpy documentation: https://numpy.org/doc/stable/) is one of the most popular libraries for numerical computing in Python. It provides support for arrays, matrices, and a wide range of mathematical functions to operate on these data structures. Numpy is particularly useful for handling large datasets and performing complex mathematical operations efficiently. You will be using this library extensively throughout MUDE.
We will start by creating a simple array using Numpy, after which we will focus on indexing and slicing. Indexing and slicing work similarly to Python lists, but with some additional features.
A list of useful numpy functions:
np.array(): Create an array
np.arange(): Create an array with a range of values
np.zeros(): Create an array filled with zeros
np.ones(): Create an array filled with ones
np.eye(): Create an identity matrix
np.reshape(): Reshape an array
np.transpose(): Transpose an array
np.concatenate(): Concatenate arrays
np.split(): Split an array
np.mean(): Compute the mean of an array
np.sum(): Compute the sum of an array
np.max(): Find the maximum value in an array
np.min(): Find the minimum value in an array
np.std(): Compute the standard deviation of an array
np.dot(): Compute the dot product of two arrays
np.cross(): Compute the cross product of two arrays
np.linalg.inv(): Compute the inverse of a matrix
np.linalg.det(): Compute the determinant of a matrix
np.random.rand(): Generate random numbers from a uniform distribution over [0, 1)
np.random.randn(): Generate random numbers from a standard normal distribution
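To give a feel for a few of the functions listed above, here is a small sketch. The variable names and the values are made up for illustration only.

```python
import numpy as np

a = np.arange(6)                 # values 0, 1, 2, 3, 4, 5
b = np.reshape(a, (2, 3))        # same values arranged in 2 rows and 3 columns
zeros = np.zeros((1, 3))         # one row of zeros (float by default)

# Stack the row of zeros below b; the result has shape (3, 3) and float dtype
stacked = np.concatenate((b, zeros), axis=0)
print(stacked)

# Simple statistics on b
print(np.mean(b), np.sum(b), np.max(b), np.min(b))

# Dot product of two 1D arrays: 1*4 + 2*5 + 3*6
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
print(np.dot(v, w))              # → 32

# 2x2 identity matrix
print(np.eye(2))
```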
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
print(array_1d)
print(f"array_1d has dtype {array_1d.dtype}, shape {array_1d.shape}, size {array_1d.size}")
# Create a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(array_2d)
print(f"array_2d has dtype {array_2d.dtype}, shape {array_2d.shape}, size {array_2d.size}")
# Append a new row to the 2D array
array_2d2 = np.append(array_2d, [[7, 8, 9]], axis=0)
print(array_2d2)
[1 2 3 4 5]
array_1d has dtype int64, shape (5,), size 5
[[1 2 3]
[4 5 6]]
array_2d has dtype int64, shape (2, 3), size 6
[[1 2 3]
[4 5 6]
[7 8 9]]
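Indexing and slicing, mentioned at the start of this section, can be sketched as follows. The array is the same `array_2d` as above; the `view` variable and the value 99 are only there to illustrate that basic slices share memory with the original array.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Single element: row 0, column 1
print(array_2d[0, 1])            # → 2

# Whole row and whole column
print(array_2d[1])               # → [4 5 6]
print(array_2d[:, 0])            # → [1 4]

# Slice: columns 1 onward of every row
print(array_2d[:, 1:])

# Boolean indexing: all elements larger than 3
print(array_2d[array_2d > 3])    # → [4 5 6]

# Unlike list slices, basic NumPy slices are views on the original data
view = array_2d[0, :2]
view[0] = 99
print(array_2d[0, 0])            # → 99
```

If you need an independent copy instead of a view, call `.copy()` on the slice.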
\(\text{Task 4.2:}\)
We will now create a np.array with 10 rows and 3 columns. The first column is filled from 1 to 10, the second column is filled with the squares of the first column, and the third column is filled with the cubes of the first column. After we have created the array we will save it to a file.
I will show you two ways to do this:
Starting with an empty array and filling it in a loop.
Creating three separate arrays, one for each column, and then combining them.
# Method 1
my_array1 = np.zeros((10, 3))
first_col = np.arange(1, 11)
for i in range(3):
    my_array1[:, i] = first_col ** (i + 1)
print(my_array1)
# Method 2
first_col = np.arange(1, 11)
second_col = first_col ** 2
third_col = first_col ** 3
my_array2 = np.column_stack((first_col, second_col, third_col))
print(my_array2)
# Let's check that both methods give the same values
print(my_array1 == my_array2)
# Save the array as a CSV file
np.savetxt("my_array.csv", my_array1, delimiter=",", header="col1,col2,col3", comments='', fmt='%d')
[[ 1. 1. 1.]
[ 2. 4. 8.]
[ 3. 9. 27.]
[ 4. 16. 64.]
[ 5. 25. 125.]
[ 6. 36. 216.]
[ 7. 49. 343.]
[ 8. 64. 512.]
[ 9. 81. 729.]
[ 10. 100. 1000.]]
[[ 1 1 1]
[ 2 4 8]
[ 3 9 27]
[ 4 16 64]
[ 5 25 125]
[ 6 36 216]
[ 7 49 343]
[ 8 64 512]
[ 9 81 729]
[ 10 100 1000]]
[[ True True True]
[ True True True]
[ True True True]
[ True True True]
[ True True True]
[ True True True]
[ True True True]
[ True True True]
[ True True True]
[ True True True]]
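To confirm that the file contains what we expect, we can read it back with numpy. The sketch below rebuilds the same table and writes it to a temporary directory (an assumption made here so that the example does not touch your working directory); `table`, `path`, and `loaded` are illustrative names.

```python
import os
import tempfile
import numpy as np

# Rebuild the same 10x3 table as in Method 2
first_col = np.arange(1, 11)
table = np.column_stack((first_col, first_col ** 2, first_col ** 3))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_array.csv")
    np.savetxt(path, table, delimiter=",", header="col1,col2,col3", comments="", fmt="%d")

    # skiprows=1 skips the header line; loadtxt returns floats by default
    loaded = np.loadtxt(path, delimiter=",", skiprows=1)

print(loaded.shape)
print(loaded.dtype)

# np.array_equal gives a single True/False instead of an element-wise array
print(np.array_equal(loaded, table))
```

Compared with `my_array1 == my_array2`, which returns one boolean per element, `np.array_equal` summarizes the comparison in a single value.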
\(\text{Task 4.3:}\)
Now we will use some of numpy’s powerful features to manipulate our data. Let’s try some matrix operations. First, we will create a 3x3 matrix and perform some basic operations on it.
my_array = np.array([[6, 3, 2], [9, 4, 2], [12, 5, 3]])
# Perform basic operations
print("Original array:")
print(my_array)
print("Transposed array:")
print(my_array.T)
print("Array multiplied by 2:")
print(my_array * 2)
# Matrix multiplication
my_array2 = np.array([[-1, -3], [2, -4], [6, -2]])
print("Matrix multiplication result:")
print(my_array @ my_array2)
# Compute the inverse and determinant
print("Inverse of the original array:")
print(np.linalg.inv(my_array))
print("Determinant of the original array:")
print(np.linalg.det(my_array))
Original array:
[[ 6 3 2]
[ 9 4 2]
[12 5 3]]
Transposed array:
[[ 6 9 12]
[ 3 4 5]
[ 2 2 3]]
Array multiplied by 2:
[[12 6 4]
[18 8 4]
[24 10 6]]
Matrix multiplication result:
[[ 12 -34]
[ 11 -47]
[ 16 -62]]
Inverse of the original array:
[[-0.66666667 -0.33333333 0.66666667]
[ 1. 2. -2. ]
[ 1. -2. 1. ]]
Determinant of the original array:
-2.9999999999999996
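The determinant printed above is -2.9999999999999996 rather than exactly -3 because numpy computes it with floating-point arithmetic. The sketch below shows one way to compare such results with a tolerance, and also contrasts element-wise multiplication (`*`) with matrix multiplication (`@`); the names `A`, `A_inv`, and `product` are illustrative.

```python
import numpy as np

A = np.array([[6, 3, 2], [9, 4, 2], [12, 5, 3]])

# A matrix times its inverse should give the identity matrix,
# up to small floating-point rounding errors
A_inv = np.linalg.inv(A)
product = A @ A_inv
print(np.allclose(product, np.eye(3)))   # → True

# Compare the determinant with the exact value using a tolerance
det = np.linalg.det(A)
print(np.isclose(det, -3.0))             # → True

# '*' multiplies element by element, '@' performs matrix multiplication
elementwise = A * A
matrix_product = A @ A
print(elementwise[0, 0], matrix_product[0, 0])   # → 36 87
```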
os#
Now that we have our data saved in a CSV file, we can use the os module to interact with the file system. For example, we can check if the file exists, remove it, or rename it.
Some useful os functions:
os.path.exists(path): Check if a file or directory exists
os.remove(path): Remove a file
os.rename(old, new): Rename a file or directory
os.listdir(path): List all files and directories in a directory
os.getcwd(): Get the current working directory
os.chdir(path): Change the current working directory
os.mkdir(path): Create a new directory
os.makedirs(path): Create a new directory and all intermediate directories
os.path.join(path, *paths): Join one or more path components intelligently
# List all files in the current directory
files = os.listdir()
print(files)
# Get the current working directory
current_dir = os.getcwd()
print(current_dir)
['README.md', 'requirements.txt', '4_python_tutorial.ipynb', '1_VS_share.ipynb', 'LICENSE', '.git', 'test_intellisense.py', '.github', 'my_array.csv', 'CITATION.cff', '2_IntelliSense.ipynb', '3_matrix_vector_manipulations.ipynb']
/home/runner/work/PA1.3/PA1.3
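Checking, renaming, and removing a file could look like the sketch below. It works inside a temporary directory so that nothing in your own folder is changed; the directory names and the file names `example.csv` and `renamed.csv` are made up for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a nested path in a platform-independent way and create it
    data_dir = os.path.join(tmp_dir, "data", "raw")
    os.makedirs(data_dir)

    # Create a small example file
    old_path = os.path.join(data_dir, "example.csv")
    with open(old_path, "w") as f:
        f.write("col1,col2\n1,2\n")

    # Check that the file exists
    exists_before = os.path.exists(old_path)

    # Rename the file and list the directory contents
    new_path = os.path.join(data_dir, "renamed.csv")
    os.rename(old_path, new_path)
    listing = os.listdir(data_dir)

    # Remove the file and check again
    os.remove(new_path)
    exists_after = os.path.exists(new_path)

print(exists_before)   # → True
print(listing)         # → ['renamed.csv']
print(exists_after)    # → False
```

Note that `os.remove` deletes a file permanently, so it is worth checking the path with `os.path.exists` before removing anything you care about.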
Pandas#
Now we will use pandas to import our newly created CSV file and perform some data analysis. We could also do this with other libraries, such as numpy, but pandas is specifically designed for data manipulation and analysis.
Some useful pandas functions include:
pd.read_csv(): Read a CSV file into a DataFrame.
df.head(): Display the first few rows of a DataFrame.
df.describe(): Generate descriptive statistics.
\(\text{Task 4.4:}\)
Import our my_array.csv file using pandas and display the first few rows as well as the summary statistics.
# Import our my_array.csv file using pandas
my_df = pd.read_csv("my_array.csv")
# Display the first few rows
print(my_df.head())
# Display the summary statistics
print(my_df.describe())
# Transform the data
my_df["col4"] = my_df["col1"] * my_df["col3"]
print(my_df.head())
# Create numpy array from DataFrame
my_array = my_df.to_numpy()
print(my_array)
col1 col2 col3
0 1 1 1
1 2 4 8
2 3 9 27
3 4 16 64
4 5 25 125
col1 col2 col3
count 10.00000 10.000000 10.000000
mean 5.50000 38.500000 302.500000
std 3.02765 34.173577 343.728333
min 1.00000 1.000000 1.000000
25% 3.25000 10.750000 36.250000
50% 5.50000 30.500000 170.500000
75% 7.75000 60.250000 469.750000
max 10.00000 100.000000 1000.000000
col1 col2 col3 col4
0 1 1 1 1
1 2 4 8 16
2 3 9 27 81
3 4 16 64 256
4 5 25 125 625
[[ 1 1 1 1]
[ 2 4 8 16]
[ 3 9 27 81]
[ 4 16 64 256]
[ 5 25 125 625]
[ 6 36 216 1296]
[ 7 49 343 2401]
[ 8 64 512 4096]
[ 9 81 729 6561]
[ 10 100 1000 10000]]
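One detail worth knowing when you combine both libraries: the std value reported by pandas (3.02765 for col1 above) is the sample standard deviation, while `np.std` computes the population standard deviation by default. The sketch below rebuilds the same three columns directly in a DataFrame instead of reading the CSV file, which is an assumption made to keep the example self-contained.

```python
import numpy as np
import pandas as pd

first_col = np.arange(1, 11)
df = pd.DataFrame({"col1": first_col, "col2": first_col ** 2, "col3": first_col ** 3})

values = df["col1"].to_numpy()

# pandas divides by (n - 1) by default (ddof=1)
pandas_std = df["col1"].std()

# numpy divides by n by default (ddof=0)
numpy_std = np.std(values)

# Passing ddof=1 to numpy reproduces the pandas result
numpy_sample_std = np.std(values, ddof=1)

print(pandas_std, numpy_std, numpy_sample_std)
```

Both conventions are valid; the point is to know which one you are using when you compare numbers from the two libraries.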
By Berend Bouvy, Delft University of Technology. CC BY 4.0, more info on the Credits page of Workbook.