Packaging Your Python Code
Task 1: Create Project Structure
You have already written some Python code for your PA2.2 assignment. Now, you will reuse that code to create a well-structured Python package.
Note that even if you haven’t completed PA2.2, you can still follow the tasks for this assignment by using the original PA2.2 notebook.
In your project directory for PA2.5, organize your files as follows:
Create a directory called src.
Within this src directory, create another directory called sparseforge (this will be your package name).
Create an empty Python file called __init__.py in the sparseforge directory to mark it as a package.
From your PA2.2, copy the Python files mesh_utils.py and benchmark_operations.py into this directory called sparseforge.
This sets up the basic structure of your package for the code part. It’s not done yet, as we currently only have the helper code in there and not the main code that makes the actual comparisons between the dense and sparse matrices.
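The directory layout described above can also be created programmatically. The following is a minimal sketch, not part of the assignment: the helper name create_package_skeleton is hypothetical, and the demonstration runs in a temporary directory so that it does not touch your real project.

```python
from pathlib import Path
import tempfile


def create_package_skeleton(project_root: Path, package_name: str = "sparseforge") -> Path:
    """Create src/<package_name>/__init__.py under project_root and return the package dir."""
    package_dir = project_root / "src" / package_name
    package_dir.mkdir(parents=True, exist_ok=True)
    (package_dir / "__init__.py").touch()  # an empty file marks the directory as a package
    return package_dir


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing in your real project is changed.
    with tempfile.TemporaryDirectory() as tmp:
        create_package_skeleton(Path(tmp))
        for path in sorted(Path(tmp).rglob("*")):
            print(path.relative_to(tmp))
```

Copying mesh_utils.py and benchmark_operations.py into the new directory is still a manual step here, since their location depends on where your PA2.2 files live.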
Task 2: Go from Notebook to .py Files
You have a Jupyter notebook in PA2.2 that demonstrates how to use your code. You will convert the relevant parts of this notebook into Python scripts that will be included in your Python package.
Note that the cell numbers below refer to the code cells in the original PA2.2 notebook, which you also had in your personal GitHub Classroom repository.
Create a new Python file in the sparseforge directory called main.py.
In main.py, first copy the import statements from your PA2.2 notebook (this is the first cell in the notebook).
Next, copy all the function definitions from the notebook into main.py. Ensure that you only copy the function definitions. So Cell-2 and the last lines of Cell-4 and Cell-5 should not be copied at this point, because they call the functions rather than define them.
At the end of main.py, add an if __name__ == "__main__": block. This if statement ensures that the code inside it only runs when the script is executed directly, i.e., not when it is imported as a module or as part of a package.
Inside this if block, now copy the remaining code. This will include the code from Cell-2 (where you load the data), the last line of Cell-4 (where you store the results), and Cell-5 (where you plot the results). Don’t forget to indent this code properly so that it is inside the if block.
Nice! You have now restructured your code from a notebook into a Python file, which can be executed as a script or imported as a module within your package.
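To make the intended layout concrete, here is a small sketch of how such a file can be organized. Everything in it is a placeholder: run_benchmark, the sizes, and the timed operation are hypothetical stand-ins for the functions and cells of your own notebook.

```python
"""Hypothetical layout for main.py; function names and data are placeholders."""
import time  # import statements go first (Cell-1 in the notebook)


def run_benchmark(sizes):
    """Placeholder for a function definition copied from the notebook."""
    results = {}
    for n in sizes:
        start = time.perf_counter()
        sum(range(n))  # stand-in for the real dense/sparse operation
        results[n] = time.perf_counter() - start
    return results


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    sizes = [1_000, 10_000]             # stand-in for the data loading in Cell-2
    results = run_benchmark(sizes)      # stand-in for the last line of Cell-4
    for n, seconds in results.items():  # stand-in for the plotting in Cell-5
        print(f"n={n}: {seconds:.6f} s")
```

Importing this file from another module makes run_benchmark available without triggering the timing loop or the printing.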
Tip: The way our main.py is structured now will probably show the resulting plots in an interactive window when you run it. If you want to save these plots as image files instead, you can modify the plotting code to use plt.savefig('relevant_filename.png') instead of plt.show(). But our focus in this assignment is on packaging and distribution, so we will not make these changes.
Task 3: Write Project Metadata in pyproject.toml
Now that your code is organized nicely within the src/sparseforge directory, it’s time to create the pyproject.toml file that will contain all the metadata about your package.
Create a new file in the root of your repository called pyproject.toml. Note that this file should be at the same level as the src directory.
In pyproject.toml, add the following section and fields:

    [build-system]
    requires = ["setuptools>=42", "wheel"]
    build-backend = "setuptools.build_meta"

Next, add the project metadata:

    [project]
    name = "give-the-package-name-here"  # e.g., `sparseforge`
    version = "give-a-version-number-here"  # e.g., `0.1.0`
    description = "what-does-your-package-do?"  # e.g., `A package for comparing performance of sparse and dense matrix operations.`
    authors = [
        { name = "<Your Name>", email = "<Your Email>" }
    ]

With this, you now have a TOML file in your project that describes your package and how to build it.
Task 4: Specify Dependencies
The TOML file should also specify any dependencies your package needs, either to be built or to run. A good starting point is the set of libraries you imported in your code.
Determine which external libraries your package depends on (e.g., numpy, scipy, matplotlib).
In the pyproject.toml file, under the [project] section, add a dependencies field that lists these required libraries. The format to do so can be understood from the instructional material.
Task 4: Create a basic README
While not necessary for the package to function, a well-written README.md file is crucial for users to understand what your package does and how to use it. We’ll create a basic README.md file for your package by replacing the one provided in this repository.
Delete the content of the README.md file in the root of your repository.
In README.md, you can add a header for a section by starting a line with # followed by a space and then the title of the section. For example, # SparseForge Package.
Add a brief description of what your package does.
This is a minimal README.md file. In the real world, code often comes with more detailed and useful information, such as installation instructions, usage examples, and contribution guidelines. However, for this assignment, a simple description will suffice.
Task 5: Build and Test Installation
Now that we have all the necessary elements of a package, build your package using python -m build and verify installation:
Ensure you have pip, setuptools and wheel installed in your environment. They should be part of your mude-base environment.
Install the build package using pip (as this isn’t available on conda) in your mude-base environment:

    pip install build

In the root directory of your project (where pyproject.toml is located), run the following command to build your package:

    python -m build

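A successful build normally leaves a wheel (.whl) and a source archive (.tar.gz) in a dist/ directory. The sketch below shows one way to check for them, and to check whether a package is installed in the current environment. The helper names built_artifacts and installed_version are hypothetical, and the second check only reports a version after you have installed the package yourself, for example by pointing pip install at the wheel in dist/.

```python
from importlib import metadata
from pathlib import Path
from typing import Optional


def built_artifacts(dist_dir: Path) -> dict[str, list[str]]:
    """Group the files in dist_dir into wheels (.whl) and source archives (.tar.gz)."""
    names = sorted(p.name for p in dist_dir.iterdir() if p.is_file())
    return {
        "wheel": [n for n in names if n.endswith(".whl")],
        "sdist": [n for n in names if n.endswith(".tar.gz")],
    }


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of package_name, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    dist = Path("dist")
    if dist.is_dir():
        print(built_artifacts(dist))
    else:
        print("No dist/ directory found; run `python -m build` first.")
    print("Installed version:", installed_version("sparseforge"))
```

Run it from the root directory of your project so that the relative path dist resolves correctly.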
Task 6: Push to GitHub
Finally, push your code to your GitHub repository for submission. Note that as per standard practice, you should not push the dist/ or egg-info/ directories to your repository, as these are generated files. The provided .gitignore file already takes care of this for you.
Only the files you created (i.e., src/, pyproject.toml, and README.md) before the building step will be committed and pushed to your repository.
Eventually, you can delete this file 1_packaging.md from your repository, as it is only meant to guide you through the packaging process. Don’t delete the .github/ directory, as it contains the GitHub Classroom workflow for your assignment submission.
The pyproject.toml has enough information for others to build and install your package from the source code in your repository. That’s the nice thing about Python packaging! :)