Organizing Source
Another brief digression into organizing source code and projects. Described here is my current ideal, subject to change, which has evolved over my years as a student, a professional, a contractor, and now a student again.
Why this Discussion
Previously, we discussed how to organize research papers and related metadata. Today, I want to discuss how I personally organize my source code. I write this partly because I am occasionally asked about it; more personally, though, I wish more of these "boring" blog posts from others were available and easily accessible, so that I could learn how others go about this and see whether there is anything I could learn from or leverage in my own setup. My first step toward correcting this is to add my own notes on how and why I organize my "workspace" the way I do.
Many developers and many languages discuss how to structure an individual project, or what the best practices are around a specific language or technology. However, I have found very, very few that discuss the overall structure of many projects or, more pointedly, how to organize all the different source trees. This discussion may implicitly exist in mono-repository vs. poly-repository debates, but repository organization is still too narrow for my overall goal of discussing how to organize every project, regardless of ecosystem or project set. That is, we set out to answer "how do you organize all the different projects and ecosystems you're working on or depending on?"
It is perhaps worth noting that not every project in a workspace is an active project; some are dependencies that needed to be adjusted to work with the current set of active projects.
I will first walk through the highlights of how my organization developed and what pushed me in one direction or another. Finally, I will describe my current organization method.
Path to Organization
The madness of my organization can be traced through a series of tools and stages in my development career. Certainly, there is some revisionism here; nothing is ever as clean as we make it seem.
Before University
When I was learning some programming in high school, there was NO organization. Projects were loosely collected under whatever folder made sense in the moment, with each project captured by a Visual Studio "solution".
When I started my first internship, code was collectively organized under C:\dev\. This was in the initial developer checklists for configuring a new machine: something along the lines of "check out the CVS tree to C:\dev\."
As monstrous and scary as this may sound, this worked. But it worked for reasons that may make it seem untenable today: this was it. There was no other source to deal with. If it wasn't in our CVS tree, there really wasn't any concern for it at the time. Aside from Microsoft's tooling and libraries, outside dependencies didn't really exist. This is something I find typical of certain enterprise structures: anything not controlled by the enterprise itself is considered inherently risky and is avoided.
I'm confident things have changed since my time there. Mercurial is being used instead of CVS, for example!
From this, I adopted the single-tree construct as the foundation for organizing projects.
Eclipse
When I started university, our first tool was Eclipse. Having been relatively spoiled by Visual Studio, I found Eclipse archaic. But one thing stuck out: a question it asked every time it started, which had never occurred to me before: "Select the directory as a workspace". This was my very first hint of an overall (meta) structure for projects.
Since I was using Eclipse for essentially all of my school projects, I always had a ~/workspace directory. As it was a natural location for source, I eventually started placing more programming projects under it, even those that were not Java/Eclipse projects.
From here on, the ~/workspace folder became the foundation for all of my later organization.
Maven
As is typical with editor envy, my desire to ditch Eclipse for Vim grew stronger and stronger, and finding appropriate tooling for managing projects quickly became a necessity.
Since I was mainly developing Java projects, Maven was an obvious contender. While at the time I didn't really enjoy Maven's opinionated approach, its "correct" (reversed) domain name ordering stood out as an interesting idea for organizing Java packages. That is, Maven, by default, creates a package structure such as com.example.projectx, which translates to the folder structure src/main/java/com/example/projectx.
While I didn't adopt Maven until a while later, I did eventually borrow a modified version of its domain name folder structure. For example, I would use com.example as the root folder for all example.com projects.
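The package-to-folder translation is mechanical: each dot-separated component of the package name becomes one directory under Maven's default src/main/java source root. A minimal Python sketch, where package_to_source_dir is a hypothetical helper name and only Maven's default layout is assumed:

```python
from pathlib import PurePosixPath

# Maven's default root for main (non-test) Java sources.
MAVEN_MAIN_JAVA = PurePosixPath("src", "main", "java")


def package_to_source_dir(package: str) -> PurePosixPath:
    """Map a dotted Java package name to its folder under src/main/java."""
    parts = package.split(".")
    # Reject empty components such as those in "com..example" or "".
    if not all(parts):
        raise ValueError(f"invalid package name: {package!r}")
    return MAVEN_MAIN_JAVA.joinpath(*parts)


print(package_to_source_dir("com.example.projectx"))
# prints: src/main/java/com/example/projectx
```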
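The modified scheme keeps the reversed ordering but flattens it into a single folder name instead of nested folders. As a sketch, with reverse_domain as a hypothetical helper; lower-casing and stripping a trailing dot are assumptions added here, not part of the original scheme:

```python
def reverse_domain(domain: str) -> str:
    """Turn a domain such as "example.com" into a "com.example" folder name."""
    # Assumption: normalize case and drop a trailing dot (fully qualified form).
    labels = domain.lower().strip(".").split(".")
    return ".".join(reversed(labels))


print(reverse_domain("example.com"))  # prints: com.example
```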
Golang
Prior to Go modules (introduced in Go 1.11), Golang imposed its own project structure. I have had conflicting thoughts about this. On one hand, the imposed homogeneity of projects and meta-structure was serene, since it was always so easy to find references and dependencies. On the other hand, it disrupted my existing organization methods. Furthermore, since I effectively had two different working trees of source code, I carried more mental burden when considering ownership of client or employer code. With Golang projects, it was no longer possible to ensure all source belonging to a particular organization was under a single tree.
Related to the homogeneous folder structure was the encoding of the source repository's location in the import path, and therefore on disk; e.g., projects from GitHub were found under ${GOPATH}/src/github.com/.
Even though Go modules are now the default and ${GOPATH} isn't used much anymore, I still borrowed some of Golang's naming and organization constructs. Chiefly, I now use ~/workspace/src as the root of all projects instead of simply ~/workspace. Furthermore, while I'm not particularly interested in the platform the source is hosted on (more on this later), I am interested in the owning or overarching organization behind the code.
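The GOPATH convention amounts to a simple rule: a package lived at $GOPATH/src/<import path>, so the hosting site was baked into the directory. A sketch of that rule, where gopath_dir, the /home/user/go location, and the github.com/user/repo import path are hypothetical; it assumes a POSIX ":" list separator and relies on the documented behavior that new packages were downloaded into the first GOPATH entry:

```python
from pathlib import PurePosixPath


def gopath_dir(import_path: str, gopath: str) -> PurePosixPath:
    """Return $GOPATH/src/<import path>, where pre-modules `go get` put a package."""
    # GOPATH may list several workspaces separated by ":" on POSIX systems;
    # new packages were downloaded into the first one.
    first_workspace = gopath.split(":")[0]
    return PurePosixPath(first_workspace, "src", import_path)


print(gopath_dir("github.com/user/repo", "/home/user/go"))
# prints: /home/user/go/src/github.com/user/repo
```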
Personal Projects, Professional Development, and Contracting
As a developer, hobbyist or professional, I need a clear delineation between projects. This need comes partly from a personal desire to be organized, but it also arises for contractual reasons: I would rather my own work not be reassigned to a company just because I was hired (which is what most developer employment contracts argue, though, obligatory "IANAL"). Therefore, I need a clear mechanism for delineating ownership. Thankfully, this can be baked into the folder structure itself with relative ease!
I mostly stumbled into this because I used a personal computer for one internship, again later when hired full-time, and again when contracting. I have tried a few different techniques to avoid entangling source from different organizations.
The first was having separate user accounts on the machine for work and personal use. This worked well enough but was a pain point for almost everything else: my dotfiles game wasn't where it is today, so many files were replicated between the two accounts. Furthermore, this model does not scale as the number of "engagements" increases.
Another approach I considered but never tried was something like QubesOS, since its virtual-machine-based isolation would achieve separation very easily. However, I was worried about how I would handle backups if necessary. I was also very happy with Arch Linux at the time, and later Gentoo. Still, this approach would likely scale better than separate accounts, and in some circumstances it may be the only (legally) safe way to achieve separation.
Forges, Platforms, and Working in Public
Note that this is different from the mono-repository vs. poly-repository discussion.
As noted, I'm not particularly interested in the hosting platform or forge of a code base or project. The source repository may be moved, or there may even be several different "repositories" that host the code. Git's distributed nature makes this possible: there typically happens to be a single, "blessed" remote repository that most work starts from, but by virtue of being distributed, Git does not require a single "remote". Therefore, I prefer to use the domain name of the owning organization or entity as the root for organizing ownership. This could be the groupId in Java projects, the parent organization or foundation for projects under Apache or GNOME, the personal domain of the maintainer, etc.
That said, not every project I clone into my workspace warrants the same attention to detail. Therefore, there may be a settling period in which GitHub projects are cloned into the com/github/${user}/${repo} structure before being moved to a more permanent home, if they are moved at all.
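The settling path can be derived directly from a clone URL by reversing the host name and appending the owner and repository. A minimal sketch, assuming HTTPS URLs of the form https://host/owner/repo (SSH "git@host:owner/repo" URLs are deliberately not handled); settling_path and the user/project names are hypothetical:

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def settling_path(clone_url: str) -> PurePosixPath:
    """Map https://github.com/<user>/<repo>(.git) to com/github/<user>/<repo>."""
    parsed = urlparse(clone_url)
    parts = parsed.path.strip("/").split("/")
    # Assumption: only HTTPS-style URLs with at least owner and repo segments.
    if parsed.hostname is None or len(parts) < 2:
        raise ValueError(f"expected https://host/owner/repo, got {clone_url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]  # drop the conventional suffix
    # Reverse the host labels: "github.com" -> com/github.
    host_labels = reversed(parsed.hostname.split("."))
    return PurePosixPath(*host_labels, owner, repo)


print(settling_path("https://github.com/user/project.git"))
# prints: com/github/user/project
```

Moving a project to its permanent home later is then just a directory rename; nothing in the clone depends on where it lives.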
Git itself is better suited to tracking a project's various remote repositories. The filesystem and organization method, on the other hand, are not well suited to the complexities of remote repository management (nor should they be). For example, if I "fork" a project on GitHub, I can add both my own remote repository and the upstream source repository to the same clone of the project.
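In Git terms, tracking both the fork and upstream is just one extra remote on an existing clone; the folder on disk stays the same. A small Python sketch wrapping the real git remote add and git fetch commands, assuming the clone was made from the personal fork (so the fork is already origin); remote_commands, add_remote, and the remote name upstream are hypothetical choices:

```python
import subprocess
from pathlib import Path
from typing import List


def remote_commands(name: str, url: str) -> List[List[str]]:
    """Git invocations that register one extra remote and fetch from it."""
    return [
        ["git", "remote", "add", name, url],  # records the remote in .git/config
        ["git", "fetch", name],               # downloads that remote's branches
    ]


def add_remote(repo: Path, name: str, url: str) -> None:
    """Run the commands inside an existing clone; raises CalledProcessError on failure."""
    for argv in remote_commands(name, url):
        subprocess.run(argv, cwd=repo, check=True)
```

For example, add_remote(Path.home() / "workspace/src/com/github/user/project", "upstream", "https://github.com/upstream-org/project.git") would leave one working copy with both origin and upstream available (all names here are placeholders).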
Organizing Source
While the historical context may not be complete enough to allow someone to perfectly derive my own organization, I do hope it serves as a solid foundation for why the organization is the way it is.
In my home folder, I have a single directory, workspace. Under this directory, there is src. Having src further allows for the addition of docs or pkgs, but I'm not currently using this. Finally, under src are the top-level domains of all my projects, e.g., com, org, io, net, etc. Under each TLD is the next domain part, e.g., io/devnulllabs, com/github, org/kernel.
Under each of these parent structures are the actual project folders.
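Put together, a project's location is the workspace root, then src, then the owning domain's labels from most to least significant, then the project name. A sketch of that mapping, where project_dir, /home/user, and projectx are hypothetical; reversing every label of deeper domains (e.g. a.b.example.org to org/example/b/a) is an assumption, since the tree here only shows two-label domains:

```python
from pathlib import PurePosixPath


def project_dir(workspace: str, domain: str, project: str) -> PurePosixPath:
    """Place a project under <workspace>/src/<tld>/<next label>/.../<project>."""
    labels = domain.lower().strip(".").split(".")
    if not all(labels) or not project:
        raise ValueError(f"bad domain or project: {domain!r}, {project!r}")
    # Most significant label first: "devnulllabs.io" -> io/devnulllabs.
    return PurePosixPath(workspace, "src", *reversed(labels), project)


print(project_dir("/home/user/workspace", "devnulllabs.io", "projectx"))
# prints: /home/user/workspace/src/io/devnulllabs/projectx
```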
tl;dr:
tree -L 3 ~/workspace
workspace
└── src
    ├── com
    │   ├── github
    │   └── kennyballou
    ├── dev
    │   └── minilab
    ├── edu
    │   ├── bgsu
    │   └── boisestate
    ├── fi
    │   └── liw
    ├── io
    │   └── devnulllabs
    ├── org
    │   ├── coreboot
    │   ├── gnu
    │   ├── kernel
    │   └── soot-oss
    └── us
        └── crashrec

20 directories, 0 files
Parting Thoughts
Now that it's all written, I'm not sure this discussion truly warranted its own post. I've wanted to read something like this from others before, but having seen what it boils down to, I can certainly see why no one talks about it.
It's fairly easy to describe the current state of things and to demonstrate what it looks like, but it's immensely difficult to distill the motivations and influences that, over 12 years, brought us to today.