UP | HOME

The Art of Manually Editing Hunks

Table of Contents

There's a certain art to editing hunks, seemingly arcane. Hunks are blocks of changes typically found in unified diff patch files, or, more commonly today, found in Git patches.

Git uses its own variant of the unified diff format, but it isn't much different. The differences between the unified format and Git's are usually not significant. The patch files created with git-show or git-diff are consumable by the usual tools, patch, git, vimdiff, etc.

Short Introduction to Unified Diff

A unified diff may look something similar to (freely copied from the diffutils manual):

--- lao 2002-02-21 23:30:39.942229878 -0800
+++ tzu 2002-02-21 23:30:50.442260588 -0800
@@ -1,7 +1,6 @@
-The Way that can be told of is not the eternal Way;
-The name that can be named is not the eternal name.
 The Nameless is the origin of Heaven and Earth;
-The Named is the mother of all things.
+The named is the mother of all things.
+
 Therefore let there always be non-being,
   so we may see their subtlety,
 And let there always be being,
@@ -9,3 +8,6 @@
 The two are the same,
 But after they are produced,
   they have different names.
+They both may be called deep and profound.
+Deeper and more profound,
+The door of all subtleties!

The first two lines define the files that are input into the diff program, the first, lao, being the "source" file and the second, tzu, being the "new" file. The starting characters --- and +++ denote the lines from each.

+ denotes a line that will be added to the first file and - denotes a line that will be removed from the first file. Lines with no changes are preceded by a single space.

The @@ -1,7 +1,6 @@ and @@ -9,3 +8,6 @@ are the hunk identifiers. That is, diff hunks are the blocks identified by @@ -line number[,context] +line number[, context] @@ in the diff format. The context number is optional and occasionally not needed. However, it is always included in when using git-diff. The line numbers defines the number the hunk begins. The context number defines the number of lines in the hunk. Unlike the line number, it often differs between the two files. In the first hunk of the example above, the context numbers are 7 and 6, respectively. That is, lines preceded with a - and a space equals 7. Similarly, lines starting with a + and a space equals 6.

Lines starting with a space count towards the context of both files.

Since the second file has a smaller context, this means we are removing more (by one) lines than we are adding. To diff, updating a line is the same as removing the old line and adding a new line (with the changes).

Armed with this information, we can start editing hunks that can be cleanly applied.

Motivation

What might be the motivation for even wanting to edit hunk files? The biggest I see is when using git-add --patch. Particularly when the changes run together and cannot be split apart automatically. We can see this in the diff above.

The trivial case is being able to stage a single hunk of the above diff, nothing has to be done to stage the changes separately other than using the --patch option.

However, staging separate changes inside a hunk becomes slightly more complicated. Often, if the changes are broken up with a even just a single line (if it exists), they can be split. When they run together, it becomes more difficult to do.

Of course, a way to solve this problem, is to manually back out the changes (a series of "undos"), save the file, stage it, play back the changes (a series of "redos", perhaps). This can be very error prone and if you make any other changes during between undo and redo, you may have lost the changes. Therefore, being able to manually edit the specific hunk into the right shape, no changes are lost.

Hunk Editing Example

Let's walk through an example of staging some changes, and manually editing a hunk to stage them into the patches we want.

Create a temporary Git repository, this will be a just some basic stuff for testing.

% cd /tmp
% git init foo
% cd foo

From here on, we will assume the working directory to be /tmp/foo.

Inside this new Git repository, add a new file, quicksort.exs:

defmodule Quicksort do

  def sort(list) do
    _sort(list)
  end

  defp _sort([]), do: []
  defp _sort(list = [h|t]) do
    _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
  end

end

Perform the usual actions, git-add and git-commit:

% git add quicksort.exs
% git commit -m 'initial commit'

Now, let's make some changes. For one, there's compiler warning about the unused variable t and the actually sorting seems a bit dense. Let's fix the warning and breakup the sorting:

defmodule Quicksort do

  def sort(list) do
    _sort(list)
  end

  defp _sort([]), do: []
  defp _sort(list = [h|_]) do
    (list |> Enum.filter(&(&1 < h)) |> _sort)
    ++ [h] ++
    (list |> Enum.filter(&(&1 > h)) |> _sort)
  end

end

Saving this version of the file should produce a diff similar to the following:

diff --git a/quicksort.exs b/quicksort.exs
index 97b60b4..ed2446b 100644
--- a/quicksort.exs
+++ b/quicksort.exs
@@ -5,8 +5,10 @@ defmodule Quicksort do
   end

   defp _sort([]), do: []
-  defp _sort(list = [h|t]) do
-    _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
+  defp _sort(list = [h|_]) do
+    (list |> Enum.filter(&(&1 < h)) |> _sort)
+    ++ [h] ++
+    (list |> Enum.filter(&(&1 > h)) |> _sort)
   end

 end

However, since these changes are actually, argubly, two different changes, they should live in two commits. Let's stage the change for t to _:

% git add --patch

We will be presented with the diff from before:

diff --git a/quicksort.exs b/quicksort.exs
index 97b60b4..ed2446b 100644
--- a/quicksort.exs
+++ b/quicksort.exs
@@ -5,8 +5,10 @@ defmodule Quicksort do
   end

   defp _sort([]), do: []
-  defp _sort(list = [h|t]) do
-    _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
+  defp _sort(list = [h|_]) do
+    (list |> Enum.filter(&(&1 < h)) |> _sort)
+    ++ [h] ++
+    (list |> Enum.filter(&(&1 > h)) |> _sort)
   end

 end
Stage this hunk [y,n,q,a,d,/,e,?]?

First thing we want to try is using the split(s) option. However, this is an invalid choice because Git does not know how to split this hunk and we will be presented with the available options and the hunk again. The option we then want is edit(e).

We will be dropped into our default editor, environment variable $EDITOR, Git core.editor setting. From there, we will be presented with something of the following:

# Manual hunk edit mode -- see bottom for a quick guide
@@ -5,8 +5,10 @@ defmodule Quicksort do
   end

   defp _sort([]), do: []
-  defp _sort(list = [h|t]) do
-    _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
+  defp _sort(list = [h|_]) do
+    (list |> Enum.filter(&(&1 < h)) |> _sort)
+    ++ [h] ++
+    (list |> Enum.filter(&(&1 > h)) |> _sort)
   end

 end
# ---
# To remove '-' lines, make them ' ' lines (context).
# To remove '+' lines, delete them.
# Lines starting with # will be removed.
#
# If the patch applies cleanly, the edited hunk will immediately be
# marked for staging. If it does not apply cleanly, you will be given
# an opportunity to edit again. If all lines of the hunk are removed,
# then the edit is aborted and the hunk is left unchanged.

From here, we want to replace the leading minus of the change removal to a space and remove the last three additions.

That is, we want the diff to look like:

@@ -5,8 +5,10 @@ defmodule Quicksort do
   end

   defp _sort([]), do: []
-  defp _sort(list = [h|t]) do
 sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
+  defp _sort(list = [h|_]) do
   end

 end

Saving and closing the editor now, Git will have staged the desired diff. We can check the staged changes via git-diff:

% git diff --cached
diff --git a/quicksort.exs b/quicksort.exs
index 97b60b4..94a5101 100644
--- a/quicksort.exs
+++ b/quicksort.exs
@@ -5,8 +5,8 @@ defmodule Quicksort do
   end

   defp _sort([]), do: []
-  defp _sort(list = [h|t]) do
     _sort(Enum.filter(list, &(&1 < h))) ++ [h] ++ _sort(Enum.filter(list, &(&1 > h)))
+  defp _sort(list = [h|_]) do
   end

 end

Notice, the hunk context data was updated correctly to match the new changes.

From here, commit the first change, and then add and commit the second change.

Something to watch out for is over zealously removing changed lines. For example, in Elixir quicksort example we have just did, if we entirely removed the second - from the diff and manually updated the hunk header, the patch will never apply cleanly. Therefore, be especially careful with removing - lines.