Emacs: writing a project.el backend

Introduction

Lately I've seen people on the web talking about project.el, the built-in solution for handling “projects” that ships with GNU Emacs 27.1.

I decided to experiment with it too, but I ran into two issues:

  • The file shipped with the base installation of the editor is outdated, at least in 27.1; you still need to install the latest version from ELPA.
  • The project I wanted to use it with is sort of incompatible with the built-in settings.

The first is easy to solve, though it was surprising at first. Judging from the commit history, the next GNU Emacs release should ship the updated version.

The second caused more difficulties and is the subject of this article.

How project.el deals with projects by default

This description was written when ELPA had version 0.5.3 of the project (i.e. project.el) package. While unlikely, it is possible for maintainers to change the default behaviour in future versions.

The package is structured around functions defined with cl-defgeneric and cl-defmethod, so that the correct behaviour is selected by dispatching on the data type of the value identifying the project (more on this later).

The “entry point” (a somewhat improper term) is the generic function project-root: it must return the root directory of the project, that is, the directory that contains every file of the project and whose parent directory is not part of the project.

As a quick example, assuming the currently visited file is ~/MyProject/src/utils/magic.c, project-root will return ~/MyProject/

The generic implementation signals an error when called because, as the commentary at the beginning of the file puts it:

;; `project-root' must be defined for every project.

After obtaining the root directory, the remaining functions will return the appropriate values relative to this directory.

The functions involved are the following:

  • project-files returns all the files in the project, so that they can be visited easily with completion at the prompt;
  • project-ignores returns a list of patterns to remove from the set returned by project-files;
  • project-external-roots returns a list of directories that are not part of the project itself but are still related to it in one way or another.

Besides the generic functions, the package also provides a backend based on VC, the built-in Emacs interface to various version control systems (VCSs).

Due to the ubiquity of VCSs, this backend, as basic as it might be, will satisfy the majority of use cases. Even though VC doesn't support every VCS out there by default, there are external packages to extend it.

However, the project I was dealing with was not managed by any VCS! Therefore, I couldn't use the full power of project.el, making it essentially useless for this specific case. Still, the codebase has a particular structure suitable for a new project.el backend.

Writing a new backend

Before we begin, let's describe the structure of the project at hand: the root directory contains files and subdirectories, which in turn contain more files and subdirectories. Some directories must be ignored, for example because they contain artifacts generated during the build process. Similarly, some files must also be ignored according to some pattern, like the file extension.

The defining characteristic of the project is that each directory has zero or one Makefile; the root directory always has one.

Because of this, the new backend will describe a project according to these Makefiles.

A “backend” is a set of methods dispatching on a specific kind of project instance. As such, the first step is to choose a format for the instance, and this format must abide by these rules:

;; - Choose the format of the value that represents a project for your
;; backend (we call it project instance).  Don't use any of the
;; formats from other backends.  The format can be arbitrary, as long
;; as the datatype is something `cl-defmethod' can dispatch on.  The
;; value should be stable (when compared with `equal') across
;; invocations, meaning calls to that function from buffers belonging
;; to the same project should return equal values.

The documentation of cl-defmethod lists the data types it can dispatch on. The simplest one, however, is a cons (or a list), so we'll use that for our instance. The built-in VC backend also uses a cons, though with different contents.

Specifically, the format will be a list whose car is the symbol makefile and whose cadr is the root directory as a string.

For reference, the VC-based backend uses an actual cons, with the car being the symbol vc and the cdr the root directory. We're going to use a list instead because, as explained later, our instance also has to carry a list of ignored patterns and a list of external roots.
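To see how the (head SYMBOL) specializer tells such formats apart, here is a minimal sketch. The generic function my-describe-instance is hypothetical and exists only to show dispatch; the directories are made up, and the vc instance follows the cons format described above.

```elisp
(require 'cl-lib)

;; Hypothetical generic function, used only to illustrate dispatch.
(cl-defgeneric my-describe-instance (instance)
  "Return a short description of the project INSTANCE.")

;; Matches any cons whose car is the symbol `makefile', such as
;; (makefile ROOT IGNORES EXTERNAL-ROOTS).
(cl-defmethod my-describe-instance ((instance (head makefile)))
  (format "Makefile project at %s" (cadr instance)))

;; Matches a plain cons (vc . ROOT).
(cl-defmethod my-describe-instance ((instance (head vc)))
  (format "VC project at %s" (cdr instance)))

(my-describe-instance '(makefile "~/MyProject/" nil nil))
;; => "Makefile project at ~/MyProject/"
(my-describe-instance '(vc . "~/Other/"))
;; => "VC project at ~/Other/"
```

Note that cadr is needed for the list format, while the cons format uses cdr directly.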

Now that we have our format, we can specialize project-root accordingly:

(cl-defmethod project-root ((project (head makefile)))
  (car (cdr project)))

Now, whenever the project.el internal API receives an instance of a Makefile-based project, it will correctly return the root directory.

But how does project.el know whether a project is based on Makefiles or is using a VCS? The answer is in a special hook, project-find-functions.

The hook calls the functions listed in it, in order, and uses the first non-nil result as the project instance.

Each function in the hook must take one argument, the directory the user is currently in, and must build the project instance based only on that argument.
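The lookup boils down to run-hook-with-args-until-success, which calls each hook function with the directory and stops at the first non-nil value. The sketch below reproduces that behaviour on a hypothetical hook, my-find-functions, with two made-up backends, so it runs without touching project.el itself.

```elisp
(require 'cl-lib)

;; Hypothetical hook standing in for `project-find-functions'.
(defvar my-find-functions nil
  "Functions called with a directory; the first non-nil result wins.")

(defun my-try-never (_dir)
  "A backend that never recognizes the directory."
  nil)

(defun my-try-always (dir)
  "A backend that claims DIR as the root of a Makefile project."
  (list 'makefile (file-name-as-directory dir)))

(add-hook 'my-find-functions #'my-try-never)
(add-hook 'my-find-functions #'my-try-always t)   ; appended, runs second

(defun my-find-project (dir)
  "Return the first non-nil instance produced by `my-find-functions'."
  (run-hook-with-args-until-success 'my-find-functions dir))

(my-find-project "/tmp/demo")
;; => (makefile "/tmp/demo/")
```

Because my-try-never returns nil, the hook moves on and my-try-always provides the instance.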

(defun project-makefile-try (dir)

We are going to use locate-dominating-file to search upwards for the nearest Makefile, allowing us to quickly skip directories that don't contain one.

However, some projects, most notoriously those built using autotools, have recursive Makefiles, meaning that the path returned by locate-dominating-file is not necessarily the root directory.

As such, we're going to walk up the directory tree until no more Makefiles are found:

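  (let ((dominating (locate-dominating-file dir "Makefile")))
    (when dominating
      (let* ((above (file-name-directory
                     (directory-file-name (expand-file-name dominating))))
             (dominating2 (locate-dominating-file above "Makefile")))
        ;; Climb while a parent directory also has a Makefile; stop at
        ;; the filesystem root, which is its own parent.
        (while (and dominating2 (not (equal dominating2 dominating)))
          (setq dominating dominating2
                above (file-name-directory
                       (directory-file-name (expand-file-name dominating)))
                dominating2 (locate-dominating-file above "Makefile"))))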

This function only looks for files called ‘Makefile’, but it should be trivial to extend it to also search for e.g. ‘GNUmakefile’.
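As a standalone sketch, separate from project-makefile-try, here is the same upward walk extended to the three names GNU make looks for by default (GNUmakefile, makefile and Makefile). It relies on locate-dominating-file also accepting a predicate instead of a file name. The my-* names are hypothetical.

```elisp
(require 'cl-lib)

(defconst my-makefile-names '("GNUmakefile" "makefile" "Makefile")
  "File names GNU make looks for by default.")

(defun my-has-makefile-p (dir)
  "Return non-nil if DIR contains a file named in `my-makefile-names'."
  (cl-some (lambda (name) (file-exists-p (expand-file-name name dir)))
           my-makefile-names))

(defun my-topmost-makefile-dir (dir)
  "Return the topmost directory at or above DIR with a makefile, or nil.
Like the loop in `project-makefile-try', this skips over directories
that have no makefile."
  (let ((found (locate-dominating-file dir #'my-has-makefile-p))
        (next nil))
    (while (and found
                (setq next (locate-dominating-file
                            (file-name-directory
                             (directory-file-name (expand-file-name found)))
                            #'my-has-makefile-p))
                ;; The filesystem root is its own parent: stop there.
                (not (equal next found)))
      (setq found next))
    found))
```

For example, with a GNUmakefile in ~/MyProject/ and a Makefile in ~/MyProject/src/, calling it from ~/MyProject/src/utils/ should return the ~/MyProject/ directory, assuming no directory above it has a makefile.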

Here is something new: as mentioned above, a backend can also provide ignored files and external root directories. In fact, the main reason I started writing a new backend was precisely that the generic functions could not be extended to include other elements in addition to the default values.

For reference, the VC-based backend uses the corresponding VCS (git, hg, etc.) to build the list of ignored files, and a dedicated function to list the external roots.

Because we are basing projects on Makefiles only, we can't follow the VC backend's approach, at least for ignored files, so we'll have to make do with something else: we'll store the list of patterns to ignore and the list of external roots inside the project instance itself.

Initially, I wanted to use directory-local variables for this task, but I ran into a couple of bugs I could not resolve, so for the time being we'll make do with an optional file called .project in the root directory. Each pattern or directory goes on its own line, prefixed by : (a colon); if the colon is in turn prefixed by # (a hash), the entry goes into the list of ignored patterns, otherwise into the list of external roots. Implementing reading the list of ignored files from dir-locals-file is left as an exercise for the reader.

      (let ((igns nil) (extr nil) (dotfile (concat dominating ".project")))
        (when (file-readable-p dotfile)
          (with-temp-buffer
            (insert-file-contents-literally dotfile)
            (goto-char (point-min))
            ;; Process one line at a time and stop at the end of the
            ;; buffer, so a last line without a newline is read once.
            (while (not (eobp))
              (let* ((line (buffer-substring-no-properties
                            (line-beginning-position)
                            (line-end-position)))
                     (split (split-string line ":")))
                ;; "#:PATTERN" is an ignored pattern, ":DIR" an external root.
                (when (= 2 (length split))
                  (if (string= (car split) "#")
                      (push (cadr split) igns)
                    (push (cadr split) extr))))
              (forward-line 1))))

Now that we have our lists, we can finally return the project instance:

        (list 'makefile dominating igns extr)))))

The list of ignored patterns is the third element of the list, and the list of external roots is the fourth.
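To make the .project format concrete, here is a made-up example together with a standalone sketch of the parsing rule. my-parse-dot-project is a hypothetical helper that works on a string rather than a file, and unlike project-makefile-try it returns the entries in file order.

```elisp
(require 'cl-lib)

(defun my-parse-dot-project (text)
  "Parse TEXT in the .project format.
Return a cons (IGNORES . EXTERNAL-ROOTS), each list in file order."
  (let ((igns nil) (extr nil))
    (dolist (line (split-string text "\n"))
      (let ((split (split-string line ":")))
        ;; "#:PATTERN" is an ignored pattern, ":DIR" an external root;
        ;; lines that do not split into exactly two parts are skipped.
        (when (= 2 (length split))
          (if (string= (car split) "#")
              (push (cadr split) igns)
            (push (cadr split) extr)))))
    (cons (nreverse igns) (nreverse extr))))

;; A .project file with two ignored patterns and one external root.
(let ((parsed (my-parse-dot-project "#:*.tmp\n#:build/\n:/usr/include/\n")))
  (list :ignores (car parsed) :external-roots (cdr parsed)))
;; => (:ignores ("*.tmp" "build/") :external-roots ("/usr/include/"))
```

Since the rule splits on every colon, an entry that itself contains a colon (such as a Windows drive letter) is silently skipped.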

Now that our instance is ready, we can specialize the remaining methods:

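(cl-defmethod project-external-roots ((project (head makefile)))
  ;; External roots are the fourth element of the instance.
  (nth 3 project))

(cl-defmethod project-ignores ((project (head makefile)) _dir)
  ;; `grep-find-ignored-files' is defined in grep.el.
  (require 'grep)
  (defvar grep-find-ignored-files)
  ;; Ignored patterns are the third element of the instance.
  (append (nth 2 project) grep-find-ignored-files))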
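Since these methods pick elements out of the instance by position, a few named accessors can make the layout harder to get wrong. This is a sketch; the project-makefile--* names are hypothetical and not part of project.el.

```elisp
(require 'cl-lib)

;; Hypothetical accessors for the instance layout
;; (makefile ROOT IGNORES EXTERNAL-ROOTS).
(defun project-makefile--root (project)
  "Return the root directory of the Makefile-based PROJECT."
  (nth 1 project))

(defun project-makefile--ignores (project)
  "Return the ignored patterns read from PROJECT's .project file."
  (nth 2 project))

(defun project-makefile--external-roots (project)
  "Return the external roots read from PROJECT's .project file."
  (nth 3 project))

(let ((pr '(makefile "~/MyProject/" ("*.tmp") ("/usr/include/"))))
  (list (project-makefile--root pr)
        (project-makefile--ignores pr)
        (project-makefile--external-roots pr)))
;; => ("~/MyProject/" ("*.tmp") ("/usr/include/"))
```

With these in place, the body of project-ignores could read (append (project-makefile--ignores project) grep-find-ignored-files).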

The grep-find-ignored-files variable contains some common patterns, like *~ or *.o, which should be ignored in most circumstances, essentially giving us a default list of patterns in case the user does not create the “.project” file (or an equivalent).

This backend does not specialize project-files, but that's fine. Even though the documentation warns that the generic implementation gets slower the more files the project has, we can't really do any better: it uses the “find” tool to scan the filesystem, passing it the patterns returned by project-ignores to exclude matching files.

Unlike the VC-based backend, a Makefile-based project has no file listing maintained by a VCS to query, so the only way to make it faster would be to use a tool faster than “find”.
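To illustrate the filtering idea in plain Emacs Lisp, here is a rough sketch that lists files with directory-files-recursively and drops those whose name matches one of the ignore globs, converted with wildcard-to-regexp. This is not how project.el implements project-files (the default implementation runs find), my-project-files is a hypothetical name, and it only matches patterns against bare file names, so directory patterns such as build/ are not handled.

```elisp
(require 'cl-lib)

(defun my-project-files (root patterns)
  "Return the files under ROOT whose names match none of PATTERNS.
PATTERNS are shell globs such as \"*.o\", matched against the
non-directory part of each file name."
  (let ((regexps (mapcar #'wildcard-to-regexp patterns)))
    (cl-remove-if
     (lambda (file)
       (let ((name (file-name-nondirectory file)))
         (cl-some (lambda (re) (string-match-p re name)) regexps)))
     ;; The empty regexp matches every file; directories are excluded.
     (directory-files-recursively root ""))))
```

Calling it with grep-find-ignored-files as PATTERNS would roughly mirror the defaults used by project-ignores above, minus the directory patterns.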

The final step is to evaluate this call somewhere, for example in the init file:

(add-hook 'project-find-functions #'project-makefile-try t)

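Since the hook stops at the first non-nil result, you should keep the default VC-based backend (project-try-vc) first, which is what the final t argument ensures by appending our function. Otherwise, in a version-controlled project that also has Makefiles, this Makefile-based backend would take over, giving undesirable results.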
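The effect of the final t is easy to check on throwaway hooks. The sketch below uses two hypothetical variables, my-demo-appended and my-demo-prepended, each seeded with project-try-vc to mirror the default value of project-find-functions. The functions are never called, so they need not be defined.

```elisp
(require 'cl-lib)

;; Hypothetical stand-ins for `project-find-functions', whose default
;; value is (project-try-vc).
(defvar my-demo-appended (list #'project-try-vc))
(defvar my-demo-prepended (list #'project-try-vc))

;; With a final t the function is appended: VC is still tried first.
(add-hook 'my-demo-appended #'project-makefile-try t)

;; Without it the function is prepended and would shadow VC.
(add-hook 'my-demo-prepended #'project-makefile-try)

(list my-demo-appended my-demo-prepended)
;; => ((project-try-vc project-makefile-try)
;;     (project-makefile-try project-try-vc))
```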

Conclusion

All in all, extending project.el is trivial, and the only thing lacking is a broader user-facing API, which will likely come as more people use it.

On the other hand, to repeat what I said earlier, since VCSs are everywhere there is rarely a need to write a new backend; it would usually be more useful to write extensions to Emacs's VC package instead.