A Beginner's Guide: Glob Patterns

Recently, one of my coworkers was having trouble because Jest wasn't running tests on a new folder he had created.

After some investigation, it turns out that the Jest configuration glob didn't include this whole folder of tests that weren't running! (Scary!)

Understanding how globs work was essential to understanding how to fix this problem, and there isn't a ton of documentation on it other than the Linux manual. Let's change that!

In this post, we'll go over the history of globs, how to use wildcard characters, and define the three main characters of wildcard matching.

What are globs?

Globs, also known as glob patterns, are wildcard patterns that get expanded into the list of pathnames that match them.

In early versions of Unix, the command interpreter relied on a separate program, /etc/glob, to expand wildcard characters in unquoted arguments to a command.

This functionality was later provided as a library function, glob(), which is now used by tons of programs, including the shell. Several different tools and languages have adopted globs, each putting its own little spin on them. It's quite the extensive list:

  • Node.js
  • Go
  • Java
  • Haskell
  • Python
  • Ruby
  • PHP

Now that we know a little bit about the history of globs, let's get into the part that makes it useful: wildcard matching.

Wildcard Matching

A string can be considered a wildcard pattern if it contains one of the following characters: *, ?, or [.

Asterisks (*)

The most common wildcard that you'll see is the asterisk. This character is used in many ways but is mainly used to match any number of characters (like parts of a string).

The three main use cases of asterisks that I've seen used are:

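  • * - Matches any run of characters within a single path segment. On Linux, it will match everything except slashes. On Windows, it will avoid matching backslashes as well as slashes.
  • ** - Recursively matches zero or more directories that fall under the current directory. This is an extension to the original glob syntax, so support varies by tool (bash, for example, only enables it with the globstar option).
  • *(pattern-list) - Matches zero or more occurrences of any of the patterns in the pattern list. This is an extended glob, available in ksh and in bash with extglob enabled. The related ?(pattern-list) matches zero or one occurrence.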

These use cases can also be used in conjunction with each other! For example, to recursively find all Markdown files (those ending in .md), the pattern would be **/*.md

Note: *.md would only return the file paths in the current directory, which is why we append **/ at the beginning.
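To make the difference between *.md and **/*.md concrete, here is a minimal Python sketch using pathlib from the standard library. The directory layout and file names (README.md, docs/intro.md, and so on) are made up for illustration, and other tools may treat ** slightly differently than Python does.

```python
import tempfile
from pathlib import Path


def find_markdown(root):
    """Return (top-level matches, recursive matches) as relative POSIX paths."""
    top_level = sorted(p.relative_to(root).as_posix() for p in root.glob("*.md"))
    recursive = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.md"))
    return top_level, recursive


# Build a small throwaway directory tree (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs" / "guides").mkdir(parents=True)
    for name in ("README.md", "docs/intro.md", "docs/guides/setup.md", "docs/notes.txt"):
        (root / name).touch()
    top_level, recursive = find_markdown(root)

print(top_level)   # ['README.md']
print(recursive)   # ['README.md', 'docs/guides/setup.md', 'docs/intro.md']
```

Only the recursive pattern reaches into docs/ and docs/guides/, and neither pattern picks up notes.txt.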

Question Marks (?)

The question mark wildcard is commonly used to match any single character.

For example, let's say we're given a list of files:

  • Cat.png
  • Bat.png
  • Rat.png
  • car.png
  • list.png
  • mom.jpg
  • cat.jpg

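If you wanted to find all the files in the folder whose names start with any single character followed by "at", you could use a pattern like ?at.* (a glob has to match the whole filename, so the .* covers the extension). This would return the following results: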

  • Cat.png
  • Bat.png
  • Rat.png
  • cat.jpg
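Here is a small sketch of the same lookup in Python, using fnmatchcase from the standard library's fnmatch module as a stand-in for a shell. The match helper is a name made up for this example. Since a glob has to match the whole filename, the sketch uses ?at.* and also shows that a bare ?at matches nothing in this list.

```python
from fnmatch import fnmatchcase

files = ["Cat.png", "Bat.png", "Rat.png", "car.png", "list.png", "mom.jpg", "cat.jpg"]


def match(pattern, names):
    """Return the names that match the glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


question_matches = match("?at.*", files)
bare_matches = match("?at", files)

print(question_matches)  # ['Cat.png', 'Bat.png', 'Rat.png', 'cat.jpg']
print(bare_matches)      # [] because "?at" must match the entire filename
```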

Note: A cool thing about this pattern is that it doesn't care about the case of the first character. Because ? matches any single character, both Cat.png and cat.jpg are matched. I've found this useful in scripts when trying to find files that I've marked with certain dates.

Character classes and Ranges ([)

The square brackets ( [ and ] ) can be used to denote a pattern that should match a single character that is enclosed inside of the brackets. These are called character classes.

An important thing to know is that the string inside of the brackets is not allowed to be empty. Because of this, a ] placed right after the opening bracket is treated as a literal character, which can lead to misunderstandings of weird patterns like this: [][!]

This would match a single character that is any one of "[", "]", or "!".

For example, let's continue to use the same list we used in the previous example:

  • Cat.png
  • Bat.png
  • Rat.png
  • car.png
  • list.png
  • mom.jpg
  • cat.jpg

If you wanted to match only the capitalized files in this list, you could use the pattern [CBR]at.* (the trailing .* covers the file extension, since a glob has to match the whole filename).

This would return the result:

  • Cat.png
  • Bat.png
  • Rat.png
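A quick way to check a character class is to run it against the list. The sketch below uses Python's fnmatchcase as an approximation of shell matching (the match helper is a made-up name), with .* appended so the pattern covers the whole filename. It also tries the odd-looking [][!] class on a few single characters.

```python
from fnmatch import fnmatchcase

files = ["Cat.png", "Bat.png", "Rat.png", "car.png", "list.png", "mom.jpg", "cat.jpg"]


def match(pattern, names):
    """Return the names that match the glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


class_matches = match("[CBR]at.*", files)
print(class_matches)  # ['Cat.png', 'Bat.png', 'Rat.png']

# [][!] is a class containing "]", "[" and "!"; it matches exactly one of them.
bracket_matches = match("[][!]", ["[", "]", "!", "a", "[]"])
print(bracket_matches)  # ['[', ']', '!']
```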

Ranges

A cool feature of globs is ranges, which are denoted by two characters separated by a dash '-'.

For example, the pattern [A-E] would match any single character from A through E. Several ranges can be combined inside one character class to make powerful patterns.

A common pattern that you may have seen before is the one that matches a single alphanumeric character: [A-Za-z0-9]

This would match the following:

  • [A-Z] All uppercase letters from A to Z
  • [a-z] All lowercase letters from a to z
  • [0-9] All numbers from 0 to 9

This can be used for data validation in tons of different areas since ranges work in regular expressions as well!
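As a rough illustration of ranges, the sketch below filters a made-up list of filenames by their first character, again using Python's fnmatchcase. The file names and the match helper are assumptions for the example, and the trailing * lets the rest of each name be anything.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, chosen to exercise each range.
files = ["Apple.txt", "Echo.txt", "Fox.txt", "apple.txt", "7up.txt", "_draft.txt"]


def match(pattern, names):
    """Return the names that match the glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


a_to_e = match("[A-E]*", files)
digits = match("[0-9]*", files)
alphanumeric = match("[A-Za-z0-9]*", files)

print(a_to_e)        # ['Apple.txt', 'Echo.txt']
print(digits)        # ['7up.txt']
print(alphanumeric)  # everything except '_draft.txt'
```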

Complementation

A feature worth mentioning is that globs support special characters that change how a pattern is interpreted. The two that I see most often are exclamation marks (!) and backslashes (\).

The exclamation mark negates a character class when it is the first character inside the brackets. In the character class example I shared above, we used the class [CBR] to match the capitalized files.

If we wanted the opposite, we could negate the class by placing the exclamation point right after the opening bracket: [!CBR]at.* matches names whose first character is anything except C, B, or R, followed by "at". On the list above, that leaves only cat.jpg.

Backslashes are used to remove the special meaning of the single characters '?', '*', and '[', so that they can be matched literally in patterns. For example, \? matches an actual question mark rather than any single character.
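Both ideas can be tried out in Python, with one caveat: Python's fnmatch module does not treat a backslash as an escape character. Instead, the standard library's glob.escape wraps each special character in brackets, which has the same effect. The sketch below is an approximation under that assumption; the match helper and the what?.txt file name are made up.

```python
import glob
from fnmatch import fnmatchcase

files = ["Cat.png", "Bat.png", "Rat.png", "car.png", "list.png", "mom.jpg", "cat.jpg"]


def match(pattern, names):
    """Return the names that match the glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Negation: first character must NOT be C, B, or R, followed by "at".
negated = match("[!CBR]at.*", files)
print(negated)  # ['cat.jpg']

# Escaping: make "?" literal. Python uses [?] where a shell would use \?.
escaped = glob.escape("what?.txt")
print(escaped)  # what[?].txt

unescaped_hits = match("what?.txt", ["what?.txt", "whats.txt"])
escaped_hits = match(escaped, ["what?.txt", "whats.txt"])
print(unescaped_hits)  # ['what?.txt', 'whats.txt']
print(escaped_hits)    # ['what?.txt']
```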

Why are globs useful?

I've found globs extremely useful for doing a lot of scripting and automation tasks in recent months. Being able to specify certain files recursively in a directory tree is invaluable - especially when working in CI environments where you don't have control over the names of root directories.

Something important that I want to note is that while wildcard patterns are similar to regex patterns, they are not the same, for two main reasons:

  1. Globs are meant to match filenames rather than text
  2. Not all conventions are the same between them (example: in regex, * means zero or more repetitions of the preceding element, while in a glob it matches any run of characters on its own)
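The second point is easy to trip over, so here is a minimal Python sketch comparing the same pattern text, a*, interpreted as a glob (via fnmatchcase) and as a regular expression (via re.fullmatch). The variable names are just for illustration.

```python
import re
from fnmatch import fnmatchcase

name = "abc"

# As a glob, "a*" means: the letter a, then any run of characters.
glob_result = fnmatchcase(name, "a*")

# As a regex, "a*" means: zero or more copies of the letter a, and nothing else.
regex_result = re.fullmatch("a*", name) is not None

# The regex that behaves like the glob needs ".*" (any character, repeated).
regex_equivalent = re.fullmatch("a.*", name) is not None

print(glob_result)       # True
print(regex_result)      # False
print(regex_equivalent)  # True
```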

Conclusion

Hopefully, this overview of globs provides some transparency when looking over different configuration files in the future. I know this is something that I struggled with understanding when trying to read webpack/typescript/jest configurations, so if this is helpful to you, let me know on Twitter!

See you in the next post.

Useful Links/Resources

  • http://www.globtester.com/
  • https://en.wikipedia.org/wiki/Glob_(programming)
  • https://commandbox.ortusbooks.com/usage/parameters/globbing-patterns
  • http://teaching.idallen.com/cst8207/15w/notes/190_glob_patterns.html
  • http://man7.org/linux/man-pages/man7/glob.7.html

© Malik Browne 2018. All rights reserved.