Tutorial: How to Use Ack and Grep on Ubuntu 14.04

Search and you shall find. On a Linux system, there are numerous search tools for quickly and precisely finding certain local data.

We could use the locate and find commands to find files by their name, type, timestamps, owner, or size. The find command can also search the file contents, but in most cases, there is an easier tool for that called grep. If we wanted to search a file or directory for some relevant content string, we could use the grep command, or its newer alternative ack.

The name “grep” stands for “global / regular expression / print.” It comes from the ed editor command g/re/p, which searches globally for lines matching a regular expression and prints them. Grep checks whether each line of the input it receives matches a specified pattern; such patterns are called regular expressions, and you have likely seen some of them before in other software tools. In this tutorial, we will only be using the basics of regular expressions, but be sure to explore their “deeper waters,” if needed.

The full power of grep and similar tools really starts to show when we combine their search and filtering operations with other Linux commands.

Step 1: Get Some Sample Data Files

To get started with some common file data, let’s download the jQuery source code from its GitHub repository.

First, we need to install Git, so that we can download projects from GitHub:

Now we can download the jQuery source code to our home directory:

Then, go into the directory we just downloaded:

Let’s have a look at the files in this directory using the ls command:

We see a list of different file types and a few directories:

Let’s see how we could find content in this source code.

Step 2a: Using Grep

Grep comes preinstalled on virtually every Linux system, so there is no need for manual installation.

Grep Command Options

This is a summary of the grep command options we will use in this tutorial:

  • -i does case-insensitive character matching
  • -r reads all files under each directory recursively
  • -n shows the line number of each match
  • -c shows the match count
  • -v inverts the matching by selecting the non-matching lines
  • -o prints only the matched parts of a matching line, with each part on a separate output line
  • -w only matches on whole words

Basic Examples

To find the files in the current directory that contain the string “John Resig,” you would type:

The resulting output would be:

The “*” tells grep to match all files in the current directory. If our search pattern contains any spaces, we need to put quotes around the search string (single quotes or double quotes).
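A minimal sketch of this kind of search, run against two throwaway files rather than the jQuery tree so it works anywhere. The file contents and the example.com addresses are hypothetical placeholders, not the real project data:

```shell
# Hypothetical sample data in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'John Resig <john@example.com>\nJane Doe <jane@example.com>\n' > AUTHORS.txt
printf 'Sample project\nSee AUTHORS.txt for credits\n' > README.md

# The quotes keep the two words together as a single pattern. The shell
# expands * to every file name in the current directory before grep runs.
matches=$(grep "John Resig" *)
echo "$matches"
```

Because more than one file is searched, grep prefixes each matching line with the name of the file it came from.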

To find the files in the current directory that contain the string “Authors,” you would type:

The resulting output would be:

Grep found one matching file and printed the line that matched our “Authors” pattern. Note that grep is not matching the file name here, only the content of the file.

If we had typed this instead:

We would see a different matched file, because grep is sensitive to character casing by default.

We could use the grep -i option to turn on case-insensitive matching instead:

Now we can see all matches regardless of any character casing combination we could have used in our search pattern.
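The effect of character casing can be sketched with two small sample files. The file names and contents below are assumptions made for illustration:

```shell
# Hypothetical sample data in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'Authors ordered by first contribution\n' > AUTHORS.txt
printf 'authors: see the credits file\n' > notes.txt

# Case-sensitive by default: each spelling only matches itself.
capitalized=$(grep "Authors" *)
lowercase=$(grep "authors" *)

# With -i, both spellings match.
any_case=$(grep -i "authors" *)
echo "$any_case"
```

Here the first two searches each return one line from a different file, while the -i search returns both.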

To do the same search in the current directory and all the directories below it, we can add the -r recursive option:

Now grep will search every subdirectory, and every directory inside those, until there is nothing left to descend into.

This same command can be shortened by combining the options, producing the same result:

To see the line numbers of the matching results, we add the -n option:
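A minimal sketch of the recursive search, the combined option form, and the line-numbered form, using a small hypothetical directory tree. The output is piped through sort only to make the order predictable:

```shell
# Hypothetical sample tree in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p src/core
printf 'Authors ordered by first contribution\n' > AUTHORS.txt
printf '// helper\n// see the authors file\n' > src/core/init.js

# Separate options and the combined form behave the same.
recursive=$(grep -r -i "authors" . | sort)
combined=$(grep -ri "authors" . | sort)

# Adding -n inserts the line number between the file name and the match.
numbered=$(grep -rin "authors" . | sort)
echo "$numbered"
```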

To search the AUTHORS.txt file for lines with a “gmail.com” domain:

If we wanted to count all the matches of the previous search, we would add the -c option:

We would see a number printed, indicating the number of matched lines.

To invert our previous “gmail.com” search, we would use the -v option:

Now we see all the lines without the “gmail.com” string, a pretty handy feature.
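The three variations (plain match, count, inverted match) can be sketched on a small sample file. The names are hypothetical, and the reserved example.com domain stands in for “gmail.com” so that no real addresses appear:

```shell
# Hypothetical sample data in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' \
  'Ann Example <ann@example.com>' \
  'Bob Example <bob@example.org>' \
  'Cy Example <cy@example.com>' > AUTHORS.txt

# Lines that contain the domain.
domain_lines=$(grep "example.com" AUTHORS.txt)
# -c prints the number of matching lines instead of the lines themselves.
domain_count=$(grep -c "example.com" AUTHORS.txt)
# -v prints only the lines that do NOT match.
other_lines=$(grep -v "example.com" AUTHORS.txt)

echo "$domain_count"
echo "$other_lines"
```

Note that an unescaped “.” in a pattern matches any single character. That is harmless here, but a stricter search would escape it as “\.”.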

We can search for whole word matches as well. Let’s search, case-insensitively, for the word “bug.”

The -w option forces our pattern to only be matched on whole words, so words containing the string “bug” (e.g., “bugs”) would not be a valid match.
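The difference that -w makes can be sketched with a three-line sample file (hypothetical contents):

```shell
# Hypothetical sample data in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' \
  '// Bug: handler fires twice' \
  '// fixes several bugs in core' \
  '// debugging output' > notes.js

# Without -w, "bug" also matches inside "bugs" and "debugging".
loose=$(grep -i "bug" notes.js)
# With -w, only the standalone word matches.
whole=$(grep -iw "bug" notes.js)
echo "$whole"
```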

If we wanted to find out how many times the word “jquery” is mentioned throughout the source code, we would pipe (“|”) the grep output to the wc word-count command with its -l option, so that we count only lines, not words or characters. The -o option prints each matching part on a separate output line; without it, a line containing several matches would be counted only once.
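A short sketch of why -o matters for counting, using two hypothetical files in which one line mentions the word three times:

```shell
# Hypothetical sample tree in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir src
printf 'jQuery core; jquery plugins; JQUERY docs\n' > README.md
printf 'var jquery = window.jQuery;\n' > src/core.js

# Without -o, wc -l counts matching LINES.
line_count=$(grep -ri "jquery" . | wc -l)
# With -o, every match gets its own output line, so wc -l counts MATCHES.
match_count=$(grep -rio "jquery" . | wc -l)

echo "matching lines: $line_count"
echo "total matches:  $match_count"
```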

If we do a search that returns many matches, we can pipe the grep output to less. Less is a paging tool that makes it easy to scroll through all the output using the up and down arrow keys, the “page-up” and “page-down” keys, or the SPACE bar.

We can also chain several grep commands together to do easy filtering of the results of each previous command.
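One way such a chain might look, sketched on a hypothetical author list: the first grep keeps the lines for one domain, and the second removes one name from that result.

```shell
# Hypothetical sample data in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' \
  'Chris Example <chris@example.com>' \
  'Dana Example <dana@example.com>' \
  'Evan Example <evan@example.org>' > AUTHORS.txt

# Each grep filters the output of the command before it.
filtered=$(grep "example.com" AUTHORS.txt | grep -v "Chris")
echo "$filtered"
```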

Advanced Examples

To create much more precise matching patterns, we will need to use regular expressions.

For example, say we wanted to find the authors with a first name of “Chris” or “John,” but not “Christopher,” “Christian” or any other first name pattern.

And voilà, we see all the authors with a first name of Chris or John.

The -E option tells grep to interpret our search pattern as an extended regular expression. This pattern contains two alternatives, “(^Chris )” and “(^John )”, separated by the pipe symbol “|”, which represents a logical OR. If either alternative matches, grep prints the line. The caret “^” anchors a match to the start of the line, and the trailing space after each name rules out longer names such as “Christopher.” Together they restrict the match to those two first names at the beginning of a line.
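The pattern described above can be tried on a small list of made-up names (all hypothetical) to see which lines the anchors and trailing spaces let through:

```shell
# Hypothetical sample data in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' \
  'Chris Example' \
  'Christopher Sample' \
  'John Placeholder' \
  'Johnny Other' \
  'Maria Chris Lastname' > AUTHORS.txt

# ^ anchors each alternative to the start of the line; the space after the
# name rejects "Christopher" and "Johnny".
first_names=$(grep -E "(^Chris )|(^John )" AUTHORS.txt)
echo "$first_names"
```

The last sample line contains “Chris ” but not at the start of the line, so it is not matched.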

If you would like to learn more about using grep with regular expressions, see this tutorial. Mastering regular expressions is a skill worth working on.

Step 2b: Using Ack

Ack is a search tool just like grep, but it’s optimized for searching in source code trees. Ack does almost all that grep does, but it differs in the following ways.

Ack was designed to:

  • Search directories recursively by default
  • Easily exclude certain file types or only search for certain file types
  • Ignore the common version control directories by default; these are directories with names like: .git, .svn, CVS
  • Ignore binary files by default; these are files like: binary executables, image/music/video files, gzip/zip/tar archive files
  • Have better highlighting of matches and also to format the output a bit more cleanly

That being said, one case in which grep is often quicker than ack is searching through very big files using regular expressions.

Installing Ack

To get started, the first step is to install the ack tool on your machine.

On an Ubuntu or Debian machine, this is as simple as installing the utility from the default repositories. The package is called ack-grep:

Is the program called ack-grep or ack?

The name of the program is “ack.” Some packagers have called it “ack-grep” when creating packages, because there’s already a package out there called “ack” that has nothing to do with this ack. We can tell our Linux system to shorten this command to “ack” if we would like by typing this command:

Now, the tool will respond to the name “ack” instead of “ack-grep.”

Ack Command Options

This is a summary of the ack command options we will use in this tutorial:

  • -i does case-insensitive character matching
  • -f --X only prints the files that would be searched, without actually doing any searching, where “X” denotes the filetype (e.g., “--html”)
  • -n does not descend into any subdirectories
  • -w only matches whole words
  • --type=noX excludes certain filetypes from the search, where “X” denotes the filetype to be excluded (e.g., “--type=nophp” to exclude PHP files)

Basic Examples

Let’s do some searching on our jQuery source tree again to see how ack optimizes code searching.

We see this result:

Compare the above output to the grep version of this search:

We see this result:

Note how the ack search is done recursively by default, and each match is printed on its own line with a line number by default. The formatting is a bit easier to read, especially when there are many matches.

These defaults and formatting are nice when you often search through code trees.

Ack can do more than that, though. Let’s find all HTML files in the source tree.

The -f option only prints the files that would be searched without actually doing any searching. The --html option is a special feature of ack. Ack understands many file types, and by specifying this option, you ask it to only consider HTML files.

Let’s search all JavaScript files, case-insensitively, for the word “bug.”

The --js option tells ack to only search in JavaScript files. You can search for all kinds of other file types, e.g. --php, --python, --perl, et cetera. This file type-based filtering can make your searches noticeably faster, especially on bigger source trees.
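A sketch of the file-type options on a hypothetical sample tree. It assumes ack is installed under the name “ack” or “ack-grep” and skips the searches otherwise. The search directory “.” is passed explicitly so that ack never falls back to reading standard input:

```shell
# Locate ack under either name; empty if it is not installed.
ack_bin=$(command -v ack || command -v ack-grep || true)

# Hypothetical sample tree in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir test src
printf '<html><body>demo</body></html>\n' > test/index.html
printf '// Bug: handler fires twice\n// debugging output\n' > src/core.js
printf 'bug tracker notes\n' > NOTES.txt

if [ -n "$ack_bin" ]; then
  # -f lists the files ack would search; --html limits the list to HTML files.
  html_files=$("$ack_bin" -f --html .)
  # --js restricts the search to JavaScript files; -i and -w work as in grep.
  js_bugs=$("$ack_bin" --js -i -w bug .)
  echo "$html_files"
  echo "$js_bugs"
else
  echo "ack is not installed; skipping the ack searches"
fi
```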

Sometimes we don’t want to do a recursive search. To search in the current directory only for the word “bug,” we type:

The -n option tells ack not to descend into any subdirectories.

Let’s do a recursive search for the word “css,” but exclude any JavaScript files:

The --type=noX option allows for the exclusion of file types known by ack, where “X” denotes the file type to be excluded.
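The non-recursive search and the file-type exclusion can be sketched together on a hypothetical tree, again assuming ack is available as “ack” or “ack-grep”:

```shell
# Locate ack under either name; empty if it is not installed.
ack_bin=$(command -v ack || command -v ack-grep || true)

# Hypothetical sample tree in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir src
printf '// bug in the top-level file\n' > main.js
printf '// bug in the nested file\n// loads css rules\n' > src/app.js
printf '/* css reset */\nbody { margin: 0; }\n' > style.css

if [ -n "$ack_bin" ]; then
  # -n keeps ack in the starting directory, so src/app.js is not searched.
  top_only=$("$ack_bin" -n -w bug .)
  # --type=nojs searches recursively but leaves out JavaScript files.
  no_js=$("$ack_bin" --type=nojs css .)
  echo "$top_only"
  echo "$no_js"
else
  echo "ack is not installed; skipping the ack searches"
fi
```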

Advanced Examples

The same regular expression that we used with grep will also work for ack:
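A sketch of that first-name pattern with ack on a list of made-up names, assuming ack is installed as “ack” or “ack-grep”. Ack uses Perl regular expressions, so no extra option such as -E is needed:

```shell
# Locate ack under either name; empty if it is not installed.
ack_bin=$(command -v ack || command -v ack-grep || true)

# Hypothetical sample data in a temporary directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' \
  'Chris Example' \
  'Christopher Sample' \
  'John Placeholder' \
  'Johnny Other' > AUTHORS.txt

if [ -n "$ack_bin" ]; then
  # Same alternation and anchors as in the grep -E example.
  first_names=$("$ack_bin" "(^Chris )|(^John )" AUTHORS.txt)
  echo "$first_names"
else
  echo "ack is not installed; skipping the ack search"
fi
```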

Ack has a lot more to offer than what was shown here. See this tutorial and the official documentation for a more in-depth look at using ack.

Other grep-like Tools

Here are some other great search tools that are worth exploring:

  • zgrep – Grep tool that can search compressed files (e.g., compressed log files)
  • agrep – Grep-like tool with support for approximate patterns
  • jq – Command line tool to search in JSON files and structure the resulting output (as valid JSON)
  • xgrep, xmlgrep, xmlstarlet – These are similar command line tools to search the content of XML files
  • pdfgrep – Command line tool to search the content of PDF files
  • git grep – Built-in search tool of the Git versioning system
Ryan Frankel

Questions or Comments? Ask Ryan!

  • Al

    Hi Ryan, reading this post with great hope. What I know about scripts, bash, grep, ack etc is the tip of an iceberg compared to yourself! I have been searching for help creating a script to check a list of websites for ‘mobile responsive’ and output the yes ones to a file but so far no luck until now, I feel your comments above say it can be done. If so can you help?
    Thanks for any time.
    Al.

  • Al

    Hello again Ryan. Wondering what happened to my question I posted? Unsure if I have gone about this the correct way, all I can say is when posted a message came up that it was up for moderation.

    Please just let me know one way or the other, thank you.

    Al.

    • frankel0

      Hey Al,

      I actually have been thinking about this and didn’t have a great reply formulated yet. I do have a few questions back:

      1. When you say “Mobile Responsive” do you just mean that the site has a mobile version, or do you specifically want to know if it is responsive?

      2. How many sites are you wanting to do this for?

      If #2 is a reasonable number you may just want to go the route of using the Google Mobile-Friendly test tool:

      https://www.google.com/webmasters/tools/mobile-friendly/

  • frankel0

    Hmmm, the biggest problem I am having here is that just because something has a @media query doesn’t really mean it is a mobile (or responsive) version. How important is this to be accurate?