Tutorial: How to Use Ack and Grep on Ubuntu 14.04


Search and you shall find. On a Linux system, there are numerous search tools for quickly and precisely finding certain local data.

We could use the locate and find commands to find files by their name, type, timestamps, owner, or size. The find command can also search the file contents, but in most cases, there is an easier tool for that called grep. If we wanted to search a file or directory for some relevant content string, we could use the grep command, or its newer alternative ack.

The name “grep” stands for “global / regular expression / print.” It comes from the g/re/p command of the Unix ed editor, which globally searches for a regular expression and prints the matching lines. Grep checks whether each line of the input it receives matches a specified pattern; such patterns are called regular expressions, and you have likely seen some of them before in other software tools. In this tutorial, we will only be using the basics of regular expressions, but be sure to explore their “deeper waters,” if needed.

The full power of grep and similar tools really starts to show when we combine its search and filtering operations with other Linux commands.

Step 1: Get Some Sample Data Files

To get started with some common file data, let’s download the jQuery source code from its GitHub repository.

First, we need to install Git, so that we can download projects from GitHub:

sudo apt-get install git

Now we can download the jQuery source code to our home directory:

cd ~
git clone https://github.com/jquery/jquery.git

Then, go into the directory we just downloaded:

cd jquery

Let’s have a look at the files in this directory using the ls command:

ls

We see a list of different file types and a few directories:

AUTHORS.txt bower.json build CONTRIBUTING.md external Gruntfile.js LICENSE.txt package.json README.md src test

Let’s see how we could find content in this source code.

Step 2a: Using Grep

Grep comes preinstalled on virtually every Linux system, so there is no need for manual installation.

Grep Command Options

This is a summary of the grep command options we will use in this tutorial:

  • -i does case-insensitive character matching
  • -r reads all files under each directory recursively
  • -n shows the line number of each match
  • -c shows the match count
  • -v inverts the matching by selecting the non-matching lines
  • -o prints only the matched parts of a matching line, with each part on a separate output line
  • -w only matches on whole words
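
To make these options concrete, here is a minimal sketch that runs each of them against a tiny throwaway file. The file name notes.txt and its contents are invented for this demonstration and are not part of the jQuery source tree.

```shell
# Create a temporary sample file (contents are made up for this demo).
demo_dir=$(mktemp -d)
printf '%s\n' 'Bug report' 'two bugs fixed' 'no issues here' 'debug the bug' > "$demo_dir/notes.txt"

# -i ignores case, -n prefixes each matching line with its line number.
numbered=$(grep -in bug "$demo_dir/notes.txt")

# -c counts the matching lines instead of printing them.
line_count=$(grep -ic bug "$demo_dir/notes.txt")

# -w keeps only lines where "bug" appears as a whole word.
whole_word_count=$(grep -icw bug "$demo_dir/notes.txt")

# -v prints the lines that do NOT match.
non_matching=$(grep -iv bug "$demo_dir/notes.txt")

# -o prints every matched part on its own output line.
match_count=$(grep -io bug "$demo_dir/notes.txt" | wc -l)

echo "$numbered"
echo "matching lines: $line_count, whole-word lines: $whole_word_count, matches: $match_count"
echo "non-matching: $non_matching"
```

Note that “bugs” and “debug” count as matches until -w is added, which is why the whole-word count is lower than the plain line count.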

Basic Examples

If you wanted to find the files that contained the string “John Resig” for every file in the current directory, you would type:

grep 'John Resig' *

The resulting output would be:

AUTHORS.txt:John Resig 
grep: build: Is a directory
grep: external: Is a directory
grep: src: Is a directory
grep: test: Is a directory

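The shell expands the “*” to the name of every file and directory in the current directory, so grep searches each file and reports “Is a directory” for each directory it is handed. If our search pattern contains any spaces, we need to put quotes around the search string (single quotes or double quotes).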
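
The “Is a directory” messages in the output above can be silenced. As a small sketch, GNU grep's -d skip option tells grep to skip any directory it is handed, while -r searches inside it instead. The directory layout and file contents below are invented for the demonstration.

```shell
# Build a tiny tree: one file at the top level, one inside a subdirectory.
work=$(mktemp -d)
mkdir "$work/src"
echo 'John Resig' > "$work/AUTHORS.txt"
echo 'John Resig wrote this' > "$work/src/core.js"

# The shell expands * to "AUTHORS.txt src" before grep runs.
# -d skip makes GNU grep skip the src directory without complaining.
top_level=$(cd "$work" && grep -d skip 'John Resig' *)

# -r searches inside the directories instead of skipping them.
recursive_count=$(cd "$work" && grep -r 'John Resig' * | wc -l)

echo "$top_level"
echo "recursive matches: $recursive_count"
```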

If you wanted to find the files that contained the string “Authors” for every file in the current directory, you would type:

grep Authors *

The resulting output would be:

AUTHORS.txt:Authors ordered by first contribution.
grep: build: Is a directory
grep: external: Is a directory
grep: src: Is a directory
grep: test: Is a directory

Grep found one matching file and printed the line that matched our “Authors” pattern. Note that grep is not matching the file name here, only the content of the file.

If we had typed this instead:

grep authors *

We would see a different matched file, because grep is sensitive to character casing by default.

We can use the -i option to make the matching case-insensitive:

grep -i authors *

Now we see all matches, regardless of the letter case used in the search pattern or in the files.

To do the same search throughout all the directories (in our current directory), we can add the -r recursive option:

grep -i -r authors *

Now grep descends into every subdirectory and searches all the files it finds there.

This same command can be shortened by combining the options, producing the same result:

grep -ir authors *

To see the line numbers of the matching results, we add the -n option:

grep -irn authors *

To search the AUTHORS.txt file for lines with a “gmail.com” domain:

grep -i gmail.com AUTHORS.txt

If we wanted to count all the matches of the previous search, we would add the -c option:

grep -ic gmail.com AUTHORS.txt

We would see a number printed, indicating the number of matched lines.

To invert our previous “gmail.com” search pattern, we would use the -v option:

grep -iv gmail.com AUTHORS.txt

Now we see all the lines without the “gmail.com” string, which is a pretty handy feature.

We can search for whole word matches as well. Let’s search, case-insensitively, for the word “bug.”

grep -i -w bug *

The -w option forces our pattern to only be matched on whole words, so words containing the string “bug” (e.g., “bugs”) would not be a valid match.

If we wanted to find out how many times the word “jquery” is mentioned throughout the source code, we could pipe (“|”) the grep output into the wc word-count command with the -l option, so that we count only lines, not words or characters. The -o option prints each match on a separate output line; without it, a line containing several matches would be counted only once.

grep -iro jquery * | wc -l
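
As a rough illustration of why -o matters for this count, the sketch below compares the two pipelines on a made-up tree in which one line mentions jQuery twice. The file names and contents are assumptions chosen only for the demonstration.

```shell
# Two sample files; the first has two matches on a single line.
tree=$(mktemp -d)
echo 'jQuery.extend(jQuery.fn, {});' > "$tree/a.js"
printf '%s\n' 'uses jquery' 'nothing to see' > "$tree/b.txt"

# Without -o, wc -l counts matching LINES.
matching_lines=$(cd "$tree" && grep -ir jquery . | wc -l)

# With -o, every match is printed on its own line, so wc -l counts MATCHES.
total_matches=$(cd "$tree" && grep -iro jquery . | wc -l)

echo "matching lines: $matching_lines, total matches: $total_matches"
```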

If we do a search that returns many matches, we can pipe the grep output to less. Less is a paging tool that makes it easy to scroll through all the output using the up and down arrow keys, the “page-up” and “page-down” keys, or the SPACE bar.

grep -ir jquery * | less

We can also chain several grep commands together to do easy filtering of the results of each previous command.

grep -ir jquery * | grep -i json | less
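
One detail worth knowing when chaining: the second grep filters the whole output line of the first, including the file name prefix. The sketch below, using an invented two-file tree, shows the difference, and uses grep's -h option (which suppresses file names) to filter on content only.

```shell
# Sample tree: "json" appears in one file NAME and in one file's CONTENT.
tree=$(mktemp -d)
mkdir "$tree/src"
echo '{ "name": "jquery" }' > "$tree/package.json"
printf '%s\n' 'var jQuery = {};' '// jquery parses JSON here' > "$tree/src/core.js"

# The second grep sees "file:line", so the package.json line passes via its name.
with_names=$(cd "$tree" && grep -ir jquery . | grep -i json | wc -l)

# -h drops the file name prefix, so only the content is filtered.
content_only=$(cd "$tree" && grep -irh jquery . | grep -i json | wc -l)

echo "with file names: $with_names, content only: $content_only"
```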

Advanced Examples

To create much more precise matching patterns, we will need to use regular expressions.

For example, say we wanted to find the authors with a first name of “Chris” or “John,” but not “Christopher,” “Christian” or any other first name pattern.

grep -E "(^Chris )|(^John )" AUTHORS.txt

And voilà, we see all the authors with a first name of Chris or John.

The -E option tells grep to interpret our search pattern as an extended regular expression. This pattern contains two parts, “(^Chris )” and “(^John )”, separated by the pipe symbol “|”, which represents a logical OR. If either part matches, the line is printed. The caret “^” anchors a match to the start of the line, so our name patterns only match at the beginning of a line, where the first name is. The trailing space after each name ensures that longer names such as “Christopher” do not match.
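
The pattern can be tried out safely on a small sample before running it on real data. In the sketch below, the author names are invented placeholders, and the second pattern is simply a shorter way of writing the same alternation.

```shell
# Sample list of made-up names, one per line.
demo=$(mktemp -d)
printf '%s\n' 'Chris Alpha' 'Christopher Beta' 'John Gamma' 'Johnny Delta' 'Mike Chris' > "$demo/authors-sample.txt"

# The pattern from the tutorial: either alternative, anchored at line start.
matches=$(grep -E '(^Chris )|(^John )' "$demo/authors-sample.txt")

# Equivalent, more compact form: anchor once, then group the alternatives.
grouped=$(grep -E '^(Chris|John) ' "$demo/authors-sample.txt")

echo "$matches"
```

“Christopher” and “Johnny” are rejected by the trailing space, and “Mike Chris” is rejected by the start-of-line anchor.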

If you would like to learn more about using grep with regular expressions, see this tutorial. Mastering regular expressions is a skill worth working on.

Step 2b: Using Ack

Ack is a search tool just like grep, but it’s optimized for searching in source code trees. Ack does almost all that grep does, but it differs in the following ways.

Ack was designed to:

  • Search directories recursively by default
  • Easily exclude certain file types or only search for certain file types
  • Ignore the common version control directories by default; these are directories with names like: .git, .svn, CVS
  • Ignore binary files by default; these are files like: binary executables, image/music/video files, gzip/zip/tar archive files
  • Have better highlighting of matches and also to format the output a bit more cleanly

That being said, one case in which grep is often quicker than ack is when you are searching through very big files using regular expressions.
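
To get a feel for what the defaults listed above save you from typing, here is a rough approximation of them using GNU grep options: -r for recursion, --exclude-dir to skip a version control directory, -I to ignore binary files, and --include to restrict the search to one file type. This is only a sketch on an invented tree, not a description of how ack works internally.

```shell
# Sample tree with a .git directory, a source file, a text file, and a binary file.
tree=$(mktemp -d)
mkdir -p "$tree/.git" "$tree/src"
echo 'bug in git metadata' > "$tree/.git/config"
echo '// bug: fix later' > "$tree/src/core.js"
echo 'bug list' > "$tree/notes.txt"
printf 'bug\000binary\n' > "$tree/blob.bin"   # the NUL byte makes grep treat it as binary

# Recursive search that skips .git and binary files; -l lists file names only.
all_hits=$(cd "$tree" && grep -rIl --exclude-dir=.git bug . | sort)

# The same search limited to JavaScript files.
js_hits=$(cd "$tree" && grep -rIl --exclude-dir=.git --include='*.js' bug .)

echo "$all_hits"
echo "JavaScript only: $js_hits"
```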

Installing Ack

To get started, the first step is to install the ack tool on your machine.

On an Ubuntu or Debian machine, this is as simple as installing the utility from the default repositories. The package is called ack-grep:

sudo apt-get update
sudo apt-get install ack-grep

Is the program called ack-grep or ack?

The name of the program is “ack.” Some packagers have called it “ack-grep” when creating packages, because there’s already a package out there called “ack” that has nothing to do with this ack. We can tell our Linux system to shorten this command to “ack” if we would like by typing this command:

sudo dpkg-divert --local --divert /usr/bin/ack --rename --add /usr/bin/ack-grep

Now, the tool will respond to the name “ack” instead of “ack-grep.”
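
If you write scripts that should run on machines where the rename may or may not have been applied, a small helper can pick whichever command name exists. The function name find_ack below is hypothetical and introduced only for this sketch.

```shell
# Print the name of the installed ack command; fail if neither name is found.
find_ack() {
    for candidate in ack ack-grep; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# ack_cmd ends up as "ack", "ack-grep", or empty when ack is not installed.
ack_cmd=$(find_ack) || echo "ack is not installed" >&2
echo "ack command: ${ack_cmd:-none}"
```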

Ack Command Options

This is a summary of the ack command options we will use in this tutorial:

  • -i does case-insensitive character matching
  • -f --X only prints the files that would be searched, without actually doing any searching, where “X” denotes the filetype (e.g., “--html”)
  • -n does not descend into any subdirectories.
  • -w only matches whole words
  • --type=noX excludes certain filetypes from the search, where “X” denotes the filetype to be excluded (e.g., “--type=nophp” to exclude PHP files)

Basic Examples

Let’s do some searching on our jQuery source tree again to see how ack optimizes code searching.

ack -i Authors *

We see this result:

AUTHORS.txt
1:Authors ordered by first contribution.

bower.json
12: "AUTHORS.txt",

external/sizzle/MIT-LICENSE.txt
18:NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE

external/qunit/MIT-LICENSE.txt
18:NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE

LICENSE.txt
27:NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE

package.json
10: "url": "https://github.com/jquery/jquery/blob/master/AUTHORS.txt"
41: "grunt-git-authors": "1.2.0",

Compare the above output to the grep version of this search:

grep -i Authors *

We see this result:

AUTHORS.txt:Authors ordered by first contribution.
bower.json: "AUTHORS.txt",
grep: build: Is a directory
grep: external: Is a directory
LICENSE.txt:NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
package.json: "url": "https://github.com/jquery/jquery/blob/master/AUTHORS.txt"
package.json: "grunt-git-authors": "1.2.0",
grep: src: Is a directory
grep: test: Is a directory

Note how the ack search is done recursively by default, and each match is printed on its own line with a line number by default. The formatting is a bit easier to read, especially when there are many matches.

These defaults and formatting are nice when you often search through code trees.

Ack can do more than that, though. Let’s find all HTML files in the source tree.

ack -f --html

The -f option only prints the files that would be searched without actually doing any searching. The --html option is a special feature of ack. Ack understands many file types, and by specifying this option, you ask it to only consider HTML files.

Let’s search all JavaScript files, case-insensitively, for the word “bug.”

ack -i -w --js bug

The --js option tells ack to only search in JavaScript files. You can search for all kinds of other file types, e.g. --php, --python, --perl, et cetera. This file type-based filtering will make your searches much faster, especially on bigger source trees.

Sometimes we don’t want to do a recursive search. To search in the current directory only for the word “bug,” we type:

ack -n -w bug

The -n option tells ack not to descend into any subdirectories.

Let’s do a recursive search for the word “css,” but exclude any JavaScript files:

ack -w --type=nojs css

The --type=noX option allows for the exclusion of file types known by ack, where “X” denotes the file type to be excluded.

Advanced Examples

The same regular expression that we used with grep will also work for ack:

ack "(^Chris )|(^John )" AUTHORS.txt

Ack has a lot more to offer than what was shown here. See the official documentation for a more in-depth look at using ack.

Other grep-like Tools

Here are some other great search tools that are worth exploring:

  • zgrep – Grep tool that can search compressed files (e.g., compressed log files)
  • agrep – Grep-like tool with support for approximate patterns
  • jq – Command line tool to search in JSON files and structure the resulting output (as valid JSON)
  • xgrep, xmlgrep, xmlstar – These are similar command line tools to search the content of XML files
  • pdfgrep – Command line tool to search the content of PDF files
  • git grep – Built-in search tool of the Git versioning system