How to Get a Web Page from the Command Line on Linux

Here we cover two excellent ways to get a web page from the command line on Linux. On Linux and Unix systems, the two most common utilities for downloading web pages are curl and wget.

I’ve used both of these utilities to great success on CentOS, Ubuntu, Debian and even Mac OS X.

Wget and Curl can be used for ad hoc downloads of web pages. These free download utilities can also fetch other files from web servers – you can download software, images, music and movies with them.

Curl and Wget can also be used from scripts to automate the download of web pages.
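For example, here's a minimal script sketch that fetches a page each time it runs and saves it with a date-stamped name. The URL and filename pattern are just placeholders to adapt to your own needs:

#!/bin/sh
# fetch-page.sh - illustrative only; change the URL and output name to suit
# Fetch the page quietly and save it with today's date in the filename
curl -s http://sclinux.com -o "homepage-$(date +%Y%m%d).html"

Run from cron, a script like this will quietly build up a dated archive of the page.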

Is Curl Installed?

Let’s see if you have Curl installed. From a terminal session on your Linux server, try the curl command just by itself (i.e. without any command line arguments):

$ curl

If you get a response like this, you have curl installed:

curl: try 'curl --help' or 'curl --manual' for more information

If you get a response like this, you probably don’t have Curl installed:

-bash: curl: command not found

(Even if you get this response, it is still possible your Linux system has curl installed. It may be that the Curl executable isn’t in the PATH of your shell.)
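A quick way to check (assuming a reasonably standard shell) is to ask the shell whether it can find the executable at all:

$ command -v curl

If curl is installed and on your PATH, this prints its full path (typically something like /usr/bin/curl); if it prints nothing, curl either isn't installed or isn't anywhere your shell looks.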

If you don’t have Curl installed, install it or try Wget instead (below).
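Installing Curl is usually a one-liner with your distribution's package manager. As a rough guide (the exact package name can vary between releases):

$ sudo apt-get install curl     # Debian / Ubuntu
$ sudo yum install curl         # CentOS / Red Hat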

Download a Web Page using Curl

Okay! Here’s how to download a web page with Curl. In this example we download the home page of sclinux.com:

$ curl http://sclinux.com -o sclinux.com.html

By default, Curl writes the downloaded page to the terminal. The ‘-o’ option writes the web page to the file sclinux.com.html instead. If you prefer, you can use shell redirection to send Curl’s output to a file, for example:

$ curl http://sclinux.com > sclinux.com.html

If Curl detects it’s sending its output to a file, it will show and update a status display for the download. This is useful if you’re downloading something larger, such as WordPress, or a Linux distribution. Curl’s status display looks like this:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5013    0  5013    0     0   2914     0 --:--:--  0:00:01 --:--:--  7161
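If you don’t want the status display, for example when running Curl from a script or cron job, the ‘-s’ (silent) option suppresses it. Adding ‘-S’ keeps error messages visible even in silent mode, which is usually what you want:

$ curl -sS http://sclinux.com -o sclinux.com.html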

Checking the Curl Download was Successful

To check that the download was successful, we can examine the downloaded file. Here we use the ls (list files) command with the options ‘-l’ (long listing) and ‘-h’ (human-readable file sizes):

$ ls -lh sclinux.com.html
-rw-r--r--  1 taz  staff   4.9K  6 May 20:34 sclinux.com.html

We can see the downloaded web page’s size is 4.9 KB.

To check the download’s success, we can also look at the contents of the webpage. The head command will display the first ten lines of the file:

$ head sclinux.com.html
<!DOCTYPE html>
<html dir="ltr" lang="en-US"><head>
<meta charset="UTF-8" />
<title>S.C. Linux | Working with Linux</title>
<link rel="profile" href="http://gmpg.org/xfn/11" />
<link rel="stylesheet" type="text/css" media="all" href="http://sclinux.com/wp-content/themes/tazzle/style.css" />
<link rel="pingback" href="http://sclinux.com/xmlrpc.php" />
<link rel="alternate" type="application/rss+xml" title="S.C. Linux &raquo; Feed" href="http://sclinux.com/feed" />
<link rel="alternate" type="application/rss+xml" title="S.C. Linux &raquo; Comments Feed" href="http://sclinux.com/comments/feed" />

Okay, the webpage downloaded successfully with Curl – great!
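Another quick check is to ask Curl itself for the HTTP status code of the request, using its ‘--write-out’ feature. Anything other than 200 usually means the page wasn’t fetched as expected (here ‘-o /dev/null’ throws the page body away, since we only want the code):

$ curl -s -o /dev/null -w "%{http_code}\n" http://sclinux.com
200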

Download a Webpage from the Command Line using Wget

Wget is installed by default on Ubuntu, at least in Ubuntu 10.10 Desktop. Here’s how to get the same S.C. Linux homepage using the wget utility:

$ wget http://sclinux.com

With Wget, an output file doesn’t need to be specified. Wget names the file automatically, in this case index.html.

$ ls -lh index.html
-rw-r--r--  1 taz  staff   4.9K  6 May 21:54 index.html

The web page retrieved with Wget is 4.9 kilobytes, the same size as the page retrieved by Curl, just as we’d hope. As with Curl, you can also check the contents of the downloaded web page with a utility like head to confirm the download looks right.
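If you’d rather pick the output filename yourself, Wget’s ‘-O’ option plays much the same role as Curl’s ‘-o’:

$ wget -O sclinux.com.html http://sclinux.com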

Wget Shows More Information

By default, Wget gives more information than Curl when it downloads a file.

Wget’s output looks like this:

--2011-05-06 21:54:11--  http://sclinux.com/
Resolving sclinux.com (sclinux.com)... 66.147.240.179
Connecting to sclinux.com (sclinux.com)|66.147.240.179|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html'
[  <=>                                                                                                         ] 5,013       21.7K/s   in 0.2s
2011-05-06 21:54:12 (21.7 KB/s) - `index.html' saved [5013]

Wget gives the IP address of the server (here 66.147.240.179), the HTTP status code for retrieving the page (here 200, which means OK), the MIME type of the web page (text/html) and the length of the file (5013 bytes).
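If that output is more than you want, for example when calling Wget from a script, the ‘-q’ (quiet) option suppresses it entirely, and ‘-nv’ (no verbose) prints a short one-line summary instead:

$ wget -q http://sclinux.com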

Advanced Wget & Curl Features

Curl and Wget have much more power, such as:

  • downloading websites,
  • handling webpage redirects,
  • completing web server authorization requests,
  • using secure HTTP (HTTPS) or FTP to download,
  • being called as a library from programming languages such as PHP (Curl, via libcurl).

While there is significant overlap between the two download commands, Curl & Wget have different features and strengths.
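As a quick taste of what these options look like in practice (the username, password and example.com URL below are only placeholders):

$ curl -L http://sclinux.com -o sclinux.com.html                  # follow any redirects
$ curl -u username:password https://example.com/ -o page.html     # HTTPS with basic authentication
$ wget -r -np -l 1 http://sclinux.com/                            # download a site, one level deep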

If you’re interested in these features, write a comment below, and I’ll give more examples of how to take full advantage of these powerful command line download utilities.

Thanks for visiting!
