Here we cover two excellent ways to get a web page from the command line on Linux. On Linux and Unix systems, the two common utilities for downloading web pages are Wget and Curl.
I’ve used both of these utilities to great success on CentOS, Ubuntu, Debian and even Mac OS X.
Wget and Curl can be used to make ad hoc downloads of web pages. These free download utilities can also be used to download other files from web servers: you can download software, images, music and movies with them.
Curl and Wget can also be used from scripts to automate the download of web pages.
Is Curl Installed?
Let’s see if you have Curl installed. From a terminal session on your Linux server, try the curl command just by itself (i.e. without any command line arguments):
$ curl
If you get a response like this, you have curl installed:
curl: try 'curl --help' or 'curl --manual' for more information
If you get a response like this, you probably don’t have Curl installed:
-bash: curl: command not found
(Even if you get this response, it is still possible your Linux system has Curl installed. It may be that the Curl executable isn’t in the PATH of your shell.)
If you don’t have Curl installed, install it or try Wget instead (below).
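If you'd rather check for both utilities in one go, here is a minimal sketch using the shell's built-in `command -v`. The helper name `have_cmd` is made up for this example; it is not part of Curl or Wget.

```shell
# Report whether a command can be found by the shell.
# have_cmd is a hypothetical helper name chosen for this example.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

for tool in curl wget; do
    if have_cmd "$tool"; then
        echo "$tool: installed at $(command -v "$tool")"
    else
        echo "$tool: not found in PATH"
    fi
done
```

Like the check above, this only looks at your shell's PATH, so a tool installed somewhere unusual will still be reported as not found.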
Download a Web Page using Curl
Okay! Here’s how to download a webpage with Curl. In this example of downloading a webpage, we download the home page of sclinux.com:
$ curl http://sclinux.com -o sclinux.com.html
By default, Curl outputs the downloaded page to the terminal. The ‘-o’ option writes the web page to the file sclinux.com.html instead. You can also use shell redirection if you wish to send the output of Curl to a file, for example:
$ curl http://sclinux.com >sclinux.com.html
If Curl detects it’s sending its output to a file, it will show and update a status display for the download. This is useful if you’re downloading something larger, such as WordPress, or a Linux distribution. Curl’s status display looks like this:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5013    0  5013    0     0   2914      0 --:--:--  0:00:01 --:--:--  7161
Checking the Curl Download was Successful
To check the download was successful, we can examine the downloaded file. Here we use the ls (list files) command with the options ‘-l’ (long listing) and ‘-h’ (human-readable file sizes):
$ ls -lh sclinux.com.html
total 16
-rw-r--r-- 1 taz staff 4.9K 6 May 20:34 sclinux.com.html
We can see the downloaded web page’s size is 4.9 KB.
To check the download’s success, we can also look at the contents of the web page. The head command will display the first ten lines of the file:
$ head sclinux.com.html
<!DOCTYPE html>
<html dir="ltr" lang="en-US"><head ><meta charset="UTF-8" />
<title>S.C. Linux | Working with Linux</title>
<link rel="profile" href="http://gmpg.org/xfn/11" />
<link rel="stylesheet" type="text/css" media="all" href="http://sclinux.com/wp-content/themes/tazzle/style.css" />
<link rel="pingback" href="http://sclinux.com/xmlrpc.php" />
<link rel="alternate" type="application/rss+xml" title="S.C. Linux » Feed" href="http://sclinux.com/feed" />
<link rel="alternate" type="application/rss+xml" title="S.C. Linux » Comments Feed" href="http://sclinux.com/comments/feed" />
Okay, the webpage downloaded successfully with Curl – great!
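If you want a script to do the same checks for you, something like the following might work. It's only a rough sketch: `check_download` is a name invented for this example, and searching for an `<html` tag in the first ten lines is just one simple heuristic, not a real validation of the page.

```shell
# check_download FILE: succeed only if FILE exists, is non-empty,
# and has an <html tag in its first ten lines.
# check_download is a hypothetical helper name chosen for this example.
check_download() {
    file=$1
    # -s is true only when the file exists and is larger than zero bytes.
    if [ ! -s "$file" ]; then
        echo "$file: missing or empty" >&2
        return 1
    fi
    # head prints the first ten lines; grep -i matches regardless of case.
    if head "$file" | grep -qi '<html'; then
        echo "$file: looks like an HTML page"
    else
        echo "$file: downloaded, but no <html tag near the top" >&2
        return 1
    fi
}
```

You would call it as `check_download sclinux.com.html` after the download and test its exit status in your script.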
Download a Webpage from the Command Line using Wget
Wget is installed by default on Ubuntu, at least in Ubuntu 10.10 Desktop. Here’s how to get the same S.C. Linux homepage using the wget utility:
$ wget http://sclinux.com
With Wget, an output file doesn’t need to be specified. Wget names the file automatically, in this case index.html:
$ ls -lh index.html
-rw-r--r-- 1 taz staff 4.9K 6 May 21:54 index.html
The web page retrieved with Wget is 4.9 kilobytes, which is the same size as when retrieved by Curl. Lucky! As with Curl, you can also check the contents of the downloaded web page with utilities like head to check the download is as expected.
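Since both utilities can be used from scripts, here is one way a script might use whichever of the two is available. Treat it as a sketch under assumptions: `fetch_page` is a made-up function name, and the choice to prefer Curl over Wget is arbitrary.

```shell
# fetch_page URL FILE: download URL to FILE with curl, or with wget
# if curl is not available.
# fetch_page is a hypothetical helper name chosen for this example.
fetch_page() {
    url=$1
    out=$2
    if command -v curl >/dev/null 2>&1; then
        # -s hides the progress display, -S still prints errors,
        # -f makes curl return an error on HTTP failures such as 404.
        curl -sSf -o "$out" "$url"
    elif command -v wget >/dev/null 2>&1; then
        # -q is quiet mode, -O writes to the named file instead of index.html.
        wget -q -O "$out" "$url"
    else
        echo "fetch_page: neither curl nor wget is installed" >&2
        return 127
    fi
}
```

For the example in this article, the call would be `fetch_page http://sclinux.com sclinux.com.html`. The function returns the exit status of the download command, so a script can stop if the download fails.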
Wget Shows More Information
By default, Wget gives more information than Curl when it downloads a file.
Wget’s output looks like this:
--2011-05-06 21:54:11-- http://sclinux.com/
Resolving sclinux.com (sclinux.com)... 184.108.40.206
Connecting to sclinux.com (sclinux.com)|220.127.116.11|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html'

[ <=> ] 5,013 21.7K/s in 0.2s

2011-05-06 21:54:12 (21.7 KB/s) - `index.html' saved
Wget gives the IP address of the server (here 18.104.22.168), the HTTP status code for retrieving the page (here 200, which means OK), the MIME type of the web page (text/html) and the length of the file (5013 bytes).
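Curl can report the HTTP status code too if you ask for it with its `-w` (write-out) option. The sketch below wraps that in a small function; the name `http_status` is invented for this example.

```shell
# http_status URL: print only the HTTP status code returned for URL.
# http_status is a hypothetical helper name chosen for this example.
http_status() {
    # -s hides the progress display, -o /dev/null throws away the page body,
    # and -w prints the response code once the transfer has finished.
    curl -s -o /dev/null -w '%{http_code}\n' "$1"
}
```

Calling `http_status http://sclinux.com` prints just the status code, which is 200 when the page is retrieved successfully.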
Advanced Wget & Curl Features
Curl and Wget have much more power, such as:
- downloading websites,
- handling webpage redirects,
- completing web server authorization requests,
- using secure HTTP (HTTPS) or FTP to download,
- being used as libraries from programming languages like PHP.
While there is significant overlap between the two download commands, Curl & Wget have different features and strengths.
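To give a taste of a few of the features listed above, here is a short sketch. The function names are made up for illustration; the options themselves (`-L` and `-u` for Curl, `--mirror` for Wget) are standard ones, but check the manual pages on your system before relying on them.

```shell
# follow_redirects URL FILE: -L tells curl to follow HTTP redirects
# until it reaches the final page. (Hypothetical helper name.)
follow_redirects() {
    curl -L -o "$2" "$1"
}

# fetch_with_login USER URL FILE: -u sends a user name for HTTP
# authentication; with no password given, curl prompts for one.
# (Hypothetical helper name.)
fetch_with_login() {
    curl -u "$1" -o "$3" "$2"
}

# mirror_site URL: --mirror makes wget download a site recursively.
# (Hypothetical helper name.)
mirror_site() {
    wget --mirror "$1"
}
```

For example, `follow_redirects http://sclinux.com sclinux.com.html` would save the page that any redirect finally leads to. Be careful with `mirror_site`: mirroring can download a lot of files.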
If you’re interested in these features, write a comment below, and I’ll give more examples of how to take full advantage of these powerful command line download utilities.
Thanks for visiting!