Howto: Troubleshoot Network Problems Ubuntu Linux Command Line

I’ve been playing with lots of networking scenarios over the last few weeks and I’ve picked up a number of useful techniques and tools for solving certain problems. This post will briefly outline the thinking behind my network troubleshooting, and will include my step-by-step responses to particular problems.

General Strategies: External Network Access

I was recently configuring a server that had three network interface cards (NICs) and three networks to connect to. One of the networks was public, with an internet connection, while the other two were entirely private. If you’re in a similar situation, or your regular external network isn’t working properly, try this:

Step 1 – Confirm your interfaces are correctly configured

The usual very first step is a simple ping test, where we run the command

$ ping google.ca

and if things go well, we see something like this:

$ ping google.ca
PING google.ca (74.125.226.87) 56(84) bytes of data.
64 bytes from yyz06s07-in-f23.1e100.net (74.125.226.87): icmp_seq=1 ttl=53 time=2.82 ms
64 bytes from yyz06s07-in-f23.1e100.net (74.125.226.87): icmp_seq=2 ttl=53 time=2.62 ms
64 bytes from yyz06s07-in-f23.1e100.net (74.125.226.87): icmp_seq=3 ttl=53 time=3.06 ms
...

But if you’re reading this, things probably didn’t go well. This is why the first step is to make sure (on Ubuntu anyway) that your interfaces file (/etc/network/interfaces) is correctly configured to connect to your external network.

Two useful tools for this are ip link, which shows the state of all interfaces on the machine, and ip addr, which shows the addresses assigned to the interfaces on the machine.
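
On a machine with several NICs it can help to filter that output down to the interfaces that are not up. This is only a sketch: it assumes the one-line-per-interface format printed by ip -o link show, and list_down_ifaces is a hypothetical helper name, not part of iproute2.

```shell
# list_down_ifaces (hypothetical helper): reads `ip -o link show` output on
# stdin and prints "<interface> <state>" for every non-loopback interface
# whose operational state is anything other than UP.
list_down_ifaces() {
    awk '{
        name = $2
        sub(/:$/, "", name)            # "eth0:" -> "eth0"
        state = "UNKNOWN"
        for (i = 3; i < NF; i++)       # find the word following "state"
            if ($i == "state") state = $(i + 1)
        if (name != "lo" && state != "UP") print name, state
    }'
}

# Typical use on a live machine:
#   ip -o link show | list_down_ifaces
```

An empty result means every interface reports UP. Anything listed is worth checking against its stanza in the interfaces file.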

Step 2 – Confirm your DNS settings are correct

You may stumble on this gem:

$ ping google.ca
ping: unknown host google.ca

…which usually means a problem with your DNS server (or lack of one). Ensure that your interfaces file includes a line specifying which DNS server you are to be using. Pay attention to the dns-nameservers line (line 7):

# Primary interface
auto eth0
  iface eth0 inet static
  address 10.0.0.2
  netmask 255.255.255.0
  gateway 10.0.0.1
  dns-nameservers 8.8.8.8

If you had this line there, with a different IP for the DNS server, try switching it to the one listed in the example above. 8.8.8.8 is the IP address for Google’s public DNS server and should work from any machine that can reach the internet.
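
To see which DNS servers the interfaces file actually declares, something like the following can help. It is a minimal sketch: list_dns_servers is a hypothetical helper name, and it only looks at dns-nameservers lines like the one in the example above.

```shell
# list_dns_servers (hypothetical helper): prints every address that appears
# on a dns-nameservers line. The file path is optional and defaults to
# /etc/network/interfaces. Commented-out lines are ignored because their
# first word is not exactly "dns-nameservers".
list_dns_servers() {
    awk '$1 == "dns-nameservers" { for (i = 2; i <= NF; i++) print $i }' \
        "${1:-/etc/network/interfaces}"
}

# Typical use:
#   list_dns_servers
#   list_dns_servers /path/to/some/other/interfaces
```

A related check: if ping -c 3 8.8.8.8 succeeds while ping -c 3 google.ca fails, the problem is name resolution rather than connectivity.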

Still not working?

Step 3 – Confirm your kernel has a route to the server you are pinging

At this point, the kind of error you’re experiencing probably looks like this:

$ ping google.ca
ping: no route to host 

Ubuntu Linux comes with a command called route which shows the kernel routing table – a set of basic rules on where to send packets. Running it with the -n option (which prints numeric addresses instead of names) will show you something like this:

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.0.0        0.0.0.0         255.255.255.0   U     1      0        0 eth0
192.168.122.0   0.0.0.0         255.255.255.0   U     0      0        0 virbr0
0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 eth0

Line 6 is my default gateway. We know this because the destination is 0.0.0.0 – which represents any and all IP addresses. So what does this mean? In plain English, the rule above is:

If a TCP/IP packet has a destination that is not otherwise specified on this list, send the packet to 10.0.0.1 and it’ll get there okay.

So what happens if you don’t have a default gateway (that is, a rule with the destination 0.0.0.0)? Or, as was my case, you have more than one?
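
A quick way to see which of those cases applies is to pull just the default routes out of the routing table. This is a sketch that assumes the eight-column route -n layout shown above; default_gateways is a hypothetical helper name.

```shell
# default_gateways (hypothetical helper): reads `route -n` output on stdin
# and prints "<gateway> <interface>" for each default route, i.e. each row
# whose destination is 0.0.0.0 and whose flags include G (gateway).
default_gateways() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2, $8 }'
}

# Typical use on a live machine:
#   route -n | default_gateways
#   route -n | default_gateways | wc -l    # 0 = none, 1 = expected, 2+ = look closer
```

For the table above this prints a single line, 10.0.0.1 eth0.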

First, make sure you have an IP address on the public network using ip addr:

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:1c:c0:72:48:b9 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.10/24 brd 10.0.0.255 scope global eth0
    inet6 fe80::21c:c0ff:fe72:48b9/64 scope link 
       valid_lft forever preferred_lft forever

The inet lines are the IPs assigned to the interfaces. Assume that line 9 (inet 10.0.0.10/24 on eth0) is a connection on my public network. So far, this means that I am connected to it. Next, add the default gateway manually:

$ route add default gw 10.0.0.1

You can delete gateways using the same syntax, but with “del” instead of “add”. Both need root privileges, so prefix them with sudo if necessary. Check the man pages for more information.
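
A gateway is only usable if it sits on a network the machine is directly attached to, so before adding one it can be worth checking it against the address that ip addr reported. The sketch below does that with plain shell arithmetic. ip_to_int and same_subnet are hypothetical helper names, and it handles IPv4 only.

```shell
# ip_to_int ADDRESS: convert dotted-quad IPv4 text (e.g. 10.0.0.1) to an
# integer so that addresses can be masked and compared.
ip_to_int() {
    IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
    echo $(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
}

# same_subnet ADDR/PREFIX GATEWAY: succeeds (exit status 0) when GATEWAY
# falls inside the network that ADDR/PREFIX belongs to.
same_subnet() {
    addr=${1%/*}       # part before the slash, e.g. 10.0.0.10
    prefix=${1#*/}     # part after the slash, e.g. 24
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$addr") & mask )) -eq $(( $(ip_to_int "$2") & mask )) ]
}

# Typical use, with the address and gateway from this post:
#   same_subnet 10.0.0.10/24 10.0.0.1 && echo "gateway is directly reachable"
```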

Conclusion

These tools are your friends:

  1. ip link
  2. ip addr
  3. ifconfig (which I didn’t cover, but shows all active interfaces)
  4. route
  5. vi /etc/network/interfaces
  6. ping

Happy networking!

Solution: Glance Errors – OpenStack Folsom Basic Install Ubuntu Linux Test Deployment

I’ve been trying to follow this tutorial to set up a test deployment of OpenStack, and I’ve been having nothing but problems. At least one of them, however, was entirely my doing.

In this post (way down at the bottom) I’d talked about a couple of specific errors I was receiving when trying to test Glance:

glance image-list

Error communicating with /v1/images: [Errno 111] Connection refused

and

glance image-create \
	    --location http://uec-images.ubuntu.com/releases/12.04/release/ubuntu-12.04-server-cloudimg-amd64-disk1.img \
	    --is-public true --disk-format qcow2 --container-format bare --name "Ubuntu"

Error communicating with /v1/images: [Errno 110] Connection timed out 

The glance client talks to the Glance service through a REST API, which runs over HTTP. “Connection refused” and “Connection timed out” are network-level errors, so the likely culprit was an incorrectly configured network setting somewhere rather than Glance itself.

Stage 1: Test Glance using REST

To make sure Glance itself wasn’t the problem, we tested it by using the Links browser to do a direct RESTful API call to the Glance service:

links 'http://localhost:9292/v1/images/detail'

Which returned:

{"images": []}

The fact that we got anything back at all meant the API call was successful. Its result indicated that the default image store was empty.
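
The same check can be done non-interactively. This sketch assumes curl is installed (the test above used links) and that Glance listens on localhost:9292 as shown. looks_like_image_list and check_glance are hypothetical helper names, and the first one is a crude string match rather than a real JSON parser.

```shell
# looks_like_image_list (hypothetical helper): succeeds when the text on
# stdin contains an "images" key, which is what the v1 API returned above.
looks_like_image_list() {
    grep -q '"images"'
}

# check_glance [URL] (hypothetical helper): fetch the image list with curl
# and report whether the API answered. --max-time caps the wait at five
# seconds so a dropped connection does not hang the shell.
check_glance() {
    url=${1:-http://localhost:9292/v1/images/detail}
    if curl -s --max-time 5 "$url" | looks_like_image_list; then
        echo "Glance API answered at $url"
    else
        echo "No usable answer from $url" >&2
        return 1
    fi
}

# Typical use on the controller node:
#   check_glance
```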

Stage 2: Finding the problem

This really came down to due diligence.

The tutorial provided a script to populate the Keystone database with endpoints. It was written assuming I, the user, was following the networking specifics from the tutorial – like the IP ranges for the different networks. I wasn’t.

The beginning of the script looks like this:

#!/bin/sh
#
# Keystone Endpoints
#
# Description: Create Services Endpoints

# Mainly inspired by http://www.hastexo.com/resources/docs/installing-openstack-essex-20121-ubuntu-1204-precise-pangolin
# Written by Martin Gerhard Loschwitz / Hastexo
# Modified by Emilien Macchi / StackOps
#
# Support: openstack@lists.launchpad.net
# License: Apache Software License (ASL) 2.0
#


# MySQL definitions
MYSQL_USER=keystone
MYSQL_DATABASE=keystone
MYSQL_HOST=localhost
MYSQL_PASSWORD=password

# Keystone definitions
KEYSTONE_REGION=RegionOne
SERVICE_TOKEN=password
SERVICE_ENDPOINT="http://localhost:35357/v2.0"

# other definitions
MASTER="192.168.0.1"

Line 28 assigns the IP address of the controller node to a variable called MASTER, only I’d changed the IP address of my controller node to something more suitable to my environment.

This meant I should have checked the script and changed this line to reflect the actual IP address of my controller node.
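
That kind of check is easy to script. The sketch below lists every line of a script that contains something shaped like an IPv4 address, and rewrites the MASTER line. Both function names are hypothetical, and the address in the usage comment is a made-up example, not my real controller IP.

```shell
# find_hardcoded_ips FILE (hypothetical helper): print every line of FILE
# that contains something shaped like an IPv4 address, with line numbers,
# so environment-specific values stand out before the script is run.
find_hardcoded_ips() {
    grep -nE '[0-9]{1,3}(\.[0-9]{1,3}){3}' "$1"
}

# set_master FILE IP (hypothetical helper): rewrite the MASTER= line in
# place. The result is written back with cat so the file keeps its
# permissions (including the executable bit).
set_master() {
    tmp=$(mktemp) || return 1
    sed "s/^MASTER=.*/MASTER=\"$2\"/" "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}

# Typical use (script name and IP are placeholders):
#   find_hardcoded_ips ./endpoints-script.sh
#   set_master ./endpoints-script.sh 10.0.0.5
```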

Stage 3: Solution

This was tedious because the script had already been successfully run (with the wrong settings). It had populated the Keystone “endpoint” table with what was now junk data.

First, we got rid of it using MySQL:

mysql -u keystone -ppassword

mysql> use keystone;           -- Select keystone DB
mysql> delete from endpoint;   -- Purges data from 'endpoint' table
mysql> select * from endpoint; -- Confirms table is empty

And, really quickly, here are the commands to view databases and tables at the MySQL prompt:

mysql> show databases; -- Shows... databases.
mysql> show tables;    -- Shows... tables.

So after purging the table, I needed to replace the offending IP address in the script with the actual IP address of my controller node. Then, it was as simple as running the script again.
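
Before and after re-running the script, it can be worth confirming that no endpoint still points at the old address. The sketch below is deliberately generic: it reads any text containing endpoint URLs on stdin and prints each IP that is not the expected controller address. stale_endpoints is a hypothetical helper, endpoints.txt stands for whatever dump of the endpoint data you have, and 10.0.0.5 is a made-up controller IP.

```shell
# stale_endpoints EXPECTED_IP (hypothetical helper): reads text on stdin,
# extracts the host part of every http(s)://<IPv4> URL it finds, and prints
# each distinct address that differs from EXPECTED_IP.
stale_endpoints() {
    grep -oE 'https?://[0-9]{1,3}(\.[0-9]{1,3}){3}' |
        sed -E 's#^https?://##' |
        sort -u |
        awk -v want="$1" '$0 != want'
}

# Typical use:
#   stale_endpoints 10.0.0.5 < endpoints.txt
```

No output means every URL in the input already points at the expected address.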

After doing that, it worked like a charm!

Solution: Ubuntu Linux Multiple Networks No Internet Connection

In a previous post I talked about some trouble with internet connectivity I’d had on an Ubuntu installation where I’d configured my /etc/network/interfaces file for multiple networks. Even though two of the four networks the computer was connected to had internet connections, the computer was using one of the static networks to try to connect to the web.

In short, I’d specified too many default gateways in my interfaces file, and had to comment out the ones belonging to the networks without an internet connection.

My interfaces file looked like this:

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp

# OpenStack - Management Network
auto eth1
  iface eth1 inet static
  address 192.168.100.2
  netmask 255.255.252.0
  gateway 192.168.100.1

# OpenStack - API/Public Network
auto eth2
  iface eth2 inet static
  address 192.168.200.2
  netmask 255.255.252.0
  gateway 192.168.200.1

# OpenStack - Data Network
auto eth3
  iface eth3 inet static
  address 192.168.220.2
  netmask 255.255.252.0
  gateway 192.168.220.1

My supervisor’s first thought was to check the routing with the command route -n, which produced:

CENSORED@openstack-controller:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.100.1   0.0.0.0         UG    100    0        0 eth1
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.100.0   0.0.0.0         255.255.252.0   U     0      0        0 eth1
192.168.200.0   0.0.0.0         255.255.252.0   U     0      0        0 eth2
192.168.220.0   0.0.0.0         255.255.252.0   U     0      0        0 eth3

This showed that the default gateway (line 4, the row with destination 0.0.0.0) was the gateway for one of my static networks. We tried adding the proper gateway with route add default gw 192.168.200.1, which caused route -n to display:

CENSORED@openstack-controller:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.100.1   0.0.0.0         UG    0      0        0 eth1
0.0.0.0         192.168.200.1   0.0.0.0         UG    100    0        0 eth2
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.100.0   0.0.0.0         255.255.252.0   U     0      0        0 eth1
192.168.200.0   0.0.0.0         255.255.252.0   U     0      0        0 eth2
192.168.220.0   0.0.0.0         255.255.252.0   U     0      0        0 eth3

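So now we had two default gateways, and still no connection: the route via 192.168.100.1 was listed with the lower metric (0 versus 100), so it was still the one being used.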
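
When several default routes exist, the one with the lowest metric is preferred. To see at a glance which one that is, a sketch like the following can be run over route -n output. preferred_default is a hypothetical helper name, and it assumes the eight-column layout shown above.

```shell
# preferred_default (hypothetical helper): reads `route -n` output on stdin
# and prints "<gateway> <interface>" for the default route with the lowest
# metric (column 5). Prints nothing if there is no default route.
preferred_default() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ {
             metric = $5 + 0                      # force numeric comparison
             if (!seen || metric < min) {
                 min = metric
                 best = $2 " " $8
                 seen = 1
             }
         }
         END { if (seen) print best }'
}

# Typical use on a live machine:
#   route -n | preferred_default
```

Fed the table above, it reports 192.168.100.1 eth1, the management network rather than the public one.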

We deleted the gateway for the static network (route del default gw 192.168.100.1) and the internet worked perfectly.

In the end we solved the problem by commenting out the “gateway” setting for each static network that had no internet connection in the /etc/network/interfaces file (the lines starting with “#  gateway”):

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0
iface eth0 inet dhcp

# OpenStack - Management Network
auto eth1
  iface eth1 inet static
  address 192.168.100.2
  netmask 255.255.252.0
#  gateway 192.168.100.1

# OpenStack - API/Public Network
auto eth2
  iface eth2 inet static
  address 192.168.200.2
  netmask 255.255.252.0
  gateway 192.168.200.1

# OpenStack - Data Network
auto eth3
  iface eth3 inet static
  address 192.168.220.2
  netmask 255.255.252.0
#  gateway 192.168.220.1

After rebooting, the only default gateway displayed by the route -n command was the one we wanted:

CENSORED@openstack-controller:/home/senecacd# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.200.1   0.0.0.0         UG    100    0        0 eth2
10.0.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.168.100.0   0.0.0.0         255.255.252.0   U     0      0        0 eth1
192.168.200.0   0.0.0.0         255.255.252.0   U     0      0        0 eth2
192.168.220.0   0.0.0.0         255.255.252.0   U     0      0        0 eth3

Our ping test confirmed the internet was back, and there was much merriment and rejoicing!