Introduction to Network Troubleshooting In AWS
AWS provides a number of networking features to connect within the AWS Cloud, outside the AWS Cloud to the Internet, and in a hybrid manner to an on-premises environment. This chapter discusses tools and techniques for troubleshooting networking issues that can arise with these connections. In addition, the chapter discusses a number of common troubleshooting scenarios. Knowledge of AWS troubleshooting tools and how to troubleshoot common scenarios are both required skills for the exam, which are highlighted in this chapter
Methodology for Troubleshooting
Troubleshooting can follow either a bottom-up approach (traversing through the Open Systems Interconnection [OSI] model one by one) or a top-down approach (working through likely areas that can cause issues). There are scenarios where each has its own merits, and they can be used in combination to help resolve issues quickly. The approach can also change based on the environment in which the troubleshooting is occurring. Traversing through the OSI model systematically from Layer 1 through Layer 7 is often a useful way to pinpoint issues. Such a method can be optimized by taking into account the environment. For example, there is implicit routing for all subnets by default within a Virtual Private Cloud (VPC). This can rule out Layer 2 and Layer 3 communication issues within a VPC; thus, troubleshooting should start at Layer 4. In another example, when custom routing is set up through Amazon Elastic Compute Cloud (Amazon EC2) instances, Layer 3 troubleshooting may be required to ensure that routing is occurring as expected. Stepping back and taking a top-down approach to pinpointing potential areas for network issues is also a valuable way to troubleshoot. Knowing service limits, for example, can help resolve otherwise difficult issues to fix. Being able to recognize security group and network Access Control List (ACL) issues without having to dig through the network stack layer-by-layer is also another example of how this approach can be helpful
Network Troubleshooting Tools
AWS offers a rich set of tools that can be combined with traditional tools to help troubleshoot networking connectivity issues. Both traditional tools and AWS-native tools are discussed in this section
Traditional Tools
In this section, we discuss traditional network troubleshooting tools, many of which you may be already familiar with
Packet Captures
For troubleshooting when deep packet inspection is necessary, packet captures can be useful. Packet capture tools like Wireshark (Windows/Linux) and tcpdump (Linux) can be run on an Amazon EC2 instance. By listening at the interface level, these tools are able to view the packets as they are sent to and received from the network, revealing both packet header and payload.
ping
is a utility that records network round trip times using the Internet Control Message Protocol (ICMP). It is commonly used to test if a host is up and responsive on a network. ping can be useful for troubleshooting within AWS. It is important to note that network ACLs, security groups, and operating system firewalls must all be configured to allow ICMP traffic for this tool to be useful. Note that ICMP traffic is typically not enabled by default on many network devices and operating systems.
traceroute
is a utility that discovers the path to a destination IP or hostname. This tool can be helpful in verifying the route that traffic is following through a network. It works by sending out an ICMP packet with increasing Time-To-Live (TTL) values. Note that not all devices in a network path will respond to the ICMP request, so there may not be a value for all hops in the route. In addition, this tool will not provide meaningful results within a VPC or across VPC peering links because each is only one network hop away.
Telnet
is a text-based TCP utility. While the default telnet port is 23, telnet can be set to initiate a TCP connection on any user-specified port. This can be very helpful for troubleshooting if a service is running on a port and responding to traffic