kellencataldo/tcpip_assembler

Windows TCP/IPv4 conversation assembler and packet analyzer

Overview

This tool reconstructs TCP/IPv4 conversations from data collected in a PCAP file, either between all IPs or for a specific IP, and across all ports or on a specific port. It sorts the packets of each TCP conversation by SEQ number and separates them into upload data and download data. Along with sorting the connections and packets, it highlights anomalous events detected in the TCP conversations, such as an incorrect TCP teardown, set RESET bits, and duplicate packets, in the output file. The tool can be used to analyze network traffic coming into a Windows system in order to track malicious behavior.

How to use

I have uploaded the source code for this project as well as the .sln file. If you have Visual Studio, the source code can be downloaded and opened there. If not, simply download the .h and .cpp files separately and compile them. Note: This program was designed for Windows machines, which makes it extremely unportable. Unfortunately, it will not compile on a Linux machine using the GNU toolchain. I have also attached the exe, which can be downloaded separately.

This tool operates on PCAP files. PCAP files are generated by programs which monitor all network traffic coming in and out of a system and write out a file of packets and raw data. Some popular PCAP-generating programs for Windows are WinPcap and Wireshark. Simply run one of these programs and network traffic will be monitored until the program is terminated.
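For readers unfamiliar with the format, a classic PCAP file begins with a 24-byte global header. The following is a minimal sketch of validating it, assuming a file written in the reader's own byte order. The names PcapGlobalHeader, kPcapMagic and parse_pcap_header are hypothetical and are not taken from this project's source.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Layout of the classic libpcap global header (24 bytes on disk).
struct PcapGlobalHeader {
    std::uint32_t magic;          // 0xa1b2c3d4 when byte order matches the reader
    std::uint16_t version_major;
    std::uint16_t version_minor;
    std::int32_t  thiszone;       // GMT offset, normally 0
    std::uint32_t sigfigs;        // timestamp accuracy, normally 0
    std::uint32_t snaplen;        // maximum captured bytes per packet
    std::uint32_t network;        // link-layer type, 1 = Ethernet
};
static_assert(sizeof(PcapGlobalHeader) == 24, "must match the on-disk layout");

constexpr std::uint32_t kPcapMagic = 0xa1b2c3d4u;

// Hypothetical helper: copies the header out of a raw buffer.
// Returns false if the buffer is too short or the magic number does not match
// (byte-swapped files and other capture formats are not handled in this sketch).
bool parse_pcap_header(const unsigned char* buf, std::size_t len,
                       PcapGlobalHeader& out) {
    if (len < sizeof(PcapGlobalHeader)) return false;
    PcapGlobalHeader h;
    std::memcpy(&h, buf, sizeof h);
    if (h.magic != kPcapMagic) return false;
    out = h;
    return true;
}
```

A reader that also wanted to accept byte-swapped captures would check for the reversed magic number and swap each field.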

This tool is a command line application and takes several arguments that change the output. By default, the output is a file containing a global header providing an overview of all packets examined, conversation headers which display information about a specific conversation between two IPs on two ports, and individual packet headers. More information about these headers and the information they provide is included in the What to look for section.

All of these can be turned off individually, and several other options can be set. The commands are:

Toggle global header: -gh

Toggle conversation headers: -ch

Toggle packet headers: -ph

Filter by a specific port: -port #

Filter by a specific IP (human readable format): -ip #

Do not display data: -nd

By default, the maximum size of the data is 10 MB. This can be manually increased, in whole megabytes, to a maximum of 2 GB. The command for this is

Resize maximum data: -rs #

All of these commands can be given in any order. For example, to examine packets from a specific IP on a specific port without displaying conversation headers and packet headers, and with the maximum data amount resized to 15 megabytes, this command would be used:

ip_convo.exe -ip 127.0.0.1 -port 80 -ch -ph -rs 15

In order to input a file, it must be redirected to the stdin of the exe using <. This is done after any options. For example, if the path to the PCAP file was C:\path\to\example.pcap then the previous command would become:

ip_convo.exe -ip 127.0.0.1 -port 80 -ch -ph -rs 15 < C:\path\to\example.pcap

This will print the output to the Windows cmd prompt. You probably don't want that. You really, really don't want that. In order to redirect the output stream, use > and point to the path of the desired output file (the file does not have to exist before running the command). For example, if the desired output path was C:\path\to\output.txt the previous command now becomes:

ip_convo.exe -ip 127.0.0.1 -port 80 -ch -ph -rs 15 < C:\path\to\example.pcap > C:\path\to\output.txt
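As an illustration only, the options listed above could be parsed along the following lines. This is a sketch, not the tool's actual parser: the Options struct, its field names and parse_options are hypothetical, and the 2048 MB ceiling simply restates the 2 GB limit mentioned earlier.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical container for the documented options and their defaults.
struct Options {
    bool global_header = true;
    bool conversation_headers = true;
    bool packet_headers = true;
    bool show_data = true;
    long port = -1;          // -1 means "all ports"
    std::string ip;          // empty means "all IPs"
    long max_data_mb = 10;   // documented default
};

// Accepts the options in any order. Returns false on an unknown option
// or on a missing or out-of-range value.
bool parse_options(int argc, const char* const* argv, Options& out) {
    for (int i = 1; i < argc; ++i) {
        const std::string arg = argv[i];
        const bool has_value = (i + 1 < argc);
        if (arg == "-gh") {
            out.global_header = !out.global_header;
        } else if (arg == "-ch") {
            out.conversation_headers = !out.conversation_headers;
        } else if (arg == "-ph") {
            out.packet_headers = !out.packet_headers;
        } else if (arg == "-nd") {
            out.show_data = false;
        } else if (arg == "-ip" && has_value) {
            out.ip = argv[++i];
        } else if (arg == "-port" && has_value) {
            const long port = std::strtol(argv[++i], nullptr, 10);
            if (port < 0 || port > 65535) return false;
            out.port = port;
        } else if (arg == "-rs" && has_value) {
            const long mb = std::strtol(argv[++i], nullptr, 10);
            if (mb < 1 || mb > 2048) return false;   // 2 GB ceiling
            out.max_data_mb = mb;
        } else {
            return false;
        }
    }
    return true;
}
```

The input and output redirections (< and >) are handled by the shell, so they never appear in argv.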

What to look for

This is an example output generated from network activity

example

This example includes the global header beginning with Total unique TCP/IP connections, the conversation header beginning with Connection between, the packet headers beginning with Relative sequence number, and the data, which can be seen beneath the third packet.

This network activity appears to be a relatively normal session with nothing out of the ordinary occurring. Starting with the global header, two unique TCP/IP conversations were detected. One conversation is distinguished from another by the four-tuple associated with it (source port, destination port, source IP, destination IP). During a browsing session, the same source IP from the same source port could connect to the same destination IP on two different ports, and this will register as two separate connections, as the TCP handshake must be done on both of these ports. One type of attack to be aware of is a malicious port scan. A port scan involves a potentially malicious system repeatedly attempting to open a connection across numerous TCP ports in order to complete a TCP handshake, thereby detecting vulnerable ports. If the filter for a specific IP has been set and the total number of connections is still high, this is evidence of a potential port scan on the host system.
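The port scan heuristic described above can be sketched as counting how many distinct destination ports each source IP has opened conversations to. This is an illustration under assumptions, not something the tool is stated to do: the Conversation struct, suspected_scanners and the threshold parameter are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical summary of one conversation: the four-tuple that identifies it.
struct Conversation {
    std::uint32_t src_ip;
    std::uint32_t dst_ip;
    std::uint16_t src_port;
    std::uint16_t dst_port;
};

// Returns every source IP that opened conversations to more than `threshold`
// distinct destination ports. The threshold is a judgment call for the analyst.
std::vector<std::uint32_t> suspected_scanners(
        const std::vector<Conversation>& conversations, std::size_t threshold) {
    std::map<std::uint32_t, std::set<std::uint16_t>> ports_by_source;
    for (const Conversation& c : conversations) {
        ports_by_source[c.src_ip].insert(c.dst_port);
    }
    std::vector<std::uint32_t> suspects;
    for (const auto& entry : ports_by_source) {
        if (entry.second.size() > threshold) suspects.push_back(entry.first);
    }
    return suspects;
}
```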

The total bytes field accounts for all data exchanged across all conversations. Similarly, the total unique packets field accounts for all packets exchanged in the session. Across all ports, these numbers can be essentially anything without cause for suspicion. However, when examining a specific port, if these numbers are large and the cause is unknown, this most likely warrants further investigation. It's important to note that these fields count only TCP/IP data and packets, not all packets and data in the session. For example, audio and video streaming services mostly use the UDP protocol, so if streaming was occurring during the session, the PCAP file would be relatively large while the total TCP packets and bytes would be smaller.

The out of order packets field should almost always be zero. TCP tracks ordering through the SEQ and ACK numbers, so receiving TCP packets that are not in the correct order happens rarely. If, however, this field is not zero, the cause is most likely an error with the network adapter card, and not necessarily malicious activity. The next field, duplicate packets, is the number of duplicate packets sent or received in the session. This is followed by the total reset packets sent or received and then by the total unsuccessful TCP teardowns. A TCP teardown is essentially a TCP handshake in reverse, in which a FIN bit is set instead of a SYN bit. Many common types of malicious attacks will be detected by these fields. A SYN flood DoS attack is one such attack, in which a malicious system sends SYN packets to a number of ports in order to initialize TCP handshakes. The host will faithfully send a SYN-ACK packet in return. When it does not receive the final ACK packet back, the host will keep resending duplicate SYN-ACK packets, each time waiting longer before the next retransmission. On Windows systems, the default is five packets; on Linux it can be anywhere from 8 to 15. This essentially holds the port hostage, and it will therefore reject SYN packets from other valid connections. This type of attack will appear as a very large number of reset packets sent as well as incorrect TCP connection teardowns.
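One simple way to count duplicates of the kind described above is to remember each (SEQ number, payload length) pair seen in one direction of a conversation. The exact criterion the tool uses is not stated here, so treat this as an assumption; DuplicateTracker and seen_before are hypothetical names.

```cpp
#include <cstdint>
#include <set>
#include <utility>

// Tracks the segments seen in one direction of one conversation.
class DuplicateTracker {
public:
    // Records the segment and returns true if an identical
    // (SEQ number, payload length) pair was already recorded.
    bool seen_before(std::uint32_t seq, std::uint32_t payload_len) {
        // std::set::insert reports whether the element was newly inserted.
        return !seen_.insert(std::make_pair(seq, payload_len)).second;
    }

private:
    std::set<std::pair<std::uint32_t, std::uint32_t>> seen_;
};
```

A retransmitted SYN-ACK, for example, repeats the same SEQ number with an empty payload, so it would be reported on its second appearance.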

Another type of attack that will be detected by these fields is a SYN-ACK flood, which is slightly different from a SYN flood. This type of attack is mostly targeted at firewalls rather than at denying service to other users. In a SYN-ACK flood, many spoofed TCP packets featuring random SEQ numbers are sent to a system. The firewall on the system could be overwhelmed while attempting to determine the cause of the out of order packets, and thus might leave the downstream system vulnerable. If this is the case on the host system (extremely rare), the unsuccessful TCP teardown field will be extremely high.

The last type of attack worth mentioning that will be detected by these fields is a TCP reset attack. A TCP reset packet is an empty packet sent from one host to another with the RST flag set, which immediately closes the conversation. A reset attack exploits this by sending a forged TCP packet to a host, which then closes its TCP conversation with another system. In this scenario, the reset field counts the total number of times a conversation has been closed by a reset flag being set. However, a nonzero count does not necessarily mean that a malicious reset packet was sent. In certain scenarios, a valid host will send a reset packet. A large number would be cause for investigation.

Following the global header is the conversation header. Each field of the global header has a mirror in the conversation header which tracks the behavior of that specific connection. After the conversation header, the data is divided into upload packets and download packets. Note: the upload data corresponds to the IP that initiated the connection, not necessarily the data being uploaded from the host system. Most times this will also be the host system's upload data. However, in certain scenarios it could be the other way around. Despite how easy it would seem, it is actually fairly difficult to determine a local system's IP. At the least the system will have two IPs, and in many cases it will have even more. Each packet also displays its size in bytes, whether a reset bit was set, and whether a possible duplicate was detected. Empty packets are not cause for alarm. As can be seen in the example image, the first two packets sent contained no data. This is because a TCP connection was being initiated. After that, a packet containing an HTTP query was sent to a server. An example of an attack which can be detected in the data segments is a DoS attack on a POP3 server. This will cause many commands to be sent to a host on port 110. These commands will be visible in the data segment.

Note: I wanted to include support for changing the encoding of the output data. As it stands, the data is output in Windows text. It turns out that because of a bug, changing the output stream to UTF8 or UTF16 will crash the Windows command line. Therefore, some data segments will contain various unintelligible symbols.

How it works

I chose C++ for this project as it has an extremely robust stream system, which makes it easy to run through the file stream and gather data.

Every conversation is given a unique ID. This ID is generated from the two IPs involved in the conversation and the two ports used by each system, which are placed into a pairing function. The specific method used is a bitwise Z-order curve, which takes two non-negative numbers and outputs a unique value. I used this method and not Georg Cantor's better-known pairing function because the Cantor pairing function is quadratic, so pairing two N-bit numbers is not guaranteed to produce a value that fits in 2N bits. This unique ID makes storing and sorting packets easy: if the ID for a packet is present in the list of already stored IDs, the packet is added to that conversation. If the ID is new, a new conversation is initialized with this packet as the first member.
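To make the pairing idea concrete, here is a minimal sketch of a bitwise Z-order (Morton) pairing of two 32-bit values into one 64-bit value. It is not copied from this project's source, the names spread_bits and z_order_pair are hypothetical, and how the tool combines all four fields of the four-tuple is not shown here.

```cpp
#include <cstdint>

// Spreads the 32 bits of x across the even bit positions of a 64-bit value,
// leaving every odd position zero.
constexpr std::uint64_t spread_bits(std::uint32_t x) {
    std::uint64_t v = x;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFULL;
    v = (v | (v << 8))  & 0x00FF00FF00FF00FFULL;
    v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    v = (v | (v << 2))  & 0x3333333333333333ULL;
    v = (v | (v << 1))  & 0x5555555555555555ULL;
    return v;
}

// Interleaves the bits of a (even positions) and b (odd positions).
// Two 32-bit inputs always fit in exactly 64 bits, and distinct pairs
// always produce distinct results.
constexpr std::uint64_t z_order_pair(std::uint32_t a, std::uint32_t b) {
    return spread_bits(a) | (spread_bits(b) << 1);
}
```

By comparison, the Cantor pairing of two values that are both 2^32 - 1 is 2^65 - 2^33, which no longer fits in 64 bits.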

The data for each packet is not stored individually. I was afraid that many malloc() calls would invariably lead to memory leakage, and a static array for each packet would waste a considerable amount of space. Instead, I malloc()ed one large block of memory. When a packet is read in, its data is added to the memory block and an integer value corresponding to that data's location in the block is stored. For example, the first packet processed will always have a data offset value of 0. If this packet is 100 bytes, the data offset value is incremented by 100, and this becomes the next packet's data offset value. This keeps the packet data structure extremely small, while recovering a packet's data from the main memory block is a simple matter of pointer arithmetic.
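The offset scheme can be sketched as follows. This is an illustration of the idea rather than the project's own code: DataBlock, PacketRecord and their members are hypothetical names, and the capacity would correspond to the maximum data size set with -rs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// What each packet keeps instead of its own buffer.
struct PacketRecord {
    std::size_t data_offset;  // where the payload starts inside the block
    std::size_t data_len;     // payload size in bytes
};

// One large malloc()ed block that holds every payload back to back.
class DataBlock {
public:
    explicit DataBlock(std::size_t capacity)
        : base_(static_cast<unsigned char*>(std::malloc(capacity))),
          capacity_(capacity), used_(0) {}
    ~DataBlock() { std::free(base_); }
    DataBlock(const DataBlock&) = delete;
    DataBlock& operator=(const DataBlock&) = delete;

    // Copies the payload to the end of the block and fills in the record.
    // Returns false if the allocation failed or the payload does not fit.
    bool append(const void* payload, std::size_t len, PacketRecord& out) {
        if (base_ == nullptr || len > capacity_ - used_) return false;
        if (len > 0) std::memcpy(base_ + used_, payload, len);
        out.data_offset = used_;
        out.data_len = len;
        used_ += len;  // the next packet starts where this one ends
        return true;
    }

    // Recovers a payload pointer with plain pointer arithmetic.
    const unsigned char* data(const PacketRecord& rec) const {
        assert(rec.data_offset + rec.data_len <= used_);
        return base_ + rec.data_offset;
    }

private:
    unsigned char* base_;
    std::size_t capacity_;
    std::size_t used_;
};
```

Because there is a single allocation and a single free, there is only one place where a leak could occur.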

To sort the connections I am using a pretty straightforward bubble sort algorithm. Bubble sort works very well on nearly sorted data sets, which is useful since the packets will almost always be in the correct sequence order.
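The reason bubble sort suits nearly sorted input is the early exit: a pass that performs no swaps proves the list is sorted. A minimal sketch follows, assuming packets are ordered by their relative sequence number. The Packet struct and bubble_sort_by_seq are hypothetical names, and the pass count is returned only to make the behavior visible.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical minimal packet record.
struct Packet {
    std::uint32_t relative_seq;  // SEQ number relative to the start of the conversation
    std::size_t data_offset;     // location of the payload in the data block
};

// Sorts in place by relative SEQ number and returns the number of passes made.
// Already-sorted input costs exactly one pass with no swaps.
std::size_t bubble_sort_by_seq(std::vector<Packet>& packets) {
    std::size_t passes = 0;
    bool swapped = true;
    for (std::size_t end = packets.size(); swapped && end > 1; --end) {
        swapped = false;
        ++passes;
        for (std::size_t i = 1; i < end; ++i) {
            if (packets[i - 1].relative_seq > packets[i].relative_seq) {
                std::swap(packets[i - 1], packets[i]);
                swapped = true;
            }
        }
    }
    return passes;
}
```

With a single pair of adjacent packets out of place, the sort finishes in two passes: one to fix the pair and one to confirm that nothing else moved.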

Also, if you are wondering why there are so many bitwise comparisons against what seem like random integers, these are checking flags set by either the user or the IP header. One of my professors was not a fan of preprocessor directives and I guess that stuck with me.
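For anyone reading the source who would like names for those integers, the TCP flag bits can be written as typed constants without any preprocessor directives. This is a sketch for reference, not how the project is written. The bit values are the standard TCP header flags, while the namespace and helper names are hypothetical.

```cpp
#include <cstdint>

// Standard flag bits in the TCP header's flags byte.
namespace tcp_flag {
constexpr std::uint8_t FIN = 0x01;
constexpr std::uint8_t SYN = 0x02;
constexpr std::uint8_t RST = 0x04;
constexpr std::uint8_t PSH = 0x08;
constexpr std::uint8_t ACK = 0x10;
constexpr std::uint8_t URG = 0x20;
}  // namespace tcp_flag

// True if any bit of `mask` is set in `flags`.
constexpr bool has_flag(std::uint8_t flags, std::uint8_t mask) {
    return (flags & mask) != 0;
}

// True only if both SYN and ACK are set, as in the second handshake packet.
constexpr bool is_syn_ack(std::uint8_t flags) {
    return (flags & (tcp_flag::SYN | tcp_flag::ACK)) ==
           (tcp_flag::SYN | tcp_flag::ACK);
}
```

A check such as has_flag(flags, tcp_flag::RST) then reads the same as a comparison against 0x04.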

IPv6 support has not been added yet, although it should not be too hard to implement. IPv6 packets are often encapsulated in IPv4 so as to pass through IPv4-only networks. This process is called a 6to4 transmission. If the packet has reached the end system, the encapsulation will be dropped. If not, the IPv6 packet will remain encapsulated with a protocol of type 41. If you are interested in implementing this feature yourself, I suggest using structure templates to sort between IPv4 packets, IPv6 packets, and encapsulated IPv6 packets.
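As a starting point, the three cases can be told apart from the IPv4 header alone: the version sits in the high nibble of the first byte and the protocol number in byte 9, where 6 means TCP and 41 means an encapsulated IPv6 packet. The sketch below uses a plain enum rather than the structure templates suggested above, and PacketKind and classify_ipv4 are hypothetical names.

```cpp
#include <cstddef>

enum class PacketKind { Tcp4, EncapsulatedIpv6, Other };

// Classifies a buffer that starts at the IP header.
// Anything that is not a well-formed IPv4 header is reported as Other.
PacketKind classify_ipv4(const unsigned char* ip, std::size_t len) {
    if (len < 20 || (ip[0] >> 4) != 4) return PacketKind::Other;
    // The low nibble of byte 0 is the header length in 32-bit words.
    const std::size_t header_len = static_cast<std::size_t>(ip[0] & 0x0F) * 4;
    if (header_len < 20 || header_len > len) return PacketKind::Other;
    switch (ip[9]) {  // protocol field
        case 6:  return PacketKind::Tcp4;
        case 41: return PacketKind::EncapsulatedIpv6;
        default: return PacketKind::Other;
    }
}
```

For the encapsulated case, the inner IPv6 header begins header_len bytes into the buffer.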

About

Utility that can be used to reassemble TCP/IP packets and detect anomalous behavior.
