Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Freed memory is being used #1110

Open
ATDOR-RIH opened this issue Feb 27, 2024 · 28 comments
Open

[BUG] Freed memory is being used #1110

ATDOR-RIH opened this issue Feb 27, 2024 · 28 comments
Assignees
Labels
bug Something isn't working

Comments

@ATDOR-RIH
Copy link

Describe the bug
This bug releates to #570 which was solved in #707.

The bug was fixed by clearing pPassQueued and pPassAccept.
By calling

vSocketCloseNextTime( pxSocket );

and
vSocketCloseNextTime( NULL );

the socket gets freed, however the parent socket might still have a reference to the socket.
When the user task calls FreeRTOS_accept it will check if the pPassAccept bit is set. If not FreeRTOS_accept will return NULL and everything is fine.
But the memory is freed already, so there is no guarantee that pPassAccept is still cleared.

Target

  • Development board: [e.g. HiFive11 RevB]
  • Instruction Set Architecture: ARM CortexA7
  • IDE and version: Eclipse + CMake
  • Toolchain and version: arm-none-eabi-gcc 10.3.1

Host

  • Host OS: Ubuntu
  • Version: 18.04

To Reproduce
The device acts as web server and the error happened by continuously refreshing the web site using https.
It did not happen when using http.

Expected behavior
Do not leave references to freed memory.

Screenshots
None

Wireshark logs
None

Additional context
Added log lines:

logll_debug is a function which writes the given text to a serial interface which is connected to the computer used for debugging.
Format of the log line: (file name provided by macro BASE_FILE_NAME);(log level 1 in case of debug);(monotonic time in ms);(log message)

Log lines where the described bug happens:
Creation of socket.
D:FreeRTOS_Sockets.c;1;38081;FreeRTOS_socket 0x7035c548
D:FreeRTOS_TCP_IP.c;1;38113;xParent = pxSocket->u.xTCP.pxPeerSocket; 0x70284130 0x7035c548 0x70284130
D:FreeRTOS_TCP_IP.c;1;38113;xParent->u.xTCP.pxPeerSocket = pxSocket; 0x70284130 0x7035c548

IP-Task is closing the socket.
D:FreeRTOS_TCP_IP.c;1;38130;vSocketCloseNextTime set xSocketToClose 0x0 0x7035c548
D:FreeRTOS_Sockets.c;1;38130;vSocketClose 0x7035c548
D:FreeRTOS_TCP_IP.c;1;38130;vSocketCloseNextTime 0x7035c548 0x0

Call FreeRTOS_accept from user task. The memory was not overwritten yet, so prvAcceptWaitClient returns NULL.
D:FreeRTOS_Sockets.c;1;38706;pxClientSocket = pxParentSocket->u.xTCP.pxPeerSocket; 0x0 0x70284130 0x7035c548

Full logs
logs.txt

Config
ipconfig.txt

@ATDOR-RIH ATDOR-RIH added the bug Something isn't working label Feb 27, 2024
@kstribrnAmzn
Copy link
Member

Thanks for raising this bug. Tagging @shubnil and @tony-josi-aws for review

@tony-josi-aws
Copy link
Member

@ATDOR-RIH
Thanks for the detailed bug report.
Wondering if you would be able to help with testing if we provided you with a patch for this issue.

@ATDOR-RIH
Copy link
Author

@tony-josi-aws
Yes, I can help with testing.

@tony-josi-aws
Copy link
Member

@ATDOR-RIH Thanks

In your project, is the macro ipconfigTCP_HANG_PROTECTION kept at its default definition?

If so, could you please define it as 0?:
#define ipconfigTCP_HANG_PROTECTION ( 0 )
in your FreeRTOSIPConfig.h and try to reproduce the issue?

@ATDOR-RIH
Copy link
Author

I tried it with #define ipconfigTCP_HANG_PROTECTION ( 0 ).
It is different but there is still an access to freed memory.

D:FreeRTOS_Sockets.c;1;173709;FreeRTOS_socket 0x702b5a60
D:FreeRTOS_TCP_IP.c;1;173722;vSocketCloseNextTime set xSocketToClose 0x0 0x702b5a60
D:FreeRTOS_Sockets.c;1;173722;vSocketClose 0x702b5a60
D:FreeRTOS_TCP_IP.c;1;173722;vSocketCloseNextTime 0x702b5a60 0x0
...
D:FreeRTOS_Sockets.c;1;173963;pxClientSocket = pxParentSocket->u.xTCP.pxPeerSocket; 0x0 0x70283b30 0x702b5a60

@tony-josi-aws
Copy link
Member

tony-josi-aws commented Mar 7, 2024

@ATDOR-RIH,

Just wondering how its verified that 0x702b5a60 is a freed pointer that is being used.
Its normal to have the malloc return the same pointer that was previously released while a new socket is being created.

I would try associating each socket creation with a random ID when created and then comparing the IDs when the socket pointers happen to match.

@ATDOR-RIH
Copy link
Author

ATDOR-RIH commented Mar 7, 2024

@tony-josi-aws
I did what you suggested but instead of a random ID I used the system time of when the socket was created. Therefore I added ulCreationTime to xSOCKET.
`struct xSOCKET
{
EventBits_t xEventBits; /< The eventbits to keep track of events. */
EventGroupHandle_t xEventGroup; /
< The event group for this socket. */

/* Most compilers do like bit-flags */
struct
{
    uint32_t bIsIPv6 : 1; /**< Non-zero in case the connection is using IPv6. */
    uint32_t bSomeFlag : 1;
}
bits;

ListItem_t xBoundSocketListItem;       /**< Used to reference the socket from a bound sockets list. */
TickType_t xReceiveBlockTime;          /**< if recv[to] is called while no data is available, wait this amount of time. Unit in clock-ticks */
TickType_t xSendBlockTime;             /**< if send[to] is called while there is not enough space to send, wait this amount of time. Unit in clock-ticks */

IP_Address_t xLocalAddress;            /**< Local IP address */
uint16_t usLocalPort;                  /**< Local port on this machine */
uint8_t ucSocketOptions;               /**< Socket options */
uint8_t ucProtocol;                    /**< choice of FREERTOS_IPPROTO_UDP/TCP */
uint64_t ulCreationTime;
#if ( ipconfigSOCKET_HAS_USER_SEMAPHORE == 1 )
    SemaphoreHandle_t pxUserSemaphore; /**< The user semaphore */
#endif /* ipconfigSOCKET_HAS_USER_SEMAPHORE */
#if ( ipconfigSOCKET_HAS_USER_WAKE_CALLBACK == 1 )
    SocketWakeupCallback_t pxUserWakeCallback; /**< Pointer to the callback function. */
#endif /* ipconfigSOCKET_HAS_USER_WAKE_CALLBACK */

#if ( ipconfigSUPPORT_SELECT_FUNCTION == 1 )
    struct xSOCKET_SET * pxSocketSet; /**< Pointer to the socket set structure */
    EventBits_t xSelectBits;          /**< User may indicate which bits are interesting for this socket. */

    EventBits_t xSocketBits;          /**< These bits indicate the events which have actually occurred.
                                       * They are maintained by the IP-task */
#endif /* ipconfigSUPPORT_SELECT_FUNCTION */
struct xNetworkEndPoint * pxEndPoint; /**< The end-point to which the socket is bound. */

/* This field is only only by the user, and can be accessed with
 * vSocketSetSocketID() / vSocketGetSocketID().
 * All fields of a socket will be cleared by memset() in FreeRTOS_socket().
 */
void * pvSocketID;

/* TCP/UDP specific fields: */
/* Before accessing any member of this structure, it should be confirmed */
/* that the protocol corresponds with the type of structure */

union
{
    IPUDPSocket_t xUDP;     /**< Union member: UDP socket*/
    #if ( ipconfigUSE_TCP == 1 )
        IPTCPSocket_t xTCP; /**< Union member: TCP socket */
    #endif /* ipconfigUSE_TCP */
}
u; /**< Union of TCP/UDP socket */

};`

ulCreationTime is set in the function FreeRTOS_socket.
pxSocket->ulCreationTime = monotonicClock_get();

pxSocket->ucProtocol = ( uint8_t ) xProtocolCpy; /* protocol: UDP or TCP */

I then added ulCreationTime to the logs I mentioned in the description above.
logll_debug( BASE_FILE_NAME, "FreeRTOS_socket 0x%x %llu", xReturn, xReturn->ulCreationTime );
logll_debug( BASE_FILE_NAME, "vSocketClose 0x%x %llu", pxSocket, pxSocket->ulCreationTime );
logll_debug( BASE_FILE_NAME, "pxClientSocket = pxParentSocket->u.xTCP.pxPeerSocket; 0x%x 0x%x %llu 0x%x %llu", pxClientSocket, pxParentSocket, pxParentSocket->ulCreationTime, pxParentSocket->u.xTCP.pxPeerSocket, pxParentSocket->u.xTCP.pxPeerSocket->ulCreationTime );

First I tried it with ipconfigTCP_HANG_PROTECTION set to 1. (log_ipconfigTCP_HANG_PROTECTION_1.txt)

D:FreeRTOS_Sockets.c;1;47391;FreeRTOS_socket 0x7030cdc8 47391
D:FreeRTOS_TCP_IP.c;1;47407;xParent = pxSocket->u.xTCP.pxPeerSocket; 0x70349180 0x7030cdc8 47391 0x70349180 2361
D:FreeRTOS_TCP_IP.c;1;47407;xParent->u.xTCP.pxPeerSocket = pxSocket; 0x70349180 2361 0x7030cdc8 47391
D:FreeRTOS_TCP_IP.c;1;47424;vSocketCloseNextTime set xSocketToClose 0x0 0x7030cdc8
D:FreeRTOS_Sockets.c;1;47426;vSocketClose 0x7030cdc8 47391
D:FreeRTOS_TCP_IP.c;1;47426;vSocketCloseNextTime 0x7030cdc8 0x0
...
D:FreeRTOS_Sockets.c;1;48123;pxClientSocket = pxParentSocket->u.xTCP.pxPeerSocket; 0x0 0x70349180 2361 0x7030cdc8 0

You can see that in the log line pxClientSocket = pxParentSocket->u.xTCP.pxPeerSocket; 0x0 0x70349180 2361 0x7030cdc8 0 ulCreationTime is 0. I assume that the memory was already taken by something else.

I removed vPortFreeSocket( pxSocket ); from the function vSocketClose and tried again. (log_ipconfigTCP_HANG_PROTECTION_1_do_not_free_memory.txt)

vPortFreeSocket( pxSocket );

D:FreeRTOS_Sockets.c;1;31698;FreeRTOS_socket 0x7033dab8 31698
D:FreeRTOS_TCP_IP.c;1;31704;xParent = pxSocket->u.xTCP.pxPeerSocket; 0x70349c20 0x7033dab8 31698 0x70349c20 2360
D:FreeRTOS_TCP_IP.c;1;31704;xParent->u.xTCP.pxPeerSocket = pxSocket; 0x70349c20 2360 0x7033dab8 31698
D:FreeRTOS_TCP_IP.c;1;31720;vSocketCloseNextTime set xSocketToClose 0x0 0x7033dab8
D:FreeRTOS_Sockets.c;1;31720;vSocketClose 0x7033dab8 31698
...
D:FreeRTOS_Sockets.c;1;33358;pxClientSocket = pxParentSocket->u.xTCP.pxPeerSocket; 0x0 0x70349c20 2360 0x7033dab8 31698

Then I tried it with ipconfigTCP_HANG_PROTECTION set to 0. (log_ipconfigTCP_HANG_PROTECTION_0_do_not_free_memory.txt)

D:FreeRTOS_Sockets.c;1;210483;FreeRTOS_socket 0x7034eca8 210483
D:FreeRTOS_TCP_IP.c;1;210498;vSocketCloseNextTime set xSocketToClose 0x0 0x7034eca8
D:FreeRTOS_Sockets.c;1;210498;vSocketClose 0x7034eca8 210483
D:FreeRTOS_TCP_IP.c;1;210498;vSocketCloseNextTime 0x7034eca8 0x0
...
D:FreeRTOS_Sockets.c;1;215348;pxClientSocket = pxParentSocket->u.xTCP.pxPeerSocket; 0x0 0x7027d298 2371 0x7034eca8 210483

Summary
The socket is being used after it was closed, even if the memory of the socket was not freed. So it cannot be a new socket.

log_ipconfigTCP_HANG_PROTECTION_1.txt
log_ipconfigTCP_HANG_PROTECTION_1_do_not_free_memory.txt
log_ipconfigTCP_HANG_PROTECTION_0_do_not_free_memory.txt

@shubnil
Copy link
Member

shubnil commented Mar 21, 2024

Hi @ATDOR-RIH,
We were analyzing the logs and have some observations:

Data:

Cycle 1:
FreeRTOS_Sockets.c;1;31698;FreeRTOS_socket 0x7033dab8 31698
FreeRTOS_TCP_IP.c;1;31704;xParent = pxSocket->u.xTCP.pxPeerSocket; 0x70349c20 0x7033dab8 31698 0x70349c20 2360
FreeRTOS_TCP_IP.c;1;31704;xParent->u.xTCP.pxPeerSocket = pxSocket; 0x70349c20 2360 0x7033dab8 31698
FreeRTOS_TCP_IP.c;1;31720;vSocketCloseNextTime set xSocketToClose 0x0 0x7033dab8
FreeRTOS_Sockets.c;1;31720;vSocketClose 0x7033dab8 31698
FreeRTOS_TCP_IP.c;1;31720;vSocketCloseNextTime 0x7033dab8 0x0

Cycle 2:
FreeRTOS_Sockets.c;1;31853;FreeRTOS_socket 0x702f3cb8 31853
FreeRTOS_TCP_IP.c;1;31859;xParent = pxSocket->u.xTCP.pxPeerSocket; 0x70349c20 0x702f3cb8 31853 0x70349c20 2360
FreeRTOS_TCP_IP.c;1;31869;vSocketCloseNextTime set xSocketToClose 0x0 0x702f3cb8
FreeRTOS_Sockets.c;1;31869;vSocketClose 0x702f3cb8 31853
FreeRTOS_TCP_IP.c;1;31869;vSocketCloseNextTime 0x702f3cb8 0x0

Cycle 3:
FreeRTOS_Sockets.c;1;32004;FreeRTOS_socket 0x7025edf0 32004
FreeRTOS_TCP_IP.c;1;32010;xParent = pxSocket->u.xTCP.pxPeerSocket; 0x70349c20 0x7025edf0 32004 0x70349c20 2360
FreeRTOS_TCP_IP.c;1;32020;vSocketCloseNextTime set xSocketToClose 0x0 0x7025edf0
FreeRTOS_Sockets.c;1;32020;vSocketClose 0x7025edf0 32004
FreeRTOS_TCP_IP.c;1;32020;vSocketCloseNextTime 0x7025edf0 0x0

Cycle 4:
FreeRTOS_Sockets.c;1;32214;FreeRTOS_socket 0x702843d0 32214
FreeRTOS_TCP_IP.c;1;32220;xParent = pxSocket->u.xTCP.pxPeerSocket; 0x70349c20 0x702843d0 32214 0x70349c20 2360
FreeRTOS_TCP_IP.c;1;32231;vSocketCloseNextTime set xSocketToClose 0x0 0x702843d0
FreeRTOS_Sockets.c;1;32231;vSocketClose 0x702843d0 32214
FreeRTOS_TCP_IP.c;1;32231;vSocketCloseNextTime 0x702843d0 0x0

Cycle 5:
FreeRTOS_Sockets.c;1;32779;FreeRTOS_socket 0x7033ace0 32779
FreeRTOS_TCP_IP.c;1;32791;xParent = pxSocket->u.xTCP.pxPeerSocket; 0x70349c20 0x7033ace0 32779 0x70349c20 2360
FreeRTOS_Sockets.c;1;33358;FreeRTOS_closesocket - 0x7027d2d8 31569
FreeRTOS_Sockets.c;1;33358;pxClientSocket = pxParentSocket->u.xTCP.pxPeerSocket; 0x0 0x70349c20 2360 0x7033dab8 31698
FreeRTOS_Sockets.c;1;33358;pxClientSocket = NULL;

Observation 1:
The socket buffer with address "0x7033dab8" seem to appear with an older timestamp. Now this can be either because of one of the following reasons:

  1. Stale buffer is still present
  2. The "ulCreationTime" field is not cleared properly which can happen if the print statement is before the socket memory is memset to 0.

There is less possibility for a to happen because of the following prints:
FreeRTOS_Sockets.c;1;31720;vSocketClose 0x7033dab8 31698
FreeRTOS_TCP_IP.c;1;31720;vSocketCloseNextTime 0x7033dab8 0x0

Can you please confirm if point no b is not the case.

Observation 2:
Sometimes there seems to be a late close call from the application like the print "FreeRTOS_Sockets.c;1;33358;FreeRTOS_closesocket - 0x7027d2d8 31569"
Can you please ensure if the application has the right priority to maintain a steady state between opening and closing of sockets.

Will it be ok for you to share the code with the above prints. This will help us better co-relate the print sequence and the code sequence.

Thanks,
Shub

@htibosch
Copy link
Contributor

I would like to propose the following change in vTCPStateChange():

 FreeRTOS_printf( ( "vTCPStateChange: Closing socket\n" ) );
 
 if( pxSocket->u.xTCP.bits.bReuseSocket == pdFALSE_UNSIGNED )
 {
     configASSERT( xIsCallingFromIPTask() != pdFALSE );
+ 
+    if( ( xParent != NULL ) &&
+        ( xParent->u.xTCP.pxPeerSocket == pxSocket ) )
+    {
+        xParent->u.xTCP.pxPeerSocket = NULL;
+    }
 
     vSocketCloseNextTime( pxSocket );
 }

So we'll clear the field pxPeerSocket of the parent socket before the child socket is deleted.

@ATDOR-RIH
Copy link
Author

Hi @shubnil and @htibosch

attached you will find the following files:

  • changes.patch: Patch file with the changes I made based on V4.0.0. (relative path: source)
  • changes_suggestion_htibosch.patch: Patch file with the changes suggested by @htibosch.
  • logs_suggestion_htibosch.txt: Logs with the changes suggested by @htibosch. (Only a small snippet.)

changes.patch
changes_suggestion_htibosch.patch
logs_suggestion_htibosch.txt

The suggested changes do not solve the issue.
D:FreeRTOS_TCP_IP.c;1;658742;vSocketCloseNextTime xParent 0x7027b4d8 pxSocket 0x7027b4d8 0x0
D:FreeRTOS_TCP_IP.c;1;658742;vSocketCloseNextTime set xSocketToClose 0x0 0x7027b4d8
D:FreeRTOS_Sockets.c;1;658742;vSocketClose 0x7027b4d8 658696
D:FreeRTOS_TCP_IP.c;1;658742;vSocketCloseNextTime 0x7027b4d8 0x0
...
D:FreeRTOS_Sockets.c;1;660547;pxClientSocket = pxParentSocket->u.xTCP.pxPeerSocket; 0x0 0x702f5068 2247 0x7027b4d8 658696

I added this log "vSocketCloseNextTime xParent 0x7027b4d8 pxSocket 0x7027b4d8 0x0" right before the suggested changes.
Here you can see that xParent is the same as pxSocket and pxSocket->u.xTCP.pxPeerSocket is NULL.

Regarding priorities:
The IP-Task has the same priority as the client tasks.

@htibosch
Copy link
Contributor

@ATDOR-RIH wrote:

Regarding priorities:
The IP-Task has the same priority as the client tasks.

Normally we recommend to use these priorities:

  • highest: the EMAC / Ethernet task
  • high: the IP-task with ipconfigIP_TASK_PRIORITY
  • normal: all tasks that make use of TCP/IP

Tasks that have nothing to do with TCP/IP are free to choose a priority.

I will comment pxPeerSocket in a later post.

@ATDOR-RIH
Copy link
Author

Here are the extended logs:
logs_suggestion_htibosch_extended.txt

At timestamp 642815 the socket (created at timestamp 639023) is assigned as peer socket of the parent.

Is it possible to set the peer socket of pxFound in xTCPCheckNewClient or is this a bad idea?
if( ( pxFound->ucProtocol == ( uint8_t ) FREERTOS_IPPROTO_TCP ) && ( pxFound->u.xTCP.bits.bPassAccept != pdFALSE_UNSIGNED ) )
{
logll_debug( BASE_FILE_NAME, "pxSocket->u.xTCP.pxPeerSocket = pxFound; 0x%x %llu 0x%x %llu", pxSocket,
pxSocket->ulCreationTime, pxFound, pxFound->ulCreationTime );
pxSocket->u.xTCP.pxPeerSocket = pxFound;
pxFound->u.xTCP.pxPeerSocket = pxSocket;
FreeRTOS_debug_printf( ( "xTCPCheckNewClient[0]: client on port %u\n", pxSocket->usLocalPort ) );
xResult = pdTRUE;
break;
}

@htibosch
Copy link
Contributor

htibosch commented Mar 26, 2024

When I look at your latest logs, I see lots of calls to FreeRTOS_socket(), and all those sockets end up being closed in vSocketCloseNextTime()?
That function is only called as long as a socket has not been passed to the application.
Can you see for what reason vSocketCloseNextTime() is called? Does it reach a timeout? Does it get a bad response (a RST) from the peer?
Could you create and send a PCAP with a few TCP connections? Please zip and attach it.

Another thing: could you run the test while the application has a lower priority than ipconfigIP_TASK_PRIORITY?

@shubnil
Copy link
Member

shubnil commented Mar 27, 2024

Hi @ATDOR-RIH ,
Please check once if the application priority is the lowest. A large number of "FreeRTOS_socket()" call seem to point to the fact that the application priority is not the lowest in the system.

@ATDOR-RIH
Copy link
Author

I think vSocketCloseNextTime() is called because of a RST, but I am not sure.

Changes:

  • Added logs at every call to vTCPStateChange with eCLOSED or eCLOSE_WAIT.
  • Added pxParentSocket to struct xSOCKET and log it when vSocketCloseNextTime is called.

pcap_and_changed_priorities.zip

pcap_and_changed_priorities.zip contains the following folders and files:

  • original_priorities: Logs and Pcap file of a run with the priorities as they were.
  • changed_priorities: Logs and Pcap file of a run with changed priorities.
  • changes.patch: changes based from version 4.0.0
  • changes_from_last_time.patch: changes based from the last patch I sent

folder contents:

  • log.txt
  • original_priorities.pcap/changed_priorities.pcap
  • info.txt: Timestamps and used priorities.

Changing the priorities did not help.
I tried to run the test for around 90 seconds.
With the changed priorities I could not make it past 70 seconds without receiving a hard fault/bus fault.
I also get hard faults with the original priorities but not that frequently.

@htibosch
Copy link
Contributor

Thanks for these tests. This gives a new insight: the client (browser) manages to get a connection, and then immediately it is closing it. This lasts like 15 ms.

Here is an example:

image

So it's not a timeout, it is not cause by an unexpected RST packet. The client is closing the connection immediately for some reason.

I will think about this and come back to it.

@htibosch
Copy link
Contributor

htibosch commented Mar 31, 2024

I made a simulation and get the same pattern of events:

image

When debug printf enabled, I see this logging:

/* Server side: */
       IP   SS  MS  US
x.x.x.104   47.746.997 [IP-Task] Gain: Socket 23 now has 1 / 4 child
x.x.x.104   47.747.053 [IP-Task] prvSocketSetMSS: 1460 bytes for 192.168.2.107 port 50709
x.x.x.104   47.747.125 [IP-Task] prvWinScaleFactor: uxRxWinSize 1 MSS 1460 Factor 0
x.x.x.104   47.747.673 [IP-Task] TCP: passive 23 => 107.2.168.192 port 50709 set ESTAB (scaling 1)
x.x.x.104   47.747.785 [SvrWork] xTelnetRead: new client from 192.168.2.107:50709
x.x.x.104   47.748.433 [SvrWork] xTelnetRead: client 20014ad0 disconnected (rc -128)
x.x.x.104   47.748.495 [IP-Task] Lost: Socket 23 now has 0 / 4 children
x.x.x.104   47.748.578 [IP-Task] FreeRTOS_closesocket[0.0.0.0 port 23 to 192.168.2.107 port 50709]: buffers 14 socks 1

/* Clients side: */
x.x.x.107     78.643.976 [SvrWork] FreeRTOS_connect: 50709 to 192.168.2.104:23
x.x.x.107     78.644.059 [SvrWork] Socket 50709 -> [192.168.2.104]:23 State eCLOSED->eCONNECT_SYN
x.x.x.107     78.644.174 [IP-task] prvSocketSetMSS: 1460 bytes for 192.168.2.104 port 23
x.x.x.107     78.644.255 [IP-task] prvWinScaleFactor: uxRxWinSize 1 MSS 1460 Factor 0
x.x.x.107     78.644.347 [IP-task] Connect[192.168.2.104:23]: next timeout 1: 3000 ms
x.x.x.107     78.644.714 [IP-task] TCP: active 50709 => 104.2.168.192 port 23 set ESTAB (scaling 1)
x.x.x.107     78.644.927 [IP-task] Socket 50709 -> [192.168.2.104]:23 State eCONNECT_SYN->eESTABLISHED
x.x.x.107     78.645.157 [SvrWork] FreeRTOS_connect returns 0
x.x.x.107     78.645.251 [IP-task] Socket 50709 -> [192.168.2.104]:23 State eESTABLISHED->eFIN_WAIT_1
x.x.x.107     78.645.582 [IP-task] vTCPStateChange: Closing socket (Queued 0, Accept 0 Reuse 0)
x.x.x.107     78.645.662 [IP-task] Socket 50709 -> [192.168.2.104]:23 State eFIN_WAIT_1->eCLOSE_WAIT
x.x.x.107     78.645.994 [IP-task] FreeRTOS_closesocket[0.0.0.0 port 50709 to 192.168.2.104 port 23]: buffers 12 socks 3

I made some extra logging in FreeRTOS_TCP_IP.c: vTCPStateChange():

if( ( eTCPState == eCLOSED ) ||
    ( eTCPState == eCLOSE_WAIT ) )
{
    /* Socket goes to status eCLOSED because of a RST.
     * When nobody owns the socket yet, delete it. */
    FreeRTOS_printf( ( "vTCPStateChange: Closing socket (Queued %d, Accept %d Reuse %d)\n",
        pxSocket->u.xTCP.bits.bPassQueued,
        pxSocket->u.xTCP.bits.bPassAccept,
        pxSocket->u.xTCP.bits.bReuseSocket ) );
    ...
}

Would you mind to send me the same type of logging, i.e. by enabling the following in your FreeRTOSIPConfig.h :

/* Your printing routine that doesn't need Ethernet. */
extern void logll_debug( const char *pcFormatString, ... );

#define ipconfigHAS_DEBUG_PRINTF    1

#if( ipconfigHAS_DEBUG_PRINTF == 1 )
    #define FreeRTOS_debug_printf( X )    ( void ) logll_debug X
#endif

#define ipconfigHAS_PRINTF            1

#if( ipconfigHAS_PRINTF == 1 )
    #define FreeRTOS_printf(X)            ( void ) logll_debug X
#endif

I can not replicate the problem. It would help to understand the order of events.
Who is calling vTCPStateChange() with the parameter eCLOSED or eCLOSE_WAIT?

EDIT : I'm one step further: when I delay the call to FreeRTOS_accept(), the IP-task will attempt to close the socket.

@htibosch
Copy link
Contributor

htibosch commented Apr 1, 2024

Hi @ATDOR-RIH, after attempts to reproduce your problem, here is a possible solution:

Source_hein.zip

Please merge them into you repo using a source compare program.

It looks like it is possible that the field pxPeerSocket is still set when the application calls FreeRTOS_accept().

EDIT whereas pxPeerSocket was already released by FreeRTOS_closesocket().

I'm not claiming that all is solved, but I would be very curious if you can show the logging produced by the TCP/IP stack.

@ATDOR-RIH
Copy link
Author

ATDOR-RIH commented Apr 2, 2024

Hi @htibosch, logs.zip contains three different logs.

  • additional_logs: Here I just activated the logs you suggested in the previous comment. I also removed some of the logs I added.
  • changes_from_Source_hein: Here I applied the changes from Source_hein.zip. I am working on 4.0.0 and I think the files you provided are based on main. I compared the changes to main and copied them to the version I am currently on. The changed files are also in the folder. I hope I did not forget anything. I got a bus fault with this version.
  • change_condition: Here I changed a condition in FreeRTOS_TCP_IP.c (file is included). This one also resulted in a bus fault.

logs.zip

@htibosch
Copy link
Contributor

htibosch commented Apr 5, 2024

There are too many differences between your and my source code. Please find my complete directory "source":

complete_sources_hein.zip

except for the directory "portable".

I tested this version with real hardware, and the problem looks "solved".
Could you please try it and send me the logging output?

Thanks for you patience and corporation, Hein

@ATDOR-RIH
Copy link
Author

Hi @htibosch, sorry for the delay.

There was no fault but freed memory is still being accessed. But it seems to happen a lot less.

Here are the logs of three runs as well as the source files I used. I kept the logs I added but otherwise it should be the same as your files.

logs.zip

@htibosch
Copy link
Contributor

Can you please point out where in the logfile you can see that freed memory is being accessed?
Which statement/function?

@ATDOR-RIH
Copy link
Author

This is from log1.txt. It is similar for the other log files.

Here 0x703c7018 is set to be closed the next time in FreeRTOS_TCP_IP.c vTCPStateChange(FreeRTOS_Socket_t * pxSocket, enum eTCP_STATE eTCPState). 0x702e8920 is the parent of this socket.

D:FreeRTOS_TCP_IP.c;1;241165;vSocketCloseNextTime xParent 0x703c7018 pxSocket 0x703c7018 0x0 0x702e8920

Here the socket is closed.

D:FreeRTOS_Sockets.c;1;241165;vSocketClose 0x703c7018 240696

Here 0x703c7018 is still the peer socket of 0x702e8920 and is accessed here: if( pxClientSocket->u.xTCP.bits.bPassAccept != pdFALSE_UNSIGNED ) in FreeRTOS_Sockets.c prvAcceptWaitClient(FreeRTOS_Socket_t * pxParentSocket, struct freertos_sockaddr * pxAddress, socklen_t * pxAddressLength)

D:FreeRTOS_Sockets.c;1;243123;pxClientSocket = pxParentSocket->u.xTCP.pxPeerSocket; 0x0 0x702e8920 2368 0x703c7018 240696

pxClientSocket (0x703c7018) is already free when accessing pxClientSocket->u.xTCP.bits.bPassAccept. In this case the memory is still free and still has the correct value, but there is no guarantee that it does not get allocated and overwritten.

@htibosch
Copy link
Contributor

How strange, both with your source and my sources, I can not replicate the problem you mention.

As you know:

  • for a TCP server, pxPeerSocket points to a TCP client
  • for a TCP client, pxPeerSocket points to its TCP server

In your case, ``server->pxPeerSocket` doesn't get cleared.

Please find a proposal to further investigate this: FreeRTOS_TCP_IP_hein.zip

I added a function:

BaseType_t vTCPRemoveTCPChild( const FreeRTOS_Socket_t * pxChildSocket )

It will iterate through all sockets to find a socket the is the parent of pxChildSocket. When found, the reference will be clear ( set to NULL ).

@ATDOR-RIH
Copy link
Author

Hi @htibosch,
yes with this change it is working.

log.txt

server->pxPeerSocket doesn't get cleared because client->pxPeerSocket is NULL at some point when vSocketCloseNextTime is called.
Log: vSocketCloseNextTime xParent
In this log line you can see that client->pxPeerSocket is NULL but client->pxParentSocket has the the address of the server.
But I don't know why client->pxPeerSocket is NULL.
I thought it was because of this:

/* Don't need to access the parent socket anymore, so the
* reference 'pxPeerSocket' may be cleared. */
pxSocket->u.xTCP.pxPeerSocket = NULL;

But this is already removed in the source I am using and it is still happening.

I will try to find out where client->pxPeerSocket is set to NULL or if it is not set at all. But this might take some time.

@htibosch
Copy link
Contributor

htibosch commented Apr 20, 2024

That is good news!
EDIT
Unfortunately I do not have the time to try to reproduce it. Maybe you should also focus on the application code that uses the library.
If you have a new finding, please report here, and we can change the above proposals into a PR. Thanks so far, Hein

@htibosch
Copy link
Contributor

Could it be that you do not have ipconfigTCP_HANG_PROTECTION defined?

In that case, see FreeRTOS_TCP_State_Handling.c, around line 995 :

    BaseType_t prvTCPSocketCopy( FreeRTOS_Socket_t * pxNewSocket,
                                 FreeRTOS_Socket_t * pxSocket )
    {
    ...
    #if ( ipconfigTCP_HANG_PROTECTION == 1 )
        {
            /* Only when there is anti-hanging protection, a socket may become an
             * orphan temporarily.  Once this socket is really connected, the owner of
             * the server socket will be notified. */

            /* When bPassQueued is true, the socket is an orphan until it gets
             * connected. */
            pxNewSocket->u.xTCP.bits.bPassQueued = pdTRUE_UNSIGNED;
            pxNewSocket->u.xTCP.pxPeerSocket = pxSocket;
        }
        #else
        {
            /* A reference to the new socket may be stored and the socket is marked
             * as 'passable'. */

            /* When bPassAccept is true, this socket may be returned in a call to
             * accept(). */
            pxNewSocket->u.xTCP.bits.bPassAccept = pdTRUE_UNSIGNED;
+            /* Assign the parent 'pxSocket' to the child socket. */
+            pxNewSocket->u.xTCP.pxPeerSocket = pxSocket;

            if( pxSocket->u.xTCP.pxPeerSocket == NULL )
            {
                pxSocket->u.xTCP.pxPeerSocket = pxNewSocket;
            }
        }
        #endif /* if ( ipconfigTCP_HANG_PROTECTION == 1 ) */

Could you test the above patch while disabling the new function vTCPRemoveTCPChild()? This only makes sense if you have ipconfigTCP_HANG_PROTECTION = 0.

Thanks

@ATDOR-RIH
Copy link
Author

Hi @htibosch,

yes you are right. ipconfigTCP_HANG_PROTECTION is 0.
I even tested with with ipconfigTCP_HANG_PROTECTION set to 0 and 1 on March 7 and then forgot about it somewhen.

logs.zip

logs.zip contains logs with ipconfigTCP_HANG_PROTECTION set to 1 for different versions.

  • log_log_disabled_vTCPRemoveTCPChild.txt: The changes you provided on April 15 with vTCPRemoveTCPChild() disabled.
  • log_complete_sources_hein.txt: The version you provided on April 5.
  • log_sources_hein.txt: The changes you provided on April 1.
  • log_march_26.txt: The version I used before getting your changes, where I changed the priorities according to your suggestion on March 26.

With the version from March 26 the problem still exists with ipconfigTCP_HANG_PROTECTION set to 1.
Since the changes from April 1 the problem is solved, if ipconfigTCP_HANG_PROTECTION is set to 1.

Thanks for your help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants