{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":127325712,"defaultBranch":"master","name":"linux","ownerLogin":"Netflix-Skunkworks","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2018-03-29T17:35:09.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/1728142?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1711996593.0","currentOid":""},"activityList":{"items":[{"before":"fbf1c7cfb7898d83d61527c4f43cf5c64ef22f69","after":"3b96fcc60b0aea671a857679049413aa65b66e7d","ref":"refs/heads/hli/fix_tcp_window","pushedAt":"2024-04-08T21:56:29.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"tcp: increase the default TCP scaling ratio\n\nAfter commit dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\"),\nwe noticed an application-level timeout due to reduced throughput.\n\nBefore the commit, for a client that sets SO_RCVBUF to 65k, it takes\naround 22 seconds to transfer 10M data. After the commit, it takes 40\nseconds. Because our application has a 30-second timeout, this\nregression broke the application.\n\nThe reason that it takes longer to transfer data is that\ntp->scaling_ratio is initialized to a value that results in ~0.25 of\nrcvbuf. In our case, SO_RCVBUF is set to 65536 by the application, which\ntranslates to 2 * 65536 = 131,072 bytes in rcvbuf and hence a ~28k\ninitial receive window.\n\nLater, even though the scaling_ratio is updated to a more accurate\nskb->len/skb->truesize, which is ~0.66 in our environment, the window\nstays at ~0.25 * rcvbuf. This is because tp->window_clamp does not\nchange together with the tp->scaling_ratio update. As a result, the\nwindow size is capped at the initial window_clamp, which is also ~0.25 *\nrcvbuf, and never grows bigger.\n\nThis patch increases the initial scaling_ratio from ~25% to 50% in order\nto be backward compatible with the original sysctl_tcp_adv_win_scale\nsysctl.\n\nFixes: dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\")\nSigned-off-by: Hechao Li \nReviewed-by: Tycho Andersen ","shortMessageHtmlLink":"tcp: increase the default TCP scaling ratio"}},{"before":"7821e49df1ddeaa78c773da3120e445f0c504631","after":"fbf1c7cfb7898d83d61527c4f43cf5c64ef22f69","ref":"refs/heads/hli/fix_tcp_window","pushedAt":"2024-04-02T00:55:33.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"tcp: update window_clamp together with scaling_ratio\n\nAfter commit dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\"),\nwe noticed an application-level timeout due to reduced throughput. This\ncan be reproduced by the following minimal client and server program.\n\nserver:\n\nint main(int argc, char *argv[]) {\n int sockfd;\n char buffer[256];\n struct sockaddr_in srv_addr;\n\n // Create socket\n sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);\n if (sockfd < 0) {\n perror(\"server: socket()\");\n return -1;\n }\n bzero((char *) &srv_addr, sizeof(srv_addr));\n srv_addr.sin_family = AF_INET;\n srv_addr.sin_addr.s_addr = htonl(INADDR_ANY);\n srv_addr.sin_port = htons(8080);\n // Bind socket\n if (bind(sockfd, (struct sockaddr *) &srv_addr,\n\t sizeof(srv_addr)) < 0) {\n perror(\"server: bind()\");\n close(sockfd);\n return -1;\n }\n // Listen for connections\n listen(sockfd,5);\n\n while(1) {\n int filefd = -1, newsockfd = -1;\n struct sockaddr_in cli_addr;\n socklen_t cli_len = sizeof(cli_addr);\n\n // Accept connection\n newsockfd = accept(sockfd, (struct sockaddr *)&cli_addr, &cli_len);\n if (newsockfd < 0) {\n perror(\"server: accept()\");\n goto end;\n }\n // Read filename from client\n bzero(buffer, sizeof(buffer));\n ssize_t n = read(newsockfd,buffer,sizeof(buffer)-1);\n if (n < 0) {\n perror(\"server: read()\");\n goto end;\n }\n // Open file\n filefd = open(buffer, O_RDONLY);\n if (filefd < 0) {\n perror(\"server: read()\");\n goto end;\n }\n // Get file size\n struct stat file_stat;\n if(fstat(filefd, &file_stat) < 0) {\n perror(\"server: fstat()\");\n goto end;\n }\n // Send file\n off_t offset = 0;\n ssize_t bytes_sent = 0, bytes_left = file_stat.st_size;\n while ((bytes_sent = sendfile(newsockfd, filefd,\n\t\t\t\t &offset, bytes_left)) > 0) {\n bytes_left -= bytes_sent;\n }\n\nend:\n // Close file and client socket\n if (filefd > 0) {\n close(filefd);\n }\n if (newsockfd > 0) {\n close(newsockfd);\n }\n }\n close(sockfd);\n return 0;\n}\n\nclient:\n\nint main(int argc, char *argv[]) {\n int sockfd, filefd;\n char *server_addr = argv[1];\n char *filename = argv[2];\n struct sockaddr_in sockaddr;\n char buffer[256];\n ssize_t n;\n\n if ((sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) == -1) {\n perror(\"client: socket()\");\n return -1;\n }\n\n sockaddr.sin_family = AF_INET;\n inet_pton(AF_INET, server_addr, &sockaddr.sin_addr);\n sockaddr.sin_port = htons(8080);\n\n int val = 65536;\n if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF,\n\t\t &val, sizeof(val)) < 0) {\n perror(\"client: setockopt(SO_RCVBUF)\");\n return -1;\n }\n if (connect(sockfd, (struct sockaddr*)&sockaddr,\n\t\tsizeof(sockaddr)) == -1) {\n close(sockfd);\n perror(\"client: connect()\");\n return -1;\n }\n\n // Send filename to server\n n = write(sockfd, filename, strlen(filename));\n if (n < 0) {\n perror(\"client: write()\");\n return -1;\n }\n // Open file\n filefd = open(filename, O_WRONLY | O_CREAT, 0666);\n if(filefd < 0) {\n perror(\"client: open()\");\n return -1;\n }\n // Read file from server\n while((n = read(sockfd, buffer, sizeof(buffer))) > 0) {\n write(filefd, buffer, n);\n }\n // Close file and socket\n close(filefd);\n close(sockfd);\n return 0;\n}\n\nBefore the commit, it takes around 22 seconds to transfer 10M data.\nAfter the commit, it takes 40 seconds. Because our application has a\n30-second timeout, this regression broke the application.\n\nThe reason that it takes longer to transfer data is that\ntp->scaling_ratio is initialized to a value that results in ~0.25 of\nrcvbuf. In our case, SO_RCVBUF is set to 65536 by the application, which\ntranslates to 2 * 65536 = 131,072 bytes in rcvbuf and hence a ~28k\ninitial receive window.\n\nLater, even though the scaling_ratio is updated to a more accurate\nskb->len/skb->truesize, which is ~0.66 in our environment, the window\nstays at ~0.25 * rcvbuf. This is because tp->window_clamp does not\nchange together with the tp->scaling_ratio update. As a result, the\nwindow size is capped at the initial window_clamp, which is also ~0.25 *\nrcvbuf, and never grows bigger.\n\nThis patch updates window_clamp along with scaling_ratio. It changes the\ncalculation of the initial rcv_wscale as well to make sure the scale\nfactor is also not capped by the initial window_clamp.\n\nSigned-off-by: Hechao Li \nFixes: dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\")","shortMessageHtmlLink":"tcp: update window_clamp together with scaling_ratio"}},{"before":"1f18f84b00b77970bbd43de244ebd268d7d26d59","after":"7821e49df1ddeaa78c773da3120e445f0c504631","ref":"refs/heads/hli/fix_tcp_window","pushedAt":"2024-04-01T23:52:39.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"tcp: update window_clamp together with scaling_ratio\n\nAfter commit dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\"),\nwe noticed an application-level timeout due to reduced throughput. This\ncan be reproduced by the following minimal client and server program.\n\nserver:\n\nint main(int argc, char *argv[]) {\n int sockfd;\n char buffer[256];\n struct sockaddr_in srv_addr;\n\n // Create socket\n sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);\n if (sockfd < 0) {\n perror(\"server: socket()\");\n return -1;\n }\n bzero((char *) &srv_addr, sizeof(srv_addr));\n srv_addr.sin_family = AF_INET;\n srv_addr.sin_addr.s_addr = htonl(INADDR_ANY);\n srv_addr.sin_port = htons(8080);\n // Bind socket\n if (bind(sockfd, (struct sockaddr *) &srv_addr,\n\t sizeof(srv_addr)) < 0) {\n perror(\"server: bind()\");\n close(sockfd);\n return -1;\n }\n // Listen for connections\n listen(sockfd,5);\n\n while(1) {\n int filefd = -1, newsockfd = -1;\n struct sockaddr_in cli_addr;\n socklen_t cli_len = sizeof(cli_addr);\n\n // Accept connection\n newsockfd = accept(sockfd, (struct sockaddr *)&cli_addr, &cli_len);\n if (newsockfd < 0) {\n perror(\"server: accept()\");\n goto end;\n }\n // Read filename from client\n bzero(buffer, sizeof(buffer));\n ssize_t n = read(newsockfd,buffer,sizeof(buffer)-1);\n if (n < 0) {\n perror(\"server: read()\");\n goto end;\n }\n // Open file\n filefd = open(buffer, O_RDONLY);\n if (filefd < 0) {\n perror(\"server: read()\");\n goto end;\n }\n // Get file size\n struct stat file_stat;\n if(fstat(filefd, &file_stat) < 0) {\n perror(\"server: fstat()\");\n goto end;\n }\n // Send file\n off_t offset = 0;\n ssize_t bytes_sent = 0, bytes_left = file_stat.st_size;\n while ((bytes_sent = sendfile(newsockfd, filefd,\n\t\t\t\t &offset, bytes_left)) > 0) {\n bytes_left -= bytes_sent;\n }\n\nend:\n // Close file and client socket\n if (filefd > 0) {\n close(filefd);\n }\n if (newsockfd > 0) {\n close(newsockfd);\n }\n }\n close(sockfd);\n return 0;\n}\n\nclient:\n\nint main(int argc, char *argv[]) {\n int sockfd, filefd;\n char *server_addr = argv[1];\n char *filename = argv[2];\n struct sockaddr_in sockaddr;\n char buffer[256];\n ssize_t n;\n\n if ((sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) == -1) {\n perror(\"client: socket()\");\n return -1;\n }\n\n sockaddr.sin_family = AF_INET;\n inet_pton(AF_INET, server_addr, &sockaddr.sin_addr);\n sockaddr.sin_port = htons(8080);\n\n int val = 65536;\n if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF,\n\t\t &val, sizeof(val)) < 0) {\n perror(\"client: setockopt(SO_RCVBUF)\");\n return -1;\n }\n if (connect(sockfd, (struct sockaddr*)&sockaddr,\n\t\tsizeof(sockaddr)) == -1) {\n close(sockfd);\n perror(\"client: connect()\");\n return -1;\n }\n\n // Send filename to server\n n = write(sockfd, filename, strlen(filename));\n if (n < 0) {\n perror(\"client: write()\");\n return -1;\n }\n // Open file\n filefd = open(filename, O_WRONLY | O_CREAT, 0666);\n if(filefd < 0) {\n perror(\"client: open()\");\n return -1;\n }\n // Read file from server\n while((n = read(sockfd, buffer, sizeof(buffer))) > 0) {\n write(filefd, buffer, n);\n }\n // Close file and socket\n close(filefd);\n close(sockfd);\n return 0;\n}\n\nBefore the commit, it takes around 22 seconds to transfer 10M data.\nAfter the commit, it takes 40 seconds. Because our application has a\n30-second timeout, this regression broke the application.\n\nThe reason that it takes longer to transfer data is that\ntp->scaling_ratio is initialized to a value that results in ~0.25 of\nrcvbuf. In our case, SO_RCVBUF is set to 65536 by the application, which\ntranslates to 2 * 65536 = 131,072 bytes in rcvbuf and hence a ~28k\ninitial receive window.\n\nLater, even though the scaling_ratio is updated to a more accurate\nskb->len/skb->truesize, which is ~0.66 in our environment, the window\nstays at ~0.25 * rcvbuf. This is because tp->window_clamp does not\nchange together with the tp->scaling_ratio update. As a result, the\nwindow size is capped at the initial window_clamp, which is also ~0.25 *\nrcvbuf, and never grows bigger.\n\nThis patch updates window_clamp along with scaling_ratio. It changes the\ncalculation of the initial rcv_wscale as well to make sure the scale\nfactor is also not capped by the initial window_clamp.\n\nSigned-off-by: Hechao Li \nFixes: dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\")","shortMessageHtmlLink":"tcp: update window_clamp together with scaling_ratio"}},{"before":"239d3ccefcc8323bebbcbe38fd27c7a279b8e993","after":"1f18f84b00b77970bbd43de244ebd268d7d26d59","ref":"refs/heads/hli/fix_tcp_window","pushedAt":"2024-04-01T23:26:31.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"tcp: update window_clamp together with scaling_ratio\n\nAfter commit dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\"),\nwe noticed an application-level timeout due to reduced throughput. This\ncan be reproduced by the following minimal client and server program.\n\nserver:\n\nint main(int argc, char *argv[]) {\n int sockfd;\n char buffer[256];\n struct sockaddr_in srv_addr;\n\n // Create socket\n sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);\n if (sockfd < 0) {\n perror(\"server: socket()\");\n return -1;\n }\n bzero((char *) &srv_addr, sizeof(srv_addr));\n srv_addr.sin_family = AF_INET;\n srv_addr.sin_addr.s_addr = htonl(INADDR_ANY);\n srv_addr.sin_port = htons(8080);\n // Bind socket\n if (bind(sockfd, (struct sockaddr *) &srv_addr,\n\t sizeof(srv_addr)) < 0) {\n perror(\"server: bind()\");\n close(sockfd);\n return -1;\n }\n // Listen for connections\n listen(sockfd,5);\n\n while(1) {\n int filefd = -1, newsockfd = -1;\n struct sockaddr_in cli_addr;\n socklen_t cli_len = sizeof(cli_addr);\n\n // Accept connection\n newsockfd = accept(sockfd, (struct sockaddr *)&cli_addr, &cli_len);\n if (newsockfd < 0) {\n perror(\"server: accept()\");\n goto end;\n }\n // Read filename from client\n bzero(buffer, sizeof(buffer));\n ssize_t n = read(newsockfd,buffer,sizeof(buffer)-1);\n if (n < 0) {\n perror(\"server: read()\");\n goto end;\n }\n // Open file\n filefd = open(buffer, O_RDONLY);\n if (filefd < 0) {\n perror(\"server: read()\");\n goto end;\n }\n // Get file size\n struct stat file_stat;\n if(fstat(filefd, &file_stat) < 0) {\n perror(\"server: fstat()\");\n goto end;\n }\n // Send file\n off_t offset = 0;\n ssize_t bytes_sent = 0, bytes_left = file_stat.st_size;\n while ((bytes_sent = sendfile(newsockfd, filefd,\n\t\t\t\t &offset, bytes_left)) > 0) {\n bytes_left -= bytes_sent;\n }\n\nend:\n // Close file and client socket\n if (filefd > 0) {\n close(filefd);\n }\n if (newsockfd > 0) {\n close(newsockfd);\n }\n }\n close(sockfd);\n return 0;\n}\n\nclient:\n\nint main(int argc, char *argv[]) {\n int sockfd, filefd;\n char *server_addr = argv[1];\n char *filename = argv[2];\n struct sockaddr_in sockaddr;\n char buffer[256];\n ssize_t n;\n\n if ((sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) == -1) {\n perror(\"client: socket()\");\n return -1;\n }\n\n sockaddr.sin_family = AF_INET;\n inet_pton(AF_INET, server_addr, &sockaddr.sin_addr);\n sockaddr.sin_port = htons(8080);\n\n int val = 65536;\n if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF,\n\t\t &val, sizeof(val)) < 0) {\n perror(\"client: setockopt(SO_RCVBUF)\");\n return -1;\n }\n if (connect(sockfd, (struct sockaddr*)&sockaddr,\n\t\tsizeof(sockaddr)) == -1) {\n close(sockfd);\n perror(\"client: connect()\");\n return -1;\n }\n\n // Send filename to server\n n = write(sockfd, filename, strlen(filename));\n if (n < 0) {\n perror(\"client: write()\");\n return -1;\n }\n // Open file\n filefd = open(filename, O_WRONLY | O_CREAT, 0666);\n if(filefd < 0) {\n perror(\"client: open()\");\n return -1;\n }\n // Read file from server\n while((n = read(sockfd, buffer, sizeof(buffer))) > 0) {\n write(filefd, buffer, n);\n }\n // Close file and socket\n close(filefd);\n close(sockfd);\n return 0;\n}\n\nBefore the commit, it takes around 22 seconds to transfer 10M data.\nAfter the commit, it takes 40 seconds. Because our application has a\n30-second timeout, this regression broke the application.\n\nThe reason that it takes longer to transfer data is that\ntp->scaling_ratio is initialized to a value that results in ~0.25 of\nrcvbuf. In our case, SO_RCVBUF is set to 65536 by the application, which\ntranslates to 2 * 65536 = 131,072 bytes in rcvbuf and hence a ~28k\ninitial receive window.\n\nLater, even though the scaling_ratio is updated to a more accurate\nskb->len/skb->truesize, which is ~0.66 in our environment, the window\nstays at ~0.25 * rcvbuf. This is because tp->window_clamp does not\nchange together with the tp->scaling_ratio update. As a result, the\nwindow size is capped at the initial window_clamp, which is also ~0.25 *\nrcvbuf, and never grows bigger.\n\nThis patch updates window_clamp along with scaling_ratio. It changes the\ncalculation of the initial rcv_wscale as well to make sure the scale\nfactor is also not capped by the initial window_clamp.\n\nSigned-off-by: Hechao Li \nFixes: dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\")","shortMessageHtmlLink":"tcp: update window_clamp together with scaling_ratio"}},{"before":"373f227c4e9fd551008797e22fb213aff7fd34ba","after":"239d3ccefcc8323bebbcbe38fd27c7a279b8e993","ref":"refs/heads/hli/fix_tcp_window","pushedAt":"2024-04-01T21:52:19.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"tcp: update window_clamp together with scaling_ratio\n\nAfter commit dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\"),\nwe noticed an application-level timeout due to reduced throughput. This\ncan be reproduced by the following minimal client and server program.\n\nserver:\n\nint main(int argc, char *argv[]) {\n int sockfd;\n char buffer[256];\n struct sockaddr_in srv_addr;\n\n // Create socket\n sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);\n if (sockfd < 0) {\n perror(\"server: socket()\");\n return -1;\n }\n bzero((char *) &srv_addr, sizeof(srv_addr));\n srv_addr.sin_family = AF_INET;\n srv_addr.sin_addr.s_addr = htonl(INADDR_ANY);\n srv_addr.sin_port = htons(8080);\n // Bind socket\n if (bind(sockfd, (struct sockaddr *) &srv_addr,\n\t sizeof(srv_addr)) < 0) {\n perror(\"server: bind()\");\n close(sockfd);\n return -1;\n }\n // Listen for connections\n listen(sockfd,5);\n\n while(1) {\n int filefd = -1, newsockfd = -1;\n struct sockaddr_in cli_addr;\n socklen_t cli_len = sizeof(cli_addr);\n\n // Accept connection\n newsockfd = accept(sockfd, (struct sockaddr *)&cli_addr, &cli_len);\n if (newsockfd < 0) {\n perror(\"server: accept()\");\n goto end;\n }\n // Read filename from client\n bzero(buffer, sizeof(buffer));\n ssize_t n = read(newsockfd,buffer,sizeof(buffer)-1);\n if (n < 0) {\n perror(\"server: read()\");\n goto end;\n }\n // Open file\n filefd = open(buffer, O_RDONLY);\n if (filefd < 0) {\n perror(\"server: read()\");\n goto end;\n }\n // Get file size\n struct stat file_stat;\n if(fstat(filefd, &file_stat) < 0) {\n perror(\"server: fstat()\");\n goto end;\n }\n // Send file\n off_t offset = 0;\n ssize_t bytes_sent = 0, bytes_left = file_stat.st_size;\n while ((bytes_sent = sendfile(newsockfd, filefd,\n\t\t\t\t &offset, bytes_left)) > 0) {\n bytes_left -= bytes_sent;\n }\n\nend:\n // Close file and client socket\n if (filefd > 0) {\n close(filefd);\n }\n if (newsockfd > 0) {\n close(newsockfd);\n }\n }\n close(sockfd);\n return 0;\n}\n\nclient:\n\nint main(int argc, char *argv[]) {\n int sockfd, filefd;\n char *server_addr = argv[1];\n char *filename = argv[2];\n struct sockaddr_in sockaddr;\n char buffer[256];\n ssize_t n;\n\n if ((sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) == -1) {\n perror(\"client: socket()\");\n return -1;\n }\n\n sockaddr.sin_family = AF_INET;\n inet_pton(AF_INET, server_addr, &sockaddr.sin_addr);\n sockaddr.sin_port = htons(8080);\n\n int val = 65536;\n if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF,\n\t\t &val, sizeof(val)) < 0) {\n perror(\"client: setockopt(SO_RCVBUF)\");\n return -1;\n }\n if (connect(sockfd, (struct sockaddr*)&sockaddr,\n\t\tsizeof(sockaddr)) == -1) {\n close(sockfd);\n perror(\"client: connect()\");\n return -1;\n }\n\n // Send filename to server\n n = write(sockfd, filename, strlen(filename));\n if (n < 0) {\n perror(\"client: write()\");\n return -1;\n }\n // Open file\n filefd = open(filename, O_WRONLY | O_CREAT, 0666);\n if(filefd < 0) {\n perror(\"client: open()\");\n return -1;\n }\n // Read file from server\n while((n = read(sockfd, buffer, sizeof(buffer))) > 0) {\n write(filefd, buffer, n);\n }\n // Close file and socket\n close(filefd);\n close(sockfd);\n return 0;\n}\n\nBefore the commit, it takes around 22 seconds to transfer 10M data.\nAfter the commit, it takes 40 seconds. Because our application has a\n30-second timeout, this regression broke the application.\n\nThe reason that it takes longer to transfer data is that\ntp->scaling_ratio is initialized to a value that results in ~0.25 of\nrcvbuf. In our case, SO_RCVBUF is set to 65536 by the application, which\ntranslates to 2 * 65536 = 131,072 bytes in rcvbuf and hence a ~28k\ninitial receive window.\n\nLater, even though the scaling_ratio is updated to a more accurate\nskb->len/skb->truesize, which is ~0.66 in our environment, the window\nstays at ~0.25 * rcvbuf. This is because tp->window_clamp does not\nchange together with the tp->scaling_ratio update. As a result, the\nwindow size is capped at the initial window_clamp, which is also ~0.25 *\nrcvbuf, and never grows bigger.\n\nThis patch updates window_clamp along with scaling_ratio. It changes the\ncalculation of the initial rcv_wscale as well to make sure the scale\nfactor is also not capped by the initial window_clamp.\n\nSigned-off-by: Hechao Li \nFixes: dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\")","shortMessageHtmlLink":"tcp: update window_clamp together with scaling_ratio"}},{"before":null,"after":"70ecca0b6071ff7d3aaa95753dd887249f85b90e","ref":"refs/heads/hli/nflx-v6.7.9","pushedAt":"2024-04-01T18:36:33.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"perf build: fix out of tree build\n\nIt seems that a previous modification to sysreg-defs, which corrected\nemitting the headaer to the specified output directory, exposed missing\nsubdir, prefix variables. This breaks out of tree builds of perf as the\nfile is now built into the output directory, but still tries to descend\ninto output directory as a subdir.\n\nFixes: a29ee6aea7\nSigned-off-by: Ethan Adams ","shortMessageHtmlLink":"perf build: fix out of tree build"}},{"before":null,"after":"373f227c4e9fd551008797e22fb213aff7fd34ba","ref":"refs/heads/hli/fix_tcp_window","pushedAt":"2024-04-01T18:11:07.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"tcp: update window_clamp together with scaling_ratio\n\nAfter commit dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\"),\nwe noticed an application-level timeout due to reduced throughput. This\ncan be reproduced by the following minimal client and server program.\n\nserver:\n\nint main(int argc, char *argv[]) {\n int sockfd;\n char buffer[256];\n struct sockaddr_in srv_addr;\n\n // Create socket\n sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);\n if (sockfd < 0) {\n perror(\"server: socket()\");\n return -1;\n }\n bzero((char *) &srv_addr, sizeof(srv_addr));\n srv_addr.sin_family = AF_INET;\n srv_addr.sin_addr.s_addr = htonl(INADDR_ANY);\n srv_addr.sin_port = htons(8080);\n // Bind socket\n if (bind(sockfd, (struct sockaddr *) &srv_addr,\n\t sizeof(srv_addr)) < 0) {\n perror(\"server: bind()\");\n close(sockfd);\n return -1;\n }\n // Listen for connections\n listen(sockfd,5);\n\n while(1) {\n int filefd = -1, newsockfd = -1;\n struct sockaddr_in cli_addr;\n socklen_t cli_len = sizeof(cli_addr);\n\n // Accept connection\n newsockfd = accept(sockfd, (struct sockaddr *)&cli_addr, &cli_len);\n if (newsockfd < 0) {\n perror(\"server: accept()\");\n goto end;\n }\n // Read filename from client\n bzero(buffer, sizeof(buffer));\n ssize_t n = read(newsockfd,buffer,sizeof(buffer)-1);\n if (n < 0) {\n perror(\"server: read()\");\n goto end;\n }\n // Open file\n filefd = open(buffer, O_RDONLY);\n if (filefd < 0) {\n perror(\"server: read()\");\n goto end;\n }\n // Get file size\n struct stat file_stat;\n if(fstat(filefd, &file_stat) < 0) {\n perror(\"server: fstat()\");\n goto end;\n }\n // Send file\n off_t offset = 0;\n ssize_t bytes_sent = 0, bytes_left = file_stat.st_size;\n while ((bytes_sent = sendfile(newsockfd, filefd,\n\t\t\t\t &offset, bytes_left)) > 0) {\n bytes_left -= bytes_sent;\n }\n\nend:\n // Close file and client socket\n if (filefd > 0) {\n close(filefd);\n }\n if (newsockfd > 0) {\n close(newsockfd);\n }\n }\n close(sockfd);\n return 0;\n}\n\nclient:\n\nint main(int argc, char *argv[]) {\n int sockfd, filefd;\n char *server_addr = argv[1];\n char *filename = argv[2];\n struct sockaddr_in sockaddr;\n char buffer[256];\n ssize_t n;\n\n if ((sockfd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) == -1) {\n perror(\"client: socket()\");\n return -1;\n }\n\n sockaddr.sin_family = AF_INET;\n inet_pton(AF_INET, server_addr, &sockaddr.sin_addr);\n sockaddr.sin_port = htons(8080);\n\n int val = 65536;\n if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF,\n\t\t &val, sizeof(val)) < 0) {\n perror(\"client: setockopt(SO_RCVBUF)\");\n return -1;\n }\n if (connect(sockfd, (struct sockaddr*)&sockaddr,\n\t\tsizeof(sockaddr)) == -1) {\n close(sockfd);\n perror(\"client: connect()\");\n return -1;\n }\n\n // Send filename to server\n n = write(sockfd, filename, strlen(filename));\n if (n < 0) {\n perror(\"client: write()\");\n return -1;\n }\n // Open file\n filefd = open(filename, O_WRONLY | O_CREAT, 0666);\n if(filefd < 0) {\n perror(\"client: open()\");\n return -1;\n }\n // Read file from server\n while((n = read(sockfd, buffer, sizeof(buffer))) > 0) {\n write(filefd, buffer, n);\n }\n // Close file and socket\n close(filefd);\n close(sockfd);\n return 0;\n}\n\nBefore the commit, it takes around 22 seconds to transfer 10M data.\nAfter the commit, it takes 40 seconds. Because our application has a\n30-second timeout, this regression broke the application.\n\nThe reason that it takes longer to transfer data is that\ntp->scaling_ratio is initialized to a value that results in ~0.25 of\nrcvbuf. In our case, SO_RCVBUF is set to 65536 by the application, which\ntranslates to 2 * 65536 = 131,072 bytes in rcvbuf and hence a ~28k\ninitial receive window.\n\nLater, even though the scaling_ratio is updated to a more accurate\nskb->len/skb->truesize, which is ~0.66 in our environment, the window\nstays at ~0.25 * rcvbuf. This is because tp->window_clamp does not\nchange together with the tp->scaling_ratio update. As a result, the\nwindow size is capped at the initial window_clamp, which is also ~0.25 *\nrcvbuf, and never grows bigger.\n\nThis patch updates window_clamp along with scaling_ratio. It changes the\ncalculation of the initial rcv_wscale as well to make sure the scale\nfactor is also not capped by the initial window_clamp.\n\nSigned-off-by: Hechao Li \nFixes: dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\")","shortMessageHtmlLink":"tcp: update window_clamp together with scaling_ratio"}},{"before":"cc1a016737154f2cb226d4293657e343e1d8321b","after":"d5fd3f9ae5a1bbd6c7b94fdab4a4ccbe42cc7ce4","ref":"refs/heads/hli/dummy","pushedAt":"2024-03-28T08:34:31.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"Increase TCP_DEFAULT_SCALING_RATIO","shortMessageHtmlLink":"Increase TCP_DEFAULT_SCALING_RATIO"}},{"before":"2874e59849db999df4e1100def3d6f744660eec6","after":"cc1a016737154f2cb226d4293657e343e1d8321b","ref":"refs/heads/hli/dummy","pushedAt":"2024-03-28T08:21:38.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"Increase TCP_DEFAULT_SCALING_RATIO","shortMessageHtmlLink":"Increase TCP_DEFAULT_SCALING_RATIO"}},{"before":"29b0f13087740f75d359296e5e081a968879d8d1","after":"2874e59849db999df4e1100def3d6f744660eec6","ref":"refs/heads/hli/dummy","pushedAt":"2024-03-28T07:14:29.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"Increase TCP_DEFAULT_SCALING_RATIO","shortMessageHtmlLink":"Increase TCP_DEFAULT_SCALING_RATIO"}},{"before":"590697d1546b9e376f3d7ec0d6af734e4cdfeffb","after":"29b0f13087740f75d359296e5e081a968879d8d1","ref":"refs/heads/hli/dummy","pushedAt":"2024-03-28T06:26:17.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"Increase TCP_DEFAULT_SCALING_RATIO","shortMessageHtmlLink":"Increase TCP_DEFAULT_SCALING_RATIO"}},{"before":"4618725173e58b3873f3e4ed0e8fd23c5dc8f144","after":"590697d1546b9e376f3d7ec0d6af734e4cdfeffb","ref":"refs/heads/hli/dummy","pushedAt":"2024-03-28T05:59:23.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"Increase TCP_DEFAULT_SCALING_RATIO","shortMessageHtmlLink":"Increase TCP_DEFAULT_SCALING_RATIO"}},{"before":"9d5937ad75a3340eac96ff14fc746f8eb3354e65","after":"4618725173e58b3873f3e4ed0e8fd23c5dc8f144","ref":"refs/heads/hli/dummy","pushedAt":"2024-03-28T04:35:47.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"Increase TCP_DEFAULT_SCALING_RATIO","shortMessageHtmlLink":"Increase TCP_DEFAULT_SCALING_RATIO"}},{"before":"ace570f5b9185029bc10137dad23b3dc77a8f2ed","after":"9d5937ad75a3340eac96ff14fc746f8eb3354e65","ref":"refs/heads/hli/dummy","pushedAt":"2024-03-28T02:46:25.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"Increase TCP_DEFAULT_SCALING_RATIO","shortMessageHtmlLink":"Increase TCP_DEFAULT_SCALING_RATIO"}},{"before":"06875bb3b732c0942c4948cce2bdbc727f0c057d","after":"ace570f5b9185029bc10137dad23b3dc77a8f2ed","ref":"refs/heads/hli/dummy","pushedAt":"2024-03-28T02:26:02.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"Increase TCP_DEFAULT_SCALING_RATIO","shortMessageHtmlLink":"Increase TCP_DEFAULT_SCALING_RATIO"}},{"before":"8bc450177fec89b6aead593a09082da5ab1ee49b","after":"06875bb3b732c0942c4948cce2bdbc727f0c057d","ref":"refs/heads/hli/dummy","pushedAt":"2024-03-28T02:09:51.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"Increase TCP_DEFAULT_SCALING_RATIO","shortMessageHtmlLink":"Increase TCP_DEFAULT_SCALING_RATIO"}},{"before":"8a5bfa185c0be30b19a69fa5863eb201c1be8715","after":"8bc450177fec89b6aead593a09082da5ab1ee49b","ref":"refs/heads/hli/dummy","pushedAt":"2024-03-28T00:14:19.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"Increase TCP_DEFAULT_SCALING_RATIO","shortMessageHtmlLink":"Increase TCP_DEFAULT_SCALING_RATIO"}},{"before":"add3ba8b41286d093d644ca8c5e29c8be4412a8b","after":"8a5bfa185c0be30b19a69fa5863eb201c1be8715","ref":"refs/heads/hli/dummy","pushedAt":"2024-03-28T00:11:16.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"Increase TCP window initial scaling ratio","shortMessageHtmlLink":"Increase TCP window initial scaling ratio"}},{"before":"097f32c5a8da07ebab3fa35ea000070e2d287aa8","after":"e52cdd7677abaaf48561bc5826cf3e51f1128c14","ref":"refs/heads/hli/tcp_scaling_ratio","pushedAt":"2024-03-27T00:46:24.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"tcp: bring back tcp_adv_win_scale\n\nIn the patch dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\"),\nthe TCP receive window size calculation was changed.\n\nBefore the patch, the receive window is calculated by\n\nstatic inline int tcp_win_from_space(const struct sock *sk, int space)\n{\n\tint tcp_adv_win_scale =\n\t\tREAD_ONCE(sock_net(sk)->ipv4.sysctl_tcp_adv_win_scale);\n\n\treturn tcp_adv_win_scale <= 0 ?\n\t\t(space>>(-tcp_adv_win_scale)) :\n\t\tspace - (space>>tcp_adv_win_scale);\n}\n\nWhen sysctl_tcp_adv_win_scale is set to 1, the TCP receive window is\n(space - (space >> 1)) = space - 1/2 * space = 1/2 * space.\n\nAfter the patch, the receive window is calculated by\n\nstatic inline int tcp_win_from_space(const struct sock *sk, int space)\n{\n\ts64 scaled_space = (s64)space * tcp_sk(sk)->scaling_ratio;\n\n\treturn scaled_space >> TCP_RMEM_TO_WIN_SCALE;\n}\n\nAnd scaling_ratio is initialized to\n\nstatic inline void tcp_scaling_ratio_init(struct sock *sk)\n{\n\t/* Assume a conservative default of 1200 bytes of payload per 4K page.\n\t * This may be adjusted later in tcp_measure_rcv_mss().\n\t */\n\ttcp_sk(sk)->scaling_ratio = (1200 << TCP_RMEM_TO_WIN_SCALE) /\n\t\t\t\t SKB_TRUESIZE(4096);\n}\nWith TCP_RMEM_TO_WIN_SCALE = 8, the initial scaling_ratio = 65 so that\nthe TCP receive window is (space * 65) >> 8 ~= 1/4 * space.\n\nHere space == rcv_buf.\n\nIn our environment, we used to have sysctl_tcp_adv_win_scale=1 and hence\nhad a window size to be 1/2 of rcv_buf. But after a kernel upgrade, the\nnew change causes the initial TCP window size to be 1/4 of the rcv_buf\nvalue. In one of our kafka applications, the client sets sockopt\nSO_RCVBUF to 65536, which in turn sets rcv_buf to 2 * 65536 = 131072.\nThis results in a smaller TCP receive window (32k v.s. old 62k) and\nhence slower data transfer. Originally, the app could transfer 10MB data\nwithin 30 seconds. But after that patch, the transfer times out after 30\nseconds (application-level timeout).\n\nIn short, even though that patch is trying to scale the TCP window\ncorrectly, it does break certain applications. This patch brings back\nsysctl_tcp_adv_win_scale and also adds a new\nsysctl_tcp_adv_win_scale_enabled to decide whether to use\nsysctl_tcp_adv_win_scale. When sysctl_tcp_adv_win_scale_enabled = 0,\nthen sysctl_tcp_adv_win_scale value is ignored and scaling_ratio is used\nas the original patch does.\n\nNote that the initial scaling ratio is also increased. Because with\nMTU=1500, we observed skb->len/skb->truesize ratio to be 1448/3480 ~=\n0.42, therefore, increase 1200 to 2000 in the initial scaling ratio\ncalculation is closer to the reality.\n\nSigned-off-by: Hechao Li ","shortMessageHtmlLink":"tcp: bring back tcp_adv_win_scale"}},{"before":"1549de7ec00e3ca8ff711c234739a8036b24a2e2","after":"097f32c5a8da07ebab3fa35ea000070e2d287aa8","ref":"refs/heads/hli/tcp_scaling_ratio","pushedAt":"2024-03-27T00:34:15.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"tcp: bring back tcp_adv_win_scale\n\nIn the patch dfa2f0483360 (\"tcp: get rid of sysctl_tcp_adv_win_scale\"),\nthe TCP receive window size calculation was changed.\n\nBefore the patch, the receive window is calculated by\n\nstatic inline int tcp_win_from_space(const struct sock *sk, int space)\n{\n\tint tcp_adv_win_scale =\n\t\tREAD_ONCE(sock_net(sk)->ipv4.sysctl_tcp_adv_win_scale);\n\n\treturn tcp_adv_win_scale <= 0 ?\n\t\t(space>>(-tcp_adv_win_scale)) :\n\t\tspace - (space>>tcp_adv_win_scale);\n}\n\nWhen sysctl_tcp_adv_win_scale is set to 1, the TCP receive window is\n(space - (space >> 1)) = space - 1/2 * space = 1/2 * space.\n\nAfter the patch, the receive window is calculated by\n\nstatic inline int tcp_win_from_space(const struct sock *sk, int space)\n{\n\ts64 scaled_space = (s64)space * tcp_sk(sk)->scaling_ratio;\n\n\treturn scaled_space >> TCP_RMEM_TO_WIN_SCALE;\n}\n\nAnd scaling_ratio is initialized to\n\nstatic inline void tcp_scaling_ratio_init(struct sock *sk)\n{\n\t/* Assume a conservative default of 1200 bytes of payload per 4K page.\n\t * This may be adjusted later in tcp_measure_rcv_mss().\n\t */\n\ttcp_sk(sk)->scaling_ratio = (1200 << TCP_RMEM_TO_WIN_SCALE) /\n\t\t\t\t SKB_TRUESIZE(4096);\n}\nWith TCP_RMEM_TO_WIN_SCALE = 8, the initial scaling_ratio = 65 so that\nthe TCP receive window is (space * 65) >> 8 ~= 1/4 * space.\n\nHere space == rcv_buf.\n\nIn our environment, we used to have sysctl_tcp_adv_win_scale=1 and hence\nhad a window size to be 1/2 of rcv_buf. But after a kernel upgrade, the\nnew change causes the initial TCP window size to be 1/4 of the rcv_buf\nvalue. In one of our kafka applications, the client sets sockopt\nSO_RCVBUF to 65536, which in turn sets rcv_buf to 2 * 65536 = 131072.\nThis results in a smaller TCP receive window (32k v.s. old 62k) and\nhence slower data transfer. Originally, the app could transfer 10MB data\nwithin 30 seconds. But after that patch, the transfer times out after 30\nseconds (application-level timeout).\n\nIn short, even though that patch is trying to scale the TCP window\ncorrectly, it does break certain applications. This patch brings back\nsysctl_tcp_adv_win_scale and also adds a new\nsysctl_tcp_adv_win_scale_enabled to decide whether to use\nsysctl_tcp_adv_win_scale. When sysctl_tcp_adv_win_scale_enabled = 0,\nthen sysctl_tcp_adv_win_scale value is ignored and scaling_ratio is used\nas the original patch does.\n\nNote that the initial scaling ratio is also increased. Because with\nMTU=1500, we observed skb->len/skb->truesize ratio to be 1448/3480 ~=\n0.42, therefore, increase 1200 to 2000 in the initial scaling ratio\ncalculation is closer to the reality.\n\nSigned-off-by: Hechao Li ","shortMessageHtmlLink":"tcp: bring back tcp_adv_win_scale"}},{"before":"bbc331e0af5f51f311c43f28b9d1cfb5cc46a321","after":"d8f4aeac0504888441acfd50fbf4ea36821b0b64","ref":"refs/heads/nflx-v6.6.16__with_reverted_tcp_suspect_change","pushedAt":"2024-03-26T14:05:14.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"jadams41","name":"Ethan Adams","path":"/jadams41","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4854550?s=80&v=4"},"commit":{"message":"Merge tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs\n\nPull pdfd updates from Christian Brauner:\n\n - Until now pidfds could only be created for thread-group leaders but\n not for threads. There was no technical reason for this. We simply\n had no users that needed support for this. Now we do have users that\n need support for this.\n\n This introduces a new PIDFD_THREAD flag for pidfd_open(). If that\n flag is set pidfd_open() creates a pidfd that refers to a specific\n thread.\n\n In addition, we now allow clone() and clone3() to be called with\n CLONE_PIDFD | CLONE_THREAD which wasn't possible before.\n\n A pidfd that refers to an individual thread differs from a pidfd that\n refers to a thread-group leader:\n\n (1) Pidfds are pollable. A task may poll a pidfd and get notified\n when the task has exited.\n\n For thread-group leader pidfds the polling task is woken if the\n thread-group is empty. In other words, if the thread-group\n leader task exits when there are still threads alive in its\n thread-group the polling task will not be woken when the\n thread-group leader exits but rather when the last thread in the\n thread-group exits.\n\n For thread-specific pidfds the polling task is woken if the\n thread exits.\n\n (2) Passing a thread-group leader pidfd to pidfd_send_signal() will\n generate thread-group directed signals like kill(2) does.\n\n Passing a thread-specific pidfd to pidfd_send_signal() will\n generate thread-specific signals like tgkill(2) does.\n\n The default scope of the signal is thus determined by the type\n of the pidfd.\n\n Since use-cases exist where the default scope of the provided\n pidfd needs to be overriden the following flags are added to\n pidfd_send_signal():\n\n - PIDFD_SIGNAL_THREAD\n Send a thread-specific signal.\n\n - PIDFD_SIGNAL_THREAD_GROUP\n Send a thread-group directed signal.\n\n - PIDFD_SIGNAL_PROCESS_GROUP\n Send a process-group directed signal.\n\n The scope change will only work if the struct pid is actually\n used for this scope.\n\n For example, in order to send a thread-group directed signal the\n provided pidfd must be used as a thread-group leader and\n similarly for PIDFD_SIGNAL_PROCESS_GROUP the struct pid must be\n used as a process group leader.\n\n - Move pidfds from the anonymous inode infrastructure to a tiny pseudo\n filesystem. This will unblock further work that we weren't able to do\n simply because of the very justified limitations of anonymous inodes.\n Moving pidfds to a tiny pseudo filesystem allows for statx on pidfds\n to become useful for the first time. They can now be compared by\n inode number which are unique for the system lifetime.\n\n Instead of stashing struct pid in file->private_data we can now stash\n it in inode->i_private. This makes it possible to introduce concepts\n that operate on a process once all file descriptors have been closed.\n A concrete example is kill-on-last-close. Another side-effect is that\n file->private_data is now freed up for per-file options for pidfds.\n\n Now, each struct pid will refer to a different inode but the same\n struct pid will refer to the same inode if it's opened multiple\n times. In contrast to now where each struct pid refers to the same\n inode.\n\n The tiny pseudo filesystem is not visible anywhere in userspace\n exactly like e.g., pipefs and sockfs. There's no lookup, there's no\n complex inode operations, nothing. Dentries and inodes are always\n deleted when the last pidfd is closed.\n\n We allocate a new inode and dentry for each struct pid and we reuse\n that inode and dentry for all pidfds that refer to the same struct\n pid. The code is entirely optional and fairly small. If it's not\n selected we fallback to anonymous inodes. Heavily inspired by nsfs.\n\n The dentry and inode allocation mechanism is moved into generic\n infrastructure that is now shared between nsfs and pidfs. The\n path_from_stashed() helper must be provided with a stashing location,\n an inode number, a mount, and the private data that is supposed to be\n used and it will provide a path that can be passed to dentry_open().\n\n The helper will try retrieve an existing dentry from the provided\n stashing location. If a valid dentry is found it is reused. If not a\n new one is allocated and we try to stash it in the provided location.\n If this fails we retry until we either find an existing dentry or the\n newly allocated dentry could be stashed. Subsequent openers of the\n same namespace or task are then able to reuse it.\n\n - Currently it is only possible to get notified when a task has exited,\n i.e., become a zombie and userspace gets notified with EPOLLIN. We\n now also support waiting until the task has been reaped, notifying\n userspace with EPOLLHUP.\n\n - Ensure that ESRCH is reported for getfd if a task is exiting instead\n of the confusing EBADF.\n\n - Various smaller cleanups to pidfd functions.\n\n* tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)\n libfs: improve path_from_stashed()\n libfs: add stashed_dentry_prune()\n libfs: improve path_from_stashed() helper\n pidfs: convert to path_from_stashed() helper\n nsfs: convert to path_from_stashed() helper\n libfs: add path_from_stashed()\n pidfd: add pidfs\n pidfd: move struct pidfd_fops\n pidfd: allow to override signal scope in pidfd_send_signal()\n pidfd: change pidfd_send_signal() to respect PIDFD_THREAD\n signal: fill in si_code in prepare_kill_siginfo()\n selftests: add ESRCH tests for pidfd_getfd()\n pidfd: getfd should always report ESRCH if a task is exiting\n pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together\n pidfd: exit: kill the no longer used thread_group_exited()\n pidfd: change do_notify_pidfd() to use __wake_up(poll_to_key(EPOLLIN))\n pid: kill the obsolete PIDTYPE_PID code in transfer_pid()\n pidfd: kill the no longer needed do_notify_pidfd() in de_thread()\n pidfd_poll: report POLLHUP when pid_task() == NULL\n pidfd: implement PIDFD_THREAD flag for pidfd_open()\n ...\n\n(cherry picked from commit b5683a37c881e2e08065f1670086e281430ee19f)","shortMessageHtmlLink":"Merge tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kerne…"}},{"before":"beb8b66127cb442122f59ff4dd5c41a8df41b443","after":"1549de7ec00e3ca8ff711c234739a8036b24a2e2","ref":"refs/heads/hli/tcp_scaling_ratio","pushedAt":"2024-03-26T01:37:48.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"tcp: bring back tcp_adv_win_scale\n\nIn the patch \"tcp: get rid of sysctl_tcp_adv_win_scale\"\n(dfa2f0483360d4d6f2324405464c9f281156bd87), the TCP receive window size\ncalculation was changed.\n\nBefore the patch, the receive window is calculated by\n\nstatic inline int tcp_win_from_space(const struct sock *sk, int space)\n{\n\tint tcp_adv_win_scale = READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_adv_win_scale);\n\treturn tcp_adv_win_scale <= 0 ?\n\t\t(space>>(-tcp_adv_win_scale)) :\n\t\tspace - (space>>tcp_adv_win_scale);\n}\n\nWhen sysctl_tcp_adv_win_scale is set to 1, the TCP receive window is\n(space - (space >> 1)) = space - 1/2 * space = 1/2 * space.\n\nAfter the patch, the receive window is calculated by\n\nstatic inline int tcp_win_from_space(const struct sock *sk, int space)\n{\n\ts64 scaled_space = (s64)space * tcp_sk(sk)->scaling_ratio;\n\treturn scaled_space >> TCP_RMEM_TO_WIN_SCALE;\n}\n\nAnd scaling_ratio is initialized to\n\nstatic inline void tcp_scaling_ratio_init(struct sock *sk)\n{\n\t/* Assume a conservative default of 1200 bytes of payload per 4K page.\n\t * This may be adjusted later in tcp_measure_rcv_mss().\n\t */\n\ttcp_sk(sk)->scaling_ratio = (1200 << TCP_RMEM_TO_WIN_SCALE) /\n\t\t\t\t SKB_TRUESIZE(4096);\n}\nWith TCP_RMEM_TO_WIN_SCALE = 8, the initial scaling_ratio = 65 so that\nthe TCP receive window is (space * 65) >> 8 ~= 1/4 * space.\n\nHere space == rcv_buf.\n\nIn our environment, we used to have sysctl_tcp_adv_win_scale=1 and hence\nhad a window size to be 1/2 of rcv_buf. But after a kernel upgrade, the\nnew change causes the initial TCP window size to be 1/4 of the rcv_buf\nvalue. In one of our kafka applications, the client sets sockopt\nSO_RCVBUF to 65536, which in turn sets rcv_buf to 2 * 65536 = 131072.\nThis results in a smaller TCP receive window (32k v.s. old 62k) and\nhence slower data transfer. Originally, the app could transfer 10MB data\nwithin 30 seconds. But after that patch, the transfer times out after 30\nseconds (application-level timeout).\n\nIn short, even though that patch is trying to scale the TCP window\ncorrectly, it does break certain applications. This patch brings back\nsysctl_tcp_adv_win_scale and also adds a new\nsysctl_tcp_adv_win_scale_enabled to decide whether to use\nsysctl_tcp_adv_win_scale. When sysctl_tcp_adv_win_scale_enabled = 0,\nthen sysctl_tcp_adv_win_scale value is ignored and scaling_ratio is used\nas the original patch does.\n\nNote that the initial scaling ratio is also increased. Because with\nMTU=1500, we observed skb->len/skb->truesize ratio to be 1448/3480 ~=\n0.42, therefore, increase 1200 to 2000 in the initial scaling ratio\ncalculation is closer to the reality.","shortMessageHtmlLink":"tcp: bring back tcp_adv_win_scale"}},{"before":"d6e4c0fd2e837c36b9ef9a6143cb8ea3d8e9d7da","after":"beb8b66127cb442122f59ff4dd5c41a8df41b443","ref":"refs/heads/hli/tcp_scaling_ratio","pushedAt":"2024-03-25T20:30:25.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"Revert \"tcp: increase the default TCP scaling ratio\"\n\nThis reverts commit d6e4c0fd2e837c36b9ef9a6143cb8ea3d8e9d7da.","shortMessageHtmlLink":"Revert \"tcp: increase the default TCP scaling ratio\""}},{"before":null,"after":"d6e4c0fd2e837c36b9ef9a6143cb8ea3d8e9d7da","ref":"refs/heads/hli/tcp_scaling_ratio","pushedAt":"2024-03-22T19:11:11.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"tcp: increase the default TCP scaling ratio","shortMessageHtmlLink":"tcp: increase the default TCP scaling ratio"}},{"before":null,"after":"add3ba8b41286d093d644ca8c5e29c8be4412a8b","ref":"refs/heads/hli/dummy","pushedAt":"2024-03-19T20:44:18.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"hechaoli","name":"Hechao Li","path":"/hechaoli","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/16675751?s=80&v=4"},"commit":{"message":"Dummy change","shortMessageHtmlLink":"Dummy change"}},{"before":"ead216e3e804ece1106c60aa71c39edfd3235d83","after":"5d9d1342ce652c1b463299dc3856667f36eae26d","ref":"refs/heads/nflx-v6.7.9","pushedAt":"2024-03-15T23:31:14.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"jadams41","name":"Ethan Adams","path":"/jadams41","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4854550?s=80&v=4"},"commit":{"message":"perf build: fix out of tree build\n\nIt seems that a previous modification to sysreg-defs, which corrected\nemitting the headaer to the specified output directory, exposed missing\nsubdir, prefix variables. This breaks out of tree builds of perf as the\nfile is now built into the output directory, but still tries to descend\ninto output directory as a subdir.\n\nFixes: a29ee6aea7\nSigned-off-by: Ethan Adams ","shortMessageHtmlLink":"perf build: fix out of tree build"}},{"before":"fcc5f708d0c5252fe0845502205bac9674547a1f","after":"5d9d1342ce652c1b463299dc3856667f36eae26d","ref":"refs/heads/eadams/fix-out-of-tree-perf-build","pushedAt":"2024-03-13T18:00:58.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"jadams41","name":"Ethan Adams","path":"/jadams41","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4854550?s=80&v=4"},"commit":{"message":"perf build: fix out of tree build\n\nIt seems that a previous modification to sysreg-defs, which corrected\nemitting the headaer to the specified output directory, exposed missing\nsubdir, prefix variables. This breaks out of tree builds of perf as the\nfile is now built into the output directory, but still tries to descend\ninto output directory as a subdir.\n\nFixes: a29ee6aea7\nSigned-off-by: Ethan Adams ","shortMessageHtmlLink":"perf build: fix out of tree build"}},{"before":null,"after":"fcc5f708d0c5252fe0845502205bac9674547a1f","ref":"refs/heads/eadams/fix-out-of-tree-perf-build","pushedAt":"2024-03-13T15:07:31.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"jadams41","name":"Ethan Adams","path":"/jadams41","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4854550?s=80&v=4"},"commit":{"message":"perf build: fix out of tree build\n\nIt seems that a previous modification to sysreg-defs, which corrected\nemitting the headaer to the specified output directory, exposed missing\nsubdir, prefix variables. This breaks out of tree builds of perf as the\nfile is now built into the output directory, but still tries to descend\ninto output directory as a subdir.\n\nFixes: a29ee6aea7\n\nSigned-off-by: Ethan Adams ","shortMessageHtmlLink":"perf build: fix out of tree build"}},{"before":"bf109b4440c84fd79b5b7879397234744b22b4da","after":"ead216e3e804ece1106c60aa71c39edfd3235d83","ref":"refs/heads/nflx-v6.7.9","pushedAt":"2024-03-13T01:28:38.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"jadams41","name":"Ethan Adams","path":"/jadams41","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4854550?s=80&v=4"},"commit":{"message":"fix it all","shortMessageHtmlLink":"fix it all"}},{"before":"07c5381f95c478deb129d0987c6d371d950801a8","after":"bf109b4440c84fd79b5b7879397234744b22b4da","ref":"refs/heads/nflx-v6.7.9","pushedAt":"2024-03-13T00:51:35.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"jadams41","name":"Ethan Adams","path":"/jadams41","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/4854550?s=80&v=4"},"commit":{"message":"fix it all","shortMessageHtmlLink":"fix it all"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEK2NqoAA","startCursor":null,"endCursor":null}},"title":"Activity · Netflix-Skunkworks/linux"}