C Lessons

Quick start

It is not C Lessons at all :). I programmed in C a long time ago. Sometimes I want to pick something up again, but I cannot find the piece of code, or I cannot run code written on another machine.

Sadly, an old dog always needs to learn something new.

Access the code from anywhere: GitHub is a good fit
Run or write code anywhere: Linux, Darwin, Windows, or a Docker box
Easy to try and learn

Now we have Nore: some things changed and some did not.

Let’s start …

# bootstrap Nore
curl https://raw.githubusercontent.com/junjiemars/nore/master/bootstrap.sh -sSfL | sh

# configure -> make -> test -> install
./configure --has-hi
make
make test
make install

Language

Run the example under src/lang.

./configure --has-lang
make clean test

Preprocessor

The preprocessor runs first, as the name implies. It performs some text manipulations, such as:

stripping comments
resolving #include directives and replacing them with the contents of the included file
resolving #include_next directives, which do not distinguish between <file> and "file" inclusion; they just search the remaining directories of the search path
evaluating #if and #ifdef directives
evaluating #define directives
expanding the macros found in the rest of the code according to those #define directives

./configure --has-lang
make clean lang_preprocessor_test

`#ident`

`#include`

The #include directive instructs the preprocessor to paste the text of the given file into the current file. Generally, it is necessary to tell the preprocessor where to look for header files if they are not placed in the current directory or a standard system directory.

`#define`

The #define directive takes two forms: defining a constant or creating a macro.

Defining a constant

#define identifier [value]

When defining a constant, you may optionally elect not to provide a value. In this case, the identifier will be replaced with blank text, but will still be “defined” for the purposes of #ifdef and #ifndef. If a value is provided, the identifier will be replaced literally with the remainder of the text on the line, so you should be careful when using #define in this way.
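A minimal sketch of why care is needed: the replacement text is pasted literally, so an unparenthesized value can change meaning next to other operators. The names TWO_UNSAFE, TWO_SAFE and FEATURE_X are hypothetical.

```c
/* The replacement text is pasted literally, with no implicit parentheses. */
#define TWO_UNSAFE 1 + 1
#define TWO_SAFE   (1 + 1)

/* An empty definition: FEATURE_X is "defined" for #ifdef, but expands to nothing. */
#define FEATURE_X

static int six_unsafe(void) { return TWO_UNSAFE * 3; } /* 1 + 1 * 3 == 4 */
static int six_safe(void)   { return TWO_SAFE * 3; }   /* (1 + 1) * 3 == 6 */

static int feature_x_enabled(void)
{
#ifdef FEATURE_X
    return 1;
#else
    return 0;
#endif
}
```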

Defining a parameterized macro

#define identifier(<arg> [, <arg> ...]) statement
#define max(a, b) ((a) > (b) ? (a) : (b))

`#undef`

#undef identifier

The #undef directive undefines a constant or macro that was previously defined using #define.

For example:

#define E 2.71828
double e_squared = E * E;
#ifdef E
#  undef E
#endif

Usually, #undef is used to scope a preprocessor constant into a very limited region: this is done to avoid leaking the constant. #undef is the only way to create this scope since the preprocessor does not understand block scope.
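A small sketch of that scoping idea, assuming a hypothetical helper macro SQ that is only meant to build one table: after #undef, the name no longer exists for the rest of the file.

```c
/* Hypothetical helper macro: only meant for the table below. */
#define SQ(x) ((x) * (x))
static const int squares[] = { SQ(1), SQ(2), SQ(3), SQ(4) };
#undef SQ  /* SQ is no longer visible past this line */

/* Returns 1 if SQ is still defined here, 0 otherwise. */
static int sq_still_defined(void)
{
#ifdef SQ
    return 1;
#else
    return 0;
#endif
}
```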

`#if` vs. `#ifdef`

#if evaluates the value of an expression (an identifier that is not defined counts as 0), while #ifdef only checks whether the symbol is defined.

Prefer #if defined(...): it is more flexible, because several conditions can be combined with ||, && and !.

#if defined(LINUX) || defined(DARWIN)
/* code: when on LINUX or DARWIN platform */
#endif

#if defined(CLANG) && (1 == NM_CPU_LITTLE_ENDIAN)
/* code: when using clang compiler and on a little endian machine */
#endif
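A sketch of the difference, assuming a hypothetical symbol FEATURE that is defined with the value 0: #ifdef sees it, #if does not.

```c
/* FEATURE is defined, but its value is 0. */
#define FEATURE 0

static int seen_by_ifdef(void)
{
#ifdef FEATURE              /* true: only existence is checked */
    return 1;
#else
    return 0;
#endif
}

static int seen_by_if(void)
{
#if FEATURE                 /* false: the value 0 is evaluated */
    return 1;
#else
    return 0;
#endif
}

static int seen_by_if_undefined(void)
{
#if NOT_DEFINED_ANYWHERE    /* an undefined identifier evaluates as 0 */
    return 1;
#else
    return 0;
#endif
}
```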

`#ifndef`

#ifndef identifier
/* code: when the identifier had not been defined */
#endif

#ifndef checks whether the given identifier has been #defined earlier in the file or in an included file; if not, it includes the code between it and the closing #else or, if no #else is present, the #endif. #ifndef is often used to make header files idempotent by defining an identifier once the file has been included and checking that the identifier was not set at the top of that file.

#ifndef    _LANG_H_
#  define  _LANG_H_
#endif

#if !defined(identifier) is equivalent to #ifndef identifier

#if !defined(min)
#  define min(a, b) ((a) < (b) ? (a) : (b))
#endif

`#error`

#error "[description]"

The #error directive makes compilation fail and issues a message that will appear in the list of compilation errors. It is most useful when combined with #if/#elif/#else to fail compilation if some condition is not true. For example:

#if (1 == _ERROR_)
#  error "compile failed: because _ERROR_ == 1 is true"
#endif

`#pragma`

The #pragma directive is used to access compiler-specific preprocessor extensions.

A common use of #pragma is the #pragma once directive, which asks the compiler to include a header file only a single time, no matter how many times it has been included. #pragma once is not part of the C standard, but it is widely supported.

#pragma once
/* header file code */

/* #pragma once is equivalent to */
#ifndef    _FILE_NAME_H_
#  define  _FILE_NAME_H_
/* header file code */
#endif

The #pragma directive can also be used for other compiler-specific purposes. #pragma is commonly used to suppress warnings.

#if (MSVC)
#  pragma warning(disable:4706) /* assignment within conditional expression */
#  pragma comment(lib, "Ws2_32.lib") /* link to Ws2_32.lib */
#elif (GCC)
#  pragma GCC diagnostic ignored "-Wstrict-aliasing" /* (unsigned*) &x */
#elif (CLANG)
#  pragma clang diagnostic ignored "-Wparentheses"
#endif

`__FILE__`

__FILE__ expands to the name of the current source file as a string literal (the path as given to the compiler)
__LINE__ expands to the current line number in the source file, as an integer constant
__DATE__ expands to the date of compilation in the form Mmm dd yyyy as a string, such as “Oct 26 2021”
__TIME__ expands to the time of compilation in the form hh:mm:ss in 24 hour time as a string, such as “16:08:17”
__TIMESTAMP__ (a GCC, Clang and MSVC extension, not standard C) expands to the last modification time of the current source file in the form Ddd Mmm Date hh:mm:ss yyyy as a string, where the time is in 24 hour time, Ddd is the abbreviated day, Mmm is the abbreviated month, Date is the day of the month (1-31), and yyyy is the four digit year, such as “Tue Oct 26 12:42:21 2021”
__func__ (C99) is not a macro but a predefined identifier holding the name of the enclosing function
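A quick sketch of how these behave in code; the helper function names are hypothetical. Note that __DATE__ always has 11 characters, because a day below 10 is padded with a space.

```c
#include <string.h>

/* __func__ behaves as if each function declared
   static const char __func__[] = "function-name"; */
static const char *current_function(void)
{
    return __func__;
}

/* __LINE__ is an integer constant, so it can be used in arithmetic. */
static int line_difference(void)
{
    int first = __LINE__;
    int second = __LINE__;
    return second - first;
}

/* __DATE__ is a string literal in the form "Mmm dd yyyy". */
static size_t date_length(void)
{
    return strlen(__DATE__);
}
```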

main

exit

Most C programs call the library routine exit, which flushes buffers, closes streams, unlinks temporary files, etc., before calling _exit.

assert

No, there’s nothing wrong with assert as long as you use it as intended.

assert: a failure in the program’s logic itself.
error: an erroneous input or system state not due to a bug in the program.

Assertions are primarily intended for use during debugging and are generally turned off before code is deployed by defining the NDEBUG macro.

# with assert
./configure --has-lang
make clean lang_assert_test

# erase assertions: simple way
./configure --has-lang --with-release=yes
make clean lang_assert_test

An assertion specifies that a program satisfies certain conditions at particular points in its execution. There are three types of assertion:

preconditions: specify conditions at the start of a function.
postconditions: specify conditions at the end of a function.
invariants: specify conditions over a defined region of a program.

The static_assert macro, defined in <assert.h>, expands to _Static_assert, a keyword added in C11 to provide compile-time assertions.
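A sketch combining both kinds, assuming a C11 compiler and a platform with 8-bit bytes; average is a hypothetical helper. The runtime assert checks a precondition (a logic error if violated) and disappears when NDEBUG is defined, while static_assert fails at compile time.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Compile-time check (C11): compilation fails if the condition is false. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

/* Hypothetical helper: average of a non-empty array. */
static int average(const int *a, size_t n)
{
    assert(a != NULL && n > 0);   /* precondition: removed when NDEBUG is defined */

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return (int) (sum / (long) n);
}
```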

enum

enum [identifier] { enumerator-list };

enumerator = constant-expression;

enumerator-list is a comma-separated list (a trailing comma is permitted since C99), and identifier is optional. If an enumerator is followed by = constant-expression, its value is the value of that constant expression. Otherwise, its value is one greater than the value of the previous enumerator in the same enumeration. The first enumerator, if it has no constant-expression, has the value zero.

Unlike struct and union, there are no forward-declared enums in C.
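The value rules above, applied to a small hypothetical enum:

```c
/* RED = 0 (first, no initializer), GREEN = 5, BLUE = 6 (previous + 1). */
enum color {
    RED,
    GREEN = 5,
    BLUE,        /* trailing comma is allowed since C99 */
};
```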

Error

fail safe: pertaining to a system or component that automatically places itself in a safe operating mode in the event of a failure, e.g. a traffic light that reverts to blinking red in all directions when normal operation fails.
fail soft: pertaining to a system or component that continues to provide partial operational capability in the event of certain failures, e.g. a traffic light that continues to alternate between red and green if the yellow light fails. Error indicators such as errno, which record the error status of a function call and let the program continue, are fail soft.
fail hard: aka fail fast or fail stop. The reaction to a detected fault is to immediately halt the system. Termination is fail hard.

errno

Before C11, errno was a global variable, with all the inherent disadvantages:

later system calls overwrite the errno set by earlier ones;
global map of values to error conditions (ENOMEM, ERANGE, etc);
behavior is underspecified in ISO C and POSIX;
technically errno is a modifiable lvalue rather than a global variable, so expressions like &errno may not be well-defined;
thread-unsafe;

In C11, errno has thread-local storage duration, so each thread has its own errno and it is thread-safe.

Disadvantages of Function Return Value:

functions that return error indicators cannot use the return value for other purposes;
checking every function call for an error condition increases code size by 30%-40%;
impossible for library function to enforce that callers check for error condition.

strerror

char * strerror(int errnum);

Interprets the value of errnum, generating a string with a message that describes the error condition as if set to errno by a function of the library. The returned pointer points to a statically allocated string, which shall not be modified by the program. Further calls to this function may overwrite its content (particular library implementations are not required to avoid data races). The error strings produced by strerror may be specific to each system and library implementation.

perror

void perror(const char *str);

Interprets the value of errno as an error message, and prints it to stderr (the standard error output stream, usually the console), optionally preceding it with the custom message specified in str. If the parameter str is not a null pointer, str is printed followed by a colon : and a space. Then, whether str was a null pointer or not, the generated error description is printed followed by a newline character '\n'. perror should be called right after the error was produced, otherwise errno can be overwritten by calls to other functions.
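A sketch of the usual errno pattern with strtol, where parse_long is a hypothetical helper: clear errno before the call (strtol only sets it on error), save it right away, then report it with perror or strerror.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a decimal long.
   Returns 0 on success, ERANGE on overflow, -1 if no digits were found. */
static int parse_long(const char *s, long *out)
{
    char *end;

    errno = 0;                   /* strtol sets errno only on error, so clear it */
    long v = strtol(s, &end, 10);
    if (errno == ERANGE) {
        int saved = errno;       /* save it: later library calls may overwrite errno */
        perror("parse_long");    /* "parse_long: " + message for ERANGE, on stderr */
        fprintf(stderr, "(strerror: %s)\n", strerror(saved));
        return saved;
    }
    if (end == s) {
        return -1;               /* no digits: not an errno condition in ISO C */
    }
    *out = v;
    return 0;
}
```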

Function

main

C90 main() declarations:

int main(void);

int main(int argc, char **argv);

/* same as above */
int main(int argc, char *argv[]);

/* classically, Unix systems support a third variant */
int main(int argc, char **argv, char **envp);

C99 rules for main():

the int return type may not be omitted;
the return statement may be omitted; if main() reaches its closing }, there is an implicit return 0.

In arguments:

argc >= 0 (the standard only requires it to be nonnegative)
argv[argc] == 0 (a null pointer)
argv[0] through to argv[argc-1] are pointers to strings whose meaning will be determined by the program.
argv[0] will be a string containing the program’s name, or an empty string if that is not available.
envp is not specified by POSIX but is widely supported; getenv is the only environment access specified by the C standard, while putenv and extern char **environ are POSIX-specific.

Forward declaration

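A forward declaration is needed when:

the call graph is cyclic (e.g. mutually recursive functions);
a function or object is used across more than one translation unit.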
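The cyclic case, sketched with two hypothetical mutually recursive functions: is_even calls is_odd before is_odd is defined, so is_odd must be declared first.

```c
#include <stdbool.h>

/* Forward declaration: is_odd is used by is_even before it is defined. */
static bool is_odd(unsigned n);

static bool is_even(unsigned n)
{
    return n == 0 ? true : is_odd(n - 1);
}

static bool is_odd(unsigned n)
{
    return n == 0 ? false : is_even(n - 1);
}
```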

Macro

`#` macro operator

Prefixing a macro parameter with # turns the corresponding argument into a string literal. This allows you to turn bare words in your source code into text. It is particularly useful for writing a macro that converts the members of an enum from int into strings.

enum COLOR { RED, GREEN, BLUE };
#define COLOR_STR(x) #x
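Since # works at compile time on tokens, a runtime int still needs a mapping. A minimal sketch, with color_name as a hypothetical helper: COLOR_STR(RED) expands to "RED", so the names cannot drift from the enumerators.

```c
#include <string.h>

enum COLOR { RED, GREEN, BLUE };
#define COLOR_STR(x) #x

/* Map each enumerator to its name; COLOR_STR(RED) expands to "RED". */
static const char *color_name(enum COLOR c)
{
    switch (c) {
    case RED:   return COLOR_STR(RED);
    case GREEN: return COLOR_STR(GREEN);
    case BLUE:  return COLOR_STR(BLUE);
    }
    return "unknown";
}
```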

`##` macro operator

The ## operator takes two separate tokens and pastes them together to form a single identifier. The resulting identifier could be a variable name, or any other identifier.

#define DEFVAR(type, var, val) type var_##var = val

DEFVAR(int, x, 1); /* expand to: int var_x = 1; */
DEFVAR(float, y, 2.718); /* expand to: float var_y = 2.718; */

Expression

An expression-type macro expands to an expression, such as the following macro definition:

#define double_v1(x) 2*x

But double_v1 has a drawback: double_v1(1+1)*8 expands to the wrong expression 2*1+1*8, which is 10 rather than 32.

Use parentheses around each input argument and around the whole expression:

#define double_v2(x) (2*(x))

Now, it expands to (2*(1+1))*8

But even a fully parenthesized macro such as max evaluates one of its arguments twice:

#define max(a, b) ((a) > (b) ? (a) : (b))

which is a problem when an argument has side effects, such as max(a, b++).
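A trace of the double evaluation, with max_int as a hypothetical function alternative. max(a, b++) expands to ((a) > (b++) ? (a) : (b++)); with a = 1 and b = 5, the condition reads 5 and bumps b to 6, then the chosen branch reads 6 and bumps b to 7. The function evaluates each argument once.

```c
#define max(a, b) ((a) > (b) ? (a) : (b))

/* A function evaluates each argument exactly once. */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}
```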

Block

If the macro definition includes the ; statement terminator, we need to wrap it in a block.

#define incr(a, b)   \
    (a)++;           \
    (b)++;

Call it with

int a=2, b=3;
if (a > b) incr(a, b);

only (a)++ is controlled by the if, so (b)++ always runs: with a=2 and b=3, only b is incremented. We can wrap the body in braces and turn it into a block-type macro.

#define incr(a, b) { \
   (a)++; (b)++;     \
}

But the above block macro is not good enough: omitting the ; after a call is not intuitive, and a trailing ; is wrong in some cases, such as

int a = 2, b = 3;
if (a < b)
  incr(a, b); /* trailing ; */
else
  a *= 10;

/* expanded code: fails to compile, since the ; ends the if and leaves a dangling else */
if (a < b)
  { (a)++; (b)++; };
else
  a *= 10;

do { ... } while (0) resolves those issues.

#define incr(a, b) do { \
   (a)++; (b)++;        \
} while (0) /* no trailing ; */

/* expanded code */
if (a < b)
  do { (a)++; (b)++; } while (0); /* append ; */
else
  a *= 10;
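A runnable check of the do { ... } while (0) form inside if/else; bump_or_scale is a hypothetical function used only to exercise the macro.

```c
#define incr(a, b) do { \
    (a)++; (b)++;       \
} while (0) /* no trailing ; */

/* The macro behaves like a single statement inside if/else. */
static int bump_or_scale(int a, int b)
{
    if (a < b)
        incr(a, b);
    else
        a *= 10;
    return a + b;
}
```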

Name clash

We can use a mechanism similar to Lisp’s (gensym): rebind the input arguments to new, unlikely-to-clash names inside the macro body.
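C has no gensym, so this is only an approximation: a sketch that pastes __LINE__ onto a prefix to get a name unique per source line. CAT, GENSYM, SWAP_NAIVE and SWAP_INT are hypothetical names. Both uses of GENSYM inside one expansion see the same __LINE__ (the line of the call), so they name the same variable.

```c
/* Two-level concatenation so __LINE__ is expanded before ## pastes it. */
#define CAT_(a, b) a##b
#define CAT(a, b)  CAT_(a, b)
/* Hypothetical "poor man's gensym": a name unique per source line. */
#define GENSYM(prefix) CAT(prefix, __LINE__)

/* Naive version: breaks if the caller's variable is itself named tmp,
   because `int tmp = (tmp);` then reads the new, uninitialized tmp. */
#define SWAP_NAIVE(a, b) do { int tmp = (a); (a) = (b); (b) = tmp; } while (0)

/* Clash-resistant version: the temporary becomes e.g. swap_tmp_42. */
#define SWAP_INT(a, b) do {            \
    int GENSYM(swap_tmp_) = (a);       \
    (a) = (b);                         \
    (b) = GENSYM(swap_tmp_);           \
} while (0)
```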

Nested macro

Using one macro inside the definition of another is called macro nesting.

#define SQUARE(x) ((x)*(x))
#define CUBE(x) (SQUARE(x)*(x))

Check expansion

cc -E <source-file>

Pointer

`&` and `*`

The & operator yields the address of its operand.

The * has two distinct meanings in relation to pointers, depending on where it is used. In a declaration, it declares the variable as a pointer, so an initializer on the right-hand side of = should be a pointer value, an address in memory. In an expression, * dereferences the pointer, following it to the pointed-to place in memory and allowing the value stored there to be assigned or retrieved.
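Both meanings of * next to &, in a minimal sketch; increment and demo_address_of are hypothetical helpers.

```c
/* Hypothetical helper: increments the int that p points to. */
static void increment(int *p)   /* in a declaration, * makes p a pointer */
{
    *p += 1;                    /* in an expression, * dereferences p */
}

static int demo_address_of(void)
{
    int v = 41;
    int *p = &v;                /* & yields the address of v */
    increment(p);
    return v;                   /* v was changed through the pointer */
}
```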

`sizeof` Pointer

The size depends on the compiler and the machine. On a given machine with a given compiler, all object pointer types usually have the same size, generally one machine word, although the C standard does not guarantee it (function pointers, for example, may differ).

`const` Pointer

There is a technique known as the Clockwise/Spiral Rule that enables any C programmer to parse any C declaration in their head.

The first const can be either side of the type.

const int * == int const *; /* pointer to const int */
const int * const == int const * const; /* const pointer to const int  */

pointer to const object

int v = 0x11223344;
const int *p = &v;

const pointer to object

int v1=0x11223344;
int *const p1 = &v1;

const pointer to const object

int v1=0x11223344;
const int *const p = &v1;

pointer to pointer to const object
```
const int **p;
```
pointer to const pointer to object
```
int *const *p;
```
const pointer to pointer to object
```
int **const p;
```
pointer to const pointer to const object
```
const int *const *p;
```
const pointer to pointer to const object
```
const int **const p;
```
const pointer to const pointer to object
```
int *const *const p;
```
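What each placement of const permits, as a compilable sketch; const_demo is a hypothetical function, and the lines that would not compile are left as comments.

```c
/* Which parts may change: the pointer, the pointee, or both. */
static int const_demo(void)
{
    int v1 = 1, v2 = 2;

    const int *p = &v1;        /* pointer to const int */
    p = &v2;                   /* OK: the pointer itself may be reseated */
    /* *p = 3;                    error: the pointee is read-only through p */

    int *const q = &v1;        /* const pointer to int */
    *q = 10;                   /* OK: the pointee may be modified */
    /* q = &v2;                   error: q cannot be reseated */

    const int *const r = &v2;  /* const pointer to const int: neither may change */

    return *p + v1 + *r;       /* 2 + 10 + 2 */
}
```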

Run example:

./configure --has-lang
make clean lang_ptr_const_test

`volatile` Pointer

volatile tells the compiler not to optimize accesses to the object, so that every read or write performs a real memory access instead of reusing a value kept in a register.

volatile int v1;
int *p1 = &v1;          /* bad: discards volatile; accessing v1 through p1 is undefined behavior */
volatile int *p2 = &v1; /* better: keeps the qualifier */

`restrict` Pointer

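The restrict keyword was introduced in C99.
It is a way for the programmer to promise the compiler that, during the pointer's lifetime, the object it points to is accessed only through that pointer, which enables optimizations the compiler could not otherwise prove safe.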
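A typical restrict signature, sketched with a hypothetical add_arrays. The qualifiers promise that dst, a and b do not overlap; calling it with overlapping arrays would be undefined behavior.

```c
#include <stddef.h>

/* restrict promises that dst, a and b do not overlap, so the compiler may,
   for example, vectorize the loop without re-reading memory after each store. */
static void add_arrays(size_t n, int *restrict dst,
                       const int *restrict a, const int *restrict b)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```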

function Pointer

return_type_of_fn (*fn)(type_of_arg1 arg1, type_of_arg2 arg2 ...);
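The declaration form above, filled in with hypothetical operations add and mul; apply takes a pointer to any function of type int (int, int).

```c
/* Hypothetical operations sharing one signature: int (int, int). */
static int add(int a, int b) { return a + b; }
static int mul(int a, int b) { return a * b; }

/* op is a pointer to a function taking two ints and returning int. */
static int apply(int (*op)(int, int), int a, int b)
{
    return op(a, b);           /* same as (*op)(a, b) */
}
```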

void Pointer

void * is a catch-all type for pointers to object types; through void pointers you can get some polymorphic behavior, see qsort in stdlib.h.
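A sketch of the qsort case: qsort only sees void *, so the comparator (the hypothetical cmp_int) converts back to the real element type.

```c
#include <stdlib.h>

/* qsort only sees void *, so the comparator converts back to the real type. */
static int cmp_int(const void *pa, const void *pb)
{
    int a = *(const int *) pa;
    int b = *(const int *) pb;
    return (a > b) - (a < b);  /* avoids the possible overflow of a - b */
}

static void sort_ints(int *a, size_t n)
{
    qsort(a, n, sizeof *a, cmp_int);
}
```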

Dangling Pointer

Pointers that point to invalid addresses are sometimes called dangling pointers.

Pointer decay

Decay refers to the implicit conversion of an expression from an array type to a pointer type. In most contexts, when the compiler sees an array expression it converts the type of the expression from N-element array of T to pointer to T (not an lvalue, so it cannot be assigned to) and sets the value of the expression to the address of the first element of the array. The exceptions to this rule are when an array is an operand of either the sizeof or & operators, or the array is a string literal being used as an initializer in a declaration. More importantly, the term decay signifies the loss of type and dimension.
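The loss of dimension shows up with sizeof. A sketch with hypothetical helpers: a parameter written as int a[10] is adjusted to int *a, while sizeof on the array itself sees the whole array.

```c
#include <stddef.h>

/* The parameter "int a[10]" is adjusted to "int *a": the size is lost. */
static size_t param_size(int a[10])
{
    return sizeof a;           /* sizeof (int *), not 10 * sizeof (int) */
}

static size_t array_size(void)
{
    int arr[10];
    return sizeof arr;         /* operand of sizeof: no decay, whole array */
}
```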

Pointer aliasing

In computer programming, aliasing refers to the situation where the same memory location can be accessed using different names.

Storage

Storage class in C decides where storage is allocated for a variable, and it also determines the scope and lifetime of the variable. Memory and CPU registers are the locations where a variable’s value can be stored. There are four storage classes in C: automatic, register, static, and external.

Each declaration can have at most one of five storage class specifiers: static, extern, auto, register and typedef.

The typedef storage class specifier does not reserve storage and is called a storage class specifier only for syntactic convenience.

The general form of a declaration that uses a storage class is: <storage-class-specifier> <type> <identifier>

Living example:

./configure --has-lang
make clean lang_storage_test

Automatic storage class

auto storage class specifier denotes that an identifier has automatic duration. This means once the scope in which the identifier was defined ends, the object denoted by the identifier is no longer valid.

Since all objects not at global scope and not declared static have automatic duration by default, this keyword is mostly of historical interest and should not be used. auto cannot be applied to parameter declarations. It is the default for variables declared inside a function body, and is in fact a historic leftover from C’s predecessor, B.

scope: variables defined with the auto storage class specifier are local to the function scope or block scope inside which they are defined.
duration: automatic, till the end of the function scope or block scope where the variable is defined
default initial value: garbage value

Register storage class

Hints to the compiler that access to an object should be as fast as possible. Whether the compiler actually uses the hint is implementation-defined; it may simply treat it as equivalent to auto.

The compiler does make sure that you do not take the address of a variable with the register storage class.

The only property that is definitively different for all objects that are declared with register is that they cannot have their address computed. Thereby register can be a good tool to ensure certain optimizations:

/* error: address of register variable requested */
register int i = 0x10;
int *p = &i;

i can never be aliased, because no code can pass its address to another function where it might be changed unexpectedly.

This property also implies that an array

void decay(char *a);
register char a[] = { 0x11, 0x22, 0x33, 0x44, };
decay(a);

cannot decay into a pointer to its first element (i.e. turning into &a[0]). This means that the elements of such an array cannot be accessed and the array itself cannot be passed to a function.

In fact, the only legal usage of an array declared with a register storage class is the sizeof operator; Any other operator would require the address of the first element of the array. For that reason, arrays generally should not be declared with the register keyword since it makes them useless for anything other than size computation of the entire array, which can be done just as easily without register keyword.

The register storage class is more appropriate for variables that are defined inside a block and are accessed with high frequency.

scope: function scope or block scope
duration: automatic, till the end of function scope or block scope in which the variable is defined
default initial value: garbage value

Static storage class

The static storage class serves different purposes, depending on the location of the declaration in the file. Since C99, static can also appear inside the brackets of an array function parameter (e.g. int a[static 4]) to denote that the argument must be a non-null pointer to at least that many elements.

scope: file scope (confine the identifier to that translation unit only) or function scope (save data for use with the next call of a function)
duration: static
default initial value: 0
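Two of those uses in one sketch, with hypothetical helpers: a static local keeps its value between calls and starts at 0, and [static 4] (C99) documents a minimum array length that compilers may use for diagnostics or optimization.

```c
/* A static local keeps its value between calls; it starts at 0. */
static int next_id(void)
{
    static int counter;        /* default initial value: 0 */
    return ++counter;
}

/* C99: the caller must pass a non-null pointer to at least 4 elements. */
static int sum4(const int a[static 4])
{
    return a[0] + a[1] + a[2] + a[3];
}
```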

External storage class

The extern keyword is used to declare an object or function that is defined elsewhere (and that has external linkage). In general, it is used to declare an object or function to be used in a module that is not the one in which the corresponding object or function is defined.

scope: global
duration: static
default initial value: 0

Scope

In C, all identifiers are lexically (or statically) scoped.

The scope of a declaration is the part of the code where the declaration is seen and can be used. Note that this says nothing about whether the object associated to the declaration can be accessed from some other part of the code via another declaration. We uniquely identify an object by its memory: the storage for a variable or the function code.

Finally, note that a declaration in a nested scope can hide a declaration in an outer scope; but only if one of two has no linkage.

Declarations and Definitions

If neither the extern keyword nor an initializer is present, the declaration can be either a declaration or a definition. It is up to the compiler to analyse the modules of the program and decide.

All declarations with no linkage are also definitions. Other declarations are definitions if they have an initializer.
A file scope variable declaration without the external linkage storage class specifier or initializer is a tentative definition.
All definitions are declarations but not vice-versa.
A definition of an identifier is a declaration for that identifier that: for an object, causes storage to be reserved for that object.

A declaration specifies the interpretation and attributes of a set of identifiers. A definition of an identifier is a declaration for that identifier that:

for an object, causes storage to be reserved for that object;
for a function, includes the function body;
for an enumeration constant or typedef name, is the only declaration of the identifier.

In the following example we declare a function. The extern keyword is optional in function declarations: without it, the function still has external linkage, as if extern had been written.

int add(int, int);

Block scope

Every variable or function declaration that appears inside a block has block scope. The scope goes from the declaration to the end of the innermost block in which the declaration appears. Function parameter declarations in function definitions (but not in prototypes) also have block scope. The scope of a parameter declaration therefore includes the parameter declarations that appear after it.

Function scope

Labels (the targets of goto) are a bit special: they are implicitly declared at the place where they appear, but they are visible throughout the whole function, even if they appear inside a block.

Function prototype scope is the scope for function parameters that appear inside a function prototype. It extends until the end of the prototype. This scope exists to ensure that function parameters have distinct names.

File scope

All variables and functions defined outside functions have file scope. They are visible from their declaration until the end of the file. Here, the term file should be understood as the source file being compiled, after all includes have been resolved.

Duration

Indicates whether the object associated to the declaration persists throughout the program’s execution (static) or whether it is allocated dynamically when the declaration’s scope is entered (automatic).

There are two kinds of duration:

automatic
static

Within functions at block scope, declarations without extern or static have automatic duration. Any other declaration at file scope has static duration.

Linkage

Linkage describes how identifiers can or can not refer to the same entity throughout the whole program or one single translation unit.

Living example:

./configure --has-lang
make clean lang_linkage_test

Translation unit

A translation unit is the ultimate input to a C compiler from which an object file is generated. In casual usage it is sometimes referred to as a compilation unit. A translation unit roughly consists of a source file after it has been processed by the C preprocessor, meaning that header files listed in #include directives are literally included, sections of code within #ifdef may be included, and macros have been expanded.

No linkage

A declaration with no linkage is associated to an object that is not shared with any other declaration. All declarations with no linkage happen at block scope: all block scope declarations without the extern storage class specifier have no linkage.

Internal linkage

Internal linkage (static at file scope) means that the identifier must be defined in your translation unit: either in the source file itself or in a header it includes. Within the translation unit, all declarations with internal linkage for the same identifier refer to the same object.

External linkage

External linkage means that the variable may be defined somewhere else, outside the file you are working on, in a translation unit other than your current one. Within the whole program, all declarations with external linkage for the same identifier refer to the same object.

Size type and Pointer difference types

The C language specification includes the typedefs size_t and ptrdiff_t to represent memory-related quantities. Their size is defined according to the target processor’s arithmetic capabilities, not the memory capabilities, such as available address space. Both of these types are defined in the <stddef.h> header.

size_t is an unsigned integral type used to represent the size of any object in the particular implementation. The sizeof operator yields a value of the type size_t. The maximum value of size_t is provided by SIZE_MAX, a macro constant defined in the <stdint.h> header.
ptrdiff_t is a signed integral type used to represent the difference between pointers. The subtraction is only guaranteed to be valid between pointers into the same array object (or one past its end).
ssize_t is defined by POSIX, not by the C standard.
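A sketch of where each type comes from; distance is a hypothetical helper. sizeof yields size_t, and subtracting two pointers into the same array yields a ptrdiff_t counted in elements, which may be negative.

```c
#include <stddef.h>
#include <stdint.h>

/* Both pointers must point into the same array (or one past its end). */
static ptrdiff_t distance(const int *from, const int *to)
{
    return to - from;          /* counted in elements, not bytes; may be negative */
}
```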

Literal suffix

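l or L for long (integer) or long double (floating), such as 123L, 3.14L
f or F for float, such as 2.718f
u or U for unsigned, ll or LL for long long, such as 42u, 123LL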

struct

A struct is a type consisting of a sequence of members whose storage is allocated in the order in which the members were defined.

struct optional_name { declaration_list; };
struct name;

Initialization, sizeof and the assignment operator = ignore the flexible array member (a last member declared as an array of unknown size, such as char data[];).
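The usual flexible array member pattern, as a sketch with a hypothetical struct buf and buf_new: since sizeof ignores the member, the allocation adds the space for the data explicitly.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical string buffer: data is a flexible array member (C99). */
struct buf {
    size_t len;
    char data[];               /* must be the last member; not counted by sizeof */
};

/* Allocate the header plus len + 1 bytes for the text and its terminator. */
static struct buf *buf_new(const char *s)
{
    size_t len = strlen(s);
    struct buf *b = malloc(sizeof *b + len + 1);
    if (b == NULL) {
        return NULL;
    }
    b->len = len;
    memcpy(b->data, s, len + 1);
    return b;
}
```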

Run example

./configure --has-lang
make clean lang_struct_test

Padding

There may be unnamed padding between any two members of a struct or after the last member, but not before the first member. The size of a struct is at least as large as the sum of the sizes of its members.

extern int a[]; /* the type of a is incomplete */
char a[4];      /* the type of a is now complete */

struct node {
  struct node *next; /* struct node is incomplete type at this point */
}; /* struct node is now complete at this point */
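Padding can be observed with offsetof from <stddef.h>. Exact offsets are implementation-defined (with a 4-byte int, sizeof is often 12 here), so this hypothetical struct mixed is only checked against what the rules above guarantee.

```c
#include <stddef.h>

struct mixed {
    char c;                    /* offset 0: no padding before the first member */
    int  i;                    /* usually preceded by padding so i is aligned */
    char d;                    /* often followed by tail padding */
};
```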

union

A union is a type consisting of a sequence of members whose storage overlaps.

union optional_name { declaration_list; };
union name;
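A sketch of overlapping storage with a hypothetical union word: every member starts at the same address. Which byte of bytes holds which part of value depends on the machine's endianness, so only an all-zero value is checked.

```c
#include <stdint.h>

/* All members share storage; the union is at least as large as its largest member. */
union word {
    uint32_t      value;
    unsigned char bytes[4];
};
```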

Type

Basic types

Integer

All C values are represented as binary numbers in memory; a type determines how those numbers are interpreted.

C provides the four basic arithmetic type specifiers char, int, float and double, and the modifiers signed, unsigned, short and long.

long and long int are identical. So are long long and long long int. In both cases, the int is optional.

specifier	type
`long long int`	`long long int`
`long long`	`long long int`
`long`	`long int`

Incomplete type

An incomplete type is an object type that lacks sufficient information to determine the size of objects of that type. An incomplete type may be completed at some later point in the translation unit.

void cannot be completed.
[] array type of unknown size, it can be completed by a later declaration that specifies the size.

typedef

typedef type_specifier declarator;
typedef type_specifier declarator1, *declarator2, (*declarator3)(void);

typedef is used to create an alias name for another type. As such, it is often used to simplify the syntax of declaring complex data structures consisting of struct and union types, but it is just as common in providing specific descriptive type names for integer types of varying lengths. Names ending in _t, such as size_t and time_t, are used by the C standard library and reserved by POSIX.

#define is a C directive which is also used to define the aliases for various data types similar to typedef but with the following differences:

typedef is limited to giving symbolic names to types only, whereas #define can be used to define aliases for values as well.
typedef interpretation is performed by the compiler whereas #define statements are processed by the preprocessor.
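One practical consequence of textual substitution, sketched with hypothetical names INT_PTR_MACRO and IntPtr (avoiding the _t suffix): the * from a macro binds only to the first declarator, while a typedef applies to every declarator.

```c
#define INT_PTR_MACRO int *    /* textual substitution */
typedef int *IntPtr;           /* a real type alias */

INT_PTR_MACRO m1, m2;          /* expands to: int * m1, m2;  m2 is a plain int */
IntPtr        t1, t2;          /* both t1 and t2 are int * */
```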

Using typedef to hide struct is considered a bad idea in Linux kernel coding style

Run typedef example

./configure --has-lang
make clean lang_typedef_test

typeof

typeof is a GNU extension in C11 and earlier; it was only standardized in C23.

Run typeof example

./configure --has-lang
make clean lang_typeof_test

cdecl

A declaration can have exactly one basic type. The basic type is augmented with derived types, and C has three of them:

function [(decl-list)] returning: ()
array [number] of: []
[const | volatile | restrict] pointer to: *

The array of [] and function returning () type operators have higher precedence than pointer to *.
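The precedence rule in action, with hypothetical names: parentheses are needed to make * bind before [] or ().

```c
/* [] binds tighter than *, so parentheses change the meaning. */
int *array_of_ptrs[3];         /* array [3] of pointer to int */
int (*ptr_to_array)[3];        /* pointer to array [3] of int */

/* () binds tighter than *: a function returning a pointer vs. a pointer to a function. */
static int  value = 7;
static int *get_ptr(void) { return &value; }   /* function returning pointer to int */
static int *(*fp)(void) = get_ptr;              /* pointer to function returning pointer to int */
```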

alloc

malloc

Don’t cast the result of malloc. It is unnecessary, as void * is automatically and safely promoted to any other object pointer type in this case. It adds clutter to the code, and casts are not very easy to read (especially if the pointer type is long). It makes you repeat yourself, which is generally bad. It can hide an error if you forgot to include <stdlib.h>. This can cause crashes (or, worse, not cause a crash until way later in some totally different part of the code). Consider what happens if pointers and integers are differently sized; then you’re hiding a warning by casting and might lose bits of your returned address. Note: as of C99 implicit function declarations are gone from C, and this point is no longer relevant, since there’s no automatic assumption that undeclared functions return int.

To add further, your code needlessly repeats the type information (int), which can cause errors. It’s better to dereference the pointer being used to store the return value, to lock the two together: int *x = malloc(length * sizeof *x); This also moves the length to the front for increased visibility, and drops the redundant parentheses with sizeof(); they are only needed when the operand is a type name. Many people seem to not know or ignore this, which makes their code more verbose. Remember: sizeof is not a function!

While moving length to the front may increase visibility in some rare cases, in the general case it is better to write the expression as: int *x = malloc(sizeof *x * length); Compare malloc(sizeof *x * length * width) with malloc(length * width * sizeof *x): the second may overflow in length * width when length and width are smaller types than size_t, because that multiplication happens before the conversion to size_t.
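The advice above in one sketch, with grid_new as a hypothetical helper: no cast, sizeof *grid first so the arithmetic is done in size_t, plus an explicit overflow check before calling malloc.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a length x width grid of ints. */
static int *grid_new(size_t length, size_t width)
{
    int *grid;

    /* Reject sizes whose byte count would overflow size_t. */
    if (width != 0 && length > SIZE_MAX / width / sizeof *grid) {
        return NULL;
    }
    grid = malloc(sizeof *grid * length * width);  /* no cast needed */
    return grid;
}
```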

calloc

calloc zero-initializes the allocated memory (all bits zero). Calling calloc is not necessarily more expensive than calling malloc.
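A minimal calloc sketch with a hypothetical zeroed_ints helper. calloc takes the element count and element size separately; mainstream implementations return NULL when their product would overflow.

```c
#include <stdlib.h>

/* Allocate count ints with every byte set to zero. */
static int *zeroed_ints(size_t count)
{
    return calloc(count, sizeof(int));
}
```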

realloc

libc

The C standard library is a standardized collection of header files and library routines used to implement common operations.

std

There is a good answer to What is the difference between C, C99, ANSI C and GNU C:

Everything before standardization is generally called “K&R C”, after the famous book, with Dennis Ritchie, the inventor of the C language, as one of the authors. This was “the C language” from 1972-1989.
The first C standard was released 1989 nationally in USA, by their national standard institute ANSI. This release is called C89 or ANSI-C. From 1989-1990 this was “the C language”.
The year after, the American standard was accepted internationally and published by ISO (ISO 9899:1990). This release is called C90. Technically, it is the same standard as C89/ANSI-C. Formally, it replaced C89/ANSI-C, making them obsolete. From 1990-1999, C90 was “the C language”.
Please note that since 1989, ANSI haven’t had anything to do with the C language. Programmers still speaking about “ANSI C” generally haven’t got a clue about what it means. ISO “owns” the C language, through the standard ISO 9899.
In 1999, the C standard was revised, lots of things changed (ISO 9899:1999). This version of the standard is called C99. From 1999-2011, this was “the C language”. Most C compilers still follow this version.
In 2011, the C standard was again changed (ISO 9899:2011). This version is called C11. At the time of that answer it was the definition of “the C language”; it has since been revised again as C17 (ISO 9899:2018) and C23 (ISO 9899:2024).

headers

name	std	intro
assert.h	C90	conditionally compiled macro that compares its argument to zero
ctype.h	C90	functions to determine the type contained in character data
errno.h	C90	macros reporting error conditions
float.h	C90	limits of float types
limits.h	C90	sizes of basic types
locale.h	C90	localization utilities
math.h	C90	common mathematics functions
setjmp.h	C90	nonlocal jumps
signal.h	C90	signal handling
stdarg.h	C90	variable arguments
stddef.h	C90	common macro definitions
stdio.h	C90	input/output
stdlib.h	C90	general utilities: memory, program, string, random, algorithms
string.h	C90	string handling
time.h	C90	time/date utilities
iso646.h	C95	alternative operator spellings
wchar.h	C95	extended multibyte and wide character
wctype.h	C95	functions to determine the type contained in wide character data
complex.h	C99	complex number arithmetic
fenv.h	C99	floating-point environment
inttypes.h	C99	format conversion of integer types
stdbool.h	C99	macros for boolean types
stdint.h	C99	fixed-width integer types
tgmath.h	C99	type-generic math
stdalign.h	C11	alignas and alignof convenience macros
stdatomic.h	C11	atomic types
stdnoreturn.h	C11	noreturn convenience macros
threads.h	C11	thread library
uchar.h	C11	UTF-16/32 character utilities

References

Compiler

flex

References

flex In A Nutshell

x86

While memory stores the program and data, the Central Processing Unit does all the work. The CPU has two parts: registers and the Arithmetic Logic Unit (ALU). The ALU performs the actual computations such as addition and multiplication along with comparison and other logical operations.

Load

Load instructions read bytes into a register. The source may be a constant value, another register, or a location in memory.

;; load the constant 23 into register 4
R4 = 23

;; copy the contents of register 2 into register 3
R3 = R2

;; load char (one byte) starting at memory address 244 into register 6
R6 = .1 M[244]

;; load R5 with the word whose memory address is in R1
R5 = M[R1]

;; load the word that begins 8 bytes after the address in R1.
;; this is known as constant offset mode and is about the fanciest
;; addressing mode a RISC processor will support
R4 = M[R1+8]

Store

Store instructions are basically the reverse of load instructions: they move values from registers back out to memory.

;; store the constant number 37 into the word beginning at 400
M[400] = 37

;; store the value in R6 into the word whose address is in R1
M[R1] = R6

;; store lower half-word from R2 into 2 bytes starting at address 1024
M[1024] = .2 R2

;; store R7 into the word whose address is 12 more than the address in R1
M[R1+12] = R7

ALU

;; add 6 to R3 and store the result in R1
R1 = 6 + R3

;; subtract R3 from R2 and store the result in R1
R1 = R2 - R3

Branching

By default, the CPU fetches and executes instructions from memory in order, working from low memory to high. Branch instructions alter this order. Branch instructions test a condition and possibly change which instruction should be executed next by changing the value of the PC register. The operands in the test of a branch statement must be in registers or constant values. Branches are used to implement control structures like if as well as loops like for and while.

;; begin executing at address 344 if R1 equals 0
BEQ R1, 0, 344

;; begin executing at address 8 past current instruction if R2 less than R3
BLT R2, R3, PC+8

;; The full set of branch variants:
BLT ... ;; branch if first argument is less than second
BLE ... ;; less than or equal
BGT ... ;; greater than
BGE ... ;; greater than or equal
BEQ ... ;; equal
BNE ... ;; not equal

;; unconditional jump that has no test, but just immediately
;; diverts execution to new address
;; begin executing at address 2000 unconditionally: like a goto
JMP 2000

;; begin executing at address 12 before current instruction
JMP PC-12
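To connect these branches to C, here is a sketch of one hypothetical loop written twice: once as an ordinary for loop and once with explicit goto statements that mirror the BGE/JMP pseudocode above. Real compilers pick their own layout; this only illustrates the idea.

```c
#include <stddef.h>

/* Ordinary C loop. */
long sum_for(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* Same loop with explicit branches, roughly:
       i = 0
   top:
       BGE i, n, done   ;; leave the loop when i >= n
       sum = sum + a[i]
       i = i + 1
       JMP top
   done:                                                  */
long sum_goto(const int *a, size_t n)
{
    long sum = 0;
    size_t i = 0;
top:
    if (i >= n) goto done;  /* conditional branch */
    sum += a[i];
    i++;
    goto top;               /* unconditional jump */
done:
    return sum;
}
```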

Type Conversion

The types char, short, int, and long are all in the same family, and use the same binary polynomial representation. C allows you to freely assign between these types.

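broaden: When assigning from a smaller-sized type to a larger one, there is no problem. All of the source bytes are copied and the remaining upper bytes in the destination are filled using what is called sign extension: the sign bit is extended across the extra bytes. (An unsigned source is zero-extended instead.)
narrow: Only the lower bytes are copied and the upper bytes are ignored. (When the destination is signed and the value doesn’t fit, the result is implementation-defined; common two’s-complement compilers simply truncate.)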
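A small sketch of both directions, assuming 8-bit chars. The helper names are hypothetical. Unsigned types are used for narrowing because narrowing into an unsigned type is always well defined (it keeps the low bits).

```c
#include <stdint.h>

/* signed char -> int: the sign bit is copied into the new upper bits */
int widen_signed(signed char c) { return c; }

/* unsigned char -> int: the new upper bits are filled with zeros */
int widen_unsigned(unsigned char c) { return c; }

/* uint32_t -> uint8_t: only the lowest byte survives */
uint8_t narrow_u32(uint32_t v) { return (uint8_t) v; }
```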

Remember that a floating point 1.0 has a completely different arrangement of bits than the integer 1, and instructions are required to do those conversions.

;; take bits in R2 that represent integer, convert to float, store in R1
R1 = ItoF R2

;; take bits in R4, convert from float to int, and store back in the same
;; register. Note that converting in this direction loses information: the
;; fractional component is truncated and lost
R4 = FtoI R4
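In C the two operations look different. A cast such as (int) f asks for a value conversion (the FtoI above), while copying the raw bytes with memcpy reinterprets them without converting. The sketch below assumes float is a 32-bit IEEE-754 single, which holds on mainstream platforms; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: the fraction is truncated toward zero. */
int float_to_int(float f) { return (int) f; }

/* Bit reinterpretation: the same 32 bits, no conversion at all. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For example, float_bits(1.0f) is 0x3F800000, nothing like the integer 1.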

Typecast

A typecast is a compile-time entity that instructs the compiler to treat an expression differently than its declared type when generating code for that expression.

casting a pointer from one type to another could change the element size by which offsets are multiplied in pointer arithmetic, or how many bytes are copied on a pointer dereference.
some typecasts are actually type conversions. A type conversion is required when the data needs to be converted from one representation to another, such as when changing an integer to floating point representation or vice versa.
most often, a cast does affect the generated code, since the compiler will be treating the expression as a different type.

int i;
((struct binky *)i)->b = 'A';

What does this code actually do at runtime? Why would you ever want to do such a thing? The typecast is one of the reasons C is a fundamentally unsafe language.

Data Sizes

Name (16-bit x86)	Size (bytes)	Size (bits)
Word	2	16
Doubleword	4	32
Quadword	8	64
Paragraph	16	128
Kilobyte	1,024	8,192
Megabyte	1,048,576	8,388,608

In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized piece of data handled as a unit by the instruction set or the hardware of the processor. The number of bits in a word is an important characteristic of any specific processor design or computer architecture.

Registers

rsp

rbp

callq

pushq <address-of-instruction-after-callq>, then jmp <callee>

retq

popq <return-address> from the top of the stack ($rsp), then jmp <return-address>

cmp

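cmp dst, src performs a subtraction like sub dst, src (in Intel syntax: dst - src) but does not store the result; it only sets the flags. AT&T syntax writes the operands in the opposite order: cmp src, dst.

cmp dst, src	CF	PF	AF	ZF	SF	OF
unsigned dst < unsigned src (borrow)	1
low byte of (dst-src) has an even number of 1 bits		1
borrow out of the low nibble of (dst-src)			1
result is 0 (i.e. dst == src)				1
MSB of (dst-src) == 1					1
signed overflow: sign of dst != sign of src and sign of (dst-src) != sign of dst						1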

jmp

Jump	Description	signed-ness	Flags
je	jump if equal		ZF = 1
jg	jump if greater	signed	ZF = 0 and SF = OF
jge	jump if greater or equal	signed	SF = OF
jl	jump if less	signed	SF != OF
jle	jump if less or equal	signed	ZF = 1 or SF != OF
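The jumps above are the signed variants; x86 also has unsigned ones (jb, ja, ...) that test CF instead of SF and OF. The sketch below compares the same bits both ways; compilers typically emit a signed jump (jl/jg) for the int version and an unsigned one (jb/ja) for the unsigned version, though the exact instructions vary.

```c
/* Same bit pattern, different comparison. */
int signed_less(int a, int b) { return a < b; }               /* signed compare */
int unsigned_less(unsigned a, unsigned b) { return a < b; }   /* unsigned compare */
```

(unsigned) -1 is the largest unsigned value, so it is less than 1 only under the signed comparison.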

rflags

RFLAGS Register

Bit(s)	Label	Description
0	CF	Carry Flag
1	1	Reserved
2	PF	Parity Flag, set if the least significant byte of the result has an even number of 1 bits
3	0	Reserved
4	AF	Auxiliary Carry Flag
5	0	Reserved
6	ZF	Zero Flag, set if result is zero
7	SF	Sign Flag, set to the MSB of the result
8	TF	Trap Flag
9	IF	Interrupt Enable Flag
10	DF	Direction Flag
11	OF	Overflow Flag
12-13	IOPL	I/O Privilege Level
14	NT	Nested Task
15	0	Reserved
16	RF	Resume Flag
17	VM	Virtual-8086 Mode
18	AC	Alignment Check / Access Control
19	VIF	Virtual Interrupt Flag
20	VIP	Virtual Interrupt Pending
21	ID	ID Flag
22-63	0	Reserved

Addressing

References

Memory

Run the examples under src/memory.

./configure --has-memory
make clean test

Bits and Bytes

Bits

The smallest unit of memory is the bit. A bit can be in one of two states: on vs. off, or alternately, 1 vs. 0.

Most computers don’t work with bits individually, but instead group eight bits together to form a byte. Each byte maintains one eight-bit pattern. A group of N bits can be arranged in 2^N different patterns.

Strictly speaking, a program can interpret a bit pattern any way it chooses.

Bytes

The byte is sometimes defined as the smallest addressable unit of memory. Most computers also support reading and writing larger units of memory: 2-byte half-words (sometimes known as short words) and 4-byte words.

Most computers restrict half-word and word accesses to be aligned: a half-word must start at an even address and a word must start at an address that is a multiple of 4.
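An alignment rule like this can be checked in C by looking at the low bits of an address. A sketch with a hypothetical helper is_aligned, assuming align is a power of two; converting a pointer to uintptr_t is implementation-defined, but on mainstream platforms it yields the plain address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if p is a multiple of align (a power of two). */
bool is_aligned(const void *p, size_t align)
{
    /* an aligned address has its low log2(align) bits all zero */
    return ((uintptr_t) p & (align - 1)) == 0;
}
```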

Shift

A logical shift always fills the vacated bits with 0s, while an arithmetic shift fills them with 0s only for a left shift; for a right shift it copies the most significant bit, thereby preserving the sign of the operand.

Left shift on unsigned integers, x << y

shift bit-vector x by y positions
throw away extra bits on left
fill with 0s on right

Right shift on unsigned integers, x >> y

shift bit-vector x right by y positions
throw away extra bits on right
fill with 0s on left

Left shift, x << y

equivalent to multiplying by 2^y
if resulting value fits, no 1s are lost

Right shift, x >> y

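logical shift for unsigned values, fill with 0s on left
arithmetic shift for signed values
- replicate most significant bit on left
- maintains sign of x
- for a negative x the C result is implementation-defined; mainstream compilers do an arithmetic shift
equivalent to floor(x / 2^y)
- this rounds towards negative infinity, so matching C division, which rounds towards 0, requires some care with signed numbers.
- an arithmetic shift of a negative x can be emulated with a logical shift: (unsigned)x >> y | ~(~0u >> y)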
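A sketch of the shift rules above. asr_negative is a hypothetical helper that applies the emulation formula from the list. It assumes two’s-complement int and y smaller than the width of unsigned, and it relies on the (implementation-defined) conversion back to int wrapping, as it does on gcc and clang.

```c
/* Shifts on unsigned values: zeros fill in from either side. */
unsigned shl(unsigned x, unsigned y) { return x << y; }
unsigned shr(unsigned x, unsigned y) { return x >> y; }

/* Arithmetic right shift of a negative int, built from a logical shift. */
int asr_negative(int x, unsigned y)
{
    unsigned u = (unsigned) x >> y;  /* logical shift: zeros on the left */
    u |= ~(~0u >> y);                /* turn the y vacated top bits into 1s */
    return (int) u;
}
```

Note that asr_negative(-7, 1) gives -4, i.e. floor(-3.5), while -7 / 2 in C gives -3.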

Basic Types

Character

The ASCII code defines 128 characters and a mapping of those characters onto the numbers 0..127. The letter ‘A’ is assigned 65 in the ASCII table. Expressed in binary, that’s 2^6 + 2^0 (64 + 1). All standard ASCII characters have zero in the uppermost bit (the most significant bit) since they only span the range 0..127.

Short Integer

2 bytes or 16 bits. 16 bits provide 2^16 = 65536 patterns. This number is known as 64k, where 1k of something is 2^10 = 1024. For non-negative numbers these patterns map to the numbers 0..65535. Systems that are big-endian store the most-significant byte at the lower address. A little-endian (Intel x86) system arranges the bytes in the opposite order. This means when exchanging data through files or over a network between machines of different endianness, there is often a substantial amount of byte-swapping required to rearrange the data.

Long Integer

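4 bytes or 32 bits. 32 bits provide 2^32 = 4294967296 patterns. 4 bytes is the contemporary default size for an int, also known as a word. Note that long itself is 8 bytes on 64-bit Linux and macOS, but 4 bytes on 32-bit systems and on 64-bit Windows.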

Fixed-point

Floating-point

4, 8, or 16 bytes. Almost all computers use the standard IEEE-754 representation for floating point numbers, a system much more complex than the scheme for integers. The important thing to note is that the bit pattern for the floating point number 1.0 is not the same as the pattern for the integer 1. IEEE floats are a form of scientific notation. A 4-byte float uses 23 bits for the mantissa, 8 bits for the exponent, and 1 bit for the sign. Some processors have a special hardware Floating Point Unit (FPU) that substantially speeds up floating point operations. With separate integer and floating point processing units, an integer and a floating point computation can often proceed in parallel to an extent. The exponent field contains 127 plus the true exponent for single precision, or 1023 plus the true exponent for double precision. For normalized numbers the first bit of the mantissa is not stored but assumed to be 1, so the value is 1.f * 2^(exponent - bias), where f is the field of fraction bits.

format	sign	exponent	mantissa
single precision	1 [31]	8 [30-23], base 2, biased by 127	23 [22-00], bits weighted 1/2, 1/4, …
double precision	1 [63]	11 [62-52], base 2, biased by 1023	52 [51-00], bits weighted 1/2, 1/4, …
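A sketch that pulls the three single-precision fields out of a float, assuming float is 32-bit IEEE-754. The struct f32_fields and f32_split names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type: the fields of an IEEE-754 single. */
struct f32_fields {
    unsigned sign;      /* bit 31 */
    unsigned exponent;  /* bits 30-23, biased by 127 */
    uint32_t fraction;  /* bits 22-0; the implicit leading 1 is not stored */
};

struct f32_fields f32_split(float f)
{
    uint32_t bits;
    struct f32_fields r;

    memcpy(&bits, &f, sizeof bits);  /* reinterpret the bits, no conversion */
    r.sign = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.fraction = bits & 0x7FFFFFu;
    return r;
}
```

For example, 0.75 is 1.1 (binary) * 2^-1, so its stored exponent is 126 and only the top fraction bit (weight 1/2) is set.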

References

Record

The size of a record is equal to at least the sum of the size of its component fields. The record is laid out by allocating the components sequentially in a contiguous block, working from low memory to high. Sometimes a compiler will add invisible pad fields in a record to comply with processor alignment restrictions.
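The padding can be observed with offsetof. A sketch using a hypothetical struct rec: the exact amount of padding is up to the compiler and ABI, but with a 4-byte int it is typically 3 bytes after c.

```c
#include <stddef.h>

/* Hypothetical record: a char followed by an int. */
struct rec {
    char c;
    int i;
};

/* Invisible pad bytes between c and i (typically 3 with a 4-byte int). */
size_t rec_padding(void)
{
    return offsetof(struct rec, i) - sizeof(char);
}
```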

Array

The size of an array is at least equal to the size of each element multiplied by the number of components. The elements in the array are laid out consecutively starting with the first element and working from low memory to high. Given the base address of the array, the compiler can generate constant-time code to figure the address of any element. As with records, there may be pad bytes added to the size of each element to comply with alignment restrictions.

Pointer

A pointer is an address. The size of the pointer depends on the range of addresses on the machine. A 32-bit machine uses 4 bytes to store an address, creating a 4GB addressable range; today’s 64-bit machines typically use 8 bytes. There is actually very little distinction between a pointer and an unsigned integer of the same width. They both just store integers: the difference is in whether the number is interpreted as a number or as an address.

Instruction

Machine instructions themselves are also encoded using bit patterns, most often using the same 4-byte native word size. The different bits in the instruction encoding indicate things such as what type of instruction it is (load, store, multiply, etc) and registers involved.

Pointer Basics

Pointers and Pointees

We use the term pointee for the thing that the pointer points to, and we stick to the basic properties of the pointer/pointee relationship which are true in all languages.

Allocating a pointer and allocating a pointee for it to point to are two separate steps. You can think of the pointer/pointee structure as operating at two levels. Both levels must be set up for things to work.

Dereferencing

The dereference operation starts at the pointer and follows its arrow over to access its pointee. The goal may be to look at the pointee state or to change the state.

The dereference operation on a pointer only works if the pointer has a pointee: the pointee must be allocated and the pointer must be set to point to it.

Pointer Assignment

Pointer assignment between two pointers makes them point to the same pointee. Pointer assignment does not touch the pointees. It just changes one pointer to have the same reference as another pointer. After pointer assignment, the two pointers are said to be sharing the pointee.
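The pointer/pointee steps described above, in one hypothetical function: the pointer and the pointee are set up separately, a dereference changes the pointee, and pointer assignment makes two pointers share it.

```c
#include <stdlib.h>

/* Hypothetical demo: returns the value seen through p after writing via q. */
int share_pointee(void)
{
    int *p = NULL;  /* level 1: the pointer exists but has no pointee */
    int *q;
    int seen;

    p = malloc(sizeof *p);  /* level 2: allocate a pointee, point p at it */
    if (p == NULL) {
        return -1;
    }
    *p = 13;                /* dereference p to change the pointee */

    q = p;                  /* pointer assignment: q shares p's pointee */
    *q = 42;                /* a change made through q ... */
    seen = *p;              /* ... is visible through p */

    free(p);                /* one pointee, so it is freed exactly once */
    return seen;
}
```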

C Array

A C array is formed by laying out all the elements contiguously in memory from low to high. The array as a whole is referred to by the address of the first element.

The programmer can refer to elements in the array with the simple [] syntax such as intArray[1]. This scheme works by combining the base address of the array with simple arithmetic. Each element takes up a fixed number of bytes known at compile-time. So the address of the nth element in the array (0-based indexing) will be at an offset of (n * element_size) bytes from the base address of the whole array.

[] Operator

The square bracket syntax [] deals with this address arithmetic for you, but it’s useful to know what it’s doing. The [] multiplies the integer index by the element size, adds the resulting offset to the array base address, and finally dereferences the resulting pointer to get to the desired element.

a[3] == *(a + 3);
a+3 == &a[3];

a[b] == b[a];

The C standard defines the [] operator as follows: a[b] => *(a+b), and b[a] => *(b+a) => *(a+b), so a[b] == b[a].
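The identities above can be checked directly. A sketch with a hypothetical bracket_identities function that returns 1 when all of them hold for a small array:

```c
/* Checks the [] identities on a small array; returns 1 if all hold. */
int bracket_identities(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    int ok = 1;

    ok = ok && (a[3] == *(a + 3));  /* [] is offset + dereference */
    ok = ok && (a + 3 == &a[3]);    /* + alone stops before the dereference */
    ok = ok && (a[3] == 3[a]);      /* legal, though never worth writing */
    return ok;
}
```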

In a closely related piece of syntax, adding an integer to a pointer does the same offset computation, but leaves the result as a pointer. The square bracket syntax dereferences that pointer to access the nth element while the + syntax just computes the pointer to the nth element.

Any [] expression can be written with the + syntax instead. We just need to add in the pointer dereference. For most purposes, it’s easiest and most readable to use the [] syntax. Every once in a while the + is convenient if you need a pointer to the element instead of the element itself.

Pointer++

If p is a pointer to an element in an array, then (p+1) points to the next element in the array. Code can exploit this using the construct p++ to step a pointer over the elements in an array. It doesn’t help readability any.

Pointer Type Effects

Both [] and ++ implicitly use the compile time type of the pointer to compute the element size, which affects the offset arithmetic.

	int a[16];
	int *p = a;
	p = p + 12; /* p + (12 * sizeof(int)) bytes */

	p = a;
	p = (int*) ((char*)p + 12); /* p + (12 * sizeof(char)) bytes */

Assuming each int takes 4 bytes, the first addition effectively increments the address in p by 48 at runtime, while the (char*) version adds only 12. The compiler figures all this out based on the type of the pointer.

Arithmetic on a void pointer

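What is sizeof(void)? Unknown! Some compilers (GCC, as an extension) treat it as 1, so that a (void*) behaves like a (char*) in arithmetic, but if you were to depend on this you would be creating non-portable code. The portable approach is to cast to (char*) before doing the arithmetic.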

Note that you do not need to cast the result back to (void*): a (void*) is the universal recipient of object pointer types and can be freely assigned any type of object pointer.
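A portable way to advance a generic pointer by a byte count, as a hypothetical helper byte_offset: do the arithmetic on a char *, then let the result convert back to void * implicitly.

```c
#include <stddef.h>

/* Hypothetical helper: base + n bytes, without GNU void * arithmetic. */
void *byte_offset(void *base, size_t n)
{
    return (char *) base + n;  /* char * steps one byte at a time */
}
```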

Arrays and Pointers

One effect of the C array scheme is that the compiler does not meaningfully distinguish between arrays and pointers.

Array Names are const

One subtle distinction between an array and a pointer is that the base address of an array cannot be changed in the code: the array name behaves like a const pointer and cannot be assigned to. The constraint applies to the name of the array where it is declared in the code.

Dynamic Arrays

Since arrays are just contiguous areas of bytes, you can allocate your own arrays in the heap using malloc, and you can change the size of the malloc-ed array at will at run time using realloc.
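A sketch of a growable array built on realloc, using a hypothetical struct int_vec and vec_push. realloc’s result goes into a temporary first: on failure realloc returns NULL but leaves the old block allocated, so overwriting data directly would leak it.

```c
#include <stdlib.h>

/* Hypothetical growable int array. */
struct int_vec {
    int *data;
    size_t len;
    size_t cap;
};

/* Append value, doubling the capacity when full. Returns 0 on success. */
int vec_push(struct int_vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        /* realloc(NULL, n) behaves like malloc(n) */
        int *tmp = realloc(v->data, new_cap * sizeof *tmp);
        if (tmp == NULL) {
            return -1;  /* v->data is still valid */
        }
        v->data = tmp;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}
```

Start with struct int_vec v = {NULL, 0, 0}; and release it with free(v.data).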

Passing multidimensional arrays to a function

Iteration

C stores arrays in row-major order, so loading a[0][0] would potentially load a[0][1] into the cache as well (same cache line), but loading a[1][0] could generate a second cache miss.
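A sketch of the two iteration orders over a small array, with hypothetical names. Both give the same sum; the difference is only in memory access pattern, which matters for arrays much larger than this one.

```c
#include <stddef.h>

#define ROWS 4
#define COLS 8

/* Row by row: consecutive steps touch adjacent addresses (cache-friendly). */
long sum_row_major(int a[ROWS][COLS])
{
    long sum = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            sum += a[r][c];
    return sum;
}

/* Column by column: each step jumps COLS * sizeof(int) bytes ahead. */
long sum_col_major(int a[ROWS][COLS])
{
    long sum = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            sum += a[r][c];
    return sum;
}
```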

Stack Implementation

Writing a generic container in pure C is hard, and it’s hard for two reasons:

The language doesn’t offer any real support for encapsulation or information hiding. That means that the data structures expose information about internal representation right there in the interface file for everyone to see and manipulate. The best we can do is document that the data structure should be treated as an abstract data type, and the client shouldn’t directly manage the fields. Instead, he should just rely on the functions provided to manage the internals for him.

C doesn’t allow data types to be passed as parameters. That means a generic container needs to manually manage memory in terms of the client element size, not client data type. This translates to a bunch of malloc, realloc, free, memcpy, and memmove calls involving void*.
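A minimal sketch of such a container, assuming hypothetical names (stack, stack_new, stack_push, stack_pop, stack_dispose) rather than the repository’s own code. The stack never knows the client’s type, only elem_size, so every element moves in and out with memcpy through void *.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical generic stack: elements are stored by size, not by type. */
typedef struct {
    void *elems;        /* contiguous buffer */
    size_t elem_size;   /* bytes per element, supplied by the client */
    size_t logical_len; /* elements in use */
    size_t alloc_len;   /* elements allocated */
} stack;

int stack_new(stack *s, size_t elem_size)
{
    s->elem_size = elem_size;
    s->logical_len = 0;
    s->alloc_len = 4;
    s->elems = malloc(s->alloc_len * elem_size);
    return s->elems ? 0 : -1;
}

void stack_dispose(stack *s)
{
    free(s->elems);
    s->elems = NULL;
}

/* Copy elem_size bytes from elem_addr onto the top of the stack. */
int stack_push(stack *s, const void *elem_addr)
{
    if (s->logical_len == s->alloc_len) {
        void *tmp = realloc(s->elems, 2 * s->alloc_len * s->elem_size);
        if (tmp == NULL) {
            return -1;
        }
        s->elems = tmp;
        s->alloc_len *= 2;
    }
    memcpy((char *) s->elems + s->logical_len * s->elem_size,
           elem_addr, s->elem_size);
    s->logical_len++;
    return 0;
}

/* Copy the top element into elem_addr and remove it; -1 if empty. */
int stack_pop(stack *s, void *elem_addr)
{
    if (s->logical_len == 0) {
        return -1;
    }
    s->logical_len--;
    memcpy(elem_addr, (char *) s->elems + s->logical_len * s->elem_size,
           s->elem_size);
    return 0;
}
```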

Endian

Endianness refers to the sequential order used to numerically interpret a range of bytes in computer memory as larger, composed word value. It also describes the order of byte transmission over a **digital link**.

However, if you have a 32-bit register storing a 32-bit value, it makes no sense to talk about endianness. The rightmost bit is the least significant bit, and the leftmost bit is the most significant bit.

Big Endian

Little Endian

The little-endian system has the property that the same value can be read from memory at different lengths without using different addresses. For example, a 32-bit memory location with content 4A 00 00 00 can be read at the same address as either 8-bit (value = 4A), 16-bit (004A), 24-bit (00004A), or 32-bit (0000004A), all of which retain the same numeric value.
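A sketch that detects the byte order at run time by looking at the byte stored at the lowest address of the 32-bit value 1. The helper name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian machine, 0 otherwise. */
int is_little_endian(void)
{
    uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);  /* the byte at the lowest address */
    return first == 1;
}
```

On a little-endian machine, reading the first 1 or 2 bytes of a uint32_t holding 0x4A yields 0x4A again, which is the property described above.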

Byte Swapping

Some CPU instruction sets provide native support for endian swapping, such as bswap (x86, since the 80486) and rev (ARMv6 and later).
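A portable byte swap in plain C, as a hypothetical swap32. Optimizing compilers often recognize this shift-and-mask pattern and emit a single bswap or rev instruction, but that is up to the compiler.

```c
#include <stdint.h>

/* Reverse the 4 bytes of x. */
uint32_t swap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}
```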

Unicode text can optionally start with a byte order mark (BOM) to signal the endianness of the file or stream. Its code point is U+FEFF. In UTF-32 for example, a big-endian file should start with 00 00 FE FF; a little endian should start with FF FE 00 00.

Endianness doesn’t apply to everything. If you do bitwise or bitshift operations on an int you don’t notice the endianness.

TCP/IP are defined to be big-endian. The multi-byte integer representation used by the TCP/IP protocols is sometimes called network byte order.

In <arpa/inet.h>:

htons() reorders the bytes of a 16-bit unsigned value from processor order to network order; the name can be read as “host to network short.”
htonl() reorders the bytes of a 32-bit unsigned value from processor order to network order; the name can be read as “host to network long.”
ntohs() reorders the bytes of a 16-bit unsigned value from network order to processor order; the name can be read as “network to host short.”
ntohl() reorders the bytes of a 32-bit unsigned value from network order to processor order; the name can be read as “network to host long.”
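A sketch of writing and reading a 32-bit field in network byte order with htonl() and ntohl(), using hypothetical helpers put_be32 and get_be32. On a big-endian host both calls leave the value unchanged.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store v into buf in network (big-endian) byte order. */
void put_be32(unsigned char buf[4], uint32_t v)
{
    uint32_t n = htonl(v);
    memcpy(buf, &n, sizeof n);
}

/* Read a network-order 32-bit value from buf into host order. */
uint32_t get_be32(const unsigned char buf[4])
{
    uint32_t n;
    memcpy(&n, buf, sizeof n);
    return ntohl(n);
}
```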

Tools

hexdump on Unix-like systems

Memory Model

The only thing that C must care about is the type of the object which a pointer addresses. Each pointer type is derived from another type, its base type, and each such derived type is a distinct new type.

Memory Copy

References

CPU

cpuid

Cache

Check cache line

Linux

ls -l /sys/devices/system/cpu/cpu0/cache/
cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size

Windows

wmic cpu list
wmic cpu get
wmic cpu get L2CacheSize, L2CacheSpeed

References

Timing

time ls /tmp
# ...
# ls -G /tmp  0.00s user 0.00s system 73% cpu 0.003 total

real refers to actual elapsed time; user and sys refer to CPU time used only by the process. (The zsh output above labels them total, user, and system.)

real is wall clock time.
user is the amount of CPU time spent in user-mode code within the process.
sys is the amount of CPU time spent in the kernel within the process.

user + sys is the total CPU time the process used; with several threads running on multiple cores it can exceed real.

POSIX

Library

Static Library

Shared Library

Library References

ELF

References

OS

References

Flex & Bison

The asteroid to kill this dinosaur is still in orbit. – Lex Manual Page

References

The Lex & Yacc Page

Unicode

References

IO

Stream

Streams are a portable way of reading and writing data. They provide a flexible and efficient means of I/O.

A Stream is a file or a physical device (e.g. printer or monitor) which is manipulated with a pointer to the stream.

There exists an internal C data structure, FILE, which represents all streams and is defined in stdio.h.

Stream I/O is buffered: That is to say a fixed chunk is read from or written to a file via some temporary storage area (the buffer).

Predefined streams

There are stdin, stdout, and stderr predefined streams.

Redirection

>: redirect stdout to a file;
<: redirect stdin from a file to a program;
|: connect stdout of one program to stdin of another.

Buffered vs. Unbuffered

All stdio.h functions for reading from FILE may exhibit either buffered or unbuffered behavior, and either echoing or non-echoing behavior.

The standard library function setvbuf can be used to enable or disable buffering of IO by the C library. There are three possible modes: block buffered (_IOFBF), line buffered (_IOLBF), and unbuffered (_IONBF).
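A sketch of choosing a buffering mode with setvbuf, using a hypothetical buffered_roundtrip that writes a line to a temporary file and reads it back. setvbuf has to be called after the stream is opened and before any other operation on it; the supplied buffer must outlive the stream, hence static.

```c
#include <stdio.h>

/* Hypothetical demo: returns 0 if the line written under `mode`
   (_IOFBF, _IOLBF or _IONBF) is read back into out. */
int buffered_roundtrip(int mode, char *out, size_t out_len)
{
    static char buf[BUFSIZ];
    FILE *fp = tmpfile();
    int rc = -1;

    if (fp == NULL) {
        return -1;
    }
    /* unbuffered streams need no buffer, so pass NULL for _IONBF */
    if (setvbuf(fp, mode == _IONBF ? NULL : buf, mode, sizeof buf) != 0) {
        fclose(fp);
        return -1;
    }
    fputs("hello, stream\n", fp);
    fflush(fp);  /* hand any buffered bytes to the OS */
    rewind(fp);  /* switch from writing to reading */
    if (fgets(out, (int) out_len, fp) != NULL) {
        rc = 0;
    }
    fclose(fp);  /* fclose also flushes */
    return rc;
}
```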

Buffered

Buffered output streams accumulate written data in an intermediate buffer, sending it to the OS only when enough data has accumulated (or fflush() is requested).

There are several layers of buffers: C runtime library (RTL) buffers, OS buffers, and disk buffers.

The function fflush() forces a write of all buffered data for the given output or update stream via the stream’s underlying write function. The open status of the stream is unaffected.

The function fpurge() (a BSD extension, not standard C; glibc provides __fpurge() in <stdio_ext.h>) erases any input or output buffered in the given stream. For output streams this discards any unwritten output. For input streams this discards any input read from the underlying object but not yet obtained via getc(); this includes any text pushed back via ungetc().

Unbuffered

Unbuffered output has nothing to do with ensuring your data reaches the disk. fflush() works on both buffered and unbuffered streams, but it only hands the data to the OS; neither it nor unbuffered writes guarantee that the data has reached the physical disk. That requires fsync() on the underlying file descriptor.

fclose() flushes the stream before closing it.

The open() system call returns a file descriptor with no C library buffering: read() and write() on it go directly to the OS.

ASCII vs. Binary

ASCII

Terminals, keyboards, and printers deal with character data. When you want to write a number like 1234 to the screen, it must be converted to four characters {'1', '2', '3', '4'} and written. Similarly, when you read a number from the keyboard, the data must be converted from characters to integers. These conversions are done by routines such as sprintf (number to characters) and sscanf (characters to number).

Binary

Binary files require no conversion. They also generally take up less space than ASCII files. The drawback is that they cannot be directly printed on a terminal or printer.
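A sketch contrasting the two representations, with hypothetical helper names. The ASCII side converts 1234 to and from four characters; the binary side writes the raw sizeof(int) bytes to a temporary file and reads them back unchanged. Those raw bytes depend on the machine’s int size and endianness.

```c
#include <stdio.h>

/* ASCII: returns the number of characters produced (4 for 1234). */
int int_to_text(int n, char *buf, size_t len)
{
    return snprintf(buf, len, "%d", n);
}

/* ASCII back to a number: returns 1 if one integer was converted. */
int text_to_int(const char *s, int *n)
{
    return sscanf(s, "%d", n);
}

/* Binary: write n's raw bytes, report the file size, read them back. */
int binary_roundtrip(int n, int *back, long *file_size)
{
    FILE *fp = tmpfile();
    int ok;

    if (fp == NULL) {
        return 0;
    }
    ok = fwrite(&n, sizeof n, 1, fp) == 1;  /* no conversion happens */
    *file_size = ftell(fp);                 /* sizeof(int) bytes */
    rewind(fp);
    ok = ok && fread(back, sizeof *back, 1, fp) == 1;
    fclose(fp);
    return ok;
}
```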

References

Network

DNS

simple.c uses the getaddrinfo() API call to query a name.

query.c uses the domain name protocol to query a name directly, without the -lresolv library.

TIL

getaddrinfo() is a POSIX.1g extension and is not available in pure C99, so on Linux we need -D_GNU_SOURCE (or -D_POSIX_C_SOURCE=200112L) if -std=c99 is specified (see c99 does not define getaddrinfo).
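A minimal getaddrinfo() sketch, not the repository’s simple.c: the hypothetical numeric_host_family uses AI_NUMERICHOST so that only literal addresses are parsed and no DNS query is sent. It assumes the compiler’s default GNU mode, or the feature macro from the note above when building with -std=c99.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns the address family of the first result, or -1 on failure. */
int numeric_host_family(const char *host)
{
    struct addrinfo hints, *res = NULL;
    int family;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;  /* host must be a literal address */

    if (getaddrinfo(host, NULL, &hints, &res) != 0) {
        return -1;
    }
    family = res->ai_family;
    freeaddrinfo(res);                /* the result list is heap-allocated */
    return family;
}
```

Dropping AI_NUMERICHOST turns the same call into a real name lookup.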

socklen_t represents the size of an address structure; see Linus Torvalds’ remarks about socklen_t.

HTTP

References

Parallel

OpenMP

References

Pthread

References

POSIX Threads Programming

Algorithm

Hash

Algorithm References

Hash Functions

Regex

In POSIX-Extended regular expressions, all characters match themselves except for the following special characters: .[{}()\*+?|^$

WebAssembly

Run example in browser:

// directly call, shorten version
Module._sum(10, 0);
// ccall
Module.ccall('sum', 'number', ['number', 'number'], [10, 0]);

Tools

Display Dependents of Executable

OS	name	command line
MacOS	otool	otool -L <bin>
Linux	objdump	objdump -p <bin>
	ldd	ldd <bin>
Windows	dumpbin	dumpbin -dependents <bin>

Read ELF Format

readelf displays information about one or more ELF format object files.

This readelf program performs a similar function to objdump but it goes into more detail and it exists independently of the BFD library, so if there is a bug in BFD then readelf will not be affected.

On Darwin there is no readelf, but otool can do the trick for Mach-O binaries.

OS	name	command line
MacOS	otool	otool -l <bin>
Linux	readelf	readelf -a <bin>
Windows

Metainformation about Libraries

pkg-config

Display Symbol Table

On Unix-like platforms, the nm program can display the symbol table of an executable.

OS	name	command line
MacOS	nm	nm <bin>
		nm -m <bin>
Linux	nm	nm <bin>

Remove symbols

OS	name	command line
MacOS	strip	strip <bin>
Linux	strip	strip <bin>

Disassembly

OS	name	command line
MacOS	otool	otool -tV <bin>
Linux	objdump	objdump -d <bin>

Hex Dump

OS	name	command line
MacOS	hexdump	hexdump <file>
Linux	hexdump	hexdump <file>
Windows
Emacs	hexl-mode

Trace System Call

OS	name	command line
MacOS	dtruss	dtruss <bin>
Linux	strace	strace -o <out-file> -C <bin>

Kernel Trace

MacOSX: ktrace

Memory Leak Detection

`valgrind`

`sanitize`

References

Clang: AddressSanitizer

Debugger

Environment

example	command
set working directory	(lldb) platform settings -w <pwd>
	(gdb) cd <pwd>

list env vars	(lldb) `env`
	(lldb) `settings show target.env-vars`
	(gdb) `show env`

set env var	(lldb) `env XXX=zzz`
	(lldb) `settings set target.env-vars XXX=aa YYY=bb`
	(gdb) `set env XXX=zzz`

unset env var	(lldb) `settings remove target.env-vars XXX`
	(gdb) `unset env XXX`

set argv for main entry	(lldb) `r arg1 arg2 arg3`
	(lldb) `settings set target.run-args arg1 arg2`
	(gdb) `r arg1 arg2 arg3`
	(gdb) `set args arg1 arg2`
	0:000> `.kill;` `.create <target> arg1 arg2`
	0:000> `.exepath+ <path>`

Process

example	command
run process	(lldb) process launch
	(gdb) r
	0:000> g

attach process with pid	(lldb) `process attach --pid 123`
	(gdb) `attach 123`

attach process with name	(lldb) `process attach --name a.out`
	(lldb) `attach a.out`

wait for process	(lldb) `process attach --name a.out --wait-for`
	(gdb) `attach -waitfor a.out`

Image

example	command
list dependents of executable	(lldb) `image list`
	(gdb) `info sharedlibrary`
	0:000> `lm`

lookup main entry address in the executable	(lldb) `image lookup -a main -v`
	(gdb) `info symbol main`

lookup fn or symbol by regexp	(lldb) `image lookup -r -n '[fsv]printf'`

lookup type	(lldb) `image lookup -t FILE`

add module	(lldb) `image add /opt/local/lib/libgeo.dyld`
	0:000> `.reload -f -i libcffix.dll`

unload module	0:000> `.reload -u libcffix.dll`

Breakpoint

example	command
list breakpoint	(lldb) `b`
	(lldb) `breakpoint list`
	(gdb) `info break`
	0:000> `bl`

breakpoint at fn	(lldb) `b main`
	(lldb) `b -nmain`
	(gdb) `b main`
	0:000> `bu <module>!main`

breakpoint at line	(lldb) `b -ftest.c -l32`
	(gdb) `b test.c:32`

breakpoint at fn by regexp	(lldb) `b -rm[a-z]in`

breakpoint at source by regexp	(lldb) `b -p'm[a-z]in' -ftest.c`

conditional breakpoint	(lldb) `breakpoint set -fvar.c -l23 -c'2 == argc'`

delete breakpoint	(lldb) `breakpoint delete 1.1`
	(lldb) `breakpoint delete 2`
	0:000> `bc 1 2`

Memory

example	command
print argv in main entry	(lldb) p -Z`argc` -- argv
	(gdb) `p -- argv[0]@argc`

examine argv in main entry	(lldb) x -t'char*' -c`argc` argv
	0:000> `dp @@(argv)`

examine array of char* of argv	(lldb) x -s`sizeof(char*)` -c`argc` -fx argv

examine &argc in main entry	(lldb) x -s`sizeof(int)` -fx -c1 &argc
	(gdb) `x/1xw &argc`

memory read	(lldb) memory read -o/tmp/x.out -s1 -fu -c10 -- &argv[0]

Frame

example	command
check stack frame	(lldb) `frame info`
	0:000> `k`
list frame variable	(lldb) `frame variable`
	0:000> `dv`

Evaluate

example	command
evaluate argc in main entry	(lldb) `e -- argc`
	(lldb) `e -fx -- argc`
	0:000> `?? argc`
	0:000> `.formats poi(argc)`

Disassemble

example	command
disassemble	0:000> `u`
disassemble function	0:000> `uf main`

disassemble	(lldb) `d`
disassemble function	(lldb) `d -nmain`
disassemble flavor	(lldb) `d -Fatt`

disassemble	(gdb) `disassemble`

Step

example	command
quit	(lldb) `q`
	(gdb) `q`
	0:000> `q`
continue	(lldb) `c`
	0:000> `g`
step over	(lldb) `n`
	0:000> `p`
step into	(lldb) `s`
	(gdb) `s`
	0:000> `t`

Thread

example	command
list threads	0:000> `~`

Tools References

CPU Features

Linux:

lscpu

Darwin:

sysctl -a | grep machdep.cpu.features

junjiemars/c
