void *s and function pointers.

char - think “one byte”. Same as signed char.
short int - think “two bytes”. Same as short.
int - think “four bytes”.
long int - think “potentially larger int”. Most commonly known simply as long.
float, double - floating point values (not used in the class).
void - the lack of a type. Variables cannot have this type.

/* `void` is fine to denote "no variables/return values", ... */
int
main(void)
{
/* ...but not fine on variables */
void a;
return 0;
}Program output:
inline_exec_tmp.c: In function main:
inline_exec_tmp.c:6:14: error: variable or field a declared void
6 | void a;
| ^
make[1]: *** [Makefile:33: inline_exec_tmp] Error 1
enum - an int or short int with some “named” values.

#include <stdio.h>
enum weekdays {
MON, TUES, WED, THURS, FRI
};
int
main(void)
{
printf("MON = %d, TUES = %d, ..., FRI = %d\n", MON, TUES, FRI);
return 0;
}Program output:
MON = 0, TUES = 1, ..., FRI = 4
You can choose the values explicitly
(e.g. enum grades { MON = 1, TUES = 2, WED = 3, THURS = 4, FRI = 5};).
Modifiers

unsigned - variables that cannot be negative. Given that variables have a fixed bit-width, they can use the extra bit (“negative” no longer needs to be tracked) to instead represent numbers twice the size of signed variants.
signed - signed variables. You don’t see this modifier as much because char, int, and long all default to signed.
long - used to modify another type to make it larger in some cases. long int can represent larger numbers and is synonymous with long. long long int (or long long) is an even larger value!
static - this variable should not be accessible outside of the .c file in which it is defined.
const - an immutable value. We won’t focus much on this modifier.
volatile - this variable should be “read from memory” every time it is accessed. Confusing now, relevant later, but not a focus.

int
main(void)
{
char a;
signed char a_signed; /* same type as `a` */
char b; /* values between [-128, 127] */
unsigned char b_unsigned; /* values between [0, 255] */
int c;
short int c_shortint;
short c_short; /* same as `c_shortint` */
long int c_longint;
long c_long; /* same type as `c_longint` */
return 0;
}Program output:
You might see all of these, but here are the common primitives and their sizes:
#include <stdio.h>
/* Commonly used types that you practically see in a lot of C */
int
main(void)
{
char c;
unsigned char uc;
short s;
unsigned short us;
int i;
unsigned int ui;
long l;
unsigned long ul;
printf("char:\t%ld\nshort:\t%ld\nint:\t%ld\nlong:\t%ld\n",
sizeof(c), sizeof(s), sizeof(i), sizeof(l));
return 0;
}Program output:
char: 1
short: 2
int: 4
long: 8
stddef.h1:

size_t, ssize_t - types for variables that correspond to sizes. These include the size of the memory request to malloc, the return value from sizeof, and the arguments and return values from read/write/… ssize_t is signed (allows negative values), while size_t is unsigned.
NULL - is just #define NULL ((void *)0)

limits.h2:

INT_MAX, INT_MIN, UINT_MAX - maximum and minimum values for a signed integer, and the maximum value for an unsigned integer.
LONG_MAX, LONG_MIN, ULONG_MAX - maximum and minimum numerical values for longs and unsigned longs.
Similar constants exist for short ints (SHRT_MAX, etc…) and chars (CHAR_MAX, etc…).

Many standard library calls take “format strings”. You’ve
seen these in printf. The following format
specifiers should be used:
%d - int
%ld - long int
%u - unsigned int
%c - char
%x - unsigned int printed as hexadecimal
%lx - long unsigned int printed as hexadecimal
%p - prints out any pointer value, void *
%s - prints out a string, char *

Format strings are also used in scanf functions to read and parse input.
You can control the spacing of the printouts using
%NX where N is the number of
characters you want printed out (as spaces), and
X is the format specifier above. For example,
"%10ld" would print a long integer in a 10
character slot. Adding \n and \t
add in the newlines and the tabs. If you need to print out a
“\”, use \\.
#include <stdio.h>
#include <limits.h>
int
main(void)
{
printf("Integers: %d, %ld, %u, %c\n"
"Hex and pointers: %lx, %p\n"
"Strings: %s\n",
INT_MAX, LONG_MAX, UINT_MAX, '*',
LONG_MAX, &main,
"hello world");
return 0;
}Program output:
Integers: 2147483647, 9223372036854775807, 4294967295, *
Hex and pointers: 7fffffffffffffff, 0x650838c45149
Strings: hello world
struct - A collection of different values. Example: objects with many different fields.
union - One of a set of values. Example: what if you want to represent data for one food item among others.

Unions are not very common, but are sometimes useful. The size of a union is the maximum size of its fields. In contrast, the size of a struct is the sum of the sizes of each of its fields.
#include <stdio.h>
struct hamburger {
int num_burgers;
int cheese;
int num_patties;
};
union food {
int num_eggs;
struct hamburger burger;
};
/* Same contents as the union. */
struct all_food {
int num_eggs;
struct hamburger burger;
};
int
main(void)
{
union food f_eggs, f_burger;
/* now I shouldn't access `.burger` in `f_eggs` */
f_eggs.num_eggs = 10;
/* This is just syntax for structure initialization. */
f_burger.burger = (struct hamburger) {
.num_burgers = 5,
.cheese = 1,
.num_patties = 1
};
/* now shouldn't access `.num_eggs` in `f_burger` */
printf("Size of union: %ld\nSize of struct: %ld\n",
sizeof(union food), sizeof(struct all_food));
return 0;
}Program output:
Size of union: 12
Size of struct: 16
We can see the effect of the union: The size
is max(fields) rather than
sum(fields). What other examples can you think
of where you might want unions?
An aside on syntax: The structure initialization syntax in this example is simply a shorthand. The struct hamburger initialization above is equivalent to:

f_burger.burger.num_burgers = 5;
f_burger.burger.cheese = 1;
f_burger.burger.num_patties = 1;

Though since there are so many .s, this is a little confusing. We’d typically want to simplify as:

struct hamburger *h = &f_burger.burger;
h->num_burgers = 5;
h->cheese = 1;
h->num_patties = 1;

More on -> in the next section.
Variables can have pointer types
(e.g. int *a). Variables with pointer types are
pointers and hold the address of a variable of
the type to which they point, or NULL.
NULL denotes that the pointer is not valid. You
want to imagine that variables hold data, for example
int a = 6:
a---+
| 6 |
+---+
A pointer, int *b = &a should be
imagined as a pointer:
b ---> a---+
| 6 |
+---+
Note that the “address of” operator, &,
takes a variable and returns its address. If you print out
the pointer, printf("%p", b), you’ll get the
address (i.e. the arrow). To follow the arrow, you
must dereference the pointer:
*b == 6.
#include <stdio.h>
static int value;
/* here, the `*` is part of the type "int *", i.e. a pointer to an `int` */
void
foo(int *ptr)
{
/*
* Here, the `*` is the dereference operation, and
* gives us the value that is pointed to.
*/
*ptr = 1;
}
int
main(void)
{
printf("value is initialized to %d\n", value);
foo(&value); /* `&` gives us the address of the variable `value` */
printf("value updated to %d\n", value);
return value - 1;
}Program output:
value is initialized to 0
value updated to 1
Pointers are necessary as they enable us to build linked data-structures (linked-lists, binary trees, etc…). Languages such as Java assume that every single object variable is a pointer, and since all object variables are pointers, they don’t need special syntax for them.
Arrays are simple contiguous data items, all of the same
type. int a[4] = {6, 7, 8, 9} should be
imagined as:
a ---> +---+---+---+---+
| 6 | 7 | 8 | 9 |
+---+---+---+---+
When you access an array item, a[2] == 8, C
is really treating a as a pointer, doing
pointer arithmetic, and dereferences to find offset
2.
#include <stdio.h>
int main(void) {
int a[] = {6, 7, 8, 9};
int n = 1;
printf("0th index: %p == %p; %d == %d\n", a, &a[0], *a, a[0]);
printf("nth index: %p == %p; %d == %d\n", a + n, &a[n], *(a + n), a[n]);
return 0;
}Program output:
0th index: 0x7ffeb5d86320 == 0x7ffeb5d86320; 6 == 6
nth index: 0x7ffeb5d86324 == 0x7ffeb5d86324; 7 == 7
Making this a little more clear, let’s understand how C
accesses the nth item. Let’s make a pointer
int *p = a + 1 (we’ll simplify and assume
that n == 1 here), so we should have this:
p ---------+
|
V
a ---> +---+---+---+---+
| 6 | 7 | 8 | 9 |
+---+---+---+---+
Thus if we dereference p, we access the
1st index, and access the value
7.
#include <stdio.h>
int
main(void)
{
int a[] = {6, 7, 8, 9};
/* same thing as the previous example, just making the pointer explicit */
int *p = a + 1;
printf("nth index: %p == %p; %d == %d\n", p, &a[1], *p, a[1]);
return 0;
}Program output:
nth index: 0x7ffe91c27844 == 0x7ffe91c27844; 7 == 7
We can see that pointer arithmetic (i.e. doing
addition/subtraction on pointers) does the same thing as
array indexing plus a dereference. That is,
*(a + 1) == a[1]. For the most part, arrays and
pointers can be viewed as very similar, with only a few
exceptions3.
Pointer arithmetic should generally be avoided in favor of using the array syntax. One complication for pointer arithmetic is that it does not fit our intuition for addition:
#include <stdio.h>
int
main(void)
{
int a[] = {6, 7, 8, 9};
char b[] = {'a', 'b', 'c', 'd'};
/*
* Calculation: How big is the array?
* How big is each item? The division is the number of items.
*/
int num_items = sizeof(a) / sizeof(a[0]);
int i;
for (i = 0; i < num_items; i++) {
printf("idx %d @ %p & %p\n", i, a + i, b + i);
}
return 0;
}Program output:
idx 0 @ 0x7fff5c67dda0 & 0x7fff5c67ddb4
idx 1 @ 0x7fff5c67dda4 & 0x7fff5c67ddb5
idx 2 @ 0x7fff5c67dda8 & 0x7fff5c67ddb6
idx 3 @ 0x7fff5c67ddac & 0x7fff5c67ddb7
Note that the pointer for the integer array
(a) is being incremented by 4, while the
character array (b) by 1. Focusing on the key
part:
idx 0 @ ...0 & ...4
idx 1 @ ...4 & ...5
idx 2 @ ...8 & ...6
idx 3 @ ...c & ...7
^ ^
| |
Adds 4 ----+ |
Adds 1 -----------+
Thus, pointer arithmetic depends on the size of the
types within the array. There are times when one wants
to iterate through each byte of an array, even if the array
contains larger values such as integers. For example, the
memset and memcmp functions set
each byte in a range of memory, and byte-wise compare two
ranges of memory. In such cases, casts can be used
to manipulate the pointer type (e.g. (char *)a
enables a to be referenced with pointer
arithmetic that iterates through bytes).
#include <stdio.h>
/* a simple linked list */
struct student {
char *name;
struct student *next;
};
struct student students[] = {
{.name = "Penny", .next = &students[1]}, /* or `students + 1` */
{.name = "Gabe", .next = NULL}
};
struct student *head = students;
/*
* head --> students+------+
* | Penny | Gabe |
* | next | next |
* +-|-----+---|--+
* | ^ +----->NULL
* | |
* +-----+
*/
int
main(void)
{
struct student *i;
for (i = head; i != NULL; i = i->next) {
printf("%s\n", i->name);
}
return 0;
}Program output:
Penny
Gabe
Generally, if you want to treat a pointer type as
another, you need to use a cast. You rarely want to do this
(see the memset example below to see an example
where you might want it). However, there is a need in C to
have a “generic pointer type” that can be implicitly cast
into any other pointer type. To get a sense of why this is,
two simple examples:
- What should the type of NULL be? NULL is used as a valid value for any pointer (of any type), thus NULL must have a generic type that can be used in the code as a value for any pointer. Thus, the type of NULL must be void *.
- malloc returns a pointer to newly-allocated memory. What should the type of the return value be?
C solves this with the void * pointer type.
Recall that void is not a valid type for a
variable, but a void * is different. It is a
“generic pointer that cannot be dereferenced”. Note that
dereferencing a void * pointer shouldn’t work
as void is not a valid variable type
(e.g. void *a; *a = 10; doesn’t make much sense
because *a is type void).
#include <malloc.h>
int
main(void)
{
int *intptr = malloc(sizeof(int)); /* malloc returns `void *`! */
*intptr = 0;
return *intptr;
}Program output:
Data-structures often aim to store data of any type
(think: a linked list of anything). Thus, in C, you often
see void *s to reference the data they
store.
Indexing into arrays (a[b]) and arrows
(a->b) are redundant syntactic features, but
they are very convenient.

&a[b] is equivalent to a + b where a is a pointer and b is an index.
a[b] is equivalent to *(a + b) where a is a pointer and b is an index.
a->b is equivalent to (*a).b where a is a pointer to a variable with a structure type that has b as a field.

Generally, you should always try and stick to the array and arrow syntax where possible, as it makes your intention much more clear when coding than pointer arithmetic and dereferences.
Dynamic memory allocations

void *malloc(size_t size) - allocate size memory, or return NULL.
void *calloc(size_t nmemb, size_t size) - allocate nmemb * size bytes, and initialize them to 0.
void *realloc(void *ptr, size_t size) - pass in a previous allocation, and either grow/shrink it to size, or allocate a new chunk of memory and copy the data in ptr into it. Return the memory of size size, or NULL on error.
void free(void *ptr) - deallocate previously allocated memory (returned from any of the above).

A few things to keep in mind:

Use int *arr = malloc(sizeof(int) * n); to allocate an array of ints with a length of n.
malloc is not guaranteed to initialize its memory to 0. You must make sure that your array gets initialized. It is not uncommon to do a memset(arr, 0, sizeof(int) * n); to set the memory to 0. calloc is guaranteed to initialize all its allocated memory to 0.
These functions return NULL on error; you must check for NULL, and handle the error appropriately.

#include <malloc.h>
int
main(void)
{
int *a = malloc(sizeof(int));
/* Error: did not check return value! */
*a = 1;
free(a);
return 0;
}Program output:
If you maintain a reference to memory passed to
free, and then
dereference that pointer, bad things can happen. The memory
might have already been re-allocated due to another call to
malloc, and is used for something completely
different in your program. It is up to you as a programmer
to avoid freeing memory until all references to
it are dropped.

#include <malloc.h>
int
main(void)
{
int *a = malloc(sizeof(int));
if (a == NULL) return -1;
free(a);
/* Error: accessing what `a` points to after `free`! */
return *a;
}Program output:
inline_exec_tmp.c: In function main:
inline_exec_tmp.c:11:16: warning: pointer a used after free [-Wuse-after-free]
11 | return *a;
| ^~
inline_exec_tmp.c:8:9: note: call to free here
8 | free(a);
| ^~~~~~~
make[1]: *** [Makefile:30: inline_exec] Error 86
If you lose all references to a memory allocation without calling
free, then the
memory will never get freed. This is a memory
leak.

#include <malloc.h>
int
main(void)
{
int *a = malloc(sizeof(int));
if (!a) return -1;
a = NULL;
/* Error: never `free`d `a` and no references to it remain! */
return 0;
}Program output:
Each allocation should be passed to
free exactly once. If you
free the memory twice, bad things can happen.
You could confuse the memory allocation logic, or you could
accidentally free an allocation made after the
first free was called.

#include <malloc.h>
int
main(void)
{
int *a = malloc(sizeof(int));
if (!a) return -1;
free(a);
free(a);
/* Error: yeah, don't do that! */
return 0;
}Program output:
inline_exec_tmp.c: In function main:
inline_exec_tmp.c:9:9: warning: pointer a used after free [-Wuse-after-free]
9 | free(a);
| ^~~~~~~
inline_exec_tmp.c:8:9: note: call to free here
8 | free(a);
| ^~~~~~~
free(): double free detected in tcache 2
make[1]: *** [Makefile:30: inline_exec] Aborted (core dumped)
valgrind will help you debug the last three
of these issues, and later in the class, we’ll develop a
library to help debug the first.
It is not uncommon for programs to require some
dynamic behavior. A data structure might need to do
different types of comparisons between its data (depending
on what data is being stored). This means that a generic
data-structure (think: a hash-table) needs to somehow
compare equality of the data objects it stores without
knowing what they are! In Java, think about the
equals method. Each object can define its own
equals method to compare between objects of the
same type.
In C, this is implemented using function
pointers. These are pointers to functions of a
particular type. They can be passed as arguments to
functions, and can be dereferenced to call the function. Let
look at an example from man 3 lsearch:
void *lfind(const void *key, const void *base, size_t *nmemb, size_t size,
int(*compar)(const void *, const void *));
This is a function to find a value in an array.
The implementation needs to understand how to compare a
key with each of the items in the array. For
this purpose, the compar function pointer is
the last argument.
Function pointer syntax is not natural in C. Thus, it is
common to use a typedef to create the type name
for a function pointer for a given API. The syntax for this
is perhaps even less natural, but only needs to be
done once.
typedef int (*compare_fn_t)(const void *, const void *);
void *lfind(const void *key, const void *base, size_t *nmemb, size_t size, compare_fn_t compar);
Now the function declaration is more clear, and the
_fn_t postfix, by convention, denotes a
function pointer.
We’re going to look at a set of variables as memory. When variables are created globally, they are simply allocated into subsequent addresses.
#include <stdio.h>
#include <string.h>
void print_values(void);
unsigned char a = 1;
int b = 2;
struct foo {
long c_a, c_b;
int *c_c;
};
struct foo c = (struct foo) { .c_a = 3, .c_c = &b };
unsigned char end;
int
main(void)
{
size_t vars_size;
unsigned int i;
unsigned char *mem;
/* Q1: What would you predict the output of &end - &a is? */
printf("Addresses:\na @ %p\nb @ %p\nc @ %p\nend @ %p\n"
"&end - &a = %ld\n", /* Note: you can split strings! */
&a, &b, &c, &end, &end - &a);
printf("\nInitial values:\n");
print_values();
/* Q2: Describe what these next two lines are doing. */
vars_size = &end - &a;
mem = &a;
/* Q3: What would you expect in the following printout (with the print uncommented)? */
printf("\nPrint out the variables as raw memory\n");
for (i = 0; i < vars_size; i++) {
unsigned char c = mem[i];
// printf("%x ", c);
}
/* Q4: What would you expect in the following printout (with the print uncommented)? */
memset(mem, 0, vars_size);
/* memset(a, b, c): set the memory starting at `a` of size `c` equal `b` */
printf("\n\nPost-`memset` values:\n");
// print_values();
return 0;
}
void
print_values(void)
{
printf("a = %d\nb = %d\nc.c_a = %ld\nc.c_b = %ld\nc.c_c = %p\n",
a, b, c.c_a, c.c_b, c.c_c);
}Program output:
Addresses:
a @ 0x6543733b8010
b @ 0x6543733b8014
c @ 0x6543733b8020
end @ 0x6543733b8039
&end - &a = 41
Initial values:
a = 1
b = 2
c.c_a = 3
c.c_b = 0
c.c_c = 0x6543733b8014
Print out the variables as raw memory
Post-`memset` values:
Question: Answer Q1-4 in the code, uncommenting and modifying where appropriate.

A few observations:

A variable (e.g. b) with an integer type (sizeof(int) == 4) should always have an address that is a multiple of its size (&b % sizeof(b) == 0, so an int’s address is always divisible by 4, a long’s by 8).
The subtraction of two pointers, &end - &a, is crazy. We’re used to performing math operations on values of the same type, but not on pointers. This is only possible because C sees the variables as chunks of memory that happen to be laid out in memory, one after the other.
Note mem = &a, and our iteration through mem[i]. We’re able to completely ignore the types in C, and access memory directly!

Question: What would break if we changed char a; into int a;? C doesn’t let us do math on variables of any type. If you fixed the compilation problems, would you still get the same output?
Please read the man pages for
lsearch and lfind.
man pages can be pretty cryptic, and you are
aiming to get some idea of where to start with an
implementation. A simplistic and incomplete initial
implementation:
#include <stdio.h>
#include <assert.h>
#include <search.h>
#define NUM_ENTRIES 8
struct kv_entry {
int key; /* only support keys for now... */
};
/* global values are initialized to `0` */
struct kv_entry entries[NUM_ENTRIES];
size_t num_items = 0;
/**
* Insert into the key-value store the `key` and `value`.
* Return `0` on successful insertion of the value, or `-1`
* if the value couldn't be inserted.
*/
int
put(int key, int value)
{
return 0;
}
/**
* Attempt to get a value associated with a `key`.
* Return the value, or `0` if the `key` isn't in the store.
*/
int
get(int key)
{
return 0;
}
int
compare(const void *a, const void *b)
{
/* We know these are `struct kv_entry`s, so treat them as such! */
const struct kv_entry *a_ent = a, *b_ent = b;
if (a_ent->key == b_ent->key) return 0;
return -1;
}
int
main(void)
{
struct kv_entry keys[] = {
{.key = 1},
{.key = 2},
{.key = 4},
{.key = 3}
};
int num_kv = sizeof(keys) / sizeof(keys[0]);
int queries[] = {4, 2, 5};
int num_queries = sizeof(queries) / sizeof(queries[0]);
int i;
/* Insert the keys. */
for (i = 0; i < num_kv; i++) {
lsearch(&keys[i], entries, &num_items, sizeof(entries) / sizeof(entries[0]), compare);
}
/* Now let's look up the keys. */
for (i = 0; i < num_queries; i++) {
struct kv_entry *ent;
int val = 0;
ent = lfind(&queries[i], entries, &num_items, sizeof(entries[0]), compare);
if (ent != NULL) {
val = ent->key;
}
printf("%d: %d @ %p\n", i, val, ent);
}
return 0;
}Program output:
0: 4 @ 0x5da33433804c
1: 2 @ 0x5da334338044
2: 0 @ (nil)
You want to implement a simple “key-value” store that is very similar in API to a hash-table (many key-value stores are implemented using hash-tables!).
Questions/Tasks:

What is the difference between lsearch and lfind? The manpages should help here (man 3 lsearch) (you can exit from a man page using ‘q’). What operations would you want to perform with each (and why)?
Implement the put and get functions.
Add checks in main for the relevant conditions and failures in get and put.

C presents some unique opportunities for how to structure your code and data-structures, but also requires that you’re careful about how data is passed around a program.
In C, when creating a variable, you have the option of allocating it in one of multiple different types of memory: in global memory, on the stack, or on the heap using malloc or its sibling functions.

It might be surprising, but it is quite uncommon in programming languages to have this flexibility.
What does it look like to do internal allocation of one struct or array inside of another? See the following as an example.
#include <stdlib.h>
struct bar {
/*
* `arr` is allocated internally to `bar`, whereas `bytes` is a pointer to
* a separate allocation. `arr` must have a *fixed size*, and `bytes` does not
* because its allocation can be as large as you want!
*/
int arr[64];
unsigned char *bytes;
};
struct foo {
/* The `bar` structure is allocated as *part of* the `struct foo` in `internal` */
struct bar internal;
/* But we can *also* allocate another `bar` separately if we'd prefer */
struct bar *external;
};
int
main(void)
{
/*
* We didn't have to separately allocate the `struct bar internal` as
* it was allocated with the enclosing `a` allocation. However, we do have to
* allocate the `external` allocation. In both cases, we have access to a
* `struct bar` from within `struct foo`.
*/
struct foo *a = malloc(sizeof(struct foo)); /* should check return value */
a->external = malloc(sizeof(struct bar)); /* and this one ;-( */
/*
* Note the difference in needing to use `->` when `external` is a pointer
* versus simply using `.` with `internal`.
*/
a->internal.arr[0] = a->external->arr[0];
return 0;
}Program output:
One of the more interesting uses of internal allocation is in linked data-structures (e.g. linked lists). It is common in languages like Java to implement linked data-structures with “nodes” that reference the data being tracked.
// This could be made generic across the data types it could store by instead
// using `class LinkedList<T>`, but I'm keeping it simple here.
class LinkedListOfStudents {
Node head;
class Node {
Student s;
Node next;
Node(Student s, Node next) {
this.s = s;
this.next = next;
}
}
void add(Student s) {
// The program has *previously* allocated `data`.
// Now we have to additionally allocate the `node` separate from the data!
Node n = new(s, this.head);
this.head = n;
}
// ...
}
// This looks like the following. Note separate allocations for Node and Student
//
// head --> Node------+ ,-->Node-----+
// | s next |---' | s next |---...
// +-|-------+ +-|------+
// v V
// Student-+ Student-+
// | ... | | ... |
// +-------+ +-------+

Let’s see the same in C.
#include <stdlib.h>
#include <stdio.h>
struct linked_list_node {
void *data;
struct linked_list_node *next;
};
struct linked_list {
struct linked_list_node *head;
};
struct student {
// student data here...
struct linked_list_node list;
};
void
add(struct linked_list *ll, struct linked_list_node *n, void *data)
{
n->next = ll->head;
n->data = data;
ll->head = n;
}
struct linked_list l;
int
main(void)
{
struct student *s = malloc(sizeof(struct student));
/* Should check that `s != NULL`... */
add(&l, &s->list, s); /* note that `&s->list` is the same as `&(s->list)` */
printf("student added to list!\n");
return 0;
}
/*
* This looks something like the following. Linked list is *internal* to student.
*
* head ----> student-+ ,-> student-+
* | ... | , | ... |
* | next |---' | next |--> NULL
* +-------+ +-------+
*/Program output:
student added to list!
A few interesting things happened here:

We use void * pointers within the linked_list_node so that the linked list can hold data of any pointer type. You can see that in our list implementation, there is nothing that is student-specific.
The list is inline-allocated inside the student structure, which completely avoids the separate node allocation.

Most serious data-structure implementations enable this
inlining of data-structure nodes even without requiring the
data pointer in the node. We won’t cover that,
but you can see
a sample implementation.
We’ve already seen malloc,
calloc, realloc, and
free which are our interface to the heap. These
provide the most flexibility, but require that you track the
memory and free it appropriately4.
We have already seen global allocations frequently in examples so far.
#include <stdio.h>
/* A globally allocated array! No `malloc` needed! */
int arr[64];
struct foo {
int a, b;
struct foo *next;
};
/* A globally allocated structure! */
struct foo s;
/* Globally allocated *and* initialized integer... */
int c = 12;
/* ...and array. */
long d[] = {1, 2, 3, 4};
int
main(void)
{
printf("What is uninitialized global memory set to?\n"
"Integer: %d\nPointer: %p (as hex: %lx)\n",
arr[0], s.next, (unsigned long)s.next);
/* Note that we use `.` to access the fields because `s` is not a pointer! */
return 0;
}Program output:
What is uninitialized global memory set to?
Integer: 0
Pointer: (nil) (as hex: 0)
Global variables are either initialized where they are
defined, or are initialized to 0. Note that
they are initialized to all 0s regardless of their
type. This makes more sense than it sounds because pointers
set to 0 are NULL (because,
recall, NULL is just (void *)0 –
see the “hex” output above), and because strings (see later)
are terminated by \0 which is also
0! Thus, this policy initializes all numerical
data to 0, all pointers to NULL,
and all strings to "".
Variables can also be allocated on the stack. This effectively means that as you’re executing in a function, variables can be allocated within the context/memory of that function. An example that allocates a structure and an array on the stack:
#include <stdio.h>
#include <assert.h>
#define ARR_SZ 12
/*
* Find an integer in the array, reset it to `0`, and return its offset.
* Nothing very interesting here.
*/
int
find_and_reset(int *arr, int val)
{
int i;
/* find the value */
for (i = 0; i < ARR_SZ; i++) {
if (arr[i] == val) {
arr[i] = 0;
return i;
}
}
/* Couldn't find it! */
return -1;
}
int
fib(int v)
{
/* Allocate an array onto the stack */
int fibs[ARR_SZ] = {0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89}; /* looks like a suspicious sequence... */
int ret;
ret = find_and_reset(fibs, v);
/* should have been set to `0`, so this should return `-1` */
assert(find_and_reset(fibs, v) == -1);
return ret;
}
int
main(void)
{
printf("Should find 8 @ 6: %d\n", fib(8));
/* if the array was the same array for these two calls, it would be found at offset -1 (error) */
printf("Should find 8 @ 6: %d (-1 is wrong here)\n", fib(8));
return 0;
}Program output:
Should find 8 @ 6: 6
Should find 8 @ 6: 6 (-1 is wrong here)
Key points to realize:

fibs in fib is allocated on the stack and is initialized in fib!
find_and_reset modifies the array directly, as it is passed as a pointer.
When fib is called, it creates the array. After we return from fib, fibs goes away (strictly: its memory is reclaimed as part of returning from fib).
Each time we call fib, it effectively is a new execution of fib, thus a new allocation of fibs that is now initialized a second time.

How do we think about this? The stack starts out in main:
stack |
| |
+-main---------+
| | <--- main's local data (note: no local variables)
+--------------+
What we’re drawing here is a “stack frame” for
the main function. This includes all of the
local data allocated within the function, and (at least
conceptually) also includes the arguments passed into the
function (main has none). When it calls
fib, a stack frame is allocated for
fib, the argument is passed in, and
fibs variables are allocated:
stack |
| |
+-main---------+
| |
+-fib----------+
| arg: v | <--- argument to fib
| fibs[ARR_SZ] | <-+- local variables, allocated here!
| ret | <-'
+--------------+
fib calls find_and_reset:
stack |
| |
+-main---------+
| |
+-fib----------+
| v |
| fibs[ARR_SZ] |<--+
| ret | | pointer to fibs passed as argument, and used to update arr[v]
+find_and_reset+ |
| arg: arr |---+
| arg: val |
| i |
+--------------+
Since find_and_reset updates the array in
fib’s stack frame, when it returns, the array
is still properly updated. Importantly, once we return to
main, we have deallocated all of the variables
from the previous call to fib…
stack |
| |
+-main---------+
| | <--- returning to main, deallocates the fibs array in fib
+--------------+
…thus the next call to fib allocates a
new set of local variables including
fibs.
stack |
| |
+-main---------+
| |
+-fib----------+
| arg: v |
| fibs[ARR_SZ] | <--- new version of fibs, re-initialized when fib is called
| ret |
+--------------+
A few common errors with stack allocation include:

Using uninitialized stack variables - stack-allocated variables are not guaranteed to start with any particular value.
Maintaining references to stack variables after the function returns (after return). Any pointers that remain to any of those variables are no longer pointing to valid memory!

We discuss these next.
Initializing stack-allocated variables.
For global memory, we saw that variables, if not
intentionally initialized, would be set to 0.
This is not the case with stack allocated
variables. In fact, the answer is “it depends”. If you
compile your code without optimizations, stack allocated
variables will be initialized to 0; if you
compile with optimizations, they are not
initialized. Yikes. Thus, we must assume that they are not
automatically initialized (similar to the memory returned
from malloc). An example:
#include <stdio.h>
void
foo(void)
{
/* don't manually initialize anything here */
int arr[12];
int i;
for (i = 0; i < 12; i++) {
printf("%d: %d\n", i, arr[i]);
}
}
int
main(void)
{
foo();
return 0;
}Program output:
inline_exec_tmp.c: In function foo:
inline_exec_tmp.c:11:17: warning: arr may be used uninitialized [-Wmaybe-uninitialized]
11 | printf("%d: %d\n", i, arr[i]);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
inline_exec_tmp.c:7:13: note: arr declared here
7 | int arr[12];
| ^~~
0: 0
1: 0
2: 0
3: 0
4: 0
5: 0
6: 0
7: 0
8: 0
9: 0
10: 0
11: 0
Yikes. We're getting whatever values happened to be in that memory – here they are all 0, but as the compiler warning suggests, you cannot rely on that. Where do you think these values came from?
References to stack variables after function return.
#include <stdio.h>
unsigned long *ptr;
unsigned long *
bar(void)
{
unsigned long a = 42;
return &a; /* Return the address of a local variable. */
}
void
foo(void)
{
unsigned long a = 42; /* Allocate `a` on the stack here... */
ptr = &a; /* ...and set the global pointer to point to it... */
return; /* ...but then we deallocate `a` when we return. */
}
int
main(void)
{
unsigned long val;
foo();
printf("Save address of local variable, and dereference it: %lu\n", *ptr);
fflush(stdout); /* ignore this ;-) magic sprinkles here */
ptr = bar();
printf("Return address of local variable, and dereference it: %lu\n", *ptr);
return 0;
}Program output:
inline_exec_tmp.c: In function bar:
inline_exec_tmp.c:10:16: warning: function returns address of local variable [-Wreturn-local-addr]
10 | return &a; /* Return the address of a local variable. */
| ^~
inline_exec_tmp.c: In function foo:
inline_exec_tmp.c:18:13: warning: storing the address of local variable a in ptr [-Wdangling-pointer=]
18 | ptr = &a; /* ...and set the global pointer to point to it... */
| ~~~~^~~~
inline_exec_tmp.c:16:23: note: a declared here
16 | unsigned long a = 42; /* Allocate `a` on the stack here... */
| ^
inline_exec_tmp.c:3:16: note: ptr declared here
3 | unsigned long *ptr;
| ^~~
Save address of local variable, and dereference it: 42
make[1]: *** [Makefile:30: inline_exec] Segmentation fault (core dumped)
You can see a few interesting facts from the output. Dereferencing *ptr after foo returns is undefined behavior – here it happens to still print 42, but the value is stale and cannot be relied upon – and dereferencing the return value from bar causes a segmentation fault. foo and bar contain logic that feels pretty identical. In either case, they are taking a local variable, and passing its address to main where it is used. foo passes it through a global variable, and bar simply returns the address. Despite this, one of them causes a segmentation fault, and the other appears to work only by accident! When you try and use stack-allocated variables after they have been freed (by their function returning), you get unpredictable results. Stack allocation is powerful and can be quite useful. However, you have to always be careful that stack-allocated variables are never added into global data-structures, and are never returned.
Let's look at an example that uses inlined, global, and stack allocation. We'll avoid heap-based allocation to demonstrate the less commonly used memory allocation options in C.
#include <stdio.h>
#include <search.h>
#include <assert.h>
struct kv_entry {
unsigned long key;
char *value;
};
struct kv_store {
/* internal allocation of the key-value store's entries */
struct kv_entry entries[16];
size_t num_entries;
};
int
kv_comp(const void *a, const void *b)
{
const struct kv_entry *a_ent = a;
const struct kv_entry *b_ent = b;
/* Compare keys, and return `0` if they are the same */
return !(a_ent->key == b_ent->key);
}
int
put(struct kv_store *store, unsigned long key, char *value)
{
/* Allocate a structure on the stack! */
struct kv_entry ent = {
.key = key,
.value = value
};
struct kv_entry *new_ent;
/* Should check if the kv_store has open entries. */
/* Notice we have to pass the `&ent` as a pointer is expected */
new_ent = lsearch(&ent, store->entries, &store->num_entries, sizeof(struct kv_entry), kv_comp);
/* Should check if we found an old entry, and we need to update its value. */
if (new_ent == NULL) return -1;
return 0;
}
int
main(void)
{
/* Allocate the data-store on the stack, including the array! */
struct kv_store store = { .num_entries = 0 };
unsigned long key = 42;
char *value = "secret to everything";
put(&store, key, value);
/* Validate that the store got modified in the appropriate way */
assert(store.num_entries == 1);
assert(store.entries[0].key == key);
printf("%ld is the %s\n", store.entries[0].key, store.entries[0].value);
return 0;
}Program output:
42 is the secret to everything
In this program you might notice that we’ve used no
dynamic memory allocation at all! The cost of this is
that we had to create a key-value store of only a fixed size
(16 items, here).
C enables a high degree of control over which memory different variables are placed in. Java traditionally has simple rules for which memory is used for variables:

- Objects are allocated with new in the heap, and references to objects are always pointers.
- The heap is managed by the garbage collector, so you never call free.

Many aspects of this are absolutely fantastic:

- You uniformly use . to access a field (e.g. obj.a) – all object references are pointers, so instead of having -> everywhere, just replace it with the uniform ..

However, there are significant downsides when the goal is to write systems code:
Strings don't have a dedicated type in C – there is no String type. Strings are arrays of chars, with a null (\0) character in the last array position to denote the termination of the string. In memory, a simple string has this representation:
char *str = "hi!"
str ---> +---+---+---+---+
| h | i | ! |\0 |
+---+---+---+---+
Note that \0 is a single character even
though it looks like two. We can see this:
#include <stdio.h>
#include <assert.h>
int
main(void)
{
int i;
char *str = "hi!";
char str2[4] = {'h', 'i', '!', '\0'}; /* We can allocate strings on the stack! */
for (i = 0; str[i] != '\0'; i++) { /* Loop till we find the null-terminator */
assert(str[i] == str2[i]); /* Verify the strings are the same. */
printf("%c", str[i]);
}
assert(str[3] == str2[3] && str[3] == '\0'); /* Explicitly check that the null-terminator is there */
printf("\n");
}Program output:
hi!
So strings aren’t really all that special in C. They’re
just arrays with a special terminating character, and a
little bit of syntax support to construct them
(i.e. "this syntax"). So what’s there to know
about strings in C?
string.h Functions
Working with strings is not C's strong point. However, you have to handle strings in every language, and C provides a number of functions to do so. You can read about each of these functions with man 3 <function>.
- strlen - How many characters are in a string (not including the null-terminator)?
- strcmp - Compare two strings; return 0 if they are the same, or a negative or positive value depending on which is lexicographically less than the other. Similar in purpose to equals in Java.
- strcpy - Copy into a string the contents of another string.
- strcat - Concatenate, or append onto the end of a string, another string.
- strdup - Duplicate a string by mallocing a new string, and copying the string into it (you have to free the string later!).
- snprintf - You're familiar with printf, but snprintf enables you to "print" into a string! This gives you a lot of flexibility in easily constructing strings. A downside is that you don't really know how big the resulting string is, so the n in the name is the maximum length of the string you're creating.

#include <string.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
int
main(void)
{
char result[256] = {'\0', }; /* initialize string to be "" */
char *a = "hello", *b = "world";
int c = 42;
int ret;
char *d;
assert(strcmp(a, b) != 0); /* these strings are different */
strcat(result, a);
strcat(result, " ");
strcat(result, b);
assert(strcmp(result, "hello world") == 0); /* should have constructed this string properly */
d = strdup(result);
assert(strcmp(d, result) == 0); /* a duplicate should be equal */
free(d);
strcpy(result, ""); /* reset the `result` to an empty string */
ret = snprintf(result, 255, "%s %s and also %d", a, b, c);
printf("\"%s\" has length %d\n", result, ret);
}Program output:
"hello world and also 42" has length 23
Many of these functions raise an important question: What
happens if one of the strings is not large enough to hold
the data being put into it? If you want to
strcat a long string into a small character
array, what happens? This leads us to a simple fact…
It is easy to use the string functions
incorrectly. Many of these functions also have a
strnX variant where X is the
operation (strnlen, strncmp,
etc..). These are safer variants of the string functions.
The key insight here is that if a string is derived from a
user, it might not actually be a proper string! It might
not, for example, have a null-terminator – uh oh! In that
case, many of the above functions will keep on iterating
past the end of the buffer.
#include <string.h>
#include <stdio.h>
char usr_str[8];
int
main(void)
{
char my_str[4];
/*
* This use of `strcpy` isn't bugged as we know the explicit string is 7 zs,
* and 1 null-terminator, which can fit in `usr_str`.
*/
strcpy(usr_str, "zzzzzzz");
/*
* Also fine: lets limit the copy to 3 bytes, then add a null-terminator ourself
* (`strlcpy` would do this for us)
*/
strncpy(my_str, usr_str, 3);
my_str[3] = '\0';
printf("%s\n", my_str);
fflush(stdout); /* don't mind me, making sure that your print outs happen */
/*
* However, note that `strlen(usr_str)` is larger than the size of `my_str` (4),
* so we copy *past* the buffer size of `my_str`. This is called a "buffer overflow".
*/
strcpy(my_str, usr_str);
return 0;
}Program output:
zzz
*** stack smashing detected ***: terminated
make[1]: *** [Makefile:30: inline_exec] Aborted (core dumped)
While many of the previous functions have to do with creating and modifying strings, computers frequently need to “parse”, or try to understand the different parts of, a string. Some examples:
- Your program's code in .c or .java files is just a long string, and the programming language needs to parse the string to understand what commands you're trying to issue to the computer.
- A webpage is just a long string of html code (along with other assets), and a browser needs to parse it to determine what to display.
- These notes are written in a markup format that is processed by pandoc to generate a pdf and html.

Some of the core functions that help us parse strings include:
- strtol - Pull an integer out of a string. Converts the first part of a string into a long int, and also returns an endptr which points to the character in the string where the conversion into a number stopped. If it cannot find an integer, it will return 0, and endptr is set to point to the start of the string.
- strtok - Iterate through a string, and find the first instance of one of a number of specific characters, and return the string leading up to that character. This is called multiple times to iterate through the string, each time extracting the substring up to the specific characters. See the example in the man page for strtok.
- strstr - Try and find a string (the "needle") in another string (the "haystack"), and return a pointer to it (or NULL if you don't find it). As such, it finds a string in another string (thus strstr).
- sscanf - A versatile function that enables a format string (e.g. as used in printf) to specify the format of the string, and extract out digits and substrings.

Some examples:
#include <string.h>
#include <stdlib.h>
#include <assert.h>
#include <stdio.h>
int
main(void)
{
char *input = "1 42 the secret to everything";
char *inputdup;
char *substr;
char *substr_saved;
long int extracted;
char sec[32];
extracted = strtol(input, &substr, 10); /* pull out the first two integers */
printf("extracted %ld, remaining string: \"%s\"\n", extracted, substr);
extracted = strtol(substr, &substr, 10);
printf("extracted %ld, remaining string: \"%s\"\n", extracted, substr);
substr_saved = substr;
/* what happens when we cannot extract a long? */
extracted = strtol(substr_saved, &substr, 10);
assert(extracted == 0 && substr_saved == substr); /* verify that we couldn't extract an integer */
assert(strcmp(strstr(input, "secret"), "secret to everything") == 0); /* find secret substring */
sscanf(input, "1 %ld the %s to everything", &extracted, sec); /* extract out the number and the secret */
printf("%ld and \"%s\"\n", extracted, sec);
printf("Using strtok to parse through a string finding substrings separated by 'h' or 't':\n");
inputdup = strdup(input); /* strtok will modify the string, lets copy it */
for (substr = strtok(inputdup, "ht"); substr != NULL; substr = strtok(NULL, "ht")) {
printf("[%s]\n", substr);
}
return 0;
}Program output:
extracted 1, remaining string: " 42 the secret to everything"
extracted 42, remaining string: " the secret to everything"
42 and "secret"
Using strtok to parse through a string finding substrings separated by 'h' or 't':
[1 42 ]
[e secre]
[ ]
[o every]
[ing]
When you use an explicit string
(e.g. "imma string") in your code, you’re
actually asking C to allocate the string in global memory.
This has some strange side-effects:
#include <stdio.h>
#include <string.h>
char c[5];
int
main(void)
{
char *a = "blah";
char *b = "blah";
strncpy(c, a, 5);
printf("%s%s%s\n", a, b, c);
/* compare the three pointers */
printf("%p == %p != %p\n", a, b, c);
return 0;
}Program output:
blahblahblah
0x565d43a13004 == 0x565d43a13004 != 0x565d43a15011
The C compiler and linker are smart enough to see that if you have already used a string with a specific value (in this case "blah"), they will avoid allocating a copy of that string, and will just reuse the previous value. Generally, it doesn't make much sense to look at the address of strings, and you certainly should not compare them. You can see in this example how you must compare strings for equality using strncmp, and not by comparing pointers.
When programming in C, you'll encounter quite a few APIs, most documented in man pages. It takes some practice in reading man pages to get what you need from them. One of the things that helps the most is to understand a few common patterns and requirements that you find in these APIs, and in C programming in general.
Functions often need to return multiple values, but C does not provide a means to return more than one value, so we are forced to use pointers. To understand this, let's look at the multiple ways that pointers can be used as function arguments.
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
/*
* `arg` is used to pass an argument that happens to be an array here. In contrast,
* `ret` is a *second return value*. This function will set the value that `ret` points
* to -- which happens to be `retval` in the `main` stack frame -- to the value we are
* getting from the array.
*/
int
get(int *arg, int offset, int *ret)
{
if (arg == NULL) return -1;
*ret = arg[offset];
return 0;
}
/*
* Again, the array is passed in as the first argument, but this time it is used to
* store the new value.
*/
int
set(int *ret_val, int offset, int value)
{
if (ret_val == NULL) return -1;
ret_val[offset] = value;
return 0;
}
/*
* `arrdup`'s job is to duplicate an array by allocating and populating
* a new array. It will return `0` or not `0` on success/failure. Thus
* the new array must be returned using pointer arguments. `ret_allocated`
* is a pointer to a pointer to an array in the calling function, and it
* is used to return the new array.
*/
int
arrdup(int **ret_allocated, int *args, size_t args_size)
{
size_t i;
int *newarr;
if (ret_allocated == NULL) return -1;
newarr = calloc(args_size, sizeof(int)); /* 1 below */
if (newarr == NULL) return -1;
for (i = 0; i < args_size; i++) {
newarr[i] = args[i];
}
*ret_allocated = newarr; /* 2 and 3 below */
return 0;
}
/*
* Lets draw this one. The stack setup when we call `arrdup`:
*
* | |
* +-main----------+
* | arr |<---+
* | dup |<-+ |
* | ... | | |
* +-arrdup--------+ | |
* | ret_allocated |--+ |
* | args |----+
* | ... |
* +---------------+
*
* `ret_allocated` points to `dup` in `main`.
*
*
* 3. *ret_allocated = newarr
* ^
* |
* ,-------------'
* |
* | | |
* +-main----|-----+
* | arr | |
* | dup ---' <------+
* | ... | | -- 2. *ret_allocated
* +-arrdup--------+ |
* | ret_allocated ---+
* | args |
* | newarr --------------> 1. calloc(...)
* +---------------+
*
* 1. `arrdup` calls `calloc` to allocate on the heap
* 2. Dereferencing `ret_allocated` gives us access to `dup`
* 3. thus we can set dup equal to the new heap memory
*
* This effectively enables us to return the new memory into the
* `dup` variable in main.
*/
int
main(void)
{
int arr[] = {0, 1, 2, 3};
int *dup;
int retval;
assert(get(arr, 2, &retval) == 0 && retval == 2);
assert(set(arr, 2, 4) == 0);
assert(get(arr, 2, &retval) == 0 && retval == 4);
assert(arrdup(&dup, arr, 4) == 0);
assert(get(dup, 2, &retval) == 0 && retval == 4);
free(dup);
printf("no errors!");
return 0;
}Program output:
no errors!
The first question is: how can we detect that some error occurred within a function we have called? We'll separate functions into two different classes:

- Functions that return pointers (e.g. type *fn(...)) are often relatively straightforward. If they return NULL, then an error occurred; otherwise the returned pointer can be used. malloc is an example here.
- Functions that return integers often use -1 to indicate an error. Sometimes any negative value indicates an error, each negative value designating that a different error occurred. Non-negative values indicate success. If a function wishes to return a binary success or failure, you'll see that many APIs (counter-intuitively) return 0 for success, and -1 for failure.

It is common that you want more information than the
return value can give you. You want more information about
why the failure happened so that you can debug more
easily. The errno variable is UNIX’s solution
to this (see its man page), and can be
referenced if you include errno.h.
errno is simply an integer where specific
values represent specific errors. You can view these values
by looking them up in the source5, or you can ask the
errno program6. For example:
$ errno -l
EPERM 1 Operation not permitted
ENOENT 2 No such file or directory
ESRCH 3 No such process
EINTR 4 Interrupted system call
EIO 5 Input/output error
...
You can look up specific values:
$ errno 28
ENOSPC 28 No space left on device
You can imagine how that error might occur. What if you’re trying to add files on disk, and the disk runs out of room!?
If you want your program to print a useful error when you
encounter such a problem, you can use the
perror function (see its man page
for documentation) to print out an error, or
strerror (via string.h) if you
just want a string corresponding to the error.
#include <stdlib.h>
#include <limits.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
int
main(void)
{
char *ret;
printf("Lets get greedy: allocate %ld bytes!\n", LONG_MAX);
ret = malloc(LONG_MAX);
if (ret == NULL) {
printf("Error: errno value %d and description: %s\n", errno, strerror(errno));
fflush(stdout);
perror("Error allocating memory");
return -1;
}
return 0;
}Program output:
Lets get greedy: allocate 9223372036854775807 bytes!
Error: errno value 12 and description: Cannot allocate memory
Error allocating memory: Cannot allocate memory
make[1]: *** [Makefile:30: inline_exec] Error 255
(Note: when you return from a program with a non-zero
value, it designates that your program had an
error. This is why we see the make error when
it runs your program.)
To understand the return value of UNIX library functions:

- First, look at the function's signature in the SYNOPSIS section of its man page (does it return an int or a pointer?):

SYNOPSIS
#include <stdlib.h>

void *malloc(size_t size);
...

- Second, read the RETURN VALUE and ERRORS sections of the man page:

RETURN VALUE
The malloc() and calloc() functions return a pointer to the allocated memory, which is suitably aligned for
any built-in type. On error, these functions return NULL. NULL may also be returned by a successful call to
malloc() with a size of zero, or by a successful call to calloc() with nmemb or size equal to zero.
...
ERRORS
calloc(), malloc(), realloc(), and reallocarray() can fail with the following error:
ENOMEM Out of memory.
...
This explains why we got the error we did.
One of the last, but most challenging aspects of APIs in C is that of memory ownership. The big question is: when a pointer is passed into a function, or returned from a function, who is responsible for freeing the memory? This is due to a combination of factors, mainly:

- heap memory must explicitly be freed, so someone has to do it,
- it should be freed only once, and
- it must not be accessed after it has been freed.

It is easiest to understand this issue through the concept of ownership: simply put, the owner of a piece of memory is the one that should either free it, or pass it to another part of the code that becomes the owner. In contrast, a caller can pass a pointer to memory into a function, allowing it to borrow that memory, but the function does not free it, and after it returns it should not further access the memory. There are four general patterns:
A caller passes a pointer into a function,
and passes the ownership of that data to the
function. This is common for data-structures whose job is to
take a pointer of data to store, and at that point, they own
the memory. Think: if we enqueue data into a
queue.
Examples: The key-value store's put function owns the passed-in data (assuming it takes a void *).
A function returns a pointer to the function
caller and passes the ownership to the caller. The
caller must later free the data. This is also
common in data-structures when we wish to retrieve the data.
Think: if we dequeue data from a queue.
Examples: strdup creates a
new string, expecting the caller to free it.
A caller passes a pointer into a function,
but only allows the function to borrow the data.
Thus the caller still owns the memory (thus is still
responsible to free the data) after the
function returns, and the function should not
maintain any references to the data. Think: most of the
string functions that take a string as an argument, perform
some operation on it, and return expecting the caller to
still free the string.
Examples: Most other functions we’ve seen borrow pointers, perform operations, and then don’t maintain references to them.
A function returns a pointer to the function
caller that enables the caller to borrow the data.
This requires a difficult constraint: the caller can access
the data, but must not maintain a pointer to it after the
function or API (that still owns the data)
frees it.
Examples: The key-value store's get function can return a pointer that the caller may only borrow – the store still owns, and will eventually free, the underlying memory.
The memory ownership constraints are an agreement between the calling function, and a function being called.
An old interview question:
How can you write a function that determines if the execution stack grows upwards (from lower addresses to higher), or downwards?
Write this function!
Let's look at a simple key-value store that needs to learn to be more careful about memory. Above each function, we specify the ownership of pointers being passed – either passing ownership, or borrowing the memory. The current implementations do not adhere to these specifications.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
/*
* Lets just use a single key/value as a proxy for an entire kv/store.
* You can assume that the stored values are strings, so `strdup` can
* allocate the memory for and copy the values. You absolutely will
* have to use `strdup` in some of the functions.
*/
static int kv_key;
static char *kv_value;
/**
* The returned value should not be maintained in the data-structure
* and should be `free`d by the caller.
*/
char *
get_pass_ownership(int key)
{
if (key != kv_key) return NULL;
return kv_value;
}
/**
* Pointers to the returned value are maintained in this data-structure
* and it will be `free`d by the data-structure, not by the caller
*/
char *
get_borrow(int key)
{
if (key != kv_key) return NULL;
return kv_value;
}
/**
* Pointers to the `value` passed as the second argument are maintained by
* the caller, thus the `value` is borrowed here. The `value` will be `free`d
* by the caller.
*/
void
set_borrow(int key, char *value)
{
/* What do we do with `kv_value`? Do we `strdup` anything? */
kv_key = key;
kv_value = value;
return;
}
/**
* Pointers to the `value` passed as the second argument are not maintained by
* the caller, thus the `value` should be `free`d by this data-structure.
*/
void
set_pass_ownership(int key, char *value)
{
/* What do we do with `kv_value`? Do we `strdup` anything? */
kv_key = key;
kv_value = value;
return;
}
int
main(void)
{
/* The values we pass in. */
char *v_p2p = strdup("value1"); /* calls `malloc` ! */
char *v_p2b = strdup("value2");
char *v_b2p = strdup("value3");
char *v_b2b = strdup("value4");
/* The return values */
char *r_p2p, *r_p2b, *r_b2p, *r_b2b;
/* p2p: passing ownership on set, passing ownership on get */
set_pass_ownership(0, v_p2p);
r_p2p = get_pass_ownership(0);
/* The question: should we `free(v_p2p)`?, `free(r_p2p)`? */
/* p2b: passing ownership on set, borrowing memory for get */
set_pass_ownership(0, v_p2b);
r_p2b = get_borrow(0);
/* The question: should we `free(v_p2b)`?, `free(r_p2b)`? */
/* b2p: borrowing ownership on set, passing ownership on get */
set_borrow(0, v_b2p);
r_b2p = get_pass_ownership(0);
/* The question: should we `free(v_b2p)`?, `free(r_b2p)`? */
/* b2b: borrowing ownership on set, borrowing on get */
set_borrow(0, v_b2b);
r_b2b = get_borrow(0);
/* The question: should we `free(v_b2b)`?, `free(r_b2b)`? */
if (kv_value) free(kv_value);
printf("Looks like success!...but wait till we valgrind; then ;-(\n");
return 0;
}Program output:
Looks like success!...but wait till we valgrind; then ;-(
The above code is hopelessly broken. Run it in valgrind to see.
Tasks:

- Add the malloc/free/strdup operations that are necessary both in the key-value implementation, and in the client (main) to make both the caller and callee abide by the memory ownership constraints.
- In which cases must values be freed in main? Why?
- In which cases can the same implementation of the functions (get/set) be used? Why?

Programming is hard. Really hard. When you write a program, you have to consider complex logic and the design of esoteric data-structures, all while desperately trying to avoid errors. It is an exercise that challenges all of our logical, creative, and risk management facilities. As such, programming is the act of methodically conquering complexity with our ability to abstract. We walk on the knife's edge of our own abstractions where we can barely, just barely, make progress and engineer amazing systems.
Imagine if when programming and debugging, we had to consider the actions and faults of all other programs running in the system? If your program crashed because one of your colleagues’ program had a bug, how could you make progress?
Luckily, we stand on the abstractions of those that came before us. A key abstraction provided by systems is that of isolation - that one program cannot arbitrarily interfere with the execution of another. This isolation is a core provision of Operating Systems (OSes), and a core abstraction they provide is the process. At the highest level, each process can only corrupt its own memory, and a process that goes into an infinite loop cannot prevent another process from executing.
A process has a number of properties including:

- Descriptors to the resources it can access – for example, when we printf, we interact with the descriptor to output to the terminal. The OS prevents processes from accessing and changing resources they shouldn't have access to.
- A unique process identifier (pid).
- A current working directory (pwd).
- An owning user (e.g. gparmer) on whose behalf the process executes.
Throughout the class we’ll uncover more and more of these properties, but in this lecture, we’ll focus on the lifecycle of a process, and its relationship to its parent. Processes are the abstraction that provide isolation between users, and between computations, thus it is the core of the security of the system.
UNIX is an old system, having its origins in 1969 at Bell Labs when it was created by Ken Thompson and Dennis Ritchie7. Time has shown it is an “oldie but goodie” – as most popular OSes are derived from it (OSX, Linux, Android, …). In the OS class, you’ll dive into an implementation of UNIX very close to what it was in those early years! The original paper is striking in how uninteresting it is to most modern readers. UNIX is so profound that it has defined what is “normal” in most OSes.
UNIX Philosophy: A core philosophy of UNIX is that applications should not be written as huge monolithic collections of millions of lines of code, and should instead be composed of smaller programs, each focusing on "doing one thing well". Programs focus on inputting text, and outputting text. Pipelines of processes take a stream of text, and process it, process after process, to get an output. The final program is a composition of these processes connected together using pipelines.
In the early years, many different UNIX variants were
competing for market-share along-side the likes of DOS,
Windows, OS2, etc… This led to differences between the UNIX
variants which prevented programmers from effectively
writing programs that would work across the variants. In
response to this, in the late 1980s, POSIX standardized many
aspects of UNIX including command-line programs and the
standard library APIs. man pages are
documentation for what is often part of the POSIX
specification. I’ll use the term UNIX throughout the class,
but often mean POSIX. You can think of Linux as an
implementation of POSIX and as a variant of UNIX.
UNIX has been taken into many different directions. Android and iOS layer a mobile runtime on top of UNIX; OSX and Ubuntu layer modern graphics and system management on top of UNIX, and even Windows Subsystem for Linux (WSL) enables you to run a POSIX environment inside of Windows. In many ways, UNIX has won. However, it has won by being adapted to different domains – you might be a little hard-pressed looking at the APIs for some of those systems to understand that it is UNIX under the hood. Regardless, in this class, we'll be diving into UNIX and its APIs to better understand the core technology underlying most systems we use.
The core UNIX philosophy endures, but has changed shape.
In addition to the pervasive deployment of UNIX, Python is
popular as it has good support to compose together
disparate services and libraries, and applications are
frequently compositions of various REST/CRUD webservice
APIs, and json is the unified language in which
data is shared.
When we construct a pipeline of processes…
$ ps aux | grep gparmer
we’re executing two processes that separately run the
code of the programs ps and grep.
The shell is the parent process that creates the
ps and grep processes. It ensures
the descriptor for the output of the
ps process sends its data to the descriptor in
the grep process for its input.
What are the functions involved in the creation, termination, and coordination between processes?
- fork - Create a new, child, process from the state of the calling process.
- getpid, getppid - Get the unique identifier (the Process ID, or pid) for the process, or for its parent.
- exit - This is the function to terminate a process.
- wait - A parent awaits a child exiting.
- exec - Execute a new program using the current process8.

Let's be a little more specific. A program is an executable file. We see them in the file system (for example, see ls /bin), and create them when we compile our programs. When you compile your C programs, the generated file that you can execute is a program. A process is an executing program with the previously mentioned characteristics (disjoint memory, descriptors to resources, etc…).
First, let's look at fork:
#include <stdio.h>
#include <unistd.h> /* fork, getpid */
int
main(void)
{
pid_t pid; /* "process identifier" */
printf("pre-fork:\n\t parent pid %d\n", getpid());
fflush(stdout);
pid = fork(); /* so much complexity in such a simple call! */
printf("post-fork:\n\treturned %d,\n\tcurrent pid %d\n\tparent pid %d\n",
pid, getpid(), getppid());
return 0;
}Program output:
pre-fork:
parent pid 1127875
post-fork:
returned 1127876,
current pid 1127875
parent pid 1127866
post-fork:
returned 0,
current pid 1127876
parent pid 2850
We can see that fork creates a new process
that continues running the same code, but has a separate set
of memory. A few observations:

- The pid variable is different for the two processes. In fact, after the fork call, all of memory is copied into the new process and is disjoint from that of the parent.
- The getpid function returns the current process' process identifier.
- fork returns two values - one in the parent, in which it returns the process id of the newly created child, and one in the child, in which it returns 0. This return value is one of the main indicators of whether the current process is the parent or the child.

So much complexity for such a simple function prototype!
exit and wait.
In contrast to fork that creates a new process,
exit is the function that is used to terminate the
current process. exit's integer argument is the
"exit status" – 0 is equivalent to
EXIT_SUCCESS, and denotes a successful
execution of the program, while any non-zero value
(EXIT_FAILURE is defined as 1)
denotes execution failure.
The return value from main is also this exit
status. Where is the exit status used?
Parent processes have the responsibility to manage their
children. This starts with the wait function. A
parent that calls wait will “wait” for the
child to exit. Then, wait enables
a parent to retrieve the exit status of the child.
The relationship between wait (in the
parent), and exit or return from main (in the
child) is a little hard to understand. A figure always
helps…
Adopting orphaned children. What happens
if a parent exits before a child? Who is allowed to
wait and get the child's status? Intuition
might tell you that it is the grandparent, but that
is not the case. Most programs are not written to understand
that wait might return a pid other
than ones they've created. Instead, if a parent terminates,
the init process – the process that is the
ancestor of all processes – adopts the
child, and waits for it. We'll dive into
init later in the class.
Interpreting exit status.
Despite wait’s status return value being a
simple int, the bits of that integer
mean very specific things. See the man page for
wait for details, but we can use a set of
functions (really “macros”) to interpret the status
value:
- WIFEXITED(status) will return 0 if the process didn't exit (e.g. if it faulted instead), and non-0 otherwise.
- WEXITSTATUS(status) will get the intuitive integer value that was passed to exit (or returned from main), but note that only the low 8 bits of the value are preserved (thus it has a maximum value of 255).

#include <unistd.h> /* fork, getpid */
#include <wait.h> /* wait */
#include <stdlib.h> /* exit */
#include <stdio.h>
int
main(void)
{
pid_t pid, child_pid;
int i, status;
/* Make four children. */
for (i = 0; i < 4; i++) {
pid = fork();
if (pid == 0) { /* Are we the child? */
printf("Child %d exiting with %d\n", getpid(), -i);
if (i % 2 == 0) exit(-i); /* terminate immediately! */
else return -i; /* also, terminate */
}
}
/*
* We are the parent! Loop until wait returns `-1`, thus there are no more children. Note
* that this strange syntax says "take the return value from `wait`, put it into the variable
* `child_pid`, then compare that variable to `-1`".
*/
while ((child_pid = wait(&status)) != -1) { /* no more children when `wait` returns `-1`! */
/* wait treats the `status` as the second return value */
if (WIFEXITED(status)) {
/* Question: why the `(char)` cast? */
printf("Child %d exited with exit status %d\n", child_pid, (char)WEXITSTATUS(status));
}
}
return 0;
}Program output:
Child 1127895 exiting with 0
Child 1127896 exiting with -1
Child 1127897 exiting with -2
Child 1127898 exiting with -3
Child 1127895 exited with exit status 0
Child 1127896 exited with exit status -1
Child 1127897 exited with exit status -2
Child 1127898 exited with exit status -3
The output of this program can actually vary, somewhat arbitrarily, in the following way: any of the output lines can be reordered with respect to any other, except that the parent's print-out for a specific child must come after that child's own print-out.
For example, all of the following are possible:
Child 4579 exiting with 0
Child 4580 exiting with -1
Child 4581 exiting with -2
Child 4582 exiting with -3
Child 4579 exited with exit status 0
Child 4580 exited with exit status -1
Child 4581 exited with exit status -2
Child 4582 exited with exit status -3
Child 4579 exiting with 0
Child 4579 exited with exit status 0
Child 4580 exiting with -1
Child 4580 exited with exit status -1
Child 4581 exiting with -2
Child 4581 exited with exit status -2
Child 4582 exiting with -3
Child 4582 exited with exit status -3
Child 4582 exiting with -3
Child 4582 exited with exit status -3
Child 4581 exiting with -2
Child 4581 exited with exit status -2
Child 4580 exiting with -1
Child 4580 exited with exit status -1
Child 4579 exiting with 0
Child 4579 exited with exit status 0
This non-determinism is a product of the
isolation that is provided by processes. The OS
switches back and forth between processes frequently (up to
thousands of times per second!) so that if one goes into an
infinite loop, others will still make progress. But this
also means that the OS can choose to run any of the
processes that are trying to execute at any point in time!
We cannot predict the order of execution, completion, or
wait notification. This non-deterministic
execution is called concurrency. You'll want to
keep this in mind as you continue to learn the process APIs,
and when we talk about IPC later.
exit
Let's dive into exit a little bit.
#include <stdlib.h>
int
main(void)
{
exit(EXIT_SUCCESS);
}Believe it or not, this is identical to
#include <stdlib.h>
int
main(void)
{
return EXIT_SUCCESS;
}…because when main returns, it calls
exit.
Investigating main return → exit via gdb. You can see this by diving into the return.c program with gdb -tui, breakpointing before the return (e.g. b 5), and single-stepping through the program. You'll want to layout asm to drop into "assembly mode", and single-step through the assembly; if you want to step through it an instruction at a time, use stepi or si. You can see that it ends up calling __GI_exit. __GI_* functions are glibc-internal functions, so we see that libc is actually calling main, and when it returns, it then goes through its logic for exit.
Though exit is a relatively simple API, it
has a few interesting features.
- on_exit takes a function pointer to be called upon returning from main or calling exit. The function receives the status, and an argument passed to on_exit. So we can see that calling exit does not immediately terminate the process!
- atexit takes a simpler function pointer (that doesn't receive any arguments) to call at exit.
- _exit terminates a process without calling any of the function pointers set up by on_exit or atexit.

#include <stdlib.h>
#include <stdio.h>
int value = 0;
void
destructor(int status, void *data)
{
int *val = data;
printf("post-exit: status %d, data %d\n", status, *val);
}
int
main(void) {
on_exit(destructor, &value); /* `value` will get passed to the function */
value = 1;
printf("pre-exit\n");
return 0;
}Program output:
pre-exit
post-exit: status 0, data 1
fork Questions
How many processes (not including the initial one) are created in the following?
#include <unistd.h>
int
main(void)
{
fork();
fork();
return 0;
}What are the possible outputs for:
#include <unistd.h>
#include <stdio.h>
#include <sys/wait.h>
int
main(void)
{
pid_t parent = getpid();
pid_t child;
child = fork();
if (getpid() == parent) {
printf("p");
wait(NULL); /* why is `NULL` OK here? */
printf("w");
} else {
printf("c");
}
return 0;
}Recall, a program is an executable that you
store in your file-system. These are the output of your
compilation, the “binaries”. So fork lets us
make a new process from an existing process, both of which
are executing within a single program. But how do we execute
a new program?
The exec family of system calls will do the
following:
- Stop executing the current program, and reclaim the process' memory.
- Load the new program into the process' memory, and initialize its state (and start main execution).

The core insight is that the same process continues execution; it simply continues execution in the new program. This is a little unnatural, but has a few handy side-effects:

- The process retains its process identifier (pid_t) and the parent/child relationships of the process.
- The process' descriptors are retained across the exec.
exec APIs
There are a few ways to execute a program:

- execl
- execlp
- execle
- execv
- execvp

The naming scheme is quite annoying and hard to remember,
but the man page has a decent summary. The
trailing characters correspond to specific operations that
differ in how the command-line arguments are passed to the
main (l means the arguments
to this exec call are listed directly, while
v means the arguments are passed as an array of the
arguments into the exec call), how the
program's path is specified (by default, an "absolute path"
starting with / must be used, but in a
p variant, the binary is looked up using
comparable logic to your shell), and how environment
variables are passed to the program (the e
variants). For now, we'll simply
use execvp, and cover the rest in subsequent
sections.
#include <stdio.h>
#include <assert.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>
int
main(int argc, char *argv[])
{
int status;
pid_t pid;
if (fork() == 0) { /* child */
char *args[] = {"./03/hw.bin", NULL}; /* requires a NULL-terminated array of arguments */
/* here we use a relative path (one that starts with `.`); since it contains a `/`, `execvp` uses it directly rather than searching PATH */
if (execvp("./03/hw.bin", args)) {
/*
* This should really *not return* in the normal case. This memory context
* goes away when we `exec`, and is replaced by the new program, thus simply
* removing this code. Thus, this will only continue executing if an error
* occurred.
*/
perror("exec");
return EXIT_FAILURE;
}
}
pid = wait(&status);
assert(pid != -1);
printf("Parent: status %d\n", WEXITSTATUS(status));
return 0;
}Program output:
hello world!
Parent: status 42
I think that we likely have a decent intuition about what the command-line arguments are:
$ ls /bin /sbin
The ls program takes two arguments,
/bin and /sbin. How does
ls access those arguments?
Let's look at a chain of programs that
exec each other. The first program (that you
see here) is called inline_exec_tmp, and the
programs 03/args?.c are subsequently
executed.
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
char *prog = "./03/args1.bin";
int
main(int argc, char *argv[])
{
char *args[] = {prog, "hello", "world", NULL};
if (argc != 1) return EXIT_FAILURE;
printf("First program, arg 1: %s\n", argv[0]);
fflush(stdout);
/* lets execute args1 with some arguments! */
if (execvp(prog, args)) {
perror("exec");
return EXIT_FAILURE;
}
return 0;
}Program output:
First program, arg 1: ./inline_exec_tmp
Inside ./03/args2.bin
arg 0: ./03/args2.bin
arg 1: hello
arg 2: world
args1.c is
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>
char *prog = "./03/args2.bin";
int
main(int argc, char *argv[])
{
int i;
char **args; /* an array of strings */
printf("Inside %s\n", argv[0]);
/* lets just pass the arguments on through to args2! */
args = calloc(argc + 1, sizeof(char *));
assert(args);
args[0] = prog;
for (i = 1; i < argc; i++) {
args[i] = argv[i];
}
args[i] = NULL; /* the arguments need to be `NULL`-terminated */
if (execvp(prog, args)) {
perror("exec");
return EXIT_FAILURE;
}
return 0;
}…and args2.c is
#include <stdio.h>
int
main(int argc, char *argv[])
{
int i;
printf("Inside %s\n", argv[0]);
for (i = 0; i < argc; i++) {
printf("arg %d: %s\n", i, argv[i]);
}
return 0;
}So we see the following.

- Arguments are passed from program to program through exec (this is, for example, how the shell execs your programs).
- The v variants of exec require the NULL termination of the argument array, something that is easy to mess up!
- Parsing through the command-line arguments can be a little annoying, and getopt can help.
Environment variables are UNIX’s means of providing configuration information to any process that might want it. They are a key-value store9 that maps an environment variable to its value (both are strings).
Environment variables are used to make configuration information accessible to programs. They are often used instead of command-line arguments when many programs need access to the same configuration, so that it need not be explicitly passed to each.
Example common environment variables include:
- PATH - a :-separated list of file system paths to use to look for programs when attempting to execute a program.
- HOME - the current user's home directory (e.g. /home/gparmer).
- USER - the username (e.g. gparmer).
- TEMP - a directory that you can use to store temporary files.

Many programs set up and use their own environment
variables. Note that environment variables are pretty
pervasively used – simple libraries exist to access them
from python, node.js,
rust, java, etc…
You can easily access environment variables from the command line:
$ echo $HOME
/home/gparmer
$ export BESTPUP=penny
$ echo $BESTPUP
penny
Any program executed from that shell will be able to
access the "BESTPUP" environment variable. The
env command dumps out all current environment
variables.
So how do we access the environment variable key-value store in C? The core functions for working with environment variables include:
- getenv - Get an environment variable's value.
- setenv - Set an environment variable's value (used by the shell to set up children's variables).
- clearenv - Reset the entire environment.
- The environ array - This is the array of environment variables you'll see in the man pages. You don't want to access/modify this directly, but you can imagine it is used to back all of the previous calls.

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
int
main(int argc, char *argv[])
{
char *u = getenv("USER");
char *h = getenv("HOME");
assert(u && h);
printf("I am %s, and I live in %s\n", u, h);
return 0;
}Program output:
I am gparmer, and I live in /home/gparmer
You can see all of the environmental variables available by default with:
$ env
SHELL=/bin/bash
DESKTOP_SESSION=ubuntu
EDITOR=emacs -nw
PWD=/home/gparmer/repos/gwu-cs-sysprog/22/lectures
LOGNAME=gparmer
HOME=/home/gparmer
USERNAME=gparmer
USER=gparmer
PATH=/home/gparmer/.local/bin::/home/gparmer/.cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/usr/racket/bin/
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
...
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
int
main(int argc, char *argv[])
{
char *u = getenv("USER");
assert(u);
printf("user: %s\n", u);
fflush(stdout);
if (setenv("USER", "penny", 1)) {
perror("attempting setenv");
exit(EXIT_FAILURE);
}
if (fork() == 0) {
char *u = getenv("USER");
char *args[] = { "./03/envtest.bin", "USER", NULL };
/* environment is inherited across *both* `fork` and `exec`! */
printf("user (forked child): %s\n", u);
fflush(stdout);
if (execvp("./03/envtest.bin", args)) {
perror("exec");
return EXIT_FAILURE;
}
}
return 0;
}Program output:
user: gparmer
user (forked child): penny
Environment variable USER -> penny
03/envtest.c is
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>
int
main(int argc, char *argv[])
{
char *e = "NOARG";
char *v = "NOVAL";
if (argc == 2) {
e = argv[1];
v = getenv(e);
if (!v) {
v = "";
}
}
printf("Environment variable %s -> %s\n", e, v);
return 0;
}A common use of environment variables is the “home” directory in your shell. How is this implemented?
$ cd ~
The ~ means "my home directory". To
understand what directory is a user's home directory, you
can getenv("HOME")!
posix_spawn
fork and exec are not the only
functions to execute a program. posix_spawn
also enables the creation of a new process that executes a
given program. posix_spawn performs three
high-level actions:

- fork to create a new process,
- a set of operations within the new process that are specified by the arguments to posix_spawn, and
- exec to execute a program and pass arguments/environmental variables.

It is strictly more limited in what it can do
than fork and exec, but this is
often a feature, not a bug. fork is really hard
to use well, and can be quite confusing to use. It is
considered by some to be a flawed
API. Thus the focus of posix_spawn on
specifically executing a new program can be quite useful to
simplify programs.
We’ve seen how to create and manage processes and how to
execute programs. Amazingly, modern systems have pretty
spectacular facilities for introspection into
executing processes. Introspection facilities generally let
you dive into something as it runs. The most immediate
example of this is gdb, or any debugger that lets
you dive into an implementation. Looking at
pause.c:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
const int global_readonly = 0;
int global = 0;
int
main(void)
{
int stack_allocated;
int *heap = malloc(sizeof(int));
printf("pid %d\nglobal (RO):\t%p\nglobal: \t%p\nstack: \t%p\nheap: \t%p\nfunction:\t%p\n",
getpid(), &global_readonly, &global, &stack_allocated, heap, main);
pause();
return 0;
}Program output:
pid 9310
global (RO): 0x55be6d40b008
global: 0x55be6d40d014
stack: 0x7ffc7abafc7c
heap: 0x55be6da162a0
function: 0x55be6d40a1c9
Let's take the process identifier, and dive into the
process! The "proc filesystem" in Linux is
a part of the file system devoted to representing processes.
There is a subdirectory in it for each process in the system.
Let's check out process 9310.
$ cd /proc/9310/
$ ls
arch_status cgroup coredump_filter exe io maps mountstats oom_adj patch_state sched smaps statm timers
attr clear_refs cpuset fd limits mem net oom_score personality schedstat smaps_rollup status timerslack_ns
autogroup cmdline cwd fdinfo loginuid mountinfo ns oom_score_adj projid_map sessionid stack syscall uid_map
auxv comm environ gid_map map_files mounts numa_maps pagemap root setgroups stat task wchan
$ cat maps
55be6d409000-55be6d40a000 r--p 00000000 08:02 1315893 /home/ycombinator/repos/gwu-cs-sysprog/22/lectures/03/pause.bin
55be6d40a000-55be6d40b000 r-xp 00001000 08:02 1315893 /home/ycombinator/repos/gwu-cs-sysprog/22/lectures/03/pause.bin
55be6d40b000-55be6d40c000 r--p 00002000 08:02 1315893 /home/ycombinator/repos/gwu-cs-sysprog/22/lectures/03/pause.bin
55be6d40c000-55be6d40d000 r--p 00002000 08:02 1315893 /home/ycombinator/repos/gwu-cs-sysprog/22/lectures/03/pause.bin
55be6d40d000-55be6d40e000 rw-p 00003000 08:02 1315893 /home/ycombinator/repos/gwu-cs-sysprog/22/lectures/03/pause.bin
55be6da16000-55be6da37000 rw-p 00000000 00:00 0 [heap]
7ff4a127f000-7ff4a12a4000 r--p 00000000 08:02 11797912 /lib/x86_64-linux-gnu/libc-2.31.so
7ff4a12a4000-7ff4a141c000 r-xp 00025000 08:02 11797912 /lib/x86_64-linux-gnu/libc-2.31.so
7ff4a141c000-7ff4a1466000 r--p 0019d000 08:02 11797912 /lib/x86_64-linux-gnu/libc-2.31.so
7ff4a1466000-7ff4a1467000 ---p 001e7000 08:02 11797912 /lib/x86_64-linux-gnu/libc-2.31.so
7ff4a1467000-7ff4a146a000 r--p 001e7000 08:02 11797912 /lib/x86_64-linux-gnu/libc-2.31.so
7ff4a146a000-7ff4a146d000 rw-p 001ea000 08:02 11797912 /lib/x86_64-linux-gnu/libc-2.31.so
7ff4a146d000-7ff4a1473000 rw-p 00000000 00:00 0
7ff4a1495000-7ff4a1496000 r--p 00000000 08:02 11797896 /lib/x86_64-linux-gnu/ld-2.31.so
7ff4a1496000-7ff4a14b9000 r-xp 00001000 08:02 11797896 /lib/x86_64-linux-gnu/ld-2.31.so
7ff4a14b9000-7ff4a14c1000 r--p 00024000 08:02 11797896 /lib/x86_64-linux-gnu/ld-2.31.so
7ff4a14c2000-7ff4a14c3000 r--p 0002c000 08:02 11797896 /lib/x86_64-linux-gnu/ld-2.31.so
7ff4a14c3000-7ff4a14c4000 rw-p 0002d000 08:02 11797896 /lib/x86_64-linux-gnu/ld-2.31.so
7ff4a14c4000-7ff4a14c5000 rw-p 00000000 00:00 0
7ffc7ab90000-7ffc7abb1000 rw-p 00000000 00:00 0 [stack]
7ffc7abe1000-7ffc7abe4000 r--p 00000000 00:00 0 [vvar]
7ffc7abe4000-7ffc7abe5000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
There's a lot going on here. The most
important parts are the ranges of addresses on the
left that tell us where we can find the
pause.bin program's code, read-only global
data, global read-writable data, heap, and stack! We also
see that the number of maps in a very simple process is
surprisingly large. Let me manually focus in on a few parts
of this.
...
55be6d40a000-55be6d40b000 r-xp ... /03/pause.bin <-- &main (0x55be6d40a1c9)
55be6d40b000-55be6d40c000 r--p ... /03/pause.bin <-- &global_ro (0x55be6d40b008)
...
55be6d40d000-55be6d40e000 rw-p ... /03/pause.bin <-- &global (0x55be6d40d014)
...
55be6da16000-55be6da37000 rw-p ... [heap] <-- heap (0x55be6da162a0)
...
7ff4a14c4000-7ff4a14c5000 rw-p ...
7ffc7ab90000-7ffc7abb1000 rw-p ... [stack] <-- stack (0x7ffc7abafc7c)
...
We can see that each of the variables we’re accessing in
the program are at addresses that correspond to the
ranges of memory in the maps. Even
more, the /proc/9310/mem file contains the
actual memory for the process! This is really amazing:
we can watch the memory, even as it changes, as a process
actually executes. This is how debuggers work!
As you can see, processes are data too!
fork
Implement forkonacci! This will compute the
nth Fibonacci
number, using fork for the recursion, and a
combination of exit to "return" and
wait to retrieve the returned value, instead of
function calls. The normal recursive implementation:
unsigned int
fibonacci(unsigned int n)
{
if (n == 0 || n == 1) return n;
return fibonacci(n - 1) + fibonacci(n - 2);
}This will work decently for n <= 12, but
not for n > 12. Why10?
How do you think that the shell-based support for
environment variables (export BESTPUP=penny) is
implemented? Specifically, if we do the following…
$ export BESTPUP=penny
$ ./03/envtest.bin BESTPUP // this just prints the environment variable
Environment variable BESTPUP -> penny
…which process is using which APIs? Put another way, how
is export implemented?
The /proc filesystem is a treasure trove of
information. You can dive into the process with
pid N’s information in
/proc/N/. You'll only have access to the
information for your processes. So how do you find the
pid of your processes? You can find the
pid of all of the processes that belong to you
using
ps aux | grep gparmer | awk '{print $2}'11, but replacing your
username for gparmer.
Choose one of those, and go into its directory in
/proc. If it doesn't work, you might have
chosen a process that has terminated since you found its ID.
Try again.
What do you think the following files contain:

- cmdline
- environ
- status

Some of the files are binary, which means you
might need to use xxd (a hexadecimal binary view
program) to look at them. In xxd, pay attention
to the right-hand column.
We've discussed the APIs for process manipulation and program execution, but now we're going to start discussing how processes can manipulate the resources they have access to. Two of the key UNIX mechanisms that processes must use effectively are

- the current working directory (as reported by pwd), and
- the descriptors that reference a process' resources.

Each process has a "working directory" which is used as
the base for any relative paths in the file system.
All file system paths are either absolute paths –
in which case they begin with / and specify the
path from the “root”, or relative paths. Relative
paths are quite frequently used when we interact with the
shell. Every time you type cd blah/, you're
saying "please change the current working directory to
blah/, which is a directory in the current
working directory". Similarly,
cat penny_the_best_pup.md looks up the file
penny_the_best_pup.md in the current working
directory.
This is backed by a simple API:

- getcwd, which lets you get the current process' working directory, and
- chdir, which enables the process to change its working directory.

#include <unistd.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
int
main(void)
{
char *wd = getcwd(NULL, 0);
assert(wd);
printf("Current directory: %s\n", wd);
free(wd);
if (chdir("..") == -1) {
perror("chdir");
abort();
}
wd = getcwd(NULL, 0);
printf("New current dir: %s\n", wd);
free(wd);
return 0;
}Program output:
Current directory: /home/gparmer/repos/systems_programming
New current dir: /home/gparmer/repos
The cd command in shells is actually not
a program, and is instead a shell-internal function.
You can confirm this using the which program,
which tells you the location of a program:
$ which ls
/bin/ls
$ which pwd
/bin/pwd
$ which cd
$
cd isn’t a program, so which
can’t tell us where it is!
Each process has a set of descriptors that are each associated with some resource in the system. Each descriptor is an integer that is passed into a family of APIs that perform operations on their corresponding resources. For example, most processes have at least a set of three descriptors that are each attached to a resource:
- STDIN_FILENO = 0 (defined in unistd.h) - This is the standard input, or the main way the process gets input from the system. As such, the resource is often the terminal if the user directly provides input, or sometimes the output of a previous stage in a command-line pipeline.
- STDOUT_FILENO = 1 - This is the standard output, or the main way the process sends its output to the system. When we call printf, it will, by default, output using this descriptor.
- STDERR_FILENO = 2 - This is the standard error, or the main way that the process outputs error messages. perror (and other error-centric APIs) output to the standard error.

Each of these descriptors is associated with a
potentially infinite sequence of bytes, or
channel12. When we
type at the shell, we’re providing a channel of data that is
sent to the standard input of the active process. When a
process prints, it sends a sequence of characters to the
standard output, and because programs can loop, that stream
can be infinite! Channels, however, can terminate
if they run out of data. This could happen, for example, if
a channel reading through a file reaches the end of the
file, or if the user hits cntl-d on their
terminal to signify “I don’t have any more data”.
File descriptors are analogous to pointers. The descriptors effectively point to channel resources. Some of the core operations on file descriptors include:
- read - Pull a sequence of bytes from the channel into a buffer in memory. This is one of the core means of reading data from files, pipes, and other resources.
- write - Send a sequence of bytes from a buffer in memory into a channel. At its core, even printf ends up being a write to the standard out descriptor.
- dup/dup2 - Take a file descriptor as an argument, and create a new descriptor that is a duplicate of the existing descriptor. Note, this does not duplicate the channel itself, just the descriptor. The analogy is that dup just copies the pointer, not the resource it points to.
- close - Deallocate a file descriptor. This is like removing a pointer – it doesn't necessarily remove the backing channel, for example, if the descriptor was previously duped.

Let's see some of these in action.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <stdlib.h>
int
main(void)
{
char *hw = "hello world\n";
char output[256];
int fd;
ssize_t amnt; /* signed size */
amnt = write(STDOUT_FILENO, hw, strlen(hw));
if (amnt == 0) { /* maybe STDOUT writes to a file with no disk space! */
/* this is *not* an error, so errno not set! */
printf("Cannot write more data to channel\n");
exit(EXIT_FAILURE);
} else if (amnt > 0) {
/* normally, the return value tells us how much was written */
assert(amnt == (ssize_t)strlen(hw));
} else { /* amnt == -1 */
perror("Error writing to stdout");
exit(EXIT_FAILURE);
}
amnt = write(STDERR_FILENO, hw, strlen(hw));
assert(amnt >= 0);
fd = dup(STDOUT_FILENO);
assert(fd >= 0);
/* We can write formatted data out to stdout manually! */
snprintf(output, 255, "in: %d, out: %d, err: %d, new: %d\n",
STDIN_FILENO, STDOUT_FILENO, STDERR_FILENO, fd);
output[255] = '\0';
amnt = write(fd, output, strlen(output));
/* new file descriptors are supposed to use the lowest unused descriptor! */
/* make a descriptor available */
close(STDIN_FILENO); /* STDIN is no longer really the input! */
fd = dup(STDOUT_FILENO);
printf("New descriptor @ %d\n", fd);
return 0;
}Program output:
hello world
hello world
in: 0, out: 1, err: 2, new: 3
New descriptor @ 0
You can run this, and redirect the standard error to a
file to see that writing to standard error is a different
operation than writing to standard output. For example:
$ prog 2> errors.txt will redirect file
descriptor 2 (stderr) to the file.
Let's focus in a little bit on read and
write. First, it is notable that the buffer
they take as an argument (along with its length) is simply
an array of bytes. It can be a string, or it could be the
bytes that are part of an encoded video. Put another way, by
default, channels are just sequences of bytes. It
is up to our program to interpret those bytes properly.
Second, we need to understand that the return value for
read/write has four main,
interesting conditions:
#include <unistd.h>
#include <stddef.h>
#include <string.h>
#include <errno.h>
int
main(void)
{
ssize_t amnt;
char *hi = "more complicated than you'd think...";
ssize_t hi_sz = strlen(hi);
amnt = write(STDOUT_FILENO, hi, hi_sz);
/* Can often mean that we are not able to write to the resource */
if (amnt == 0) {
/*
* Keep trying to write, or give up.
* Common return value for `read` when a file has no more data, or a pipe is closed.
*/
} else if (amnt > 0 && amnt < hi_sz) {
/*
* Didn't write everything we wanted, better call write again sending
* data starting at `&hi[amnt]`, of length `hi_sz - amnt`.
*/
} else if (amnt == hi_sz) {
/*
* Wrote out everything! Wooo!
*/
} else { /* amnt == -1 */
/* Could be a genuine error, but not always... */
if (errno == EPIPE || errno == EAGAIN || errno == EINTR || errno == EWOULDBLOCK) {
/* conditions we should probably handle properly */
} else {
/* error in the channel! */
}
}
return 0;
}Program output:
more complicated than you'd think...
It is common to have a convention on how channel data is structured. UNIX pipelines encourage channels to be plain text, so that each program can read from its standard input, do some processing that can involve filtering out data or transforming it, and send the result to the standard out. That standard output is sent to the standard input of the next process in the pipeline. An example in which we print out each unique program that is executing on the system:
$ ps aux | tr -s ' ' | cut -d ' ' -f 11 | sort | uniq
None of the programs in the pipeline is configured specifically to print out each unique process, yet we are able to compose them together in a pipeline to accomplish the goal.
We’ve seen that the descriptors 0-2 are
automatically set up for our processes, and we can
use/manipulate them. But how do we create
resources? There are many different resources in a UNIX
system, but three of the main ones:
- files and other file-system objects (e.g. directories),
- sockets that are used to communicate over the network, and
- pipes that are used to send data and coordinate between processes.

Each of these has very different APIs for creating the
resources. We'll discuss files later, and will now focus on
pipes.
A pipe (a UNIX resource, not the
| character in a command-line pipeline) is a
channel referenced by two different file
descriptors13, one that we can
write to, and one that we can read
from. The sequence of bytes written to the pipe will be
correspondingly read out.
The pipe and pipe2 functions
create a pipe resource, and return the two file descriptors
that reference the readable and writable ends of the pipe.
Each pipe has only a finite capacity. Thus if
a process tries to write more than that capacity into a
pipe, it won't complete that write until data
is read from the pipe.
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <assert.h>
#include <sys/param.h> /* MIN */
/* Question: What happens when you increase `WRITE_CHUNK` (always <= sizeof(from))?? */
#define WRITE_CHUNK (1 << 8)
/* Large array containing 2^18 characters */
char from[1 << 18];
char to[1 << 18];
int
main(void)
{
int pipe_fds[2]; /* see `man 3 pipe`: `[0]` = read end, `[1]` = write end */
size_t buf_sz = sizeof(from);
size_t written = 0;
memset(from, 'z', buf_sz);
if (pipe(pipe_fds) == -1) {
perror("pipe creation");
exit(EXIT_FAILURE);
}
/* the most complicated `memcpy` implementation ever! */
while (buf_sz != 0) {
ssize_t ret_w, ret_r;
size_t write_amnt = MIN(buf_sz, WRITE_CHUNK);
ret_w = write(pipe_fds[1], &from[written], write_amnt);
if (ret_w < 0) {
perror("write to pipe");
exit(EXIT_FAILURE);
}
ret_r = read(pipe_fds[0], &to[written], ret_w);
assert(ret_r == ret_w);
/* Question: Describe at a high-level what `buf_sz` and `written` are doing here. */
buf_sz -= ret_w;
written += ret_w;
}
/* `memcmp` is like `strcmp`, but for general (non-string) memory */
assert(memcmp(from, to, sizeof(from)) == 0);
printf("Successfully memcpyed through a pipe!\n");
return 0;
}
Program output:
Successfully memcpyed through a pipe!
Pipes are really quite a bit more useful when they are
used to coordinate between separate processes. Thus, lets
combine them with fork:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <assert.h>
/* Large array containing 2^20 characters */
char from[1 << 20];
char to[1 << 20];
int
main(void)
{
int pipe_fds[2]; /* see `man 3 pipe`: `[0]` = read end, `[1]` = write end */
pid_t pid;
size_t buf_sz = sizeof(from);
if (pipe(pipe_fds) == -1) {
perror("pipe creation");
exit(EXIT_FAILURE);
}
/* descriptors copied into each process during `fork`! */
pid = fork();
if (pid < 0) {
perror("fork error");
exit(EXIT_FAILURE);
} else if (pid == 0) { /* child */
ssize_t ret_w;
close(pipe_fds[0]); /* we aren't reading! */
ret_w = write(pipe_fds[1], from, buf_sz);
if (ret_w < 0) {
perror("write to pipe");
exit(EXIT_FAILURE);
}
assert((size_t)ret_w == buf_sz);
printf("Child sent whole message!\n");
} else { /* parent */
ssize_t ret_r;
ssize_t rest = buf_sz, offset = 0;
close(pipe_fds[1]); /* we aren't writing! */
while (rest > 0) {
ret_r = read(pipe_fds[0], &to[offset], rest);
if (ret_r < 0) {
perror("read from pipe");
exit(EXIT_FAILURE);
}
rest -= ret_r;
offset += ret_r;
}
printf("Parent got the message!\n");
}
return 0;
}
Program output:
Parent got the message!
Child sent whole message!
The concurrency of the system enables separate
processes to be active at the same time, thus for the
write and read to be transferring
data through the pipe at the same time. This
simplifies our code as we don’t need to worry about sending
chunks of our data.
Note that we’re closeing the end of the pipe
that we aren’t using in the corresponding processes. Though
the file descriptors are identical in each process following
fork, each process does have a
separate set of those descriptors. Thus closing in
one doesn’t impact the other. Remember, processes provide
isolation!
We can start to understand part of how a shell might be implemented now!
Setting up pipes. Lets start with the
more obvious: for each | in a command, the
shell will create a new pipe. It is a little
less obvious to understand how the standard output for one
process is hooked up through a pipe to the
standard input of the next process. To do this, the shell
does the following procedure:
1. pipe.
2. fork the processes (a fork for each process in a pipeline).
3. close(STDOUT_FILENO), and dup2 the writable file descriptor in the pipe into STDOUT_FILENO.
4. close(STDIN_FILENO), and dup2 the readable file descriptor in the pipe into STDIN_FILENO.

Due to this careful usage of close
to get rid of the old standard in/out, and dup
or dup2 to methodically replace it with the
pipe, we can see how the shell sets up the processes in a
pipeline!
Lets go over an example of setting up the file descriptors for a child process. This does not set up the pipe-based communication between two children, so is not sufficient for a shell; but it is well on the way. Pipes contain arbitrary streams of bytes, not just characters. This example will pass a few formatted values through the pipe to demonstrate this.
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
void
perror_exit(char *s)
{
perror(s);
exit(EXIT_FAILURE);
}
int
main(void)
{
int fds[2];
pid_t pid;
/* make the pipe before we fork, so we can access it in each process */
if (pipe(fds) == -1) perror_exit("Opening pipe");
pid = fork();
if (pid == -1) perror_exit("Forking process");
if (pid == 0) { /* child */
/* close standard out, and replace it with the write side of the pipe */
close(STDOUT_FILENO);
if (dup2(fds[1], STDOUT_FILENO) == -1) perror_exit("child dup stdout");
close(fds[0]);
close(fds[1]);
printf("%d %c %x", 42, '+', 42);
fflush(stdout); /* make sure that we output to the stdout */
exit(EXIT_SUCCESS);
} else { /* parent */
int a, c;
char b;
/* close standard in... */
close(STDIN_FILENO);
/* ...and replace it with the input side of the pipe */
if (dup2(fds[0], STDIN_FILENO) == -1) perror_exit("parent dup stdin");
/*
* if we don't close the pipes, the child will
* always wait for additional input
*/
close(fds[0]);
close(fds[1]);
scanf("%d %c %x", &a, &b, &c);
printf("%d %c %x", a, b, c);
if (wait(NULL) == -1) perror_exit("parent's wait");
}
return 0;
}
Program output:
42 + 2a
Closing pipes. Reading from
a pipe will return that there is no more data on the pipe
(i.e. return 0) only if all
write-ends of the pipe are
closed. This makes sense because we think
of a pipe as a potentially infinite stream of bytes, thus
the only way the system can know that there are no more
bytes to be read, is if the write
end of the pipe cannot receive more data, i.e. if it is
closed.
This seems simple, in principle, but when implementing a
shell, you use dup to make multiple copies of
the write file descriptor. In this case, the
shell must be very careful to close its own copies because
if any write end of a pipe is open, the
reader will not realize when there is no more data
left. If you implement a shell, and it seems like
commands are hanging, and not exiting, this is
likely why.
Question. If we wanted to have a parent,
shell process setup two child connected by |,
what would the final picture (in the image of the
pictures above) look like?
We’re used to sequential execution in our
processes. Each instruction executes after the previous in
the “instruction stream”. However, systems also require
exceptional execution patterns. What happens when
you access memory that doesn’t exist
(e.g. *(int *)NULL); when you divide by
0; when a user types cntl-c; when
a process terminates; or when you make a call to access a
descriptor which somehow is not accessible outside of the
call14?
UNIX’s signals provide asynchronous execution in a process: regardless of which instruction a process was executing, when a signal activates, a signal handler function will immediately activate. Lets see what this looks like:
#include <signal.h> /* sigaction and SIG* */
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
void
sig_fault_handler(int signal_number, siginfo_t *info, void *context)
{
/* printf has problems here; see "The Dark Side of Signals" */
printf("My downfall is the forbidden fruit at address %p.\n", info->si_addr);
/* Question: what happens if we comment this out? */
exit(EXIT_FAILURE);
return;
}
int
main(void)
{
sigset_t masked;
struct sigaction siginfo;
int ret;
sigemptyset(&masked);
sigaddset(&masked, SIGSEGV);
siginfo = (struct sigaction) {
.sa_sigaction = sig_fault_handler,
.sa_mask = masked,
.sa_flags = SA_RESTART | SA_SIGINFO /* we'll see this later */
};
if (sigaction(SIGSEGV, &siginfo, NULL) == -1) {
perror("sigaction error");
exit(EXIT_FAILURE);
}
printf("Lets live dangerously\n");
ret = *(int *)NULL;
printf("I live!\n");
return 0;
}
Program output:
Lets live dangerously
My downfall is the forbidden fruit at address (nil).
make[1]: *** [Makefile:30: inline_exec] Error 1
This is kind-of crazy. We can actually execute in the signal handler when we access invalid memory! We can write code to execute in response to a segmentation fault. This is how Java prints out a backtrace – something we will do in C soon. The concepts in the above code:
- sigset_t - this is a bitmap with a bit per signal. It is used to program the signal mask: while a specific signal’s handler is executing, which signals can occur during its execution? If a bit is set to 1, then the corresponding signal will wait for the current handler to complete execution before its own handler is called. However, if a signal’s bit is set to 0, then signals can nest: you can run a signal handler while you’re in the middle of executing another signal handler. sigemptyset initializes all bits to 0, while sigaddset sets the bit corresponding to a signal number (SIGSEGV in this case).
- sig_fault_handler is our signal handler. It is the function that is called asynchronously when the corresponding event happens. In the case of SIGSEGV, here, the signal handler is called whenever we access invalid memory (e.g. a segmentation fault).
- sigaction is the function that enables us to set up a signal handler for a specific signal.

There are signals for quite a few events. A list of all of the signals is listed in man 7 signal and glibc has decent documentation.
Notable signals include:

- SIGCHLD - A notification that a child process has terminated, thus is awaiting wait
- SIGINT - The user typed cntl-c
- SIGSTOP & SIGCONT - Stop a child executing, and later continue its execution
- SIGTSTP - The user typed cntl-z (…or other causes)
- SIGPIPE - write to a pipe with no reader
- SIGSEGV - Invalid memory access (segmentation fault)
- SIGTERM & SIGKILL - Terminate a process; SIGTERM can be caught, but SIGKILL is forceful, non-optional termination
- SIGHUP - The terminal attached to the shell that created us, itself terminated
- SIGALRM - A notification that time has passed
- SIGUSR1 & SIGUSR2 - User-defined signal handlers you can use for your own machinations

Each signal has a default behavior that triggers if you do not define a handler. These are to ignore the signal, to terminate the process, to stop the process executing, or to continue its execution.
Lets see ualarm in action:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
volatile int timers = 0;
void
sig_handler(int signal_number, siginfo_t *info, void *context)
{
timers++;
return;
}
void
setup_signal(int signo)
{
sigset_t masked;
struct sigaction siginfo;
int ret;
sigemptyset(&masked);
sigaddset(&masked, signo);
siginfo = (struct sigaction) {
.sa_sigaction = sig_handler,
.sa_mask = masked,
.sa_flags = SA_RESTART | SA_SIGINFO
};
if (sigaction(signo, &siginfo, NULL) == -1) {
perror("sigaction error");
exit(EXIT_FAILURE);
}
}
int
main(void)
{
int t = timers;
setup_signal(SIGALRM);
ualarm(1000, 1000); /* note: 1000 microseconds is a millisecond, 1000 milliseconds is a second */
while (t < 10) {
if (timers > t) {
printf("%d milliseconds passed!\n", t);
t = timers;
}
}
return 0;
}
Program output:
0 milliseconds passed!
1 milliseconds passed!
2 milliseconds passed!
3 milliseconds passed!
4 milliseconds passed!
5 milliseconds passed!
6 milliseconds passed!
7 milliseconds passed!
8 milliseconds passed!
9 milliseconds passed!
Question: Track and explain the control flow through this program.
SIGKILL and SIGSTOP are unique
in that they cannot be disabled, and handlers for
them cannot be defined. They enable non-optional control of
a child by the parent.
Another example of coordination between parent and child
processes. We can use signals to get a notification that
a child has exited! Additionally, we can send the
SIGTERM signal to terminate a process (this is
used to implement the kill command line program
– see man 1 kill).
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h> /* kill, pause */
#include <assert.h>
#include <sys/wait.h>
void
sig_handler(int signal_number, siginfo_t *info, void *context)
{
switch(signal_number) {
case SIGCHLD: {
/* see documentation on `siginfo_t` in `man sigaction` */
printf("%d: Child process %d has exited.\n", getpid(), info->si_pid);
fflush(stdout);
break;
}
case SIGTERM: {
printf("%d: We've been asked to terminate. Exit!\n", getpid());
fflush(stdout);
exit(EXIT_SUCCESS);
break;
}}
return;
}
void
setup_signal(int signo, void (*fn)(int , siginfo_t *, void *))
{
sigset_t masked;
struct sigaction siginfo;
int ret;
sigemptyset(&masked);
sigaddset(&masked, signo);
siginfo = (struct sigaction) {
.sa_sigaction = fn,
.sa_mask = masked,
.sa_flags = SA_RESTART | SA_SIGINFO
};
if (sigaction(signo, &siginfo, NULL) == -1) {
perror("sigaction error");
exit(EXIT_FAILURE);
}
}
int
main(void)
{
pid_t pid;
int status;
setup_signal(SIGCHLD, sig_handler);
setup_signal(SIGTERM, sig_handler);
/*
* The signal information is inherited across a fork,
* and is set the same for the parent and the child.
*/
pid = fork();
if (pid == -1) {
perror("fork");
exit(EXIT_FAILURE);
}
if (pid == 0) {
pause(); /* stop execution, wake upon signal */
exit(EXIT_SUCCESS);
}
printf("%d: Parent asking child (%d) to terminate\n", getpid(), pid);
kill(pid, SIGTERM); /* send the child the TERM signal */
/* Wait for the sigchild notification of child termination! */
pause();
/* this should return immediately because we waited for the SIGCHLD! */
assert(pid == wait(&status));
assert(WIFEXITED(status));
return 0;
}
Program output:
1128157: We've been asked to terminate. Exit!
1128156: Parent asking child (1128157) to terminate
1128156: Child process 1128157 has exited.
Note: You want to run this a few times on your system to see the output. The auto-execution scripts of the lectures might cause wonky effects here due to concurrency.
We now see a couple of new features:

- The SIGCHLD signal is activated in the parent when a child process exits.
- The kill function sends a signal to another process owned by the same user (e.g. gparmer).
- The pause call says to stop execution (to pause) until a signal is triggered.
A couple of additional important functions:

- raise will trigger a signal in the current process (it is effectively a kill(getpid(), ...)).
- ualarm will set a recurring SIGALRM signal. This can be quite useful if your program needs to keep track of time in some way.

Signals are dangerous mechanisms in some situations. It can be difficult to use them properly, and avoid bugs. Signals complicate data-structures as only functionality that is re-entrant should be used in signal handlers, and they complicate the logic around all system calls as they can interrupt slow system calls.
“Slow” system calls. Many library calls
can block. These are calls that cannot necessarily
complete immediately as they might require concurrent events
to complete. For example, the wait call has to
block and wait for the child to exit;
and read has to block if there is no data
available (yet) on a pipe.
But what happens if a signal is sent to a process
while it is blocked? By default, the system call
will wake up and return immediately, even though
the call’s functionality has not been provided. For example,
wait will return even though a child didn’t
exit, and read returns despite not
reading data.
So how do you tell the difference between the blocking
function returning properly, or returning because it was
interrupted by a signal? The answer is, of course, in the
man pages: the function will return an error
(-1), and errno is set to
EINTR. Given this, we see the problem with this
design: now the programmer must add logic for every
single system call that can block to check this
condition. Yikes15.
Luckily, UNIX provides a means to disable the
interruption of blocking calls by setting the
SA_RESTART flag to the sa_flags
field of the sigaction struct passed to the
sigaction call. Note that the code above
already sets this as I consider it a default requirement if
you’re setting up signal handlers.
Lets see the explicit interaction between the slow
call wait, and the signal handler:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <assert.h>
#include <sys/wait.h>
void
sig_handler(int signal_number, siginfo_t *info, void *context)
{
printf("Alarm signal handler\n");
return;
}
void
setup_signal(int signo)
{
sigset_t masked;
struct sigaction siginfo;
int ret;
sigemptyset(&masked);
sigaddset(&masked, signo);
siginfo = (struct sigaction) {
.sa_sigaction = sig_handler,
.sa_mask = masked,
/*
* We didn't set RESTART so that wait returns
* when a signal happens.
*/
.sa_flags = SA_SIGINFO /* | SA_RESTART */
};
if (sigaction(signo, &siginfo, NULL) == -1) {
perror("sigaction error");
exit(EXIT_FAILURE);
}
}
int
main(void)
{
pid_t pid;
setup_signal(SIGALRM);
/* lets spin up a simple child */
if ((pid = fork()) == 0) {
pause(); /* await signal */
exit(EXIT_SUCCESS);
}
/* our signal handler for the alarm should execute in a second */
alarm(1);
while (1) {
pid_t ret;
ret = wait(NULL);
if (ret == -1) {
if (errno == EINTR) {
printf("Wait call interrupted by signal!\n");
kill(pid, SIGTERM); /* goodbye sweet child! */
/* loop again and `wait` */
} else if (errno == ECHILD) {
printf("Child has exited.\n");
return 0; /* no more children! */
}
}
}
}
Program output:
Alarm signal handler
Wait call interrupted by signal!
Child has exited.
Re-entrant computations. Signal handlers
execute by interrupting the currently executing
instructions, regardless of what computations they are
performing. Because we don’t really know anything about what
was executing when a signal handler started, we have an
issue: what if an action in the signal handler in some way
conflicts with the action it interrupted?
Conceptually, printf will copy data into a
buffer before calling write to output it to
standard output. What if half-way through generating the
data in printf, we execute a signal handler,
and call printf…which happens to overwrite all
of the hard work of the interrupted printf! Any
function that has these issues is called non-reentrant.
Yikes. An example:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>
#include <assert.h>
void
sig_handler(int signal_number, siginfo_t *info, void *context)
{
/*
* Reset `errno`! In a real program, this might instead be a call that causes
* an error setting `errno` to whatever the error is for *that call*.
*/
errno = 0;
return;
}
void
setup_signal(int signo)
{
sigset_t masked;
struct sigaction siginfo;
int ret;
sigemptyset(&masked);
sigaddset(&masked, signo);
siginfo = (struct sigaction) {
.sa_sigaction = sig_handler,
.sa_mask = masked,
.sa_flags = SA_RESTART | SA_SIGINFO
};
if (sigaction(signo, &siginfo, NULL) == -1) {
perror("sigaction error");
exit(EXIT_FAILURE);
}
}
int
main(void)
{
setup_signal(SIGUSR1);
assert(read(400, "not going to work", 10) == -1);
raise(SIGUSR1);
printf("errno should be \"Bad file descriptor\", but has value \"%s\"\n", strerror(errno));
return 0;
}Program output:
errno should be "Bad file descriptor", but has value "Success"
The set of functions you can call in a signal
handler (i.e. that are re-entrant) are listed in
the manual page: man 7 signal-safety. Notably
these do not include the likes of

- printf/snprintf
- malloc
- exit (though _exit is fine)
- errno!

It is hard to do much in a program of any complexity
without snprintf (called by
printf), malloc, or
errno. A very common modern way to handle this
situation is to create a pipe into which the
signal handlers write a notification (e.g. the
signal number), while the main flow of execution
reads from the pipe. This enables the main
execution in our programs to handle these notifications.
However, this is only really possible and feasible when we
get to poll later, and can block waiting for
any of a number of file descriptors, including this
pipe.
Note that in most cases, you won’t see a bug due to using non-re-entrant functions in your signal handlers. These bugs are heavily non-deterministic, and are dependent on exactly what instructions were interrupted when the signal activated. This may feel like a good thing: buggy behavior is very rare! But the reality is the opposite: rare, non-deterministic bugs become very very hard to debug and test. Worse, these bugs that pass your testing are more likely to happen in customers’ systems that might have different concurrency patterns. So it is nearly impossible to debug/test for these bugs, and they are more likely to cause problems in your customer’s environment. Again, yikes.
We’ve seen how to use pipes for
communication between processes, and how descriptors are
used to interact with system resources. In this module,
we’ll dive into how to access the file system and the files
and directories on your system!
The filesystem is one of the core abstractions on UNIX systems and indeed, on modern OSes. Each file is identified by a path through directories. Interestingly, an early goal of UNIX is that “everything is a file”. This is a strange statement as it raises the question “what wouldn’t normally be a file?” Some examples:

- /proc/* - these files are used to provide “task monitor” type functionality (showing running processes) and support gdb.
- /dev/* - you can actually cat a file directly to your disk16!
- Want randomness? /dev/random. Want the complete absence of anything? /dev/null17.
- Want to update your power settings? There are files in /sys/* for that.

When everything is a file, it can all be manipulated and accessed using exactly the same functions and APIs as the normal files in the system. This means that all of the shell programs we’ve seen can be used not just to operate on files, but also on processes or devices!
So what does it look like to access files? The relevant functions include:

- int open(path, flags) - Open a file, identified by its path, and return a file descriptor that can be used to access that file. The flags must include one of the following: O_RDONLY, O_WRONLY, or O_RDWR – if you want to only read from the file, only write to the file, or have the ability to both read and write to the file. Another interesting flag value is O_CREAT, which will create the file if it doesn’t already exist. Whenever you see “flags”, you should think of them as a set of bits, and each of the options as a single bit. Thus, when passing in flags, you can use bitwise operators to pass in multiple options, for example open("penny_is_best.gif", O_RDWR | O_CREAT, 0700) will open the file, creating it if it doesn’t already exist, and enable reading and writing to the file. Note that when you pass in O_CREAT, you should pass in the third argument, the mode. For now, just always pass in 0700!
- read & write - We’ve seen these before when we saw pipes! They are the generic functions for getting data from, and pushing data to, descriptors.
- close - Remember that we can close any descriptor to release it (the free for descriptors, if you will), including those that reference files.
- int stat(path, struct stat *info) - Get information about a file identified by path into the info structure. The structure is documented in the man page, but it includes, for example, the file size. fstat enables you to get stat info from an existing file descriptor.
- creat(path, mode) - Create a new file at the path with the mode (specified identically to open). Later, we’ll learn about mode when we discuss security; for now, you can use a permissive mode of 0700.
- unlink(path) - Try and remove a file! This is called, for example, by the rm program.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>
#include <stdio.h>
#define TWEET_LEN 280
int
main(void)
{
struct stat finfo;
char tweet[TWEET_LEN];
int ret;
/* can use paths that are relative to the current working directory */
int fd = open("./05/prufrock.txt", O_RDONLY);
assert(fd >= 0); /* should normally do better error checking */
ret = fstat(fd, &finfo);
assert(ret >= 0);
printf("We want to say %ld things...but the world doesn't value artistry.\n", finfo.st_size);
printf("Lets cut down to 280 characters and tweet like it is 1915...\n\n");
fflush(stdout); /* see later */
ret = read(fd, tweet, TWEET_LEN);
assert(ret == TWEET_LEN); /* for this demo... */
write(STDOUT_FILENO, tweet, TWEET_LEN);
close(fd); /* we close file descriptors just the same */
printf("\n");
return 0;
}
Program output:
We want to say 6044 things...but the world doesn't value artistry.
Lets cut down to 280 characters and tweet like it is 1915...
The Love Song of J. Alfred Prufrock
T.S. Elliot
Sio credesse che mia risposta fosse
A persona che mai tornasse al mondo,
Questa fiamma staria senza piu scosse.
Ma percioche giammai di questo fondo
Non torno vivo alcun, siodo il vero,
Senza tema dinfamia ti rispondo
It is really important that quite a few interesting functions operate on descriptors, which might reference pipes, files, the terminal, or other resources. This means that processes can operate on descriptors and not care what resources back them. This enables shells to implement pipelines and the redirection of output to files, all without the processes knowing! It enables forked processes to inherit these resources from their parent. In this way, descriptors and the functions that operate on them are a polymorphic foundation (in the sense of object-oriented polymorphism) for accessing the system.
We see something a little strange with files as compared
to pipes. Files have a finite size, and the
data remains in the file after being read – unlike with
pipes where data is removed out of the pipe when read. For
files, subsequent reads and writes must progress through the
contents of the file, but we must also be able to go back
and read previously read data. Lets understand what happens
when we read or write from a file
– we’ll focus only on read to start, but both
behave similarly. Each file descriptor18 tracks an
offset into the file at which the next
read will access data. Lets go through an
example of a file that contains the alphabet:
abcdefghijklmnopqrstuvwxyz
A freshly opened file returns a file
descriptor (fd) where
offset = 0.
abcdefghijklmnopqrstuvwxyz
^
|
+-- offset
A read(fd, buf, 10) that returns
10 (saying that 10 bytes, or
characters a through j, were
successfully read) advances offset by
10.
abcdefghijklmnopqrstuvwxyz
          ^
          |
          +-- offset
Thus, an additional read will start reading from the file
at k. The offset, then, enables subsequent
reads to iterate through a file’s contents.
write uses the offset identically.
There are many cases in which we might want to modify the
offset. For example, databases want to jump
around a file to find exactly the interesting records,
playing audio and video requires us to skip around a file
finding the appropriate frames. Thus, we have the
function:
off_t lseek(int fd, off_t update, int whence) - This function lets you modify the offset of the file associated with fd, by update. The last argument can be set to SEEK_SET to set the fd’s offset to update, SEEK_CUR to add update to the current offset (thus making a relative update), or SEEK_END to set the offset to the file’s end plus update. It returns the resulting offset of the fd.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <assert.h>
#include <stdio.h>
#define TWEET_LEN 280
int
main(void)
{
char tweet[TWEET_LEN];
int ret;
/* can use paths that are relative to the current working directory */
int fd = open("./05/prufrock.txt", O_RDONLY);
assert(fd >= 0); /* should normally do better error checking */
printf("What lies beyond the tweet? Tweet 2/N:\n\n");
fflush(stdout); /* see later */
ret = lseek(fd, 280, SEEK_SET);
assert(ret == 280);
ret = read(fd, tweet, TWEET_LEN);
assert(ret == TWEET_LEN); /* for this demo... */
write(STDOUT_FILENO, tweet, TWEET_LEN);
close(fd); /* we close file descriptors just the same */
printf("\n");
return 0;
}
Program output:
What lies beyond the tweet? Tweet 2/N:
.
Let us go then, you and I,
When the evening is spread out against the sky
Like a patient etherized upon a table;
Let us go, through certain half-deserted streets,
The muttering retreats
Of restless nights in one-night cheap hotels
And sawdust restaurants with oyster-shells:
St
It is very common to access files with read
and write. However, they require many
interactions with the file through subsequent calls to these
functions. Thus, UNIX provides a second means to access
files: it provides a way to directly “map” a file
into a process, enabling it to appear directly in memory, thus
be accessible using normal memory accesses. This is cool
because your changes in memory update the file
directly!
void *mmap(addr, len, prot, flags, fd, offset) - Wow, that’s a lot of arguments. Believe it or not, it was the call in Linux that had the most arguments for a long time! Lets go through these:

- fd - this is the file descriptor we want to map into memory.
- addr - this is the address in memory at which you want to map. Normally, this is NULL, which means “please map it wherever you want”. The return value is the address at which the file is mapped, or MAP_FAILED if there was an error (see the man page and errno values).
- len - How much of the file do we want to map in? This is where stat is quite useful to find the entire file’s size if you want to map it all in.
- offset - If we aren’t mapping the entire file, we can start mapping at this offset.
- prot - This enables you to choose some properties for the mapping. prot values include PROT_READ and PROT_WRITE, which ask the system to map the memory in readable or writable (you can choose both with PROT_READ | PROT_WRITE). Note that the type of accesses here must not request accesses not previously requested in open – for example, if we passed O_RDONLY in to open, mmap will return an error if we ask for PROT_WRITE.
- flags - For now, lets focus on using only MAP_PRIVATE.

munmap - Unmap a previously mapped
file.
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
int
main(void)
{
int ret;
char *addr;
/* can use paths that are relative to the current working directory */
int fd = open("./05/prufrock.txt", O_RDONLY);
assert(fd >= 0); /* should normally do better error checking */
addr = mmap(NULL, 280 * 2, PROT_READ, MAP_PRIVATE, fd, 0);
assert(addr != MAP_FAILED);
/* The file is now memory we can access! */
ret = memcmp(&addr[5], "Love", 4);
assert(ret == 0);
/* write it out to see it is the same as before */
ret = write(STDOUT_FILENO, addr, 280 * 2);
assert(ret == 280 * 2);
munmap(addr, 280 * 2);
close(fd);
return 0;
}
Program output:
The Love Song of J. Alfred Prufrock
T.S. Elliot
Sio credesse che mia risposta fosse
A persona che mai tornasse al mondo,
Questa fiamma staria senza piu scosse.
Ma percioche giammai di questo fondo
Non torno vivo alcun, siodo il vero,
Senza tema dinfamia ti rispondo.
Let us go then, you and I,
When the evening is spread out against the sky
Like a patient etherized upon a table;
Let us go, through certain half-deserted streets,
The muttering retreats
Of restless nights in one-night cheap hotels
And sawdust restaurants with oyster-shells:
St
Note that you cannot mmap a
pipe or terminal, only files. This kinda makes
sense: since a pipe holds transient data that is
only there until read out of the pipe, it doesn’t
make sense for it to be at a fixed address in memory – it
would require an infinite amount of memory as more and more
data is written into the pipe!
Memory allocation with
mmap. Interestingly, mmap
can also be used to allocate memory. Specifically,
if we pass in fd = -1,
prot = PROT_READ | PROT_WRITE, and
flags = MAP_ANONYMOUS | MAP_PRIVATE,
mmap will simply return newly allocated memory
instead of a file! Somewhere deep in malloc’s
implementation, it calls mmap to allocate the
memory it eventually returns to the application!
There is actually a third way to access files
beyond “raw” read/write and
mmap – streams! Streams
are an abstraction on top of the “raw” descriptor
accesses using read and write.
Streams are identified by the type FILE *,
and all the APIs for operating on streams take
FILEs instead of file descriptors. In most ways
they behave very similar to normal descriptor accesses – you
can fopen files to open a stream to a file, and
fread and fwrite to read data
from, and write data to the file. As with a process’
descriptors, we have default streams:
stdout, stdin,
stderr which correspond to
STDOUT_FILENO, STDIN_FILENO, and
STDERR_FILENO, respectively.

Some of the stream functions:
- FILE *fopen(path, mode) - similar to open, but returning a stream. The access rights we’re asking for (reading or writing to the file) are specified in a more intuitive mode string: “r” to read, “r+” for read/write, etc…
- fclose - similar to close, but for FILE *s.
- fread/fwrite - Stream equivalents of read/write. Additionally, streams provide the feof function that tells us if we are at the end of a file.
- printf/fprintf - Wait, what? Yes, printf is actually used to send our string to the stdout stream. fprintf enables us to write the string we’re creating out to a specific stream, not just stdout!
- scanf/fscanf - Similar: scanf inputs from stdin, and fscanf inputs from a specific stream.

#include <stdio.h>
#include <assert.h>
int
main(void)
{
FILE *f;
char tweet_thread[(280 * 2) + 1];
int ret;
f = fopen("./05/prufrock.txt", "r");
assert(f);
ret = fread(tweet_thread, 1, 280, f);
assert(ret == 280);
ret = fread(&tweet_thread[280], 1, 280, f);
assert(ret == 280);
tweet_thread[280 * 2] = '\0';
fprintf(stdout, "%s\n", tweet_thread);
fclose(f);
return 0;
}Program output:
The Love Song of J. Alfred Prufrock
T.S. Elliot
Sio credesse che mia risposta fosse
A persona che mai tornasse al mondo,
Questa fiamma staria senza piu scosse.
Ma percioche giammai di questo fondo
Non torno vivo alcun, siodo il vero,
Senza tema dinfamia ti rispondo.
Let us go then, you and I,
When the evening is spread out against the sky
Like a patient etherized upon a table;
Let us go, through certain half-deserted streets,
The muttering retreats
Of restless nights in one-night cheap hotels
And sawdust restaurants with oyster-shells:
St
Streams as buffered I/O. So far it doesn’t
seem like we’re getting much use out of streams
compared to raw descriptors. Streams are an optimization
over read/write as they
buffer data. Buffering essentially means that your
fwrites actually write data into a “buffer” in
memory, and only when either you write out a \n
or when the buffer gets large enough does a
write get called to send the data to the
system. Why do this? By buffering data in memory rather than
making a write for each fwrite,
we’re saving the overhead associated with each
write (to be discussed in a couple of weeks).
However, this means that just because we called
printf, doesn’t mean it will output
immediately! Yikes. Let’s see this.
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int
main(void)
{
printf("hello world");
_exit(EXIT_SUCCESS); /* this exits immediately without calling `atexit` functions */
return 0;
}Program output:
_exit exits from the process
immediately, and without invoking any
atexit functions. Flushing the
stdout stream is done in one such function,
thus we don’t see the output!
#include <stdio.h>
#include <unistd.h>
int
main(void)
{
printf("hello world");
fork();
printf("\n"); /* remember, streams output on \n */
return 0;
}Program output:
hello world
hello world
Well that’s not good. The buffered data is
copied into each process, then it is output when a
\n is encountered.
#include <stdio.h>
#include <unistd.h>
#include <string.h>
int
main(void)
{
printf("hello ");
write(STDOUT_FILENO, "world ", 6);
printf("\n"); /* remember, streams output on \n */
return 0;
}Program output:
world hello
Both printf and the write are
to standard output, but printf is to a stream.
The buffered printf output is not
written out immediately, thus ordering can get messed up.
Yikes.
Last example of the complexities of streams. What happens if you get a segfault!?
#include <stdio.h>
#include <unistd.h>
#include <string.h>
int
main(void)
{
int a;
printf("hello ");
a = *(int *)NULL;
return 0;
}Program output:
make[1]: *** [Makefile:30: inline_exec] Segmentation fault (core dumped)
Even though we fault after the
printf, we don’t see the printf’s
output! We now know why: the printf wrote into
a buffer, and didn’t yet write to the system.
Thus the segmentation fault, which terminates the process,
happens before the buffer is actually written!
Flushing the stream buffers. It is
imperative that streams give us some means to force the
buffer to be output! Thus, streams provide a means of
flushing the stream’s buffers, and sending them out
to the system (e.g. using write).
fflush - flush out the buffer associated
with a specific stream.
Fixing the previous examples:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int
main(void)
{
printf("hello ");
fflush(stdout);
write(STDOUT_FILENO, "world ", 6);
printf("\n"); /* remember, streams output on \n */
printf("hello world");
fflush(stdout);
_exit(EXIT_SUCCESS); /* this exits immediately without calling `atexit` functions */
return 0;
}Program output:
hello world
hello world
#include <stdio.h>
#include <unistd.h>
int
main(void)
{
printf("hello world");
fflush(stdout);
fork();
printf("\n"); /* remember, streams output on \n */
return 0;
}Program output:
hello world
A few other functions for streams that can be useful:
- fileno - Get the file descriptor number associated with a stream. This can be useful if you need to use an API that takes a descriptor, rather than a FILE * stream.
- getline - Read out a line (delimited by a \n) from a stream. Since reading input a line at a time is a pretty common thing, this is a useful function.
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
int
main(void)
{
FILE *f = fopen("./05/prufrock.txt", "r");
size_t s = 0;
char *line = NULL;
int i;
assert(f);
printf("The new genre: abridged, tweet-worthy poetry...\n\n");
for (i = 0; getline(&line, &s, f) != -1; i++) {
if (i % 15 == 0) {
fwrite(line, 1, strlen(line), stdout);
/* same as printf("%s", line); */
}
}
if (!feof(f)) {
perror("getline");
exit(EXIT_FAILURE);
}
free(line);
fclose(f);
return 0;
}Program output:
The new genre: abridged, tweet-worthy poetry...
The Love Song of J. Alfred Prufrock
Of restless nights in one-night cheap hotels
Let fall upon its back the soot that falls from chimneys,
And for a hundred visions and revisions,
Disturb the universe?
Then how should I begin
Of lonely men in shirt-sleeves, leaning out of windows? ...
And I have seen the eternal Footman hold my coat, and snicker,
Am an attendant lord, one that will do
Files aren’t the only thing we care to access in the
system – what about directories! We want to be able to
create directories, delete them, and be able to read their
contents. These core operations are what underlies the
implementation of the ls program, or,
similarly, what happens when you double-click on a directory
in a graphical file explorer!
opendir, closedir - Open
(and close) a directory, and return a directory
stream, a DIR *, to it, which is backed by
a descriptor. Yes, another descriptor!
readdir - This takes a
DIR * to a directory, and returns a structure
that includes the “current” file or directory in the
directory. This includes the name of the returned object in
d_name, and type (in the d_type
field) which is one of:
- DT_BLK - This is a block device.
- DT_CHR - This is a character device.
- DT_DIR - This is a directory.
- DT_FIFO - This is a named pipe (FIFO).
- DT_LNK - This is a symbolic link.
- DT_REG - This is a regular file.
- DT_SOCK - This is a UNIX domain socket.
- DT_UNKNOWN - The file type could not be determined.
The two main ones we care about are directories and
files. Subsequent calls to readdir will return
the “next” file or directory.
scandir & glob - Two
functions that enable you to get directory contents based on
some simple search patterns and logic. scandir
lets you pass in a directory stream, and a couple of
functions that are used to sort the directory contents
output, and to filter it (enabling you to say that you don’t
want to look at some directory contents). glob, on the other
hand, lets you search for “wildcards” designated by
* to select all files/directories that match
the string.
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
long file_size(char *dir, char *file);
int
main(void)
{
DIR *d = opendir("./05/");
struct dirent *entry;
assert(d);
errno = 0;
while ((entry = readdir(d)) != NULL) {
char *type;
if (entry->d_type == DT_DIR) type = "Directory";
else if (entry->d_type == DT_REG) type = "File";
else type = "Unknown";
if (entry->d_type != DT_REG) {
printf("%10s: %s\n", type, entry->d_name);
} else { /* entry->d_type == DT_REG */
printf("%10s: %s (size: %ld)\n", type, entry->d_name, file_size("05", entry->d_name));
}
}
if (errno != 0) {
perror("Reading directory");
exit(EXIT_FAILURE);
}
closedir(d);
return 0;
}
long
file_size(char *dir, char *file)
{
struct stat finfo;
char buf[512];
int ret;
memset(buf, 0, 512); /* zero out the buffer to add '\0's */
snprintf(buf, 512, "./%s/%s", dir, file);
ret = stat(buf, &finfo);
assert(ret == 0);
return finfo.st_size;
}Program output:
File: 01_lecture.md (size: 21580)
File: 02_exercises.md (size: 2978)
Directory: test_directory
File: prufrock.txt (size: 6044)
Directory: .
Directory: ..
To make changes in the file system hierarchy, we need an additional set of functions.
- mkdir(path, mode) - This function is used in, for example, the mkdir program. Don’t confuse the program you use in the shell with the function you can call from C.
- rmdir(path) - Deletes a directory that is empty. This means that you have to unlink and rmdir the directory’s contents first.
- rename(from, to) - Change the name of a file or directory from from to to. You can imagine this is used by the mv command line program.
#include <sys/stat.h>
#include <sys/types.h>
#include <assert.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
int
main(void)
{
int ret;
int fd;
ret = mkdir("05/newdir", 0700);
assert(ret == 0);
fd = open("05/newdir/newfile", O_RDWR | O_CREAT, 0700);
assert(fd >= 0);
ret = write(fd, "new contents", 13);
assert(ret == 13);
ret = rename("05/newdir", "05/newerdir");
assert(ret == 0);
ret = unlink("05/newerdir/newfile");
assert(ret == 0);
ret = rmdir("05/newerdir");
assert(ret == 0);
printf("If there were no errors, we\n\t"
"1. created a directory, \n\t"
"2. a file in the directory, \n\t"
"3. changed the directory name,\n\t"
"4. removed the file, and\n\t"
"5. removed the directory\n");
return 0;
}Program output:
If there were no errors, we
1. created a directory,
2. a file in the directory,
3. changed the directory name,
4. removed the file, and
5. removed the directory
Write a tree clone, with disk usage
information. Though it isn’t installed by default, tree
outputs the filesystem at a specific directory:
$ tree .
.
├── 01_lecture.md
├── 02_exercises.md
├── prufrock.txt
└── test_directory
└── crickets.txt
We’ll simplify the output to print out markdown lists
with either D or F for directories
or files. If we run the program in the 05
directory:
$ ./t
- 01_lecture.md (20759)
- 02_exercises.md (1683)
- prufrock.txt (6044)
- test_directory
- crickets.txt (26)
Some starter code that focuses on printing out info for a single directory.
Task: Change this code to print out the hierarchy of files and directories, rather than just the contents of a single directory.
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* helper functions */
void indent(int n);
int ignoredir(char *dir);
long file_size(char *dir, char *file);
/*
* Return the size of the directory (sum of all files inside).
* The easiest implementation here is recursive where this is called
* for each directory.
*/
size_t
print_dir(char *dir, int depth)
{
DIR *d = opendir(dir);
struct dirent *entry;
assert(d);
errno = 0;
/* Go through each dir/file in this directory! */
while ((entry = readdir(d)) != NULL) {
/* we'll ignore . and .. */
if (ignoredir(entry->d_name)) continue;
/* print out the proper indentation */
indent(depth);
if (entry->d_type == DT_DIR) {
printf("- D %s\n", entry->d_name);
} else if (entry->d_type == DT_REG) {
printf("- F %s (%ld)\n", entry->d_name, file_size(dir, entry->d_name));
}
/* we'll ignore everything that isn't a file or dir */
}
if (errno != 0) {
perror("Reading directory");
exit(EXIT_FAILURE);
}
closedir(d);
return 0;
}
int
main(void)
{
print_dir("./", 0);
return 0;
}
/* Indent `n` levels */
void
indent(int n)
{
for (int i = 0; i < n; i++) printf(" ");
}
/* Should we ignore this directory? */
int
ignoredir(char *dir)
{
return strcmp(dir, ".") == 0 || strcmp(dir, "..") == 0;
}
long
file_size(char *dir, char *file)
{
struct stat finfo;
char buf[512];
int ret;
memset(buf, 0, 512); /* zero out the buffer to add '\0's */
snprintf(buf, 512, "%s/%s", dir, file);
ret = stat(buf, &finfo);
assert(ret == 0);
return finfo.st_size;
}Program output:
- D templates
- D 02
- F aggregate.md (165504)
- D 01
- F inline_exec_tmp (62552)
- D 10
- D 12
- F title.md (320)
- D 07
- D 09
- D 13
- D .git
- D 14
- D 03
- F README.md (35)
- D figures
- F output_tmp.dat (0)
- D 08
- D 05
- D 11
- D slides
- F .gitignore (520)
- D tools
- D 04
- D 99
- D 06
- F LICENSE (1522)
- F inline_exec_tmp.c (1658)
- F Makefile (1908)
Task: Add output for the size of files, and the size of each directory – the sum of the size of all contained directories and files.
Again, if we run the program in the 05
directory:
$ ./t
- 01_lecture.md (20759)
- 02_exercises.md (1683)
- prufrock.txt (6044)
- test_directory
- crickets.txt (26)
(26)
(28512)
The test_directory directory has size
(26), while the ./ directory has
size (28512). Your specific sizes might
differ.
We’ve seen that processes often coordinate through pipes,
signals, and wait. This is necessary when
coordinating between parents and children in shells (both on
the terminal, and GUIs). These are useful when children have
a common parent (thus can share pipes), and where the parent
knows that the children want to communicate before they are
execed. Modern systems require more
flexibility than this. Generally, we want a process to be
able to decide whom to coordinate with as it executes, given
its own goals. For example, the shell – when it executes a
child – doesn’t know if that child wants to have a GUI, thus
whether it needs to talk to the window server.
We require more dynamic, more flexible forms of coordination between processes. Generally, we call this coordination Inter-Process Communication or IPC. When do we generally want more flexible forms of IPC than pipes, signals, and wait?
Even applications that are implemented as many processes often require IPC between those processes.
Browsers all require a lot of IPC. Each tab is a separate process, and there is a single browser management process that manages the GUI and other browser resources. This is motivated by security concerns: a tab that fails or is compromised by an attacker cannot access the data of another tab! In the early years of browsers (the first 15-20), this wasn’t the case, and you had to be careful about opening your banking webpage next to webpages that were suspect. The browser management process communicates with each of the child tab processes to send them user input (mouse clicks, keyboard input), and receive a bitmap to display on screen. Let’s see this structure:
$ ps aux | grep firefox
... 8029 ... /usr/lib/firefox/firefox -new-window
... 8151 ... /usr/lib/firefox/firefox -contentproc -childID 1 -isForBrowser ... 8029 ...
...
I have an embarrassing number of tabs open, so I’ve taken
the liberty of manually filtering this output. We can see
that the process with pid 8029 is the parent,
browser management process, and the process
8151 is a child managing a tab (note, you can
see processes arranged as a process tree using
ps axjf). There are many other child
processes managing tabs.
Let’s try to figure out how they communicate with each other. First, let’s look at the file descriptors of the tab’s process (the child).
$ ls -l /proc/8151/fd/
0 -> /dev/null
1 -> socket:[100774]
...
11 -> /memfd:mozilla-ipc
4 -> socket:[107464]
5 -> socket:[107467]
7 -> pipe:[110438]
We see that there is no standard input
(/dev/null is “nothing”), and that there are a
few means of IPC:
- pipe - We know these!
- memfd - A Linux-specific way of sharing a range of memory between the processes. Stores to the memory can be seen by all sharing processes.
- socket - Another pipe-like channel in which bytes written into the socket can be read out on the other side. A key difference is that socket connections between processes can be created without requiring the inheritance of descriptors via fork.
Let’s look at the browser management process, and see what’s going on. Notably, we want to see how the tab’s processes communicate with it. Thus, let’s look at the descriptors for the parent:
$ ls -l /proc/8029/fd/ | awk '{print $9 $10 $11}'
110 -> /home/ycombinator/.mozilla/.../https+++mail.google.com...data.sqlite
171 -> /home/ycombinator/.mozilla/.../https+++www.youtube.com....data.sqlite
123 -> /home/ycombinator/.mozilla/.../formhistory.sqlite
130 -> /home/ycombinator/.mozilla/.../bookmarks.sqlite
58 -> /home/ycombinator/.mozilla/.../cookies.sqlite
3 -> /dev/dri/renderD128
43 -> /dev/dri/card0
...
100 -> /memfd:mozilla-ipc
60 -> socket:[107464]
64 -> socket:[107467]
82 -> pipe:[110438]
This is heavily filtered. The first, highlighted
section shows that the browser management process uses a
database (sqlite) to access a webpage’s data
(see gmail and youtube), store
form histories, bookmarks, and cookies. It also uses the
dri device, which is part of the Direct
Rendering Infrastructure that enables it to communicate
with the X Window Server, and display what we see on screen.
Finally, we see it uses the same means of IPC as the client,
and if we carefully look at the [ids] of the
sockets and pipes, we can see that they match those in the
child tab process! We can see that the tab has multiple IPC
channels open with the parent, and can access them with file
descriptors.
There are multiple IPC mechanisms in the system, and they all represent different trade-offs. They might be good for some things, and bad at others.
Let’s assess pipes in this taxonomy: recall that setting them up requires a shared parent and careful use of pipe,
dup, and close.
If the goal is to send data from one process to another,
one option is found in the filesystem (FS): can’t we just
use files to share? Toward this, we saw in the section on FS
I/O that we can open a file, and read and
write (or
fread/fwrite) from it from
multiple processes. This will certainly get data
from one process to the other. However, it has a number of
shortcomings:
- If we use the file to hold a stream of communicated data (as we do with pipes), then we’d
end up with an infinite file. That translates to your disk
running out of space very quickly, with little benefit.
Take-away: files are for a finite amount of data, not for an
infinite stream of data.
To emphasize these problems, let’s try to implement a channel in a file to send data between processes. We just want to send a simple string repetitively from one process to the other.
#include <stdio.h>
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
int
main(void)
{
int ret;
ret = creat("string_data", 0777);
assert(ret != -1);
if (fork() == 0) {
char *msg[] = {"penny for pawsident", "america for good doggies"};
int fd = open("string_data", O_WRONLY);
assert(fd != -1);
/* send the first message! */
ret = write(fd, msg[0], strlen(msg[0]) + 1);
assert(ret == (int)strlen(msg[0]) + 1);
/* send the second message! */
ret = lseek(fd, 0, SEEK_SET);
ret = write(fd, msg[1], strlen(msg[1]) + 1);
assert(ret == (int)strlen(msg[1]) + 1);
close(fd);
exit(EXIT_SUCCESS);
} else {
char data[32];
int fd = open("string_data", O_RDONLY);
assert(fd != -1);
memset(data, 0, 32);
ret = read(fd, data, 32);
assert(ret != -1);
printf("msg1: %s\n", data);
ret = lseek(fd, 0, SEEK_SET);
assert(ret != -1);
ret = read(fd, data, 32);
assert(ret != -1);
printf("msg2: %s\n", data);
}
wait(NULL);
unlink("string_data");
return 0;
}Program output:
msg1:
msg2:
You can see that there are some problems here. If we run it many times, we can see that sometimes we don’t see any messages, sometimes we only see the first, sometimes we only see the second, and other times combinations of all three options. Thus, they are not useful for channel-based communication between multiple processes.
On the other hand, files have a very useful property that we’d like to use in a good solution to IPC: they have a location in the FS that the communicating processes can both use to find the file, thus avoiding the need for a shared parent.
An aside: you can use
mmap to map a file into the address space, and if you map it MAP_SHARED (instead of MAP_PRIVATE), then the memory will be shared between processes. When one process does a store to the memory, the other process will see that store immediately! We can use this to pass data between processes, but we still have many of the problems above. How do we avoid conflicting modifications to the memory, get notifications that modifications have been made, and make sure that the data is formatted in an organized (finite) manner?
The core problem with files is that they aren’t channels
that remove existing data when it is “consumed”
(read). But they have the significant up-side
that they have a “name” in the filesystem that multiple
otherwise independent processes can use to access the
communication medium.
Named pipes or FIFOs are like pipes in that one process
can write into the FIFO, and another process can read from
it. Thus, unlike files, they have the desirable property of
channels in which data read from the channel is consumed.
However, like files (and unlike pipes) they have a “name” in
the filesystem – FIFOs appear in the filesystem along-side
files. The stat function will let you know that
a file is a FIFO if the st_mode is
S_IFIFO. Since these pipes appear in the
filesystem, they are called named pipes.
Two processes that wish to communicate need only both know
where in the filesystem they agree to find the named
pipe.
Let’s see an example of using named pipes:
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <assert.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void
proc1(void)
{
int fd, ret;
fd = open("office_hours", O_WRONLY);
assert(fd != -1);
ret = write(fd, "halp!", 5);
assert(ret == 5);
close(fd);
}
void
proc2(void)
{
char msg[6];
int fd, ret;
memset(msg, 0, 6);
fd = open("office_hours", O_RDONLY);
assert(fd != -1);
ret = read(fd, msg, 5);
assert(ret == 5);
close(fd);
printf("What I hear at office hours is \"%s\".", msg);
unlink("office_hours");
}
int
main(void)
{
int ret;
/* This is how we create a FIFO. A FIFO version of creat. */
ret = mkfifo("office_hours", 0777);
assert(ret == 0);
if (fork() == 0) {
proc1();
exit(EXIT_SUCCESS);
} else {
proc2();
wait(NULL);
}
return 0;
}Program output:
What I hear at office hours is "halp!".
There is one very awkward named pipe behavior. Processes
attempting to open a named pipe will block (the
open will not return) until processes have
opened it separately as both readable and
writable. This is awkward because how would a
process know when another process might want to
communicate with it? If a process gets IPC requests from
many others (think the browser manager), then it doesn’t
want to block awaiting a new communication; it wants to
service the requests of other processes.
Regardless, we do see named pipes as a cool option: named pipes enable us to use the filesystem to identify the pipe used for IPC. This enables communicating processes without shared parents to leverage IPC. This enables pipes to live up to the UNIX motto: everything is a file.
Let’s check out an example that demonstrates how using named pipes for communication between a single process and multiple clients has a number of challenges.
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <assert.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/*
* Receive requests, and send them immediately as a response.
* You can imagine that interesting computation could be done
* to formulate a response.
*/
void
instructor(void)
{
int req, resp, ret;
char msg[32];
req = open("requests", O_RDONLY);
assert(req != -1);
resp = open("responses", O_WRONLY);
assert(resp != -1);
while (1) {
ret = read(req, msg, 32);
if (ret == 0) break;
assert(ret != -1);
ret = write(resp, msg, ret);
if (ret == 0) break;
assert(ret != -1);
}
close(req);
close(resp);
}
/*
* Make a "request" with our pid, and get a response,
* hopefully also our pid.
*/
void
student(void)
{
int req, resp, ret;
char msg[32];
req = open("requests", O_WRONLY);
assert(req != -1);
resp = open("responses", O_RDONLY);
assert(resp != -1);
ret = snprintf(msg, 32, "%d", getpid());
ret = write(req, msg, ret);
assert(ret != -1);
ret = read(resp, msg, 32);
assert(ret != -1);
printf("%d: %s\n", getpid(), msg);
close(req);
close(resp);
}
void
close_fifos(void)
{
unlink("requests");
unlink("responses");
}
int
main(void)
{
int ret, i;
pid_t pids[3];
/* clients write to this, server reads */
ret = mkfifo("requests", 0777);
assert(ret == 0);
/* server sends replies to this, clients read */
ret = mkfifo("responses", 0777);
assert(ret == 0);
/* create 1 instructor that is lecturing */
if ((pids[0] = fork()) == 0) {
instructor();
return 0;
}
/* Create 2 students "listening" to the lecture */
for (i = 0; i < 2; i++) {
if ((pids[i + 1] = fork()) == 0) {
student();
return 0;
}
}
atexit(close_fifos);
sleep(1);
for (i = 0; i < 3; i++) kill(pids[i], SIGTERM);
while (wait(NULL) != -1);
return 0;
}Program output:
1128517: 11285171128516
If executed many times, you see the expected result:
167907: 167907
167908: 167908
…but also strange results:
167941: 167940167941
If this is executed many times, we see a few properties of named pipes.
Named pipes summary. These solve an important problem: how can we have multiple processes find a “pipe” to use for communication even if they don’t have a parent to coordinate that communication? They use a filesystem path/name to identify the pipe. However, they are not suitable for a single processes (a server) to communicate with multiple clients as they don’t enable the communication for each client to be separated in the server.
mkfifo API explicitly.
Sockets are the mechanism provided by UNIX to communicate over the network! However, they can also be used to communicate between processes on your system through UNIX domain sockets.
A few key concepts for domain sockets:
- They can be read from and written to, thus making communication back and forth quite a bit easier.
Let’s look at an example where we want a server to receive a client’s requests as strings, and to reply with those same strings. This isn’t useful, per se, but it demonstrates how this communication can happen. Notably, we want to enable the server to communicate with different clients!
- The server creates its listening descriptor using socket,
bind, and listen.
socket creates a domain socket file
descriptor, bind associates it with a filesystem path, and listen readies it to receive client connections via accept.
- The client creates its descriptor using socket and
connect.
Most of these functions are complex and have tons of
options. Most of them have been distilled into the following
functions in 06/domain_sockets.h:
int domain_socket_server_create(const char *file_name)
- Create a descriptor to the “server” end of the IPC channel
identified by file_name in the filesystem,
similar to the named pipes before.int domain_socket_client_create(const char *file_name)
- Create a descriptor to the “client” end of the IPC channel
identified by a file name. One constraint is that the
server must create the domain socket first, or else
this call (the connect) will fail.The server’s descriptor is not meant to be used for
direct communication (i.e. should not be used for
read, and write). Instead, it is
used to create new descriptors, one per client!
With a descriptor per-client, we have the fortunate ability
to communicate explicitly with each client without the same
problem of messages getting messed up in named pipes.
Two functions that both take an argument which is the
domain socket name/path in the file system, and return a
descriptor to the socket. For the most part, you can just
use these functions in your code directly by using
06/domain_sockets.h.
#include "06/domain_sockets.h"
#include <errno.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h> /* for free, exit, and on_exit */
#include <string.h>
#include <sys/wait.h>
void
unlink_domain_socket(int status, void *filename)
{
unlink(filename);
free(filename);
}
#define MAX_BUF_SZ 128
void
server(int num_clients, char *filename)
{
char buf[MAX_BUF_SZ];
int new_client, amnt, i, socket_desc;
socket_desc = domain_socket_server_create(filename);
if (socket_desc < 0) exit(EXIT_FAILURE); /* should do proper cleanup */
on_exit(unlink_domain_socket, strdup(filename));
/*
* Service `num_clients` number of clients, one at a time.
* For many servers, this might be an infinite loop.
*/
for (i = 0; i < num_clients; i++) {
/*
* We use this new descriptor to communicate with *this* client.
* This is the key function that enables us to create per-client
* descriptors. It only returns when a client is ready to communicate.
*/
if ((new_client = accept(socket_desc, NULL, NULL)) == -1) exit(EXIT_FAILURE);
printf("Server: New client connected with new file descriptor %d.\n", new_client);
fflush(stdout);
amnt = read(new_client, buf, MAX_BUF_SZ - 1);
if (amnt == -1) exit(EXIT_FAILURE);
buf[amnt] = '\0'; /* ensure null termination of the string */
printf("Server received message (sz %d): \"%s\". Replying!\n", amnt, buf);
fflush(stdout);
/* send the client a reply */
if (write(new_client, buf, amnt) < 0) exit(EXIT_FAILURE);
/* Done with communication with this client */
close(new_client);
}
close(socket_desc);
exit(EXIT_SUCCESS);
}
void
client(char *filename)
{
char msg[MAX_BUF_SZ];
int amnt = 0, socket_desc;
socket_desc = domain_socket_client_create(filename);
if (socket_desc < 0) exit(EXIT_FAILURE);
printf("1. Client %d connected to server.\n", getpid());
fflush(stdout);
snprintf(msg, MAX_BUF_SZ - 1, "Citizen %d: Penny for Pawsident!", getpid());
amnt = write(socket_desc, msg, strlen(msg) + 1);
if (amnt < 0) exit(EXIT_FAILURE);
printf("2. Client %d request sent message to server.\n", getpid());
fflush(stdout);
if (read(socket_desc, msg, amnt) < 0) exit(EXIT_FAILURE);
msg[amnt] = '\0';
printf("3. Client %d reply received from server: %s\n", getpid(), msg);
fflush(stdout);
close(socket_desc);
exit(EXIT_SUCCESS);
}
int
main(void)
{
char *channel_name = "pennys_channel";
int nclients = 2;
int i;
if (fork() == 0) server(nclients, channel_name);
/* wait for the server to create the domain socket */
sleep(1);
for (i = 0; i < nclients; i++) {
if (fork() == 0) client(channel_name);
}
/* wait for all of the children */
while (wait(NULL) != -1);
return 0;
}Program output:
Server: New client connected with new file descriptor 4.
1. Client 1128538 connected to server.
2. Client 1128538 request sent message to server.
1. Client 1128539 connected to server.
Server received message (sz 38): "Citizen 1128538: Penny for Pawsident!". Replying!
2. Client 1128539 request sent message to server.
3. Client 1128538 reply received from server: Citizen 1128538: Penny for Pawsident!
Server: New client connected with new file descriptor 4.
Server received message (sz 38): "Citizen 1128539: Penny for Pawsident!". Replying!
3. Client 1128539 reply received from server: Citizen 1128539: Penny for Pawsident!
The server’s call to accept is the key
difference of domain sockets from named pipes. It enables us
to have per-client descriptors we can use to separately
communicate (via read/write) to
each client.
Aside: Sockets are the main way to communicate over the network (i.e. to chat with the Internet). The APIs you’d use to create network sockets are the same; it simply requires setting up the
socket and the binding of the socket in a network-specific manner.
Find the programs:
- 06/domain_socket_server.c - a sample server.
- 06/domain_socket_client.c - a sample client.
Both require the 06/domain_sockets.h
header file. You must run the server first to
create the domain socket “file”. If you run the server, and
it complains about “server domain socket creation”, then you
might need to rm the domain socket file on the
command line first. It already exists from a previous run of
the server, so the server cannot create it again!
Your job is to try to figure out what in the hey these do! Answer the following questions: where is the server
waiting on client/child processes? Recall, when we
talked about background tasks in a shell, that
wait is key to the system’s
concurrency/behavior.

The programs can be compiled directly with
gcc
(e.g. gcc domain_socket_server.c -o server; gcc domain_socket_client.c -o client).
Use these programs on the command line to send data from the
client to server.
In the following, if you aren’t positive of the
answer, please run the program! Note that we’re omitting
error checking in these programs to keep them terse.
Remember that in your programs, you must check and react to
all errors. This will require you to use the
man pages to look up the required
#includes. If the output doesn’t match a
high-level intuition, how would you modify the program to
match the intuition? What are all of the potential outputs of
the following programs? Why?
fork and Stream Behavior

fork();
fork();
printf("z");

printf("z");
fork();
fork();

printf("z");
fork();
write(STDOUT_FILENO, ".", 1);

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/wait.h>
int
main(void)
{
pid_t child;
int i;
int wait_param = 0; /* or WNOHANG */
int output_w_write = 0; /* or 1 */
for (i = 0; i < 2; i++) {
child = fork();
if (child == 0) {
sleep(1);
if (output_w_write) write(STDOUT_FILENO, ".\n", 2);
else printf(".\n");
exit(EXIT_SUCCESS);
}
waitpid(child, NULL, wait_param);
write(STDOUT_FILENO, "Post-fork\n", 10);
}
/* ...are we done here? */
return 0;
}

What happens if you change wait_param to equal
WNOHANG? What happens if you change output_w_write to
1?

read Behavior

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <string.h>
int
main(void)
{
pid_t child;
int fds[2];
char *msg = "What type of doggo is Penny?\n";
pipe(fds);
if ((child = fork()) == 0) {
/* recall: `cat` reads its stdin, and outputs it to stdout */
char *args[] = {"cat", NULL};
close(STDIN_FILENO);
dup2(fds[0], STDIN_FILENO);
execvp(args[0], args);
}
write(fds[1], msg, strlen(msg));
printf("100%% good girl.");
wait(NULL);
return 0;
}

What is the following doing?
#include <execinfo.h>
#include <stdlib.h>
#include <stdio.h>
#include <signal.h> /* sigaction and SIG* */
void
bt(void)
{
void *bt[128];
char **symbs;
int nfns, i;
nfns = backtrace(bt, 128);
symbs = backtrace_symbols(bt, nfns);
for (i = 0; i < nfns; i++) {
printf("%s\n", symbs[i]);
}
free(symbs);
}
void
bar(int *val)
{
*val = 42;
bt();
}
void
foo(int *val)
{
bar(val);
}
void
sig_fault_handler(int signal_number, siginfo_t *info, void *context)
{
printf("Fault triggered at address %p.\n", info->si_addr);
bt();
exit(EXIT_FAILURE);
return;
}
int
main(void)
{
sigset_t masked;
struct sigaction siginfo;
int ret;
int val;
sigemptyset(&masked);
sigaddset(&masked, SIGSEGV);
siginfo = (struct sigaction) {
.sa_sigaction = sig_fault_handler,
.sa_mask = masked,
.sa_flags = SA_RESTART | SA_SIGINFO /* we'll see this later */
};
if (sigaction(SIGSEGV, &siginfo, NULL) == -1) {
perror("sigaction error");
exit(EXIT_FAILURE);
}
foo(&val);
printf("---\nMoving on...\n---\n");
fflush(stdout);
foo(NULL);
return 0;
}

Communicating with multiple clients is hard. Domain
sockets help, but there are challenges around
blocking on reads: what if one client is very
“slow”, and we block waiting for them?
#include "06/domain_sockets.h"
#include <errno.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <assert.h>
void
panic(char *msg)
{
perror(msg);
exit(EXIT_FAILURE);
}
void
client(char *filename, int slowdown)
{
int i, socket_desc;
char b;
socket_desc = domain_socket_client_create(filename);
if (socket_desc < 0) {
perror("domain socket client create");
exit(EXIT_FAILURE);
}
/* delay after creating connection, but before communicating */
sleep(slowdown);
if (write(socket_desc, ".", 1) == -1) panic("client write");
if (read(socket_desc, &b, 1) == -1) panic("client read");
printf("c: %c\n", b);
close(socket_desc);
exit(EXIT_SUCCESS);
}
void
client_slow(char *filename)
{
client(filename, 3);
}
void
client_fast(char *filename)
{
client(filename, 1);
}
int
main(void)
{
char *ds = "domain_socket";
int socket_desc, i;
socket_desc = domain_socket_server_create(ds);
if (socket_desc < 0) {
/* remove the previous domain socket file if it exists */
unlink(ds);
socket_desc = domain_socket_server_create(ds);
if (socket_desc < 0) panic("server domain socket creation");
}
/* TODO: change this order. What changes? */
if (fork() == 0) client_slow(ds);
if (fork() == 0) client_fast(ds);
/* handle two clients, one after the other */
for (i = 0; i < 2; i++) {
int new_client;
char b;
new_client = accept(socket_desc, NULL, NULL);
if (new_client == -1) panic("server accept");
/* read from, then write to the client! */
if (read(new_client, &b, 1) == -1) panic("server read");
if (write(new_client, "*", 1) == -1) panic("server write");
close(new_client);
}
close(socket_desc);
/* reap all children */
while (wait(NULL) != -1) ;
return 0;
}

Experiment with the TODO in main: when you change
the order, what do you see?

The systemctl command enables you to
understand many of the services on the system (of which
there are many: systemctl --list-units | wc -l
yields 244 on my system).
$ systemctl --list-units
...
cups.service loaded active running CUPS Scheduler
...
gdm.service loaded active running GNOME Display Manager
...
ssh.service loaded active running OpenBSD Secure Shell server
...
NetworkManager.service loaded active running Network Manager
...
openvpn.service loaded active exited OpenVPN service
...
I’ve pulled out a few selections that are easier to relate to, such as ssh. Each of these services communicates with many clients using domain sockets.

poll

Each of these services is a process that any client can send requests to. We’ve seen that domain sockets can help us to talk to many different clients as each is represented with a separate file descriptor. How does the service process know which of the file descriptors has information available on it? Imagine the following case:
a client A connects, but then does a
sleep(100) (or, more realistically,
simply does some expensive computation) before sending its request, while a client B connects and makes a request immediately.

If the server issues a read on the
descriptor for A, it will block for 100 seconds!
Worse, it won’t service B despite it making a
request immediately. Why are we waiting for a slow client
when there is a fast client with data already available?
Yikes.
We’d really like a facility that can tell us
which descriptors have data and are ready to be
read from, and which are ready to be written to! Luckily,
UNIX comes to the rescue with its event
notification APIs. These are APIs that let us
understand when a file descriptor has an event
(i.e. a client writes to it) and is now readable or
writable. These include three functions: poll,
select, and (the more modern)
epoll. Let’s look at how to use
poll!

poll

Let’s look at some pseudocode for using
poll.
fdinfo[NUM_FDS] = {
# initialized to all fds of interest, listening for read and write events
}
while True:
poll(fdinfo, NUM_FDS, -1) # -1 = no timeout
for fdi in fdinfo:
# check and respond to each of the possible events
if fdi.revents & (POLLHUP | POLLERR):
# process closed fds
if fdi.revents & POLLIN:
# read off of, or accept on the file descriptor
if fdi.revents & POLLOUT:
# write to the file descriptor

This yields what is called an event loop – we
loop, each time processing a single event. You see event
loops in most GUIs (where each event is a key/mouse press),
and in web programming where JavaScript callbacks are
executed by the browser’s event loop. Importantly, we
only process descriptors that have events, thus can avoid
blocking on descriptors that don’t have available data!
This solves our previous problem: a server won’t block
awaiting communication with a client that is delayed, or
never writes to a channel, instead only
reading/writing to descriptors
that are ready.
poll API

int poll(struct pollfd *fds, nfds_t nfds, int timeout)
- We pass in an array of struct pollfds of
length nfds, with each entry corresponding to a
single file descriptor we want to get information about. The
timeout is in milliseconds, and enables
poll to return after that amount of time
even if none of the file descriptors has an event. A
negative timeout is interpreted as “infinite”,
while 0 means that poll will
return immediately with any current events.

Let’s check out the struct pollfd:
struct pollfd {
int fd; /* file descriptor */
short events; /* requested events */
short revents; /* returned events */
};

When we make the poll call, we populate the
fd and events fields with the file
descriptors we’re interested in, and which events we’re
interested in retrieving. events is a “bitmap”
which means each bit in the value denotes a different type
of event, and we bitwise or event types together
when writing them into events. These event
types include:
POLLIN - is there data available to be
read? If the file descriptor is the domain
socket that we use for accept, then
POLLIN means that a new client request is ready
to accept.

POLLOUT - can data be
written?

We can almost always set
events = POLLIN | POLLOUT as we want to wait
for both.
When poll returns, we determine which events
happened by looking at the contents of the
revents field. In addition to
POLLIN and POLLOUT which tell us
if data is ready to be read or
written, we have the following:
POLLHUP - The other side of the pipe closed
its descriptor! Subsequent reads to the
descriptor will return 0. We can likely
close our descriptor.

POLLERR - There was some sort of a
problem with the descriptor, and we should likely
close it, terminating communication.

poll Code

Let’s put this all together. For simplicity, in this
example, we’ll assume that we can always write
to a descriptor without blocking. This isn’t generally true
if you’re writing large amounts of data, and in “real code”
you’d also want to handle POLLOUT
appropriately.
#include "06/domain_sockets.h"
#include <errno.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <assert.h>
#include <poll.h>
void
panic(char *msg)
{
perror(msg);
exit(EXIT_FAILURE);
}
void
client(char *filename)
{
int i, socket_desc;
sleep(1); /* await the domain socket creation by the server */
socket_desc = domain_socket_client_create(filename);
if (socket_desc < 0) {
perror("domain socket client create");
exit(EXIT_FAILURE);
}
for (i = 0; i < 5; i++) {
char b;
if (write(socket_desc, ".", 1) == -1) panic("client write");
if (read(socket_desc, &b, 1) == -1) panic("client read");
printf("c %d: %c\n", getpid(), b);
}
close(socket_desc);
exit(EXIT_SUCCESS);
}
/* we can track max 16 fds */
#define MAX_FDS 16
void
server(char *filename)
{
int socket_desc, num_fds = 0;
struct pollfd poll_fds[MAX_FDS];
/*** Initialize the domain socket and its pollfd ***/
socket_desc = domain_socket_server_create(filename);
if (socket_desc < 0) {
unlink(filename); /* remove the previous domain socket file if it exists */
socket_desc = domain_socket_server_create(filename);
if (socket_desc < 0) panic("server domain socket creation");
}
/* Initialize all pollfd structs to 0 */
memset(poll_fds, 0, sizeof(struct pollfd) * MAX_FDS);
poll_fds[0] = (struct pollfd) {
.fd = socket_desc,
.events = POLLIN,
};
num_fds++;
/*** The event loop ***/
while (1) {
int ret, new_client, i;
/*** Poll; if we don't get a client for a second, exit ***/
ret = poll(poll_fds, num_fds, 1000);
if (ret == -1) panic("poll error");
/*
* If we timeout, break out of the loop.
* This isn't what you'd normally do as servers stick around!
*/
if (ret == 0) break;
/*** Accept file descriptor has a new client connecting! ***/
if (poll_fds[0].revents & POLLIN) {
if ((new_client = accept(socket_desc, NULL, NULL)) == -1) panic("server accept");
/* add a new file descriptor! */
poll_fds[num_fds] = (struct pollfd) {
.fd = new_client,
.events = POLLIN
};
num_fds++;
poll_fds[0].revents = 0;
printf("server: created client connection %d\n", new_client);
}
/*** Communicate with clients! ***/
for (i = 1; i < num_fds; i++) {
if (poll_fds[i].revents & (POLLHUP | POLLERR)) {
printf("server: closing client connection %d\n", poll_fds[i].fd);
poll_fds[i].revents = 0;
close(poll_fds[i].fd);
/* replace the fd to fill the gap */
poll_fds[i] = poll_fds[num_fds - 1];
num_fds--;
/* make sure to check the fd we used to fill the gap */
i--;
continue;
}
if (poll_fds[i].revents & POLLIN) {
char b;
poll_fds[i].revents = 0;
/* our server is simply sending a '*' for each input character */
if (read(poll_fds[i].fd, &b, 1) == -1) panic("server read");
if (write(poll_fds[i].fd, "*", 1) == -1) panic("server write");
}
}
}
close(socket_desc);
exit(EXIT_SUCCESS);
}
int
main(void)
{
char *ds = "domain_socket";
if (fork() == 0) client(ds);
if (fork() == 0) client(ds);
server(ds);
return 0;
}

Up till now, we’ve been focusing on UNIX programming APIs, and how to interact with various aspects of the system. Part of this discussion has been how various programs can be orchestrated to cooperate. Pipelines enable larger functionalities to be composed out of multiple programs in which the output of one goes to the input of the next. System servers provide services to clients that use IPC to harness their functionality. This enables servers that control system resources (like wifi) to let many clients have limited and controlled access to those resources.
The next few chapters dive under the hood of processes.
How do we think about our system’s programs? We’re used to writing our own programs, and like to think of the code we write being relatively self-contained. In contrast, quite a bit of system programming is about providing functionality that can be used by any program on the system. What if you wanted to implement the world’s best key-value store, and wanted to enable anyone to use your implementation!?
We think of these shared functionalities as falling into one of two categories:

Libraries, such as libc, the core library that
provides all of the functions that we use when we include
many of the headers in the class, for example,
#include <stdlib.h>,
#include <stdio.h>, or
#include <string.h>.

Services, such as those that manage your ssh sessions, that
provide your vpn connection, and that display
your graphical user interface, among many other functions.
We’ve already seen many of the IPC and communication
foundations that make these tick.

This chapter, we’ll focus on libraries as a foundation for sharing software in systems.
Before we start, consider: when you #include a ton of files, do you
think that all of the related functionality is compiled into
your program? That would mean that every program
has all of that functionality compiled into it.

Visually, this is what it would look like to compile with
“libraries” that are defined as normal *.o
files.
The goals of libraries are to share common functionality across many programs, accessed through well-known APIs (e.g. those of string.h).

Libraries are collections of functionality that can be used by many different programs. They are code that expands the functionality of programs. This differs from services, which are separate programs. Libraries have two core components that mirror what we understand about C: header files that hold the declarations (types and function prototypes), and the library code itself that holds the definitions.
Let’s set up some visual nomenclature.
There are two main ways library code is integrated into programs: static linking and dynamic linking.

We’ll discuss each of these in subsequent sections.
We can see most of the header files in
/usr/include, within which you’ll find some
familiar files:
$ ls /usr/include/std*
/usr/include/stdc-predef.h /usr/include/stdint.h /usr/include/stdio_ext.h /usr/include/stdio.h /usr/include/stdlib.h
Yes, this means that you have the source code for much of
the standard library at your fingertips! But how does
gcc know to find all of these files when we use
#include <...>? The following
gcc incantation gives us a good idea:
$ gcc -xc -E -v -
...
#include <...> search starts here:
/usr/lib/gcc/x86_64-linux-gnu/9/include
/usr/local/include
/usr/include/x86_64-linux-gnu
/usr/include
End of search list.
These are all of the “include paths” that are searched,
by default, when we do an
#include <>.
Aside: In Makefiles, we often want to tell the compiler to look for header files in our own directories. Thus, we update the include paths by adding multiple -I<dir> options to the gcc command line. We can see how this works:

$ gcc -I. -xc -E -v -
...
#include <...> search starts here:
.
/usr/lib/gcc/x86_64-linux-gnu/9/include
...

The include paths are searched in order, so you can see that when we explicitly tell gcc to include a directory with -I. here, we’re asking it to check in the current directory (.) first.
Before we proceed to figure out how multiple
*.o objects are combined into an
executable, let’s learn what it means for one object to
depend on another for some functionality. To understand
this, we have to understand how object files
(*.o) and binaries think about symbols and
linking them together.
When your program uses a library’s API, the library’s
code is linked into your program. Linking is the
act of taking any symbols that are referenced, but not
defined in your object (*.o) files, and matching them with the
definitions of those symbols in the objects that provide
them. A symbol is a function or global variable in
your program. Each symbol has a representation in your
object files so that they can be referenced across objects.
We have to understand the linking operation to dive into
how libraries can be added into, and thus made accessible from, your
program. Let’s peek into how our objects think about the
world using a number of programs that can introspect on the
objects including nm, objdump, and
readelf.
As an example, let’s look at your ptrie
implementation. We know that each of the tests (in
tests/0?-*.c) depends on your ptrie
implementation in ptrie.c. What does that look
like?
$ nm tests/01-add.o
U _GLOBAL_OFFSET_TABLE_
000000000000022a T main
U printf
U ptrie_add
U ptrie_allocate
U ptrie_free
U __stack_chk_fail
0000000000000000 t sunit_execute
0000000000000000 d test
0000000000000184 T test_add
0000000000000144 T test_alloc
The left column is the address of the symbol, and the character in the middle column gives us some information about the symbol:
T - this symbol is part of the code of the
*.o file, and is visible outside of the object
(i.e. it isn’t static). Capital letters mean
that the symbol is visible outside the object, which simply
means it can be linked with another object.

t - this symbol is part of the code, but is
not visible outside of the object (it is
static).

D - a global variable that is visible.

d - a global variable that is not visible.

U - an undefined symbol that must
be provided by another object file.

For other symbol types, see man nm.
We can see what we’d expect: the test defines its own
functions (e.g. main,
test_add), but requires the
ptrie_* functions and printf to be
provided by another object file. Now let’s check out the
ptrie objects.
$ nm ptrie.o
...
U calloc
U free
...
U printf
00000000000005ae T ptrie_add
00000000000003fc t ptrie_add_internal
0000000000000016 T ptrie_allocate
0000000000000606 T ptrie_autocomplete
00000000000000f8 T ptrie_free
...
U putchar
U puts
U strcmp
U strdup
U strlen
We can see that the ptrie.o object depends
on other objects for all of the functions we’re using from
stdio.h, malloc.h, and
string.h, provides all of the functions that
are part of the public API (e.g. ptrie_add),
and some other symbols
(e.g. ptrie_add_internal) that cannot be
linked to other object files.
After we link the ptrie into the test (with
gcc ... tests/01_add.o ptrie.o -o tests/01_add.test), we see:
$ nm tests/01_add.test | grep alloc
U calloc@@GLIBC_2.2.5
0000000000001566 T ptrie_allocate
00000000000013ad T test_alloc
Now we can see that there are no longer any undefined
references (U) to ptrie_allocate,
so the test object has now found that symbol within the
ptrie.o object. Additionally, we can see that
some symbols (e.g. calloc here) are still
undefined and have some mysterious @@...
information associated with them. You might guess that this
somehow tells us that the function should be provided by the
standard C library (glibc on Ubuntu).
Let’s see another example visually using the
libexample/* files:
Where are we? OK, let’s summarize so far. Objects can have undefined symbol references that get linked to the symbols when combined with the objects in which they are defined. How is this linking implemented?
Question: What does a file format
mean? We talk about the format of webpages
being html, images, being png,
documents being docx or
pdf. What do these formats
mean? What is the format of object and
executable binaries?
First, we have to understand something that is a little amazing: all of our objects and binaries have a defined file format called the Executable and Linkable Format (ELF).
We can confirm that these files are ELF with
$ xxd ptrie.o | head -n 1
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000 .ELF............
xxd dumps out a file in hexadecimal format,
and head just lets us filter out only the first
line (i.e. at the head). The first characters in most
formats often denote the type of file, in this case
an ELF file.
This reveals a cool truth: programs are
data, plain and simple. They happen to be data of a
very specific format, ELF. nm,
readelf, and objdump are all
programs that simply understand how to dive into the ELF
format, parse, and display the information.
As programs are simply data encapsulated in the ELF format, we can start to understand what it means to link two objects together.
An in-depth explanation follows. Now we get to an interesting part of the ELF objects:
$ readelf -r tests/01_add.o | grep ptrie
000000000151 001500000004 R_X86_64_PLT32 0000000000000000 ptrie_allocate - 4
000000000179 001600000004 R_X86_64_PLT32 0000000000000000 ptrie_free - 4
000000000191 001500000004 R_X86_64_PLT32 0000000000000000 ptrie_allocate - 4
0000000001dd 001800000004 R_X86_64_PLT32 0000000000000000 ptrie_add - 4
00000000021f 001600000004 R_X86_64_PLT32 0000000000000000 ptrie_free - 4
The -r flag here outputs the “relocation”
entries of the elf object.
This output says that there are number of references to
the ptrie_* functions in the code, and
enumerates each of them. You can imagine that these are
simply all of the places in the code that these functions
are called. The first column gives us the offset into the
ELF object where the reference to the function is made – in
this case where it is called. For example, at the offset
0x151 into the
ELF object, we have a call to ptrie_allocate.
This is important because the tests/01_add.o
does not know where the ptrie_allocate function
is yet. It will only know that when it links with the
ptrie implementation!
Let’s check out what this looks like in the ELF object’s
code. For this, we’ll use objdump’s ability to
dump out the “assembly code” for the object file. It also
will try and print out the C code that corresponds to the
assembly, which is pretty cool.
$ objdump -S tests/01_add.o
...
sunit_ret_t
test_alloc(void)
{
144: f3 0f 1e fa endbr64
148: 55 push %rbp
149: 48 89 e5 mov %rsp,%rbp
14c: 48 83 ec 10 sub $0x10,%rsp
struct ptrie *pt;
pt = ptrie_allocate();
150: e8 00 00 00 00 callq 155 <test_alloc+0x11>
155: 48 89 45 f0 mov %rax,-0x10(%rbp)
...
I’ve printed out the addresses in the object around
offset 0x151 because the first reference to
ptrie_allocate is made there (see
readelf output above). We can see an
instruction to call ptrie_allocate.
However, you can see that the address that the binary has
for that function is 00 00 00 00, or
NULL! This is what it means for a function to
be undefined (recall: the nm output for the
symbol is U).
Now we can understand what a linker’s job is:
What does a Linker do? A linker takes all of the undefined references to symbols from the relocation records, and rewrites the binary to populate those references with the address of that symbol (here updating
00 00 00 00 to the actual address of ptrie_allocate).
In our Makefiles, we always use
gcc, both for compiling *.c files
into objects and for linking objects together into a
binary.
Let’s back up and collect our thoughts.

This is the foundation for creating large programs out of small pieces, and for enabling some of those pieces to be shared between programs.
We’ll also see that libraries enable an opportunity to save memory.
A static library is simply a collection of ELF object
(*.o) files created with gcc -c.
They are collected together into a static library file that
has the name lib*.a. You can think of these
files as a .zip or .tar file
containing the ELF objects.
Static libraries of the form lib*.a are
created with the ar (archive) program from a
collection of objects. An example from the
ptrie library (expanded from its
Makefile) that creates
libptrie.a:
$ ar -crs libptrie.a *.o
So a static library is just a collection of ELF objects created by our compilation process. If we are writing a program that wants to use a library, we first make sure to include its header file(s). Then, when compiling, we have to tell the compiler which library we want to link with by using a few compiler flags:
$ gcc -o tests/01_add.test 01_add.o -L. -lptrie
The last two flags are the relevant ones:
-L. says “look for static libraries in the
current directory (i.e. .)” – other directories
can be included, and a few default directories are used.

-lptrie says “please link me with the
ptrie library”, which should be found in one of
the given directories in a file called
libptrie.a.

Note that the linker already searches a few directories
for libraries (i.e. default -L paths):
$ gcc -print-search-dirs | grep libraries | sed 's/libraries: =//' | tr -s ":" '\n'
/usr/lib/gcc/x86_64-linux-gnu/9/
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/9/
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../x86_64-linux-gnu/lib/../lib/
/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/9/
/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../lib/
/lib/x86_64-linux-gnu/9/
/lib/x86_64-linux-gnu/
/lib/../lib/
/usr/lib/x86_64-linux-gnu/9/
/usr/lib/x86_64-linux-gnu/
/usr/lib/../lib/
/usr/lib/gcc/x86_64-linux-gnu/9/../../../../x86_64-linux-gnu/lib/
/usr/lib/gcc/x86_64-linux-gnu/9/../../../
/lib/
/usr/lib/
As many of these paths are in directories that any user
can access, this is how the functionality of these libraries
can be accessed by any program wishing to use them. As we
compile our ptrie library as a static library,
you’ve already seen one example of these in use.
Static libraries do provide some facilities for trying to shrink the amount of memory required for library code. If this was all done naively, then all of the object files in a static library could get loaded into a program with which it is linked. This means that for N programs that link with a library whose objects take X bytes, we’ll have N × X bytes devoted to storing programs on disk, and running the programs in memory (if they all run at the same time). Some static libraries can be quite large.
Instead, static libraries are smarter. If a static
library contains multiple .o files, only
those object files that define symbols that are undefined in
the program being linked with, are compiled into the
program. This means that when designing a static
library, you often want to break it into multiple
.o files along the lines of the different
functionalities that are separately used by different programs.
Some projects take this quite far. For example musl libc is a libc
replacement, and it separates almost every single function
into a separate object file (i.e. in a separate
.c file!) so that only the exact functions that
are called are linked into the program.
Shared or dynamic libraries (for brevity, I’ll call them
only “dynamic libraries”, but both terms are commonly used)
are linked into a program at runtime when the
program starts executing as part of exec.
Recall that even executable binaries might still have
undefined references to symbols. For example, see
calloc in the example below:
$ nm tests/01_add.test | grep calloc
U calloc@@GLIBC_2.2.5
Though we can execute the program
tests/01_add.test, it has references to
functions that don’t exist in the program! How can we
possibly execute a program that has undefined functions;
won’t the calls to calloc here be
NULL pointer dereferences?
To understand how dynamic linking works, let’s look at the
output of a program that tells us about dynamic library
dependencies, ldd.
$ ldd tests/01_add.test
linux-vdso.so.1 (0x00007ffff3734000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fb502adf000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb502cfa000)
ldd also simply parses through the ELF
program, and determines which dynamic libraries are required
by the program. For now, we’ll ignore the
linux-vdso, but the other two entries are
interesting. We can see that the C standard library,
libc is being provided by
/lib/x86_64-linux-gnu/libc.so.6. These are both
dynamic libraries, as are most *.so.?* and
*.so files. If we check out that object we see
that it provides calloc.
$ objdump -T /lib/x86_64-linux-gnu/libc.so.6 | grep calloc
000000000009ec90 g DF .text 0000000000000395 GLIBC_2.2.5 __libc_calloc
000000000009ec90 w DF .text 0000000000000395 GLIBC_2.2.5 calloc
So this library provides the symbols we require (i.e. the
calloc function)!
If we want our program to use a dynamic library, we have to compile it quite similarly to when we wanted to use static libraries:
$ gcc -o dyn_prog *.o -L. -lptrie
So everything is somewhat normal here; we’re saying “look
for the library in this directory”, and compile me with the
ptrie library. To create the dynamic
library:
$ gcc -Wall -Wextra -fpic -I. -c -o ptrie.o ptrie.c
$ gcc -shared -o libptrie.so ptrie.o
The first line is the normal way to create an object file
from C, but includes a new flag, -fpic. This
tells the compiler to generate “Position
Independent Code”, or PIC, which is code that can,
seemingly magically, be executed when the code is loaded
into any address! The dynamic library cannot assume which
addresses it will be loaded into as many dynamic libraries
might be loaded into an executable, thus the PIC
requirement.
The second line creates the dynamic (shared) library. By
convention, all dynamic libraries end with .so
which you can think of as “shared object”. This is the line
that is dynamic-library equivalent to the ar
command for static libraries.
Now we have a binary that has been told where the library is; let’s execute it!
$ ./dyn_prog
./dyn_prog: error while loading shared libraries: libptrie.so: cannot open shared object file: No such file or directory
Whaaaaaa? If we dive in a little bit, we can see:
$ nm ./dyn_prog | grep ptrie_alloc
U ptrie_alloc
But I thought that we wanted all symbols to be defined
when we create a binary? Why is the library’s
ptrie_alloc not linked into the program? We can
see that not all symbols are defined in a program when we
are using dynamic libraries as they are linked when we
try and run the program!
We now can see the main practical difference between static and dynamic libraries: static libraries are linked into our program at compile time (when we pass the -lptrie directive), while dynamic libraries are linked at runtime when the program is executed (as part of exec).

We can start to see why the program won’t execute when we look at its dynamic library dependencies:
$ ldd ./dyn_prog
linux-vdso.so.1 (0x00007ffdf3ca0000)
libptrie.so => not found
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ff9921e8000)
/lib64/ld-linux-x86-64.so.2 (0x00007ff9923f8000)
It isn’t finding libptrie.so!
To execute the program, we have to properly set an environment variable that will tell the program, when it is executed, where to look for the dynamic library:
$ LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH ./dyn_prog
...success...
This is some strange shell syntax.
LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH essentially
says “The environment variable called
LD_LIBRARY_PATH should be updated to prepend
the directory ./ onto the front of it.” This is
conceptually similar to something like
lib_path = "./:" + lib_path in a language that
supports string concatenation. When ./dyn_prog
is executed, the updated LD_LIBRARY_PATH
environment variable is visible in that program using the
normal getenv. So there is some part of the
initialization of ./dyn_prog that looks at this
environment variable to figure out where to look for the
dynamic library.
To confirm why we can now execute the program, we can again look at the library dependencies:
$ LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH ldd prog_dynamic
linux-vdso.so.1 (0x00007ffc939f8000)
libptrie.so => ./libptrie.so (0x00007f70d316f000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f70d2f66000)
/lib64/ld-linux-x86-64.so.2 (0x00007f70d3236000)
If we expand the LD_LIBRARY_PATH to include
the current directory, the program can find the library.
If we want to see the default paths that are searched for dynamic libraries:
$ ldconfig -v 2>/dev/null | grep -v ^$'\t'
/usr/local/lib:
/lib/x86_64-linux-gnu:
/lib32:
/libx32:
/lib:
In fact, ldconfig -v will also print out all
of the potential default dynamic libraries we can link
with.
exec with Dynamic Linking
How does the libc.so library get linked into
our program? Diving in a little bit further, we see that
ld is our program’s “interpreter”:
$ readelf --program-headers tests/01_add.test
Elf file type is DYN (Shared object file)
Entry point 0x1180
There are 13 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000002d8 0x00000000000002d8 R 0x8
INTERP 0x0000000000000318 0x0000000000000318 0x0000000000000318
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x00000000000008e8 0x00000000000008e8 R 0x1000
LOAD 0x0000000000001000 0x0000000000001000 0x0000000000001000
0x0000000000000e55 0x0000000000000e55 R E 0x1000
...
A lot is going on here, but we see the reference to
ld-linux-x86-64.so.2 that we saw previously in
the ldd output. We can now see that the library
(that I’ll call simply ld) is a “program
interpreter” (see
“Requesting program interpreter: /lib64/ld-linux-x86-64.so.2”)
for our program23. This is
the core of understanding how our program can execute while
having undefined references to functions provided by another
library.
When we call exec, we believe that the
program we execute takes over the current process and starts
executing. This is not true! Instead, if a program
interpreter is defined for our program (as we see: it is),
the interpreter program’s memory is loaded along side our
program, but then the interpreter is executed instead of our
program! It is given access to our program’s ELF object, and
ld’s job is to finish linking and loading our
program. We have to add ./ to the
LD_LIBRARY_PATH to make the program execute
because ld reads that environment
variable when it is trying to link and load in your
program!
exec
But wait. Why does exec end up loading
ld, which then loads our program? Why load
ld to link our program instead of just running
our program? Because ld also loads and
links all of the dynamic libraries (.sos) that
our program depends on!!!
We can confirm that ld is loaded into our
program by executing it in gdb, blocking it on
breakpoint, and outputting its memory maps
(cat /proc/262648/maps on my system):
555555554000-555555555000 r--p 00000000 08:02 527420 /home/gparmer/repos/gwu-cs-sysprog/22/hw_solns/02/tests/01_add.test
555555555000-555555556000 r-xp 00001000 08:02 527420 /home/gparmer/repos/gwu-cs-sysprog/22/hw_solns/02/tests/01_add.test
555555556000-555555557000 r--p 00002000 08:02 527420 /home/gparmer/repos/gwu-cs-sysprog/22/hw_solns/02/tests/01_add.test
555555557000-555555558000 r--p 00002000 08:02 527420 /home/gparmer/repos/gwu-cs-sysprog/22/hw_solns/02/tests/01_add.test
555555558000-555555559000 rw-p 00003000 08:02 527420 /home/gparmer/repos/gwu-cs-sysprog/22/hw_solns/02/tests/01_add.test
7ffff7dbe000-7ffff7de3000 r--p 00000000 08:02 2235639 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7ffff7de3000-7ffff7f5b000 r-xp 00025000 08:02 2235639 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7ffff7f5b000-7ffff7fa5000 r--p 0019d000 08:02 2235639 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7ffff7fa5000-7ffff7fa6000 ---p 001e7000 08:02 2235639 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7ffff7fa6000-7ffff7fa9000 r--p 001e7000 08:02 2235639 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7ffff7fa9000-7ffff7fac000 rw-p 001ea000 08:02 2235639 /usr/lib/x86_64-linux-gnu/libc-2.31.so
7ffff7fac000-7ffff7fb2000 rw-p 00000000 00:00 0
7ffff7fc9000-7ffff7fcd000 r--p 00000000 00:00 0 [vvar]
7ffff7fcd000-7ffff7fcf000 r-xp 00000000 00:00 0 [vdso]
7ffff7fcf000-7ffff7fd0000 r--p 00000000 08:02 2235423 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7ffff7fd0000-7ffff7ff3000 r-xp 00001000 08:02 2235423 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7ffff7ff3000-7ffff7ffb000 r--p 00024000 08:02 2235423 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7ffff7ffc000-7ffff7ffd000 r--p 0002c000 08:02 2235423 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7ffff7ffd000-7ffff7ffe000 rw-p 0002d000 08:02 2235423 /usr/lib/x86_64-linux-gnu/ld-2.31.so
7ffff7ffe000-7ffff7fff000 rw-p 00000000 00:00 0
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]
Recall that this dumps all of the memory segments of the process.
There are two important observations:
- /usr/lib/x86_64-linux-gnu/libc-2.31.so is
present in the process, so it somehow got linked and loaded into
the process.
- /usr/lib/x86_64-linux-gnu/ld-2.31.so is
also present in the process; indeed, it was the first code to
execute in the process!
Lets summarize the algorithm for executing a dynamically loaded program.
1. exec is called to execute a new
program.
2. The program's ELF file is parsed, and if a “program interpreter” is set, the interpreter is instead executed.
3. In most cases, the interpreter is ld,
and it is loaded into the process along with the target
program.
4. Execution is started at ld, and it:
   - loads each of the dynamic libraries the program depends on, using the LD_LIBRARY_PATH
environment variable to help look for them,
   - links the program so that its undefined symbols (e.g. the calloc above) are set to
point to library functions, and
   - finally calls the entry point (main) of our program.
This is all quite complex. Why would we ever want all of this complexity over the simplicity of static libraries?
First, lets look at how much memory a library takes up.
$ ls -lh /usr/lib/x86_64-linux-gnu/libc-2.31.so
-rwxr-xr-x 1 root root 2.0M Dec 16 2020 /usr/lib/x86_64-linux-gnu/libc-2.31.so
So libc takes around 2MiB. Other libraries
take quite a bit more:
- libradeonsi_dri.so (a graphics driver) is 23 MiB
- libmysqlclient.so (the MySQL client) is 7.2 MiB
- libmozjs.so (gnome's javascript engine) is 11 MiB, and webkit's
libjavascriptcoregtk.so is 26 MiB
If N programs
must load in multiple libraries (totaling, say, X MiB), then N × X MiB is
consumed. For context, my system is currently running 269
programs (ps aux | wc -l), so this memory can
add up!
Dynamic libraries enable us to do a lot better. The memory contents of the library are the same regardless of the process it is present in, and an Operating System has a neat trick: it can make the same memory appear in multiple processes if it is identical and cannot be modified24. As such, dynamic libraries typically only require memory for each library once in the system (only X MiB), as opposed to once for each process.
When a program is linked with a static library, the result is relatively small as only the objects in the library that are necessary are linked into the program. When executed as a process, the memory consumed will only include the necessary objects of the library. In contrast, a program that uses a dynamic library links in the entire library at runtime into a process. Thus, for a single process, a static library approach is almost always going to be more memory-efficient.
However, in a system with many executing processes, paying a per-library memory cost, rather than a per-process one, results in significantly less memory consumption. A single copy of a larger dynamic library shared between all processes uses less memory than multiple copies of smaller, statically-linked subsets of the library in each program.
Lets dive a little deeper into how software is structured on our system, and the trade-offs that different libraries make in the system.
Lets investigate the libexample from the
figures in last week’s lectures. See the 08/libexample/
directory for examples of using normal linking
(make naive), static library linking
(make static), and dynamic library linking
(make dynamic). The source includes:
- prog and prog2, programs that use the libraries.
- example.c, a single file that creates the
first library, generating libexample.a and
libexample.so.
- example2.c and example2b.c,
the files that create the second library, generating
libexample2.a and
libexample2.so.
Each of the library objects uses around 750KB of
memory. To help you set the LD_LIBRARY_PATH to
execute the dynamic libraries, you can run the dynamic
library examples using
$ ./run_dyn.sh ./prog_dynamic
- Use nm and ldd to understand
which libraries and object files within those libraries each
of the programs require.
- Use ls -lh to see the size of each of the
resulting executable programs. Explain the differences in
sizes.
- Systems commonly use dynamic libraries for widely-shared libraries (e.g. libc), and static libraries for the rest.
Why?
We saw how libraries enable the sharing of functionality
between programs. Static libraries are convenient, and only
required objects are linked with your program at compile
time. In contrast, dynamic libraries are linked at runtime
before the main of your program
executes.
In addition to enabling shared library code, dynamic libraries also enable explicit means to load libraries programmatically. Namely, UNIX provides an API that enables your process to load a dynamic library into itself. This is kinda crazy: a running process with a set of code and data can add a library with more code and data into itself dynamically. This effectively means that your program can change its own code as it executes by adding and removing dynamic libraries. Being able to load library code into your process as your program executes (at runtime) is an instance of self-modifying code, and it enables a few very cool features including:
- plugins: media frameworks (e.g.
ffmpeg) enable different formats to be
handled with separate dynamic libraries, thus enabling the
processing of any image or video by simply loading a new
library, and
- interpreted languages, such as python,
dynamically load libraries (many or most are written in C)
into a process when the python code imports
them. Since the imports are part of the code,
the corresponding dynamic libraries must be loaded by the
python runtime dynamically. This enables your dynamic (slow)
python code to load and leverage fast C libraries.
This API is provided by the linker (ld.so)
that loaded the elf object for the currently executing
process. What does the API that enables this dynamic loading
look like?
- handle = dlopen(file_path, flags) - Open
and load in a dynamic library. This returns a handle to the
library that you should pass into the latter functions. You
can read about the flags in
man dlopen. Unless otherwise noted, I'd
recommend using RTLD_NOW | RTLD_LOCAL25. Returns
NULL if it cannot find the dynamic
library.
- dlclose(handle) - Close and unload a
library that was previously loaded.
- symb_ptr = dlsym(handle, symbol) - pass in
a symbol, which is a string holding a function
or variable name in the dynamic library. It should return a
pointer to that symbol, which you can then call (if it is a
function pointer), or dereference if it is a variable.
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
int
main(void)
{
void *handle;
typedef int *(*bar_fn_t)(int a);
bar_fn_t bar;
int *ret;
/* Open a dynamic library: the example is in ../08/libexample/ */
handle = dlopen("08/libexample/libexample.so", RTLD_NOW | RTLD_LOCAL);
if (!handle) return EXIT_FAILURE;
/*
* Lets find the function "bar" in that dynamic library.
* We know that `bar` is the function: int *bar(int a)
*/
bar = dlsym(handle, "bar");
if (!bar) return EXIT_FAILURE;
/* bar allocates an int, populates it with the argument + 2 and returns it */
ret = bar(40);
printf("%d\n", *ret);
free(ret);
return 0;
}Program output:
42
As dlsym returns a pointer to the symbol
passed in (as a string), it means that you end up seeing a
lot of function pointers when you’re using plugin
infrastructures. See (a) in the image below.
We can also use dlsym to ask for a symbol in
any of the installed libraries. To do so, we pass
in the “special pseudo-handles” RTLD_DEFAULT or
RTLD_NEXT.
#define _GNU_SOURCE /* added this due to `man 3 dlsym` */
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <string.h>
#include <unistd.h>
int
main(void)
{
typedef int (*write_fn_t)(int fd, char *buf, size_t s);
write_fn_t w;
char *buf = "Look at me! Diving into your dynamic libraries...pulling out the symbols\n";
/* Now lets find `write`! */
w = dlsym(RTLD_DEFAULT, "write");
if (!w) return EXIT_FAILURE;
w(STDOUT_FILENO, buf, strlen(buf));
return 0;
}Program output:
Look at me! Diving into your dynamic libraries...pulling out the symbols
Though this example doesn't have much of a point, as we
can directly call write and thus don't have to go
through the symbol we retrieved from dlsym, it
demonstrates the mechanism. See (b) in the image below.
One powerful mechanism that dynamic libraries enable is
to interpose on library function calls. Your
program might believe that it calls malloc when
it invokes the function “malloc”, but you can orchestrate it
so that it instead calls a malloc in your
library! You can then, if you choose, invoke the normal
library calls, thus why this is known as
interposing on the library calls. But how?
The environment variable LD_PRELOAD can be
used to specify a dynamic library to load into the process
to interpose on any library calls. A trivial use of
this might be to log (i.e. write out to a file) all
of a set of library calls. We’ll use it to add intelligence
on the memory allocation paths.
When we use LD_PRELOAD=lib, the symbols in
the dynamic library that we specify in lib have
priority over those of other libraries. Thus, if we include
a malloc function definition, when the
application calls malloc, it will call
our malloc! In the image above, (c) demonstrates
using LD_PRELOAD.
In malloclog,
we can see a library that interposes on malloc
and free, and prints information to standard
error. This includes the following:
#define _GNU_SOURCE /* for RTLD_NEXT */
#include <dlfcn.h>
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
typedef void *(*malloc_fn_t)(size_t s);
static malloc_fn_t malloc_fn;
/* This `malloc` will override libc's `malloc` */
void *
malloc(size_t sz)
{
void *mem;
/* if we don't yet know where libc's malloc is, look it up! */
if (malloc_fn == NULL) {
/*
* Lets find the `malloc` function in libc! `RTLD_NEXT`
* specifies that we don't want to find our *own* `malloc`; we'd like to
* search starting with the *next* library.
*/
malloc_fn = dlsym(RTLD_NEXT, "malloc");
assert(malloc_fn);
}
/* call libc's `malloc` here! */
mem = malloc_fn(sz);
fprintf(stderr, "malloc(%zu) -> %p\n", sz, mem);
return mem;
}We do the same for free, and should do the
same also for calloc and other memory
allocation functions.
Question: Why can’t we just call
malloc directly instead of doing this complex
thing with dlsym?
We can use it on a simple program that calls
malloc and then free on the allocated
memory. To test this out, in malloclog:
$ make
$ LD_PRELOAD=./libmalloclog.so ./main.bin
malloc(4) -> 0x55fe6cd492a0
free(0x55fe6cd492a0)
...
This shows us that dynamic libraries are
programmable and that we can use them in
interesting ways to coordinate between libraries, provide
new functionalities, and to interpose on functionalities.
For example, many of the largest companies use
LD_PRELOAD to swap out the malloc
implementation for one that is higher performance.
Questions:
- How could you implement some of valgrind's
functionality using the ability to interpose on library
functions?
Note that programs run with the
LD_PRELOAD environment variable are more
difficult to use in gdb. The ldpreload_w_gdb.sh
script demonstrates how to use both technologies
together.
We’ve talked about the many resources that are provided by the system. These include files, pipes, and domain sockets. But we’ve been vague about what the system is. Though we’ll leave the details for a later course, lets dive in to what constitutes the “system” a little bit.
We’ve claimed many times that the system provides pipes, domain sockets, files, the file system, etc… What is “the system”? First, lets start from what we know: is this system just a dynamic library? To some extent this makes sense.
Question:
- Could read, open, etc… all be
simply provided solely by dynamic libraries?
A few observations:
- Some abstractions, such as pipes, involve
multiple processes, thus there must be something in the
system beyond specific processes.
The system, as we've discussed it so far, is
called the kernel. One of the main components of
every operating system is the kernel. Every process in the
system uses the kernel’s functionality. It provides the
logic for most of the calls we’ve learned in the
class, fork, exec,
pipe, read, write,
open, creat, etc… and also most of
the resources we’ve discussed including channels, domain
sockets, pipes, files, and the filesystem. It is some of the
most trusted code on the system. If it has a bug, the whole
system can crash – or worse!
We have to be able to call functions in the kernel from any process. This means we want to make what seem like function calls to the kernel. And we want to make sure that the faults of one process don’t spread to another! This means that somehow the kernel must be isolated from each process.
System calls are the mechanism that behaves like a function call (and looks like one to us!), but that switches over to the kernel. System calls are a special instruction in the instruction set architecture (Intel x86 or ARM) that triggers the switch to the kernel. When the kernel’s done with its functionality, it can return (via a special instruction) to the process.
So system calls behave like function calls, but inherently involve switches from the process to the kernel and back.
Question:
In UNIX, system calls are available in
unistd.h, so whenever you include that file to
use a function, you know the function is likely a direct
system call! You can also tell if a UNIX function is a
system call by looking at the sections of
man (from man man):
MANUAL SECTIONS
The standard sections of the manual include:
1 User Commands
2 System Calls
3 C Library Functions
4 Devices and Special Files
5 File Formats and Conventions
6 Games et. al.
7 Miscellanea
8 System Administration tools and Daemons
Questions:
Do man write. What is this documentation
for?
Do man 2 write. What is this
documentation for? What does the 2 mean?
You can always tell which section you’re looking at in
the header line of the man output:
WRITE(2) shows that we’re reading about
write in the “System Calls” section.
Is malloc a system call or a C Library
Function?
We know what a system call looks like, as we’ve used them many times throughout the class. Lets look at the various ways to make the same system call. First, the normal way:
#include <unistd.h>
int
main(void)
{
write(1, "hello world\n", 12);
return 0;
}Program output:
hello world
The C standard library even provides you a “generic” way
to invoke any system call. The syscall
function lets you make an integer-identified system call.
Each system call on the system is identified with a unique
integer. In this case, we’ll use SYS_write.
#include <unistd.h>
#include <sys/syscall.h>
int
main(void)
{
syscall(SYS_write, 1, "hello world\n", 12);
return 0;
}Program output:
hello world
Lets look at a few of these integer values to see what’s happening here:
#include <stdio.h>
#include <sys/syscall.h>
int
main(void)
{
printf("write: %d\nread: %d\nopen: %d\nclose: %d\n",
SYS_write, SYS_read, SYS_open, SYS_close);
return 0;
}Program output:
write: 1
read: 0
open: 2
close: 3
Note, these integer values are not related to
descriptors (e.g. STDIN_FILENO == 0). The
system calls in Linux (and in all operating systems!) are
each assigned an integer with which to identify them. You
can see all of the system calls
and their integer numbers, and some documentation in man syscalls
(link).
Exercise:
OK, now lets do a little more of a deep-dive. You might not be familiar with the Intel/AMD x86-64 assembly language that follows, but the comments on the right give you the gist.
.data ; Add some global variables - our "hello world" message
msg:
.ascii "hello world\n"
len = . - msg ; magic length computation "here" - start of `msg`
.text ; The code starts here!
.global _start
_start: ; Note: the %rxx values are registers, $x are constants
movq $1, %rax ; Syscall's integer value, 1 == write, now set up the arguments...
movq $1, %rdi ; Arg 1: file descriptor 1
movq $msg, %rsi ; Arg 2: the string!
movq $len, %rdx ; Arg 3: the length of the string
syscall ; Kick off the system call instruction!
movq $60, %rax ; System call #60 == exit
xorq %rdi, %rdi ; Arg 1: xor(x, x) == 0, pass 0 (success) as the exit status!
syscall ; Kick off the `exit` system call!
We compile this assembly with make (in
10/). Program output:
hello world
We can see that the requirements of making a system call
are quite minimal! A special instruction (on x86-64,
syscall) provides the magical incantation to
make what feels like a function call to the
kernel.
How does this work? Recall that ELF, the file format for objects and programs, includes many pieces of information necessary to execute the program. Among them is the starting or entry address, which is the initial instruction to start executing when the program runs.
$ readelf -a ./asm_syscall | grep "Entry"
Entry point address: 0x401000
$ readelf -a ./asm_syscall | grep "401000"
Entry point address: 0x401000
...
6: 0000000000401000 0 NOTYPE GLOBAL DEFAULT 1 _start
We can see that the “entry point address” for the binary
is 0x401000, and we can see that the
_start symbol has that address. The
exec logic of the kernel will set the program
to start executing at that _start address.
Note that this same analysis rings true for the normal C
system call code above (not only for the assembly)! The
_start function is where all execution starts
in each program. This explains why our program above labels
its code with _start!
The strace program lets us monitor the
system calls a program makes. Running it on the assembly
program above:
$ strace ./asm_syscall
execve("./asm_syscall", ["./asm_syscall"], 0x7fff3c06ae10 /* 50 vars */) = 0
write(1, "hello world\n", 12hello world
) = 12
exit(0) = ?
+++ exited with 0 +++
This likely has what you’d expect: We see the
exec that kicks off the program, the
write, and, finally, the exit.
You’d likely expect the same from the normal C program that
also writes out hello world.
$ strace ./normal_syscall
execve("./normal_syscall", ["./normal_syscall"], 0x7ffc036c1520 /* 50 vars */) = 0
brk(NULL) = 0x5638ed59d000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffd8bfb49c0) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=138102, ...}) = 0
mmap(NULL, 138102, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7efddf831000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360A\2\0\0\0\0\0"..., 832) = 832
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32, 848) = 32
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\237\333t\347\262\27\320l\223\27*\202C\370T\177"..., 68, 880) = 68
fstat(3, {st_mode=S_IFREG|0755, st_size=2029560, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7efddf82f000
pread64(3, "\6\0\0\0\4\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0"..., 784, 64) = 784
pread64(3, "\4\0\0\0\20\0\0\0\5\0\0\0GNU\0\2\0\0\300\4\0\0\0\3\0\0\0\0\0\0\0", 32, 848) = 32
pread64(3, "\4\0\0\0\24\0\0\0\3\0\0\0GNU\0\237\333t\347\262\27\320l\223\27*\202C\370T\177"..., 68, 880) = 68
mmap(NULL, 2037344, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7efddf63d000
mmap(0x7efddf65f000, 1540096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22000) = 0x7efddf65f000
mmap(0x7efddf7d7000, 319488, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x19a000) = 0x7efddf7d7000
mmap(0x7efddf825000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7efddf825000
mmap(0x7efddf82b000, 13920, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7efddf82b000
close(3) = 0
arch_prctl(ARCH_SET_FS, 0x7efddf830540) = 0
mprotect(0x7efddf825000, 16384, PROT_READ) = 0
mprotect(0x5638eb775000, 4096, PROT_READ) = 0
mprotect(0x7efddf880000, 4096, PROT_READ) = 0
munmap(0x7efddf831000, 138102) = 0
write(1, "hello world\n", 12hello world
) = 12
exit_group(0) = ?
+++ exited with 0 +++
Whaaaaa!? If you look carefully, at the bottom we see the
expected output. But we also see a lot of system
calls previous to that. Between when we start execution at
_start, and when we execute in
main, quite a lot happens.
Question:
- What are all of these system calls doing before main?
There are many reasons why some functionality might be in a library, and accessed via function calls, or in the kernel, and accessed with system calls.
Lets look at an example of memory allocation. We can, of
course, use malloc and free to
allocate and free memory. But we can also call
mmap and munmap that make
system calls to allocate and free memory.
Questions:
- Which do you think is faster: malloc/free or
mmap/munmap?
#include "10/timer.h"
#include <unistd.h>
#include <stdio.h>
#include <sys/mman.h>
#include <stdlib.h>
#define ITER 256
int
main(void)
{
int i;
unsigned long long start, end;
int *mem;
/* How many cycles do `malloc` and `free` take? */
start = cycles();
for (i = 0; i < ITER; i++) {
mem = malloc(256);
mem[0] = 42;
free(mem);
}
end = cycles();
printf("\nmalloc + free overhead (cycles): %lld\n", (end - start) / ITER);
/* How many cycles do `mmap` and `munmap` take? */
start = cycles();
for (i = 0; i < ITER; i++) {
mem = mmap(NULL, 256, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
mem[0] = 42;
munmap(mem, 256);
}
end = cycles();
printf("\nmmap + munmap overhead (cycles): %lld\n", (end - start) / ITER);
return 0;
}Program output:
malloc + free overhead (cycles): 235
mmap + munmap overhead (cycles): 15943
What is a “cycle”? When you read that you have a 3GHz processor, that essentially says that the internal “clock” that drives the logic in the processor ticks 3 billion times per second. That's a lot. Each cycle takes 1/3 of a nanosecond. A cycle is the smallest unit of measurement on a processor, and the
cycles function we implement here gives us how many cycles have elapsed since the system booted.
We can also look at the output APIs like
write and compare them to the stream APIs that
buffer their output.
Questions:
- If you write to a file using write or
to a stream (via a FILE) using
fwrite, what performance would you expect?
- What if you add an fflush after each fwrite?
#include "10/timer.h"
#include <unistd.h>
#include <stdio.h>
#define ITER 256
int
main(void)
{
int i;
unsigned long long start, end;
/* How many cycles does `write` take? */
start = cycles();
for (i = 0; i < ITER; i++) write(1, " ", 1);
end = cycles();
printf("\nwrite overhead (cycles): %lld\n", (end - start) / ITER);
/* How many cycles does `fwrite` take? */
start = cycles();
for (i = 0; i < ITER; i++) fwrite(" ", 1, 1, stdout);
end = cycles();
printf("\nfwrite (stream) overhead (cycles): %lld\n", (end - start) / ITER);
/* How many cycles does `fwrite + fflush` take? */
start = cycles();
for (i = 0; i < ITER; i++) {
fwrite(" ", 1, 1, stdout);
fflush(stdout);
}
end = cycles();
printf("\nfwrite + fflush overhead (cycles): %lld\n", (end - start) / ITER);
return 0;
}Program output:
write overhead (cycles): 2092
fwrite (stream) overhead (cycles): 88
fwrite + fflush overhead (cycles): 1961
We've seen that:
- some functionality, such as buffered stream output, can be implemented efficiently in libraries, and
- some functionality must be implemented in the kernel, such as process manipulation (fork,
exec) and inter-process interactions
(wait, pipe, etc…), as these
functionalities require implementation beyond any single
process.
This leads us to the conclusion that for performance,
we’d like to implement as many functions as possible in
libraries, and to rely on the kernel when processes require
modifications (exec, fork), when
there are multiple processes involved (IPC), or when the
finite resources of the system must be split up between
processes (open, creat,
sbrk).
Here, we’re going to dive into the design of the
memory management functionalities in UNIX to better
understand the trade-offs. Specifically, how does
malloc and free work!?
So malloc, calloc,
realloc, and free are implemented
in a library (in libc), and when we request
memory from them, they might call sbrk (or
mmap on some systems) to get memory from the
kernel, but from that point on, they will manage that memory
themselves. What does this mean? When the heap
doesn’t have enough free memory to satisfy a
malloc request, we call the kernel to
expand the heap. To do so, we ask the kernel for
memory via sbrk or mmap, and we
generally get a large chunk of memory. For example, we might
ask sbrk for 64KB of memory with
mem = sbrk(1 << 16)26. Then it is
malloc’s job to split up that memory into the
smaller chunks that we’ll return in each subsequent call to
malloc. When memory is freed, it
is returned to the pool of memory which we’ll consider for
future allocations. Because we’re reusing freed
memory, we’re avoiding system calls when the heap contains
enough freed memory to satisfy a request.
This is why our
malloc/free code in the
libc library needs to track the memory in the
heap. We need to track where allocated memory is
located and where freed spans of memory are
located, to enable intelligent allocation decisions for
future malloc calls.
Questions:
The malloc and free
implementation must track which chunks of memory are
allocated, and which are freed, but it cannot use
malloc to allocate data-structures. That would
result in malloc calling itself recursively,
and infinitely! If malloc requires memory
allocation for its tracking data-structures, who tracks the
memory for the tracking data-structures? Yikes.
We solve this problem by allocating the tracking
data-structures along with the actual memory allocation
itself. That is to say, we allocate a header that
is a struct defined by malloc that is
directly before an actual memory allocation.
Because the header is part of the same contiguous memory
chunk, we allocate the header at the same time as we
allocate memory! When memory is freed, we use the header to
track the chunk of memory (the header plus the free memory
directly following the header) in a linked list of free
chunks of memory called a freelist.
- On malloc, we can find a free chunk
large enough for the allocation, remove it from the
freelist, …
- Only on a malloc request that
can't be satisfied by the freelist do we
make a system call to allocate more heap.
smalloc
Look at the 10/smalloc.c
(link) file for a simple implementation of a
malloc/free that assumes only
allocations of a specific size. This means that we
don’t need to track the sizes of allocations and free chunks
in the headers, simplifying our implementation. Build, and
run the program:
$ make
$ ./smalloc.bin
It currently implements this simple malloc!
Read through the smalloc and sfree
functions, and the main. Answer
QUESTION 1-3. Then, move on to
QUESTION 4. In expand_heap, we
make the system call to allocate more memory from the system
with sbrk. We need to set up that newly
allocated heap memory to include memory chunks that we can
allocate with smalloc. Answering how this is
done is the core of QUESTION 4.
This material adapted from Prof. Aviv’s material.
We've seen how the filesystem is used to store and name files and channels, and we've seen the APIs to access directories and files. If everyone had access to everyone else's files, then there would be no way to secure your information. Thus, the kernel provides means to control different users' access to different files and directories. This means that I cannot access your files, and vice-versa!
To understand the core concept of access control in UNIX, we have to understand how the filesystem’s files and directories are associated with each of us, separately. Let’s check out the properties of a file that are returned by running
ls -l:
.- Directory?
| .-------Permissions .- Directory Name
| ___|___ .----- Owner |
v/ \ V ,---- Group V
drwxr-x--x 4 aviv scs 4096 Dec 17 15:14 im_a_directory
-rw------- 1 aviv scs 400 Dec 19 2013 .ssh/id_rsa.pub
^ \__________/ ^
File Size -------------' | '- File Name
in bytes |
|
Last Modified --------------'
An example from the class’ server where we create a file, a directory, and see the results:
$ echo hi > file.txt
$ mkdir hello
$ ls -l
-rw-r--r-- 1 gparmer dialout 3 Mar 18 12:19 file.txt
drwxr-xr-x 2 gparmer dialout 4096 Mar 18 12:20 hello
We see that each file and directory has an owner and group associated with it.
Question:
The owner of a file is the user that is directly responsible for the file and has special status with respect to the file permissions. Users can also be collected together into a set called a group, a collection of users who possess the same permissions. A file is associated with a group. A file’s or directory’s owner and group have different permissions to access the file. Before we dive into that, let’s understand users and groups.
You all are already aware of your username. You can get
your username with the whoami command:
$ whoami
gparmer
To have UNIX tell you your username and connection
information on this machine, use the command,
who:
$ who
gparmer pts/1 2022-03-18 12:12 (128.164...)
The first part of the output is the username. The rest of the information in the output refers to the terminal you’re using (it is a “pseudo-terminal device”, something that pretends it is a typewriter from the 60s), the time the terminal was created, and from which host you are connected.
You can determine which groups you are in using the groups command.
$ groups
dialout CS_Admins
On the class’ server, I’m in the dialout and
CS_Admins groups. On my work computer, I’m in a
few more groups:
$ groups
ycombinator adm cdrom sudo dip plugdev lpadmin sambashare
The good news is that I can use my cdrom.
Fantastic. It will pair well with my pseudo-typewriter.
Cool.
The permissions associated with each file/directory specify who should be able to have the following access:
- r - the ability to read the file, or read the contents of a directory.
- w - the ability to write to the file, or update the contents of a directory.
- x - the ability to execute a file (i.e. a program), or chdir into a directory.

We think about different users as having different sets of these permissions to access files and directories. When we see the permissions string for a file, we see three sets of these rwx permissions:
-rw-r--r-- 1 gparmer dialout 3 Mar 18 12:19 file.txt
Here we have three sets of permissions rw-
(read and write), r-- (read only), and
r-- (read only). These three sets of
permissions each specify the access permissions for the
owner, the group, and everyone else (the
global permissions).
.-- Directory Bit
|
| ,--- Global Permission (i.e. "everyone else")
v / \
-rwxr-xr-x
\_/\_/
| `--Group Permission
|
`-- Owner Permission
Given this, now we can understand the permissions for each file in:
ls -l 11/perm_example/
total 0
-rw-r--r-- 1 gparmer gparmer 0 Mar 25 18:43 bar
-rw-rw-r-- 1 gparmer gparmer 0 Mar 25 18:43 baz
-rwx------ 1 gparmer gparmer 0 Mar 25 18:42 foo
Question:
Each permission has a numerical representation.
We represent it in octal, which is a fancy way to
say that we think of each digit of our numerical
representation as being between 0-7 – it is
“base 8” (in contrast to the normal numbers we use
every day that are “base 10”). Thus, each of the three sets
of permissions is 3 bits.
Each of the three permissions, r,
w, and x, corresponds to a bit in
each octal digit, from most-significant to
least-significant. For example:
rwx -> 1 1 1 -> 7
r-x -> 1 0 1 -> 5
--x -> 0 0 1 -> 1
rw- -> 1 1 0 -> 6
On the left we see a permission, its bit representation, and the associated octal digit. As a shorthand, you can think of it like this:

- r = 4, as it is represented with the most significant bit
- w = 2, as the second bit
- x = 1, as it is the least significant bit

Thus the octal digit is simply the addition of each of these numeric values for each access right. We can combine these digits to encapsulate permissions for the owner, group, and everyone else:
-rwxrwxrwx -> 111 111 111 -> 7 7 7
-rwxrw-rw- -> 111 110 110 -> 7 6 6
-rwxr-xr-x -> 111 101 101 -> 7 5 5
So now we know how to interpret these permission values, and how to represent them as octal digits.
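To make the arithmetic concrete, here is a small helper (the function name is ours, not part of any standard API) that converts a permission triplet like "r-x" into its octal digit:

```c
#include <assert.h>

/* Convert a 3-character permission triplet (e.g. "r-x") to its octal digit:
 * r contributes 4, w contributes 2, x contributes 1. */
int
triplet_to_octal(const char *t)
{
	int digit = 0;

	if (t[0] == 'r') digit += 4; /* read: most significant bit */
	if (t[1] == 'w') digit += 2; /* write: middle bit */
	if (t[2] == 'x') digit += 1; /* execute: least significant bit */

	return digit;
}
```

For example, `triplet_to_octal("r-x")` yields 5, matching the table above.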
To change a file permission, you use the
chmod command, or the chmod system
call. In each, we specify the file to update, and the new
permissions in octal. Let’s play around with permissions and
see their impact:
$ gcc helloworld.c -o helloworld.bin
$ ls -l helloworld.bin
-rwxrwxr-x 1 gparmer gparmer 41488 Mar 22 08:52 helloworld.bin
$ ./helloworld.bin
hello world!
I can execute the program, and I could read the program,
for example using nm. I could even overwrite
the file.
Note: on the server, the default permissions are more restrictive –
-rwxr-xr-x 1 gparmer dialout 16464 Mar 25 19:16 helloworld.bin. We don’t want anyone other than us writing to the file on a shared server!
What if I removed all permissions for everyone but the owner, and even removed my own ability to execute?
$ chmod 600 helloworld.bin
$ ls -l helloworld.bin
-rw------- 1 gparmer gparmer 41488 Mar 22 08:52 helloworld.bin
$ ./helloworld.bin
-bash: ./helloworld.bin: Permission denied
We removed our own ability to execute the file, so we no
longer can! Note that permission 6 is
4 + 2 which is r + w.
If I wanted everyone on the system to be able to execute the
program:
$ chmod 711 helloworld.bin
$ ls -l helloworld.bin
-rwx--x--x 1 gparmer gparmer 41488 Mar 22 08:52 helloworld.bin
$ ./helloworld.bin
hello world!
Recall, 7 = 4 + 2 + 1 which is
r + w + x. We can see
that we’ve given everyone on the system execute permissions,
so the file is again able to execute! Any other user on the
system would also be able to execute the program.
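The same change can be made programmatically with the chmod system call; here is a small sketch (the helper name is ours, and it assumes the file exists):

```c
#include <stdio.h>
#include <sys/stat.h>

/* Set `path` to mode 0711 (rwx--x--x), just as `chmod 711 path` would:
 * full access for the owner, execute-only for the group and everyone else. */
int
make_exec_only_for_others(const char *path)
{
	if (chmod(path, 0711) == -1) {
		perror("chmod");
		return -1;
	}
	return 0;
}
```

Note that the mode is written with a leading 0 in C so the compiler treats it as octal.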
Questions:
mkdir), and seeing what accesses cannot
be made if permissions are removed.chmod to concretely understand when you can
access each other’s files. Remember, you can find out where
your files are with pwd, what groups you’re in
with groups. Which files can another user can
access at that path with sufficient permissions? You can use
the /tmp/ directory to store files that you can
both mutually access, permissions allowing.We’ve seen that each file and directory is associated
with a user, and with a group. Which means that, of course,
we can change those settings! Two commands (and their
corresponding system call: chown):
chown <user> <file/directory> -
change owner of the file/directory to the userchgrp <group> <file/directory>
- change group of the file to the groupUNIX limits who is able to execute these operations.
root is
allowed to chown a file or directory.
root is special in that they are able to change
the permissions of any file/directory, and most of the most
sensitive files on the system are owned by root. For
example, the devices (think, hard-drives holding the
filesystem, monitors, and network devices) are only
accessible by root (check out
/dev/*).root) is allowed to change the group of a file,
and only to a group that user is in (recall
groups).We can see this:
$ chown root helloworld.bin
chown: changing ownership of 'helloworld.bin': Operation not permitted
Question:
We’ve seen that a lot of UNIX security centers around
users, groups, and the corresponding permissions associated
with files and directories. When a process is
forked, it inherits the user and group
of the parent process. Thus, when you’re typing at a shell,
the shell process is what really is executing as
your user id, and when you create programs, they inherit
your user id. These are the core of the security primitives,
so let’s dive into a little more detail.
If every process has a user and group, we must be able to access them, right? There are two basic system calls for retrieving user and group information within a program.
- uid_t getuid(void) - Returns the real user id of the calling process.
- gid_t getgid(void) - Returns the real group id of the calling process.

Let’s look at an example program.
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int
main(int argc, char * argv[])
{
uid_t uid;
gid_t gid;
uid = getuid();
gid = getgid();
printf("uid=%d gid=%d\n", uid, gid);
return 0;
}

Program output:
uid=1000 gid=1000
Every user has different identifiers, as does each group. If another user were to run the same program, they’d get a different value.
Question:

- Have another user run your program; use chmod to make sure that they can! What do you observe?
- How does whoami print out a human-readable string? Use strace whoami to come up with a strong guess.
- How about groups (investigate with strace groups)?

Interestingly, whoever executes the program will have their uid/gid printed out; it won’t use the uid/gid of the owner of the program. Remember that we inherit the uid/gid on fork, thus when the shell runs the program, it uses our shell’s (thus our) uid/gid.
If the UNIX programmatic APIs provide access to user and
group identifiers (i.e. integers), how is it that
we can see strings corresponding to these numeric values?
The mapping of the identifiers to strings are defined in two
places. The first is a file called /etc/passwd
which manages all the users of the system. Here is the
/etc/passwd entry on my local system (the
server uses another means to store user info):
$ cat /etc/passwd | grep gparmer
gparmer:x:1000:1000:Gabe Parmer,,,:/home/ycombinator:/bin/bash
| | | | | | |
V | V | V V V
user | user id | human-readable id home directory shell to use upon login
V V
password group id
We can lookup our username and associate it with a
numerical user id, and a group id for that user. These
numbers are what the system uses to track users, but UNIX
nicely converts these numbers into names for our
convenience. The file additionally holds the path to the
user’s home directory, and their shell. These are used upon
the user logging in: the login logic will switch to the
user’s home directory, and execute their shell. The
password entry is deprecated. The translation
between userid and username is in the password file. The
translation between groupid and group name is in the group
file, /etc/group. Here is the entry for the
administrators on the class’ server:
$ grep CS_Admins /etc/group
CS_Admins:x:1004:aaviv,timwood,gparmer,...
There you can see that the users aaviv,
timwood, and gparmer are all in
the group of administrators.
All of this information, including identifiers, and
human-readable strings can be retrieved with
id.
$ id
uid=1000(gparmer) gid=1000(gparmer) groups=1000(gparmer),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),120(lpadmin),131(lxd),132(sambashare)
Question: Take a moment to explore these files and the commands. See what groups you are in.
Systems that support multiple users are quite complex. They have services used by many users, but that must be protected from the potentially accidental or malicious actions of users. These programs require more privileges than a user to do their job. A few examples:
- Logging: systems keep logs of important events, for example users logging in via ssh and clients connecting to webservers. The logs are stored in files (often in /var/log/), and users must not be able to directly modify (and potentially corrupt) the logs. But they must be able to append to the logs. How can we support this contradiction when a logging process will run with our user id when we execute it!?
- Scheduling computations to run at specific times: at and cron enable exactly this! They require adding entries into system files (e.g. crontab) to record the actions, but one user must not corrupt the entries of another user. Again, we want to both be able to add entries to the files, but cannot modify the file!

We already know one way to solve this problem. We can write a service, and have clients request (e.g. via IPC and domain sockets) that it perform actions on our behalf. So long as a service runs as a different user, its files will be inaccessible to clients.
However, there is another way to solve the above
contradictions. We really want a program to run with a
different user’s permissions – those that go beyond
our own. We want the logger to run with the
permissions of the logging user, and we want
cron to run with the permissions of the
cron service. This is possible with the
set-uid-bits (and set-gid-bits).
The set-*-bits can be seen in the
man chmod manual page. We’re used to
permissions being three digits (for the owner, the group,
and everyone else), but it can be specified as four
digits. The most significant digit can be set on executable
binaries, and will result in the binary executing with the
owner’s uid/gid.
That is, we previously assumed a permission string contained 3 octal digits, but really there are 4 octal digits. The missing octal digit is that for the set-bits. There are three possible set-bit settings and they are combined in the same way as other permissions:
- 4 - set-user-id: sets the program’s effective user id to the owner of the program
- 2 - set-group-id: sets the program’s effective group id to the group of the program
- 1 - the sticky bit: which, when set on a directory, prevents others from deleting files from that directory. We won’t cover the sticky bit.

These bits are used in much the same way as the other permission modes. For example, we can change the permission of our get_uidgid program from before like so:
chmod 6751 get_uidgid
And we can interpret the octals like so:
set group
bits user | other
| | | |
V V V V
110 111 101 001
6 7 5 1
When we look at the ls -l output of the
program, the permission string reflects these settings with
an “s” in the execute part of the string for user and
group.
$ ls -l get_uidgid
-rwsr-s--x 1 aviv scs 8778 Mar 30 16:45 get_uidgid
With the set-bits, when the program runs, the capabilities of the program are effectively that of the owner and group of the program. However, the real user id and real group id remain that of the user who ran the program. This brings up the concept of effective vs. real identifiers:
The system calls getuid and
getgid return the real user and group
identifiers, but we can also retrieve the effective
user and group identifiers with:

- uid_t geteuid(void) - return the effective user identifier for the calling process.
- gid_t getegid(void) - return the effective group identifier for the calling process.

We now have enough to test set-*-bit programs using the following program that prints both the real and effective user/group identities.
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int
main(void)
{
uid_t uid,euid;
gid_t gid,egid;
uid = getuid();
gid = getgid();
printf(" uid=%d gid=%d\n", uid, gid);
euid = geteuid();
egid = getegid();
printf("euid=%d egid=%d\n", euid, egid);
return 0;
}

As the owner of the file, after compilation, the permissions can be set to add set-user-id:
$ gcc 11/get_uidgid.c -o get_uidgid
$ chmod 6755 get_uidgid
$ ls -l get_uidgid
-rwsr-sr-x 1 gparmer gparmer 16880 2022-03-29 08:30 get_uidgid
Notice the s values to denote the
set-*-bits.
Let’s test this set-uid stuff!
$ cp get_uidgid /tmp/
$ chmod 4755 /tmp/get_uidgid
Now you should all have access to the file, with it being set as set-uid.
Question:
Compile and run the get_uidgid program.
What would you expect to happen? What do you observe? Recall
that you can use id to print out both of your
ids and groups.
Run the get_uidgid program I placed in
/tmp/. What is the expected output? What do you
observe?
What happens if we enable a program to create files
while using the set-*-id bits? Let’s use the
following program that will create a file named after each
command line arguments.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
int
main(int argc, char * argv[])
{
int i, fd;
for(i = 0; i < argc; i++){
/* create an empty file */
if((fd = open(argv[i], O_CREAT, 0666)) >= 0){
close(fd);
} else {
perror("open");
}
}
return 0;
}

Run the following program in your home directory (can
also be found in create_files.c), passing as
arguments the names of files you want to create:
./create_files blah will create a file
blah. Check out the owner of the files
(ls -l blah). I’ve also set up the file as
setuid on the server in
/tmp/create_files. Try running the
following
$ /tmp/create_files /tmp/blah
$ ls -l /tmp/blah
...
$ /tmp/create_files ~/blah
...
What do you think the output for the first
... will be – who is the owner of the file? The
second? What are the outputs, and why?
The set-bits automatically start a program with the effective user or group id set; however, there are times when we might want to downgrade privilege or change permission dynamically. There are two system calls to change user/group settings of an executing process:
- setuid(uid_t uid) - change the effective user id of a process to uid
- setgid(gid_t gid) - change the effective group id of a process to gid

The requirements of setuid() (for all users
other than root) is that the effective user id can be
changed to the real user id of the program or to an
effective user id as described in the set-bits. The
root user, however, can downgrade to any user id
and upgrade back to the root user. For setgid()
the user can change the group id to any group the user
belongs to or as allowed by the set-group-id bit.
Now we can look at a program that downgrades and upgrades a program dynamically:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int
main(void)
{
uid_t uid, euid;
gid_t gid, egid;
uid_t saved_euid;
uid = getuid();
gid = getgid();
printf(" uid=%d gid=%d\n", uid, gid);
euid = geteuid();
egid = getegid();
printf("euid=%d egid=%d\n", euid, egid);
saved_euid=euid;
setuid(uid);
printf("---- setuid(%d) ----\n",uid);
uid = getuid();
gid = getgid();
printf(" uid=%d gid=%d\n", uid, gid);
euid = geteuid();
egid = getegid();
printf("euid=%d egid=%d\n", euid, egid);
setuid(saved_euid);
printf("---- setuid(%d) ----\n",saved_euid);
uid = getuid();
gid = getgid();
printf(" uid=%d gid=%d\n", uid, gid);
euid = geteuid();
egid = getegid();
printf("euid=%d egid=%d\n", euid, egid);
return 0;
}

Question:

- What would you expect to happen if this program ran without the set-uid bit?

We previously covered the general desire to have a log in the system of the events for different activities. For example, logging when each user logs in, when clients make web requests of an http server, etc…
We might support two high-level operations:

1. adding logging events to the log, and
2. reading the log.
If we don’t care to censor the log, we can support the
second very easily by, for example, making the log file’s
group the log group, and giving read
(4) permissions to the log for anyone in that
group. Now any user in the log group can read
the log.
However, to support adding logging events to the log, we
might use the set-uid-bit to ensure that a log
program runs with the effective user-id of the logger user.
Thus, it will be able to append to the log in a
straightforward manner.
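A sketch of the append path such a logger might use (the file name and helper are illustrative, not a real logging API): opening with O_APPEND means every write lands at the end of the log, and when the binary is set-uid to the logger user, the open succeeds even though the caller cannot write the file directly.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append one event line to the log. With the set-uid bit, this runs with
 * the logger user's effective uid, so it can open the protected log file. */
int
log_event(const char *msg)
{
	/* O_APPEND: every write is atomically placed at the end of the file */
	int fd = open("example.log", O_WRONLY | O_CREAT | O_APPEND, 0640);

	if (fd < 0) return -1;
	write(fd, msg, strlen(msg));
	write(fd, "\n", 1);
	close(fd);

	return 0;
}
```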
One example of how you might want to use the identity
management APIs is when logging in. The process of logging
in requires that the login program access
sensitive files that include (encrypted) user passwords
(/etc/shadow). Understandably, users don’t
generally have access to these files. Thus, the act of
logging in requires programs running as root to
read in username and password. Only if they match the proper
values for a user, will the login logic then change uid/gid
to the corresponding user and group id, and execute a shell
for the user (as the user). See the figure for more
details.
sudo and su: Increasing Access Rights

There are many times in systems that we want to increase the privileges of a human. For example, for security, it is often not possible to login as root. But it is certainly true that sometimes humans need to be able to run commands as root. For example, updating some software, installing new software, creating new users, and many other operations, all require root.
To control who is able to execute commands as
root, the sudo and su
programs enable restricted access to root privileges.
Specifically, sudo and su will
execute a command as a specified user or switch
to a specified user. By default, these commands execute
as the root user, and you need to know the root password or
have sudo access to use them.
We can see this as the case if I were to run the
get_euidegid program using sudo.
First notice that it is no longer set-group or set-user:
$ sudo ./get_euidegid
[sudo] password for gparmer:
uid=0 gid=0
euid=0 egid=0
After sudo authenticated me, the program’s effective and real user identification becomes 0 which is the uid/gid for the root user.
Who has permission to run sudo commands?
This is important because on many modern unix systems, like
ubuntu, there is no default root password.
Question:

- Why do you think many modern systems avoid a root account with a password, instead relying on sudo and su?

Instead, certain users are deemed to be sudoers or privileged users. These are set in a special configuration file called /etc/sudoers.
aviv@saddleback: lec-23-demo $ cat /etc/sudoers
cat: /etc/sudoers: Permission denied
aviv@saddleback: lec-23-demo $ sudo cat /etc/sudoers
#
# This file MUST be edited with the 'visudo' command as root.
#
# Please consider adding local content in /etc/sudoers.d/ instead of
# directly modifying this file.
#
# See the man page for details on how to write a sudoers file.
#
Defaults env_reset
Defaults mail_badpass
Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# Host alias specification
# User alias specification
# Cmnd alias specification
# User privilege specification
root ALL=(ALL:ALL) ALL
# Members of the admin group may gain root privileges
%admin ALL=(ALL) ALL
# Allow members of group sudo to execute any command
%sudo ALL=(ALL:ALL) ALL
# See sudoers(5) for more information on "#include" directives:
#includedir /etc/sudoers.d
Notice that only root has access to read this file, and
since I am a sudoer on the system I can get
access to it. If you look carefully, you can perform a basic
parse of the settings. The root user has full
sudo permissions, and other sudoers are
determined based on group membership. Users in the
sudo or admin group may run
commands as root, and I am a member of the
sudo group:
$ id
uid=1000(gparmer) gid=1000(gparmer) groups=1000(gparmer),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),120(lpadmin),131(lxd),132(sambashare)
Question:

- How do you think sudo is implemented? What mechanisms from this week’s lecture is it using to provide the requested functionality?

What if we implemented an assignment submission program that copied a student’s directory into the instructor’s home directory? For example, if you wanted to submit the directory of code at my_hw/ for homework HW1,
$ ./3410_submit HW1 my_hw
This program, run by you, has your user and group permissions, but it is able to take your submission and copy/save those submissions to my home directory, with my permissions at a location where you do not have access to write. How is that possible?
Naively, we’d have a problem:
Yikes.
Question:
This material adapted from Prof. Aviv’s material.
It is an unfortunate truth of security that all programs have faults because humans have faults: humans write programs. As such, we will take some time to understand the kinds of mistakes that you, I, and all programmers may make that can lead to security violations.
This is a broad topic area, and we only have a small amount of time to talk about it. We will focus on three classes of attack that are very common for the kinds of programs we’ve been writing in this course.
In isolation, each of these attacks can just make a
program misbehave; however, things get interesting when an
attack can lead to privilege escalation. Privilege
escalation enables a user to maliciously access resources
(e.g. files) normally unavailable to the user, by exploiting
a program with heightened privilege (e.g. due to
setuid bits, or explicit id upgrades) to
perform arbitrary tasks. The ultimate goal of an attacker is
to achieve privilege escalation to root, which
essentially gives them access to all system resources.
You might think that the kernel has more privilege than the root user. However, the root user is given access to the /dev/mem pseudo-device that is “all of memory” including all of the kernel’s memory, and root has the ability to insert dynamic libraries (called “kernel modules”) into the kernel to dynamically update kernel code! Being root truly is a super-power!
Each of these topics are quite nuanced, and the hope is to give you a general overview so you can explore the topic more on your own.
We’ve been using path lookup throughout the class. Perhaps the best example is when we are in the shell and type a command:
$ cat Makefile
BINS=$(patsubst %.c,%,$(wildcard *.c))
all: $(BINS)
%: %.c
gcc -o $@ $<
clean:
rm -rf $(BINS)
The command cat is run, but the program that
is actually cat’s the file exists in a different place in
the file system. We can find that location using the
which command:
$ which cat
/bin/cat
So when we type cat, the shell looks for the
cat binary, finds it in /bin/cat,
and executes that. Recall when we discussed environment
variables that binaries are found by looking in all of the
directories in the PATH environment
variable.
$ echo $PATH
/home/gparmer/.local/bin/:/home/gparmer/.local/bin:/home/gparmer/.cargo/bin:/home/gparmer/.local/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/gparmer/local/bin:/usr/racket/bin/:/home/gparmer/local/bin:/usr/racket/bin/
Let’s look at that by converting each :, which
separates the directories, into a newline:
$ echo $PATH | tr : "\n"
/home/gparmer/.local/bin/
/home/gparmer/.local/bin
/home/gparmer/.cargo/bin
/home/gparmer/.local/bin/
/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/sbin
/bin
/usr/games
/usr/local/games
/snap/bin
/home/gparmer/local/bin
/usr/racket/bin/
/home/gparmer/local/bin
/usr/racket/bin/
Each of the directories listed is searched, in order, until the command’s program is found.
Environment variables are global variables set
across programs that provide information about the current
environment. The PATH environment variable is a
perfect example of this, and the customizability of the
environment variables. If you look at the directories along
my path, I have a bin directory in my home directory so I
can load custom binaries. The fact that I can customize the
PATH in this way can lead to some interesting
security situations.
system()

To help this conversation, we need to introduce a library function that works much like execvp() combined with fork(), like we’ve done all along, but more compact. Here’s an abridged description from the manual page:
NAME
system - execute a shell command
SYNOPSIS
#include <stdlib.h>
int system(const char *command);
DESCRIPTION
system() executes a command specified in command by calling
/bin/sh -c command, and returns after the command has been
completed. During execution of the command, SIGCHLD will be
blocked, and SIGINT and SIGQUIT will be ignored.
That is, the system() function will run an
arbitrary shell command, inheriting all of the file
descriptors of the caller of system. Let’s look
at a very simple example, a “hello world” that uses two
commands.
#include <stdio.h>
#include <stdlib.h>
int
main(void)
{
/*
* Execute `cat`. It inherits the stdin/stdout/stderr
* of the current process.
*/
system("cat");
return 0;
}

$ echo "Hello World" | ./system_cat
Hello World
The system_cat program runs cat with
system(), thus it will print whatever it reads
from stdin to the screen (recall: cat
essentially echoes stdin to stdout). It turns out that this
program, despite its simplicity, actually has a relatively
bad security flaw. Let’s quickly consider what might happen
if we were to change the PATH value to
include our local directory:
$ export PATH=.:$PATH
$ echo $PATH
./:/home/gparmer/.local/bin/:/home/gparmer/.local/bin:/home/gparmer/.cargo/bin:/home/gparmer/.local/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/gparmer/local/bin:/usr/racket/bin/:/home/gparmer/local/bin:/usr/racket/bin/
$ echo $PATH | tr : "\n" | head -n 2
./
/home/gparmer/.local/bin
The export builtin command in our shell
updates an environment variable. We see that at the start of
the PATH variable is now the current directory!
system (and execvp) will look for
the program in the current directory before the
others. So now the local directory is on the path, thus if I
were to create a program named cat, then that cat would run
instead of the one you would expect. For example, here is
such a program (in fake_cat.c):
#include <stdlib.h>
int
main(void)
{
system("echo 'Goodbye World'");
return 0;
}

This is our imposter program! In this case it is
innocuous, but you can imagine it being more nefarious. To
do this, we’ll make a new cat command
in the current directory.
$ echo "Hello World" | ./system_cat
$ cp fake_cat cat
$ echo "Hello World" | ./system_cat
Goodbye World
This is not just a problem with the system()
command, but also execvp(), which will also
look up commands along the path.
#include <unistd.h>
int
main(void)
{
char * args[] = { "cat", NULL };
execvp(args[0], args);
return 0;
}

$ rm ./cat
$ echo "Hello World" | ./execvp_cat
Hello World
$ cp fake_cat cat
$ echo "Hello World" | ./execvp_cat
Goodbye World
Question:
How do we fix this? There are two solutions:
setenv()

You can actually control the current PATH setting during execution. To do this you can set environment variables using setenv() and getenv():
#include <stdlib.h>
#include <unistd.h>
int
main(void)
{
/* ensure the environment only has the path we want, and overwrite */
setenv("PATH","/bin",1);
char * args[] = { "cat", NULL };
execvp(args[0], args);
return 0;
}

$ rm ./cat
$ echo "Hello World" | ./setenv_cat
Hello World
$ cp fake_cat cat
$ echo "Hello World" | ./setenv_cat
Hello World
Success! Because we used a controlled and known
PATH, we can avoid the attacker-chosen
cat from executing.
The previous examples were bad, as we made the program execute differently than the programmer intended. But we didn’t necessarily see the risk of privilege escalation yet. So let’s see what happens when we introduce set-uid to the mix. This time, I’ve set the program system_cat to be set-user-id in a place where others can run it:
$ chmod 4775 system_cat
$ ls -l system_cat
-rwsrwxr-x 1 gparmer gparmer 16704 Apr 4 13:01 system_cat
Now let’s imagine there is another user on the system
hacker-mchackyface, and as that user we update
the PATH…
$ export PATH=.:$PATH
…and do something this time that is not so innocuous.
Let’s run a shell after changing our real user id to that
of gparmer. Yikes.
#include <stdlib.h>
#include <unistd.h>
int
main(void)
{
char * args[]={"/bin/sh",NULL};
/* set our real uid to our effective uid */
setreuid(geteuid(),geteuid());
execvp(args[0],args);
return 0;
}

This time when we run the system_cat program, it will run the shell as the set-user-id user. Now we’ve just escalated privilege.
$ whoami
hacker-mchackyface
$ ./system_cat
$ whoami <---- new shell!
gparmer
$
Uh oh. Now user hacker-mchackyface has
escalated their privilege to that of user
gparmer!
Path Attacks Summary: Multiple UNIX APIs
rely on the PATH environment variable to help
us find a program we want to execute. The attacker can
modify PATH by adding their attack path, and
ensure that a program with the same name as the target is in
that path. This will lead to the program executing the
attacker’s code! Should this attack exist in a program that
is set-uid, the attacker can now execute the
program at the higher privilege.
LD_PRELOAD Attacks?

We might be clever, and think that we can use
LD_PRELOAD to take over some of the library
APIs with our own code! Fortunately, this doesn’t work. The
dynamic linker (ld.so) will only use the
LD_PRELOAD environment variable if the real
user id is the same as the effective user id! A simple
solution to a threatening problem.
Now, let's consider a situation where you use system() in a safer way: you call all programs with absolute paths, or you update and control PATH before using it. Unfortunately, even that is not enough. You must also consider injection attacks, in which the attacker injects code that the program will then run. In our case, the injected code will be shell commands.
Consider this program, which prompts the user for a file to cat out.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
	char cmd[1024] = "/bin/cat ./"; /* will append to this string */
	char input[40];

	printf("What input do you want to 'cat' (choose from below)\n");
	system("/bin/ls");  /* show available files */
	printf("input > ");
	fflush(stdout);     /* force stdout to print */
	scanf("%s", input); /* read input */
	strcat(cmd, input); /* create the command */
	printf("Executing: %s\n", cmd);
	fflush(stdout);     /* force stdout to print */
	system(cmd);

	return 0;
}
If we were to run this program, it hopefully does what you'd expect:
$ ./inject_system
What input do you want to 'cat' (choose from below)
01_lecture.md cat execvp_cat execvp_cat.c fake_cat fake_cat.c inject_system inject_system.c Makefile setenv_cat setenv_cat.c system_cat system_cat.c
input > Makefile
Executing: /bin/cat ./Makefile
BINS=$(patsubst %.c,%,$(wildcard *.c))
all: $(BINS)
%: %.c
gcc -o $@ $<
clean:
rm -rf $(BINS)
Ok, now consider if we were to provide input that doesn’t fit this model. What if we were to provide shell commands as input instead of simply the file name?
$ ./inject_system
What input do you want to 'cat' (choose from below)
01_lecture.md cat execvp_cat execvp_cat.c fake_cat fake_cat.c inject_system inject_system.c Makefile setenv_cat setenv_cat.c system_cat system_cat.c
input > ;echo
Executing: /bin/cat ./;echo
/bin/cat: ./: Is a directory
$
The input we provided was ;echo. The semicolon closes off a shell command, allowing a new one to start. Specifically, the programmer thought that the command to be executed would only take the shape /bin/cat ./<file> for some user-specified <file>. Instead, we make the command execute /bin/cat ./;echo! Notice that there is an extra newline printed; that was the echo printing. This is pretty innocuous; can we get this program to run something more interesting?
We still have the cat program we wrote that
prints “Goodbye World” and the PATH is set up
to look in the local directory. Setting that up, we get the
following result:
$ ./inject_system
What input do you want to 'cat' (choose from below)
01_lecture.md cat execvp_cat execvp_cat.c fake_cat fake_cat.c inject_system inject_system.c Makefile setenv_cat setenv_cat.c system_cat system_cat.c
input > ;cat
Executing: /bin/cat ./;cat
/bin/cat: ./: Is a directory
Goodbye World
At this point, we own this program (pwn in
the parlance) – we can essentially get it to execute
whatever we want. If the program were set-uid,
we could escalate our privilege.
Injection Attacks Summary: When the user
can enter commands that are passed to functions that will
execute them, we must be exceedingly careful about
sanitizing the input. For shell commands, this
often means avoiding ; and | in
inputs. Interestingly, one of the most common attacks on
webpages is SQL injection. When user input
is used to create SQL commands, an attacker might add
; followed by additional queries. For example,
we might append ; DROP TABLE USERS to
destructively harm the database!
Two attacks down, moving onto the third. Let’s now assume that the programmer has wised up to the two previous attacks. Now we are using full paths to executables and we are scrubbing the input to remove any potential bash commands prior to execution. The result is the following program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
	char cmd[1024] = "/bin/cat ./"; /* will append to this string */
	char input[40];
	int i;

	printf("What input do you want to 'cat' (choose from below)\n");
	system("/bin/ls");  /* show available files */
	printf("input > ");
	fflush(stdout);     /* force stdout to print */
	scanf("%s", input); /* read input */
	/* clean input before passing to /bin/cat */
	for (i = 0; i < 40; i++) {
		if (input[i] == ';' || input[i] == '|' || input[i] == '$' || input[i] == '&') {
			input[i] = '\0'; /* change all ;,|,$,& to a NULL */
		}
	}
	/* concatenate the two strings */
	strcat(cmd, input);
	printf("Executing: %s\n", cmd);
	fflush(stdout);
	system(cmd);

	return 0;
}
$ ./overflow_system
What input do you want to 'cat' (choose from below)
cat execvp_cat inject_system inject_system.c~ Makefile overflow_system overflow_system.c~ run_foo.c setenv_cat setenv_cat.c~ system_cat #system-ex.c#
cat.c execvp_cat.c inject_system.c input.txt mal-lib overflow_system.c run_foo run_foo.o setenv_cat.c shared-lib system_cat.c
input > ;cat
Executing: /bin/cat ./
/bin/cat: ./: Is a directory
This time, no dice, but the jig is not up yet. There is an overflow attack.
To increase the input size programmatically, we'll use a small trick to print a bunch of As:
$ python -c "print 'A'*10"
AAAAAAAAAA
$ python -c "print 'A'*30"
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
$ python -c "print 'A'*50"
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
This produces strings of varying lengths 10,
30, and 50. Those strings can be
sent to the target program using a pipe. And we can see the
result:
$ python -c "print 'A'*10" | ./inject_system
What input do you want to 'cat' (choose from below)
01_lecture.md cat execvp_cat execvp_cat.c fake_cat fake_cat.c inject_system inject_system.c Makefile overflow_system.c setenv_cat setenv_cat.c system_cat system_cat.c
input > Executing: /bin/cat ./AAAAAAAAAA
/bin/cat: ./AAAAAAAAAA: No such file or directory
$ python -c "print 'A'*30" | ./inject_system
What input do you want to 'cat' (choose from below)
01_lecture.md cat execvp_cat execvp_cat.c fake_cat fake_cat.c inject_system inject_system.c Makefile overflow_system.c setenv_cat setenv_cat.c system_cat system_cat.c
input > Executing: /bin/cat ./AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
/bin/cat: ./AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA: No such file or directory
$ python -c "print 'A'*50" | ./inject_system
What input do you want to 'cat' (choose from below)
01_lecture.md cat execvp_cat execvp_cat.c fake_cat fake_cat.c inject_system inject_system.c Makefile overflow_system.c setenv_cat setenv_cat.c system_cat system_cat.c
input > Executing: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
sh: 1: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA: not found
Something changes when the input string is 50 bytes long: we overflow the input buffer.
Recall that the input buffer is only 40 bytes in size, and it is placed adjacent to the cmd buffer:
char cmd[1024] = "/bin/cat ./"; /* will append to this string */
char input[40];
When the input buffer overflows, we begin to
write As to cmd which replaces
/bin/cat with whatever string is written at
that location! Yikes. Finally, we concatenate
cmd with input, resulting in a long string of
As for the command being executed by
system().
How do we leverage this error to pwn this program? The
program is trying to execute a command that is
AAA... and we can control the
PATH. Let's get dangerous. Let's create such a
program named AAA..., so that it is executed
when we overflow!
$ cp cat AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
$ ./AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Goodbye World
Now when we execute the overflow, we get our desired “Goodbye World” result:
$ python -c "print 'A'*50" | ./inject_system
What input do you want to 'cat' (choose from below)
01_lecture.md cat fake_cat inject_system.c setenv_cat system_cat.c
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA execvp_cat fake_cat.c Makefile setenv_cat.c
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA execvp_cat.c inject_system overflow_system.c system_cat
input > Executing: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Goodbye World
Uh-oh: we’re executing attacker-controlled code again!
How do we prevent overflows like this? The most direct way is to always “bounds check” accesses to strings, just as Java and Python do. For example, always use library functions like strncpy() or strncat(), which bound their writes so that input cannot overflow its destination buffer – though even that is sometimes not sufficient. This can all get quite complex; this is an introduction to the issue, and not the final say.
Find the slides.
C gives you quite a bit of freedom in how you write your code. Conventions are necessary in C because they bring order to the unrestricted freedom that allows code that is too complicated to debug, too clever to understand, and completely unmaintainable. Conventions are a set of rules that we follow that inherently limit the way in which we write code. The intention is to put guard rails onto our development process and to only deviate from the conventions in exceptional cases. An important aspect of this is that once you understand the conventions, it is drastically easier to read the code because you only have to consider how convention-based code is written. This means that there is a significant importance to uniformly applying conventions. Because of this, it is less important if one likes the conventions, than it is to adhere to them for uniformity.
Put another way, it is easy for any programmer to code themselves into a corner where the program is more complex than our brains can handle. Being an experienced programmer is much less about being able to handle more complexity, and much more about understanding that you need to spend significant development time simplifying your code. Conventions are a set of codified simplifications, in some sense. They provide you a set of rules (that you don't need to think about; you just follow them) to simplify the structure of your code.
OK, onward to conventions!
There are many potential naming conventions, for example, camel case, snake case, and Hungarian notation. Some are antiquated (Hungarian), but most are a matter of taste and convention.
Convention. Most C uses lower-case snake_case for variables, types, and functions, and capital SNAKE_CASE for constants and macros.
Stick to this convention. A new convention will likely not feel “right” at first because it is new and different. Remember that conventions are subjective, but the aim is to apply them uniformly.
For names that are visible across .c files
(i.e. functions in a .h file), use names that
are the shortest they can be to convey the functionality.
Within functions, names can be short, and here there are a
few conventions as well:
ret is a variable that stores a return value (i.e. a value that will later be used as in return ret;).
i, j, k, l are iterator variables.
When you create a global variable, type, or function, you're creating a symbol that can be accessed from any .c file. This has a significant downside: if you choose a name that is already used in a library, or that is common, you might get a linker error along the lines of:
multiple definition of `foo'; /tmp/ccrvA8in.o:a.c:(.text+0x0): first defined here
...
collect2: error: ld returned 1 exit status
This means that the naming choices each application and library makes can easily cause such a problem.
Convention. Thus, the convention is that you should “namespace” your functions and variables by pre-pending them with a functionality-specific prefix. For example, a key-value store might have all of its types, variables, and functions prefixed with
kv_, for example,kv_get.
C does not have classes and the rules therein to support
per-class/object encapsulation. Instead, C views each
.c file as a unit of visibility. Functions and
global variables (known as “symbols”) can be hooked up with
references to them in other objects by the linker. However,
it is possible to write code in a .c file that
is not visible outside of that file. In contrast to
the linking error above emphasizing good naming, if a symbol
is marked as static, no other .c
file will be able to reference that symbol.
static int global_var;
static void foo(void) { }
No other .c file can define code that will
be able to access global_var nor call
foo. You can see the list of all symbols within
a .c file that are accessible outside of it by
compiling into a .o file (using
gcc -c ...), and
nm -g --defined-only x.o | awk '{print $3}'
(replace x with your object file name).
Convention. All functions and variables that are not part of the public interface of the .c file should be static so that they cannot be used from other objects.
This convention enables .c files to support
encapsulation. It is also important to support encapsulation
at compile time (which is before linking). We want our
compiler to tell us if we try and access a symbol we
shouldn’t be able to access. Header files are
#included into .c files, thus
providing access to the symbol names and types, so that the
.c file can compile while referencing symbols
in another.
Convention. The focus of header files should be to provide the types for the data and functions of a public interface, and nothing more.
When we include a file, it just copies the file at the
point of the #include. Thus, if the same header
is included separately into two .c files, then
it effectively replicates the header’s contents into the two
.o files. This can cause problems if the two
.o files are linked together. An example:
a.c:
#include "c.h"
b.c:
#include "c.h"
int main(void) { return 0; }
c.h:
int foo(void) { return -1; }
After compiling these files, we get
$ gcc -c a.c -o a.o
$ gcc -c b.c -o b.o
$ gcc a.o b.o -o test
/usr/bin/ld: b.o: in function `foo':
b.c:(.text+0x0): multiple definition of `foo'; a.o:a.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status
Because foo was replicated into each
.o file, when they were linked together the
linker said “I see two foos!”. Note that we
would have avoided this if we put only the function
declaration (i.e. the type information) in the
.h file (e.g. int foo(void);).
Convention. Never put global data in a
.hfile. Only put type information (e.g. function declarations) in.hfiles27.
You must include the header files that provide the functionality that you rely on and use. This sounds obvious, but it is very easy to mess this up. A couple of examples:
Example 1: Implementing a library. You’re
implementing a library, and you have a .h file
for it that exposes the types of the library’s functions.
You test your library, and it works wonderfully – victory!
However, because it is a library, it might be used by other
people’s code, and all they are going to do is
#include <your_lib.h>, and expect
everything to work.
What if the following happened? Your
your_lib.h file does this:
/* ... */
size_t my_function(struct my_type *t);
/* ... */
In your test file, you do this:
#include <stddef.h>
#include <your_lib.h>
/* ... */
Your tests of your library work wonderfully! But if the user of your library simply does:
#include <your_lib.h>
/* ... */The user will get a compiler error that says it doesn’t
understand what size_t is! The core problem is
that your library’s header file didn’t include all of the
headers it relies on! Instead, it should updated to look
like so:
#include <stddef.h> /* we require `size_t` */
/* ... */
size_t my_function(struct my_type *t);
/* ... */
A good way to code defensively, and to minimize the chance that this error happens, is to include your own header files before the rest of the header files. The only real solution, though, is to be very careful, and always include the headers that are necessary for the functionality you depend on.
Example 2: implementing a program or test. A .c file, let's say my_prog.c, includes some header file, head.h. It compiles just fine. But if my_prog.c relies on some header file – let's say dep.h – that is only included via the head.h file, then if head.h is updated to remove that include, it will break any program that depends on it! In this case, my_prog.c should explicitly include dep.h: include the headers that are required for your functionality!
How do you know which headers to include? If
you’re using a system function, the man page
for it is explicit about which headers provide the necessary
functionality. If you’re implementing a library in the
class, we are explicit about what headers are required.
Syntactic nesting, or depth, often significantly increases the cognitive complexity of C code28. Two examples of this:
Depth through indentation nesting. Each
additional nesting level (each nested set of
{}) adds another condition that you must keep
in your mind29 when
understanding the code. This includes the conditions, or
looping conditions associated with the nesting level. This
is a lot to keep in your head, and makes developing
and reading code quite a bit more difficult.
What can you do to avoid this nesting? First, return early for errors. Instead of the following code…
if (a) {
/* normal code */
} else {
/* error code */
return;
}…do the following…
if (!a) {
/* error code */
return;
}
/* normal code */
Though seemingly trivial, this enables you to separate your understanding of error conditions from the complexity of the following code. Try to perform as much error checking as early as possible, so that you can then switch into “normal case” thinking.
Second, if you nest deeper than four levels of brackets, then pull code out into appropriately-named functions. Long functions aren’t necessarily bad, but complex functions are.
Convention.
- Separate error handling where possible, and put it as close to the top of the function as possible.
- If you nest brackets deeper than 4 levels, pull logic out into separate, appropriately named functions.
I use indentation levels in C of eight visual spaces to make it visually clear when I’m indenting too much.
Depth through field nesting. It can be quite difficult to follow code that nests access to fields. For example:
a.b->c[d].e = f.g->h;
Of course, this is fake code, but even if it weren't, keeping track of the types at each level of increasing depth of field reference is nearly impossible. This makes writing the code fickle and challenging, and reading the code needlessly complex.
The main means we have to remove this complexity is to create a set of typed, intermediate variables (the struct tags here are hypothetical):
struct b *b = &a.b;
struct c *c = b->c;
struct g *g = &f.g;
c[d].e = g->h;
It is up to you to decide where it is right to provide an “intermediate” type for a certain depth in the fields, and when to directly dive into a few fields.
If any of the structs (or sub-structs, or arrays) represents an API layer, then you can call a function that performs the necessary operation on that part of the data-structure. This both provides abstraction over the data access, and documents the types.
Convention. When faced with walking through nested composite data-structures, use intermediate variables to make types explicit, or functions to abstract operations.
C is an imperative programming language, as it came before Object Orientation (OO) took hold. That said, the OO concept of encapsulation behind an abstract data type is a universal requirement of programming in complex environments. Though C doesn't support this explicitly, conventions can provide much of the same benefit. Such an abstraction requires a datatype, a set of methods that operate on it, and means to create and destroy it.
If we wanted to provide a key-value datatype, kv, we'd provide these in C by
a struct kv for the datatype in which the rest of the data is contained,
(in kv.h) a set of functions with names all prefixed with kv_, that take the kv to operate on as the first argument (e.g. void *kv_get(struct kv *kv, int key);) – this first argument acts as “this” in Java, and
allocation and deallocation functions, e.g. struct kv *kv_alloc(...) and void kv_free(struct kv *kv)30.
C can support polymorphism, inheritance, dependency injection and composition as well, but we'll leave that for a future discussion.
Convention.
- Meticulously name your header files, your structure, and the methods to act on the datatype to associate them with each other.
- Pass the object to be acted on as the first argument to each of the “methods” to act as the C equivalent of “this”.
- Provide allocation and deallocation functions for the datatype.
With this, we encourage the client (user of the API) to think of the datatype as encapsulated.