+compiling:
+
+ Both glibc and sun's libc require 64-bit atomic operations, first found in
+ the Pentium Pro. The suggested method of compiling on 32-bit x86 is to set
+ CC='gcc -march=i586'.
+
+headers:
+
+ In order to avoid duplicating OpenSolaris-specifc headers, most extensions
+ define any constants/structs in accompanying private/"P" headers. For
+ example, zone_* is implemented in zone.c and constants/structs are defined in
+ zoneP.h.
+
+auxiliary vector (auxv_t):
+
+ Proper OpenSolaris does not support statically-linked executables (i.e. via
+ gcc -static). However, glibc does, but with certain restrictions. The kernel
+ only builds the auxv_t if the elf file is not of type ET_EXEC or if the
+ PT_INTERP program header exists. This means that dynamically-linked
+ executables and libaries get an auxv_t while statically-linked executables
+ don't. This means that statically-linked executables won't see PT_TLS, which
+ is needed for __thread support. We can test for the SHARED macro for libc
+ library code, but in general, __thread will not work for statically-linked
+ executables.
+
+ In order to fix this, it should be a matter of changing the kernel to
+ unconditionally supply the auxv_t.
+
scheduling:
The OpenSolaris kernel allows for loadable schedule classes. A scheduling
a class id with a posix scheduler (i.e. SCHED_*). The only exception is
SCHED_SYS, which is guaranteed to have cid == 0.
+ The following schedulers are defined:
+
+ SCHED_* | Name | Min Prio | Max Prio |
+ -----------------------------------------------
+ SCHED_OTHER | TS | -60 | 60 |
+ SCHED_FIFO | RT | 0 | 59 |
+ SCHED_RR | RT | 0 | 59 |
+ SCHED_SYS | SYS | 60 | 99 |
+ SCHED_IA | IA | -60 | 60 |
+ SCHED_FSS | FSS | -60 | 60 |
+ SCHED_FX | FX | 0 | 60 |
+
+ Internally the maximum and minimum are stored in a short int. Further, for
+ mutexes, the ceiling is stored in a uint8_t. This means that it can be
+ assumed that priorities must be between -128 and 127.
+
+privileges:
+
+ Each process has a set of privileges, represented by a prpriv_t. This struct
+ contains a header, followed by a number of priv_chunk_t blocks (the priv
+ sets), and finally followed by a number of priv_info_t blocks (per process
+ additional info).
+
threads:
The sun libpthread/libthread implementation assumes a 1:1 mapping between
mutex_waiters (8-bits): This is set to 1 when there is another thread
waiting and 0 when there are no other waiters.
- mutex_spinners (8-bits): This byte is apparently unused. We reuse it (as
- mutex_cond_waiters) to mark a mutex that has been unlocked by cond_wait
- (hence it will be reacquired later).
+ mutex_spinners (8-bits): This byte is apparently unused.
+
+ mutex_ownerpid (32-bits): Set to the mutex owner's process pid when the
+ mutex is shared.
+
+ The data field (aka mutex_owner) is used by sun libc to store a pointer to
+ the thread-descriptor of the owning thread. We split this 64-bit field into
+ two fields:
- mutex_ownerpid (32-bits):
+ mutex_owner (32-bits): The lwpid of the owning thread.
- The mutex_owner field is a 64-bit field that stores a pointer to the owning
- thread descriptor. It is not checked in the kernel and is only cleared by
- the kernel during cleanup.
+ mutex_cond_waiters (32-bits): An in-use counter that is incremented when
+ waiting on a condition and decremented when we return (or are cancelled).
+
+ The kernel only touches the data field when it is cleared during cleanup for
+ certain mutex types.
The kernel does not handle recursive or error-checking mutexes.
The kernel does not set mutex_lockbyte for mutexes with the
LOCK_PRIO_INHERIT bit set.
+ The kernel does not use data.flag2. We use this to track the current priority
+ ceiling (mutex_real_ceiling) for LOCK_PRIO_PROTECT mutexes.
+
semaphore:
condition variable:
condition variable and 0 otherwise. The cond_waiters_user byte is not
used by the kernel.
- The only clock types supported are CLOCK_REALTIME and CLOCK_HIGHRES.
+ The only clock types supported by sun libc are CLOCK_REALTIME and
+ CLOCK_HIGHRES.
+
+ The data field is not used by the kernel.
reader-writer lock:
private implementation that used the embedded mutex and cv's would also work
correctly in the shared case.
- Three additional fields are included for tracking the owner (thread and
- process) of a reader-writer lock.
+ Our implementation adds three additional fields for tracking the owner (thread
+ and process) of a reader-writer lock.
[0] http://docs.sun.com/app/docs/doc/819-2243/rwlock-init-3c?a=view
+
+nsswitch:
+
+ nss_search
+
+ This is used to search a database given a key. Examples that use nss_search
+ include gethostbyname_r and _getauthattr.
+
+ nss_getent
+ nss_setent
+ nss_endent
+ nss_delete
+
+ These are used when for iterating over a database. nss_getent, nss_sent,
+ and nss_endent are used in gethostent, sethostent, and endhostent,
+ respectively. nss_delete is used to free resources used by the interation;
+ it usually directly follows a call to nss_endent.
+
+ _nss_XbyY_fgets
+
+ This function is used to parse a file directly, rather than going through
+ nsswitch.conf and its databases.
+
+syscalls:
+
+ Dealing with 64-bit returns in 32-bit code is tricky. For 32-bit x86, %eax
+ and %edx are not saved across function calls. Since syscalls always return
+ a 32-bit integer we always have to push/pop %eax across functions. However,
+ since there are very few 64-bit returning syscalls, we don't save %edx unless
+ we have a 64-bit returning syscall. The following is a list of 64-bit
+ returning syscalls:
+
+ getgid, getuid, getpid, forkx, pipe, lseek64
+
+ Currently, the only time we actually call functions is in the case of
+ cancellation points (we call pthread_async_enable/disable). lseek64 is the
+ only syscall listed above that is a cancellation point. To deal with this,
+ we define SYSCALL_64BIT_RETURN in lseek64.S, which triggers inclusion of
+ %edx saving.
+
+ Additionally, 64-bit returning syscalls set both %eax and %edx to -1 on
+ error. Similarily this behaviour is enabled by SYSCALL_64BIT_RETURN. Note
+ that getegid, geteuid, and getppid are special in that their libc
+ equivalents actually return 32-bit integers so we don't need to worry about
+ %edx on error. With forkx and pipe, it suffices to check only the lower
+ 32-bits (one of the pid/fd's returned) for -1. For lseek64 we do have to
+ check the full 64-bit return for -1.
+
+sysconf:
+
+ Many of the _SC_ sysconf values are obtained via the systemconf syscall. The
+ following is a table of mappings from _SC_ to _CONFIG_ values. The third
+ column lists the value returned by sysdeps/posix/sysconf.c.
+
+ _SC_CHILD_MAX _CONFIG_CHILD_MAX _get_child_max
+ _SC_CLK_TCK _CONFIG_CLK_TCK _getclktck
+ _SC_NGROUPS_MAX _CONFIG_NGROUPS NGROUPS_MAX
+ _SC_OPEN_MAX _CONFIG_OPEN_FILES __getdtablesize
+ _SC_PAGESIZE _CONFIG_PAGESIZE __getpagesize
+ _SC_XOPEN_VERSION _CONFIG_XOPEN_VER _XOPEN_VERSION
+ _SC_STREAM_MAX _CONFIG_OPEN_FILES STREAM_MAX
+ _SC_NPROCESSORS_CONF _CONFIG_NPROC_CONF __get_nprocs_conf
+ _SC_NPROCESSORS_ONLN _CONFIG_NPROC_ONLN __get_nprocs
+ _SC_NPROCESSORS_MAX _CONFIG_NPROC_MAX
+ _SC_STACK_PROT _CONFIG_STACK_PROT
+ _SC_AIO_LISTIO_MAX _CONFIG_AIO_LISTIO_MAX AIO_LISTIO_MAX
+ _SC_AIO_MAX _CONFIG_AIO_MAX AIO_MAX
+ _SC_AIO_PRIO_DELTA_MAX _CONFIG_AIO_PRIO_DELTA_MAX AIO_PRIO_DELTA_MAX
+ _SC_DELAYTIMER_MAX _CONFIG_DELAYTIMER_MAX DELAYTIMER_MAX
+ _SC_MQ_OPEN_MAX _CONFIG_MQ_OPEN_MAX MQ_OPEN_MAX
+ _SC_MQ_PRIO_MAX _CONFIG_MQ_PRIO_MAX MQ_PRIO_MAX
+ _SC_RTSIG_MAX _CONFIG_RTSIG_MAX RTSIG_MAX
+ _SC_SEM_NSEMS_MAX _CONFIG_SEM_NSEMS_MAX SEM_NSEMS_MAX
+ _SC_SEM_VALUE_MAX _CONFIG_SEM_VALUE_MAX SEM_VALUE_MAX
+ _SC_SIGQUEUE_MAX _CONFIG_SIGQUEUE_MAX SIGQUEUE_MAX
+ _SC_SIGRT_MAX _CONFIG_SIGRT_MAX
+ _SC_SIGRT_MIN _CONFIG_SIGRT_MIN
+ _SC_TIMER_MAX _CONFIG_TIMER_MAX TIMER_MAX
+ _SC_PHYS_PAGES _CONFIG_PHYS_PAGES __get_phys_pages
+ _SC_AVPHYS_PAGES _CONFIG_AVPHYS_PAGES __get_avphys_pages
+ _SC_COHER_BLKSZ _CONFIG_COHERENCY
+ _SC_SPLIT_CACHE _CONFIG_SPLIT_CACHE
+ _SC_ICACHE_SZ _CONFIG_ICACHESZ
+ _SC_DCACHE_SZ _CONFIG_DCACHESZ
+ _SC_ICACHE_LINESZ _CONFIG_ICACHELINESZ
+ _SC_DCACHE_LINESZ _CONFIG_DCACHELINESZ
+ _SC_ICACHE_BLKSZ _CONFIG_ICACHEBLKSZ
+ _SC_DCACHE_BLKSZ _CONFIG_DCACHEBLKSZ
+ _SC_DCACHE_TBLKSZ _CONFIG_DCACHETBLKSZ
+ _SC_ICACHE_ASSOC _CONFIG_ICACHE_ASSOC
+ _SC_DCACHE_ASSOC _CONFIG_DCACHE_ASSOC
+ _SC_MAXPID _CONFIG_MAXPID
+ _SC_CPUID_MAX _CONFIG_CPUID_MAX
+ _SC_EPHID_MAX _CONFIG_EPHID_MAX
+ _SC_SYMLOOP_MAX _CONFIG_SYMLOOP_MAX SYMLOOP_MAX
+
+fgetattr, fsetattr, getattrat, setattrat:
+
+ The *attr calls are new to OpenSolaris and are similar to Linux's extended
+ attributes functions. They are implemented as openat(fd, attr_name, O_XATTR)
+ and then read/written via the respective syscall.