# Resizable arrays in optimal time and space relationship

See Demaine, E., Munro, J.I., Sedgewick, R.: Resizable arrays in optimal time and space. In the cost model used below, the cost of resizing an array is the number of elements moved. Although an array name can be treated as a pointer at times, an array and a pointer are not the same thing. The realloc function, illustrated in the following section, is used to resize an array.

**Data structures: Array implementation of stacks**

We will use the realloc function to allocate additional space by a fixed increment. The getLine function described here first allocates a buffer with malloc. If malloc is unable to allocate memory, the first if statement forces the function to return NULL. The function then enters an infinite loop in which the characters are processed one at a time. Within the while loop, a character is read in. If it is a carriage return, the loop is exited. Otherwise, the character is added at the current position within the buffer.

When the buffer is full, realloc is called to obtain a new block of memory. This block is sizeIncrement bytes larger than the old one. If realloc is unable to allocate memory, we free the existing allocated memory and force the function to return NULL. Otherwise, currentPosition is adjusted to point to the right position within the new buffer, and we assign the variable buffer to point to the newly allocated buffer.

The realloc function will not necessarily keep your existing memory in place, so you have to use the pointer it returns to find your new, resized memory block. We store that pointer in a separate variable, not buffer, so that the original pointer is not lost if realloc is unable to allocate memory. This allows us to detect and handle the condition. We do not free buffer when realloc succeeds because realloc, if it moves the block, copies the contents to the new location and frees the old buffer itself.
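The steps described above can be sketched as follows. This is a minimal reconstruction, not the original listing: the name `getLineFrom`, the `FILE *` parameter, the increment of 10 bytes, and the use of a newline or end of file as the line terminator are all assumptions made so that the sketch is self-contained.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variant of the getLine function described in the text.
   It reads from any stream (an assumption) instead of only stdin. */
char *getLineFrom(FILE *stream) {
    assert(stream != NULL);
    const size_t sizeIncrement = 10;      /* assumed growth step in bytes */
    size_t maximumLength = sizeIncrement;
    size_t length = 0;

    char *buffer = malloc(maximumLength);
    if (buffer == NULL) {
        return NULL;                       /* initial allocation failed */
    }
    char *currentPosition = buffer;

    while (1) {
        int character = fgetc(stream);
        if (character == '\n' || character == EOF) {
            break;                         /* end of the line */
        }
        /* Grow when only the byte reserved for the NUL is left. */
        if (length + 1 == maximumLength) {
            maximumLength += sizeIncrement;
            char *newBuffer = realloc(buffer, maximumLength);
            if (newBuffer == NULL) {
                free(buffer);              /* the old block is still ours */
                return NULL;
            }
            currentPosition = newBuffer + length;
            buffer = newBuffer;
        }
        *currentPosition++ = (char)character;
        length++;
    }
    *currentPosition = '\0';
    return buffer;                         /* caller must free() the result */
}
```

A caller would typically write `char *line = getLineFrom(stdin);`, check the result against NULL, and pass it to `free` when done.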

In the original figure, captioned "Memory allocation for getLine function", the buffer has been extended four times, as indicated by the rectangle containing the input string.

The realloc function can also be used to decrease the amount of space used by a pointer. To illustrate its use, the trim function removes leading blanks from a string. Its first while loop skips over the leading blanks. The second while loop copies the remaining characters in the string to the beginning of the string. Its condition evaluates to true until NUL is reached, which evaluates to false. A zero is then added to terminate the string, and realloc is used to shrink the allocation to fit the shortened string.
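A sketch of such a trim function is given here. It is a reconstruction under assumptions rather than the original listing: the local names `source` and `destination` are hypothetical, the argument is assumed to point to a string obtained from malloc, and falling back to the original pointer when realloc fails is a design choice of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Removes leading blanks in place and shrinks the allocation to fit.
   phrase must point to a NUL-terminated string allocated with malloc. */
char *trim(char *phrase) {
    assert(phrase != NULL);
    char *source = phrase;
    char *destination = phrase;

    /* First loop: skip over the leading blanks. */
    while (*source == ' ') {
        source++;
    }
    /* Second loop: copy the rest to the front; stops at NUL (false). */
    while (*source) {
        *destination++ = *source++;
    }
    *destination = 0;                      /* terminate the string */

    /* Shrink the block; keep the old pointer if realloc fails. */
    char *shrunk = realloc(phrase, strlen(phrase) + 1);
    return shrunk != NULL ? shrunk : phrase;
}
```

Because realloc may move the block, the caller should continue with the returned pointer, as in `text = trim(text);`, and not with the pointer that was passed in.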

The memory shown in red in the original figure is the old memory and should not be accessed.

When an array is passed to a function, only its address is passed. This makes the transfer of information more efficient, since we are not passing the entire array and having to allocate memory on the stack for it. Unless there is something integral to the array that tells us its bounds, we need to pass the size information when we pass the array.
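As a small illustration of passing the size along with the array, consider the following sketch. The function name `sumArray` and its parameters are hypothetical and do not come from the original text.

```c
#include <assert.h>
#include <stddef.h>

/* The function receives only the address of the first element, so the
   element count has to be supplied as a separate argument. */
int sumArray(const int *values, size_t size) {
    assert(values != NULL || size == 0);
    int total = 0;
    for (size_t i = 0; i < size; i++) {
        total += values[i];
    }
    return total;
}
```

For an array declared in the calling scope, such as `int data[] = {1, 2, 3, 4};`, the element count can be computed as `sizeof data / sizeof data[0]` before the call. Inside `sumArray` that expression would no longer work, because `values` is only a pointer there.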

In the case of a string stored in an array, we can rely on the NUL termination character to tell us when we can stop processing the array.

A hash table that fills up has to be enlarged, which means moving its elements into a new, larger table. This can be a very expensive operation, and the necessity for it is one of the hash table's disadvantages. In fact, some naive methods for doing this, such as enlarging the table by one each time you add a new element, reduce performance so drastically as to make the hash table useless.

If the table instead grows geometrically, for example by doubling whenever it fills, and in the end it contains n elements, then the total number of add operations performed for all the resizings is a sum of the form 1 + 2 + 4 + ... + n, which is less than 2n. Because the costs of the resizings form a geometric series, the total cost is O(n).

But we also perform n operations to add the n elements in the first place, so the total time to add n elements with resizing is O(n), an amortized time of O(1) per element. On the other hand, some hash table implementations, notably in real-time systems, cannot pay the price of enlarging the hash table all at once, because it may interrupt time-critical operations. One simple approach is to initially allocate the table with enough space for the expected number of elements and forbid the addition of too many elements.
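The difference between growing by one slot and growing geometrically can be checked with a small simulation. The sketch below is illustrative only: the function name `countResizeMoves`, the starting capacity of one slot, and the doubling policy are assumptions, and no real data is stored.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the element moves caused by resizing while n elements are
   appended to a table that starts with one slot. If doubling is nonzero
   the capacity doubles when full; otherwise it grows by one slot. */
size_t countResizeMoves(size_t n, int doubling) {
    size_t capacity = 1;
    size_t size = 0;
    size_t moves = 0;
    for (size_t i = 0; i < n; i++) {
        if (size == capacity) {
            moves += size;                 /* every element is moved once */
            capacity = doubling ? capacity * 2 : capacity + 1;
        }
        size++;
        assert(size <= capacity);          /* invariant of the simulation */
    }
    return moves;
}
```

For n = 1000, doubling causes 1 + 2 + 4 + ... + 512 = 1023 moves, which is below 2n, while growing by one slot causes 1 + 2 + ... + 999 = 499500 moves, which is quadratic in n.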

Another useful but more memory-intensive technique is to perform the resizing gradually: Allocate the new hash table, but leave the old hash table and check both tables during lookups.

## Hash table

Each time an insertion is performed, add that element to the new table and also move k elements from the old table to the new table. When all elements are removed from the old table, deallocate it. Linear hashing is a hash table algorithm that permits incremental hash table expansion. It is implemented using a single hash table, but with two possible look-up functions. Another way to decrease the cost of table resizing is to choose a hash function in such a way that the hashes of most values do not change when the table is resized.

This approach, called consistent hashing, is prevalent in disk-based and distributed hashes, where resizing is prohibitively costly.

### Ordered retrieval issue

Hash tables store data in pseudo-random locations, so accessing the data in sorted order is a very time-consuming operation. Other data structures, such as self-balancing binary search trees, generally operate more slowly (since their lookup time is O(log n)) and are rather more complex to implement than hash tables, but they maintain a sorted data structure at all times.

### Problems with hash tables

Although hash table lookups use constant time on average, the time spent can be significant. Evaluating a good hash function can be a slow operation. In particular, if simple array indexing can be used instead, this is usually faster. Hash tables in general exhibit poor locality of reference; that is, the data to be accessed is distributed seemingly at random in memory.

Because hash tables cause access patterns that jump around, this can trigger microprocessor cache misses that cause long delays. Compact data structures such as arrays, searched with linear search, may be faster if the table is relatively small and keys are cheap to compare, such as with simple integer keys. According to Moore's Law, cache sizes are growing exponentially, and so what is considered "small" may be increasing.

The optimal performance point varies from system to system; for example, a trial on Parrot shows that its hash tables outperform linear search in all but the most trivial cases (one to three entries). More significantly, hash tables are more difficult and error-prone to write and use. Hash tables require the design of an effective hash function for each key type, which in many situations is more difficult and time-consuming to design and debug than the mere comparison function required for a self-balancing binary search tree.

In open-addressed hash tables it's even easier to create a poor hash function. Additionally, in some applications, a black hat with knowledge of the hash function may be able to supply information to a hash which creates worst-case behavior by causing excessive collisions, resulting in very poor performance. In critical applications, either universal hashing can be used or a data structure with better worst-case guarantees may be preferable.

### Other hash table algorithms

Extendible hashing and linear hashing are hash algorithms used in database systems, for instance in index file structures and even in the primary file organization of a database.

Generally, in order to make search scalable for large databases, the search time should be proportional to log N or near constant, where N is the number of records to search. Log N searches can be implemented with tree structures: the fan-out and the height of the tree determine the number of steps needed to find a record, so the height of the tree is the maximum number of disk accesses it takes to find where a record is.

However, hash tables are also used, because the cost of a lookup can be counted in units of disk accesses, and often that unit is a block of data. Since a hash table can, in the best case, find a key with one or two accesses, a hash table index is regarded as generally faster when retrieving a collection of records during a join operation.

Extendible hashing and linear hashing have certain similarities. In linear hashing, as in extendible hashing, the traditional hash value is masked with a bit mask, but if the resultant smaller hash value falls below a 'split' variable, the original hash value is masked with a bit mask of one bit greater length, making the resultant hash value address recently added blocks. The split variable ranges incrementally between 0 and the maximum current bit mask value.

For example, when the current bit mask covers the values 0 to 3 and the split variable reaches 4, the level increases by 1, so in the next round the split variable ranges from 0 to 7 and resets again when it reaches 8. The location of an overflow may be completely unrelated to the block that is about to be split, that is, the block pointed to by the split variable. However, given a good random hash function that distributes entries fairly evenly amongst all addressable blocks, it is expected that over time the blocks that actually require splitting because they have overflowed get their turn in round-robin fashion as the split value ranges between 0 and N, where N has a factor of 2 to the power of the level, the level being the variable incremented whenever the split variable reaches N.
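The address computation for linear hashing described above can be sketched in a few lines. The function name `linearHashBucket` and the choice of `unsigned` for all values are assumptions of this sketch; only the masking rule comes from the text.

```c
#include <assert.h>

/* Maps a hash value to a bucket number. level is the number of bits in
   the current mask; split is the index of the next bucket to be split. */
unsigned linearHashBucket(unsigned hash, unsigned level, unsigned split) {
    assert(level + 1 < sizeof(unsigned) * 8);   /* keep the shifts defined */
    unsigned bucket = hash & ((1u << level) - 1u);
    if (bucket < split) {
        /* This bucket has already been split in the current round, so
           use a mask that is one bit longer. */
        bucket = hash & ((1u << (level + 1u)) - 1u);
    }
    return bucket;
}
```

With a level of 2 and a split value of 2, a hash of 5 first maps to bucket 1, which lies below the split value, so the longer mask is applied and the result is bucket 5. A hash of 6 maps to bucket 2, which has not been split yet, and stays there.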

With both extendible hashing and linear hashing, new blocks are added one at a time.

In extendible hashing, a block overflow (a new key-value colliding with B other key-values, where B is the size of a block) is handled by checking the size of the bit mask "locally", called the "local depth", an attribute which must be stored with the block.

The directory structure also has a depth, the "global depth". If the local depth is less than the global depth, then the local depth is incremented, and all the key values are rehashed and passed through a bit mask that is now one bit longer, placing them either in the current block or in another block.

The efficiency of a data structure cannot be analyzed separately from the operations performed on it. This observation motivates the theoretical concept of an abstract data type, a data structure that is defined indirectly by the operations that may be performed on it, and the mathematical properties of those operations (including their space and time cost).

There are numerous types of data structures, generally built upon simpler primitive data types. In an array, elements are accessed using an integer index to specify which element is required.

Typical implementations allocate contiguous memory words for the elements of arrays, but this is not always a necessity. Arrays may be fixed-length or resizable. A linked list (also just called a list) is a linear collection of data elements of any type, called nodes, where each node has a value and points to the next node in the linked list. The principal advantage of a linked list over an array is that values can always be efficiently inserted and removed without relocating the rest of the list.
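The following sketch shows why insertion and removal in a linked list do not relocate other elements. The type `Node` and the functions `insertAfter` and `removeAfter` are hypothetical names chosen for this illustration, and the list is assumed to hold `int` values.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Inserts a new node holding value directly after position. Only two
   pointers change; no other node moves. Returns NULL if malloc fails. */
Node *insertAfter(Node *position, int value) {
    assert(position != NULL);
    Node *node = malloc(sizeof *node);
    if (node == NULL) {
        return NULL;
    }
    node->value = value;
    node->next = position->next;
    position->next = node;
    return node;
}

/* Unlinks and frees the node that follows position, if there is one. */
void removeAfter(Node *position) {
    assert(position != NULL);
    Node *victim = position->next;
    if (victim != NULL) {
        position->next = victim->next;
        free(victim);
    }
}
```

Both operations take constant time once the position is known. Finding that position, however, requires walking the list from its head.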

Certain other operations, such as random access to a certain element, are however slower on lists than on arrays. A record (also called a tuple or struct) is an aggregate data structure. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names.