09.1 (conspect) Filesystems

It would be nice to tell about the files and how they are arranged, how the file system is implemented. We are used to the fact that a file in the system is a tree of files, and the final elements of this tree are files and directories. There is a directory - it can include files, and it can include directories again. I must say that this hierarchical structure

This is all manifestation of the property - all computer science is a computer technology which is arranged as follows: once there was a problem and this problem was solved by valiant methods and even still is solved.

That's the hierarchy.

../диск1.png

They can be repeated on different branches, but then a unique identifier is the path from the root of the system. Even if the names are the same, the path from the root is different!!! So, in general, exactly such an organization of the file system thing is not convenient if we use classic hard drives. The last 40 years the external devices on which we organise the file system were slow, which also had some internal speed characteristics.

Hard Drive

It's a surface that rotates and is driven by a reading head. Accordingly, the surface itself is divided into tracks and the tracks into sectors. The reading unit is that sector. To read from it, we position the hard drive head on the inheritance of the desired track. And wait for the work surface to rotate to the right sector. If we want to store data there, on such a device we suddenly have a desire, for example, to write a file on one track or on the neighboring data which belong to one group, but not somewhere near the disk.

Solid step drive - everything's gone to zero.

The sector stores data in kilobytes. Even the logical file system was not always like this - it is not effective enough. Database with file path key. To access - open the root and so on - it is very long does not suit us. That is, ideally the right file system would be a system with a key search with a name (tag, ordinal number), so that the key search itself has logarithmic complexity.

Binary tree, for example

ReiserFS is the author who made about this system, but then he killed his wife and went to prison and everything stopped. During that time, people pulled up other file systems. The concept has changed. On devices like ccd, this optimization is not interesting. That is, even a logical file system is not effective. It was born 10 years ago as far as we don't have a task to work with files hyper-effectively, but if we work with large arrays then we work with large data - and this is another area and other issues and systems of working with files.

Files can be of different sizes, quantities different. Let's forget that a disk is a disk, let it be a sequence of sectors from 1 to the last. Let's try to turn the tree into what we originally had in Figure 1;

Option
We have a file name of unknown length, then we put the second name and so on names, names, names.
Problem 1
matching file name and content - We actually have pieces of data at the beginning of which are the name and other descriptions. The files are written in lexicographical order .Take the files and swell them up. When we delete one file, an empty space is formed. We just change the links. The more files on the disk, the more we walk to the end - a linear complexity.

Should I keep the file in a non-continuous state? Or do you want to keep the pieces ?

- Or do you want to keep the pieces ? -and that's the right answer.

First idea
to represent the file as a sequence of sectors, and it can be logical clusters or shorter, We divide the area into metadata and file data.
Metadata
information about the file is diverse (what kind of file, name) and on the other hand this is what kind of thing we take and build such a sign it is called `FAT file action table'

In which(s) we say look if you want to find out what this file is from this block, it will be the first block number. If we say that the file that owns 1 block, it is 1 block - the number 1.The next block in this file is the next block. Instead of the end of the file, we write 1. And we write everything into it. This table takes as much space as the number of sectors on the disk (or rather clusters). On the other hand, it is very primitive - it assumes that we are. We can store files as we like. If we delete and add files for a long time, we get 'porridge'. The disadvantage - the ornament file - is when one file consists of a block of scattered abysses as in the whole file. Because disk operations are slow - let the file take 1 gigabyte of reading the file partition - slower than reading the file lying in pieces. You can read one track in one track revolution.

../карина.png

There's no file name anywhere on this.

File object - directory - the same file behaves as a file somehow placed on the disk and it is a special object of another type and there are names and initial blocks from where they begin mb still has information about the size of the right and so on.

Such conglomerates of the metadata sequence - they fit perfectly into that ideology - except that the length of the file name is always the same (but they did it back then) - there are very long names of 8 bytes 8.3 . In this case the directory structure is an array. It's easier to search, no need to rewind. All names are the same length, and where the file is 3 characters long, the other values up to 8 bytes are 0.

Note
Directory = folder = directory - these are simply different types of objects.

A directory is this file object . We read the file object from there - and this is a directory. The place of the root directory is nailed down.

A catalogue is a sequence of blocks. This is not how all modern file systems work! - but this method solves the problem of separating metadata from files and the problem of organizing a directory. Directories are just such kinds of .

A little more reflection leads us to the fact that the unique indicator of the file is not the name, but the beginning of the chain. Sector numbers on the disk are not unique.

Inode
is a modern method - such a separate piece on the disk occupying one sector which among metadata (rights type access time size and any) - file descriptor.

../inode.png

Since one is a whole sector of meta-information you don't drink to a whole sector, so the information will be taken care of for a long time. Block number 1 block number 2 and part of this block will be fully occupied or not fully occupied. The inode has its own unique number and is a unique file indicator on most modern systems; however, it is not unique if there are two disks. Within the same file system, it is unique. We take the file system and construct it as follows: create a superblk that mainly contains the system's file descriptor (the size of the type how many times it has been used, whether or not the amount of free space inode has been repaired). Inode - unique file indicator within one file system (file indicator)

../FAT.png

Next comes the inode table: - fixed size and large (Index discreteness). If the file is small then all the sectors it consists of, if it is larger then the reference to the same thing in which there are references to other sectors and so on by nesting levels. Catalogs - do not pour out in any way from the file that they are file objects of other types

How to find out if the corresponding node is busy with the file or not. This is a problem! But we want to know if it's free. There is a bit scale - a piece in which as many bits as the node we have planned are free . Approximately half of the file systems have a fixed node table and a percentage of 1%. There's a bit scale, a nodes field. And the bitmap field of the blocks They have a linear search speed - linear bit rate is easy. That is, when we allocate a newly created file and a node and a block and - check the bit, allocate memory for it.

Directory - file name, inode are in the directory.

The topic is outdated, however, even on fairly old file systems, the following thing was done: when we work with bit blocks. The disk is divided into regions. These things are called cylinder groups - when we look at a child's disk and it has several work surfaces and several or more disk heads. And to read this track or this track or a friend, this is called a cylinder - cylinders group - block of sectors. If we start a new file system directory, the file system will inspect and talk about a small region in that region and allocate memory to it. When we start a file in this directory, we already know. Where the block was allocated for the directory. She tries to isolate the files so that they lie in the same region This technology on real operating systems is quite good at avoiding fermentation If we start the file on a clean operating system, it falls into one cylinder group. If you remove it, it's a big one with a group cylinder.

The basic idea that directories are just such files in the sense that they are no different when working with them, we interpret them as tablets. In some operating systems it's fashionable to output. The contents of the directory are on the screen. A unique indicator of a file is its lexemographic discrimination - one file can have as many names as it wants - it's just a record in a directory. File forwarding - setting a name in another directory and deleting in another directory What modus operandi do we face in this rather complete structure? When there's too much Let's suppose that we have the pit files in the directory - million - they are a list - the search in the list is linear - we have to make a binary tree instead.

We're done: Encryption: 1. For example, a variable size - wanted to reduce, wanted to increase. 2. The file system volume 3. Reliability. 4. Versioning - it's called storing the history of changes of objects.

HSE/ProgrammingOS/09_Filesystems/Conspect_en (последним исправлял пользователь NataliaKrauze 2020-02-26 11:30:20)