Persistent sessions using FileStore: Split session files to sub-directories?


The problem is not necessarily that a file system can't handle millions of files. It can.

The problem is that few of the tools typically available for manipulating files scale well to millions of them.

Consider both ls and rm.

By default ls sorts its filenames. If you do a simple ls on a huge directory, it basically becomes unresponsive while scanning and sorting all of those millions of files. You can tell ls not to sort; that works, but it's still slow.
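For example, assuming the sessions live under /var/tomcat/sessions (a hypothetical path), -f tells ls to skip the sort and stream entries as it reads them:

# -f disables sorting (and implies -a), so entries stream out
# as they are read instead of being collected and sorted first.
ls -f /var/tomcat/sessions | head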

rm suffers simply from the problem of filename expansion. Modern shells have generous resource limits, but you still don't want to run shell expansion (e.g. "123*") on millions of files. You have to jump through hoops with things like find and xargs (see below), though it's actually even better to write custom code.
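A sketch of what that looks like, with a hypothetical directory and pattern:

# The glob never hits the shell's argument list; find streams the
# matches, and -delete removes them without spawning rm at all.
find /var/tomcat/sessions -maxdepth 1 -type f -name '123*' -delete

# Or, if you want rm itself, batch the names through xargs:
find /var/tomcat/sessions -maxdepth 1 -type f -name '123*' -print0 | xargs -0 rm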

And heaven forbid you accidentally hit TAB in an autocompleting shell while in a directory with millions of entries.

The database does not suffer these issues. A table scan of millions of records is routine for the database. Operations on millions of anything take time, but the DB is much better suited to it, especially for small things like session entries (assuming your sessions are, indeed, small -- most tend to be).

The JDBCStore deftly routes around the file system problems and puts the load on a data store more adept at handling these kinds of volumes. File systems can make good, ad hoc "key-value" stores, but most of our actual work with file systems tends to be scanning values, and those tools don't work very well with large volumes.

Addenda after looking at the code.

It's easy to see why a large file store will crush the server.

Simply put, with the FileStore, every time it wants to expire sessions, it reads in the entirety of the directory.

So, best case, imagine reading in a 50M file directory once per minute. This is not practical.

Not only does it read the entire directory, it then proceeds to read every single file within the directory to see whether it's expired. This is also not practical. With 50M files, even using a simple, say, 1024 byte buffer just to read the header of each file, that's 50G of data processing...every minute.
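A quick sanity check on that arithmetic:

# 50M files times a 1024-byte header read per expiration sweep:
echo $((50000000 * 1024))    # 51200000000 bytes, i.e. roughly 50G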

And that's on the optimistic assumption that it only checks once per minute, and not more often.

In contrast, within the JDBCStore, the expiration time is a first class element of the model, so it simply returns all rows with a date less than the expiration time. With an index on that field, that query is essentially instantaneous. Even better, when the logic goes to check if the session has, indeed, expired, it's only checking those that meet the base criteria of the date, instead of every single session.
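To make that concrete, here is an illustrative sketch only -- a hypothetical table with the expiry in its own indexed column, matching the description above, not Tomcat's actual JDBCStore schema or the literal queries it issues:

# With the expiry stored as an indexed column, the expiration
# sweep is a cheap index range scan rather than a full pass.
mysql sessions_db <<'SQL'
CREATE TABLE IF NOT EXISTS tomcat_sessions (
  session_id   VARCHAR(100) PRIMARY KEY,
  expiry_ms    BIGINT NOT NULL,
  session_data MEDIUMBLOB
);
CREATE INDEX idx_expiry ON tomcat_sessions (expiry_ms);

-- Only rows already past their expiry are even considered.
SELECT session_id FROM tomcat_sessions
 WHERE expiry_ms < UNIX_TIMESTAMP() * 1000;
SQL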

This is what's killing your system.

Now.

Could a FileStore be made to work better? I don't think so. There's no easy way (that I know of) to match wildcards IN the file system. Rather, all of that matching is done against a simple "table scan" of the files. So, even though you'd think it would be easy to, say, simply append the expiration time to the end of the file name, you can't find that file (i.e. "find the file whose name starts with SESSIONID") without scanning all of them.

If the session metadata were all stored in RAM, then you could index it however you want. But you'd be in for an ugly start-up time when the container starts, as it reloads all of the lingering sessions.

So, yea, I think at scale, the JDBCStore (or some other database/indexed solution) is the only real practical way to do things.

Or, you could use the database simply for the metadata, with the files storing the actual session information. You'd still need a database, but if you're uncomfortable storing your session BLOBs in the DB, that's an alternative.
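A hedged sketch of that hybrid layout (table and column names made up for illustration):

# Metadata, including the indexed expiry, lives in the DB; the
# serialized session itself stays on disk at the referenced path.
mysql sessions_db <<'SQL'
CREATE TABLE session_meta (
  session_id VARCHAR(100) PRIMARY KEY,
  expiry_ms  BIGINT NOT NULL,
  file_path  VARCHAR(255) NOT NULL,
  INDEX idx_expiry (expiry_ms)
);
SQL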

Perhaps there are some filesystem-specific utilities that better leverage the actual file system architecture, which you could fork and then read the results of (or use JNI to talk to the FS directly), but obviously that would be quite file system dependent. I'm not that intimate with the underlying capabilities of the different file systems.


Is a large number of files in a folder a problem: Yes.

What to do: Use JDBCStore instead of FileStore.


So out of the box it seems you get both the JDBC and file-based Stores, according to the Tomcat 8.5 documentation (make sure you read all of that page, if you have not already, on the choice between the StandardManager and the PersistentManager).

But I don't see why the file-based Store has to become an issue if you tune your filesystem settings accordingly. At least with ext2/ext3/ext4 you can; if you are using zfs, xfs, reiserfs, etc., you will have to look up their documentation. And of course you could mount a separate disk (partition) at this directory, with its own specially tuned filesystem parameters.
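For example (device name and mount point are hypothetical):

# Dedicate a filesystem to the session directory; noatime avoids
# an inode update on every session file read.
mkfs -t ext4 /dev/sdb1
mount -o noatime /dev/sdb1 /var/tomcat/sessions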

This user has posted some relevant experience:

I just ran out of file space in a directory on a 4TB ext4 filesystem, with dir_index enabled. I had about 17 million files in the directory. The answer was to turn on large_dir with tune2fs. – lunixbochs Feb 6 at 20:09

Quoted from: How many files can I put in a directory?

For more detail on these file system tunables, like dir_index and large_dir, see the man page of tune2fs:

http://man7.org/linux/man-pages/man8/tune2fs.8.html
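For example, run against an unmounted filesystem (the device name is hypothetical):

# dir_index enables hashed directory lookups; large_dir allows the
# index to grow to three levels for very large directories.
tune2fs -O dir_index /dev/sdb1
tune2fs -O large_dir /dev/sdb1

# Existing directories only pick up the index after a rebuild:
e2fsck -D -f /dev/sdb1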

The only thing that alas is not tunable (not even with resize2fs), and that you have to keep an eye on (with df -i), is the number of inodes: if you expect many small files, you may run out of inodes before you run out of disk space. So if you make a special filesystem for this, you may want to change the default with:

mkfs -t ext4 -N iNumberOfINodes /dev/yourstoragedevicepartition
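And once it is in use, keep an eye on the inode headroom (mount point hypothetical):

# IFree/IUse% show how close the filesystem is to inode exhaustion.
df -i /var/tomcat/sessions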

However I have not tested all of this with Tomcat myself, so you may want to test/compare it with Gatling, JMeter or any other load testing tool.

Of course, if high availability or zero data loss is a requirement, and you already have an HA database cluster that you regularly back up, then the JDBC store might be a good fit (not that you could not easily share your directory over NFS to other Linux servers, but I digress).