Today I was working with a partner in Paris and he wanted to know how to calculate the size of a specific folder in the eXo Java Content Repository (JCR).
For this specific need, the goal is to calculate the size of all the documents stored inside a specific location in the content repository. This could be used, for example, to manage quotas or to estimate the size of a shared or personal storage. For this sample I will only take into consideration the size of the binary part of each document stored in the repository; this means I will not pay attention to the various attributes and metadata that are also stored, nor to the full-text index created by Lucene, which is embedded in eXo JCR.
How are files stored in the JCR?
Files are stored in eXo JCR using the standard node types (nt:file and nt:resource). So for this example I will simply list all the nt:file nodes of a folder and aggregate the size of the files themselves. It is important to understand how the JCR stores the binary content, and the best way to understand it is to look at it. For that I am using the output printed by CRaSH, a shell for content repositories developed by the eXo team and led by Julien Viet.
Here is the structure of a PDF document:
[CRaSH output: the nt:file node of the PDF document]
As you can see, the 'binary' is not visible in this node, which is expected. As described in the nt:file section of the specification, the binary content is an attribute of the child node jcr:content, which is shown below:
[CRaSH output: the jcr:content child node]
You can now see that jcr:content contains some interesting attributes, in particular jcr:data: this is where the binary content, the PDF itself, is stored. So using the JCR API you just need to read the length of this property, for example with Property.getLength() on jcr:data. This returns the number of bytes of the binary data.
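As a rough illustration of that lookup, here is a minimal sketch in Java. It does not talk to a real repository: SketchNode and SketchProperty are hypothetical in-memory stand-ins that borrow the method names getNode, getProperty and getLength from javax.jcr.Node and javax.jcr.Property, so the example runs on its own. Against eXo JCR the same call chain would be made on real javax.jcr objects and would have to handle RepositoryException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for javax.jcr.Property (binary values only).
class SketchProperty {
    private final byte[] value;

    SketchProperty(byte[] value) {
        this.value = value;
    }

    // Mirrors Property.getLength() for a binary property: the length in bytes.
    long getLength() {
        return value.length;
    }
}

// Hypothetical stand-in for javax.jcr.Node, reduced to what the sketch needs.
class SketchNode {
    private final Map<String, SketchNode> children = new HashMap<>();
    private final Map<String, SketchProperty> properties = new HashMap<>();

    SketchNode addNode(String name) {
        SketchNode child = new SketchNode();
        children.put(name, child);
        return child;
    }

    void setProperty(String name, byte[] value) {
        properties.put(name, new SketchProperty(value));
    }

    SketchNode getNode(String name) {
        return children.get(name);
    }

    SketchProperty getProperty(String name) {
        return properties.get(name);
    }
}

class FileSizeSketch {
    // Size in bytes of the binary stored under a file node:
    // file node -> jcr:content child -> jcr:data property.
    static long sizeOfFile(SketchNode file) {
        return file.getNode("jcr:content").getProperty("jcr:data").getLength();
    }

    public static void main(String[] args) {
        SketchNode file = new SketchNode();            // plays the role of the nt:file node
        file.addNode("jcr:content")                    // plays the role of the nt:resource child
            .setProperty("jcr:data", new byte[2048]);  // 2048 bytes of placeholder content
        System.out.println(sizeOfFile(file) + " bytes"); // prints "2048 bytes"
    }
}
```

The point of the sketch is only the path that is followed: the size is never read from the nt:file node itself, always from the jcr:data property of its jcr:content child.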
So to calculate the size of a folder, you just need to navigate through all the documents (nt:file or jcr:content) and add up the size of all the files. The approach here is to calculate the size of the folder "/Documents" by walking through all the files contained in this folder and its subfolders. (I could have chosen to query all the jcr:content nodes instead of the nt:file nodes.)
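A minimal sketch of such a traversal, under stated assumptions: TreeNode is a hypothetical in-memory stand-in for a JCR node, with method names loosely following javax.jcr.Node (getNodes, getNode, isNodeType), and the file names are invented for the example. It is not the eXo JCR API itself, only the recursion that one would write against it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a JCR node; not the real javax.jcr API.
class TreeNode {
    private final String name;
    private final String primaryType;
    private final byte[] data;  // plays the role of jcr:data, only set on jcr:content nodes
    private final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, String primaryType, byte[] data) {
        this.name = name;
        this.primaryType = primaryType;
        this.data = data;
    }

    TreeNode add(TreeNode child) {
        children.add(child);
        return this;
    }

    // Simplified: exact match on the primary type only (the real isNodeType
    // also takes supertypes and mixins into account).
    boolean isNodeType(String type) {
        return primaryType.equals(type);
    }

    List<TreeNode> getNodes() {
        return children;
    }

    // Returns the first child with the given name, or null if there is none.
    TreeNode getNode(String childName) {
        for (TreeNode child : children) {
            if (child.name.equals(childName)) {
                return child;
            }
        }
        return null;
    }

    long dataLength() {
        return data == null ? 0 : data.length;
    }
}

class FolderSizeSketch {
    static TreeNode folder(String name) {
        return new TreeNode(name, "nt:folder", null);
    }

    static TreeNode file(String name, int bytes) {
        TreeNode content = new TreeNode("jcr:content", "nt:resource", new byte[bytes]);
        return new TreeNode(name, "nt:file", null).add(content);
    }

    // Recursively adds up the binary size of every nt:file below the given node.
    static long folderSize(TreeNode node) {
        long total = 0;
        for (TreeNode child : node.getNodes()) {
            if (child.isNodeType("nt:file")) {
                total += child.getNode("jcr:content").dataLength();
            } else {
                total += folderSize(child);  // descend into subfolders
            }
        }
        return total;
    }

    public static void main(String[] args) {
        TreeNode documents = folder("Documents")
            .add(file("report.pdf", 1000))
            .add(folder("archive")
                .add(file("old.pdf", 500))
                .add(file("notes.txt", 24)));
        System.out.println(folderSize(documents) + " bytes"); // prints "1524 bytes"
    }
}
```

The recursion visits every node under the starting folder once, so its cost grows with the number of nodes in the subtree, not just with the number of files.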
As you can guess, since we are navigating the whole hierarchy, you have to be very careful when using this kind of code: on a large repository the traversal can be expensive. This example is just a simple code sample to show you some of the cool features provided by the JCR API.