Following up on my previous journal entry [http://linuxfr.org/~mildred/24648.html], I felt I had to do something. So I came up with a specification describing a file that contains both metadata about a file and the file itself. The two parts can also be kept separate.
As a reminder, magic numbers are a great way to detect a file's type, but they are not enough because:
- several different file types can share the same magic number (ZIP, JAR and OpenDocument archives)
- text files generally do not contain a magic number (except scripts with the famous #!, but those are an exception)
File extensions are very handy in some cases and should still be used in some fields, such as programming. They are not an ideal solution either, though, because they are mixed into the file name even though they have nothing to do with it.
My solution is therefore to propose a specification for a desc file (nice name, isn't it?). Before posting it somewhere such as freedesktop.org (I will have to find out how), I would like to submit it to you so you can tell me whether:
- an existing specification already serves the same purpose
- some things could be improved.
I wrote it in English because it is meant to be distributed. However, I have not proofread my English yet, so there may be many mistakes; apologies in advance.
Desc file format
Created Sunday 01/07/2007
This document describes the desc file format. A desc file describes another file, especially during transfers such as e-mail attachments or any other kind of transfer. This format was created by Mildred <mildred593(at)online.fr>, who was inspired by the application/applefile files attached by Apple Mail. Another file format with the same goal may already be standardized; this document is only a draft and presents an idea.
This is draft number 1 of this specification, released on 1 July 2007.
A desc file contains a description of a file and may or may not include that file. If the file is included, the content type is application/descfile. If the file is not included, the content type can also be text/descfile.
The first part of the file, called the headers, is always present and consists of textual data. The headers must not contain any blank line: the first blank line begins the second part, which is the included file.
The headers part
The headers hold metadata about the file. They are composed of multiple physical lines separated by the LF character (\n). The headers end at the first occurrence of two consecutive LF characters, or at the end of the file. Blank lines are therefore not allowed inside the headers.
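A minimal Python sketch of this first split, assuming the desc file is read as raw bytes. `split_desc` is a hypothetical helper name, not part of the draft; it returns the header part and the included file, or `None` when there is no included file part.

```python
from typing import Optional, Tuple


def split_desc(data: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Split raw desc-file bytes into (headers, included_file).

    The headers end at the first occurrence of two consecutive LF
    characters; everything after that pair is the included file.
    Without such a pair, the whole file is headers.
    """
    sep = data.find(b"\n\n")
    if sep == -1:
        return data, None  # detached desc file: no included part
    return data[:sep], data[sep + 2:]
```

For example, `split_desc(b"Content-Type: text/plain\n\nhello\n")` returns `(b"Content-Type: text/plain", b"hello\n")`.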
Logical lines are defined as follows. A logical line is usually the same as a physical line, except when several physical lines are joined into a single logical line. A logical line consists of one or more physical lines. Its first physical line must not begin with a whitespace character (defined as the ASCII space or the ASCII tab \t), and each following physical line must begin with a whitespace character.
The whole header part is then split into logical lines. If the header part begins with one or more lines that start with a whitespace character, these leading lines belong to no logical line and are ignored entirely.
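The grouping rule can be sketched in Python as follows, assuming the header part has already been separated from the included file; `logical_lines` is an illustrative name. A single LF at the very end of the file is treated as terminating the last physical line rather than starting an empty one, which is an assumption the draft does not spell out.

```python
from typing import List


def logical_lines(headers: bytes) -> List[bytes]:
    """Group the physical lines of a header part into logical lines."""
    physical = headers.split(b"\n")
    if physical[-1] == b"":
        physical.pop()  # a final LF ends the last line, it does not start a new one
    result: List[bytes] = []
    for line in physical:
        if line.startswith((b" ", b"\t")):
            if result:
                # Continuation line: keep the LF between the physical lines.
                result[-1] += b"\n" + line
            # else: leading continuation lines are ignored
        else:
            result.append(line)
    return result
```

With `b" ignored\nA: 1\n  more\n# c\nB: 2\n"` as input, the first line is dropped and the result is `[b"A: 1\n  more", b"# c", b"B: 2"]`.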
Each logical line must be either a header or a comment. A logical line includes the line feeds between the physical lines that compose it, but not the final line feed.
A comment is a logical line that starts with the character '#'. It is ignored entirely. Since comments are logical lines, they can span multiple physical lines.
A header is a logical line that starts with a header name followed by content. The header name must contain only the characters A to Z, a to z, 0 to 9, '-' and '.'. The header name and the content are separated by a colon ':' followed by optional whitespace characters (space or tab), which are not part of the content.
The content can span multiple physical lines. It is defined as the binary data from the first non-whitespace character after the colon that follows the header name, up to the end of the logical line.
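A sketch of how a reader might classify a logical line as a comment or a header and extract the name and content, using Python's `re` module. The function name and the choice to raise `ValueError` for any other line are illustrative assumptions.

```python
import re
from typing import Optional, Tuple

# Name: A-Z, a-z, 0-9, '-' and '.'; then ':' and optional spaces/tabs.
# DOTALL lets the content run across the LFs of a multi-line logical line.
HEADER_RE = re.compile(rb"([A-Za-z0-9.\-]+):[ \t]*(.*)\Z", re.DOTALL)


def parse_logical_line(line: bytes) -> Optional[Tuple[str, bytes]]:
    """Return (name, content) for a header, or None for a comment."""
    if line.startswith(b"#"):
        return None  # comments are ignored entirely
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"neither a header nor a comment: {line!r}")
    return match.group(1).decode("ascii"), match.group(2)
```

For instance, `b"X-Note:\tline one\n two"` yields `("X-Note", b"line one\n two")`: the tab after the colon is dropped, the continuation line is kept.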
The stripped content is also defined: it is the same as the content, except that the first whitespace character after each line feed is removed.
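Computing the stripped content is a single substitution. This Python sketch, with a hypothetical `stripped_content` helper, removes only the first space or tab after each LF, so any deeper indentation survives.

```python
import re


def stripped_content(content: bytes) -> bytes:
    """Drop the first space or tab that follows each LF in a header's content."""
    return re.sub(rb"\n[ \t]", b"\n", content)
```

For example, `stripped_content(b"line one\n  two\n\tthree")` gives `b"line one\n two\nthree"`.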
The included file part
The included file part is optional and is defined as the binary data that follows the first occurrence of two consecutive LF characters.
A desc file is valid only if it matches the description given above, all its headers are valid, and all required headers are present.
A header is valid only if one of the following holds:
• Its name is described below as a valid name and its content matches the given description
• Its name begins with "X-" or "x-"
• Its name contains dots (.)
If a header is specified as present only once, the file is invalid when that header appears more than once.
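The three validity rules and the "only once" constraint can be expressed as a small Python check. The table of described header names and their content validators is supplied by the caller, because the draft describes each header's content only in prose; any validator passed in (such as the lambda in the note below) is a hypothetical placeholder. Header names are compared case-sensitively here, which is an assumption.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

Validator = Callable[[bytes], bool]


def header_is_valid(name: str, content: bytes, known: Dict[str, Validator]) -> bool:
    """Apply the draft's three rules to a single header."""
    if name in known:
        # Rule 1: a described name whose content must match its description.
        return known[name](content)
    # Rules 2 and 3: extension headers are accepted without checks.
    return name.startswith(("X-", "x-")) or "." in name


def headers_are_valid(headers: List[Tuple[str, bytes]],
                      known: Dict[str, Validator],
                      only_once: Set[str]) -> bool:
    """All headers valid, and no 'only once' header repeated."""
    if not all(header_is_valid(n, c, known) for n, c in headers):
        return False
    counts = Counter(n for n, _ in headers)
    return all(counts[n] <= 1 for n in only_once)
```

With `known = {"Content-Type": lambda c: b"/" in c}`, a list holding `Content-Type`, `X-Foo` and `org.example.tag` passes, while two `Content-Type` headers fail when `"Content-Type"` is in `only_once`.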
The valid headers are described below:
This header is not required. It defines the version of the file specification that must be followed. If it is not present, the version to follow is version 1.0, or the latest draft if version 1.0 of this specification does not exist yet. This header must be present only once.
This header is required only if the desc file does not hold the included file part. It contains the path to the file that the desc file describes, relative to the location of the desc file. This header must be present only once.
The Content-Type header is case-insensitive and holds the content-type of the file described. This header must be present only once and is not required.
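A sketch of the two rules that depend on the rest of the file, assuming (as the uses described later suggest) that the header required for detached desc files is named `Filename`; the version header is left out because its name is not given here. Lower-casing only the type/subtype before any `;` parameters is one interpretation of "case-insensitive", not something the draft states.

```python
from typing import List, Optional, Tuple

Header = Tuple[str, bytes]


def required_headers_present(headers: List[Header], included: Optional[bytes]) -> bool:
    """Filename is required only when there is no included file part."""
    if included is None:
        return any(name == "Filename" for name, _ in headers)
    return True


def media_type(headers: List[Header]) -> Optional[str]:
    """Return the lower-cased type/subtype from Content-Type, if present."""
    for name, content in headers:
        if name == "Content-Type":
            return content.split(b";", 1)[0].strip().decode("ascii").lower()
    return None
```

Here `included` is `None` for a detached desc file and the (possibly empty) included bytes otherwise.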
Examples of desc files
Content-Type: text/plain; charset=utf-8

This is a text file
encoded in UTF-8
Here, the desc file is useful because it specifies the character encoding of the text file.
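A writer for this layout can be sketched in Python; `write_desc` is a hypothetical helper. It folds multi-line values by putting one space after each LF (so a reader gets them back through the stripped content) and encodes header values as UTF-8, an assumption since the draft does not fix a header encoding. Leading whitespace in a value is not preserved, because the draft drops whitespace after the colon.

```python
from typing import List, Optional, Tuple


def write_desc(headers: List[Tuple[str, str]], included: Optional[bytes] = None) -> bytes:
    """Serialize headers, and optionally the described file, as a desc file."""
    lines = []
    for name, value in headers:
        # Fold multi-line values: one space after each LF marks a continuation
        # and also keeps an empty value line from becoming a blank line.
        lines.append(f"{name}: " + value.replace("\n", "\n "))
    data = "\n".join(lines).encode("utf-8")
    if included is not None:
        # Two consecutive LFs end the headers; the file follows unchanged.
        data += b"\n\n" + included
    return data
```

Calling `write_desc([("Content-Type", "text/plain; charset=utf-8")], text)` writes the header line, one blank line, then `text` byte for byte.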
Examples of use
The desc file may be used to transfer metadata along with a file over a protocol that does not usually allow it. For example, a mail client could create a desc file for each file attached to an e-mail. The desc file would hold the metadata and reference the attached file by name using the Filename header.
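A receiving application has to resolve the Filename value relative to the desc file's own location, as the header description requires. A minimal Python sketch with a hypothetical `described_file_path` helper, taking the header value already decoded to a string; rejecting absolute paths is an added assumption, since the draft only says the path is relative. Note that `os.path.normpath` does not stop `..` from leaving the desc file's directory, so a mail client would probably want to reject that case as well.

```python
import os


def described_file_path(desc_path: str, filename: str) -> str:
    """Resolve a Filename header value against the desc file's directory."""
    if os.path.isabs(filename):
        raise ValueError("Filename must be a relative path")
    # Relative to the directory holding the desc file, not the current directory.
    return os.path.normpath(os.path.join(os.path.dirname(desc_path), filename))
```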
It can also be useful for storing files, along with information such as their content type, on filesystems that do not store this information.
On such systems, applications could open desc files and read the included content. It would then be possible to store files without losing metadata such as the content type, and without having to rely on magic numbers (which are not always reliable, for example with text files, or with ZIP files that can also be JAR archives or OpenDocument files).