Re: Byte manipulation
John Ivens wrote:
>
> The following two paragraphs were lifted from your docs:
> ------------------------------------------
> >>> NOTE <<< Data less than one byte (8 bits) per pixel is assumed to
> be packed as densely as possible to a multiple of a pixel in a byte.
> For example, 8 1-bit pixels per byte, 2 4-bit pixels per byte, 1 6-bit
> pixel per byte (the remaining 2 bits are padding), etc. Pixel ordering
> is LSB to MSB. Unknown data types may be larger than a byte, always
> occupying a multiple of bytes. For example, 12-bit pixels would occupy
> 2 bytes (the remaining 4-bits are padding). A 128-bit complex data type
> would be classified as RASTER_UNKNOWN_DATA_TYPE.
>
> Raster image data is expected to be formatted raster-wise: the pixel at
> the upper left corner of the first image frame is the first pixel in
> the file. The next "width" pixels that follow are for the first row of
> the image. Then follow the remaining rows for the first frame of the
> image, followed by the remaining frames of the image. This is
> left-to-right, top-to-bottom, front-to-back order.
> -------------------------------------------
> I know that we will have to eventually handle order. This needs to be built in.
> So the second paragraph will become a moot point, although the order listed
> will be a default. I don't believe that we discussed sub-byte sized pixel
> values or over-sized pixel values in detail. It seems that the default case
> here is a good one. However, we will need to allow for different packing
> strategies. Messy.
>
The Raster_IO module never had to actually deal with a case of an "odd"
pixel size. Nevertheless, single-bit and nibble-sized pixels were on my
mind because they are rather common (though not for the imagery we use).
So I laid out a capability - a place to get across that issue should I
ever need a bridge.
It does not seem that we need to actually do anything right now about
odd pixel sizes. However, if we maintain the habit of thinking in terms
of general patterns, beyond specific cases, and the information needed
to address capabilities, beyond particular uses, then we should be in
a position to cross those bridges if/when we arrive at them. For a great
many capabilities it's likely that we will set up processing systems that
will be able to handle virtually all cases because they grasp the root of
the issue; for example, moving sizeof bytes around rather than working
from hardcoded type sizes. In many cases we will set up sign posts and
leave building blocks for bridges that we anticipate will eventually need
to be crossed.
My concern is that we build a system that expects to grow to meet new
needs, and that we avoid building dead-end, rigid fixes. If we can get something
working on this basis - and I'm sure we can - then we'll be in a position
to deal with the messy stuff later. If it turns out to be easy to handle
sub-byte pixel sizes on the first go, fine (perhaps my raster_io code
will help do this, but I'm not counting on it). If not, then erect a sign
post in the code (I am a big advocate of using lots of comments in code
for clarifying the thinking behind the code logic and for ruminating on
concerns and possibilities) and move on.
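As one such sign post, the default packing described in the quoted NOTE (whole pixels per byte, ordered LSB to MSB, leftover bits as padding) might be unpacked along these lines. This is only a sketch: packed_pixel is a hypothetical helper, not something in raster_io, and it assumes 8-bit bytes and pixel sizes of 1 to 8 bits.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of raster_io): return pixel number "index"
 * from data packed with pixel_bits (1..8) bits per pixel.
 * Assumptions: 8-bit bytes; as many whole pixels per byte as will fit;
 * the first pixel occupies the least significant bits of each byte.
 */
unsigned int
packed_pixel(const unsigned char *data, size_t index, unsigned int pixel_bits)
{
    unsigned int per_byte = 8 / pixel_bits;            /* whole pixels per byte */
    unsigned int shift = (unsigned int) (index % per_byte) * pixel_bits;
    unsigned int mask = (1u << pixel_bits) - 1u;       /* low pixel_bits bits set */

    return (data[index / per_byte] >> shift) & mask;
}
```

With 6-bit pixels per_byte is 1, so the top 2 bits of every byte are simply masked off as padding. A different packing strategy would need a different helper, which is the messy part being deferred.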
>
> The following is the structure you used to hold pixel data:
> ----------------------------------------------------
> int            info_amount;     Size (bytes) of this structure.
> unsigned short ID_number;       Specific ID number.
> char           code[2];         Code marker.
> char           version[64];     Version identifier.
> char           date[32];        Time stamp.
> char           title[128];      Title.
> int            width;           Pixels (samples) per line.
> int            height;          Lines (rows) per frame.
> int            frames;          Frames (planes) in the image.
> int            pixel_bits;      Bits per pixel.
> int            data_type;       Pixel data type.
> char           color_map[256];  Path to color map.
> char           history[256];    Path to history file.
> char           data_link[256];  Path to data link.
> ---------------------------------------------------
>
> It seems that we will drop info_amount, ID_number, code, version, and date.
The info_amount was important because the structure was stored as a binary
data block in a file describing the image (this is the "info" file). When
raster_io processes data arriving on a stream (i.e. not in a file) it needs
this information right up front so it knows the size of the data block to
read before the image data arrives. When we get to the point of serializing
an Image class object over a stream I think this problem will be dealt with
differently. So I agree that the size of the structure is not needed in our
Image class.
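To illustrate why a size field up front matters on a stream, here is a rough sketch of locating the image data behind a size-prefixed info block. The function name and its arguments are hypothetical and this is not how raster_io is actually written; it also assumes the size field is a host-order int at the very start of the block, as in the structure quoted above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the raster_io API): "stream" holds the bytes
 * received so far, beginning with an info block whose first field is its
 * own size in bytes (info_amount). Returns a pointer to the image data
 * that follows the block, or NULL if the block has not fully arrived or
 * the size is implausible.
 * Assumption: the size field is an int in host byte order.
 */
const unsigned char *
image_data_after_info(const unsigned char *stream, size_t available,
                      int *info_amount)
{
    int amount;

    if (available < sizeof amount)
        return NULL;                         /* size field incomplete */
    memcpy(&amount, stream, sizeof amount);  /* avoids unaligned access */
    if (amount < (int) sizeof amount || (size_t) amount > available)
        return NULL;                         /* bad size or short read */
    *info_amount = amount;
    return stream + amount;
}
```

A reader on a real stream would call something like this after each read until it stops returning NULL. An object serialization scheme would presumably replace the raw size field with its own framing.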
The other identifying info, however, I do think we should have in some form.
The ID_number and code provided magic bytes and structure type identifiers,
the idea being for raster_io to confirm that it was reading something known
to it; i.e. data and software version matching. I think that it will probably
turn out to be useful to give each instantiated Image object a unique ID
number (a static class variable that is incremented on each instantiation
and assigned to a private variable in the new object). I have also found
it to be helpful to use manufacturing stamps in my code: each class has
a private string variable (static constant) automatically kept current by
using code management keywords (e.g. %W% %G% for SCCS, the equivalent for
CVS). This manufacturing stamp should be included in all error messages
(placed in exceptions by the exception producing code) to identify exactly
which code release is responsible.
> The title will probably become the filename or some user-defined title
> describing the data.
Yes, I agree. In fact we should start a class card for a Description
class that our objects can carry around. This could include user provided
title and description, as well as being expanded to include GUI
descriptive info such as icon, screen location, etc.
> We want to keep width, height, and frames in the geometry area.
Yes. But more generalized as we've discussed, into a Dimensions and Limits
vector. I suspect that we will want to define a Pixel_Geometry class
at some point.
> We need pixel_bits and data_type in the pixel data.
Yes. I have found that both pixel_bits and pixel_bytes are useful. The
former because this is often how pixels are described in image labels
(and because of the potential for pixel sizes to not be even multiples of
bytes, as above), and the latter because this is usually how raw pixel
data is manipulated (and because not all bytes are 8 bits!). The data_type,
being an indication of how to mathematically interpret the meaning of the
pixel bits, has always been an interesting challenge. All possible
primitive data types supported by the host system (i.e. the compiler
environment being used) should be enumerated. We may eventually need
to use ifdefs to allow for them all (e.g. not all compilers support
long doubles). Extending the basic Pixel_Data class to encompass abstract
- complex, Big (indefinite precision), and user-defined - data types will,
of course, be inevitable. So keep this in mind when designing this class.
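The relation between pixel_bits and pixel_bytes can be sketched without assuming 8-bit bytes by using CHAR_BIT from <limits.h>. The function name here is hypothetical; the only assumption is that a pixel always occupies a whole number of bytes, as in the docs quoted earlier.

```c
#include <limits.h>

/*
 * Hypothetical helper: the number of whole bytes needed to hold one pixel
 * of pixel_bits bits, rounding up so that any leftover bits are padding.
 * CHAR_BIT is the number of bits in a byte on the host system.
 */
unsigned int
pixel_bytes_for(unsigned int pixel_bits)
{
    return (pixel_bits + CHAR_BIT - 1) / CHAR_BIT;
}
```

On a host with 8-bit bytes this gives 2 bytes for a 12-bit pixel, which matches the 12-bit example in the docs (the remaining 4 bits are padding).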
> The path to the color_map, history, and data_link will be handled in
> some other way, so they also are eliminated.
>
In the PPVL_parameter structure I provided a "user_data" variable as
a pointer to user-defined data, i.e. the user could attach any data
to each parameter that they wanted to. This is a hack to allow the
structure to be extended by the application developer (this would
more properly be done by subclassing, but in C ...). I think that
any information about the image file outside the strict requirements
of the ICL should be handled in the context of Attributes (or application
developer subclassing of our Image class).
> For an individual pixel record (stored in geometry?), this leaves:
> int pixel_bits; Bits per pixel.
> int data_type; Pixel data type.
>
> from the original list. We should probably add:
> int packing_strategy; How bytes are currently packed (consecutive close packed, padded, etc.)
>
Bit packing strategy can probably be left to a later iteration. I think
that all the pixel data handled by the Image class will be in the form of
consecutive bytes, and padding will not be an issue; these may be matters
that particular format drivers will need to handle due to how data is
stored, but not the ICL. Application data storage in high level abstractions
will be dealt with either as pipeline segments or subclassing, but in any
case do not seem to be an issue for us now.
However, byte ordering does need to be dealt with from the beginning. We
must provide byte order info in the Pixel_Data class. From my xv
experience with this it seems that there is only MSB-first (big-endian)
or LSB-first (little-endian) to be concerned about (are there more complex
cases?). Unless the format driver tells us otherwise, we have no choice
but to assume that image file byte ordering is the same as the host system.
Determining the host system byte ordering is simple. It can be done as
a preprocessor define - which creates the possibility of having it set
incorrectly - or as a little initialization step (as is done in the
xvPVL module). I think that any data reordering should be done at the
lowest possible level (some hardware has a special instruction just for
this purpose) and at the earliest possible time (so that it can be
safely presumed that data will have its expected mathematical meaning).
I suggest looking at the XDR specification as a good starting point
(http://pirlserver.lpl.arizona.edu:8888/ab2/coll.45.9/ONCDG/@Ab2PageView/20748?
and
http://pirlserver.lpl.arizona.edu:8888/ab2/coll.45.9/ONCDG/@Ab2PageView/27538?).
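The "little initialization step" for finding the host byte order, and a low-level reordering routine to go with it, might look roughly like this. It is a sketch under assumptions, not the xvPVL code: the type and function names are hypothetical, and only the two simple orderings discussed above are recognized.

```c
#include <stddef.h>

/* Hypothetical names; only the two simple orderings are distinguished. */
typedef enum { HOST_LSB_FIRST, HOST_MSB_FIRST } Byte_Order;

/* Determine the host byte order at run time by looking at the first
   byte in memory of an integer whose value is 1. */
Byte_Order
host_byte_order(void)
{
    const unsigned int probe = 1;

    return (*(const unsigned char *) &probe == 1)
        ? HOST_LSB_FIRST : HOST_MSB_FIRST;
}

/* Reverse, in place, the "size" bytes of one datum. Working from the
   size rather than a hardcoded type keeps this usable for any width. */
void
reverse_bytes(void *datum, size_t size)
{
    unsigned char *bytes = (unsigned char *) datum;
    size_t first = 0, last = size;

    while (first + 1 < last) {
        unsigned char held = bytes[first];

        last--;
        bytes[first] = bytes[last];
        bytes[last] = held;
        first++;
    }
}
```

A format driver that knows the file's byte order would compare it with host_byte_order() once and call reverse_bytes on each datum as it is read, only when the two differ. Mixed orderings have existed on some older hardware and would need more than a simple reversal.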
> I think that this captures everything we currently need to know. I'm going
> through looking at the details here. Specifically:
> ------------------------------------------------------
> switch (pixel_bits)
> {
>     case 8:
>         return (RASTER_UNSIGNED_BYTE_DATA_TYPE);
>     case 16:
>         return (RASTER_UNSIGNED_SHORT_DATA_TYPE);
>     case 32:
>         return (RASTER_FLOAT_DATA_TYPE);
>     case 64:
>         return (RASTER_DOUBLE_DATA_TYPE);
> }
> ------------------------------------------------------
> -- and ---
> ------------------------------------------------------
> switch (data_type)
> {
>     case RASTER_UNKNOWN_DATA_TYPE:
>         return (0);
>     case RASTER_BYTE_DATA_TYPE:
>     case RASTER_UNSIGNED_BYTE_DATA_TYPE:
>         return (8);
>     case RASTER_SHORT_DATA_TYPE:
>     case RASTER_UNSIGNED_SHORT_DATA_TYPE:
>         return (16);
>     case RASTER_INT_DATA_TYPE:
>     case RASTER_UNSIGNED_INT_DATA_TYPE:
>     case RASTER_LONG_DATA_TYPE:
>     case RASTER_UNSIGNED_LONG_DATA_TYPE:
>     case RASTER_FLOAT_DATA_TYPE:
>         return (32);
>     case RASTER_DOUBLE_DATA_TYPE:
>         return (64);
> }
> ------------------------------------------------------
>
> This may be different on the various architectures. Were you trying to
> make this generic or were you trying to make it specific to a particular
> architecture?
>
I was hacking. The first switch is almost the entire contents of the
raster_data_type function. As the description for this function points
out (with a wink): "This function is not necessarily determinate, it is
suggestive." The point is as we were discussing yesterday: It is not
possible to determine a data type from the examination of raw data (a
32 bit datum may be interpreted as an integer or a float; there is
nothing inherent in the data to make the distinction, though educated
guessing by careful examination of bit patterns is not to be discouraged ;-).
The second switch, from the raster_pixel_bits function, is just poor
programming. It should be:
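    case RASTER_BYTE_DATA_TYPE:
    case RASTER_UNSIGNED_BYTE_DATA_TYPE:
        return (sizeof (char) * CHAR_BIT);
etc. (with CHAR_BIT from <limits.h>, since the function reports bits and
sizeof counts bytes). I wrote this about ten years ago as a "quick fix" when we needed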
an image file format suitable for use with the QCR/Z filmwriter device.
This does not excuse its inadequacies. It did reinforce in me the need
to solve the image file format problem by dealing with image information,
not image structure.
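Carried through for every case, the sizeof-based form of the second switch could look something like the following. It is a sketch, not the raster_io source: the enumerator values are placeholders (the real ones are in raster_io.h), the function name and the BITS_OF macro are hypothetical, and sizeof is multiplied by CHAR_BIT because the function reports bits while sizeof counts bytes.

```c
#include <limits.h>

/* Placeholder values; the real definitions live in raster_io.h. */
enum {
    RASTER_UNKNOWN_DATA_TYPE,
    RASTER_BYTE_DATA_TYPE,
    RASTER_UNSIGNED_BYTE_DATA_TYPE,
    RASTER_SHORT_DATA_TYPE,
    RASTER_UNSIGNED_SHORT_DATA_TYPE,
    RASTER_INT_DATA_TYPE,
    RASTER_UNSIGNED_INT_DATA_TYPE,
    RASTER_LONG_DATA_TYPE,
    RASTER_UNSIGNED_LONG_DATA_TYPE,
    RASTER_FLOAT_DATA_TYPE,
    RASTER_DOUBLE_DATA_TYPE
};

/* Hypothetical convenience macro: storage bits of a host type. */
#define BITS_OF(type) ((int) (sizeof (type) * CHAR_BIT))

/* Bits per pixel for a data type, as stored by the host compiler. */
int
pixel_bits_for_type(int data_type)
{
    switch (data_type) {
    case RASTER_BYTE_DATA_TYPE:
    case RASTER_UNSIGNED_BYTE_DATA_TYPE:
        return BITS_OF(char);
    case RASTER_SHORT_DATA_TYPE:
    case RASTER_UNSIGNED_SHORT_DATA_TYPE:
        return BITS_OF(short);
    case RASTER_INT_DATA_TYPE:
    case RASTER_UNSIGNED_INT_DATA_TYPE:
        return BITS_OF(int);
    case RASTER_LONG_DATA_TYPE:
    case RASTER_UNSIGNED_LONG_DATA_TYPE:
        return BITS_OF(long);
    case RASTER_FLOAT_DATA_TYPE:
        return BITS_OF(float);
    case RASTER_DOUBLE_DATA_TYPE:
        return BITS_OF(double);
    case RASTER_UNKNOWN_DATA_TYPE:
    default:
        return 0;   /* size cannot be inferred from the type */
    }
}
```

Note that this answers for the host system only: a long is 64 bits under many compilers, where the original switch hardcoded 32. The size of pixels as stored in a file still has to come from the image information itself.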
BC
> Bradford Castalia wrote:
>
> > The raster_io module (/usr/local/image/lib/raster_io.c and its include
> > file in ../include/raster_io.h) contains examples of pixel data
> > characterization.
> >
> > BC
> >
> > John Ivens wrote:
> > > Let me know where that code lives. Those that don't read code are
> > > doomed to repeat it... :)
>
> --
> When I use a word it means just what | John Ivens
> I choose it to mean - neither more | Principal Programmer
> nor less. | Cassini VIMS
> -- Humpty Dumpty | (520) 621-7301
>
>
--
Bradford Castalia Castalia@azstarnet.com
Systems Analyst http://azstarnet.com/~castalia
idaeim 520-624-6629
"Build an image in your mind, fit yourself into it."
The Log of Cyradis, Seeress of Kell.