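Recently, while reading a TIFF file sent over by a client, the underlying library threw an error: bandOffsets.length is wrong! Since the error came from inside the TIFF read call, I had no choice but to dig into the TIFF reading code in the underlying library.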
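In an earlier article, "Java 通過geotools讀取tiff" (reading TIFF from Java with GeoTools), I briefly covered the GeoTools code for reading TIFF. Digging deeper, it turns out the real workhorse behind the scenes is TIFFImageReader in imageio-ext.
Every Java developer knows imageio; ImageIO-ext is a set of extensions to it. Its source code is on GitHub, and it is a very powerful library that helps a lot with reading and writing all kinds of raster data in Java.
To follow this article you first need to understand how a TIFF file is laid out. This article explains it well: TIFF檔案結構詳解 (a detailed walkthrough of the TIFF file structure) https://blog.csdn.net/oYinHeZhiGuang/article/details/121710467
Now let's look at the TIFF reading code in imageio-ext, mainly the TIFFImageReader class, to see how a Java program reads a TIFF file.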
public TIFFImageReader(ImageReaderSpi originatingProvider) { super(originatingProvider); }
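This class is instantiated through an ImageReaderSpi. Many open source Java projects use this SPI (service provider interface) design; here the TIFFImageReaderSpi class is the one to use.
Next, set the file to read and a few other parameters through the following method of the class: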
public void setInput(Object input, boolean seekForwardOnly, boolean ignoreMetadata)
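In this method, input is the file to read. Setting seekForwardOnly to true means images and metadata can only be read from this input source in ascending order. Setting ignoreMetadata to true means metadata may be ignored while reading.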
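To make the SPI -> reader -> setInput sequence concrete, here is a minimal sketch built only on the JDK's own javax.imageio classes, so it runs without imageio-ext on the classpath. The class name ReaderSetupSketch and its two helper methods are made up for illustration. The reader is looked up by format name rather than by instantiating TIFFImageReaderSpi directly; with imageio-ext available, I would expect the same three steps to apply to its SPI, but that part is an assumption and is not exercised here.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.spi.ImageReaderSpi;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper class, not part of imageio-ext.
class ReaderSetupSketch {

    /** Encodes an image in memory so the sketch needs no file on disk. */
    static byte[] encode(BufferedImage image, String formatName) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            if (!ImageIO.write(image, formatName, out)) {
                throw new IllegalArgumentException("No writer for format: " + formatName);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns {width, height} of image 0, going through SPI -> reader -> setInput. */
    static int[] firstImageSize(byte[] data, String formatName) {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
        if (!readers.hasNext()) {
            throw new IllegalArgumentException("No reader for format: " + formatName);
        }
        // Step 1: get hold of the service provider (SPI) behind the reader.
        ImageReaderSpi spi = readers.next().getOriginatingProvider();
        try (ImageInputStream input =
                ImageIO.createImageInputStream(new ByteArrayInputStream(data))) {
            // Step 2: let the SPI create a fresh reader instance.
            ImageReader reader = spi.createReaderInstance();
            try {
                // Step 3: seekForwardOnly = true, ignoreMetadata = true.
                reader.setInput(input, true, true);
                return new int[] {reader.getWidth(0), reader.getHeight(0)};
            } finally {
                reader.dispose();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing "tiff" as the format name should pick up the TIFF plugin that ships with JDK 9 and later, or whichever TIFF reader is registered first on the classpath.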
Next comes reading the TIFF metadata; see the getImageMetadata(int imageIndex) method:
public IIOMetadata getImageMetadata(int imageIndex) throws IIOException {
    seekToImage(imageIndex, true);
    TIFFImageMetadata im =
        new TIFFImageMetadata(imageMetadata.getRootIFD().getTagSetList());
    Node root = imageMetadata.getAsTree(TIFFImageMetadata.nativeMetadataFormatName);
    im.setFromTree(TIFFImageMetadata.nativeMetadataFormatName, root);
    if (noData != null) {
        im.setNoData(new double[] {noData, noData});
    }
    if (scales != null && offsets != null) {
        im.setScales(scales);
        im.setOffsets(offsets);
    }
    return im;
}
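The main logic lives in seekToImage(imageIndex, true). The first parameter, imageIndex, selects which page of a multi-page TIFF to use. The second parameter, optimized, controls whether cached information for a page that has already been parsed may be reused; when it is false, the metadata is always read again.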
private void seekToImage(int imageIndex, boolean optimized) throws IIOException {
    checkIndex(imageIndex);

    // TODO we should do this initialization just once!!!
    int index = locateImage(imageIndex);
    if (index != imageIndex) {
        throw new IndexOutOfBoundsException("imageIndex out of bounds!");
    }

    final Integer i = Integer.valueOf(index);
    // optimized branch
    if (!optimized) {
        readMetadata();
        initializeFromMetadata();
        return;
    }
    // in case we have cache the info for this page
    if (pagesInfo.containsKey(i)) {
        // initialize from cachedinfo only if needed
        // TODO Improve
        if (imageMetadata == null || !initialized) { // this means the curindex has changed
            final PageInfo info = pagesInfo.get(i);
            final TIFFImageMetadata metadata = info.imageMetadata.get();
            if (metadata != null) {
                initializeFromCachedInfo(info, metadata);
                return;
            }
            pagesInfo.put(i, null);
        }
    }

    readMetadata();
    initializeFromMetadata();
}
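In this method, the first time a TIFF page is loaded, readMetadata() and initializeFromMetadata() read its metadata; the result is cached (see pagesInfo) so that later reads can reuse it.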
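The caching idea can be shown in isolation. The sketch below is not the imageio-ext implementation: PageCacheSketch and its methods are invented names, and the use of SoftReference is my assumption, based only on the fact that the excerpt calls info.imageMetadata.get() and then checks for null. It keeps the same two paths: an unoptimized path that always reloads, and an optimized path that reuses a cached entry if it is still alive.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical per-page cache, loosely modelled on the pagesInfo lookup above.
class PageCacheSketch<T> {
    private final Map<Integer, SoftReference<T>> pages = new HashMap<>();
    private final IntFunction<T> loader; // stands in for readMetadata() + initializeFromMetadata()
    private int loads = 0;

    PageCacheSketch(IntFunction<T> loader) {
        this.loader = loader;
    }

    T get(int pageIndex, boolean optimized) {
        if (optimized) {
            SoftReference<T> ref = pages.get(pageIndex);
            T cached = (ref == null) ? null : ref.get();
            if (cached != null) {
                return cached; // cache hit: nothing is parsed again
            }
        }
        // Cache miss, cleared reference, or optimized == false: load from scratch.
        T fresh = loader.apply(pageIndex);
        loads++;
        pages.put(pageIndex, new SoftReference<>(fresh));
        return fresh;
    }

    /** Number of times the loader actually ran. */
    int loadCount() {
        return loads;
    }
}
```

A soft reference lets the garbage collector drop cached metadata under memory pressure, which is why the lookup has to handle a null result and fall back to a full read.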
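This is best understood alongside the TIFF format itself. Broadly, the reader parses the TIFF header, locates the IFDs (image file directories), and then parses the contents of each directory in turn. I won't list that code here.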
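As a rough illustration of the first step (parsing the header and finding the first IFD), here is a small standalone sketch. It is not taken from imageio-ext, and TiffHeaderSketch is a made-up name. It relies only on the published layout: bytes 0-1 are "II" (little-endian) or "MM" (big-endian), bytes 2-3 hold the magic number 42 (43 for BigTIFF), and the offset of the first IFD follows.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical header parser for illustration only.
class TiffHeaderSketch {
    final ByteOrder byteOrder;
    final boolean bigTiff;
    final long firstIfdOffset;

    private TiffHeaderSketch(ByteOrder byteOrder, boolean bigTiff, long firstIfdOffset) {
        this.byteOrder = byteOrder;
        this.bigTiff = bigTiff;
        this.firstIfdOffset = firstIfdOffset;
    }

    static TiffHeaderSketch parse(byte[] bytes) {
        if (bytes.length < 8) {
            throw new IllegalArgumentException("Too short for a TIFF header");
        }
        ByteOrder order;
        if (bytes[0] == 'I' && bytes[1] == 'I') {
            order = ByteOrder.LITTLE_ENDIAN;
        } else if (bytes[0] == 'M' && bytes[1] == 'M') {
            order = ByteOrder.BIG_ENDIAN;
        } else {
            throw new IllegalArgumentException("Bad byte-order mark");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(order);
        int magic = buf.getShort(2) & 0xFFFF;
        if (magic == 42) {
            // Classic TIFF: 32-bit unsigned offset of the first IFD at byte 4.
            return new TiffHeaderSketch(order, false, buf.getInt(4) & 0xFFFFFFFFL);
        }
        if (magic == 43) {
            // BigTIFF: bytes 4-5 hold the offset size (8), bytes 6-7 are reserved,
            // and the 64-bit offset of the first IFD starts at byte 8.
            if (bytes.length < 16) {
                throw new IllegalArgumentException("Too short for a BigTIFF header");
            }
            return new TiffHeaderSketch(order, true, buf.getLong(8));
        }
        throw new IllegalArgumentException("Not a TIFF file, magic = " + magic);
    }
}
```

The bigTiff flag plays the same role as the isBTIFF parameter in the reader code: it decides whether counts and offsets are read as 32-bit or 64-bit values.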
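The point to make here is that parsing a directory is how the TIFF metadata is obtained, and it mostly comes down to parsing each tag. The parsing code is in the initialize(ImageInputStream stream, boolean ignoreUnknownFields, final boolean isBTIFF) method of the TIFFIFD class: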
public void initialize(ImageInputStream stream, boolean ignoreUnknownFields,
        final boolean isBTIFF) throws IOException {
    removeTIFFFields();

    List tagSetList = getTagSetList();

    final long numEntries;
    if (isBTIFF)
        numEntries = stream.readLong();
    else
        numEntries = stream.readUnsignedShort();

    for (int i = 0; i < numEntries; i++) {
        // Read tag number, value type, and value count.
        int tag = stream.readUnsignedShort();
        int type = stream.readUnsignedShort();
        int count;
        if (isBTIFF) {
            long count_ = stream.readLong();
            count = (int) count_;
            if (count != count_)
                throw new IllegalArgumentException("unable to use long number of values");
        } else
            count = (int) stream.readUnsignedInt();

        // Get the associated TIFFTag.
        TIFFTag tiffTag = getTag(tag, tagSetList);

        // Ignore unknown fields.
        if (ignoreUnknownFields && tiffTag == null) {
            // Skip the value/offset so as to leave the stream
            // position at the start of the next IFD entry.
            if (isBTIFF)
                stream.skipBytes(8);
            else
                stream.skipBytes(4);

            // XXX Warning message ...

            // Continue with the next IFD entry.
            continue;
        }

        long nextTagOffset;
        if (isBTIFF) {
            nextTagOffset = stream.getStreamPosition() + 8;
            int sizeOfType = TIFFTag.getSizeOfType(type);
            if (count * sizeOfType > 8) {
                long value = stream.readLong();
                stream.seek(value);
            }
        } else {
            nextTagOffset = stream.getStreamPosition() + 4;
            int sizeOfType = TIFFTag.getSizeOfType(type);
            if (count * sizeOfType > 4) {
                long value = stream.readUnsignedInt();
                stream.seek(value);
            }
        }

        if (tag == BaselineTIFFTagSet.TAG_STRIP_BYTE_COUNTS
                || tag == BaselineTIFFTagSet.TAG_TILE_BYTE_COUNTS
                || tag == BaselineTIFFTagSet.TAG_JPEG_INTERCHANGE_FORMAT_LENGTH) {
            this.stripOrTileByteCountsPosition = stream.getStreamPosition();
            if (LAZY_LOADING) {
                type = type == TIFFTag.TIFF_LONG
                        ? TIFFTag.TIFF_LAZY_LONG : TIFFTag.TIFF_LAZY_LONG8;
            }
        } else if (tag == BaselineTIFFTagSet.TAG_STRIP_OFFSETS
                || tag == BaselineTIFFTagSet.TAG_TILE_OFFSETS
                || tag == BaselineTIFFTagSet.TAG_JPEG_INTERCHANGE_FORMAT) {
            this.stripOrTileOffsetsPosition = stream.getStreamPosition();
            if (LAZY_LOADING) {
                type = type == TIFFTag.TIFF_LONG
                        ? TIFFTag.TIFF_LAZY_LONG : TIFFTag.TIFF_LAZY_LONG8;
            }
        }

        Object obj = null;

        try {
            switch (type) {
            case TIFFTag.TIFF_BYTE:
            case TIFFTag.TIFF_SBYTE:
            case TIFFTag.TIFF_UNDEFINED:
            case TIFFTag.TIFF_ASCII:
                byte[] bvalues = new byte[count];
                stream.readFully(bvalues, 0, count);

                if (type == TIFFTag.TIFF_ASCII) {
                    // Can be multiple strings
                    final List<String> v = new ArrayList<String>();
                    boolean inString = false;
                    int prevIndex = 0;
                    for (int index = 0; index <= count; index++) {
                        if (index < count && bvalues[index] != 0) {
                            if (!inString) {
                                // start of string
                                prevIndex = index;
                                inString = true;
                            }
                        } else { // null or special case at end of string
                            if (inString) {
                                // end of string
                                final String s =
                                    new String(bvalues, prevIndex, index - prevIndex);
                                v.add(s);
                                inString = false;
                            }
                        }
                    }

                    count = v.size();
                    String[] strings;
                    if (count != 0) {
                        strings = new String[count];
                        for (int c = 0; c < count; c++) {
                            strings[c] = v.get(c);
                        }
                    } else {
                        // This case has been observed when the value of
                        // 'count' recorded in the field is non-zero but
                        // the value portion contains all nulls.
                        count = 1;
                        strings = new String[] {""};
                    }

                    obj = strings;
                } else {
                    obj = bvalues;
                }
                break;

            case TIFFTag.TIFF_SHORT:
                char[] cvalues = new char[count];
                for (int j = 0; j < count; j++) {
                    cvalues[j] = (char) (stream.readUnsignedShort());
                }
                obj = cvalues;
                break;

            case TIFFTag.TIFF_LONG:
            case TIFFTag.TIFF_IFD_POINTER:
                long[] lvalues = new long[count];
                for (int j = 0; j < count; j++) {
                    lvalues[j] = stream.readUnsignedInt();
                }
                obj = lvalues;
                break;

            case TIFFTag.TIFF_RATIONAL:
                long[][] llvalues = new long[count][2];
                for (int j = 0; j < count; j++) {
                    llvalues[j][0] = stream.readUnsignedInt();
                    llvalues[j][1] = stream.readUnsignedInt();
                }
                obj = llvalues;
                break;

            case TIFFTag.TIFF_SSHORT:
                short[] svalues = new short[count];
                for (int j = 0; j < count; j++) {
                    svalues[j] = stream.readShort();
                }
                obj = svalues;
                break;

            case TIFFTag.TIFF_SLONG:
                int[] ivalues = new int[count];
                for (int j = 0; j < count; j++) {
                    ivalues[j] = stream.readInt();
                }
                obj = ivalues;
                break;

            case TIFFTag.TIFF_SRATIONAL:
                int[][] iivalues = new int[count][2];
                for (int j = 0; j < count; j++) {
                    iivalues[j][0] = stream.readInt();
                    iivalues[j][1] = stream.readInt();
                }
                obj = iivalues;
                break;

            case TIFFTag.TIFF_FLOAT:
                float[] fvalues = new float[count];
                for (int j = 0; j < count; j++) {
                    fvalues[j] = stream.readFloat();
                }
                obj = fvalues;
                break;

            case TIFFTag.TIFF_DOUBLE:
                double[] dvalues = new double[count];
                for (int j = 0; j < count; j++) {
                    dvalues[j] = stream.readDouble();
                }
                obj = dvalues;
                break;

            case TIFFTag.TIFF_LONG8:
            case TIFFTag.TIFF_SLONG8:
            case TIFFTag.TIFF_IFD8:
                long[] lBvalues = new long[count];
                for (int j = 0; j < count; j++) {
                    lBvalues[j] = stream.readLong();
                }
                obj = lBvalues;
                break;

            case TIFFTag.TIFF_LAZY_LONG8:
            case TIFFTag.TIFF_LAZY_LONG:
                obj = new TIFFLazyData(stream, type, count);
                break;

            default:
                // XXX Warning
                break;
            }
        } catch (EOFException eofe) {
            // The TIFF 6.0 fields have tag numbers less than or equal
            // to 532 (ReferenceBlackWhite) or equal to 33432 (Copyright).
            // If there is an error reading a baseline tag, then re-throw
            // the exception and fail; otherwise continue with the next
            // field.
            if (BaselineTIFFTagSet.getInstance().getTag(tag) == null) {
                throw eofe;
            }
        }

        if (tiffTag == null) {
            // XXX Warning: unknown tag
        } else if (!tiffTag.isDataTypeOK(type)) {
            // XXX Warning: bad data type
        } else if (tiffTag.isIFDPointer() && obj != null) {
            stream.mark();
            stream.seek(((long[]) obj)[0]);

            List tagSets = new ArrayList(1);
            tagSets.add(tiffTag.getTagSet());
            TIFFIFD subIFD = new TIFFIFD(tagSets);

            // XXX Use same ignore policy for sub-IFD fields?
            subIFD.initialize(stream, ignoreUnknownFields);
            obj = subIFD;
            stream.reset();
        }

        if (tiffTag == null) {
            tiffTag = new TIFFTag(null, tag, 1 << type, null);
        }

        // Add the field if its contents have been initialized which
        // will not be the case if an EOF was ignored above.
        if (obj != null) {
            TIFFField f = new TIFFField(tiffTag, type, count, obj);
            addTIFFField(f);
        }

        stream.seek(nextTagOffset);
    }

    this.lastPosition = stream.getStreamPosition();
}
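The core of that method is easier to see on a stripped-down example. The sketch below handles classic (non-BigTIFF) directories only and does not decode values; it just walks the 12-byte entries (tag, type, count, value/offset) and applies the same rule as the code above: if count times the type size exceeds 4 bytes, the last field is an offset to the data, otherwise the data sits inline in the entry. IfdEntrySketch and its fields are hypothetical names, and treating unknown types as size 0 is a simplification of mine.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified view of one classic-TIFF IFD entry.
class IfdEntrySketch {
    final int tag;
    final int type;
    final long count;
    final boolean inline;     // true if the value fits in the entry's 4-byte field
    final long valuePosition; // absolute position where the value bytes start

    // Byte size of the TIFF 6.0 field types 1..12 (index 0 is unused).
    private static final int[] TYPE_SIZE = {0, 1, 1, 2, 4, 8, 1, 1, 2, 4, 8, 4, 8};

    IfdEntrySketch(int tag, int type, long count, boolean inline, long valuePosition) {
        this.tag = tag;
        this.type = type;
        this.count = count;
        this.inline = inline;
        this.valuePosition = valuePosition;
    }

    /** Reads all entries of the IFD at ifdOffset; buf must already have the file's byte order. */
    static List<IfdEntrySketch> readIfd(ByteBuffer buf, int ifdOffset) {
        int numEntries = buf.getShort(ifdOffset) & 0xFFFF;
        List<IfdEntrySketch> entries = new ArrayList<>();
        for (int i = 0; i < numEntries; i++) {
            int pos = ifdOffset + 2 + 12 * i; // 2-byte entry count, then 12 bytes per entry
            int tag = buf.getShort(pos) & 0xFFFF;
            int type = buf.getShort(pos + 2) & 0xFFFF;
            long count = buf.getInt(pos + 4) & 0xFFFFFFFFL;
            // Simplification: unknown types are treated as size 0, hence inline.
            int size = (type >= 1 && type <= 12) ? TYPE_SIZE[type] : 0;
            boolean inline = count * size <= 4;
            long valuePosition = inline ? pos + 8 : (buf.getInt(pos + 8) & 0xFFFFFFFFL);
            entries.add(new IfdEntrySketch(tag, type, count, inline, valuePosition));
        }
        return entries;
    }
}
```

For example, ImageWidth (tag 256) stored as a single SHORT is inline, while BitsPerSample (tag 258) with three SHORT values needs 6 bytes and is therefore stored elsewhere in the file.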
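The commonly used TIFF tag set classes include BaselineTIFFTagSet, FaxTIFFTagSet, GeoTIFFTagSet, EXIFPTiffTagSet and PrivateTIFFTagSet.
Of these, GeoTIFFTagSet covers the extra information stored in a GeoTIFF; for reference, GeoTIFF is the TIFF format's way of storing GIS data. PrivateTIFFTagSet adds GDAL support, namely the NODATA and METADATA tags.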
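For orientation, these are the tag numbers that the GeoTIFF specification and GDAL define for the extra information mentioned above. The lookup class below is purely illustrative (GeoTagNamesSketch is a made-up name and not how imageio-ext organises its tag sets); only the numeric codes come from the published specifications.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table of well-known GeoTIFF and GDAL tag numbers.
class GeoTagNamesSketch {
    private static final Map<Integer, String> NAMES = new HashMap<>();

    static {
        // GeoTIFF tags
        NAMES.put(33550, "ModelPixelScaleTag");
        NAMES.put(33922, "ModelTiepointTag");
        NAMES.put(34264, "ModelTransformationTag");
        NAMES.put(34735, "GeoKeyDirectoryTag");
        NAMES.put(34736, "GeoDoubleParamsTag");
        NAMES.put(34737, "GeoAsciiParamsTag");
        // GDAL private tags
        NAMES.put(42112, "GDAL_METADATA");
        NAMES.put(42113, "GDAL_NODATA");
    }

    /** Returns the tag name, or "unknown" for tags not listed here. */
    static String nameOf(int tag) {
        return NAMES.getOrDefault(tag, "unknown");
    }
}
```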
As for the bandOffsets.length is wrong! error mentioned at the start, the cause shows up in the following part of the getImageTypes(int imageIndex) method.
ImageTypeSpecifier itsRaw =
TIFFDecompressor.getRawImageTypeSpecifier
(photometricInterpretation,
compression,
samplesPerPixel,
bitsPerSample,
sampleFormat,
extraSamples,
colorMap);
In the end we traced the problem to the Interleaved(ColorSpace colorSpace, int[] bandOffsets, int dataType, boolean hasAlpha, boolean isAlphaPremultiplied) constructor in the ImageTypeSpecifier class.
public Interleaved(ColorSpace colorSpace,
                   int[] bandOffsets,
                   int dataType,
                   boolean hasAlpha,
                   boolean isAlphaPremultiplied) {
    if (colorSpace == null) {
        throw new IllegalArgumentException("colorSpace == null!");
    }
    if (bandOffsets == null) {
        throw new IllegalArgumentException("bandOffsets == null!");
    }
    int numBands = colorSpace.getNumComponents() + (hasAlpha ? 1 : 0);
    if (bandOffsets.length != numBands) {
        throw new IllegalArgumentException("bandOffsets.length is wrong!");
    }
    // ... (the rest of the constructor is not shown in this excerpt)
}
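So this error is thrown exactly when the length of bandOffsets differs from the number of bands, that is, the number of color space components plus one if there is an alpha channel.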
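The check is easy to reproduce outside of any TIFF reader. The sketch below calls the JDK's public factory method ImageTypeSpecifier.createInterleaved, which goes through the same validation, with an sRGB color space (3 components). BandOffsetsSketch and tryInterleaved are names I made up for the demo, and which sample layout the client's file actually had is not something this snippet tries to guess.

```java
import java.awt.color.ColorSpace;
import java.awt.image.DataBuffer;
import javax.imageio.ImageTypeSpecifier;

// Hypothetical demo class for the bandOffsets length check.
class BandOffsetsSketch {

    /** Returns null if the specifier can be created, otherwise the exception message. */
    static String tryInterleaved(int[] bandOffsets, boolean hasAlpha) {
        ColorSpace rgb = ColorSpace.getInstance(ColorSpace.CS_sRGB); // 3 components
        try {
            ImageTypeSpecifier.createInterleaved(
                    rgb, bandOffsets, DataBuffer.TYPE_BYTE, hasAlpha, false);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }
}
```

With three offsets and no alpha the call succeeds. With four offsets and no alpha it fails with the message from the start of this article, and declaring the fourth band as alpha makes it succeed again.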
Working through this problem was a good way to trace how Java reads TIFF through ImageIO-ext, and each step lines up closely with the TIFF data structure.