dotnetspider.com
Login Login    Register      

TutorialsForumCareer DevelopmentResourcesReviewsJobsInterviewCommunitiesProjectsTraining

Subscribe to Subscribers
Talk to Webmaster
Tony John

Facebook
Google+
Twitter
LinkedIn
Online Membersbaskar
Aanshu singh
srirama
More...
Join our online Google+ community for Bloggers, Content Writers and Webmasters




Resources » .NET programming » .NET Framework

Introduction On Text Compression Using Lempel, Ziv, Welch (LZW) method


Posted Date:     Category: .NET Framework    
Author: Member Level: Bronze    Points: 10


Text Compression Using Lempel, Ziv, Welch (LZW) method



 


Introduction

LZW is one of the data compression method developed by Lempel, Ziv and Welch. The below is its algorithm explained in simple plain word :

Compression
Compression
Given an uncompressed string, text(q)text(p) where text(q) is the longest prefix of the uncompressed string found in dictionary and text(p) is the suffix of the uncompressed string.

To begin compress the string,
replaced the text(q) with q code where the dictionary key for q code must already be defined.
add new code inside the dictionary with key text(q)fc(p)where fc(p)represents the first character of the uncompressed string.
It sounds complicated. Let's take a look on the below example which will make your mind clear.
Consider the below uncompressed string and its predefined dictionary.

PPPQPPQQQ
Code 0 1
Key P Q


Here they go by using LZW compression method...

--------------------------------------------------------------------------------
0 PPQPPQQQ - New Key "PP" was added since the 1st compressed P is followed by the 2nd P
Code 0 1 2
Key P Q PP

--------------------------------------------------------------------------------
0 2 QPPQQQ - New Key "PPQ" was added since the 2nd compressed PP is followed by the 4th Q
Code 0 1 2 3
Key P Q PP PPQ

--------------------------------------------------------------------------------
0 2 1 PPQQQ - New Key "QP" was added since the 4th compressed Q is followed the 5th P
Code 0 1 2 3 4
Key P Q PP PPQ QP

--------------------------------------------------------------------------------
0 2 1 3 QQ - New Key "PPQQ" was added since the 5rd compressed PPQ is followed the 8th Q
Code 0 1 2 3 4 5
Key P Q PP PPQ QP PPQQ

--------------------------------------------------------------------------------
0 2 1 3 1 Q - New Key "QQ" was added since the 8th compressed Q is followed the 9th Q
Code 0 1 2 3 4 5 6
Key P Q PP PPQ QP PPQQ QQ

--------------------------------------------------------------------------------
Finally, the whole string PPPQPPQQQ is compressed to 0 2 1 3 1 1
Originally, the size of the string PPPQPPQQQ is 9 x 8 bits (size of ASCII) = 72 bits.

For the compressed string, if we define the code size as 4 bits in the beginning (of course, the 4 bits is too small in the real world application but sufficient enough for our example), the size of the compressed string 0 2 1 3 1 1 will be 6 x 4 bits = 24 bits. The compressed ration is 3 which will make everyone happy on it !

It is important to note that the dictionary is built during the execution of the compressor. It will be removed from the memory after the compressor finishs its execution. Only 0 2 13 1 1 will be written on the disk and not the dictionary.
--------------------------
Decompression

To decompress the compressed the string, consider the qp code.

Case 1 : P code is found in dictionary. Replaced p with text(p). Add new code into dictionary based on key text(q)fc(p)
Case 2 : P code is not found in dictionary. Replaced p with text(q)fc(q). Add new code into dictionary based on key text(q)fc(q)
Consider the below code and its predefined dictionary.
0 2 13 1 1
Code 0 1
Key P Q

Here they go by using LZW decompression method...

--------------------------------------------------------------------------------
It is noted that the first code will always defined in the dictionary. Thus, we will get
P 2 13 1 1 - Since this is the first code that we decompress. We will just left it without adding any new code to dictionary.
Code 0 1
Key P Q

--------------------------------------------------------------------------------
PPP 1 3 1 1 - Since 2 is not found in dictionary and it is case 2. We will replace 2 with text(0)fc(0) which is equivalent to PP. New code added into dictionary based on key text(0)fc(0) which is equivalent to PP.
Code 0 1 2
Key P Q PP

--------------------------------------------------------------------------------
PPPQ 3 1 1 - Since 1 is found in dictionary and it is case 1. We will replace 1text(1)which is equivalent to Q. New code added into dictionary based on key text(2)fc(1) which is equivalent to PPQ.
Code 0 1 2 3
Key P Q PP PPQ

--------------------------------------------------------------------------------
PPPQPPQ 1 1 - Since 3 is found in dictionary and it is case 1. We will replace 3text(3)which is equivalent to PPQ. New code added into dictionary based on key text(1)fc(3) which is equivalent to QP.
Code 0 1 2 3 4
Key P Q PP PPQ QP

--------------------------------------------------------------------------------

PPPQPPQQ 1 - Since 1 is found in dictionary and it is case 1. We will replace 1text(1)which is equivalent to Q. New code added into dictionary based on key text(3)fc(1) which is equivalent to PPQQ.
Code 0 1 2 3 4 5
Key P Q PP PPQ QP PPQQ

--------------------------------------------------------------------------------

PPPQPPQQQ - Since 1 is found in dictionary and it is case 1. We will replace 1text(1)which is equivalent to Q. New code added into dictionary based on key text(1)fc(1) which is equivalent to QQ.
Code 0 1 2 3 4 5 6
Key P Q PP PPQ QP PPQQ QQ

--------------------------
--------------------------
Compress.cs

using System;

public class Compress
{
private static System.String[] Files
{
set
{
System.String inputFile, outputFile;
if (value.Length >= 1)
{
inputFile = value[0];
in_Renamed = new System.IO.BufferedStream(new System.IO.FileStream(inputFile, System.IO.FileMode.Open, System.IO.FileAccess.Read));
outputFile = inputFile + ".zzz";
out_Renamed = new System.IO.BufferedStream(new System.IO.FileStream(outputFile, System.IO.FileMode.Create));
}
else
{
System.Console.Out.Write("usage:java Compress ");
System.Environment.Exit(1);
}
}

}
internal const int MAX_CODES = 4096;
internal const int BYTE_SIZE = 8;
internal const int EXCESS = 4;
internal const int ALPHA = 256;
internal const int MASK1 = 255;
internal const int MASK2 = 15;

internal static int leftOver;
internal static bool bitsLeftOver;
internal static System.IO.BufferedStream in_Renamed;
internal static System.IO.BufferedStream out_Renamed;

private static void output(int pcode)
{
int c, d;
if (bitsLeftOver)
{
d = pcode & MASK1;
c = (leftOver << EXCESS) + (pcode >> BYTE_SIZE);
out_Renamed.WriteByte((System.Byte) c);
out_Renamed.WriteByte((System.Byte) d);
bitsLeftOver = false;
}
else
{
leftOver = pcode & MASK2;
c = pcode >> EXCESS;
out_Renamed.WriteByte((System.Byte) c);
bitsLeftOver = true;
}
}

private static void compress()
{
System.Collections.Hashtable table = new System.Collections.Hashtable();
for (int i = 0; i < ALPHA; i++)
SupportClass.PutElement(table, i, i);

int codeUsed = ALPHA;

int c = in_Renamed.ReadByte();
if (c != - 1)
{
int pcode = c;
c = in_Renamed.ReadByte();
while (c != - 1)
{
int k = (pcode << BYTE_SIZE) + c;
System.Int32 e = (System.Int32) table[k];
if ((System.Object) e == null)
{
output(pcode);
if (codeUsed < MAX_CODES)
SupportClass.PutElement(table, (pcode << BYTE_SIZE) + c, codeUsed++);
pcode = c;
}
else
pcode = e;
c = in_Renamed.ReadByte();
}

output(pcode);
if (bitsLeftOver)
out_Renamed.WriteByte((System.Byte) (leftOver << EXCESS));
}
//UPGRADE_TODO: Method 'java.io.BufferedInputStream.close' was not converted. 'ms-help://MS.VSCC.2003/commoner/redir/redirect.htm?keyword="jlca1095"'
//System.IO.BufferedStream.Null ;
in_Renamed.Close();
out_Renamed.Close();
}

[STAThread]
public static void Main(System.String[] args)
{
Files = args;
compress();
}
}

--------------------------
--------------------------
Decompress.cs

using System;

public class Decompress
{
private static System.String[] Files
{
set
{
System.String inputFile, outputFile;
if (value.Length >= 1)
{
inputFile = value[0];
if (!inputFile.EndsWith(".zzz"))
{
System.Console.Out.WriteLine("The filename must end with \"zzz\" extension");
System.Environment.Exit(1);
}
in_Renamed = new System.IO.BufferedStream(new System.IO.FileStream(inputFile, System.IO.FileMode.Open, System.IO.FileAccess.Read));
outputFile = inputFile.Substring(0, (inputFile.Length - 4) - (0));
out_Renamed = new System.IO.BufferedStream(new System.IO.FileStream(outputFile, System.IO.FileMode.Create));
}
else
{
System.Console.Out.Write("usage:java Decompress ");
System.Environment.Exit(1);
}
}

}
private static int Code
{
get
{
int c = in_Renamed.ReadByte();
if (c == - 1)
return - 1;

int code;
if (bitsLeftOver)
code = (leftOver << BYTE_SIZE) + c;
else
{
int d = in_Renamed.ReadByte();
code = (c << EXCESS) + (d >> EXCESS);
leftOver = d & MASK;
}
bitsLeftOver = !bitsLeftOver;
return code;
}

}

internal const int MAX_CODES = 4096;
internal const int BYTE_SIZE = 8;
internal const int EXCESS = 4;
internal const int ALPHA = 256;
internal const int MASK = 15;

internal static int[] s;
internal static int size;
internal static Element[] h;
internal static int leftOver;
internal static bool bitsLeftOver;
internal static System.IO.BufferedStream in_Renamed;
internal static System.IO.BufferedStream out_Renamed;

private static void output(int code)
{
size = - 1;
while (code >= ALPHA)
{
s[++size] = h
.suffix;
code = h
.prefix;
}
s[++size] = code;
for (int i = size; i >= 0; i--)
out_Renamed.WriteByte((System.Byte) s[i]);
}

private static void decompress()
{
int codeUsed = ALPHA;
s = new int[MAX_CODES];
h = new Element[MAX_CODES];

int pcode = Code, ccode;
if (pcode >= 0)
{
s[0] = pcode;
out_Renamed.WriteByte((System.Byte) s[0]);
size = 0;

do
{
ccode = Code;
if (ccode < 0)
break;
if (ccode < codeUsed)
{
output(ccode);
if (codeUsed < MAX_CODES)
h[codeUsed++] = new Element(pcode, s[size]);
}
else
{
h[codeUsed++] = new Element(pcode, s[size]);
output(ccode);
}
pcode = ccode;
}
while (true);
}
out_Renamed.Close();
//UPGRADE_TODO: Method 'java.io.BufferedInputStream.close' was not converted. 'ms-help://MS.VSCC.2003/commoner/redir/redirect.htm?keyword="jlca1095"'
in_Renamed.Close();
}

[STAThread]
public static void Main(System.String[] args)
{
Files = args;
decompress();
}
}

--------------------------
--------------------------
SupportClass.cs

//
// In order to convert some functionality to Visual C#, the Java Language Conversion Assistant
// creates "support classes" that duplicate the original functionality.
//
// Support classes replicate the functionality of the original code, but in some cases they are
// substantially different architecturally. Although every effort is made to preserve the
// original architecture of the application in the converted project, the user should be aware that
// the primary goal of these support classes is to replicate functionality, and that at times
// the architecture of the resulting solution may differ somewhat.
//

using System;

///
/// Contains conversion support elements such as classes, interfaces and static methods.
///

public class SupportClass
{
///
/// Adds a new key-and-value pair into the hash table
///

/// The collection to work with
/// Key used to obtain the value
/// Value asociated with the key
/// The old element associated with the key
public static System.Object PutElement(System.Collections.IDictionary collection, System.Object key, System.Object newValue)
{
System.Object element = collection[key];
collection[key] = newValue;
return element;
}

}



Element.cs

using System;
class Element
{
internal int prefix;
internal int suffix;

public Element(int prefix, int suffix)
{
this.prefix = prefix;
this.suffix = suffix;
}
}





Did you like this resource? Share it with your friends and show your love!


Responses to "Introduction On Text Compression Using Lempel, Ziv, Welch (LZW) method"
Author: Ramon Hernandez    05 Jul 2005Member Level: Bronze   Points : 0
Hello
I have not been able to compile Decompress.cs. The code must be incomplete or with some error.
It could say to me if there is some difference between algiritmo Lempel-Ziv (LZ) and the Lempel, Ziv, Welch (LZW)
Thank you very much



Author: Suteu Cezar    19 Jan 2009Member Level: Bronze   Points : 2
Indeed Decompress.cs is flawed. There is a small mistake due to the formatting of the frames.

s[++size] = h[ code ].suffix;
code = h[ code ].prefix;

Also System.Int32 e = (System.Int32) table[k]; might not work, i have a workaround just use this block of code instead:

if (!table.ContainsKey(k))
{
output(pcode);
if (codeUsed < MAX_CODES)
SupportClass.PutElement(table, (pcode << BYTE_SIZE) + c, codeUsed++);
pcode = c;
}
else
{
System.Int32 e = (System.Int32)table[k];
pcode = e;
}

Great solid code otherwise!



Author: Raj Krishna    16 Apr 2009Member Level: Gold   Points : 0
http://rajkrishnaj.blogspot.com/


Feedbacks      

Post Comment:




  • Do not include your name, "with regards" etc in the comment. Write detailed comment, relevant to the topic.
  • No HTML formatting and links to other web sites are allowed.
  • This is a strictly moderated site. Absolutely no spam allowed.
  • Name:   Sign In to fill automatically.
    Email: (Will not be published, but required to validate comment)



    Type the numbers and letters shown on the left.


    Next Resource: Some Interview Questions.
    Previous Resource: Difference between 'ReadOnly' Field and 'Const'
    Return to Resources
    Post New Resource
    Category: .NET Framework


    Post resources and earn money!
     
    More Resources
    Popular Tags   Tag posting guidelines   Search Tags  
    (No tags found.)



    Follow us on Twitter: https://twitter.com/dotnetspider

    Active Members
    TodayLast 7 Daysmore...

    Awards & Gifts
    Email subscription
  • .NET Jobs
  • .NET Articles
  • .NET Forums
  • Articles Rss Feeds
    Forum Rss Feeds


    About Us    Contact Us    Copyright    Privacy Policy    Terms Of Use    Revenue Sharing sites   Advertise   Talk to Tony John
    Copyright © SpiderWorks Technologies Pvt Ltd., Kochi, India
    2005 - 2012 All Rights Reserved.
    .NET and other trademarks mentioned in this site belong to Microsoft and other respective trademark owners.
    Articles, tutorials and all other content offered here is for educational purpose only.
    We are not associated with Microsoft or its partners.