Your string is: sample_text

Alphabet of symbols in the string: _ a e l m p s t x
Frequencies of alphabet symbols:

  • 0.091 -> _
  • 0.091 -> a
  • 0.182 -> e
  • 0.091 -> l
  • 0.091 -> m
  • 0.091 -> p
  • 0.091 -> s
  • 0.182 -> t
  • 0.091 -> x
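
The alphabet and frequencies listed above can be reproduced with a short Python sketch. The function name symbol_frequencies is hypothetical (it is not taken from the calculator's source), and the sketch assumes that "frequency" means the symbol count divided by the string length.

```python
from collections import Counter


def symbol_frequencies(text):
    """Return {symbol: relative frequency}, sorted by symbol.

    Hypothetical helper: assumes a non-empty string and defines
    frequency as (symbol count) / (string length).
    """
    counts = Counter(text)
    total = len(text)
    return {symbol: count / total for symbol, count in sorted(counts.items())}


freqs = symbol_frequencies("sample_text")
for symbol, freq in freqs.items():
    # Print in the same "frequency -> symbol" layout as the list above.
    print(f"{freq:.3f} -> {symbol}")
```

For "sample_text" (11 symbols), every symbol occurs once except e and t, which occur twice, giving 1/11 ≈ 0.091 and 2/11 ≈ 0.182.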

Shannon entropy can be calculated as follows:

H(X) = -[(0.091 × log2(0.091)) + (0.091 × log2(0.091)) + (0.182 × log2(0.182)) + (0.091 × log2(0.091)) +
(0.091 × log2(0.091)) + (0.091 × log2(0.091)) + (0.091 × log2(0.091)) + (0.182 × log2(0.182)) + (0.091 × log2(0.091))]

H(X) = -[(-0.314)+(-0.314)+(-0.447)+(-0.314)+(-0.314)+(-0.314)+(-0.314)+(-0.447)+(-0.314)]
H(X) = -[-3.0958]
H(X) = 3.0958
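
The same calculation can be sketched in Python. The function name shannon_entropy is an illustrative choice rather than the calculator's actual code, and the sketch assumes base-2 logarithms and a non-empty input string.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of a string in bits per symbol (log base 2).

    Hypothetical helper: sums -p * log2(p) over the relative
    frequency p of each distinct symbol in the string.
    """
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


entropy = shannon_entropy("sample_text")
print(round(entropy, 4))  # 3.0958
```

Because the sketch works with unrounded frequencies (1/11 and 2/11), it arrives at 3.0958; adding up the rounded per-symbol terms shown above by hand gives a slightly smaller total of about 3.092.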

Ok, but what does it mean?
Shannon entropy tells you the minimum average number of bits per symbol needed to encode the information in binary form (when the log base is 2). Rounding the Shannon entropy calculated above up to a whole number, each symbol has to be encoded with 4 bits, so you need 44 bits (4 bits × 11 symbols) to encode your string with a fixed-length code.
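
The bit counts in this paragraph follow from rounding the entropy up and multiplying by the string length. A minimal Python sketch, assuming a fixed-length code in which every symbol gets the same whole number of bits (the function name fixed_length_bits is hypothetical):

```python
import math
from collections import Counter


def fixed_length_bits(text):
    """Return (bits per symbol, total bits) for a non-empty string.

    Hypothetical helper: rounds the base-2 Shannon entropy up to a
    whole number of bits and multiplies by the string length.
    """
    counts = Counter(text)
    total = len(text)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    bits_per_symbol = math.ceil(entropy)
    return bits_per_symbol, bits_per_symbol * total


bits_per_symbol, total_bits = fixed_length_bits("sample_text")
print(bits_per_symbol, total_bits)  # 4 44
```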

Additionally, other quantities can be calculated. One of the simplest is metric entropy, which is Shannon entropy divided by string length. Metric entropy helps you assess the randomness of your message. It can take values from 0 to 1, where 1 means an equally distributed random string. Metric entropy for the above example is: 0.28144
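
Metric entropy, as defined here, is a single division on top of the entropy calculation. A minimal Python sketch, assuming base-2 logarithms and a non-empty string (the function name metric_entropy is illustrative, not the calculator's own):

```python
import math
from collections import Counter


def metric_entropy(text):
    """Shannon entropy (log base 2) divided by the string length.

    Hypothetical helper following the definition given in the text.
    """
    counts = Counter(text)
    total = len(text)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / total


value = metric_entropy("sample_text")
print(round(value, 5))  # 0.28144
```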

For further details see Wikipedia and Wikibooks pages about it.

  1. Cite this site. Kozlowski, L. Shannon entropy calculator. www.shannonentropy.netmark.pl


Date: 09:51, 22nd September 2018