
Paul Hayes's List: Data Wrangling

  • Dec 09, 13

    Understanding data storage and replication. 

    • Computer storage is measured in bytes, kilobytes (KB), megabytes (MB), gigabytes (GB) and increasingly terabytes (TB). One byte is one character of information and is made up of eight bits (eight binary digits, each a 1 or a 0).
    • For example, an hour of DV format video footage consumes about 12GB of storage. Non-compressed video requires even more space -- for example 2GB for every minute of standard definition footage, and 9.38GB for each minute of non-compressed 1920x1080 high definition video.
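
The per-hour and per-minute figures quoted above can be turned into a quick estimate for any clip length. The following is a minimal sketch that uses only those three rates; the function name `footage_size_gb` and the dictionary keys are hypothetical labels chosen for illustration.

```python
# Storage rates quoted in the annotation above, in gigabytes per minute.
GB_PER_MINUTE = {
    "dv": 12 / 60,            # about 12 GB per hour of DV footage
    "uncompressed_sd": 2.0,   # 2 GB per minute, standard definition
    "uncompressed_hd": 9.38,  # 9.38 GB per minute, 1920x1080
}


def footage_size_gb(video_format: str, minutes: float) -> float:
    """Estimate storage in GB for `minutes` of footage in the given format."""
    return GB_PER_MINUTE[video_format] * minutes


if __name__ == "__main__":
    for name in GB_PER_MINUTE:
        print(f"{name}: {footage_size_gb(name, 60):.1f} GB per hour")
```

At these rates, an hour of non-compressed HD footage needs roughly 47 times the space of an hour of DV footage.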


    • If you download it in standard quality, it will almost always be 700 MB. People like that size because it fits on a CD. HD tends to be around 1.8 to 4.7 GB. 1080p HD can go up to 28 GB (Blu-ray quality).


    • Bit 

      A bit is the smallest increment of storage on a computer. Imagine each bit is like a light bulb. Each one is either on or off, so it can have one of two values (either 0 or 1).

       Byte 

      A byte is a string of 8 bits (eight light bulbs in a row). A byte is basically the smallest unit of data that can be processed on your family computer. As such, storage measurements are normally given in bytes rather than bits. A byte can represent 2^8 (2 x 2 x 2 x 2 x 2 x 2 x 2 x 2) or 256 different values, so the largest decimal value it can hold is 255. For more information on binary numbers, including how to convert them to decimal, please see the resource area below.

       Kilobyte (KB) 

      A kilobyte in binary is 1024 bytes (2^10), but it is also used to refer to 1000 bytes (the decimal interpretation). This is where things start to get really confusing! You can see that a binary KB is slightly bigger than a decimal KB.

       Megabyte (MB) 

      A megabyte in binary is 1,048,576 (2^20) bytes. In decimal it’s 1,000,000 bytes (10^6).

       Gigabyte (GB) 

      A gigabyte is either 2^30 (1,073,741,824) bytes or 10^9 (1 billion) bytes. By now the difference between the binary version and the decimal version is quite significant.
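
The arithmetic in the definitions above can be checked directly. This is a minimal sketch, not taken from the quoted source; the helper name `unit_gap_percent` is a hypothetical label. It counts the values a byte can hold and measures how far the binary interpretation of each unit exceeds the decimal one.

```python
BITS_PER_BYTE = 8

# A byte holds 2**8 distinct bit patterns, so its values run from 0 to 255.
distinct_values = 2 ** BITS_PER_BYTE
largest_value = distinct_values - 1


def unit_gap_percent(power: int) -> float:
    """Percentage by which the binary unit exceeds the decimal unit.

    power=1 compares kilobytes (2**10 vs 10**3), power=2 megabytes
    (2**20 vs 10**6), power=3 gigabytes (2**30 vs 10**9).
    """
    binary = 2 ** (10 * power)
    decimal = 10 ** (3 * power)
    return (binary - decimal) / decimal * 100


if __name__ == "__main__":
    print(distinct_values, largest_value)
    for name, power in [("KB", 1), ("MB", 2), ("GB", 3)]:
        print(f"{name}: binary is {unit_gap_percent(power):.2f}% larger")
```

The gap grows with each step up: 2.40% for kilobytes, 4.86% for megabytes and 7.37% for gigabytes.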

    • That depends. Most songs are under 10 MB, so you could probably put up to 20 songs on there.
    • Avg song size is 4-7MB.
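
The song-size figures above give a simple way to estimate how many songs fit on a device. The sketch below assumes a hypothetical 1 GB device, treated as 1024 MB (the binary interpretation); the capacity is an illustrative assumption and is not stated in the quoted answers.

```python
def songs_that_fit(capacity_mb: float, song_mb: float) -> int:
    """Number of whole songs of size `song_mb` that fit in `capacity_mb`."""
    return int(capacity_mb // song_mb)


# Assumption for illustration: a 1 GB device, counted as 1024 MB.
CAPACITY_MB = 1024

# Average song size quoted above is 4-7 MB, so compute both ends of the range.
fewest = songs_that_fit(CAPACITY_MB, 7)  # larger songs, fewer fit
most = songs_that_fit(CAPACITY_MB, 4)    # smaller songs, more fit

if __name__ == "__main__":
    print(f"Between {fewest} and {most} songs")
```

With these inputs the estimate is between 146 and 256 songs per gigabyte.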


    • The rule that many people go by is 1GB=500 songs, 2GB=1000 songs, 4GB=2000 songs...
    • The range of memory and storage within and attached to a computer system is known as the storage hierarchy. To make it easier to understand, it can be divided into four segments.
      • Primary Storage is the top level and is made up of CPU registers, CPU cache and memory, which are the only components directly accessible to the system's CPU. The CPU can continuously read data stored in these areas and execute instructions quickly and uniformly. Secondary Storage differs from primary storage in that it is not directly accessible by the CPU. Instead, the system connects to secondary storage through input/output (I/O) channels, which control the flow of data on request.
      • Secondary storage is non-volatile, so it does not lose data when powered down; consequently, modern computer systems tend to have more secondary storage than primary storage. Secondary storage today typically consists of hard disk drives (HDD), usually set up in a RAID configuration, although older installations also included removable media such as magneto-optical (MO) discs.
      • Tertiary Storage is mainly used for backup and archival of data. Although it is based on the slowest devices, it can be classed as the most important in terms of protecting data against the variety of disasters that can affect an IT infrastructure. Most devices in this segment are automated via robotics and software to reduce management costs and the risk of human error, and consist primarily of disk- and tape-based backup devices.
      • Offline Storage is the final category and is where removable storage media sit, such as tape cartridges and optical discs such as CD and DVD. Offline storage can be used to transfer data between systems, and it also allows data to be secured offsite so that companies always have a copy of valuable data in the event of a disaster.
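
The four segments described above can be written down as a small data structure. This is only an illustrative sketch: the class name `StorageTier` and its fields are hypothetical, and the example devices are limited to those named in the description.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StorageTier:
    """One segment of the storage hierarchy, as described above."""
    name: str
    cpu_direct_access: bool  # only primary storage is directly accessible
    examples: Tuple[str, ...]


# Ordered from the top of the hierarchy downwards.
HIERARCHY = (
    StorageTier("primary", True, ("CPU registers", "CPU cache", "memory")),
    StorageTier("secondary", False, ("hard disk drives",)),
    StorageTier("tertiary", False, ("disk-based backup", "tape-based backup")),
    StorageTier("offline", False, ("tape cartridges", "CD", "DVD")),
)


def directly_accessible(tiers: Tuple[StorageTier, ...]) -> List[str]:
    """Names of the tiers the CPU can read without going through I/O channels."""
    return [tier.name for tier in tiers if tier.cpu_direct_access]


if __name__ == "__main__":
    for tier in HIERARCHY:
        print(f"{tier.name}: {', '.join(tier.examples)}")
    print("Directly accessible:", directly_accessible(HIERARCHY))
```
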
  • Dec 10, 13

    Description of Data Storage hierarchy and mathematical comparisons 

    • megabyte = 1/1152921504606846976 yottabyte
        = 1/1125899906842624 zettabyte
        = 1/1099511627776 exabyte
        = 1/1073741824 petabyte
        = 1/1048576 terabyte
        = 1/1024 gigabyte
        = 1 megabyte
        = 8 megabits
        = 1024 kilobytes
        = 8192 kilobits
        = 1048576 bytes
        = 2097152 nibbles
        = 8388608 bits
    • gigabyte = 1/1125899906842624 yottabyte
        = 1/1099511627776 zettabyte
        = 1/1073741824 exabyte
        = 1/1048576 petabyte
        = 1/1024 terabyte
        = 1 gigabyte
        = 1024 megabytes
        = 8192 megabits
        = 1048576 kilobytes
        = 8388608 kilobits
        = 1073741824 bytes
        = 2147483648 nibbles
        = 8589934592 bits
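
Both conversion lists above use the binary interpretation, where each unit is 1024 times the one below it. A minimal sketch that reproduces those ratios is shown here; the helper names `bytes_in` and `ratio` are hypothetical, and `fractions.Fraction` from the Python standard library keeps the large denominators exact.

```python
from fractions import Fraction

# Units in ascending order; each is 1024 times the previous one (binary).
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit: str) -> int:
    """Number of bytes in one of the given unit (binary interpretation)."""
    return 1024 ** UNITS.index(unit)


def ratio(from_unit: str, to_unit: str) -> Fraction:
    """How many `to_unit` make up one `from_unit`, as an exact fraction."""
    return Fraction(bytes_in(from_unit), bytes_in(to_unit))


if __name__ == "__main__":
    for source in ("megabyte", "gigabyte"):
        print(source)
        for target in reversed(UNITS):
            print(f"    = {ratio(source, target)} {target}")
        print(f"    = {bytes_in(source) * 2} nibbles")  # 2 nibbles per byte
        print(f"    = {bytes_in(source) * 8} bits")     # 8 bits per byte
```
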


    • An exabyte of data is created on the Internet each day, which equates to 250 million DVDs' worth of information.
    • And the idea of even larger amounts of data — a zettabyte — isn’t too far off when it comes to the amount of info traversing the web in any one year.
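
The DVD comparison above can be sanity-checked by dividing one exabyte by 250 million. The quoted source does not say whether it means a decimal or a binary exabyte, so this sketch computes both; the variable names are hypothetical.

```python
DVD_COUNT = 250_000_000  # "250 million DVDs" from the annotation above

DECIMAL_EXABYTE = 10 ** 18  # bytes, decimal interpretation
BINARY_EXABYTE = 2 ** 60    # bytes, binary interpretation

# Implied data per DVD, in gigabytes of the matching interpretation.
per_dvd_decimal_gb = DECIMAL_EXABYTE / DVD_COUNT / 10 ** 9
per_dvd_binary_gb = BINARY_EXABYTE / DVD_COUNT / 2 ** 30

if __name__ == "__main__":
    print(f"decimal: {per_dvd_decimal_gb:.2f} GB per DVD")
    print(f"binary:  {per_dvd_binary_gb:.2f} GB per DVD")
```

Either way the result is a little over 4 GB per disc, which is in the same range as the nominal 4.7 GB capacity of a single-layer DVD.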

