The present invention overcomes deficiencies inherent in data compression in computer systems running the Microsoft Windows operating system, and accelerates disk read speeds.
1. A Windows computer system comprising a CPU, RAM, and at least one non-transitory mass storage medium wherein the CPU is used to compress the data stored on the mass storage medium to an operator-selectable degree of between 1:1 and 10:1, and the system accurately represents the amount of available free space on the mass storage medium by projecting the effects of the selected compression on further storage of data on the mass storage medium.
2. The system of
3. The system of
4. The system of
5. A method of compressing data on a non-transitory mass storage medium used in a Windows computer system comprising a CPU, RAM, and at least one non-transitory mass storage medium comprising the steps of:
a. Accepting an operator indication of a desired degree of data compression of between 1:1 and 10:1;
b. Selecting a predefined compression method corresponding to the accepted indication and the version of Windows in use;
c. Selecting a predefined set of files and directories stored on the mass storage medium, and designating such files and directories as uncompressible;
d. Applying the selected compression method to all files and directories on the mass storage medium that were not selected in step (c); and
e. Representing the remaining free space on the mass storage medium after step (d) by projecting the effects of the selected compression method on further storage of data on the mass storage medium.
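Step (e), and the free-space representation recited in claim 1, can be illustrated with a small arithmetic sketch. It assumes that projection simply multiplies the physical capacity and free space by the presented ratio, and that the presented ratio is limited to the range from 1.0:1 to double the ratio actually achieved on disk, as the description states. The function and field names are hypothetical and are not taken from the LZSDC driver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectedDisk:
    total_gb: float  # capacity reported to the operating system
    free_gb: float   # free space reported to the operating system


def project_disk(total_gb, free_gb, actual_ratio, presented_ratio=None):
    """Scale the physical sizes by the presented compression ratio.

    Assumption: projection is a plain multiplication. The presented ratio
    defaults to the actual ratio and is clamped to [1.0, 2 * actual_ratio].
    """
    if actual_ratio < 1.0:
        raise ValueError("actual_ratio must be at least 1.0")
    if presented_ratio is None:
        presented_ratio = actual_ratio
    ratio = min(max(presented_ratio, 1.0), 2.0 * actual_ratio)
    return ProjectedDisk(total_gb * ratio, free_gb * ratio)
```

Calling `project_disk(60, 25, actual_ratio=2.4)` reproduces the description's example of a 60 GB disk presented at a 2.4:1 ratio; the 25 GB free-space figure is an arbitrary illustration.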
This application claims priority from U.S. Provisional Application 62/243,764, filed Oct. 20, 2015.
LZSDC, the system and method of the present invention, builds on top of core Microsoft Windows operating system compression services. (As used herein, the term “Windows” refers to the several versions of the Microsoft Windows computer operating system, beginning with the XP version, and including all versions released since the XP version, unless otherwise specifically stated.) These core operating system compression services have limitations so severe that they are virtually unusable without comprehensive third party tools. These inherent limitations are transcended by the present invention. Additionally, LZSDC layers multiple compression services to maximize available disk space under most usage scenarios, which is again not possible with default Microsoft implementations. LZSDC also supplies its own disk capacity and disk free space projection driver, which is completely lacking in Windows. This driver guarantees that files that would fit on disk only with compression can in fact be stored. Without it, such files are normally rejected even on a compressed disk, because Windows' free disk space reporting does not take into account the benefits of transparent disk compression. Finally, LZSDC also accelerates disk read speeds, particularly on slower mechanical (spinning platter) hard disk drives, as well as on faster SSD (solid state flash memory based) disk drives, including SATA 1, SATA 2, SATA 3, and NVMe connected drives. LZSDC supports all modern 32 bit and 64 bit Windows operating systems, from Windows XP through Windows Server 2016.
I. Windows Algorithms and Limitations
Microsoft Windows provides compression utilities that implement the LZW compression techniques described in U.S. Pat. No. 4,558,302, with some additional variations.
The compression techniques offered by native Microsoft Windows operating systems are LZNT1, WIMBoot, and XPRESS-LZX compression. Each compression method has drawbacks that are addressed by the present invention, and these are described below. i. The LZNT1 Algorithm and Limitations LZNT1 has been present since Windows NT 3.51. LZNT1 transparently compresses files as they are read from disk and written back to disk. LZNT1 is a per-file compressor; meaning that each file is separately compressed on-disk. Additionally, LZNT1 can be applied to folders; wherein a folder marked with LZNT1 compression will automatically enable LZNT1 compression on any new folders and files created inside it. LZNT1's most significant drawback is the weakness of its compression algorithm. Of all of Windows' compression algorithms, LZNT1 offers the least compression. LZNT1's compression is even far inferior to the compression savings afforded by ancient 16-bit Windows and MS-DOS compression solutions from two decades ago, despite exponential advances in processing power since then. LZNT1 is additionally unable to compress files above approximately 32 GB in size. While LZNT1 is able to compress files for read-only access below 32 GB size, for in-place compression during writes to files, the upper limit is approximately 16 GB. The exact GB limit depends on the NTFS volume structure, with the number of file fragments and their allocation on disk potentially decreasing these limits. The most severe limitation with LZNT1 compression, introduced since the new security model in Windows Vista, is that neither Windows' command line compact.exe tool, nor Windows' GUI explorer.exe tool is able to compress a majority of the disk. This is due to new security settings applied to a majority of files present on disk preventing file compression, despite such compression being entirely safe. 
An additional LZNT1 limitation is that while the NTFS file system is capable of handling simultaneous reads and writes to LZNT1 compressed files, as with the remainder of the file system, Windows' built-in tools do not expose this behavior. Despite modern SSD storage being able to handle an ever increasing number of simultaneous read and write requests, LZNT1 remains single threaded. A further limitation in LZNT1 is that an effort to compress a file already compressed by the WIMBoot algorithm (described below) will result in a net loss of disk space. This is explained below in the WIMBoot section as the “space bleed” problem. One last notable limitation is that Windows' supplied command line and GUI tools cannot exclude from compression certain types of files which are unsafe to compress (such as SQL Server databases).
ii. The WIMBoot Algorithm and Limitations
WIMBoot was introduced with Windows 8.1 Update 1 and is also supported (although officially deprecated) in Windows 10. WIMBoot dramatically improves the compression strength compared to LZNT1, bringing it up to par with the compression that was afforded by ancient 16-bit Windows and MS-DOS tools. WIMBoot is also more similar to ancient era (pre-Windows XP) compression tools in that it creates a single WIM file, an image of the operating system, which is then “mounted” to provide access to all of the files and folders on the system. In other words, the compression is not on a per-file or per-folder basis, but on a per-image basis. All of the compressed files and folders are stored inside this WIM file. Unlike ancient era compression tools, and unlike LZNT1, a file that is updated inside a WIMBoot image is not automatically recompressed. Instead, it is extracted in uncompressed form back to disk, and then stored as a regular uncompressed file.
Similarly, a file that is deleted on disk does not recover any disk space at all, because the deleted file still takes up space in the WIM image file which is not recovered. This introduces us to the most severe of WIMBoot's limitations, its space bleed problem. A WIMBoot volume requires regular maintenance, or it will “bleed” space very frequently as files are updated and deleted on volume. Soon becoming a cure worse than the disease, WIMBoot's compression gains are rapidly offset by uncompressed copies of files on disk, and the additional space wasted in their old, compressed copies still present in the WIM file. In fact, this space bleed problem is so severe, that decades-old LZNT1 can even outperform WIMBoot space savings when properly applied to all files and folders on a system: During a test on a 16 GB eMMC Windows tablet running Windows 8.1.1, after installing updates and the free Office 365, the empty space on the tablet is only 2 GB. In contrast, fully uncompressing this original disk and fully recompressing it with LZNT1 compression produces 4 GB free space (when the LZNT1 compression has been applied to all files and folders present on the tablet, and not just a very limited fraction as is normally available with Windows). There are many additional significant WIMBoot limitations: WIMBoot cannot compress files larger than 4 GB, even if LZNT1 could compress them; however these files will still take up space in the WIM image. WIMBoot is not supported by Microsoft on non-SSD, 32-bit, and BIOS systems. WIMBoot cannot be used as part of a live Windows operating system. In fact, WIMBoot can only be used as part of a first-time Windows installation; substantially exacerbating its space bleed problem. An existing WIMBoot system cannot be uncompressed. An existing WIMBoot system cannot be recompressed. There is only one level of compression available with WIMBoot. 
A 3 GB Windows ADK download is necessary to obtain the Microsoft tools needed to prepare a WIMBoot image and apply it to a PC. Additionally, an external USB boot disk must be created using Windows PE. If the target PC has any custom disk access drivers such as RAID or SATA controllers, these must also be manually injected into both Windows PE and the target WIMBoot system. It is also not possible to exclude any folders from processing, to keep them out of the WIM image, for any reason. WIMBoot is not supported on Windows 7.
iii. The XPRESS-LZX Algorithms and Limitations
Windows 10 introduces several new algorithms, XPRESS4K, XPRESS8K, XPRESS16K, and LZX (in order of increasing compression), which compress files with space savings almost as great as those of WIMBoot. Similar to LZNT1, and unlike WIMBoot, these algorithms are applied on a per-file basis. Unlike LZNT1, files that are modified are immediately decompressed, similar to WIMBoot. However, this does not cause a very significant space bleed problem: although the benefit of compression is still lost, no stale copy of such a file is left behind inside a WIM image file or similar container. Unlike LZNT1, compression cannot be applied to folders. Compression is on a per-file basis only. Even if every file in a folder is Windows 10 compressed, any new files and folders created inside of it must be compressed manually. Similar to LZNT1, any compression through the compact.exe command line tool (no GUI tooling is provided by Microsoft) is single-threaded, which is unreasonably inefficient on SSD or other fast storage media. It is not possible to downgrade the compression grade applied to a file without explicitly decompressing that file. Similar to LZNT1, an effort to compress a WIMBoot compressed file will again result in the WIMBoot space bleed problem. Similar to LZNT1, file types which are not safe to compress cannot be excluded from processing.
Unlike LZNT1, because files that are modified during a disk access process are written back to disk in uncompressed form, disk space is lost. There is no method provided with the operating system to automatically recompress such files in the background, while the PC is idle, to regain disk space.
iv. The Data Deduplication Algorithm and Limitations
Windows Server 2012 introduces data deduplication for data disks. Data deduplication is applied on a per-disk basis. Data deduplication requires periodic cleanup runs, because data that is deleted, if included in a past compression operation, is not automatically recovered, similar to how WIMBoot works. If a data deduplicated disk has less than 1 GB free space, no data deduplication operation succeeds on disk. Data deduplication is only supported for data disks. It is not supported for operating system boot disks. A GUI interface is not available to handle data deduplication operations.
The present invention is a collection of methods for compressing information stored on a mass storage medium in a computer system comprising at least a CPU, RAM, and at least one non-transitory mass storage medium, and running Windows. There are four variants of the invention, denominated as LZS80, LZS85, LZS90, and LZS100. The present invention overcomes the limitations of the compression offered in Windows, and permits larger amounts of data to be stored on mass storage media in a manner that is faster, more convenient, and more efficient than is able to be achieved in the prior art.
II. LZSDC: The Present Invention
The compression techniques offered by LZSDC are LZS80, LZS85, LZS90, and LZS100.
i. The LZS80 Algorithm and how it Extends LZNT1
LZS80 runs on Windows XP through Windows Server 2016. LZS80 first reads the number of CPU cores present on a system and whether the underlying disk is an SSD (solid state disk).
It then uses up to double the number of CPU cores as its number of threads, performing as many parallel LZNT1 grade compression operations simultaneously as the underlying SSD can handle. LZS80 does not compress WIMBoot compressed files. Re-compressing a WIMBoot compressed file first extracts it from the WIM image without recovering the space it consumes inside the image, and then compresses it substantially worse than WIMBoot does, so that double the space is wasted on any given file. LZS80 excludes file patterns from compression, guaranteeing the integrity of, by default, Microsoft SQL Server database files, as well as files with any other operator customizable file extension. LZS80 safely processes files that Windows cannot by using the following approach:
LZS80 compresses files up to 32 GB in size. LZS80 compression is fully reversible, provided sufficient free disk space is available on the target disk to contain all uncompressed data. LZS80 may optionally skip compressing files marked without an archive attribute. LZS80 may optionally clear the archive attribute on files which do not create any free space when compressed. This process shortens the overall disk compression processing time by skipping files that are pre-compressed outside of LZS80. For similar performance acceleration reasons, LZS80 may optionally skip recompressing files when they are already in a compressed state. Certain files and folders must remain uncompressed so that the target operating system remains fully operational after compression. This covers the processing of critical boot time tasks which may not support LZS80 compression at startup time, the ability to boot into Windows' Recovery Environment successfully after compression, and the processing of non-critical tasks such as search indexing (the database files of which must remain uncompressed to ensure data integrity). To this end, LZS85 does not compress the following files and folders on BIOS based Windows installations:
ii. The LZS85 Algorithm and how it Extends XPRESS-LZX
LZS85 runs on Windows 7, as well as Windows 10 through Windows Server 2016. LZS85 implements compression equivalent in space savings to that of LZNT1, XPRESS4K, XPRESS8K, XPRESS16K, and LZX. Like LZS80, it uses multiple CPU cores during the compression process when it detects that the mass storage medium is an SSD. For LZNT1-like compression, the number of threads used is double the number of available CPU cores. For XPRESS-like compression, all CPU cores are used uniformly by each thread. For LZX-like compression, the XPRESS factor is further multiplied by two thirds to determine the optimal core usage count. LZS85 ensures that new files created in folders or new subfolders are automatically compressed using its lowest strength algorithm. LZS85 supports five compression grades and can move in any direction among those grades, and is thus capable of increasing and decreasing the compression grade of any particular file. LZS85 excludes file patterns from compression, guaranteeing the integrity of, by default, Microsoft SQL Server database files, as well as files with any other operator customizable file extension. LZS85 compression is fully reversible, provided sufficient free disk space is available on the target disk to contain all uncompressed data. LZS85 safely processes files that Windows cannot by using the following approach:
LZS85 does not compress WIMBoot compressed files. Re-compressing a WIMBoot compressed file first extracts it from the WIM image without recovering the space it consumes inside the image, and then compresses it substantially worse than WIMBoot does, so that double the space is wasted on any given file. LZS85 may optionally skip compressing files marked without an archive attribute. LZS85 may optionally clear the archive attribute on incompressible files. This process ensures that, while processing a disk, files that could not previously be compressed (due to lack of redundancy) are not attempted again, shortening the overall disk compression processing time. For similar performance acceleration reasons, LZS85 may optionally skip recompressing files when they are already in a compressed state, and their existing compression grade is identical to (not greater than or less than) the newly requested compression grade. LZS85 offers a background agent which detects modifications made to files by the system, applications, and users. While the lowest of the five compression grades supported by LZS85 automatically re-compresses files as they are modified on disk, the higher four compression grades write files back to disk in uncompressed form any time a file modification has been made. The LZS85 background agent detects when the PC is idle, and also detects which files in the four higher compression grades have been modified and have thus reverted to an uncompressed state. The LZS85 background agent then recompresses these files when the PC is idle, to minimize interruption to the operator's ongoing work on the PC.
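The idle-time behavior of the background agent described above can be sketched as a simple selection rule. This is a minimal illustration and not the LZS85 agent itself: the grade numbering (1 for the lowest grade, 2 through 5 for the higher four), the `TrackedFile` record, and the `pc_is_idle` flag are all hypothetical.

```python
from dataclasses import dataclass

LOWEST_GRADE = 1  # assumed numbering: this grade recompresses itself on write


@dataclass(frozen=True)
class TrackedFile:
    path: str
    grade: int        # compression grade requested for the file, 1..5
    compressed: bool  # False once a modification reverted it to uncompressed


def files_to_recompress(files, pc_is_idle):
    """Return the files the agent would recompress right now.

    Only files in the four higher grades can revert to an uncompressed
    state, and work is deferred entirely while the PC is in use.
    """
    if not pc_is_idle:
        return []
    return [f for f in files if f.grade > LOWEST_GRADE and not f.compressed]
```

Keeping the decision in a pure function like this separates the policy from the mechanisms for detecting idleness and file changes, which the sketch does not attempt to model.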
It is also possible, at the operator's choosing, to schedule automatic recompression to happen immediately upon file modifications (instant recompression upon decompression), or to schedule it to happen not when the PC is idle, but only when a pre-set number of days have elapsed since the files were last modified. Windows 7 support for LZS85 is enabled automatically after a Windows 7 operating system has been first compressed using LZS90 compression. Certain files and folders must remain uncompressed so that the target operating system remains fully operational after compression. This covers the processing of critical boot time tasks which may not support LZS85 compression at startup time, the ability to boot into Windows' Recovery Environment successfully after compression, and the processing of non-critical tasks such as search indexing (the database files of which must remain uncompressed to ensure data integrity). To this end, LZS85 does not compress the following files and folders on BIOS based Windows installations:
iii. The LZS90 Algorithm and how it Extends WIMBoot
LZS90 runs on a live Windows operating system. This includes, at the present time, Windows 7, Windows 8.1 with Update 1, Windows 10, Windows Server 2012 R2 with Update 1, and Windows Server 2016. LZS90 does not depend on a 3 GB Windows ADK download. It does not require the creation of a Windows PE bootable USB disk. Instead, LZS90 recycles the Windows RE instance already pre-installed on every Windows computer. This Windows RE instance has the following advantages over a manually created Windows PE boot disk:
LZS90 first creates a copy of the default Windows RE instance. Then, it injects its own program files and libraries into this Windows RE instance. In cases where the Windows RE instance does not already contain system drivers necessary to recognize WIMBoot mounted files (such as a Windows 7 installation, or a Windows 8.1 installation which was updated to Windows 8.1 with Update 1 by an operator installing Windows Updates, which causes the Windows RE environment to remain non-updated), LZS90 also injects the necessary drivers and registry keys for the Windows RE instance to recognize WIMBoot mounted files. The driver files that are injected are as follows:
The following files which are injected enable LZS90 to verify disk integrity before processing (but are not directly related to the WIMBoot driver files otherwise). These injections are skipped on Windows 10, because they are both unnecessary and cause corruption of the processing environment being created under Windows 10:
On Windows 7 computers, the following additional file is injected:
LZS90 then injects itself into the PC boot menu, taking full control over the host Windows operating system without requiring to be manually booted from an external thumb drive or other dual booted operating system. To guarantee the most successful boot menu injection possible, LZS90 executes the following pattern:
Once booted from this custom boot menu entry, LZS90 has the same flexibility that would be afforded by booting from a Windows PE instance created using a 3 GB ADK download, without requiring either the ADK download or an external bootable USB drive, or any operator interaction to enable external USB booting (such as the manual configuration of PC BIOS/UEFI boot settings). From the custom modified Windows RE instance created as described above, LZS90 can process the entire Windows operating system disk offline; having full and unrestricted access to all critical and non-critical system files as well as all other application files and data. During compression, LZS90 starts building a WIM archive, at the default Windows compression ratio, or using four higher levels of compression that are not provided by Microsoft in the standard Windows installation, while still remaining compatible with the Microsoft WIMBoot driver. If an external disk storage referred to as the “Undo Disk” is used, the WIM archive is created on this “Undo Disk”, while leaving original applications and data intact on the target disk until the WIM archive has been successfully created. This helps the operation to have no fatal results for the target PC if it is interrupted due to any reason such as power loss. Then, after WIM archive creation, the target disk is cleaned up (with the exception of any excluded folders), and the WIM archive copied back to the target disk from the “Undo Disk”. Even if power loss or another fatal error occurs at this point, the “Undo Disk” contains a full copy of the PC data inside the already-built WIM archive. After the WIM archive has been successfully transferred to the target PC from the “Undo Disk”, the WIM archive is then applied to the target system in WIMBoot mode, completing processing. 
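The ordering of the “Undo Disk” steps above matters because it keeps at least one complete copy of the data available at every point. The following sketch checks that property. It does not touch any disk: each phase is modeled with two hypothetical boolean flags describing the state during that phase, and the phase names paraphrase the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    target_data_intact: bool  # original files still usable on the target disk
    undo_wim_complete: bool   # a finished WIM archive exists on the Undo Disk


# Flags give the state *during* each phase, i.e. if it were interrupted.
UNDO_DISK_PHASES = [
    Phase("build WIM archive on Undo Disk", True, False),
    Phase("clean up target disk (excluded folders kept)", False, True),
    Phase("copy WIM archive back to target disk", False, True),
    Phase("apply WIM archive in WIMBoot mode", False, True),
]


def unsafe_phases(phases):
    """Return names of phases where an interruption leaves no full copy."""
    return [
        p.name
        for p in phases
        if not (p.target_data_intact or p.undo_wim_complete)
    ]
```

An empty result for `unsafe_phases(UNDO_DISK_PHASES)` corresponds to the claim in the text that power loss or another fatal error at any point still leaves a full copy of the data on one of the two disks.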
If external disk storage (an “Undo Disk”) is unavailable, then LZS90 starts building the WIM archive on the target disk directly, deleting each file that is added to the WIM archive from disk as soon as the compression is complete. Since LZS90 does not compress files which would take larger space than their uncompressed counterparts, this process ensures that even a full disk can be successfully processed without requiring any external storage as each compressed file effectively frees up disk space due to compression savings. The drawback to this mode of processing is that if power loss or other interruption occurs, the Windows operating system may be rendered in an unbootable state, and further data loss may occur based on the stage of the construction of the WIM archive. Where a disk has been compressed before and external storage (an “Undo Disk”) is unavailable, the same method can be used successfully to recompress a disk. However, in this scenario, files that are deleted from the host disk, or have been updated on the host disk (but still remain in the WIM archive per the “space bleed” problem in both of the cases of deletions and updates), are explicitly removed from the WIM archive being processed, before any new data is added. This explicit removal process is necessary to recover space in the WIM archive from these unneeded or stale copies of data. Where an “Undo Disk” is being used, this step is superfluous and unnecessary, as a brand-new WIM archive is built. Where a previously compressed disk is to be decompressed, an “Undo Disk” is mandatory, as is sufficient free disk space on the target disk to contain all data uncompressed. This may require the operator to free up sufficient space before decompression to contain all data uncompressed. The process is repeated as with the other “Undo Disk” scenarios above; but the rebuilt WIM file is not copied to the target disk. 
Instead, it is extracted fully onto the target disk as in extracting any other regular archive; completing the decompression process. If the operator does not desire to decompress the latest copy of the disk, a previously created “Undo Disk” may be used, skipping the creation of a new WIM archive; thus accelerating processing times by preventing the recompression of the latest on-disk data. It is also possible to enact a “System Refresh”, effectively taking the target disk back to the exact state it was in when it was last compressed. This process skips an “Undo Disk” completely and uses the already-existing WIM archive on the target system: It first deletes all files on the target disk, and then simply re-applies the WIM archive to the target disk; having effectively refreshed the target disk to the exact state it was in when the WIM archive was last created. Additional processing modes are also afforded for “Backup” and “Restore” operations. These operations are very similar to the above operations; where data is first copied onto a mandatory “Undo Disk” as part of a backup on a donor PC, and then restored from that “Undo Disk” on a recipient PC; having effectively enabled the cloning of the hard disk contents of the donor PC to the recipient PC. During the clone operation on the recipient PC, it is possible to apply data either in compressed (WIM archive with WIMBoot) or uncompressed (extracted WIM archive) modes. After the desired processing is complete, the temporary boot menu entry created by LZS90 is removed, the PC restarted, and returned to normal operation with substantially increased free disk space. LZS90 falls back to LZS85 when compressing files larger than 4 GB. This process ensures the entire disk is compressed successfully, regardless of the individual file sizes of files contained on disk. 
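The fallback rule above amounts to routing each file by size. A minimal sketch follows, assuming that the 4 GB limit means 4 × 1024³ bytes and that “larger than” is a strict comparison. The function names and string labels are illustrative only and do not come from LZSDC.

```python
WIM_FILE_LIMIT_BYTES = 4 * 1024**3  # assumed meaning of "4 GB"


def choose_provider(size_bytes):
    """Pick the compression provider for one file based on its size."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "LZS85" if size_bytes > WIM_FILE_LIMIT_BYTES else "LZS90"


def split_by_provider(sizes_by_path):
    """Group file paths by the provider that would compress them."""
    plan = {"LZS90": [], "LZS85": []}
    for path, size in sizes_by_path.items():
        plan[choose_provider(size)].append(path)
    return plan
```

Because every file lands in exactly one of the two groups, no file is left uncompressed merely on account of its size.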
Of the five grades of compression provided by LZS90, the first is analogous to the standard WIMBoot compression ratio, and the remaining four grades perform substantially better. Additional operator fine tuning of the compression ratio is available on a 100 point scale. LZS90 automatically detects OEM-preinstalled WIMBoot partitions, and deletes them after safely relocating the data; liberating a substantial portion of storage back into the Windows partition directly (guaranteed to be at least 5 GB). In order to safely process this deletion, LZS90 undertakes the following actions:
LZS90 compression is reversible at any time. LZS90 supports BIOS, UEFI, 32 bit, 64 bit, and both solid state and rotating platter based disks. LZS90 may be used on any boot partition. LZS90 may exclude any number of folders from processing. LZS90 excludes the following well-known frequently updated folders from processing by default:
LZS90 instead relegates the compression of these folders to the LZS80 compression provider because these locations are unsafe to compress (either preventing normal Windows booting and/or recovery Windows booting operations, due to compression being unsupported at the preliminary startup/Windows recovery stages), or counter-productive to compress (containing frequently updated files, which would only aggravate the WIMBoot space problem should they be processed by the LZS90 algorithm directly). On Windows 7 operating systems which do not have native WIMBoot support, LZS90 includes an additional list of file and folder exclusions. When these operating systems are compressed without these exclusions, they will be rendered inoperable and fail to boot:
iv. The LZS100 Algorithm and how it Extends Data Deduplication
LZS100 runs on Windows Server 2012, Windows Server 2012 R2, and Windows Server 2016. LZS100 offers a stand-alone GUI interface to visually manage data deduplication operations. LZS100 is able to monitor a data deduplicated disk in the background for remaining free disk space. Any time remaining free disk space approaches 1 GB, LZS100 automatically starts a disk cleanup operation and/or a disk recompression operation to ensure free disk space remains above 1 GB. The threshold for disk recompression and/or disk cleanup is user customizable. LZS100 is able to compress boot disks using the following approach:
III. Disk Capacity and Available Space Projection
Each of the LZSXX algorithms above supports disk capacity projection. The code of the disk capacity projection method is listed herein:
These drivers are exposed in a Windows Service which initiates increased total and available disk capacity projection when started and stops it when the service is stopped. For example, a compression ratio of 2.4:1 is possible with a clean installation of Windows 10 64-bit. With increased disk capacity projection at that ratio, a 60 GB SSD is presented as a 144 GB SSD (60 GB × 2.4). The ratio presented is operator customizable to any value from 1.0:1 (the default uncompressed state of disk) to double the actual compression ratio present on disk. This facilitates the storage of extra data on disk which would otherwise be blocked from storage despite availability of sufficient compressed space.
IV. Disk Acceleration
Judicious application of the proper compression algorithms results in net disk read performance gains, even on very fast solid state (flash memory) based disks, as well as traditional mechanical (spinning platter) hard disk drives. The listing below describes the application of compression algorithms that accelerates disk read speeds, by type of the target disk being compressed:
Additionally, the automatic folder recompression option in the LZS compression algorithms, as well as the LZS90 background compression agent, may be disabled to ensure that disk write speeds do not suffer a performance penalty. With these measures applied, disk read performance acceleration is observed with the operating system and applications. LZS80, layered use of LZS80 with LZS85, and LZS100 are not applicable for disk read speed acceleration on any medium as they effectively degrade this read performance. These algorithms are therefore intended for use in disk capacity increase scenarios only.
While the invention has been described in its preferred embodiments, it is to be understood that the words which have been used are words of description rather than of limitation and that changes may be made within the purview of the appended claims without departing from the true scope and spirit of the invention in its broader aspects. Rather, various modifications may be made in the details within the scope and range of equivalents of the claims and without departing from the spirit of the invention. The inventor further requires that the scope accorded the claims be in accordance with the broadest possible construction available under the law as it exists on the date of filing hereof (and of the application from which this application obtains priority, if any) and that no narrowing of the scope of the appended claims be allowed due to subsequent changes in the law, as such a narrowing would constitute an ex post facto adjudication, and a taking without due process or just compensation.
BACKGROUND OF THE INVENTION
SUMMARY OF THE INVENTION
DETAILED DESCRIPTION OF THE INVENTION