How to Find and Remove Duplicate Files on Linux
Hi all, today we're going to learn how to find and remove duplicate files on your Linux PC or server. Several tools are available; use whichever suits your needs and comfort.
Whether you’re using Linux on your desktop or a server, there are good tools that will scan your system for duplicate files and help you remove them to free up space. Solid graphical and command-line interfaces are both available. Duplicate files are an unnecessary waste of disk space. After all, if you really need the same file in two different locations you could always set up a symbolic link or hard link, storing the data in only one location on disk.
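As a quick illustration of the hard-link approach mentioned above, the snippet below (the directory path is hypothetical, chosen for the example) stores one copy of the data under two names:

```shell
# Create a file and a hard link to it; both names point to one inode,
# so the data is stored on disk only once.
rm -rf /tmp/dedup-demo && mkdir -p /tmp/dedup-demo
echo "shared content" > /tmp/dedup-demo/original.txt
ln /tmp/dedup-demo/original.txt /tmp/dedup-demo/copy.txt

# Both names report the same inode number and a link count of 2.
ls -li /tmp/dedup-demo/
```

Deleting either name leaves the data intact; the blocks are freed only when the last link is removed.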
1) FSlint
FSlint is available in the binary repositories of various Linux distributions, including Ubuntu, Debian, Fedora, and Red Hat. Just fire up your package manager and install the "fslint" package. This utility provides a convenient graphical interface by default, and it also includes command-line versions of its various functions.
Don’t let the command-line versions scare you away from using FSlint’s convenient graphical interface, though.
Installation
To install fslint on Ubuntu, which I am running, the default command is:
# apt-get install fslint
Here are the installation commands for other Linux distributions:
Debian:
# svn checkout http://fslint.googlecode.com/svn/trunk/ fslint-2.45
# cd fslint-2.45
# dpkg-buildpackage -I.svn -rfakeroot -tc
# dpkg -i ../fslint_2.45-1_all.deb
Fedora:
# yum install fslint
For openSUSE:
# [ -f /etc/mandrake-release ] && pkg=rpm
# [ -f /etc/SuSE-release ] && pkg=packages
# wget http://www.pixelbeat.org/fslint/fslint-2.42.tar.gz
# rpmbuild -ta fslint-2.42.tar.gz
# rpm -Uvh /usr/src/$pkg/RPMS/noarch/fslint-2.42-1.*.noarch.rpm
For other distros:
# wget http://www.pixelbeat.org/fslint/fslint-2.44.tar.gz
# tar -xzf fslint-2.44.tar.gz
# cd fslint-2.44
# (cd po && make)
# ./fslint-gui
Run fslint
To run the fslint GUI on Ubuntu, launch fslint-gui from the run dialog (Alt+F2) or from a terminal:
$ fslint-gui
By default, it opens with the Duplicates pane selected and your home directory as the default search path. All you have to do is click the Find button, and FSlint will display a list of duplicate files in directories under your home folder.
Use the buttons to delete any files you want to remove, and double-click them to preview them.
That's it: the duplicate files have been removed from your system.
Note that the command-line utilities aren’t in your path by default, so you can’t run them like typical commands. On Ubuntu, you’ll find them under /usr/share/fslint/fslint. So, if you wanted to run the entire fslint scan on a single directory, here are the commands you’d run on Ubuntu:
cd /usr/share/fslint/fslint
./fslint /path/to/directory
This command won’t actually delete anything. It will just print a list of duplicate files — you’re on your own for the rest.
$ /usr/share/fslint/fslint/findup --help
find dUPlicate files.
Usage: findup [[[-t [-m|-d]] | [--summary]] [-r] [-f] paths(s) ...]
If no path(s) specified then the current directory is assumed.
When -m is specified any found duplicates will be merged (using hardlinks).
When -d is specified any found duplicates will be deleted (leaving just 1).
When -t is specified, only report what -m or -d would do.
When --summary is specified change output format to include file sizes.
You can also pipe this summary format to /usr/share/fslint/fslint/fstool/dupwaste
to get a total of the wastage due to duplicates.
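If fslint is not installed, the core of what findup does, grouping files with identical content, can be approximated with standard GNU tools. This is only a simplified sketch (the demo directory and file names are made up for the example); real deduplicators also compare sizes first and verify matches byte by byte:

```shell
# Set up a small demo directory: two identical files, one unique file.
DIR=/tmp/findup-demo
rm -rf "$DIR" && mkdir -p "$DIR"
echo "same data" > "$DIR/a.txt"
echo "same data" > "$DIR/b.txt"
echo "unique data" > "$DIR/c.txt"

# Checksum every file, sort so identical checksums are adjacent, then
# keep only lines whose first 32 characters (the MD5 digest) repeat.
find "$DIR" -type f -exec md5sum {} + \
    | sort \
    | uniq -w32 --all-repeated=separate
```

Only a.txt and b.txt appear in the output, grouped into one duplicate set; c.txt is filtered out.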
2) Fdupes
FDUPES is a program for identifying or deleting duplicate files residing within specified directories, written by Adrian Lopez. You can check out the project on GitHub.
Install fdupes
To install fdupes, do as below:
On CentOS 7:
# yum install fdupes
Loaded plugins: fastestmirror
base | 3.6 kB 00:00:00
epel/x86_64/metalink | 12 kB 00:00:00
epel | 4.3 kB 00:00:00
extras | 3.4 kB 00:00:00
updates | 3.4 kB 00:00:00
(1/2): epel/x86_64/updateinfo | 817 kB 00:00:00
(2/2): epel/x86_64/primary_db | 4.8 MB 00:00:00
Loading mirror speeds from cached hostfile
* base: mirrors.linode.com
* epel: fedora-epel.mirrors.tds.net
* extras: mirrors.linode.com
* updates: mirrors.linode.com
Resolving Dependencies
--> Running transaction check
---> Package fdupes.x86_64 1:1.6.1-1.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
======================================================================================================================================================
Package Arch Version Repository Size
======================================================================================================================================================
Installing:
fdupes x86_64 1:1.6.1-1.el7 epel 28 k
On Ubuntu 16.04:
# apt install fdupes
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
libdvdnav4 libdvdread4 libenca0 libguess1 librubberband2v5 libsdl2-2.0-0 libsndio6.1 libva-wayland1 libva-x11-1 mpv rtmpdump
Use 'sudo apt autoremove' to remove them.
Search for duplicate files
The fdupes command searches for duplicates in a given folder. The syntax is as below:
fdupes [ options ] DIRECTORY
Let us create some duplicate files: a folder containing 10 files with the same content, plus a sub-directory with 10 more.
# mkdir labor && for i in {1..10}; do echo "Hello, let us try fdupes command" > labor/drago${i} ; done
# mkdir labor/package && for i in {1..10}; do echo "Hello, let us try fdupes recursively" > labor/package/pack${i} ; done
Let's check the result:
# ls -lR labor/
labor/:
total 44
-rw-r--r-- 1 root root 33 Sep 9 23:51 drago10
-rw-r--r-- 1 root root 33 Sep 9 23:51 drago1
-rw-r--r-- 1 root root 33 Sep 9 23:51 drago2
-rw-r--r-- 1 root root 33 Sep 9 23:51 drago3
-rw-r--r-- 1 root root 33 Sep 9 23:51 drago4
-rw-r--r-- 1 root root 33 Sep 9 23:51 drago5
-rw-r--r-- 1 root root 33 Sep 9 23:51 drago6
-rw-r--r-- 1 root root 33 Sep 9 23:51 drago7
-rw-r--r-- 1 root root 33 Sep 9 23:51 drago8
-rw-r--r-- 1 root root 33 Sep 9 23:51 drago9
drwxr-xr-x 2 root root 4096 Sep 9 23:51 package
labor/package:
total 40
-rw-r--r-- 1 root root 37 Sep 9 23:51 pack10
-rw-r--r-- 1 root root 37 Sep 9 23:51 pack1
-rw-r--r-- 1 root root 37 Sep 9 23:51 pack2
-rw-r--r-- 1 root root 37 Sep 9 23:51 pack3
-rw-r--r-- 1 root root 37 Sep 9 23:51 pack4
-rw-r--r-- 1 root root 37 Sep 9 23:51 pack5
-rw-r--r-- 1 root root 37 Sep 9 23:51 pack6
-rw-r--r-- 1 root root 37 Sep 9 23:51 pack7
-rw-r--r-- 1 root root 37 Sep 9 23:51 pack8
-rw-r--r-- 1 root root 37 Sep 9 23:51 pack9
We see that all our files exist. Now we can search for duplicate files as below:
# fdupes labor/
labor/drago8
labor/drago2
labor/drago7
labor/drago5
labor/drago1
labor/drago3
labor/drago4
labor/drago10
labor/drago6
labor/drago9
You can see that all 10 duplicate files are listed above.
Search for duplicate files recursively and display the size
You may have noticed that the result above doesn't show the duplicate files created earlier in the labor/package directory. To search a directory and its sub-directories for duplicates, use the -r option; you can also display the size of each set of duplicates with the -S parameter, as below:
# fdupes -rS labor
33 bytes each:
labor/drago8
labor/drago2
labor/drago7
labor/drago5
labor/drago1
labor/drago3
labor/drago4
labor/drago10
labor/drago6
labor/drago9
37 bytes each:
labor/package/pack10
labor/package/pack6
labor/package/pack4
labor/package/pack7
labor/package/pack1
labor/package/pack3
labor/package/pack5
labor/package/pack2
labor/package/pack8
labor/package/pack9
The result shows that the files in each set have the same size; fdupes confirms that they also have the same content by comparing checksums and, finally, the bytes themselves. It is also possible to omit the first file of each set from the output with the -f option, which is handy when piping the list to a delete command.
Delete the duplicated files
To delete duplicated files, we use the -d parameter. fdupes will ask which files to preserve in each set:
# fdupes -rd labor/
[1] labor/drago8
[2] labor/drago2
[3] labor/drago7
[4] labor/drago5
[5] labor/drago1
[6] labor/drago3
[7] labor/drago4
[8] labor/drago10
[9] labor/drago6
[10] labor/drago9
Set 1 of 2, preserve files [1 - 10, all]: 1
[+] labor/drago8
[-] labor/drago2
[-] labor/drago7
[-] labor/drago5
[-] labor/drago1
[-] labor/drago3
[-] labor/drago4
[-] labor/drago10
[-] labor/drago6
[-] labor/drago9
[1] labor/package/pack10
[2] labor/package/pack6
[3] labor/package/pack4
[4] labor/package/pack7
[5] labor/package/pack1
[6] labor/package/pack3
[7] labor/package/pack5
[8] labor/package/pack2
[9] labor/package/pack8
[10] labor/package/pack9
Set 2 of 2, preserve files [1 - 10, all]: 8
[-] labor/package/pack10
[-] labor/package/pack6
[-] labor/package/pack4
[-] labor/package/pack7
[-] labor/package/pack1
[-] labor/package/pack3
[-] labor/package/pack5
[+] labor/package/pack2
[-] labor/package/pack8
[-] labor/package/pack9
We can check the result as below:
# ls -lR labor/
labor/:
total 8
-rw-r--r-- 1 root root 33 Sep 9 23:51 drago8
drwxr-xr-x 2 root root 4096 Sep 10 00:07 package
labor/package:
total 4
-rw-r--r-- 1 root root 37 Sep 9 23:51 pack2
You can see that only the preserved files, drago8 and pack2, remain.
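fdupes can also delete non-interactively: combining -d with -N preserves the first file of each set and removes the rest without prompting. If fdupes itself is unavailable, that keep-one-delete-the-rest behavior can be sketched with standard tools. The snippet below is only an approximation (demo paths are made up, and it assumes file names without newlines):

```shell
# Scripted stand-in for 'fdupes -rdN': keep the first file of each
# duplicate set (by checksum) and delete the rest.
DIR=/tmp/dedup-script-demo
rm -rf "$DIR" && mkdir -p "$DIR"
for i in 1 2 3; do echo "duplicate payload" > "$DIR/file$i"; done
echo "keep me" > "$DIR/unique"

# md5sum prints a 32-char digest, two spaces, then the file name, so
# the name starts at column 35. The awk condition is false the first
# time a digest is seen and true on every repeat.
find "$DIR" -type f -exec md5sum {} + | sort | awk '
    seen[$1]++ { print substr($0, 35) }
' | while IFS= read -r f; do rm -v -- "$f"; done

ls "$DIR"
```

After the run, only file1 and unique remain, mirroring what fdupes -rdN would leave behind.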
Conclusion
We have seen how to delete duplicated files on Linux, both graphically and from the command line. You can use any one of these tools depending on your needs. Checking for duplicate files regularly is an easy way to reclaim space on your server.