How to Find and Remove Duplicate Files on Linux

Channel: Linux
Abstract: How to find and remove duplicate files on Linux, using FSlint (graphical) and fdupes (command line), to free up disk space.

Hi all, today we're going to learn how to find and remove duplicate files on your Linux PC or server. Below are a few tools; you can use any one of them according to your needs and comfort.

Whether you’re using Linux on your desktop or a server, there are good tools that will scan your system for duplicate files and help you remove them to free up space. Solid graphical and command-line interfaces are both available. Duplicate files are an unnecessary waste of disk space. After all, if you really need the same file in two different locations you could always set up a symbolic link or hard link, storing the data in only one location on disk.
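The link-based approach mentioned above can be illustrated with a minimal coreutils sketch (the /tmp/linkdemo paths and file names are just for demonstration):

```shell
# Create a file and a duplicate copy of it.
mkdir -p /tmp/linkdemo && cd /tmp/linkdemo
echo "same content" > original.txt
cp original.txt copy.txt

# Replace the copy with a hard link: both names now point at the
# same inode, so the data is stored only once on disk.
ln -f original.txt copy.txt

# Both names report the same inode, and the link count is 2.
stat -c '%i %h' original.txt copy.txt
```

A hard link works only within one filesystem; across filesystems you would use a symbolic link (`ln -s`) instead.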

1) FSlint

FSlint is available in the binary repositories of various Linux distributions, including Ubuntu, Debian, Fedora, and Red Hat. Just fire up your package manager and install the "fslint" package. This utility provides a convenient graphical interface by default, and it also includes command-line versions of its various functions.


Don’t let that scare you away from using FSlint’s convenient graphical interface, though. By default, it opens with the Duplicates pane selected and your home directory as the default search path.

Installation

To install fslint on Ubuntu, which I am running, the default command is:

# apt-get install fslint

Here are the installation commands for other Linux distributions:

Debian:

# svn checkout http://fslint.googlecode.com/svn/trunk/ fslint-2.45
# cd fslint-2.45
# dpkg-buildpackage -I.svn -rfakeroot -tc
# dpkg -i ../fslint_2.45-1_all.deb

Fedora:

# yum install fslint

For openSUSE:

# [ -f /etc/mandrake-release ] && pkg=rpm
# [ -f /etc/SuSE-release ] && pkg=packages
# wget http://www.pixelbeat.org/fslint/fslint-2.42.tar.gz
# rpmbuild -ta fslint-2.42.tar.gz
# rpm -Uvh /usr/src/$pkg/RPMS/noarch/fslint-2.42-1.*.noarch.rpm

For other distributions:

# wget http://www.pixelbeat.org/fslint/fslint-2.44.tar.gz
# tar -xzf fslint-2.44.tar.gz
# cd fslint-2.44
# (cd po && make)
# ./fslint-gui

Run fslint

To run the FSlint GUI on Ubuntu, launch fslint-gui from the run dialog (Alt+F2) or from a terminal:

$ fslint-gui

All you have to do is click the Find button and FSlint will list the duplicate files in directories under your home folder.

Use the buttons to delete any files you want to remove, and double-click them to preview them.

And that's it. Hurray, we have successfully removed the duplicate files from the system.

Note that the command-line utilities aren’t in your path by default, so you can’t run them like typical commands. On Ubuntu, you’ll find them under /usr/share/fslint/fslint. So, if you wanted to run the entire fslint scan on a single directory, here are the commands you’d run on Ubuntu:

cd /usr/share/fslint/fslint
./fslint /path/to/directory

This command won’t actually delete anything. It will just print a list of duplicate files — you’re on your own for the rest.

$ /usr/share/fslint/fslint/findup --help
 find dUPlicate files.
 Usage: findup [[[-t [-m|-d]] | [--summary]] [-r] [-f] paths(s) ...]

 If no path(s) specified then the current directory is assumed.
  
 When -m is specified any found duplicates will be merged (using hardlinks).
 When -d is specified any found duplicates will be deleted (leaving just 1).
 When -t is specified, only report what -m or -d would do.
 When --summary is specified change output format to include file sizes.
 You can also pipe this summary format to /usr/share/fslint/fslint/fstool/dupwaste
 to get a total of the wastage due to duplicates.
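If the fslint helpers are not installed, the core idea behind findup (group files by content checksum and report the groups with more than one member) can be sketched with plain GNU coreutils. The /tmp/dupdemo tree below is just a demonstration fixture:

```shell
# Build a small test tree: two identical files and one unique file.
mkdir -p /tmp/dupdemo
echo "duplicate data" > /tmp/dupdemo/a.txt
echo "duplicate data" > /tmp/dupdemo/b.txt
echo "unique data"    > /tmp/dupdemo/c.txt

# Checksum every file, sort so identical checksums are adjacent,
# then print only the lines whose first 32 characters (the MD5
# hash) occur more than once, one blank-line-separated group per
# set of duplicates.
find /tmp/dupdemo -type f -exec md5sum {} + \
  | sort \
  | uniq -w32 --all-repeated=separate
```

Only a.txt and b.txt appear in the output; c.txt has a unique checksum and is filtered out.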

2) Fdupes

Fdupes is a program written by Adrian Lopez for identifying or deleting duplicate files residing within specified directories. You can find it on GitHub.

Install fdupes

To install fdupes, do as below:

On CentOS 7:

# yum install fdupes
Loaded plugins: fastestmirror
base                                                                                                                           | 3.6 kB  00:00:00     
epel/x86_64/metalink                                                                                                           |  12 kB  00:00:00     
epel                                                                                                                           | 4.3 kB  00:00:00     
extras                                                                                                                         | 3.4 kB  00:00:00     
updates                                                                                                                        | 3.4 kB  00:00:00     
(1/2): epel/x86_64/updateinfo                                                                                                  | 817 kB  00:00:00     
(2/2): epel/x86_64/primary_db                                                                                                  | 4.8 MB  00:00:00     
Loading mirror speeds from cached hostfile
 * base: mirrors.linode.com
 * epel: fedora-epel.mirrors.tds.net
 * extras: mirrors.linode.com
 * updates: mirrors.linode.com
Resolving Dependencies
--> Running transaction check
---> Package fdupes.x86_64 1:1.6.1-1.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

======================================================================================================================================================
 Package                           Arch                              Version                                    Repository                       Size
======================================================================================================================================================
Installing:
 fdupes                            x86_64                            1:1.6.1-1.el7                              epel                             28 k

On Ubuntu 16.04:

# apt install fdupes
Reading package lists... Done
Building dependency tree 
Reading state information... Done
The following packages were automatically installed and are no longer required:
 libdvdnav4 libdvdread4 libenca0 libguess1 librubberband2v5 libsdl2-2.0-0 libsndio6.1 libva-wayland1 libva-x11-1 mpv rtmpdump
Use 'sudo apt autoremove' to remove them.

Search for duplicate files

The fdupes command searches for duplicates in the indicated folder. The syntax is as below:

   fdupes [ options ] DIRECTORY

Let us create some duplicate files. We will create a folder and 10 files with the same content:

# mkdir labor && for i in {1..10}; do echo "Hello, let us try fdupes command" > labor/drago${i} ; done
# mkdir labor/package && for i in {1..10}; do echo "Hello, let us try fdupes recursively" > labor/package/pack${i} ; done

Let's check the result:

# ls -lR labor/
labor/:
total 44
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago10
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago1
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago2
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago3
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago4
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago5
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago6
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago7
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago8
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago9
drwxr-xr-x 2 root root 4096 Sep  9 23:51 package

labor/package:
total 40
-rw-r--r-- 1 root root 37 Sep  9 23:51 pack10
-rw-r--r-- 1 root root 37 Sep  9 23:51 pack1
-rw-r--r-- 1 root root 37 Sep  9 23:51 pack2
-rw-r--r-- 1 root root 37 Sep  9 23:51 pack3
-rw-r--r-- 1 root root 37 Sep  9 23:51 pack4
-rw-r--r-- 1 root root 37 Sep  9 23:51 pack5
-rw-r--r-- 1 root root 37 Sep  9 23:51 pack6
-rw-r--r-- 1 root root 37 Sep  9 23:51 pack7
-rw-r--r-- 1 root root 37 Sep  9 23:51 pack8
-rw-r--r-- 1 root root 37 Sep  9 23:51 pack9

We see that all our files exist. Now we can search for duplicate files as below:

# fdupes labor/
labor/drago8
labor/drago2
labor/drago7
labor/drago5
labor/drago1
labor/drago3
labor/drago4
labor/drago10
labor/drago6
labor/drago9

You can see all 10 duplicate files listed above.

Search for duplicate files recursively and display their size

You may have noticed that the result above doesn't show the duplicate files created earlier in the labor/package directory. To search for duplicate files in a directory and its sub-directories, use the -r option; you can also display the size of each duplicate file with the -S parameter, as below:

# fdupes -rS labor
33 bytes each:                          
labor/drago8
labor/drago2
labor/drago7
labor/drago5
labor/drago1
labor/drago3
labor/drago4
labor/drago10
labor/drago6
labor/drago9

37 bytes each:
labor/package/pack10
labor/package/pack6
labor/package/pack4
labor/package/pack7
labor/package/pack1
labor/package/pack3
labor/package/pack5
labor/package/pack2
labor/package/pack8
labor/package/pack9

The result shows that the duplicate files in each set have the same size. Note that equal size is only a first filter: fdupes confirms duplicates by comparing file signatures and then the contents byte-by-byte.
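Equal size alone does not prove equal content, which is why a content comparison matters. A quick illustration with cmp (the /tmp file names are hypothetical):

```shell
# Two files with the same size but different content.
printf 'aaaa' > /tmp/size1.txt
printf 'bbbb' > /tmp/size2.txt

# The sizes match...
stat -c %s /tmp/size1.txt /tmp/size2.txt

# ...but cmp exits non-zero because the bytes differ.
cmp -s /tmp/size1.txt /tmp/size2.txt || echo "same size, different content"
```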

It is also possible to omit the first file of each set of matches by using the -f option.

Delete the duplicated files

To delete duplicate files, we use the -d parameter. fdupes will ask which files to preserve (add -N to keep the first file of each set without prompting):

# fdupes -rd labor/
[1] labor/drago8
[2] labor/drago2
[3] labor/drago7
[4] labor/drago5
[5] labor/drago1
[6] labor/drago3
[7] labor/drago4
[8] labor/drago10
[9] labor/drago6
[10] labor/drago9

Set 1 of 2, preserve files [1 - 10, all]: 1

   [+] labor/drago8
   [-] labor/drago2
   [-] labor/drago7
   [-] labor/drago5
   [-] labor/drago1
   [-] labor/drago3
   [-] labor/drago4
   [-] labor/drago10
   [-] labor/drago6
   [-] labor/drago9

[1] labor/package/pack10
[2] labor/package/pack6
[3] labor/package/pack4
[4] labor/package/pack7
[5] labor/package/pack1
[6] labor/package/pack3
[7] labor/package/pack5
[8] labor/package/pack2
[9] labor/package/pack8
[10] labor/package/pack9

Set 2 of 2, preserve files [1 - 10, all]: 8

   [-] labor/package/pack10
   [-] labor/package/pack6
   [-] labor/package/pack4
   [-] labor/package/pack7
   [-] labor/package/pack1
   [-] labor/package/pack3
   [-] labor/package/pack5
   [+] labor/package/pack2
   [-] labor/package/pack8
   [-] labor/package/pack9

We can check the result as below.

# ls -lR labor/
labor/:
total 8
-rw-r--r-- 1 root root   33 Sep  9 23:51 drago8
drwxr-xr-x 2 root root 4096 Sep 10 00:07 package

labor/package:
total 4
-rw-r--r-- 1 root root 37 Sep  9 23:51 pack2

You can see that we have preserved the drago8 and pack2 files.
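For scripted cleanups on machines without fdupes, the same keep-one-delete-the-rest behaviour can be approximated with coreutils, grouping files by checksum and deleting every file after the first in each group. This is a sketch on a hypothetical /tmp/cleandemo tree; test such a pipeline carefully before pointing it at real data, and note that it assumes paths without spaces:

```shell
# Build a demo directory with three identical files.
mkdir -p /tmp/cleandemo
for i in 1 2 3; do echo "payload" > /tmp/cleandemo/file${i}; done

# Checksum every file; for each checksum group, the awk filter
# prints the paths of all files after the first one seen, and
# xargs deletes them, keeping one copy per group.
find /tmp/cleandemo -type f -exec md5sum {} + \
  | sort \
  | awk 'seen[$1]++ { print $2 }' \
  | xargs -r rm --

ls /tmp/cleandemo
```

After the run only one file remains; because the listing is sorted, it is the first path in the group.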

Conclusion

We have seen how to delete duplicate files on Linux, both graphically and from the command line. You can use any one of these tools depending on your needs. Checking for duplicate files regularly is a simple way to reclaim space on your server.

Ref From: linoxide