Skip to content

akhileshzmishra/Excel-comparion-tool

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

60 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Excel-comparion-tool

We have created excel comparison tool here. This tool compares two tables of different rows and cols and segregates the matched rows from unmatched rows. There are three different views: Difference only view, Same only view and all view

You can also edit the excel sheets and save them as if you are actually working on them.

How to build

There are two projects inside: IMergeProject: Actual project SetupIMerge: Creation of setup files in case you want to redistribute the files.

You should have Visual Studios 2005 Standard edition with MFC installed in your machine. Download the code and find IMergeProject.sln inside IMergeProject folder. Open it and build it. Start without debugging. Please see that your project configuration is in Release mode.

Performance

Let m and n be the number of rows and cols respectively and l be the length of string

Sn Algorithms Used Complexity
1 LCS for comparing rows O(mnn)
2 Boyer Moore for string search. O(l) [ O(l* m *n) for searching entire cell fields (m * n)
Sn Data structure Usage
1 B Tree based storage tree For fast retrieval, deletion and insertion and importantly preservation of sequence as that happens in linked list.
2 Linked list based upon observer pattern Even if linked list is getting manipulated, iterator should not become invalid
3 Trie For finding out closest matched files in the folder
4 Map duals creating map from and to view and database

Detailed Discussion

This tool helps in comparing excel files and merging them. At present it has following capabilites

  1. Comparison on base of LCS for rows and exact matching for cols. That implies that cols have to be orderered good matching. Rows can be unordered. What I mean by ordering:
  2. If rows are ordered that means that rows with unique identity that is same in both files reside at same row number. If cols are ordered that means that cols with same heading reside at same col number.
  3. In case of unordered, rows with unique identity that is same in both files can reside at any location.

As an example:

H1 H2 H3
a11 a12 a13
a21 a22 a23
a31 a32 a33

Table 1

H1 H2 H3
a21 a22 a23
a11 a12 a13
a31 a32 a33

Table 2

The two rows are at different position relative to each other. So if I choose that row 1 of first table is correct relative to row 2 then, row 1 is at wrong position relative to row 2 in second table. This is called row ordering.

Row 3 would be marked as same row as its content and order are same in both tables.

So if rows are not ordererd properly, the tool can find them out and segregate them from rows that have differences

We have tried to give professional look and feel to the application. It has search capabilites presently based upon Boyer Moore string search algorithm.

The tool also has ways of editing/copying/transferring contents from left file to right file.

This tool is build with MFC. I am really sorry about using this old technology. This project started as an intern project in which I acted as a mentor. But with growing demand inside our company, we had to make it more professional by adding various features.

Data structure used for table has been upgraded to modified B tree now:

Sn Description
1. Insertion takes at most log n time.
2. Lookup takes at most log n time
3. Deletion takes at most log n time
4. After any insertion/deletion, the index rearranges itself so that index is consecutive.

As an example There are 10 elements indexed from 1 - to - 10.

Operation Type Data with index. eg. 1 is index and A is data
Initially 1(A) 2(B) 3(C) 4(D) 5(E) 6(F) 7(G) 8(H) 9(I) 10(J)
Insertion at 4 1(A) 2(B) 3(C) 4(Insert) 5(D) 6(E) 7(F) 8(G) 9(H) 10(I) 11(J)
Deletion at 8 1(A) 2(B) 3(C) 4(Insert) 5(D) 6(E) 7(F) 8(H) 9(I) 10(J)

This can be achieved by a list but every other operation would be terribly slow. With any combination of primitive data structure available in stl, it was not possible to have a fast look up, fast insertion/ deletion while still preserving index.

My concept is like this Store every thing in a leaf node. When leaf node is full, bifurcate it and create a new parent element. Add the bifurcated parts to the parent node. Now parent here is not a leaf node but its children are. So there are two types of storage capacity Storage capacity of the leaf(actual data also called Pocket Size in the code) Storage capacity of every other node.(also called Order in the code) When there is any insertion, keep the data in the appropriate storage leaf node. if it is full, bifurcate it and rearrange the whole node tree.

Finally

In coming releases, I would add side bar, tool tip among other things to make this application more professional.

Users are free to use/modify/delete code as their need. In case you are using our code, please mention our names/emails in your application about dialog/main page. We have tried to test it on several test cases. In case you find any bug, please mail it to us at akhileshzmishra@gmail.com along with excel sheets that you were using (if possible).