Monday, February 21, 2011

how to remove duplicates from an array without sorting

Hello, i have an array which might contain duplicate objects. I wonder if it's possible to find and remove the duplicates in the array: - without sorting (strict requirement) - without using a temporary secondary array - possibily in O(N), with N the nb of the elements in the array

In my case the array is a Lua array, which contains tables:

t={
{a,1},
 {a,2},
 {b,1},
 {b,3},
 {a,2}
} 

In my case, t[5] is a duplicate of t[2], while t[1] is not.

From stackoverflow
  • Iterate the array, stick every value in a hash, checking if the it exists first. If it does remove from original array (or don't write to the new one). Not very memory efficient, but only 0(n) since you are only iterating the array once.

    Valerio Schiavoni : One of the requirements was "- without using a temporary secondary array". I guess I should have written "- without using a temporary secondary storage", but maybe it's good enough.
  • Can't be done in O(n) but ...

    what you can do is

    • Iterate thru the array
    • For each member search forward for repetitions, remove those.

    Worst case scenario complexity is O(n^2)

  • To summarize, you have the following options:

    • time: O(n^2), no extra memory - for each element in the array look for an equal one linearly
    • time: O(n*log n), no extra memory - sort first, walk over the array linearly after
    • time: O(n), memory: O(n) - use a lookup table (edit: that's probably not an option as tables cannot be keys in other tables as far as I remember)

    Pick one. There's no way to do what you want in O(n) time with no extra memory.

    Judge Maygarden : Tables can be used as keys in other tables.
    Valerio Schiavoni : tables can used as keys but the value used to compare if 2 tables are equals is their memory reference and not the values they store.

0 comments:

Post a Comment