Add a test to check for duplicate data at initialization in exercise "custom-set"

Kurlgargyey · May 6, 2024, 2:08pm

As per the comment in NMeulemann’s solution, the tests in custom-set currently pass even if CustomSet incorrectly carries over duplicate data from input provided at initialization.

This PR adds a test to rectify that!

Please let me know what you think.

kotp · May 6, 2024, 2:18pm

Duplicate keys in a set probably should not be ignored, but should be more than that, as a set suggests uniqueness.

I only saw the comment that the deduplication mechanism was T not that there was a problem with the exercise, and also saw that they chose to use a built in data structure. Can you provide the quote from the comments in the solution? (It will hopefully inspire others to spend the time to click through on the links and will also allow them to find the relevant specific information more easily as well.)

Did you find something specifically with your mentored exercise that indicates that this should be something done?

Kurlgargyey · May 6, 2024, 2:34pm

I was referring to this section of the commented out alternate implementation, which does not rely on HashSet:

// dedup relies on PartialEq. Tests pass without it, but it should be there
// T: PartialEq + Clone + Ord,
// {
// pub fn new(input: &[T]) → Self {
// let mut vec: Vec = input.into_iter().cloned().collect();
// vec.sort_unstable();
// vec.dedup();
// Self { set: vec }
// }

Sorry for being a bit hectic!

A solution omitting the “vec.dedup()” part of that function will have the resultant CustomSet still contain any duplicate data found in the input slice. This would not be acceptable behavior for a set, and so I think NMeulemann is correct in saying that dedup should be there and by extension the tests should check for it.

kotp · May 6, 2024, 2:37pm

I agree. Thanks for direction.

Because this is an exercise that is in the problem-specifications repository, the problem likely should be directed there as an addition where all tracks can benefit.

Kurlgargyey · May 6, 2024, 2:45pm

Unfortunately, I ran into an issue while testing the change. I can’t figure out how to make sure that the test fails even in the face of the solution implementing a custom Eq for CustomSet. I’m still pretty new to writing tests myself and pretty much limited to assert_eq!(), lol.

edit: Maybe it would be good to require CustomSet to have a way of returning the contained values as a Vec or something?

kotp · May 6, 2024, 2:52pm

Another good reason to request mentoring on the exercise, the mentor likely will be able to direct and guide you to a good test, as well.

I think that @ErikSchierboom could move this to Bugs & Feature Requests - Exercism so that it can be talked about there, if you want. (Not really the specific Rust test, but adding the apparently helpful test to the best effect for all tracks.)

senekor · May 6, 2024, 4:52pm

This test already exists.

Maybe it would be good to require CustomSet to have a way of returning the contained values as a Vec or something

I’m not fond of that. I don’t want to force users to implement an API just so we can test their implementation details.

If there are still duplicates loitering around somewhere in a vec, what does it matter? If the set behaves like a set, it’s a valid set. I guess this could be considered a memory leak, but teaching how to avoid that is not the goal of the exercise so I don’t want to test against it.

That being said, I can still think of a situation that’s not tested:

let set_1 = CustomSet::new(&[1, 1]);
let set_2 = CustomSet::new(&[1]);
let expected = CustomSet::new(&[]);
assert_eq!(set_1.difference(&set_2), expected);

I imagine it’s quite easy to write an implementation that doesn’t deduplicate internally and if the difference is taken, iterate over all elments of the left set and remove them once from the right set, possibly leaving duplicates around.

Kurlgargyey · May 7, 2024, 9:04am

Sorry, I did not see that test, and made some assumptions about what sort of implementations should or should not pass the tests. My bad for creating an unnecessary topic.

I think you bring up a good point with duplicates possibly showing up in incorrect differences.