The developers of Manjaro Linux, a beginner-oriented distribution based on Arch Linux, have announced that they are starting to test a new service, MDD (Manjaro Data Donor), which collects statistics about the system and sends them to an external project server. The author of MDD intended to enable telemetry by default (opt-out), but the decision has not yet been approved. Judging by the objections of some developers and users, telemetry will likely be offered as an option requiring the user's prior consent (the proposal is to add a prompt for enabling telemetry to the welcome screen shown after the first boot).

The report includes data such as host name, kernel version, desktop component versions, detailed information about hardware and drivers involved, screen size and resolution information, network device MAC addresses, disk serial numbers, disk partition data, information about the number of running processes and installed packages, versions of basic packages such as systemd, gcc, bash and PipeWire.

The submitted data is stored on the project server in a ClickHouse database and visualized with Grafana. Users' IP addresses are not stored, and a hash of the contents of /etc/machine-id is used as the system identifier.

According to the code (https://github.com/manjaro/mdd/blob/master/mdd.py#L40), it sends everything.

  • 0x0@programming.dev
    10 upvotes · 2 hours ago

    I get the usefulness of technical telemetry such as kernel version, RAM, disk space, processor type, etc… but NIC MAC? HDD serial? WTF?

  • notprogrammer@programming.dev
    21 upvotes · 3 hours ago

    The report includes data such as host name, kernel version, desktop component versions, detailed information about hardware and drivers involved, screen size and resolution information, network device MAC addresses, disk serial numbers, disk partition data, information about the number of running processes and installed packages, versions of basic packages such as systemd, gcc, bash and PipeWire.

    That’s insane

  • SavvyWolf@pawb.social
    24 upvotes · 4 hours ago

    Why do they need information about the hostname? Is it really valuable for them to know how many systems are named daves-pc?

  • MyNameIsRichard@lemmy.ml
    36 upvotes · edited · 5 hours ago

    enable telemetry by default … MAC addresses, disk serial numbers

    Another reason to not use Manjaro. Just use Endeavour instead.

    Edit: I’m not against telemetry per se. I have the KDE feedback enabled, for example, but that was opt-in and sends no unique data.

    • rtxn@lemmy.world
      15 upvotes, 1 downvote · 5 hours ago

      It’s all about trust. Manjaro has given me reasons to distrust them.

      • exu@feditown.com
        2 upvotes · edited · 2 hours ago

        When?

        Edit: I misread, thought it said “trust” instead of “distrust”

        • rtxn@lemmy.world
          10 upvotes · 4 hours ago

          They’ve let TLS certs expire on multiple occasions. They’ve made the decision to enable the AUR in the default installation, which can cause conflicts with out-of-date dependencies because of the delayed release schedule compared to Arch. They’ve shipped software on their stable branch that included unmerged upstream code. One of their developers temporarily broke Asahi Linux.

          I don’t hate the project, but I can’t trust the developers and management.

          • MyNameIsRichard@lemmy.ml
            4 upvotes · 4 hours ago

            They’ve let TLS certs expire on multiple occasions.

            And they told their community to set their clocks back. As a workaround it will work, but all the data you create or modify in the meantime will have the wrong timestamps.

    • sovietknuckles [they/them]@hexbear.net
      5 upvotes · 5 hours ago

      Another reason to not use Manjaro. Just use Endeavour instead.

      Endeavour could be useful if it’s your first time running an Arch-based distro and you’re looking for software/configuration suggestions. Otherwise, Arch Linux is fine by itself and it doesn’t have telemetry

      • Handles@leminal.space
        3 upvotes · 3 hours ago

        I don’t think anybody would say otherwise. Both Manjaro and Endeavour mean to make Arch more appealing to users who aren’t comfortable with command line configuration.

        Endeavour has arguably done better than Manjaro, but yeah. They’re just some configs on top of a system that does very well on its own.

      • MyNameIsRichard@lemmy.ml
        5 upvotes · 3 hours ago

        Why?

        Let me put the question back to you. How do you think the uniquely identifiable information will help them improve Manjaro?

        Do you think they’ve got a Russian satellite and will track down your HDD serial number from space?

        No.

        There’s lots of benefits to telemetry.

        As I basically said, if you bothered to read my comment.

    • Bezier@suppo.fi
      18 upvotes · 5 hours ago

      Thought it’s probably fine after reading the title, but this shit isn’t fine. What the fuck.

    • Buffalox@lemmy.world
      1 upvote, 4 downvotes · 4 hours ago

      The MAC address is anonymized with sha256, and IP addresses aren’t stored.
      So this seems to me to be perfectly anonymous.

      • gnuhaut@lemmy.ml
        7 upvotes · 2 hours ago

        MAC addresses are 48 bits, and half of that is just the manufacturer prefix. So 24 bits really, and those bits aren’t random; I think manufacturers just assign them based on some scheme, like a serial number. Point is, you could easily reverse the SHA by brute force.

        You can’t calculate any useful statistic from a hash so literally the only use this would have is some sort of tracking.
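
A minimal sketch of the brute-force argument made in the comment above. It assumes, hypothetically, that the hash is a plain SHA-256 over the lowercase, colon-separated MAC string and that the attacker already knows the 24-bit vendor prefix; the exact input format MDD uses is not confirmed here, and `hash_mac` and `recover_mac` are illustrative names, not part of MDD.

```python
import hashlib
from typing import Optional


def hash_mac(mac: str) -> str:
    """Assumed scheme: SHA-256 over the lowercase, colon-separated MAC string."""
    return hashlib.sha256(mac.lower().encode()).hexdigest()


def recover_mac(target_hash: str, oui: str, limit: int = 1 << 24) -> Optional[str]:
    """Try device suffixes 0..limit-1 under one vendor prefix until a hash matches."""
    for suffix in range(limit):
        candidate = "{}:{:02x}:{:02x}:{:02x}".format(
            oui.lower(), (suffix >> 16) & 0xFF, (suffix >> 8) & 0xFF, suffix & 0xFF
        )
        if hash_mac(candidate) == target_hash:
            return candidate
    return None


# Demo with a made-up address; the suffix is small so the search ends quickly.
secret_mac = "00:11:22:00:1a:2b"
print(recover_mac(hash_mac(secret_mac), "00:11:22"))  # prints 00:11:22:00:1a:2b
```

A full sweep of one vendor prefix is 2^24, roughly 16.7 million, candidate hashes, which is a small search space for an unsalted hash.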

        • Buffalox@lemmy.world
          1 upvote · 1 hour ago

          this would have is some sort of tracking.

          It’s right at the top of the announcement, that it’s mainly for more accurate stats on unique users.
          It’s not that I think this is a good idea, because I don’t, but some people are blowing it out of proportion. Especially since this isn’t decided at all, and I seriously doubt it will be.

          • gnuhaut@lemmy.ml
            4 upvotes · edited · 51 minutes ago

            You don’t need this to count unique users. You could just assign a random number on install or whatever. Or even more simply, just run the thing once per month, should be accurate enough. Do they expect the software to just randomly spam duplicate reports? Don’t write it that way.

            Best case they don’t care about collecting minimal data and don’t understand that hashed MACs are easily reversible. So incompetent fools with no sensitivity to privacy.

            Maybe this should be Manjaro’s tagline: Not purposely malicious, just grossly negligent and ignorant.
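
The "random number on install" idea from the comment above could look roughly like this. It is only a sketch: the function name and file location are hypothetical, and nothing here is taken from MDD.

```python
import tempfile
import uuid
from pathlib import Path


def get_or_create_install_id(path: Path) -> str:
    """Return the random ID stored at `path`, creating it on first use."""
    if path.exists():
        return path.read_text().strip()
    install_id = str(uuid.uuid4())  # random, not derived from any hardware value
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(install_id + "\n")
    return install_id


# Demo in a temporary directory; a real tool would pick a persistent location.
with tempfile.TemporaryDirectory() as tmp:
    id_file = Path(tmp) / "mdd" / "install-id"  # hypothetical path
    first = get_or_create_install_id(id_file)
    second = get_or_create_install_id(id_file)
    print(first == second)  # prints True
```

Such an ID still lets the server deduplicate reports from one installation, but it reveals nothing about the hardware and disappears when the file is deleted or the system is reinstalled.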

            • Buffalox@lemmy.world
              1 upvote · 11 minutes ago

              You could just assign a random number on install or whatever.

              Funny, I thought the exact same thing.

      • GolfNovemberUniform@lemmy.ml
        14 upvotes · 4 hours ago

        Why collect such data though? And you can call some Big Tech telemetry completely anonymous too if you trust their explanations.

        • Buffalox@lemmy.world
          2 upvotes, 1 downvote · 4 hours ago

          You can see the code of what is sent.
          I’m not aware that Google claims they collect data anonymously, on everything where you are logged in.
          So that’s a false equivalence.

          • GolfNovemberUniform@lemmy.ml
            1 upvote · 4 hours ago

            I’m not aware that Google claims they collect data anonymously, on everything where you are logged in.

            I meant other companies but ok.

  • Destide@feddit.uk
    16 upvotes · 5 hours ago

    It amazes me it’s still as popular as it is and still own goaling at least once a year.

  • Buffalox@lemmy.world
    10 upvotes · edited · 4 hours ago

    This may be illegal in the EU if they don’t use opt-in. Even with opt-in, it may be illegal to collect MAC addresses and disk serial numbers from users under 18, as those can potentially be used for identification.

    The data is anonymized, and the IP is NOT stored. So I’m not sure this violates GDPR?

    From the code we can see the machine ID is anonymized, sending only a SHA256 checksum.

    import hashlib
    import uuid

    def get_hashed_device_id():
        # Read the machine ID
        with open("/etc/machine-id", "r") as f:
            machine_id = f.read().strip()

        # Hash the machine ID using SHA-256 to anonymize it
        hashed_id = hashlib.sha256(machine_id.encode()).digest()

        # Convert the first 16 bytes of the hash to a UUID (version 5 UUID format)
        return str(uuid.UUID(bytes=hashed_id[:16], version=5))

    This makes it somewhat a nothingburger IMO.

    • gnuhaut@lemmy.ml
      4 upvotes · 2 hours ago

      That’s not anonymous, that’s pseudonymous.

      What is the point of this? The machine-id already looks to be some unique random number, so you’re calculating another unique random-looking number from that, might as well use the original number.

      You can’t glean any useful information from a unique random-looking number that would help with developing Manjaro. You can’t calculate any statistics from that. The only use is tracking.
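
To illustrate the "pseudonymous, not anonymous" point: the hashing step quoted earlier in the thread is deterministic, so every report from the same machine carries the same identifier. The sketch below repeats that logic with the machine ID passed in as a parameter; the machine ID value and the variable names are made up for the example.

```python
import hashlib
import uuid


def hashed_device_id(machine_id: str) -> str:
    """Same steps as the quoted function, with the machine ID as a parameter."""
    digest = hashlib.sha256(machine_id.encode()).digest()
    return str(uuid.UUID(bytes=digest[:16], version=5))


machine_id = "0123456789abcdef0123456789abcdef"  # made-up value
report_january = hashed_device_id(machine_id)
report_february = hashed_device_id(machine_id)

# Same machine, same identifier: the two reports can be linked on the server.
print(report_january == report_february)  # prints True
```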

  • thingsiplay@beehaw.org
    8 upvotes, 1 downvote · 4 hours ago
    • users can be identified
    • probably Opt-out (still in discussion)

    Two nogos combined makes nonogogos. Why do they need host name, MAC address and disk serial numbers? Why can’t people set how much they want to send in, like KDE Plasma does? Will the data be shown to the user before it’s sent in? Steam does that perfectly (it shows the data and it’s opt-in), and that is even a proprietary application. Telemetry is okay if it’s done right: without user identification, opt-in, and not hiding what’s sent, preferably with multiple levels of what is being sent.

    I used Manjaro before and switched to EndeavourOS because I was not happy. Now I am. Manjaro can’t stop being stupid (not the users, I’m not attacking any user here, only the maintainers or developers of Manjaro).

    • r00ty@kbin.life
      3 upvotes · 3 hours ago

      The way I read it, the developer wanted opt-out but it’s likely it will be opt-in. I’m fine with opt-in and vehemently against opt-out for telemetry.

      I would prefer the information was statistical only. Rather than hostname (making the assumption they only want hostname to be able to somehow separate the data to follow changes over time), a much better idea would be some kind of hash based on information unlikely to change, but with enough input that it would be unlikely to be possible to brute-force the original data out of the hash. So all they know is that this data came from the same machine, but they cannot ID the machine. Maybe some kind of unique but otherwise untrackable ID is created at install time and ONLY used for this purpose and no other.

  • auzy@lemmy.world
    3 upvotes, 10 downvotes · 4 hours ago

    Don’t like it, don’t opt in

    Even Debian has popcon

    There are lots of benefits for developers to gather telemetry.

    Don’t like that? Fork and do your own distro (presumably though you don’t contribute anything to open source, so I’d expect such people to simply whine and get angry at contributors)

    • gnuhaut@lemmy.ml
      3 upvotes · 1 hour ago

      Debian popcon is opt-in, first of all.

      https://popcon.debian.org/FAQ

      Q) What information is reported by popularity-contest ?

      A) popularity-contest reports the system vendor [1], the system architecture you use, the version of popularity-contest you use and the list of packages installed on your system. For each package, popularity-contest looks at the most recently used (based on atime) files, and reports the filename, its last access time (atime) and last change time (ctime). However, some files are not considered, because they have unreliable atime. For privacy reasons, the times are truncated to multiple of twelve hours.

      [1] i.e. the dpkg Vendor field, see dpkg-vendor(1).

      So no fucking MAC addresses and machine-ids and harddrive serial numbers and stuff.

      They only want package statistics, the point being to have statistics about the popularity of packages, mainly so they can be prioritized for the CD/DVD isos. You know, information that actually has a use, not hardware identifiers that can only be used for tracking purposes.
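
For a sense of what "truncated to multiple of twelve hours" in the FAQ quote means in practice, here is an illustrative sketch. It is not popcon's actual code; the function name is hypothetical and the timestamps are assumed to be Unix epoch seconds.

```python
TWELVE_HOURS = 12 * 60 * 60  # seconds


def truncate_to_twelve_hours(epoch_seconds: int) -> int:
    """Round a Unix timestamp down to the nearest multiple of twelve hours."""
    return epoch_seconds - (epoch_seconds % TWELVE_HOURS)


# Two accesses a few hours apart fall into the same twelve-hour bucket.
morning = 3 * TWELVE_HOURS + 600
later = 3 * TWELVE_HOURS + 5 * 3600
print(truncate_to_twelve_hours(morning) == truncate_to_twelve_hours(later))  # prints True
```

Coarsening the times this way keeps the "recently used" signal while discarding the precise moment a file was touched.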

    • r00ty@kbin.life
      4 upvotes · 3 hours ago

      Yeah, my only concern here was if it was opt-out. That’d be bad.

      Now I completely understand the developer on this. This is useful info to have to help decide future changes/features and general direction, but balancing the right to privacy means this kind of data provision should ALWAYS be opt-in. Microsoft, you hearing me here?

  • ShittyBeatlesFCPres@lemmy.world
    9 upvotes · 5 hours ago

    Why do they need half that data for a derivative of a distro? Fuck off. I don’t care if someone collects the model number of my GPU or whatever but that sounds like personally identifiable tracking data, not basic “telemetry” data to set development priorities or whatever.